[PATCH net-next v4 3/5] vsock_addr: Check for supported flag values

2020-12-14 Thread Andra Paraschiv
Check that the flags value provided in the vsock address data structure
includes only the flags supported by the current kernel version.

The first byte of the "svm_zero" field is now used as "svm_flags", so
replace the zero check with a flags check.

Changelog

v3 -> v4

* New patch in v4.

Signed-off-by: Andra Paraschiv 
---
 net/vmw_vsock/vsock_addr.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/vmw_vsock/vsock_addr.c b/net/vmw_vsock/vsock_addr.c
index 909de26cb0e70..223b9660a759f 100644
--- a/net/vmw_vsock/vsock_addr.c
+++ b/net/vmw_vsock/vsock_addr.c
@@ -22,13 +22,15 @@ EXPORT_SYMBOL_GPL(vsock_addr_init);
 
 int vsock_addr_validate(const struct sockaddr_vm *addr)
 {
+   __u8 svm_valid_flags = VMADDR_FLAG_TO_HOST;
+
if (!addr)
return -EFAULT;
 
if (addr->svm_family != AF_VSOCK)
return -EAFNOSUPPORT;
 
-   if (addr->svm_zero[0] != 0)
+   if (addr->svm_flags & ~svm_valid_flags)
return -EINVAL;
 
return 0;
-- 
2.20.1 (Apple Git-117)




Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar 
Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in 
Romania. Registration number J22/2621/2005.



[PATCH net-next v4 2/5] vm_sockets: Add VMADDR_FLAG_TO_HOST vsock flag

2020-12-14 Thread Andra Paraschiv
Add the VMADDR_FLAG_TO_HOST vsock flag, used to set up a vsock
connection where all the packets are forwarded to the host.

Vsock communication between sibling VMs can then be built on top of this
type of channel.
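A minimal userspace sketch of setting this flag before connect(), under the assumption of the v4 layout (1-byte svm_flags). The struct and constants below are local stand-ins for the uapi definitions, and the CID/port values are arbitrary examples:

```c
#include <stdint.h>
#include <string.h>

/* Stand-ins for the uapi definitions; 40 is AF_VSOCK on Linux, and the
 * struct mirrors the v4 layout with a 1-byte svm_flags field. */
#define EXAMPLE_AF_VSOCK 40
#define VMADDR_FLAG_TO_HOST 0x01

struct example_sockaddr_vm {
	unsigned short svm_family;
	unsigned short svm_reserved1;
	unsigned int svm_port;
	unsigned int svm_cid;
	uint8_t svm_flags;
	unsigned char svm_zero[3];
};

/* Prepare a remote address whose packets should always be forwarded to
 * the host, as a sibling-VM application would before calling connect(). */
void init_to_host_addr(struct example_sockaddr_vm *addr,
		       unsigned int cid, unsigned int port)
{
	memset(addr, 0, sizeof(*addr));
	addr->svm_family = EXAMPLE_AF_VSOCK;
	addr->svm_cid = cid;
	addr->svm_port = port;
	addr->svm_flags |= VMADDR_FLAG_TO_HOST;	/* bitwise setup, per v2+ */
}
```

The address would then be passed to connect() as usual; the flag only changes which transport the kernel assigns.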

Changelog

v3 -> v4

* Update the "VMADDR_FLAG_TO_HOST" value, as the size of the field has
  been updated to 1 byte.

v2 -> v3

* Update comments to mention when the flag is set in the connect and
  listen paths.

v1 -> v2

* New patch in v2, it was split from the first patch in the series.
* Remove the default value for the vsock flags field.
* Update the naming for the vsock flag to "VMADDR_FLAG_TO_HOST".

Signed-off-by: Andra Paraschiv 
Reviewed-by: Stefano Garzarella 
---
 include/uapi/linux/vm_sockets.h | 20 
 1 file changed, 20 insertions(+)

diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h
index c2eac3d0a9f00..46918a1852d7b 100644
--- a/include/uapi/linux/vm_sockets.h
+++ b/include/uapi/linux/vm_sockets.h
@@ -115,6 +115,26 @@
 
 #define VMADDR_CID_HOST 2
 
+/* The current default use case for the vsock channel is the following:
+ * local vsock communication between guest and host and nested VMs setup.
+ * In addition to this, implicitly, the vsock packets are forwarded to the host
+ * if no host->guest vsock transport is set.
+ *
+ * Set this flag value in the sockaddr_vm corresponding field if the vsock
+ * packets need to be always forwarded to the host. Using this behavior,
+ * vsock communication between sibling VMs can be setup.
+ *
+ * This way can explicitly distinguish between vsock channels created for
+ * different use cases, such as nested VMs (or local communication between
+ * guest and host) and sibling VMs.
+ *
+ * The flag can be set in the connect logic in the user space application flow.
+ * In the listen logic (from kernel space) the flag is set on the remote peer
+ * address. This happens for an incoming connection when it is routed from the
+ * host and comes from the guest (local CID and remote CID > VMADDR_CID_HOST).
+ */
+#define VMADDR_FLAG_TO_HOST 0x01
+
 /* Invalid vSockets version. */
 
 #define VM_SOCKETS_INVALID_VERSION -1U
-- 
2.20.1 (Apple Git-117)







[PATCH net-next v4 0/5] vsock: Add flags field in the vsock address

2020-12-14 Thread Andra Paraschiv
vsock enables communication between virtual machines and the host they are
running on. Nested VMs can be set up to use vsock channels, as multi-transport
support has been available in the mainline kernel since the v5.5 release.

Implicitly, if no host->guest vsock transport is loaded, all the vsock packets
are forwarded to the host. This behavior can be used to set up communication
channels between sibling VMs running on the same host. One example is the
vsock channels that can be established within AWS Nitro Enclaves
(see Documentation/virt/ne_overview.rst).

To be able to explicitly mark a connection as being used for a certain use case,
add a flags field in the vsock address data structure. The value of the flags
field is taken into consideration when the vsock transport is assigned. This
way, we can distinguish between different use cases, such as nested VMs / local
communication and sibling VMs.

The flags field can be set in the user space application connect logic. On the
listen path, the field can be set in the kernel space logic.

Thank you.

Andra

---

Patch Series Changelog

The patch series is built on top of v5.10.

GitHub repo branch for the latest version of the patch series:

* https://github.com/andraprs/linux/tree/vsock-flag-sibling-comm-v4

v3 -> v4

* Rebase on top of v5.10.
* Add check for supported flag values. 
* Update the "svm_flags" field to be 1 byte instead of 2 bytes.
* v3: https://lore.kernel.org/lkml/20201211103241.17751-1-andra...@amazon.com/

v2 -> v3

* Rebase on top of v5.10-rc7.
* Add "svm_flags" as a new field, not reusing "svm_reserved1".
* Update comments to mention when the "VMADDR_FLAG_TO_HOST" flag is set in the
  connect and listen paths.
* Update bitwise check logic to not compare result to the flag value.
* v2: https://lore.kernel.org/lkml/20201204170235.84387-1-andra...@amazon.com/

v1 -> v2

* Update the vsock flag naming to "VMADDR_FLAG_TO_HOST".
* Use bitwise operators to setup and check the vsock flag.
* Set the vsock flag on the receive path in the vsock transport assignment
  logic.
* Merge the checks for the g2h transport assignment in one "if" block.
* v1: https://lore.kernel.org/lkml/20201201152505.19445-1-andra...@amazon.com/

---

Andra Paraschiv (5):
  vm_sockets: Add flags field in the vsock address data structure
  vm_sockets: Add VMADDR_FLAG_TO_HOST vsock flag
  vsock_addr: Check for supported flag values
  af_vsock: Set VMADDR_FLAG_TO_HOST flag on the receive path
  af_vsock: Assign the vsock transport considering the vsock address
flags

 include/uapi/linux/vm_sockets.h | 26 +-
 net/vmw_vsock/af_vsock.c| 21 +++--
 net/vmw_vsock/vsock_addr.c  |  4 +++-
 3 files changed, 47 insertions(+), 4 deletions(-)

-- 
2.20.1 (Apple Git-117)







[PATCH net-next v4 4/5] af_vsock: Set VMADDR_FLAG_TO_HOST flag on the receive path

2020-12-14 Thread Andra Paraschiv
The vsock flags can be set during the connect() setup logic, when
initializing the vsock address data structure variable. Then the vsock
transport is assigned, also considering this flags field.

The vsock transport is also assigned on the (listen) receive path. The
flags field needs to be set considering the use case.

Set the VMADDR_FLAG_TO_HOST flag in the vsock flags of the remote address,
so that packets are forwarded to the host, if the following conditions are
met:

* The source CID of the packet is higher than VMADDR_CID_HOST.
* The destination CID of the packet is higher than VMADDR_CID_HOST.
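The two conditions above can be sketched as a standalone predicate. The names are stand-ins for the kernel code, with have_parent playing the role of "psk != NULL" (i.e. the socket came from a listener, not a connect()):

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the kernel definitions used by the patch. */
#define VMADDR_CID_HOST 2
#define VMADDR_FLAG_TO_HOST 0x01

/* Mirrors the listen-path condition in vsock_assign_transport(): the flag
 * is set only when both CIDs identify guests (are above the host CID). */
uint8_t flags_after_receive(bool have_parent, unsigned int local_cid,
			    unsigned int remote_cid, uint8_t flags)
{
	if (have_parent && local_cid > VMADDR_CID_HOST &&
	    remote_cid > VMADDR_CID_HOST)
		flags |= VMADDR_FLAG_TO_HOST;

	return flags;
}
```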

Changelog

v3 -> v4

* No changes.

v2 -> v3

* No changes.

v1 -> v2

* Set the vsock flag on the receive path in the vsock transport
  assignment logic.
* Use bitwise operator for the vsock flag setup.
* Use the updated "VMADDR_FLAG_TO_HOST" flag naming.

Signed-off-by: Andra Paraschiv 
Reviewed-by: Stefano Garzarella 
---
 net/vmw_vsock/af_vsock.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index d10916ab45267..83d035eab0b05 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -431,6 +431,18 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
unsigned int remote_cid = vsk->remote_addr.svm_cid;
int ret;
 
+   /* If the packet is coming with the source and destination CIDs higher
+* than VMADDR_CID_HOST, then a vsock channel where all the packets are
+* forwarded to the host should be established. Then the host will
+* need to forward the packets to the guest.
+*
+* The flag is set on the (listen) receive path (psk is not NULL). On
+* the connect path the flag can be set by the user space application.
+*/
+   if (psk && vsk->local_addr.svm_cid > VMADDR_CID_HOST &&
+   vsk->remote_addr.svm_cid > VMADDR_CID_HOST)
+   vsk->remote_addr.svm_flags |= VMADDR_FLAG_TO_HOST;
+
switch (sk->sk_type) {
case SOCK_DGRAM:
new_transport = transport_dgram;
-- 
2.20.1 (Apple Git-117)







[PATCH net-next v4 5/5] af_vsock: Assign the vsock transport considering the vsock address flags

2020-12-14 Thread Andra Paraschiv
The vsock flags field can be set in the connect path (user space app)
and the (listen) receive path (kernel space logic).

When the vsock transport is assigned, the remote CID is used to
distinguish between types of connection.

Use the vsock flags value (in addition to the CID) from the remote
address to decide which vsock transport to assign. For the sibling VMs
use case, all the vsock packets need to be forwarded to the host, so
always assign the guest->host transport if the VMADDR_FLAG_TO_HOST flag
is set. For the other use cases, the vsock transport assignment logic is
not changed.
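The resulting decision order can be sketched as a standalone function. The names are stand-ins: use_local stands in for vsock_use_local_transport() and have_h2g for "transport_h2g is loaded":

```c
/* Stand-ins for the kernel definitions used by the patch. */
#define VMADDR_CID_HOST 2
#define VMADDR_FLAG_TO_HOST 0x01

enum transport_kind { T_LOCAL, T_G2H, T_H2G };

/* Mirrors the patched SOCK_STREAM branch: the VMADDR_FLAG_TO_HOST flag
 * forces the guest->host transport even when the remote CID would
 * otherwise select host->guest. */
enum transport_kind pick_transport(int use_local, int have_h2g,
				   unsigned int remote_cid,
				   unsigned char remote_flags)
{
	if (use_local)
		return T_LOCAL;
	if (remote_cid <= VMADDR_CID_HOST || !have_h2g ||
	    (remote_flags & VMADDR_FLAG_TO_HOST))
		return T_G2H;
	return T_H2G;
}
```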

Changelog

v3 -> v4

* Update the "remote_flags" local variable type to reflect the change of
  the "svm_flags" field to be 1 byte in size.

v2 -> v3

* Update bitwise check logic to not compare result to the flag value.

v1 -> v2

* Use bitwise operator to check the vsock flag.
* Use the updated "VMADDR_FLAG_TO_HOST" flag naming.
* Merge the checks for the g2h transport assignment in one "if" block.

Signed-off-by: Andra Paraschiv 
Reviewed-by: Stefano Garzarella 
---
 net/vmw_vsock/af_vsock.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 83d035eab0b05..fc484fb37fffb 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -421,7 +421,8 @@ static void vsock_deassign_transport(struct vsock_sock *vsk)
  * The vsk->remote_addr is used to decide which transport to use:
  *  - remote CID == VMADDR_CID_LOCAL or g2h->local_cid or VMADDR_CID_HOST if
  *g2h is not loaded, will use local transport;
- *  - remote CID <= VMADDR_CID_HOST will use guest->host transport;
+ *  - remote CID <= VMADDR_CID_HOST or h2g is not loaded or remote flags field
+ *includes VMADDR_FLAG_TO_HOST flag value, will use guest->host transport;
  *  - remote CID > VMADDR_CID_HOST will use host->guest transport;
  */
 int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
@@ -429,6 +430,7 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
const struct vsock_transport *new_transport;
struct sock *sk = sk_vsock(vsk);
unsigned int remote_cid = vsk->remote_addr.svm_cid;
+   __u8 remote_flags;
int ret;
 
/* If the packet is coming with the source and destination CIDs higher
@@ -443,6 +445,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
vsk->remote_addr.svm_cid > VMADDR_CID_HOST)
vsk->remote_addr.svm_flags |= VMADDR_FLAG_TO_HOST;
 
+   remote_flags = vsk->remote_addr.svm_flags;
+
switch (sk->sk_type) {
case SOCK_DGRAM:
new_transport = transport_dgram;
@@ -450,7 +454,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
case SOCK_STREAM:
if (vsock_use_local_transport(remote_cid))
new_transport = transport_local;
-   else if (remote_cid <= VMADDR_CID_HOST || !transport_h2g)
+   else if (remote_cid <= VMADDR_CID_HOST || !transport_h2g ||
+(remote_flags & VMADDR_FLAG_TO_HOST))
new_transport = transport_g2h;
else
new_transport = transport_h2g;
-- 
2.20.1 (Apple Git-117)







[PATCH net-next v4 1/5] vm_sockets: Add flags field in the vsock address data structure

2020-12-14 Thread Andra Paraschiv
vsock enables communication between virtual machines and the host they
are running on. With the multi transport support (guest->host and
host->guest), nested VMs can also use vsock channels for communication.

In addition to this, by default, all the vsock packets are forwarded to
the host, if no host->guest transport is loaded. This behavior can be
implicitly used for enabling vsock communication between sibling VMs.

Add a flags field in the vsock address data structure that can be used
to explicitly mark the vsock connection as being targeted for a certain
type of communication. This way, we can distinguish between different
use cases, such as nested VMs and sibling VMs.

This field can be set when initializing the vsock address variable used
for the connect() call.
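The size bookkeeping in the diff below can be checked with a self-contained mock. These structs are local stand-ins (the real uapi header uses struct sockaddr and sa_family_t); the point is that carving "svm_flags" out of "svm_zero" keeps the overall size equal to struct sockaddr (16 bytes on Linux):

```c
#include <stdint.h>

/* Local mock of struct sockaddr so the example is self-contained. */
struct mock_sockaddr {
	unsigned short sa_family;
	char sa_data[14];
};

/* Mock of the patched sockaddr_vm: svm_flags takes one byte away from
 * svm_zero, so the total size is unchanged. */
struct mock_sockaddr_vm {
	unsigned short svm_family;
	unsigned short svm_reserved1;
	unsigned int svm_port;
	unsigned int svm_cid;
	uint8_t svm_flags;
	unsigned char svm_zero[sizeof(struct mock_sockaddr) -
			       sizeof(unsigned short) -
			       sizeof(unsigned short) -
			       sizeof(unsigned int) -
			       sizeof(unsigned int) -
			       sizeof(uint8_t)];
};
```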

Changelog

v3 -> v4

* Update the size of "svm_flags" field to be 1 byte instead of 2 bytes.

v2 -> v3

* Add "svm_flags" as a new field, not reusing "svm_reserved1".

v1 -> v2

* Update the field name to "svm_flags".
* Split the current patch in 2 patches.

Signed-off-by: Andra Paraschiv 
Reviewed-by: Stefano Garzarella 
---
 include/uapi/linux/vm_sockets.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h
index fd0ed7221645d..c2eac3d0a9f00 100644
--- a/include/uapi/linux/vm_sockets.h
+++ b/include/uapi/linux/vm_sockets.h
@@ -18,6 +18,7 @@
 #define _UAPI_VM_SOCKETS_H
 
 #include 
+#include 
 
 /* Option name for STREAM socket buffer size.  Use as the option name in
  * setsockopt(3) or getsockopt(3) to set or get an unsigned long long that
@@ -148,10 +149,13 @@ struct sockaddr_vm {
unsigned short svm_reserved1;
unsigned int svm_port;
unsigned int svm_cid;
+   __u8 svm_flags;
unsigned char svm_zero[sizeof(struct sockaddr) -
   sizeof(sa_family_t) -
   sizeof(unsigned short) -
-  sizeof(unsigned int) - sizeof(unsigned int)];
+  sizeof(unsigned int) -
+  sizeof(unsigned int) -
+  sizeof(__u8)];
 };
 
 #define IOCTL_VM_SOCKETS_GET_LOCAL_CID _IO(7, 0xb9)
-- 
2.20.1 (Apple Git-117)







[PATCH net-next v3 2/4] vm_sockets: Add VMADDR_FLAG_TO_HOST vsock flag

2020-12-11 Thread Andra Paraschiv
Add the VMADDR_FLAG_TO_HOST vsock flag, used to set up a vsock
connection where all the packets are forwarded to the host.

Vsock communication between sibling VMs can then be built on top of this
type of channel.

Changelog

v2 -> v3

* Update comments to mention when the flag is set in the connect and
  listen paths.

v1 -> v2

* New patch in v2, it was split from the first patch in the series.
* Remove the default value for the vsock flags field.
* Update the naming for the vsock flag to "VMADDR_FLAG_TO_HOST".

Signed-off-by: Andra Paraschiv 
---
 include/uapi/linux/vm_sockets.h | 20 
 1 file changed, 20 insertions(+)

diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h
index 619f8e9d55ca4..c99ed29602345 100644
--- a/include/uapi/linux/vm_sockets.h
+++ b/include/uapi/linux/vm_sockets.h
@@ -114,6 +114,26 @@
 
 #define VMADDR_CID_HOST 2
 
+/* The current default use case for the vsock channel is the following:
+ * local vsock communication between guest and host and nested VMs setup.
+ * In addition to this, implicitly, the vsock packets are forwarded to the host
+ * if no host->guest vsock transport is set.
+ *
+ * Set this flag value in the sockaddr_vm corresponding field if the vsock
+ * packets need to be always forwarded to the host. Using this behavior,
+ * vsock communication between sibling VMs can be setup.
+ *
+ * This way can explicitly distinguish between vsock channels created for
+ * different use cases, such as nested VMs (or local communication between
+ * guest and host) and sibling VMs.
+ *
+ * The flag can be set in the connect logic in the user space application flow.
+ * In the listen logic (from kernel space) the flag is set on the remote peer
+ * address. This happens for an incoming connection when it is routed from the
+ * host and comes from the guest (local CID and remote CID > VMADDR_CID_HOST).
+ */
+#define VMADDR_FLAG_TO_HOST 0x0001
+
 /* Invalid vSockets version. */
 
 #define VM_SOCKETS_INVALID_VERSION -1U
-- 
2.20.1 (Apple Git-117)







[PATCH net-next v3 0/4] vsock: Add flags field in the vsock address

2020-12-11 Thread Andra Paraschiv
vsock enables communication between virtual machines and the host they are
running on. Nested VMs can be set up to use vsock channels, as multi-transport
support has been available in the mainline kernel since the v5.5 release.

Implicitly, if no host->guest vsock transport is loaded, all the vsock packets
are forwarded to the host. This behavior can be used to set up communication
channels between sibling VMs running on the same host. One example is the
vsock channels that can be established within AWS Nitro Enclaves
(see Documentation/virt/ne_overview.rst).

To be able to explicitly mark a connection as being used for a certain use case,
add a flags field in the vsock address data structure. The value of the flags
field is taken into consideration when the vsock transport is assigned. This
way, we can distinguish between different use cases, such as nested VMs / local
communication and sibling VMs.

The flags field can be set in the user space application connect logic. On the
listen path, the field can be set in the kernel space logic.

Thank you.

Andra

---

Patch Series Changelog

The patch series is built on top of v5.10-rc7.

GitHub repo branch for the latest version of the patch series:

* https://github.com/andraprs/linux/tree/vsock-flag-sibling-comm-v3

v2 -> v3

* Rebase on top of v5.10-rc7.
* Add "svm_flags" as a new field, not reusing "svm_reserved1".
* Update comments to mention when the "VMADDR_FLAG_TO_HOST" flag is set in the
  connect and listen paths.
* Update bitwise check logic to not compare result to the flag value.
* v2: https://lore.kernel.org/lkml/20201204170235.84387-1-andra...@amazon.com/

v1 -> v2

* Update the vsock flag naming to "VMADDR_FLAG_TO_HOST".
* Use bitwise operators to setup and check the vsock flag.
* Set the vsock flag on the receive path in the vsock transport assignment
  logic.
* Merge the checks for the g2h transport assignment in one "if" block.
* v1: https://lore.kernel.org/lkml/20201201152505.19445-1-andra...@amazon.com/

---

Andra Paraschiv (4):
  vm_sockets: Add flags field in the vsock address data structure
  vm_sockets: Add VMADDR_FLAG_TO_HOST vsock flag
  af_vsock: Set VMADDR_FLAG_TO_HOST flag on the receive path
  af_vsock: Assign the vsock transport considering the vsock address
flags

 include/uapi/linux/vm_sockets.h | 25 -
 net/vmw_vsock/af_vsock.c| 21 +++--
 2 files changed, 43 insertions(+), 3 deletions(-)

-- 
2.20.1 (Apple Git-117)







[PATCH net-next v3 1/4] vm_sockets: Add flags field in the vsock address data structure

2020-12-11 Thread Andra Paraschiv
vsock enables communication between virtual machines and the host they
are running on. With the multi transport support (guest->host and
host->guest), nested VMs can also use vsock channels for communication.

In addition to this, by default, all the vsock packets are forwarded to
the host, if no host->guest transport is loaded. This behavior can be
implicitly used for enabling vsock communication between sibling VMs.

Add a flags field in the vsock address data structure that can be used
to explicitly mark the vsock connection as being targeted for a certain
type of communication. This way, we can distinguish between different
use cases, such as nested VMs and sibling VMs.

This field can be set when initializing the vsock address variable used
for the connect() call.

Changelog

v2 -> v3

* Add "svm_flags" as a new field, not reusing "svm_reserved1".

v1 -> v2

* Update the field name to "svm_flags".
* Split the current patch in 2 patches.

Signed-off-by: Andra Paraschiv 
Reviewed-by: Stefano Garzarella 
---
 include/uapi/linux/vm_sockets.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h
index fd0ed7221645d..619f8e9d55ca4 100644
--- a/include/uapi/linux/vm_sockets.h
+++ b/include/uapi/linux/vm_sockets.h
@@ -148,10 +148,13 @@ struct sockaddr_vm {
unsigned short svm_reserved1;
unsigned int svm_port;
unsigned int svm_cid;
+   unsigned short svm_flags;
unsigned char svm_zero[sizeof(struct sockaddr) -
   sizeof(sa_family_t) -
   sizeof(unsigned short) -
-  sizeof(unsigned int) - sizeof(unsigned int)];
+  sizeof(unsigned int) -
+  sizeof(unsigned int) -
+  sizeof(unsigned short)];
 };
 
 #define IOCTL_VM_SOCKETS_GET_LOCAL_CID _IO(7, 0xb9)
-- 
2.20.1 (Apple Git-117)







[PATCH net-next v3 3/4] af_vsock: Set VMADDR_FLAG_TO_HOST flag on the receive path

2020-12-11 Thread Andra Paraschiv
The vsock flags can be set during the connect() setup logic, when
initializing the vsock address data structure variable. Then the vsock
transport is assigned, also considering this flags field.

The vsock transport is also assigned on the (listen) receive path. The
flags field needs to be set considering the use case.

Set the VMADDR_FLAG_TO_HOST flag in the vsock flags of the remote address,
so that packets are forwarded to the host, if the following conditions are
met:

* The source CID of the packet is higher than VMADDR_CID_HOST.
* The destination CID of the packet is higher than VMADDR_CID_HOST.

Changelog

v2 -> v3

* No changes.

v1 -> v2

* Set the vsock flag on the receive path in the vsock transport
  assignment logic.
* Use bitwise operator for the vsock flag setup.
* Use the updated "VMADDR_FLAG_TO_HOST" flag naming.

Signed-off-by: Andra Paraschiv 
Reviewed-by: Stefano Garzarella 
---
 net/vmw_vsock/af_vsock.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index d10916ab45267..83d035eab0b05 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -431,6 +431,18 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
unsigned int remote_cid = vsk->remote_addr.svm_cid;
int ret;
 
+   /* If the packet is coming with the source and destination CIDs higher
+* than VMADDR_CID_HOST, then a vsock channel where all the packets are
+* forwarded to the host should be established. Then the host will
+* need to forward the packets to the guest.
+*
+* The flag is set on the (listen) receive path (psk is not NULL). On
+* the connect path the flag can be set by the user space application.
+*/
+   if (psk && vsk->local_addr.svm_cid > VMADDR_CID_HOST &&
+   vsk->remote_addr.svm_cid > VMADDR_CID_HOST)
+   vsk->remote_addr.svm_flags |= VMADDR_FLAG_TO_HOST;
+
switch (sk->sk_type) {
case SOCK_DGRAM:
new_transport = transport_dgram;
-- 
2.20.1 (Apple Git-117)







[PATCH net-next v3 4/4] af_vsock: Assign the vsock transport considering the vsock address flags

2020-12-11 Thread Andra Paraschiv
The vsock flags field can be set in the connect path (user space app)
and the (listen) receive path (kernel space logic).

When the vsock transport is assigned, the remote CID is used to
distinguish between types of connection.

Use the vsock flags value (in addition to the CID) from the remote
address to decide which vsock transport to assign. For the sibling VMs
use case, all the vsock packets need to be forwarded to the host, so
always assign the guest->host transport if the VMADDR_FLAG_TO_HOST flag
is set. For the other use cases, the vsock transport assignment logic is
not changed.

Changelog

v2 -> v3

* Update bitwise check logic to not compare result to the flag value.

v1 -> v2

* Use bitwise operator to check the vsock flag.
* Use the updated "VMADDR_FLAG_TO_HOST" flag naming.
* Merge the checks for the g2h transport assignment in one "if" block.

Signed-off-by: Andra Paraschiv 
Reviewed-by: Stefano Garzarella 
---
 net/vmw_vsock/af_vsock.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 83d035eab0b05..7c306ecf75250 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -421,7 +421,8 @@ static void vsock_deassign_transport(struct vsock_sock *vsk)
  * The vsk->remote_addr is used to decide which transport to use:
  *  - remote CID == VMADDR_CID_LOCAL or g2h->local_cid or VMADDR_CID_HOST if
  *g2h is not loaded, will use local transport;
- *  - remote CID <= VMADDR_CID_HOST will use guest->host transport;
+ *  - remote CID <= VMADDR_CID_HOST or h2g is not loaded or remote flags field
+ *includes VMADDR_FLAG_TO_HOST flag value, will use guest->host transport;
  *  - remote CID > VMADDR_CID_HOST will use host->guest transport;
  */
 int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
@@ -429,6 +430,7 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
const struct vsock_transport *new_transport;
struct sock *sk = sk_vsock(vsk);
unsigned int remote_cid = vsk->remote_addr.svm_cid;
+   unsigned short remote_flags;
int ret;
 
/* If the packet is coming with the source and destination CIDs higher
@@ -443,6 +445,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
vsk->remote_addr.svm_cid > VMADDR_CID_HOST)
vsk->remote_addr.svm_flags |= VMADDR_FLAG_TO_HOST;
 
+   remote_flags = vsk->remote_addr.svm_flags;
+
switch (sk->sk_type) {
case SOCK_DGRAM:
new_transport = transport_dgram;
@@ -450,7 +454,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
case SOCK_STREAM:
if (vsock_use_local_transport(remote_cid))
new_transport = transport_local;
-   else if (remote_cid <= VMADDR_CID_HOST || !transport_h2g)
+   else if (remote_cid <= VMADDR_CID_HOST || !transport_h2g ||
+(remote_flags & VMADDR_FLAG_TO_HOST))
new_transport = transport_g2h;
else
new_transport = transport_h2g;
-- 
2.20.1 (Apple Git-117)







[PATCH net-next v2 4/4] af_vsock: Assign the vsock transport considering the vsock address flags

2020-12-04 Thread Andra Paraschiv
The vsock flags field can be set in the connect and (listen) receive
paths.

When the vsock transport is assigned, the remote CID is used to
distinguish between types of connection.

Use the vsock flags value (in addition to the CID) from the remote
address to decide which vsock transport to assign. For the sibling VMs
use case, all the vsock packets need to be forwarded to the host, so
always assign the guest->host transport if the VMADDR_FLAG_TO_HOST flag
is set. For the other use cases, the vsock transport assignment logic is
not changed.

Changelog

v1 -> v2

* Use bitwise operator to check the vsock flag.
* Use the updated "VMADDR_FLAG_TO_HOST" flag naming.
* Merge the checks for the g2h transport assignment in one "if" block.

Signed-off-by: Andra Paraschiv 
---
 net/vmw_vsock/af_vsock.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 83d035eab0b05..66e643c3b5f85 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -421,7 +421,8 @@ static void vsock_deassign_transport(struct vsock_sock *vsk)
  * The vsk->remote_addr is used to decide which transport to use:
  *  - remote CID == VMADDR_CID_LOCAL or g2h->local_cid or VMADDR_CID_HOST if
  *g2h is not loaded, will use local transport;
- *  - remote CID <= VMADDR_CID_HOST will use guest->host transport;
+ *  - remote CID <= VMADDR_CID_HOST or h2g is not loaded or remote flags field
+ *includes VMADDR_FLAG_TO_HOST flag value, will use guest->host transport;
  *  - remote CID > VMADDR_CID_HOST will use host->guest transport;
  */
 int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
@@ -429,6 +430,7 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
const struct vsock_transport *new_transport;
struct sock *sk = sk_vsock(vsk);
unsigned int remote_cid = vsk->remote_addr.svm_cid;
+   unsigned short remote_flags;
int ret;
 
/* If the packet is coming with the source and destination CIDs higher
@@ -443,6 +445,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
vsk->remote_addr.svm_cid > VMADDR_CID_HOST)
vsk->remote_addr.svm_flags |= VMADDR_FLAG_TO_HOST;
 
+   remote_flags = vsk->remote_addr.svm_flags;
+
switch (sk->sk_type) {
case SOCK_DGRAM:
new_transport = transport_dgram;
@@ -450,7 +454,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
case SOCK_STREAM:
if (vsock_use_local_transport(remote_cid))
new_transport = transport_local;
-   else if (remote_cid <= VMADDR_CID_HOST || !transport_h2g)
+   else if (remote_cid <= VMADDR_CID_HOST || !transport_h2g ||
+(remote_flags & VMADDR_FLAG_TO_HOST) == VMADDR_FLAG_TO_HOST)
new_transport = transport_g2h;
else
new_transport = transport_h2g;
-- 
2.20.1 (Apple Git-117)







[PATCH net-next v2 3/4] af_vsock: Set VMADDR_FLAG_TO_HOST flag on the receive path

2020-12-04 Thread Andra Paraschiv
The vsock flags can be set during the connect() setup logic, when
initializing the vsock address data structure variable. Then the vsock
transport is assigned, also considering this flags field.

The vsock transport is also assigned on the (listen) receive path. The
flags field needs to be set considering the use case.

Set the VMADDR_FLAG_TO_HOST flag in the vsock flags of the remote address,
so that packets are forwarded to the host, if the following conditions are
met:

* The source CID of the packet is higher than VMADDR_CID_HOST.
* The destination CID of the packet is higher than VMADDR_CID_HOST.

Changelog

v1 -> v2

* Set the vsock flag on the receive path in the vsock transport
  assignment logic.
* Use bitwise operator for the vsock flag setup.
* Use the updated "VMADDR_FLAG_TO_HOST" flag naming.

Signed-off-by: Andra Paraschiv 
---
 net/vmw_vsock/af_vsock.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index d10916ab45267..83d035eab0b05 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -431,6 +431,18 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
unsigned int remote_cid = vsk->remote_addr.svm_cid;
int ret;
 
+   /* If the packet is coming with the source and destination CIDs higher
+* than VMADDR_CID_HOST, then a vsock channel where all the packets are
+* forwarded to the host should be established. Then the host will
+* need to forward the packets to the guest.
+*
+* The flag is set on the (listen) receive path (psk is not NULL). On
+* the connect path the flag can be set by the user space application.
+*/
+   if (psk && vsk->local_addr.svm_cid > VMADDR_CID_HOST &&
+   vsk->remote_addr.svm_cid > VMADDR_CID_HOST)
+   vsk->remote_addr.svm_flags |= VMADDR_FLAG_TO_HOST;
+
switch (sk->sk_type) {
case SOCK_DGRAM:
new_transport = transport_dgram;
-- 
2.20.1 (Apple Git-117)







[PATCH net-next v2 2/4] vm_sockets: Add VMADDR_FLAG_TO_HOST vsock flag

2020-12-04 Thread Andra Paraschiv
Add VMADDR_FLAG_TO_HOST vsock flag that is used to setup a vsock
connection where all the packets are forwarded to the host.

Then, using this type of vsock channel, vsock communication between
sibling VMs can be built on top of it.

Changelog

v1 -> v2

* New patch in v2, it was split from the first patch in the series.
* Remove the default value for the vsock flags field.
* Update the naming for the vsock flag to "VMADDR_FLAG_TO_HOST".

Signed-off-by: Andra Paraschiv 
---
 include/uapi/linux/vm_sockets.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h
index 46735376a57a8..72e1a3d05682d 100644
--- a/include/uapi/linux/vm_sockets.h
+++ b/include/uapi/linux/vm_sockets.h
@@ -114,6 +114,21 @@
 
 #define VMADDR_CID_HOST 2
 
+/* The current default use case for the vsock channel is the following:
+ * local vsock communication between guest and host and nested VMs setup.
+ * In addition to this, implicitly, the vsock packets are forwarded to the host
+ * if no host->guest vsock transport is set.
+ *
+ * Set this flag value in the sockaddr_vm corresponding field if the vsock
+ * packets need to be always forwarded to the host. Using this behavior,
+ * vsock communication between sibling VMs can be setup.
+ *
+ * This way can explicitly distinguish between vsock channels created for
+ * different use cases, such as nested VMs (or local communication between
+ * guest and host) and sibling VMs.
+ */
+#define VMADDR_FLAG_TO_HOST 0x0001
+
 /* Invalid vSockets version. */
 
 #define VM_SOCKETS_INVALID_VERSION -1U
-- 
2.20.1 (Apple Git-117)







[PATCH net-next v2 0/4] vsock: Add flags field in the vsock address

2020-12-04 Thread Andra Paraschiv
vsock enables communication between virtual machines and the host they are
running on. Nested VMs can be set up to use vsock channels, as multi-transport
support has been available in the mainline kernel since the v5.5 release.

Implicitly, if no host->guest vsock transport is loaded, all the vsock packets
are forwarded to the host. This behavior can be used to setup communication
channels between sibling VMs that are running on the same host. One example can
be the vsock channels that can be established within AWS Nitro Enclaves
(see Documentation/virt/ne_overview.rst).

To be able to explicitly mark a connection as being used for a certain use case,
add a flags field in the vsock address data structure. The "svm_reserved1" field
has been repurposed to be the flags field. The value of the flags will then be
taken into consideration when the vsock transport is assigned. This way, we can
distinguish between different use cases, such as nested VMs / local
communication and sibling VMs.

Thank you.

Andra

---

Patch Series Changelog

The patch series is built on top of v5.10-rc6.

GitHub repo branch for the latest version of the patch series:

* https://github.com/andraprs/linux/tree/vsock-flag-sibling-comm-v2

v1 -> v2

* Update the vsock flag naming to "VMADDR_FLAG_TO_HOST".
* Use bitwise operators to setup and check the vsock flag.
* Set the vsock flag on the receive path in the vsock transport assignment
  logic.
* Merge the checks for the g2h transport assignment in one "if" block.
* v1: https://lore.kernel.org/lkml/20201201152505.19445-1-andra...@amazon.com/

---

Andra Paraschiv (4):
  vm_sockets: Include flags field in the vsock address data structure
  vm_sockets: Add VMADDR_FLAG_TO_HOST vsock flag
  af_vsock: Set VMADDR_FLAG_TO_HOST flag on the receive path
  af_vsock: Assign the vsock transport considering the vsock address
flags

 include/uapi/linux/vm_sockets.h | 17 -
 net/vmw_vsock/af_vsock.c| 21 +++--
 2 files changed, 35 insertions(+), 3 deletions(-)

-- 
2.20.1 (Apple Git-117)







[PATCH net-next v2 1/4] vm_sockets: Include flags field in the vsock address data structure

2020-12-04 Thread Andra Paraschiv
vsock enables communication between virtual machines and the host they
are running on. With the multi transport support (guest->host and
host->guest), nested VMs can also use vsock channels for communication.

In addition to this, by default, all the vsock packets are forwarded to
the host, if no host->guest transport is loaded. This behavior can be
implicitly used for enabling vsock communication between sibling VMs.

Add a flags field in the vsock address data structure that can be used
to explicitly mark the vsock connection as being targeted for a certain
type of communication. This way, we can distinguish between different use
cases, such as nested VMs and sibling VMs.

Use the already available "svm_reserved1" field and mark it as a flags
field instead. This field can be set when initializing the vsock address
variable used for the connect() call.

Changelog

v1 -> v2

* Update the field name to "svm_flags".
* Split the current patch in 2 patches.

Signed-off-by: Andra Paraschiv 
---
 include/uapi/linux/vm_sockets.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h
index fd0ed7221645d..46735376a57a8 100644
--- a/include/uapi/linux/vm_sockets.h
+++ b/include/uapi/linux/vm_sockets.h
@@ -145,7 +145,7 @@
 
 struct sockaddr_vm {
__kernel_sa_family_t svm_family;
-   unsigned short svm_reserved1;
+   unsigned short svm_flags;
unsigned int svm_port;
unsigned int svm_cid;
unsigned char svm_zero[sizeof(struct sockaddr) -
-- 
2.20.1 (Apple Git-117)







[PATCH net-next v1 3/3] af_vsock: Assign the vsock transport considering the vsock address flag

2020-12-01 Thread Andra Paraschiv
The vsock flag has been set in the connect and (listen) receive paths.

When the vsock transport is assigned, the remote CID is used to
distinguish between types of connection.

Use the vsock flag (in addition to the CID) from the remote address to
decide which vsock transport to assign. For the sibling VMs use case,
all the vsock packets need to be forwarded to the host, so always assign
the guest->host transport if the vsock flag is set. For the other use
cases, the vsock transport assignment logic is not changed.

Signed-off-by: Andra Paraschiv 
---
 net/vmw_vsock/af_vsock.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index d10916ab45267..bafc1cb20abd4 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -419,16 +419,21 @@ static void vsock_deassign_transport(struct vsock_sock *vsk)
  * (e.g. during the connect() or when a connection request on a listener
  * socket is received).
  * The vsk->remote_addr is used to decide which transport to use:
- *  - remote CID == VMADDR_CID_LOCAL or g2h->local_cid or VMADDR_CID_HOST if
- *g2h is not loaded, will use local transport;
- *  - remote CID <= VMADDR_CID_HOST will use guest->host transport;
- *  - remote CID > VMADDR_CID_HOST will use host->guest transport;
+ *  - remote flag == VMADDR_FLAG_SIBLING_VMS_COMMUNICATION, will always
+ *forward the vsock packets to the host and use guest->host transport;
+ *  - otherwise, going forward with the remote flag default value:
+ *- remote CID == VMADDR_CID_LOCAL or g2h->local_cid or VMADDR_CID_HOST
+ *  if g2h is not loaded, will use local transport;
+ *- remote CID <= VMADDR_CID_HOST or h2g is not loaded, will use
+ *  guest->host transport;
+ *- remote CID > VMADDR_CID_HOST will use host->guest transport;
  */
 int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
 {
const struct vsock_transport *new_transport;
struct sock *sk = sk_vsock(vsk);
unsigned int remote_cid = vsk->remote_addr.svm_cid;
+   unsigned short remote_flag = vsk->remote_addr.svm_flag;
int ret;
 
switch (sk->sk_type) {
@@ -438,6 +443,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
case SOCK_STREAM:
if (vsock_use_local_transport(remote_cid))
new_transport = transport_local;
+   else if (remote_flag == VMADDR_FLAG_SIBLING_VMS_COMMUNICATION)
+   new_transport = transport_g2h;
else if (remote_cid <= VMADDR_CID_HOST || !transport_h2g)
new_transport = transport_g2h;
else
-- 
2.20.1 (Apple Git-117)







[PATCH net-next v1 2/3] virtio_transport_common: Set sibling VMs flag on the receive path

2020-12-01 Thread Andra Paraschiv
The vsock flag can be set during the connect() setup logic, when
initializing the vsock address data structure variable. Then the vsock
transport is assigned, also considering this flag.

The vsock transport is also assigned on the (listen) receive path. The
flag needs to be set considering the use case.

Set the vsock flag of the remote address to the one targeted for sibling
VMs communication if the following conditions are met:

* The source CID of the packet is higher than VMADDR_CID_HOST.
* The destination CID of the packet is higher than VMADDR_CID_HOST.

Signed-off-by: Andra Paraschiv 
---
 net/vmw_vsock/virtio_transport_common.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 5956939eebb78..871c84e0916b1 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1062,6 +1062,14 @@ virtio_transport_recv_listen(struct sock *sk, struct virtio_vsock_pkt *pkt,
vsock_addr_init(&vchild->remote_addr, le64_to_cpu(pkt->hdr.src_cid),
le32_to_cpu(pkt->hdr.src_port));
 
+   /* If the packet is coming with the source and destination CIDs higher
+* than VMADDR_CID_HOST, then a vsock channel should be established for
+* sibling VMs communication.
+*/
+   if (vchild->local_addr.svm_cid > VMADDR_CID_HOST &&
+   vchild->remote_addr.svm_cid > VMADDR_CID_HOST)
+   vchild->remote_addr.svm_flag = VMADDR_FLAG_SIBLING_VMS_COMMUNICATION;
+
ret = vsock_assign_transport(vchild, vsk);
/* Transport assigned (looking at remote_addr) must be the same
 * where we received the request.
-- 
2.20.1 (Apple Git-117)







[PATCH net-next v1 1/3] vm_sockets: Include flag field in the vsock address data structure

2020-12-01 Thread Andra Paraschiv
vsock enables communication between virtual machines and the host they
are running on. With the multi transport support (guest->host and
host->guest), nested VMs can also use vsock channels for communication.

In addition to this, by default, all the vsock packets are forwarded to
the host, if no host->guest transport is loaded. This behavior can be
implicitly used for enabling vsock communication between sibling VMs.

Add a flag field in the vsock address data structure that can be used to
explicitly mark the vsock connection as being targeted for a certain
type of communication. This way, we can distinguish between the nested VMs
and sibling VMs use cases and can also set them up at the same time. Until
now, only nested VMs or sibling VMs could be used at a time over the vsock
communication stack.

Use the already available "svm_reserved1" field and mark it as a flag
field instead. This flag can be set when initializing the vsock address
variable used for the connect() call.

Signed-off-by: Andra Paraschiv 
---
 include/uapi/linux/vm_sockets.h | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h
index fd0ed7221645d..58da5a91413ac 100644
--- a/include/uapi/linux/vm_sockets.h
+++ b/include/uapi/linux/vm_sockets.h
@@ -114,6 +114,22 @@
 
 #define VMADDR_CID_HOST 2
 
+/* This sockaddr_vm flag value covers the current default use case:
+ * local vsock communication between guest and host and nested VMs setup.
+ * In addition to this, implicitly, the vsock packets are forwarded to the host
+ * if no host->guest vsock transport is set.
+ */
+#define VMADDR_FLAG_DEFAULT_COMMUNICATION  0x0000
+
+/* Set this flag value in the sockaddr_vm corresponding field if the vsock
+ * channel needs to be setup between two sibling VMs running on the same host.
+ * This way can explicitly distinguish between vsock channels created for nested
+ * VMs (or local communication between guest and host) and the ones created for
+ * sibling VMs. And vsock channels for multiple use cases (nested / sibling VMs)
+ * can be setup at the same time.
+ */
+#define VMADDR_FLAG_SIBLING_VMS_COMMUNICATION  0x0001
+
 /* Invalid vSockets version. */
 
 #define VM_SOCKETS_INVALID_VERSION -1U
@@ -145,7 +161,7 @@
 
 struct sockaddr_vm {
__kernel_sa_family_t svm_family;
-   unsigned short svm_reserved1;
+   unsigned short svm_flag;
unsigned int svm_port;
unsigned int svm_cid;
unsigned char svm_zero[sizeof(struct sockaddr) -
-- 
2.20.1 (Apple Git-117)







[PATCH net-next v1 0/3] vsock: Add flag field in the vsock address

2020-12-01 Thread Andra Paraschiv
vsock enables communication between virtual machines and the host they are
running on. Nested VMs can be set up to use vsock channels, as multi-transport
support has been available in the mainline kernel since the v5.5 release.

Implicitly, if no host->guest vsock transport is loaded, all the vsock packets
are forwarded to the host. This behavior can be used to setup communication
channels between sibling VMs that are running on the same host. One example can
be the vsock channels that can be established within AWS Nitro Enclaves
(see Documentation/virt/ne_overview.rst).

To be able to explicitly mark a connection as being used for a certain use case,
add a flag field in the vsock address data structure. The "svm_reserved1" field
has been repurposed to be the flag field. The value of the flag will then be
taken into consideration when the vsock transport is assigned.

This way, we can distinguish between the nested VMs / local communication and
sibling VMs use cases, and can also set up one or more types of communication
at the same time.

Thank you.

Andra

---

Patch Series Changelog

The patch series is built on top of v5.10-rc6.

GitHub repo branch for the latest version of the patch series:

* https://github.com/andraprs/linux/tree/vsock-flag-sibling-comm-v1

---

Andra Paraschiv (3):
  vm_sockets: Include flag field in the vsock address data structure
  virtio_transport_common: Set sibling VMs flag on the receive path
  af_vsock: Assign the vsock transport considering the vsock address
flag

 include/uapi/linux/vm_sockets.h | 18 +-
 net/vmw_vsock/af_vsock.c| 15 +++
 net/vmw_vsock/virtio_transport_common.c |  8 
 3 files changed, 36 insertions(+), 5 deletions(-)

-- 
2.20.1 (Apple Git-117)







[PATCH v2] nitro_enclaves: Fixup type and simplify logic of the poll mask setup

2020-11-02 Thread Andra Paraschiv
Update the assigned value of the poll result to be EPOLLHUP instead of
POLLHUP to match the __poll_t type.

While at it, simplify the logic of setting the mask result of the poll
function.

Changelog

v1 -> v2

* Simplify the mask setting logic from the poll function.

Signed-off-by: Andra Paraschiv 
Reported-by: kernel test robot 
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index f06622b48d695..f1964ea4b8269 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -1505,10 +1505,8 @@ static __poll_t ne_enclave_poll(struct file *file, poll_table *wait)
 
poll_wait(file, &ne_enclave->eventq, wait);
 
-   if (!ne_enclave->has_event)
-   return mask;
-
-   mask = POLLHUP;
+   if (ne_enclave->has_event)
+   mask |= EPOLLHUP;
 
return mask;
 }
-- 
2.20.1 (Apple Git-117)







[PATCH v1] nitro_enclaves: Fixup type of the poll result assigned value

2020-10-14 Thread Andra Paraschiv
Update the assigned value of the poll result to be EPOLLHUP instead of
POLLHUP to match the __poll_t type.

Signed-off-by: Andra Paraschiv 
Reported-by: kernel test robot 
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index f06622b48d69..9148566455e8 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -1508,7 +1508,7 @@ static __poll_t ne_enclave_poll(struct file *file, poll_table *wait)
if (!ne_enclave->has_event)
return mask;
 
-   mask = POLLHUP;
+   mask = EPOLLHUP;
 
return mask;
 }
-- 
2.20.1 (Apple Git-117)







[PATCH v10 18/18] MAINTAINERS: Add entry for the Nitro Enclaves driver

2020-09-21 Thread Andra Paraschiv
Add entry in the MAINTAINERS file for the Nitro Enclaves files such as
the documentation, the header files, the driver itself and the user
space sample.

Changelog

v9 -> v10

* Update commit message to include the changelog before the SoB tag(s).

v8 -> v9

* Update the location of the documentation, as it has been moved to the
  "virt" directory.

v7 -> v8

* No changes.

v6 -> v7

* No changes.

v5 -> v6

* No changes.

v4 -> v5

* No changes.

v3 -> v4

* No changes.

v2 -> v3

* Update file entries to be in alphabetical order.

v1 -> v2

* No changes.

Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
 MAINTAINERS | 13 +
 1 file changed, 13 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index d746519253c3..4bd4820a7f45 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12293,6 +12293,19 @@ S: Maintained
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2.git
 F: arch/nios2/
 
+NITRO ENCLAVES (NE)
+M: Andra Paraschiv 
+M: Alexandru Vasile 
+M: Alexandru Ciobotaru 
+L: linux-kernel@vger.kernel.org
+S: Supported
+W: https://aws.amazon.com/ec2/nitro/nitro-enclaves/
+F: Documentation/virt/ne_overview.rst
+F: drivers/virt/nitro_enclaves/
+F: include/linux/nitro_enclaves.h
+F: include/uapi/linux/nitro_enclaves.h
+F: samples/nitro_enclaves/
+
 NOHZ, DYNTICKS SUPPORT
 M: Frederic Weisbecker 
 M: Thomas Gleixner 
-- 
2.20.1 (Apple Git-117)







[PATCH v10 14/18] nitro_enclaves: Add Kconfig for the Nitro Enclaves driver

2020-09-21 Thread Andra Paraschiv
Add kernel config entry for Nitro Enclaves, including dependencies.

Changelog

v9 -> v10

* Update commit message to include the changelog before the SoB tag(s).

v8 -> v9

* No changes.

v7 -> v8

* No changes.

v6 -> v7

* Remove, for now, the dependency on ARM64 arch. x86 is currently
  supported, with Arm to come afterwards. The NE kernel driver can be
  built for aarch64 arch.

v5 -> v6

* No changes.

v4 -> v5

* Add arch dependency for Arm / x86.

v3 -> v4

* Add PCI and SMP dependencies.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Update path to Kconfig to match the drivers/virt/nitro_enclaves
  directory.
* Update help in Kconfig.

Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
 drivers/virt/Kconfig|  2 ++
 drivers/virt/nitro_enclaves/Kconfig | 20 
 2 files changed, 22 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/Kconfig

diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
index cbc1f25c79ab..80c5f9c16ec1 100644
--- a/drivers/virt/Kconfig
+++ b/drivers/virt/Kconfig
@@ -32,4 +32,6 @@ config FSL_HV_MANAGER
 partition shuts down.
 
 source "drivers/virt/vboxguest/Kconfig"
+
+source "drivers/virt/nitro_enclaves/Kconfig"
 endif
diff --git a/drivers/virt/nitro_enclaves/Kconfig b/drivers/virt/nitro_enclaves/Kconfig
new file mode 100644
index ..8c9387a232df
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/Kconfig
@@ -0,0 +1,20 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+
+# Amazon Nitro Enclaves (NE) support.
+# Nitro is a hypervisor that has been developed by Amazon.
+
+# TODO: Add dependency for ARM64 once NE is supported on Arm platforms. For now,
+# the NE kernel driver can be built for aarch64 arch.
+# depends on (ARM64 || X86) && HOTPLUG_CPU && PCI && SMP
+
+config NITRO_ENCLAVES
+   tristate "Nitro Enclaves Support"
+   depends on X86 && HOTPLUG_CPU && PCI && SMP
+   help
+ This driver consists of support for enclave lifetime management
+ for Nitro Enclaves (NE).
+
+ To compile this driver as a module, choose M here.
+ The module will be called nitro_enclaves.
-- 
2.20.1 (Apple Git-117)







[PATCH v10 15/18] nitro_enclaves: Add Makefile for the Nitro Enclaves driver

2020-09-21 Thread Andra Paraschiv
Add Makefile for the Nitro Enclaves driver, considering the option set
in the kernel config.

Changelog

v9 -> v10

* Update commit message to include the changelog before the SoB tag(s).

v8 -> v9

* Remove -Wall flags, could use W=1 as an option for this.

v7 -> v8

* No changes.

v6 -> v7

* No changes.

v5 -> v6

* No changes.

v4 -> v5

* No changes.

v3 -> v4

* No changes.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Update path to Makefile to match the drivers/virt/nitro_enclaves
  directory.

Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
 drivers/virt/Makefile| 2 ++
 drivers/virt/nitro_enclaves/Makefile | 9 +
 2 files changed, 11 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/Makefile

diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
index fd331247c27a..f28425ce4b39 100644
--- a/drivers/virt/Makefile
+++ b/drivers/virt/Makefile
@@ -5,3 +5,5 @@
 
 obj-$(CONFIG_FSL_HV_MANAGER)   += fsl_hypervisor.o
 obj-y  += vboxguest/
+
+obj-$(CONFIG_NITRO_ENCLAVES)   += nitro_enclaves/
diff --git a/drivers/virt/nitro_enclaves/Makefile b/drivers/virt/nitro_enclaves/Makefile
new file mode 100644
index ..da61260f2be6
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/Makefile
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+
+# Enclave lifetime management support for Nitro Enclaves (NE).
+
+obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves.o
+
+nitro_enclaves-y := ne_pci_dev.o ne_misc_dev.o
-- 
2.20.1 (Apple Git-117)







[PATCH v10 17/18] nitro_enclaves: Add overview documentation

2020-09-21 Thread Andra Paraschiv
Add documentation on the overview of Nitro Enclaves. Include it in the
virtualization specific directory.

Changelog

v9 -> v10

* Update commit message to include the changelog before the SoB tag(s).

v8 -> v9

* Move the Nitro Enclaves documentation to the "virt" directory and add
  an entry for it in the corresponding index file.

v7 -> v8

* Add info about the primary / parent VM CID value.
* Update reference link for huge pages.
* Add reference link for the x86 boot protocol.
* Add license mention and update doc title / chapter formatting.

v6 -> v7

* No changes.

v5 -> v6

* No changes.

v4 -> v5

* No changes.

v3 -> v4

* Update doc type from .txt to .rst.
* Update documentation based on the changes from v4.

v2 -> v3

* No changes.

v1 -> v2

* New in v2.

Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
 Documentation/virt/index.rst   |  1 +
 Documentation/virt/ne_overview.rst | 95 ++
 2 files changed, 96 insertions(+)
 create mode 100644 Documentation/virt/ne_overview.rst

diff --git a/Documentation/virt/index.rst b/Documentation/virt/index.rst
index de1ab81df958..e4224305dbef 100644
--- a/Documentation/virt/index.rst
+++ b/Documentation/virt/index.rst
@@ -11,6 +11,7 @@ Linux Virtualization Support
uml/user_mode_linux
paravirt_ops
guest-halt-polling
+   ne_overview
 
 .. only:: html and subproject
 
diff --git a/Documentation/virt/ne_overview.rst b/Documentation/virt/ne_overview.rst
new file mode 100644
index ..39b0c8fe2654
--- /dev/null
+++ b/Documentation/virt/ne_overview.rst
@@ -0,0 +1,95 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+Nitro Enclaves
+==============
+
+Overview
+========
+
+Nitro Enclaves (NE) is a new Amazon Elastic Compute Cloud (EC2) capability
+that allows customers to carve out isolated compute environments within EC2
+instances [1].
+
+For example, an application that processes sensitive data and runs in a VM,
+can be separated from other applications running in the same VM. This
+application then runs in a separate VM than the primary VM, namely an enclave.
+
+An enclave runs alongside the VM that spawned it. This setup matches low latency
+applications needs. The resources that are allocated for the enclave, such as
+memory and CPUs, are carved out of the primary VM. Each enclave is mapped to a
+process running in the primary VM, that communicates with the NE driver via an
+ioctl interface.
+
+In this sense, there are two components:
+
+1. An enclave abstraction process - a user space process running in the primary
+VM guest that uses the provided ioctl interface of the NE driver to spawn an
+enclave VM (that's 2 below).
+
+There is a NE emulated PCI device exposed to the primary VM. The driver for this
+new PCI device is included in the NE driver.
+
+The ioctl logic is mapped to PCI device commands e.g. the NE_START_ENCLAVE ioctl
+maps to an enclave start PCI command. The PCI device commands are then
+translated into actions taken on the hypervisor side; that's the Nitro
+hypervisor running on the host where the primary VM is running. The Nitro
+hypervisor is based on core KVM technology.
+
+2. The enclave itself - a VM running on the same host as the primary VM that
+spawned it. Memory and CPUs are carved out of the primary VM and are dedicated
+for the enclave VM. An enclave does not have persistent storage attached.
+
+The memory regions carved out of the primary VM and given to an enclave need to
+be aligned 2 MiB / 1 GiB physically contiguous memory regions (or multiple of
+this size e.g. 8 MiB). The memory can be allocated e.g. by using hugetlbfs from
+user space [2][3]. The memory size for an enclave needs to be at least 64 MiB.
+The enclave memory and CPUs need to be from the same NUMA node.
+
+An enclave runs on dedicated cores. CPU 0 and its CPU siblings need to remain
+available for the primary VM. A CPU pool has to be set for NE purposes by an
+user with admin capability. See the cpu list section from the kernel
+documentation [4] for how a CPU pool format looks.
+
+An enclave communicates with the primary VM via a local communication channel,
+using virtio-vsock [5]. The primary VM has virtio-pci vsock emulated device,
+while the enclave VM has a virtio-mmio vsock emulated device. The vsock device
+uses eventfd for signaling. The enclave VM sees the usual interfaces - local
+APIC and IOAPIC - to get interrupts from virtio-vsock device. The virtio-mmio
+device is placed in memory below the typical 4 GiB.
+
+The application that runs in the enclave needs to be packaged in an enclave
+image together with the OS ( e.g. kernel, ramdisk, init ) that will run in the
+enclave VM. The enclave VM has its own kernel and follows the standard Linux
+boot protocol [6].
+
+The kernel bzImage, the kernel command line, the ramdisk(s) are part of the
+Enclave Image Format (EIF); plus an EIF header including metadata such as magic
+number, eif

[PATCH v10 16/18] nitro_enclaves: Add sample for ioctl interface usage

2020-09-21 Thread Andra Paraschiv
Add a user space sample for the usage of the ioctl interface provided by
the Nitro Enclaves driver.

Changelog

v9 -> v10

* Update commit message to include the changelog before the SoB tag(s).

v8 -> v9

* No changes.

v7 -> v8

* Track NE custom error codes for invalid page size, invalid flags and
  enclave CID.
* Update the heartbeat logic to have a listener fd first, then start the
  enclave and then accept connection to get the heartbeat.
* Update the reference link to the hugetlb documentation.

v6 -> v7

* Track POLLNVAL as poll event in addition to POLLHUP.

v5 -> v6

* Remove "rc" mentioning when printing errno string.
* Remove the ioctl to query API version.
* Include usage info for NUMA-aware hugetlb configuration.
* Update documentation to kernel-doc format.
* Add logic for enclave image loading.

v4 -> v5

* Print enclave vCPU ids when they are created.
* Update logic to map the modified vCPU ioctl call.
* Add check for the path to the enclave image to be less than PATH_MAX.
* Update the ioctl calls error checking logic to match the NE specific
  error codes.

v3 -> v4

* Update usage details to match the updates in v4.
* Update NE ioctl interface usage.

v2 -> v3

* Remove the include directory to use the uapi from the kernel.
* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* New in v2.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
 samples/nitro_enclaves/.gitignore|   2 +
 samples/nitro_enclaves/Makefile  |  16 +
 samples/nitro_enclaves/ne_ioctl_sample.c | 883 +++
 3 files changed, 901 insertions(+)
 create mode 100644 samples/nitro_enclaves/.gitignore
 create mode 100644 samples/nitro_enclaves/Makefile
 create mode 100644 samples/nitro_enclaves/ne_ioctl_sample.c

diff --git a/samples/nitro_enclaves/.gitignore b/samples/nitro_enclaves/.gitignore
new file mode 100644
index ..827934129c90
--- /dev/null
+++ b/samples/nitro_enclaves/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0
+ne_ioctl_sample
diff --git a/samples/nitro_enclaves/Makefile b/samples/nitro_enclaves/Makefile
new file mode 100644
index ..a3ec78fefb52
--- /dev/null
+++ b/samples/nitro_enclaves/Makefile
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+
+# Enclave lifetime management support for Nitro Enclaves (NE) - ioctl sample
+# usage.
+
+.PHONY: all clean
+
+CFLAGS += -Wall
+
+all:
+   $(CC) $(CFLAGS) -o ne_ioctl_sample ne_ioctl_sample.c -lpthread
+
+clean:
+   rm -f ne_ioctl_sample
diff --git a/samples/nitro_enclaves/ne_ioctl_sample.c b/samples/nitro_enclaves/ne_ioctl_sample.c
new file mode 100644
index ..480b763142b3
--- /dev/null
+++ b/samples/nitro_enclaves/ne_ioctl_sample.c
@@ -0,0 +1,883 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+/**
+ * DOC: Sample flow of using the ioctl interface provided by the Nitro Enclaves (NE)
+ * kernel driver.
+ *
+ * Usage
+ * -----
+ *
+ * Load the nitro_enclaves module, setting also the enclave CPU pool. The
+ * enclave CPUs need to be full cores from the same NUMA node. CPU 0 and its
+ * siblings have to remain available for the primary / parent VM, so they
+ * cannot be included in the enclave CPU pool.
+ *
+ * See the cpu list section from the kernel documentation.
+ * https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html#cpu-lists
+ *
+ * insmod drivers/virt/nitro_enclaves/nitro_enclaves.ko
+ * lsmod
+ *
+ * The CPU pool can be set at runtime, after the kernel module is loaded.
+ *
+ * echo <cpu-list> > /sys/module/nitro_enclaves/parameters/ne_cpus
+ *
+ * NUMA and CPU siblings information can be found using:
+ *
+ * lscpu
+ * /proc/cpuinfo
+ *
+ * Check the online / offline CPU list. The CPUs from the pool should be
+ * offlined.
+ *
+ * lscpu
+ *
+ * Check dmesg for any warnings / errors through the NE driver lifetime / usage.
+ * The NE logs contain the "nitro_enclaves" or "pci 0000:00:02.0" pattern.
+ *
+ * dmesg
+ *
+ * Setup hugetlbfs huge pages. The memory needs to be from the same NUMA node as
+ * the enclave CPUs.
+ *
+ * https://www.kernel.org/doc/html/latest/admin-guide/mm/hugetlbpage.html
+ *
+ * By default, the allocation of hugetlb pages is distributed on all possible
+ * NUMA nodes. Use the following configuration files to set the number of huge
+ * pages from a NUMA node:
+ *
+ * /sys/devices/system/node/node<X>/hugepages/hugepages-2048kB/nr_hugepages
+ * /sys/devices/system/node/node<X>/hugepages/hugepages-1048576kB/nr_hugepages
+ *
+ * or, on a system without multiple NUMA nodes, the number of 2 MiB / 1 GiB
+ * huge pages can also be set using
+ *
+ * /sys/kernel/mm/hugepages/hu

[PATCH v10 09/18] nitro_enclaves: Add logic for setting an enclave vCPU

2020-09-21 Thread Andra Paraschiv
An enclave, before being started, has its resources set. One of its
resources is CPU.

A NE CPU pool is set and enclave CPUs are chosen from it. Offline the
CPUs from the NE CPU pool during the pool setup and online them back
during the NE CPU pool teardown. The CPU offlining is necessary so that
there are not more vCPUs than physical CPUs available to the
primary / parent VM. Otherwise the CPUs would be overcommitted, changing
the initial configuration of the primary / parent VM, which has vCPUs
dedicated to physical CPUs.

The enclave CPUs need to be full cores and from the same NUMA node. CPU
0 and its siblings have to remain available to the primary / parent VM.

Add ioctl command logic for setting an enclave vCPU.

Changelog

v9 -> v10

* Update commit message to include the changelog before the SoB tag(s).

v8 -> v9

* Use the ne_devs data structure to get the refs for the NE PCI device.

v7 -> v8

* No changes.

v6 -> v7

* Check for error return value when setting the kernel parameter string.
* Use the NE misc device parent field to get the NE PCI device.
* Update the naming and add more comments to make the logic of handling
  full CPU cores and dedicating them to the enclave clearer.
* Calculate the number of threads per core and not use smp_num_siblings
  that is x86 specific.

v5 -> v6

* Check CPUs are from the same NUMA node before going through CPU
  siblings during the NE CPU pool setup.
* Update documentation to kernel-doc format.

v4 -> v5

* Set empty string in case of invalid NE CPU pool.
* Clear NE CPU pool mask on pool setup failure.
* Setup NE CPU cores out of the NE CPU pool.
* Early exit on NE CPU pool setup if enclave(s) already running.
* Remove sanity checks for situations that shouldn't happen unless the
  system is buggy or the logic is broken.
* Add check for maximum vCPU id possible before looking into the CPU
  pool.
* Remove log on copy_from_user() / copy_to_user() failure and on admin
  capability check for setting the NE CPU pool.
* Update the ioctl call to not create a file descriptor for the vCPU.
* Split the CPU pool usage logic in 2 separate functions - one to get a
  CPU from the pool and the other to check the given CPU is available in
  the pool.

v3 -> v4

* Setup the NE CPU pool at runtime via a sysfs file for the kernel
  parameter.
* Check enclave CPUs to be from the same NUMA node.
* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().
* Remove file ops that do nothing for now - open, ioctl and release.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
* Check if enclave state is init when setting enclave vCPU.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 695 ++
 1 file changed, 695 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index 3515b163ad0e..24e4270d181d 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -83,8 +83,11 @@ struct ne_devs ne_devs = {
  * TODO: Update logic to create new sysfs entries instead of using
  * a kernel parameter e.g. if multiple sysfs files needed.
  */
+static int ne_set_kernel_param(const char *val, const struct kernel_param *kp);
+
 static const struct kernel_param_ops ne_cpu_pool_ops = {
.get= param_get_string,
+   .set= ne_set_kernel_param,
 };
 
 static char ne_cpus[NE_CPUS_SIZE];
@@ -122,6 +125,695 @@ struct ne_cpu_pool {
 
 static struct ne_cpu_pool ne_cpu_pool;
 
+/**
+ * ne_check_enclaves_created() - Verify if at least one enclave has been created.
+ * @void:  No parameters provided.
+ *
+ * Context: Process context.
+ * Return:
+ * * True if at least one enclave is created.
+ * * False otherwise.
+ */
+static bool ne_check_enclaves_created(void)
+{
+   struct ne_pci_dev *ne_pci_dev = ne_devs.ne_pci_dev;
+   bool ret = false;
+
+   if (!ne_pci_dev)
+   return ret;
+
+   mutex_lock(&ne_pci_dev->enclaves_list_mutex);
+
+   if (!list_empty(&ne_pci_dev->enclaves_list))
+   ret = true;
+
+   mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
+
+   return ret;
+}
+
+/**
+ * ne_setup_cpu_pool() - Set the NE CPU pool after handling sanity checks such
+ *  as not sharing CPU cores with the primary / parent VM
+ *  or not using CPU 0, which should remain available for
+ *  the primary / parent VM. Offline the CPUs from the
+ *  pool after the checks passed.
+ * @ne_cpu_list:   The CPU list used for setting NE CPU pool.
+ *
+ * Context: Process context.
+ * Return:
+ * *

[PATCH v10 11/18] nitro_enclaves: Add logic for setting an enclave memory region

2020-09-21 Thread Andra Paraschiv
Another resource that is being set for an enclave is memory. User space
memory regions, which need to be backed by contiguous memory regions,
are associated with the enclave.

One solution for allocating / reserving contiguous memory regions, that
is used for integration, is hugetlbfs. The user space process that is
associated with the enclave passes to the driver these memory regions.

The enclave memory regions need to be from the same NUMA node as the
enclave CPUs.

Add ioctl command logic for setting user space memory region for an
enclave.

Changelog

v9 -> v10

* Update commit message to include the changelog before the SoB tag(s).

v8 -> v9

* Use the ne_devs data structure to get the refs for the NE PCI device.

v7 -> v8

* Add early check, while getting user pages, to be multiple of 2 MiB for
  the pages that back the user space memory region.
* Add custom error code for incorrect user space memory region flag.
* Include in a separate function the sanity checks for each page of the
  user space memory region.

v6 -> v7

* Update check for duplicate user space memory regions to cover
  additional possible scenarios.

v5 -> v6

* Check for max number of pages allocated for the internal data
  structure for pages.
* Check for invalid memory region flags.
* Check for aligned physical memory regions.
* Update documentation to kernel-doc format.
* Check for duplicate user space memory regions.
* Use directly put_page() instead of unpin_user_pages(), to match the
  get_user_pages() calls.

v4 -> v5

* Add early exit on set memory region ioctl function call error.
* Remove log on copy_from_user() failure.
* Exit without unpinning the pages on NE PCI dev request failure as
  memory regions from the user space range may have already been added.
* Add check for the memory region user space address to be 2 MiB
  aligned.
* Update logic to not have a hardcoded check for 2 MiB memory regions.

v3 -> v4

* Check enclave memory regions are from the same NUMA node as the
  enclave CPUs.
* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
* Check if enclave max memory regions is reached when setting an enclave
  memory region.
* Check if enclave state is init when setting an enclave memory region.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 317 ++
 1 file changed, 317 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index fe83588c8b02..f2252f67302c 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -722,6 +722,286 @@ static int ne_add_vcpu_ioctl(struct ne_enclave *ne_enclave, u32 vcpu_id)
return 0;
 }
 
+/**
+ * ne_sanity_check_user_mem_region() - Sanity check the user space memory
+ *region received during the set user
+ *memory region ioctl call.
+ * @ne_enclave :   Private data associated with the current enclave.
+ * @mem_region :   User space memory region to be sanity checked.
+ *
+ * Context: Process context. This function is called with the ne_enclave mutex held.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_sanity_check_user_mem_region(struct ne_enclave *ne_enclave,
+   struct ne_user_memory_region mem_region)
+{
+   struct ne_mem_region *ne_mem_region = NULL;
+
+   if (ne_enclave->mm != current->mm)
+   return -EIO;
+
+   if (mem_region.memory_size & (NE_MIN_MEM_REGION_SIZE - 1)) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "User space memory size is not multiple of 2 MiB\n");
+
+   return -NE_ERR_INVALID_MEM_REGION_SIZE;
+   }
+
+   if (!IS_ALIGNED(mem_region.userspace_addr, NE_MIN_MEM_REGION_SIZE)) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "User space address is not 2 MiB aligned\n");
+
+   return -NE_ERR_UNALIGNED_MEM_REGION_ADDR;
+   }
+
+   if ((mem_region.userspace_addr & (NE_MIN_MEM_REGION_SIZE - 1)) ||
+   !access_ok((void __user *)(unsigned long)mem_region.userspace_addr,
+  mem_region.memory_size)) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Invalid user space address range\n");
+
+   return -NE_ERR_INVALID_MEM_REGION_ADDR;
+   }
+
+   list_for_each_entry(ne_mem_region, &ne_encla

[PATCH v10 13/18] nitro_enclaves: Add logic for terminating an enclave

2020-09-21 Thread Andra Paraschiv
An enclave is associated with an fd that is returned after the enclave
creation logic is completed. This enclave fd is further used to set up
enclave resources. Once the enclave needs to be terminated, the enclave
fd is closed.

Add logic for enclave termination, that is mapped to the enclave fd
release callback. Free the internal enclave info used for bookkeeping.

Changelog

v9 -> v10

* Update commit message to include the changelog before the SoB tag(s).

v8 -> v9

* Use the ne_devs data structure to get the refs for the NE PCI device.

v7 -> v8

* No changes.

v6 -> v7

* Remove the pci_dev_put() call as the NE misc device parent field is
  used now to get the NE PCI device.
* Update the naming and add more comments to make the logic of handling
  full CPU cores and dedicating them to the enclave clearer.

v5 -> v6

* Update documentation to kernel-doc format.
* Use directly put_page() instead of unpin_user_pages(), to match the
  get_user_pages() calls.

v4 -> v5

* Release the reference to the NE PCI device on enclave fd release.
* Adapt the logic to cpumask enclave vCPU ids and CPU cores.
* Remove sanity checks for situations that shouldn't happen unless the
  system is buggy or the logic is broken.

v3 -> v4

* Use dev_err instead of custom NE log pattern.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().

v1 -> v2

* Add log pattern for NE.
* Remove the BUG_ON calls.
* Update goto labels to match their purpose.
* Add early exit in release() if there was a slot alloc error in the fd
  creation path.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 166 ++
 1 file changed, 166 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index dd4752b99ece..f06622b48d69 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -1324,6 +1324,171 @@ static long ne_enclave_ioctl(struct file *file, unsigned int cmd, unsigned long
return 0;
 }
 
+/**
+ * ne_enclave_remove_all_mem_region_entries() - Remove all memory region entries
+ * from the enclave data structure.
+ * @ne_enclave :   Private data associated with the current enclave.
+ *
+ * Context: Process context. This function is called with the ne_enclave mutex held.
+ */
+static void ne_enclave_remove_all_mem_region_entries(struct ne_enclave *ne_enclave)
+{
+   unsigned long i = 0;
+   struct ne_mem_region *ne_mem_region = NULL;
+   struct ne_mem_region *ne_mem_region_tmp = NULL;
+
+   list_for_each_entry_safe(ne_mem_region, ne_mem_region_tmp,
+&ne_enclave->mem_regions_list,
+mem_region_list_entry) {
+   list_del(&ne_mem_region->mem_region_list_entry);
+
+   for (i = 0; i < ne_mem_region->nr_pages; i++)
+   put_page(ne_mem_region->pages[i]);
+
+   kfree(ne_mem_region->pages);
+
+   kfree(ne_mem_region);
+   }
+}
+
+/**
+ * ne_enclave_remove_all_vcpu_id_entries() - Remove all vCPU id entries from
+ *  the enclave data structure.
+ * @ne_enclave :   Private data associated with the current enclave.
+ *
+ * Context: Process context. This function is called with the ne_enclave mutex held.
+ */
+static void ne_enclave_remove_all_vcpu_id_entries(struct ne_enclave *ne_enclave)
+{
+   unsigned int cpu = 0;
+   unsigned int i = 0;
+
+   mutex_lock(&ne_cpu_pool.mutex);
+
+   for (i = 0; i < ne_enclave->nr_parent_vm_cores; i++) {
+   for_each_cpu(cpu, ne_enclave->threads_per_core[i])
+   /* Update the available NE CPU pool. */
+   cpumask_set_cpu(cpu, ne_cpu_pool.avail_threads_per_core[i]);
+
+   free_cpumask_var(ne_enclave->threads_per_core[i]);
+   }
+
+   mutex_unlock(&ne_cpu_pool.mutex);
+
+   kfree(ne_enclave->threads_per_core);
+
+   free_cpumask_var(ne_enclave->vcpu_ids);
+}
+
+/**
+ * ne_pci_dev_remove_enclave_entry() - Remove the enclave entry from the data
+ *structure that is part of the NE PCI
+ *device private data.
+ * @ne_enclave :   Private data associated with the current enclave.
+ * @ne_pci_dev :   Private data associated with the PCI device.
+ *
+ * Context: Process context. This function is called with the ne_pci_dev enclave
+ * mutex held.
+ */
+static void ne_pci_dev_remove_enclave_entry(struct ne_enclave *ne_enclave,
+   struct ne_pci_dev *ne_pci_dev)
+{
+   struct ne_enclave *ne_enclave_entry = NULL;
+   struct ne_enclave *

[PATCH v10 12/18] nitro_enclaves: Add logic for starting an enclave

2020-09-21 Thread Andra Paraschiv
After all the enclave resources are set, the enclave is ready to start
running.

Add ioctl command logic for starting an enclave after all its resources,
memory regions and CPUs, have been set.

The enclave start information includes the local channel addressing -
vsock CID - and the flags associated with the enclave.

Changelog

v9 -> v10

* Update commit message to include the changelog before the SoB tag(s).

v8 -> v9

* Use the ne_devs data structure to get the refs for the NE PCI device.

v7 -> v8

* Add check for invalid enclave CID value e.g. well-known CIDs and
  parent VM CID.
* Add custom error code for incorrect flag in enclave start info and
  invalid enclave CID.

v6 -> v7

* Update the naming and add more comments to make the logic of handling
  full CPU cores and dedicating them to the enclave clearer.

v5 -> v6

* Check for invalid enclave start flags.
* Update documentation to kernel-doc format.

v4 -> v5

* Add early exit on enclave start ioctl function call error.
* Move sanity checks in the enclave start ioctl function, outside of the
  switch-case block.
* Remove log on copy_from_user() / copy_to_user() failure.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Update the naming for the ioctl command from metadata to info.
* Check for minimum enclave memory size.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.

v1 -> v2

* Add log pattern for NE.
* Check if enclave state is init when starting an enclave.
* Remove the BUG_ON calls.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 157 ++
 1 file changed, 157 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index f2252f67302c..dd4752b99ece 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -1002,6 +1002,79 @@ static int ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave,
return rc;
 }
 
+/**
+ * ne_start_enclave_ioctl() - Trigger enclave start after the enclave resources,
+ *   such as memory and CPU, have been set.
+ * @ne_enclave :   Private data associated with the current enclave.
+ * @enclave_start_info :   Enclave info that includes enclave cid and flags.
+ *
+ * Context: Process context. This function is called with the ne_enclave mutex held.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_start_enclave_ioctl(struct ne_enclave *ne_enclave,
+   struct ne_enclave_start_info *enclave_start_info)
+{
+   struct ne_pci_dev_cmd_reply cmd_reply = {};
+   unsigned int cpu = 0;
+   struct enclave_start_req enclave_start_req = {};
+   unsigned int i = 0;
+   struct pci_dev *pdev = ne_devs.ne_pci_dev->pdev;
+   int rc = -EINVAL;
+
+   if (!ne_enclave->nr_mem_regions) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Enclave has no mem regions\n");
+
+   return -NE_ERR_NO_MEM_REGIONS_ADDED;
+   }
+
+   if (ne_enclave->mem_size < NE_MIN_ENCLAVE_MEM_SIZE) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Enclave memory is less than %ld\n",
+   NE_MIN_ENCLAVE_MEM_SIZE);
+
+   return -NE_ERR_ENCLAVE_MEM_MIN_SIZE;
+   }
+
+   if (!ne_enclave->nr_vcpus) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Enclave has no vCPUs\n");
+
+   return -NE_ERR_NO_VCPUS_ADDED;
+   }
+
+   for (i = 0; i < ne_enclave->nr_parent_vm_cores; i++)
+   for_each_cpu(cpu, ne_enclave->threads_per_core[i])
+   if (!cpumask_test_cpu(cpu, ne_enclave->vcpu_ids)) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Full CPU cores not used\n");
+
+   return -NE_ERR_FULL_CORES_NOT_USED;
+   }
+
+   enclave_start_req.enclave_cid = enclave_start_info->enclave_cid;
+   enclave_start_req.flags = enclave_start_info->flags;
+   enclave_start_req.slot_uid = ne_enclave->slot_uid;
+
+   rc = ne_do_request(pdev, ENCLAVE_START,
+  &enclave_start_req, sizeof(enclave_start_req),
+  &cmd_reply, sizeof(cmd_reply));
+   if (rc < 0) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Error in enclave start [rc=%d]\n", rc);
+
+   return rc;
+   }
+
+   ne_enclave->state = NE_STATE_RUNNING;
+
+   enclave_start_info->enclave_cid = cmd_reply.

[PATCH v10 10/18] nitro_enclaves: Add logic for getting the enclave image load info

2020-09-21 Thread Andra Paraschiv
Before setting the memory regions for the enclave, the enclave image
needs to be placed in memory. After the memory regions are set, this
memory cannot be used by the VM anymore, as it is carved out.

Add ioctl command logic to get the offset in enclave memory at which to
place the enclave image. The user space tooling then copies the enclave
image into memory at the given offset.

Changelog

v9 -> v10

* Update commit message to include the changelog before the SoB tag(s).

v8 -> v9

* No changes.

v7 -> v8

* Add custom error code for incorrect enclave image load info flag.

v6 -> v7

* No changes.

v5 -> v6

* Check for invalid enclave image load flags.

v4 -> v5

* Check for the enclave not being started when invoking this ioctl call.
* Remove log on copy_from_user() / copy_to_user() failure.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Set enclave image load offset based on flags.
* Update the naming for the ioctl command from metadata to info.

v2 -> v3

* No changes.

v1 -> v2

* New in v2.

Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 36 +++
 1 file changed, 36 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index 24e4270d181d..fe83588c8b02 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -807,6 +807,42 @@ static long ne_enclave_ioctl(struct file *file, unsigned int cmd, unsigned long
return 0;
}
 
+   case NE_GET_IMAGE_LOAD_INFO: {
+   struct ne_image_load_info image_load_info = {};
+
+   if (copy_from_user(&image_load_info, (void __user *)arg, sizeof(image_load_info)))
+   return -EFAULT;
+
+   mutex_lock(&ne_enclave->enclave_info_mutex);
+
+   if (ne_enclave->state != NE_STATE_INIT) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Enclave is not in init state\n");
+
+   mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+   return -NE_ERR_NOT_IN_INIT_STATE;
+   }
+
+   mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+   if (!image_load_info.flags ||
+   image_load_info.flags >= NE_IMAGE_LOAD_MAX_FLAG_VAL) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Incorrect flag in enclave image load info\n");
+
+   return -NE_ERR_INVALID_FLAG_VALUE;
+   }
+
+   if (image_load_info.flags == NE_EIF_IMAGE)
+   image_load_info.memory_offset = NE_EIF_LOAD_OFFSET;
+
+   if (copy_to_user((void __user *)arg, &image_load_info, sizeof(image_load_info)))
+   return -EFAULT;
+
+   return 0;
+   }
+
default:
return -ENOTTY;
}
-- 
2.20.1 (Apple Git-117)







[PATCH v10 07/18] nitro_enclaves: Init misc device providing the ioctl interface

2020-09-21 Thread Andra Paraschiv
The Nitro Enclaves driver provides an ioctl interface to the user space
for enclave lifetime management e.g. enclave creation / termination and
setting enclave resources such as memory and CPU.

This ioctl interface is mapped to a Nitro Enclaves misc device.

Changelog

v9 -> v10

* Update commit message to include the changelog before the SoB tag(s).

v8 -> v9

* Use the ne_devs data structure to get the refs for the NE misc device
  in the NE PCI device driver logic.

v7 -> v8

* Add define for the CID of the primary / parent VM.
* Update the NE PCI driver shutdown logic to include misc device
  deregister.

v6 -> v7

* Set the NE PCI device the parent of the NE misc device to be able to
  use it in the ioctl logic.
* Update the naming and add more comments to make the logic of handling
  full CPU cores and dedicating them to the enclave clearer.

v5 -> v6

* Remove the ioctl to query API version.
* Update documentation to kernel-doc format.

v4 -> v5

* Update the size of the NE CPU pool string from 4096 to 512 chars.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Remove the NE CPU pool init during kernel module loading, as the CPU
  pool is now setup at runtime, via a sysfs file for the kernel
  parameter.
* Add minimum enclave memory size definition.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.
* Remove the WARN_ON calls.
* Remove linux/bug and linux/kvm_host includes that are not needed.
* Remove "ratelimited" from the logs that are not in the ioctl call
  paths.
* Remove file ops that do nothing for now - open and release.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Update ne_cpu_pool data structure to include the global mutex.
* Update NE misc device mode to 0660.
* Check if the CPU siblings are included in the NE CPU pool, as full CPU
  cores are given for the enclave(s).

Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 139 ++
 drivers/virt/nitro_enclaves/ne_pci_dev.c  |  14 +++
 2 files changed, 153 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.c

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
new file mode 100644
index ..c06825070313
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -0,0 +1,139 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+/**
+ * DOC: Enclave lifetime management driver for Nitro Enclaves (NE).
+ * Nitro is a hypervisor that has been developed by Amazon.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ne_misc_dev.h"
+#include "ne_pci_dev.h"
+
+/**
+ * NE_CPUS_SIZE - Size for max 128 CPUs, for now, in a cpu-list string, comma
+ *   separated. The NE CPU pool includes CPUs from a single NUMA
+ *   node.
+ */
+#define NE_CPUS_SIZE   (512)
+
+/**
+ * NE_EIF_LOAD_OFFSET - The offset where to copy the Enclave Image Format (EIF)
+ * image in enclave memory.
+ */
+#define NE_EIF_LOAD_OFFSET (8 * 1024UL * 1024UL)
+
+/**
+ * NE_MIN_ENCLAVE_MEM_SIZE - The minimum memory size an enclave can be launched
+ *  with.
+ */
+#define NE_MIN_ENCLAVE_MEM_SIZE(64 * 1024UL * 1024UL)
+
+/**
+ * NE_MIN_MEM_REGION_SIZE - The minimum size of an enclave memory region.
+ */
+#define NE_MIN_MEM_REGION_SIZE (2 * 1024UL * 1024UL)
+
+/**
+ * NE_PARENT_VM_CID - The CID for the vsock device of the primary / parent VM.
+ */
+#define NE_PARENT_VM_CID   (3)
+
+static const struct file_operations ne_fops = {
+   .owner  = THIS_MODULE,
+   .llseek = noop_llseek,
+};
+
+static struct miscdevice ne_misc_dev = {
+   .minor  = MISC_DYNAMIC_MINOR,
+   .name   = "nitro_enclaves",
+   .fops   = &ne_fops,
+   .mode   = 0660,
+};
+
+struct ne_devs ne_devs = {
+   .ne_misc_dev= &ne_misc_dev,
+};
+
+/*
+ * TODO: Update logic to create new sysfs entries instead of using
+ * a kernel parameter e.g. if multiple sysfs files needed.
+ */
+static const struct kernel_param_ops ne_cpu_pool_ops = {
+   .get= param_get_string,
+};
+
+static char ne_cpus[NE_CPUS_SIZE];
+static struct kparam_string ne_cpus_arg = {
+   .maxlen = sizeof(ne_cpus),
+   .string = ne_cpus,
+};
+
+module_param_cb(ne_cpus, &ne_cpu_pool_ops, &ne_cpus_arg, 0644);
+/* https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html#cpu-lists */
+MODULE_PARM_DESC(ne_cpus, "<cpu-list> - CPU pool used for Nitro Enclaves");
+
+/**
+ * struct ne_cpu_pool - CPU pool used for Nitro Encla

[PATCH v10 03/18] nitro_enclaves: Define enclave info for internal bookkeeping

2020-09-21 Thread Andra Paraschiv
The Nitro Enclaves driver keeps an internal info per each enclave.

This is needed to be able to manage enclave resources state, enclave
notifications and have a reference of the PCI device that handles
command requests for enclave lifetime management.

Changelog

v9 -> v10

* Update commit message to include the changelog before the SoB tag(s).

v8 -> v9

* Add data structure to keep references to both Nitro Enclaves misc and
  PCI devices.

v7 -> v8

* No changes.

v6 -> v7

* Update the naming and add more comments to make the logic of handling
  full CPU cores and dedicating them to the enclave clearer.

v5 -> v6

* Update documentation to kernel-doc format.
* Include in the enclave memory region data structure the user space
  address and size for duplicate user space memory regions checks.

v4 -> v5

* Include enclave cores field in the enclave metadata.
* Update the vCPU ids data structure to be a cpumask instead of a list.

v3 -> v4

* Add NUMA node field for an enclave metadata as the enclave memory and
  CPUs need to be from the same NUMA node.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Add enclave memory regions and vcpus count for enclave bookkeeping.
* Update ne_state comments to reflect NE_START_ENCLAVE ioctl naming
  update.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
 drivers/virt/nitro_enclaves/ne_misc_dev.h | 109 ++
 1 file changed, 109 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.h

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.h b/drivers/virt/nitro_enclaves/ne_misc_dev.h
new file mode 100644
index ..2a4d2224baba
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.h
@@ -0,0 +1,109 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#ifndef _NE_MISC_DEV_H_
+#define _NE_MISC_DEV_H_
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ne_pci_dev.h"
+
+/**
+ * struct ne_mem_region - Entry in the enclave user space memory regions list.
+ * @mem_region_list_entry: Entry in the list of enclave memory regions.
+ * @memory_size:   Size of the user space memory region.
+ * @nr_pages:  Number of pages that make up the memory region.
+ * @pages: Pages that make up the user space memory region.
+ * @userspace_addr:User space address of the memory region.
+ */
+struct ne_mem_region {
+   struct list_headmem_region_list_entry;
+   u64 memory_size;
+   unsigned long   nr_pages;
+   struct page **pages;
+   u64 userspace_addr;
+};
+
+/**
+ * struct ne_enclave - Per-enclave data used for enclave lifetime management.
+ * @enclave_info_mutex :   Mutex for accessing this internal state.
+ * @enclave_list_entry :   Entry in the list of created enclaves.
+ * @eventq:Wait queue used for out-of-band event notifications
+ * triggered from the PCI device event handler to
+ * the enclave process via the poll function.
+ * @has_event: Variable used to determine if the out-of-band event
+ * was triggered.
+ * @max_mem_regions:   The maximum number of memory regions that can be
+ * handled by the hypervisor.
+ * @mem_regions_list:  Enclave user space memory regions list.
+ * @mem_size:  Enclave memory size.
+ * @mm :   Enclave process abstraction mm data struct.
+ * @nr_mem_regions:Number of memory regions associated with the enclave.
+ * @nr_parent_vm_cores :   The size of the threads per core array. The
+ * total number of CPU cores available on the
+ * parent / primary VM.
+ * @nr_threads_per_core:   The number of threads that a full CPU core has.
+ * @nr_vcpus:  Number of vcpus associated with the enclave.
+ * @numa_node: NUMA node of the enclave memory and CPUs.
+ * @slot_uid:  Slot unique id mapped to the enclave.
+ * @state: Enclave state, updated during enclave lifetime.
+ * @threads_per_core:  Enclave full CPU cores array, indexed by core id,
+ * consisting of cpumasks with all their threads.
+ * Full CPU cores are taken from the NE CPU pool
+ * and are available to the enclave.
+ * @vcpu_ids:  Cpumask of the vCPUs that are set for the enclave.
+ */
+struct ne_enclave {
+   struct mutexenclave_info_mutex;
+   struc

[PATCH v10 06/18] nitro_enclaves: Handle out-of-band PCI device events

2020-09-21 Thread Andra Paraschiv
In addition to the replies sent by the Nitro Enclaves PCI device in
response to command requests, out-of-band enclave events can happen, e.g.
when an enclave crashes. In this case, the Nitro Enclaves driver needs to
be aware of the event and notify the corresponding user space process
that abstracts the enclave.

Register an MSI-X interrupt vector to be used for this kind of
out-of-band events. The interrupt notifies that the state of an enclave
changed and the driver logic scans the state of each running enclave to
identify which one the notification is intended for.

Create a workqueue to handle the out-of-band events. Notify the user
space enclave process, which uses a polling mechanism on the enclave fd.

Changelog

v9 -> v10

* Update commit message to include the changelog before the SoB tag(s).

v8 -> v9

* Use the reference to the pdev directly from the ne_pci_dev instead of
  the one from the enclave data structure.

v7 -> v8

* No changes.

v6 -> v7

* No changes.

v5 -> v6

* Update documentation to kernel-doc format.

v4 -> v5

* Remove sanity checks for situations that shouldn't happen unless the
  system is buggy or the logic is broken.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Return IRQ_NONE when interrupts are not handled.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call
  paths.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
 drivers/virt/nitro_enclaves/ne_pci_dev.c | 118 +++
 1 file changed, 118 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c b/drivers/virt/nitro_enclaves/ne_pci_dev.c
index cedc4dd2dd39..6654cc8a1bc3 100644
--- a/drivers/virt/nitro_enclaves/ne_pci_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -199,6 +199,90 @@ static irqreturn_t ne_reply_handler(int irq, void *args)
return IRQ_HANDLED;
 }
 
+/**
+ * ne_event_work_handler() - Work queue handler for notifying enclaves on a
+ *  state change received by the event interrupt
+ *  handler.
+ * @work:  Item containing the NE PCI device for which an out-of-band event
+ * was issued.
+ *
+ * An out-of-band event is being issued by the Nitro Hypervisor when at least
+ * one enclave is changing state without client interaction.
+ *
+ * Context: Work queue context.
+ */
+static void ne_event_work_handler(struct work_struct *work)
+{
+   struct ne_pci_dev_cmd_reply cmd_reply = {};
+   struct ne_enclave *ne_enclave = NULL;
+   struct ne_pci_dev *ne_pci_dev =
+   container_of(work, struct ne_pci_dev, notify_work);
+   struct pci_dev *pdev = ne_pci_dev->pdev;
+   int rc = -EINVAL;
+   struct slot_info_req slot_info_req = {};
+
+   mutex_lock(&ne_pci_dev->enclaves_list_mutex);
+
+   /*
+* Iterate over all enclaves registered for the Nitro Enclaves
+* PCI device and determine to which enclave(s) the out-of-band
+* event corresponds.
+*/
+   list_for_each_entry(ne_enclave, &ne_pci_dev->enclaves_list, enclave_list_entry) {
+   mutex_lock(&ne_enclave->enclave_info_mutex);
+
+   /*
+* Enclaves that were never started cannot receive out-of-band
+* events.
+*/
+   if (ne_enclave->state != NE_STATE_RUNNING)
+   goto unlock;
+
+   slot_info_req.slot_uid = ne_enclave->slot_uid;
+
+   rc = ne_do_request(pdev, SLOT_INFO,
+  &slot_info_req, sizeof(slot_info_req),
+  &cmd_reply, sizeof(cmd_reply));
+   if (rc < 0)
+   dev_err(&pdev->dev, "Error in slot info [rc=%d]\n", rc);
+
+   /* Notify enclave process that the enclave state changed. */
+   if (ne_enclave->state != cmd_reply.state) {
+   ne_enclave->state = cmd_reply.state;
+
+   ne_enclave->has_event = true;
+
+   wake_up_interruptible(&ne_enclave->eventq);
+   }
+
+unlock:
+   mutex_unlock(&ne_enclave->enclave_info_mutex);
+   }
+
+   mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
+}
+
+/**
+ * ne_event_handler() - Interrupt handler for PCI device out-of-band events.
+ * This interrupt does not supply any data in the MMIO
+ * region. It notifies a change in the state of any of
+ * the launched enclaves.
+ * @irq:   Received interrupt for an out-of-band event.
+ * @args:  PCI device private data structure.
+ *
+ * Context: Interrupt context.
+ * Return:
+ * * IRQ_HANDLED on handled in

[PATCH v10 04/18] nitro_enclaves: Init PCI device driver

2020-09-21 Thread Andra Paraschiv
The Nitro Enclaves PCI device is used by the kernel driver as a means of
communication with the hypervisor on the host where the primary VM and
the enclaves run. It handles requests with regard to enclave lifetime.

Setup the PCI device driver and add support for MSI-X interrupts.

Changelog

v9 -> v10

* Update commit message to include the changelog before the SoB tag(s).

v8 -> v9

* Init the reference to the ne_pci_dev in the ne_devs data structure.

v7 -> v8

* Add NE PCI driver shutdown logic.

v6 -> v7

* No changes.

v5 -> v6

* Update documentation to kernel-doc format.

v4 -> v5

* Remove sanity checks for situations that shouldn't happen unless the
  system is buggy or the logic is broken.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Update NE PCI driver name to "nitro_enclaves".

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.
* Remove the WARN_ON calls.
* Remove linux/bug include that is not needed.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call
  paths.
* Update kzfree() calls to kfree().

v1 -> v2

* Add log pattern for NE.
* Update PCI device setup functions to receive PCI device data structure and
  then get private data from it inside the functions logic.
* Remove the BUG_ON calls.
* Add teardown function for MSI-X setup.
* Update goto labels to match their purpose.
* Implement TODO for NE PCI device disable state check.
* Update function name for NE PCI device probe / remove.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Alexandru Ciobotaru 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
 drivers/virt/nitro_enclaves/ne_pci_dev.c | 304 +++
 1 file changed, 304 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/ne_pci_dev.c

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c b/drivers/virt/nitro_enclaves/ne_pci_dev.c
new file mode 100644
index ..32f07345c3b5
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -0,0 +1,304 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+/**
+ * DOC: Nitro Enclaves (NE) PCI device driver.
+ */
+
+#include <linux/delay.h>
+#include <linux/device.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/nitro_enclaves.h>
+#include <linux/pci.h>
+#include <linux/types.h>
+#include <linux/wait.h>
+
+#include "ne_misc_dev.h"
+#include "ne_pci_dev.h"
+
+/**
+ * NE_DEFAULT_TIMEOUT_MSECS - Default timeout to wait for a reply from
+ *   the NE PCI device.
+ */
+#define NE_DEFAULT_TIMEOUT_MSECS   (120000) /* 120 sec */
+
+static const struct pci_device_id ne_pci_ids[] = {
+   { PCI_DEVICE(PCI_VENDOR_ID_AMAZON, PCI_DEVICE_ID_NE) },
+   { 0, }
+};
+
+MODULE_DEVICE_TABLE(pci, ne_pci_ids);
+
+/**
+ * ne_setup_msix() - Setup MSI-X vectors for the PCI device.
+ * @pdev:  PCI device to setup the MSI-X for.
+ *
+ * Context: Process context.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_setup_msix(struct pci_dev *pdev)
+{
+   int nr_vecs = 0;
+   int rc = -EINVAL;
+
+   nr_vecs = pci_msix_vec_count(pdev);
+   if (nr_vecs < 0) {
+   rc = nr_vecs;
+
+   dev_err(&pdev->dev, "Error in getting vec count [rc=%d]\n", rc);
+
+   return rc;
+   }
+
+   rc = pci_alloc_irq_vectors(pdev, nr_vecs, nr_vecs, PCI_IRQ_MSIX);
+   if (rc < 0) {
+   dev_err(&pdev->dev, "Error in alloc MSI-X vecs [rc=%d]\n", rc);
+
+   return rc;
+   }
+
+   return 0;
+}
+
+/**
+ * ne_teardown_msix() - Teardown MSI-X vectors for the PCI device.
+ * @pdev:  PCI device to teardown the MSI-X for.
+ *
+ * Context: Process context.
+ */
+static void ne_teardown_msix(struct pci_dev *pdev)
+{
+   pci_free_irq_vectors(pdev);
+}
+
+/**
+ * ne_pci_dev_enable() - Select the PCI device version and enable it.
+ * @pdev:  PCI device to select version for and then enable.
+ *
+ * Context: Process context.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_pci_dev_enable(struct pci_dev *pdev)
+{
+   u8 dev_enable_reply = 0;
+   u16 dev_version_reply = 0;
+   struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+   iowrite16(NE_VERSION_MAX, ne_pci_dev->iomem_base + NE_VERSION);
+
+   dev_version_reply = ioread16(ne_pci_dev->iomem_base + NE_VERSION);
+   if (dev_version_reply != NE_VERSION_MAX) {
+   dev_err(&pdev->dev, "Error in pci dev version cmd\n");
+
+   return -EIO;
+   }
+
+   iowrite8(NE_ENABLE_ON, ne_pci_dev->iomem_base + NE_ENABLE);
+
+   dev_enable_reply = ioread8(ne_pci_dev->iomem_base + NE_ENABLE);
+   if (dev_enable_reply != NE_ENABLE_ON) {
+   dev_err(&pdev->dev, "Error in pci dev enable cmd\n");
+
+

[PATCH v10 08/18] nitro_enclaves: Add logic for creating an enclave VM

2020-09-21 Thread Andra Paraschiv
Add ioctl command logic for enclave VM creation. It triggers a slot
allocation. The enclave resources will be associated with this slot and
it will be used as an identifier for triggering enclave run.

Return a file descriptor, namely enclave fd. This is further used by the
associated user space enclave process to set enclave resources and
trigger enclave termination.

The poll function is implemented in order to notify the enclave process
when an enclave exits without a specific enclave termination command
trigger, e.g. when an enclave crashes.

Changelog

v9 -> v10

* Update commit message to include the changelog before the SoB tag(s).

v8 -> v9

* Use the ne_devs data structure to get the refs for the NE PCI device.

v7 -> v8

* No changes.

v6 -> v7

* Use the NE misc device parent field to get the NE PCI device.
* Update the naming and add more comments to make more clear the logic
  of handling full CPU cores and dedicating them to the enclave.

v5 -> v6

* Update the code base to init the ioctl function in this patch.
* Update documentation to kernel-doc format.

v4 -> v5

* Release the reference to the NE PCI device on create VM error.
* Close enclave fd on copy_to_user() failure; rename fd to enclave fd
  while at it.
* Remove sanity checks for situations that shouldn't happen unless the
  system is buggy or the logic is broken.
* Remove log on copy_to_user() failure.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.
* Add metadata for the NUMA node for the enclave memory and CPUs.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().
* Remove file ops that do nothing for now - open.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 223 ++
 1 file changed, 223 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index c06825070313..3515b163ad0e 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -60,9 +60,12 @@
  */
 #define NE_PARENT_VM_CID   (3)
 
+static long ne_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+
 static const struct file_operations ne_fops = {
.owner  = THIS_MODULE,
.llseek = noop_llseek,
+   .unlocked_ioctl = ne_ioctl,
 };
 
 static struct miscdevice ne_misc_dev = {
@@ -119,6 +122,226 @@ struct ne_cpu_pool {
 
 static struct ne_cpu_pool ne_cpu_pool;
 
+/**
+ * ne_enclave_poll() - Poll functionality used for enclave out-of-band events.
+ * @file:  File associated with this poll function.
+ * @wait:  Poll table data structure.
+ *
+ * Context: Process context.
+ * Return:
+ * * Poll mask.
+ */
+static __poll_t ne_enclave_poll(struct file *file, poll_table *wait)
+{
+   __poll_t mask = 0;
+   struct ne_enclave *ne_enclave = file->private_data;
+
+   poll_wait(file, &ne_enclave->eventq, wait);
+
+   if (!ne_enclave->has_event)
+   return mask;
+
+   mask = POLLHUP;
+
+   return mask;
+}
+
+static const struct file_operations ne_enclave_fops = {
+   .owner  = THIS_MODULE,
+   .llseek = noop_llseek,
+   .poll   = ne_enclave_poll,
+};
+
+/**
+ * ne_create_vm_ioctl() - Alloc slot to be associated with an enclave. Create
+ *   enclave file descriptor to be further used for enclave
+ *   resources handling e.g. memory regions and CPUs.
+ * @ne_pci_dev :   Private data associated with the PCI device.
+ * @slot_uid:  Generated unique slot id associated with an enclave.
+ *
+ * Context: Process context. This function is called with the ne_pci_dev enclave
+ * mutex held.
+ * Return:
+ * * Enclave fd on success.
+ * * Negative return value on failure.
+ */
+static int ne_create_vm_ioctl(struct ne_pci_dev *ne_pci_dev, u64 *slot_uid)
+{
+   struct ne_pci_dev_cmd_reply cmd_reply = {};
+   int enclave_fd = -1;
+   struct file *enclave_file = NULL;
+   unsigned int i = 0;
+   struct ne_enclave *ne_enclave = NULL;
+   struct pci_dev *pdev = ne_pci_dev->pdev;
+   int rc = -EINVAL;
+   struct slot_alloc_req slot_alloc_req = {};
+
+   mutex_lock(&ne_cpu_pool.mutex);
+
+   for (i = 0; i < ne_cpu_pool.nr_parent_vm_cores; i++)
+   if (!cpumask_empty(ne_cpu_pool.avail_threads_per_core[i]))
+   break;
+
+   if (i == ne_cpu_pool.nr_parent_vm_cores) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "No CPUs available in CPU pool\n");
+
+   mutex_unlock(&ne_cpu_pool.mutex);
+

[PATCH v10 05/18] nitro_enclaves: Handle PCI device command requests

2020-09-21 Thread Andra Paraschiv
The Nitro Enclaves PCI device exposes an MMIO space that this driver
uses to submit command requests and to receive command replies, e.g. for
enclave creation / termination or setting enclave resources.

Add logic for handling PCI device command requests based on the given
command type.

Register an MSI-X interrupt vector for command reply notifications to
handle this type of communication event.

Changelog

v9 -> v10

* Update commit message to include the changelog before the SoB tag(s).

v8 -> v9

* No changes.

v7 -> v8

* Update the function signatures for the submit request and retrieve
  reply functions as they only returned 0, never an error code.
* Include command type value in the error logs of ne_do_request().

v6 -> v7

* No changes.

v5 -> v6

* Update documentation to kernel-doc format.

v4 -> v5

* Remove sanity checks for situations that shouldn't happen unless the
  system is buggy or the logic is broken.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Return IRQ_NONE when interrupts are not handled.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call
  paths.

v1 -> v2

* Add log pattern for NE.
* Remove the BUG_ON calls.
* Update goto labels to match their purpose.
* Add fix for kbuild report:
  https://lore.kernel.org/lkml/202004231644.xtmn4z1z%25...@intel.com/

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
 drivers/virt/nitro_enclaves/ne_pci_dev.c | 189 +++
 1 file changed, 189 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c b/drivers/virt/nitro_enclaves/ne_pci_dev.c
index 32f07345c3b5..cedc4dd2dd39 100644
--- a/drivers/virt/nitro_enclaves/ne_pci_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -33,6 +33,172 @@ static const struct pci_device_id ne_pci_ids[] = {
 
 MODULE_DEVICE_TABLE(pci, ne_pci_ids);
 
+/**
+ * ne_submit_request() - Submit command request to the PCI device based on the
+ *  command type.
+ * @pdev:  PCI device to send the command to.
+ * @cmd_type:  Command type of the request sent to the PCI device.
+ * @cmd_request:   Command request payload.
+ * @cmd_request_size:  Size of the command request payload.
+ *
+ * Context: Process context. This function is called with the ne_pci_dev mutex held.
+ */
+static void ne_submit_request(struct pci_dev *pdev, enum ne_pci_dev_cmd_type cmd_type,
+ void *cmd_request, size_t cmd_request_size)
+{
+   struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+   memcpy_toio(ne_pci_dev->iomem_base + NE_SEND_DATA, cmd_request, cmd_request_size);
+
+   iowrite32(cmd_type, ne_pci_dev->iomem_base + NE_COMMAND);
+}
+
+/**
+ * ne_retrieve_reply() - Retrieve reply from the PCI device.
+ * @pdev:  PCI device to receive the reply from.
+ * @cmd_reply: Command reply payload.
+ * @cmd_reply_size:Size of the command reply payload.
+ *
+ * Context: Process context. This function is called with the ne_pci_dev mutex held.
+ */
+static void ne_retrieve_reply(struct pci_dev *pdev, struct ne_pci_dev_cmd_reply *cmd_reply,
+ size_t cmd_reply_size)
+{
+   struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+   memcpy_fromio(cmd_reply, ne_pci_dev->iomem_base + NE_RECV_DATA, cmd_reply_size);
+}
+
+/**
+ * ne_wait_for_reply() - Wait for a reply of a PCI device command.
+ * @pdev:  PCI device for which a reply is waited.
+ *
+ * Context: Process context. This function is called with the ne_pci_dev mutex held.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_wait_for_reply(struct pci_dev *pdev)
+{
+   struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+   int rc = -EINVAL;
+
+   /*
+* TODO: Update to _interruptible and handle interrupted wait event
+* e.g. -ERESTARTSYS, incoming signals + update timeout, if needed.
+*/
+   rc = wait_event_timeout(ne_pci_dev->cmd_reply_wait_q,
+   atomic_read(&ne_pci_dev->cmd_reply_avail) != 0,
+   msecs_to_jiffies(NE_DEFAULT_TIMEOUT_MSECS));
+   if (!rc)
+   return -ETIMEDOUT;
+
+   return 0;
+}
+
+int ne_do_request(struct pci_dev *pdev, enum ne_pci_dev_cmd_type cmd_type,
+ void *cmd_request, size_t cmd_request_size,
+ struct ne_pci_dev_cmd_reply *cmd_reply, size_t cmd_reply_size)
+{
+   struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+   int rc = -EINVAL;
+
+   if (cmd_type <= INVALID_CMD || cmd_type >= MAX_CMD) {
+   dev_err_ratelimited(&pdev->dev, "Invalid cmd type=%u\n", cmd_type);
+
+   return -EINVAL;
+   }
+
+   if (!cmd_request) {
+  

[PATCH v10 02/18] nitro_enclaves: Define the PCI device interface

2020-09-21 Thread Andra Paraschiv
The Nitro Enclaves (NE) driver communicates with a new PCI device that
is exposed to a virtual machine (VM) and handles commands for managing
the enclave lifetime, e.g. creation, termination, setting memory
regions. The communication with the PCI device is handled using an MMIO
space and MSI-X interrupts.

This device communicates with the hypervisor on the host, where the VM
that spawned the enclave itself runs, e.g. to launch a VM that is used
for the enclave.

Define the MMIO space of the NE PCI device, the commands that are
provided by this device. Add an internal data structure used as private
data for the PCI device driver and the function for the PCI device
command requests handling.

Changelog

v9 -> v10

* Update commit message to include the changelog before the SoB tag(s).

v8 -> v9

* Fix indent for the NE PCI device command types enum.

v7 -> v8

* No changes.

v6 -> v7

* Update the documentation to include references to the NE PCI device id
  and MMIO bar.

v5 -> v6

* Update documentation to kernel-doc format.

v4 -> v5

* Add a TODO for including flags in the request to the NE PCI device to
  set a memory region for an enclave. It is not used for now.

v3 -> v4

* Remove the "packed" attribute and include padding in the NE data
  structures.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Update path naming to drivers/virt/nitro_enclaves.
* Update NE_ENABLE_OFF / NE_ENABLE_ON defines.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Alexandru Ciobotaru 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
 drivers/virt/nitro_enclaves/ne_pci_dev.h | 327 +++
 1 file changed, 327 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/ne_pci_dev.h

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.h b/drivers/virt/nitro_enclaves/ne_pci_dev.h
new file mode 100644
index ..8bfbc6607818
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.h
@@ -0,0 +1,327 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#ifndef _NE_PCI_DEV_H_
+#define _NE_PCI_DEV_H_
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * DOC: Nitro Enclaves (NE) PCI device
+ */
+
+/**
+ * PCI_DEVICE_ID_NE - Nitro Enclaves PCI device id.
+ */
+#define PCI_DEVICE_ID_NE   (0xe4c1)
+/**
+ * PCI_BAR_NE - Nitro Enclaves PCI device MMIO BAR.
+ */
+#define PCI_BAR_NE (0x03)
+
+/**
+ * DOC: Device registers in the NE PCI device MMIO BAR
+ */
+
+/**
+ * NE_ENABLE - (1 byte) Register to notify the device that the driver is using
+ *it (Read/Write).
+ */
+#define NE_ENABLE  (0x0000)
+#define NE_ENABLE_OFF  (0x00)
+#define NE_ENABLE_ON   (0x01)
+
+/**
+ * NE_VERSION - (2 bytes) Register to select the device run-time version
+ * (Read/Write).
+ */
+#define NE_VERSION (0x0002)
+#define NE_VERSION_MAX (0x0001)
+
+/**
+ * NE_COMMAND - (4 bytes) Register to notify the device what command was
+ * requested (Write-Only).
+ */
+#define NE_COMMAND (0x0004)
+
+/**
+ * NE_EVTCNT - (4 bytes) Register to notify the driver that a reply or a device
+ *event is available (Read-Only):
+ *- Lower half  - command reply counter
+ *- Higher half - out-of-band device event counter
+ */
+#define NE_EVTCNT  (0x000c)
+#define NE_EVTCNT_REPLY_SHIFT  (0)
+#define NE_EVTCNT_REPLY_MASK   (0x0000ffff)
+#define NE_EVTCNT_REPLY(cnt)   (((cnt) & NE_EVTCNT_REPLY_MASK) >> \
+   NE_EVTCNT_REPLY_SHIFT)
+#define NE_EVTCNT_EVENT_SHIFT  (16)
+#define NE_EVTCNT_EVENT_MASK   (0xffff0000)
+#define NE_EVTCNT_EVENT(cnt)   (((cnt) & NE_EVTCNT_EVENT_MASK) >> \
+   NE_EVTCNT_EVENT_SHIFT)
+
+/**
+ * NE_SEND_DATA - (240 bytes) Buffer for sending the command request payload
+ *   (Read/Write).
+ */
+#define NE_SEND_DATA   (0x0010)
+
+/**
+ * NE_RECV_DATA - (240 bytes) Buffer for receiving the command reply payload
+ *   (Read-Only).
+ */
+#define NE_RECV_DATA   (0x0100)
+
+/**
+ * DOC: Device MMIO buffer sizes
+ */
+
+/**
+ * NE_SEND_DATA_SIZE / NE_RECV_DATA_SIZE - 240 bytes for send / recv buffer.
+ */
+#define NE_SEND_DATA_SIZE  (240)
+#define NE_RECV_DATA_SIZE  (240)
+
+/**
+ * DOC: MSI-X interrupt vectors
+ */
+
+/**
+ * NE_VEC_REPLY - MSI-X vector used for command reply notification.
+ */
+#define NE_VEC_REPLY   (0)
+
+/**
+ * NE_VEC_EVENT - MSI-X vector used for out-of-band events e.g. enclave crash.
+ */
+#define NE_VEC_EVENT   (1)
+
+/**
+ * enum ne_pci_dev_cmd_type - Device command types.
+ * @INVALID_CMD:   Invalid command.
+ * @ENCLAVE_START: Start an enclave, after setting its resources.
+ * @

[PATCH v10 00/18] Add support for Nitro Enclaves

2020-09-21 Thread Andra Paraschiv
 ioctl call paths.
* Update static calls sanity checks.
* Remove file ops that do nothing for now.
* Remove GPL additional wording as SPDX-License-Identifier is already in place.
* v2: https://lore.kernel.org/lkml/20200522062946.28973-1-andra...@amazon.com/

v1 -> v2

* Rebase on top of v5.7-rc6.
* Adapt codebase based on feedback from v1.
* Update ioctl number definition - major and minor.
* Add sample / documentation for the ioctl interface basic flow usage.
* Update cover letter to include more context on the NE overall.
* Add fix for the enclave / vcpu fd creation error cleanup path.
* Add fix reported by kbuild test robot .
* v1: https://lore.kernel.org/lkml/20200421184150.68011-1-andra...@amazon.com/

---

Andra Paraschiv (18):
  nitro_enclaves: Add ioctl interface definition
  nitro_enclaves: Define the PCI device interface
  nitro_enclaves: Define enclave info for internal bookkeeping
  nitro_enclaves: Init PCI device driver
  nitro_enclaves: Handle PCI device command requests
  nitro_enclaves: Handle out-of-band PCI device events
  nitro_enclaves: Init misc device providing the ioctl interface
  nitro_enclaves: Add logic for creating an enclave VM
  nitro_enclaves: Add logic for setting an enclave vCPU
  nitro_enclaves: Add logic for getting the enclave image load info
  nitro_enclaves: Add logic for setting an enclave memory region
  nitro_enclaves: Add logic for starting an enclave
  nitro_enclaves: Add logic for terminating an enclave
  nitro_enclaves: Add Kconfig for the Nitro Enclaves driver
  nitro_enclaves: Add Makefile for the Nitro Enclaves driver
  nitro_enclaves: Add sample for ioctl interface usage
  nitro_enclaves: Add overview documentation
  MAINTAINERS: Add entry for the Nitro Enclaves driver

 .../userspace-api/ioctl/ioctl-number.rst  |5 +-
 Documentation/virt/index.rst  |1 +
 Documentation/virt/ne_overview.rst|   95 +
 MAINTAINERS   |   13 +
 drivers/virt/Kconfig  |2 +
 drivers/virt/Makefile |2 +
 drivers/virt/nitro_enclaves/Kconfig   |   20 +
 drivers/virt/nitro_enclaves/Makefile  |9 +
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 1733 +
 drivers/virt/nitro_enclaves/ne_misc_dev.h |  109 ++
 drivers/virt/nitro_enclaves/ne_pci_dev.c  |  625 ++
 drivers/virt/nitro_enclaves/ne_pci_dev.h  |  327 
 include/linux/nitro_enclaves.h|   11 +
 include/uapi/linux/nitro_enclaves.h   |  359 
 samples/nitro_enclaves/.gitignore |2 +
 samples/nitro_enclaves/Makefile   |   16 +
 samples/nitro_enclaves/ne_ioctl_sample.c  |  883 +
 17 files changed, 4211 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/virt/ne_overview.rst
 create mode 100644 drivers/virt/nitro_enclaves/Kconfig
 create mode 100644 drivers/virt/nitro_enclaves/Makefile
 create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.c
 create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.h
 create mode 100644 drivers/virt/nitro_enclaves/ne_pci_dev.c
 create mode 100644 drivers/virt/nitro_enclaves/ne_pci_dev.h
 create mode 100644 include/linux/nitro_enclaves.h
 create mode 100644 include/uapi/linux/nitro_enclaves.h
 create mode 100644 samples/nitro_enclaves/.gitignore
 create mode 100644 samples/nitro_enclaves/Makefile
 create mode 100644 samples/nitro_enclaves/ne_ioctl_sample.c

-- 
2.20.1 (Apple Git-117)




Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar 
Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in 
Romania. Registration number J22/2621/2005.



[PATCH v10 01/18] nitro_enclaves: Add ioctl interface definition

2020-09-21 Thread Andra Paraschiv
The Nitro Enclaves driver handles the enclave lifetime management. This
includes enclave creation, termination and setting up its resources such
as memory and CPU.

An enclave runs alongside the VM that spawned it. It is abstracted as a
process running in the VM that launched it. The process interacts with
the NE driver, that exposes an ioctl interface for creating an enclave
and setting up its resources.

Changelog

v9 -> v10

* Update commit message to include the changelog before the SoB tag(s).

v8 -> v9

* No changes.

v7 -> v8

* Add NE custom error codes for user space memory regions not backed by
  pages multiple of 2 MiB, invalid flags and enclave CID.
* Add max flag value for enclave image load info.

v6 -> v7

* Clarify in the ioctls documentation that the return value is -1 and
  errno is set on failure.
* Update the error code value for NE_ERR_INVALID_MEM_REGION_SIZE as it
  gets in user space as value 25 (ENOTTY) instead of 515. Update the
  NE custom error codes values range to not be the same as the ones
  defined in include/linux/errno.h, although these are not propagated
  to user space.

v5 -> v6

* Fix typo in the description about the NE CPU pool.
* Update documentation to kernel-doc format.
* Remove the ioctl to query API version.

v4 -> v5

* Add more details about the ioctl calls usage e.g. error codes, file
  descriptors used.
* Update the ioctl to set an enclave vCPU to not return a file
  descriptor.
* Add specific NE error codes.

v3 -> v4

* Decouple NE ioctl interface from KVM API.
* Add NE API version and the corresponding ioctl call.
* Add enclave / image load flags options.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Add ioctl for getting enclave image load metadata.
* Update NE_ENCLAVE_START ioctl name to NE_START_ENCLAVE.
* Add entry in Documentation/userspace-api/ioctl/ioctl-number.rst for NE
  ioctls.
* Update NE ioctls definition based on the updated ioctl range for major
  and minor.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
Reviewed-by: Stefan Hajnoczi 
---
 .../userspace-api/ioctl/ioctl-number.rst  |   5 +-
 include/linux/nitro_enclaves.h|  11 +
 include/uapi/linux/nitro_enclaves.h   | 359 ++
 3 files changed, 374 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/nitro_enclaves.h
 create mode 100644 include/uapi/linux/nitro_enclaves.h

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 2a198838fca9..5f7ff00f394e 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -328,8 +328,11 @@ Code  Seq#    Include File                                           Comments
 0xAC  00-1F  linux/raw.h
 0xAD  00     Netfilter device in development:
                                                                      <mailto:ru...@rustcorp.com.au>
-0xAE  all    linux/kvm.h                                             Kernel-based Virtual Machine
+0xAE  00-1F  linux/kvm.h                                             Kernel-based Virtual Machine
                                                                      <mailto:k...@vger.kernel.org>
+0xAE  40-FF  linux/kvm.h                                             Kernel-based Virtual Machine
+                                                                     <mailto:k...@vger.kernel.org>
+0xAE  20-3F  linux/nitro_enclaves.h                                  Nitro Enclaves
 0xAF  00-1F  linux/fsl_hypervisor.h                                  Freescale hypervisor
 0xB0  all    RATIO devices in development:
                                                                      <mailto:v...@ratio.de>
diff --git a/include/linux/nitro_enclaves.h b/include/linux/nitro_enclaves.h
new file mode 100644
index ..d91ef2bfdf47
--- /dev/null
+++ b/include/linux/nitro_enclaves.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#ifndef _LINUX_NITRO_ENCLAVES_H_
+#define _LINUX_NITRO_ENCLAVES_H_
+
+#include 
+
+#endif /* _LINUX_NITRO_ENCLAVES_H_ */
diff --git a/include/uapi/linux/nitro_enclaves.h b/include/uapi/linux/nitro_enclaves.h
new file mode 100644
index ..b945073fe544
--- /dev/null
+++ b/include/uapi/linux/nitro_enclaves.h
@@ -0,0 +1,359 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#ifndef _UAPI_LINUX_NITRO_ENCLAVES_H_
+#define _UAPI_LINUX_NITRO_ENCLAVES_H_
+
+#include 
+
+/**
+ * DOC: Nitro Enclaves (NE) Kernel Driver Interface
+ */
+
+/

[PATCH v8 14/18] nitro_enclaves: Add Kconfig for the Nitro Enclaves driver

2020-09-04 Thread Andra Paraschiv
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v7 -> v8

* No changes.

v6 -> v7

* Remove, for now, the dependency on ARM64 arch. x86 is currently
  supported, with Arm to come afterwards. The NE kernel driver can be
  built for aarch64 arch.

v5 -> v6

* No changes.

v4 -> v5

* Add arch dependency for Arm / x86.

v3 -> v4

* Add PCI and SMP dependencies.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Update path to Kconfig to match the drivers/virt/nitro_enclaves
  directory.
* Update help in Kconfig.
---
 drivers/virt/Kconfig|  2 ++
 drivers/virt/nitro_enclaves/Kconfig | 20 
 2 files changed, 22 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/Kconfig

diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
index cbc1f25c79ab..80c5f9c16ec1 100644
--- a/drivers/virt/Kconfig
+++ b/drivers/virt/Kconfig
@@ -32,4 +32,6 @@ config FSL_HV_MANAGER
 partition shuts down.
 
 source "drivers/virt/vboxguest/Kconfig"
+
+source "drivers/virt/nitro_enclaves/Kconfig"
 endif
diff --git a/drivers/virt/nitro_enclaves/Kconfig b/drivers/virt/nitro_enclaves/Kconfig
new file mode 100644
index ..8c9387a232df
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/Kconfig
@@ -0,0 +1,20 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+
+# Amazon Nitro Enclaves (NE) support.
+# Nitro is a hypervisor that has been developed by Amazon.
+
+# TODO: Add dependency for ARM64 once NE is supported on Arm platforms. For now,
+# the NE kernel driver can be built for aarch64 arch.
+# depends on (ARM64 || X86) && HOTPLUG_CPU && PCI && SMP
+
+config NITRO_ENCLAVES
+   tristate "Nitro Enclaves Support"
+   depends on X86 && HOTPLUG_CPU && PCI && SMP
+   help
+ This driver consists of support for enclave lifetime management
+ for Nitro Enclaves (NE).
+
+ To compile this driver as a module, choose M here.
+ The module will be called nitro_enclaves.
-- 
2.20.1 (Apple Git-117)







[PATCH v8 10/18] nitro_enclaves: Add logic for getting the enclave image load info

2020-09-04 Thread Andra Paraschiv
Before setting the memory regions for the enclave, the enclave image
needs to be placed in memory. After the memory regions are set, this
memory can no longer be used by the VM, as it is carved out.

Add ioctl command logic to get the offset in enclave memory where to
place the enclave image. Then the user space tooling copies the enclave
image in the memory using the given memory offset.

Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v7 -> v8

* Add custom error code for incorrect enclave image load info flag.

v6 -> v7

* No changes.

v5 -> v6

* Check for invalid enclave image load flags.

v4 -> v5

* Check for the enclave not being started when invoking this ioctl call.
* Remove log on copy_from_user() / copy_to_user() failure.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Set enclave image load offset based on flags.
* Update the naming for the ioctl command from metadata to info.

v2 -> v3

* No changes.

v1 -> v2

* New in v2.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 36 +++
 1 file changed, 36 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index 0477b11bf15d..0248db07fd6a 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -795,6 +795,42 @@ static long ne_enclave_ioctl(struct file *file, unsigned int cmd, unsigned long
return 0;
}
 
+   case NE_GET_IMAGE_LOAD_INFO: {
+   struct ne_image_load_info image_load_info = {};
+
+   if (copy_from_user(&image_load_info, (void __user *)arg, sizeof(image_load_info)))
+   return -EFAULT;
+
+   mutex_lock(&ne_enclave->enclave_info_mutex);
+
+   if (ne_enclave->state != NE_STATE_INIT) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Enclave is not in init state\n");
+
+   mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+   return -NE_ERR_NOT_IN_INIT_STATE;
+   }
+
+   mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+   if (!image_load_info.flags ||
+   image_load_info.flags >= NE_IMAGE_LOAD_MAX_FLAG_VAL) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Incorrect flag in enclave image load info\n");
+
+   return -NE_ERR_INVALID_FLAG_VALUE;
+   }
+
+   if (image_load_info.flags == NE_EIF_IMAGE)
+   image_load_info.memory_offset = NE_EIF_LOAD_OFFSET;
+
+   if (copy_to_user((void __user *)arg, &image_load_info, sizeof(image_load_info)))
+   return -EFAULT;
+
+   return 0;
+   }
+
default:
return -ENOTTY;
}
-- 
2.20.1 (Apple Git-117)







[PATCH v8 15/18] nitro_enclaves: Add Makefile for the Nitro Enclaves driver

2020-09-04 Thread Andra Paraschiv
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v7 -> v8

* No changes.

v6 -> v7

* No changes.

v5 -> v6

* No changes.

v4 -> v5

* No changes.

v3 -> v4

* No changes.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Update path to Makefile to match the drivers/virt/nitro_enclaves
  directory.
---
 drivers/virt/Makefile|  2 ++
 drivers/virt/nitro_enclaves/Makefile | 11 +++
 2 files changed, 13 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/Makefile

diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
index fd331247c27a..f28425ce4b39 100644
--- a/drivers/virt/Makefile
+++ b/drivers/virt/Makefile
@@ -5,3 +5,5 @@
 
 obj-$(CONFIG_FSL_HV_MANAGER)   += fsl_hypervisor.o
 obj-y  += vboxguest/
+
+obj-$(CONFIG_NITRO_ENCLAVES)   += nitro_enclaves/
diff --git a/drivers/virt/nitro_enclaves/Makefile b/drivers/virt/nitro_enclaves/Makefile
new file mode 100644
index ..e9f4fcd1591e
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/Makefile
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+
+# Enclave lifetime management support for Nitro Enclaves (NE).
+
+obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves.o
+
+nitro_enclaves-y := ne_pci_dev.o ne_misc_dev.o
+
+ccflags-y += -Wall
-- 
2.20.1 (Apple Git-117)







[PATCH v8 18/18] MAINTAINERS: Add entry for the Nitro Enclaves driver

2020-09-04 Thread Andra Paraschiv
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v7 -> v8

* No changes.

v6 -> v7

* No changes.

v5 -> v6

* No changes.

v4 -> v5

* No changes.

v3 -> v4

* No changes.

v2 -> v3

* Update file entries to be in alphabetical order.

v1 -> v2

* No changes.
---
 MAINTAINERS | 13 +
 1 file changed, 13 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e4647c84c987..73a9b4e9b04b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12269,6 +12269,19 @@ S: Maintained
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2.git
 F: arch/nios2/
 
+NITRO ENCLAVES (NE)
+M: Andra Paraschiv 
+M: Alexandru Vasile 
+M: Alexandru Ciobotaru 
+L: linux-kernel@vger.kernel.org
+S: Supported
+W: https://aws.amazon.com/ec2/nitro/nitro-enclaves/
+F: Documentation/nitro_enclaves/
+F: drivers/virt/nitro_enclaves/
+F: include/linux/nitro_enclaves.h
+F: include/uapi/linux/nitro_enclaves.h
+F: samples/nitro_enclaves/
+
 NOHZ, DYNTICKS SUPPORT
 M: Frederic Weisbecker 
 M: Thomas Gleixner 
-- 
2.20.1 (Apple Git-117)







[PATCH v8 13/18] nitro_enclaves: Add logic for terminating an enclave

2020-09-04 Thread Andra Paraschiv
An enclave is associated with an fd that is returned after the enclave
creation logic is completed. This enclave fd is further used to setup
enclave resources. Once the enclave needs to be terminated, the enclave
fd is closed.

Add logic for enclave termination, which is mapped to the enclave fd
release callback. Free the internal enclave info used for bookkeeping.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v7 -> v8

* No changes.

v6 -> v7

* Remove the pci_dev_put() call as the NE misc device parent field is
  used now to get the NE PCI device.
* Update the naming and add more comments to make more clear the logic
  of handling full CPU cores and dedicating them to the enclave.

v5 -> v6

* Update documentation to kernel-doc format.
* Use directly put_page() instead of unpin_user_pages(), to match the
  get_user_pages() calls.

v4 -> v5

* Release the reference to the NE PCI device on enclave fd release.
* Adapt the logic to cpumask enclave vCPU ids and CPU cores.
* Remove sanity checks for situations that shouldn't happen, unless the
  system is buggy or the logic is broken.

v3 -> v4

* Use dev_err instead of custom NE log pattern.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().

v1 -> v2

* Add log pattern for NE.
* Remove the BUG_ON calls.
* Update goto labels to match their purpose.
* Add early exit in release() if there was a slot alloc error in the fd
  creation path.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 166 ++
 1 file changed, 166 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index 5ec7fdf9d08e..1fb194d3ab62 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -1309,6 +1309,171 @@ static long ne_enclave_ioctl(struct file *file, unsigned int cmd, unsigned long
return 0;
 }
 
+/**
+ * ne_enclave_remove_all_mem_region_entries() - Remove all memory region entries
+ * from the enclave data structure.
+ * @ne_enclave :   Private data associated with the current enclave.
+ *
+ * Context: Process context. This function is called with the ne_enclave mutex held.
+ */
+static void ne_enclave_remove_all_mem_region_entries(struct ne_enclave *ne_enclave)
+{
+   unsigned long i = 0;
+   struct ne_mem_region *ne_mem_region = NULL;
+   struct ne_mem_region *ne_mem_region_tmp = NULL;
+
+   list_for_each_entry_safe(ne_mem_region, ne_mem_region_tmp,
+&ne_enclave->mem_regions_list,
+mem_region_list_entry) {
+   list_del(&ne_mem_region->mem_region_list_entry);
+
+   for (i = 0; i < ne_mem_region->nr_pages; i++)
+   put_page(ne_mem_region->pages[i]);
+
+   kfree(ne_mem_region->pages);
+
+   kfree(ne_mem_region);
+   }
+}
+
+/**
+ * ne_enclave_remove_all_vcpu_id_entries() - Remove all vCPU id entries from
+ *  the enclave data structure.
+ * @ne_enclave :   Private data associated with the current enclave.
+ *
+ * Context: Process context. This function is called with the ne_enclave mutex held.
+ */
+static void ne_enclave_remove_all_vcpu_id_entries(struct ne_enclave *ne_enclave)
+{
+   unsigned int cpu = 0;
+   unsigned int i = 0;
+
+   mutex_lock(&ne_cpu_pool.mutex);
+
+   for (i = 0; i < ne_enclave->nr_parent_vm_cores; i++) {
+   for_each_cpu(cpu, ne_enclave->threads_per_core[i])
+   /* Update the available NE CPU pool. */
+   cpumask_set_cpu(cpu, ne_cpu_pool.avail_threads_per_core[i]);
+
+   free_cpumask_var(ne_enclave->threads_per_core[i]);
+   }
+
+   mutex_unlock(&ne_cpu_pool.mutex);
+
+   kfree(ne_enclave->threads_per_core);
+
+   free_cpumask_var(ne_enclave->vcpu_ids);
+}
+
+/**
+ * ne_pci_dev_remove_enclave_entry() - Remove the enclave entry from the data
+ *structure that is part of the NE PCI
+ *device private data.
+ * @ne_enclave :   Private data associated with the current enclave.
+ * @ne_pci_dev :   Private data associated with the PCI device.
+ *
+ * Context: Process context. This function is called with the ne_pci_dev enclave
+ * mutex held.
+ */
+static void ne_pci_dev_remove_enclave_entry(struct ne_enclave *ne_enclave,
+   struct ne_pci_dev *ne_pci_dev)
+{
+   struct ne_enclave *ne_enclave_entry = NULL;
+   struct ne_enclave *ne_enclave_entry_tmp = NULL;
+
+   list_for_each_entry_safe(ne_enclave_entry, ne_enclave_entry_tmp,
+&ne_pci_dev->enclaves_list,
enclave_

[PATCH v8 17/18] nitro_enclaves: Add overview documentation

2020-09-04 Thread Andra Paraschiv
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v7 -> v8

* Add info about the primary / parent VM CID value.
* Update reference link for huge pages.
* Add reference link for the x86 boot protocol.
* Add license mention and update doc title / chapter formatting.

v6 -> v7

* No changes.

v5 -> v6

* No changes.

v4 -> v5

* No changes.

v3 -> v4

* Update doc type from .txt to .rst.
* Update documentation based on the changes from v4.

v2 -> v3

* No changes.

v1 -> v2

* New in v2.
---
 Documentation/nitro_enclaves/ne_overview.rst | 95 
 1 file changed, 95 insertions(+)
 create mode 100644 Documentation/nitro_enclaves/ne_overview.rst

diff --git a/Documentation/nitro_enclaves/ne_overview.rst b/Documentation/nitro_enclaves/ne_overview.rst
new file mode 100644
index ..39b0c8fe2654
--- /dev/null
+++ b/Documentation/nitro_enclaves/ne_overview.rst
@@ -0,0 +1,95 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==
+Nitro Enclaves
+==
+
+Overview
+
+
+Nitro Enclaves (NE) is a new Amazon Elastic Compute Cloud (EC2) capability
+that allows customers to carve out isolated compute environments within EC2
+instances [1].
+
+For example, an application that processes sensitive data and runs in a VM
+can be separated from other applications running in the same VM. This
+application then runs in a separate VM from the primary VM, namely an enclave.
+
+An enclave runs alongside the VM that spawned it. This setup matches the needs
+of low-latency applications. The resources that are allocated for the enclave,
+such as memory and CPUs, are carved out of the primary VM. Each enclave is
+mapped to a process running in the primary VM that communicates with the NE
+driver via an ioctl interface.
+
+In this sense, there are two components:
+
+1. An enclave abstraction process - a user space process running in the primary
+VM guest that uses the provided ioctl interface of the NE driver to spawn an
+enclave VM (that's 2 below).
+
+There is an NE emulated PCI device exposed to the primary VM. The driver for this
+new PCI device is included in the NE driver.
+
+The ioctl logic is mapped to PCI device commands, e.g. the NE_START_ENCLAVE ioctl
+maps to an enclave start PCI command. The PCI device commands are then
+translated into actions taken on the hypervisor side; that's the Nitro
+hypervisor running on the host where the primary VM is running. The Nitro
+hypervisor is based on core KVM technology.
+
+2. The enclave itself - a VM running on the same host as the primary VM that
+spawned it. Memory and CPUs are carved out of the primary VM and are dedicated
+for the enclave VM. An enclave does not have persistent storage attached.
+
+The memory regions carved out of the primary VM and given to an enclave need to
+be 2 MiB / 1 GiB aligned, physically contiguous memory regions (or a multiple of
+this size, e.g. 8 MiB). The memory can be allocated e.g. by using hugetlbfs from
+user space [2][3]. The memory size for an enclave needs to be at least 64 MiB.
+The enclave memory and CPUs need to be from the same NUMA node.
+
+An enclave runs on dedicated cores. CPU 0 and its CPU siblings need to remain
+available for the primary VM. A CPU pool has to be set for NE purposes by a
+user with admin capability. See the cpu list section from the kernel
+documentation [4] for the CPU pool format.
+
+An enclave communicates with the primary VM via a local communication channel,
+using virtio-vsock [5]. The primary VM has a virtio-pci vsock emulated device,
+while the enclave VM has a virtio-mmio vsock emulated device. The vsock device
+uses eventfd for signaling. The enclave VM sees the usual interfaces - local
+APIC and IOAPIC - to get interrupts from the virtio-vsock device. The virtio-mmio
+device is placed in memory below the typical 4 GiB.
+
+The application that runs in the enclave needs to be packaged in an enclave
+image together with the OS (e.g. kernel, ramdisk, init) that will run in the
+enclave VM. The enclave VM has its own kernel and follows the standard Linux
+boot protocol [6].
+
+The kernel bzImage, the kernel command line, and the ramdisk(s) are part of the
+Enclave Image Format (EIF), plus an EIF header including metadata such as magic
+number, EIF version, image size and CRC.
+
+Hash values are computed for the entire enclave image (EIF), the kernel and
+ramdisk(s). That's used, for example, to check that the enclave image that is
+loaded in the enclave VM is the one that was intended to be run.
+
+These crypto measurements are included in a signed attestation document
+generated by the Nitro Hypervisor and further used to prove the identity of the
+enclave; KMS is an example of a service that NE is integrated with and that
+checks the attestation doc.
+
+The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The
+init process in the enclave connects to the vsock CID of the primary VM and a
+predefined 

[PATCH v8 16/18] nitro_enclaves: Add sample for ioctl interface usage

2020-09-04 Thread Andra Paraschiv
Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v7 -> v8

* Track NE custom error codes for invalid page size, invalid flags and
  enclave CID.
* Update the heartbeat logic to have a listener fd first, then start the
  enclave and then accept connection to get the heartbeat.
* Update the reference link to the hugetlb documentation.

v6 -> v7

* Track POLLNVAL as poll event in addition to POLLHUP.

v5 -> v6

* Remove "rc" mentioning when printing errno string.
* Remove the ioctl to query API version.
* Include usage info for NUMA-aware hugetlb configuration.
* Update documentation to kernel-doc format.
* Add logic for enclave image loading.

v4 -> v5

* Print enclave vCPU ids when they are created.
* Update logic to map the modified vCPU ioctl call.
* Add check for the path to the enclave image to be less than PATH_MAX.
* Update the ioctl calls error checking logic to match the NE specific
  error codes.

v3 -> v4

* Update usage details to match the updates in v4.
* Update NE ioctl interface usage.

v2 -> v3

* Remove the include directory to use the uapi from the kernel.
* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* New in v2.
---
 samples/nitro_enclaves/.gitignore|   2 +
 samples/nitro_enclaves/Makefile  |  16 +
 samples/nitro_enclaves/ne_ioctl_sample.c | 883 +++
 3 files changed, 901 insertions(+)
 create mode 100644 samples/nitro_enclaves/.gitignore
 create mode 100644 samples/nitro_enclaves/Makefile
 create mode 100644 samples/nitro_enclaves/ne_ioctl_sample.c

diff --git a/samples/nitro_enclaves/.gitignore b/samples/nitro_enclaves/.gitignore
new file mode 100644
index ..827934129c90
--- /dev/null
+++ b/samples/nitro_enclaves/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0
+ne_ioctl_sample
diff --git a/samples/nitro_enclaves/Makefile b/samples/nitro_enclaves/Makefile
new file mode 100644
index ..a3ec78fefb52
--- /dev/null
+++ b/samples/nitro_enclaves/Makefile
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+
+# Enclave lifetime management support for Nitro Enclaves (NE) - ioctl sample
+# usage.
+
+.PHONY: all clean
+
+CFLAGS += -Wall
+
+all:
+   $(CC) $(CFLAGS) -o ne_ioctl_sample ne_ioctl_sample.c -lpthread
+
+clean:
+   rm -f ne_ioctl_sample
diff --git a/samples/nitro_enclaves/ne_ioctl_sample.c b/samples/nitro_enclaves/ne_ioctl_sample.c
new file mode 100644
index ..480b763142b3
--- /dev/null
+++ b/samples/nitro_enclaves/ne_ioctl_sample.c
@@ -0,0 +1,883 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+/**
+ * DOC: Sample flow of using the ioctl interface provided by the Nitro Enclaves (NE)
+ * kernel driver.
+ *
+ * Usage
+ * -
+ *
+ * Load the nitro_enclaves module, setting also the enclave CPU pool. The
+ * enclave CPUs need to be full cores from the same NUMA node. CPU 0 and its
+ * siblings have to remain available for the primary / parent VM, so they
+ * cannot be included in the enclave CPU pool.
+ *
+ * See the cpu list section from the kernel documentation.
+ * https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html#cpu-lists
+ *
+ * insmod drivers/virt/nitro_enclaves/nitro_enclaves.ko
+ * lsmod
+ *
+ * The CPU pool can be set at runtime, after the kernel module is loaded.
+ *
+ * echo  > /sys/module/nitro_enclaves/parameters/ne_cpus
+ *
+ * NUMA and CPU siblings information can be found using:
+ *
+ * lscpu
+ * /proc/cpuinfo
+ *
+ * Check the online / offline CPU list. The CPUs from the pool should be
+ * offlined.
+ *
+ * lscpu
+ *
+ * Check dmesg for any warnings / errors through the NE driver lifetime / usage.
+ * The NE logs contain the "nitro_enclaves" or "pci :00:02.0" pattern.
+ *
+ * dmesg
+ *
+ * Setup hugetlbfs huge pages. The memory needs to be from the same NUMA node as
+ * the enclave CPUs.
+ *
+ * https://www.kernel.org/doc/html/latest/admin-guide/mm/hugetlbpage.html
+ *
+ * By default, the allocation of hugetlb pages is distributed on all possible
+ * NUMA nodes. Use the following configuration files to set the number of huge
+ * pages from a NUMA node:
+ *
+ * /sys/devices/system/node/node/hugepages/hugepages-2048kB/nr_hugepages
+ * /sys/devices/system/node/node/hugepages/hugepages-1048576kB/nr_hugepages
+ *
+ * or, if not on a system with multiple NUMA nodes, can also set the number
+ * of 2 MiB / 1 GiB huge pages using
+ *
+ * /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
+ * /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
+ *
+ * In this example 256 hugepages of 2 MiB are used.
+ *
+ * Build and run the NE sample.
+ *
+ * make -

[PATCH v8 12/18] nitro_enclaves: Add logic for starting an enclave

2020-09-04 Thread Andra Paraschiv
After all the enclave resources are set, the enclave is ready to begin
running.

Add ioctl command logic for starting an enclave after all its resources,
memory regions and CPUs, have been set.

The enclave start information includes the local channel addressing -
vsock CID - and the flags associated with the enclave.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v7 -> v8

* Add check for invalid enclave CID value e.g. well-known CIDs and
  parent VM CID.
* Add custom error code for incorrect flag in enclave start info and
  invalid enclave CID.

v6 -> v7

* Update the naming and add more comments to make more clear the logic
  of handling full CPU cores and dedicating them to the enclave.

v5 -> v6

* Check for invalid enclave start flags.
* Update documentation to kernel-doc format.

v4 -> v5

* Add early exit on enclave start ioctl function call error.
* Move sanity checks in the enclave start ioctl function, outside of the
  switch-case block.
* Remove log on copy_from_user() / copy_to_user() failure.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Update the naming for the ioctl command from metadata to info.
* Check for minimum enclave memory size.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.

v1 -> v2

* Add log pattern for NE.
* Check if enclave state is init when starting an enclave.
* Remove the BUG_ON calls.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 155 ++
 1 file changed, 155 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index 9912d78c0905..5ec7fdf9d08e 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -989,6 +989,77 @@ static int ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave,
return rc;
 }
 
+/**
+ * ne_start_enclave_ioctl() - Trigger enclave start after the enclave resources,
+ *   such as memory and CPU, have been set.
+ * @ne_enclave :   Private data associated with the current enclave.
+ * @enclave_start_info :   Enclave info that includes enclave cid and flags.
+ *
+ * Context: Process context. This function is called with the ne_enclave mutex held.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_start_enclave_ioctl(struct ne_enclave *ne_enclave,
+   struct ne_enclave_start_info *enclave_start_info)
+{
+   struct ne_pci_dev_cmd_reply cmd_reply = {};
+   unsigned int cpu = 0;
+   struct enclave_start_req enclave_start_req = {};
+   unsigned int i = 0;
+   int rc = -EINVAL;
+
+   if (!ne_enclave->nr_mem_regions) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Enclave has no mem regions\n");
+
+   return -NE_ERR_NO_MEM_REGIONS_ADDED;
+   }
+
+   if (ne_enclave->mem_size < NE_MIN_ENCLAVE_MEM_SIZE) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Enclave memory is less than %ld\n",
+   NE_MIN_ENCLAVE_MEM_SIZE);
+
+   return -NE_ERR_ENCLAVE_MEM_MIN_SIZE;
+   }
+
+   if (!ne_enclave->nr_vcpus) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Enclave has no vCPUs\n");
+
+   return -NE_ERR_NO_VCPUS_ADDED;
+   }
+
+   for (i = 0; i < ne_enclave->nr_parent_vm_cores; i++)
+   for_each_cpu(cpu, ne_enclave->threads_per_core[i])
+   if (!cpumask_test_cpu(cpu, ne_enclave->vcpu_ids)) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Full CPU cores not used\n");
+
+   return -NE_ERR_FULL_CORES_NOT_USED;
+   }
+
+   enclave_start_req.enclave_cid = enclave_start_info->enclave_cid;
+   enclave_start_req.flags = enclave_start_info->flags;
+   enclave_start_req.slot_uid = ne_enclave->slot_uid;
+
+   rc = ne_do_request(ne_enclave->pdev, ENCLAVE_START, &enclave_start_req,
+  sizeof(enclave_start_req), &cmd_reply, sizeof(cmd_reply));
+   if (rc < 0) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Error in enclave start [rc=%d]\n", rc);
+
+   return rc;
+   }
+
+   ne_enclave->state = NE_STATE_RUNNING;
+
+   enclave_start_info->enclave_cid = cmd_reply.enclave_cid;
+
+   return 0;
+}
+
 /**
  * ne_enclave_ioctl() - Ioctl function provided by the enclave file.
  * @file:  File associated with this ioctl function.
@@ -1147,6 +1218,90 @@ static long ne_enclave_ioctl(str

[PATCH v8 11/18] nitro_enclaves: Add logic for setting an enclave memory region

2020-09-04 Thread Andra Paraschiv
Another resource that is being set for an enclave is memory. User space
memory regions, which need to be backed by contiguous memory regions,
are associated with the enclave.

One solution for allocating / reserving contiguous memory regions, used
here for integration, is hugetlbfs. The user space process that is
associated with the enclave passes these memory regions to the driver.

The enclave memory regions need to be from the same NUMA node as the
enclave CPUs.

Add ioctl command logic for setting user space memory region for an
enclave.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v7 -> v8

* Add early check, while getting user pages, to be multiple of 2 MiB for
  the pages that back the user space memory region.
* Add custom error code for incorrect user space memory region flag.
* Include in a separate function the sanity checks for each page of the
  user space memory region.

v6 -> v7

* Update check for duplicate user space memory regions to cover
  additional possible scenarios.

v5 -> v6

* Check for max number of pages allocated for the internal data
  structure for pages.
* Check for invalid memory region flags.
* Check for aligned physical memory regions.
* Update documentation to kernel-doc format.
* Check for duplicate user space memory regions.
* Use directly put_page() instead of unpin_user_pages(), to match the
  get_user_pages() calls.

v4 -> v5

* Add early exit on set memory region ioctl function call error.
* Remove log on copy_from_user() failure.
* Exit without unpinning the pages on NE PCI dev request failure as
  memory regions from the user space range may have already been added.
* Add check for the memory region user space address to be 2 MiB
  aligned.
* Update logic to not have a hardcoded check for 2 MiB memory regions.

v3 -> v4

* Check enclave memory regions are from the same NUMA node as the
  enclave CPUs.
* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
* Check if enclave max memory regions is reached when setting an enclave
  memory region.
* Check if enclave state is init when setting an enclave memory region.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 316 ++
 1 file changed, 316 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index 0248db07fd6a..9912d78c0905 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -710,6 +710,285 @@ static int ne_add_vcpu_ioctl(struct ne_enclave *ne_enclave, u32 vcpu_id)
return 0;
 }
 
+/**
+ * ne_sanity_check_user_mem_region() - Sanity check the user space memory
+ *region received during the set user
+ *memory region ioctl call.
+ * @ne_enclave :   Private data associated with the current enclave.
+ * @mem_region :   User space memory region to be sanity checked.
+ *
+ * Context: Process context. This function is called with the ne_enclave mutex held.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_sanity_check_user_mem_region(struct ne_enclave *ne_enclave,
+   struct ne_user_memory_region mem_region)
+{
+   struct ne_mem_region *ne_mem_region = NULL;
+
+   if (ne_enclave->mm != current->mm)
+   return -EIO;
+
+   if (mem_region.memory_size & (NE_MIN_MEM_REGION_SIZE - 1)) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "User space memory size is not multiple of 2 MiB\n");
+
+   return -NE_ERR_INVALID_MEM_REGION_SIZE;
+   }
+
+   if (!IS_ALIGNED(mem_region.userspace_addr, NE_MIN_MEM_REGION_SIZE)) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "User space address is not 2 MiB aligned\n");
+
+   return -NE_ERR_UNALIGNED_MEM_REGION_ADDR;
+   }
+
+   if ((mem_region.userspace_addr & (NE_MIN_MEM_REGION_SIZE - 1)) ||
+   !access_ok((void __user *)(unsigned long)mem_region.userspace_addr,
+  mem_region.memory_size)) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Invalid user space address range\n");
+
+   return -NE_ERR_INVALID_MEM_REGION_ADDR;
+   }
+
+   list_for_each_entry(ne_mem_region, &ne_enclave->mem_regions_list,
+   mem_region_list_entry) {
+   u64 memory_size = ne_mem_region->memory_size;
+   u64 userspace_addr

[PATCH v8 08/18] nitro_enclaves: Add logic for creating an enclave VM

2020-09-04 Thread Andra Paraschiv
Add ioctl command logic for enclave VM creation. It triggers a slot
allocation. The enclave resources will be associated with this slot and
it will be used as an identifier for triggering enclave run.

Return a file descriptor, namely enclave fd. This is further used by the
associated user space enclave process to set enclave resources and
trigger enclave termination.

The poll function is implemented in order to notify the enclave process
when an enclave exits without a specific enclave termination command
trigger e.g. when an enclave crashes.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v7 -> v8

* No changes.

v6 -> v7

* Use the NE misc device parent field to get the NE PCI device.
* Update the naming and add more comments to clarify the logic of
  handling full CPU cores and dedicating them to the enclave.

v5 -> v6

* Update the code base to init the ioctl function in this patch.
* Update documentation to kernel-doc format.

v4 -> v5

* Release the reference to the NE PCI device on create VM error.
* Close enclave fd on copy_to_user() failure; rename fd to enclave fd
  while at it.
* Remove sanity checks for situations that shouldn't happen unless the
  system is buggy or the logic is broken.
* Remove log on copy_to_user() failure.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.
* Add metadata for the NUMA node for the enclave memory and CPUs.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().
* Remove file ops that do nothing for now - open.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 226 ++
 1 file changed, 226 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index 6bb05217b593..7ad3f1eb75d4 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -103,9 +103,235 @@ struct ne_cpu_pool {
 
 static struct ne_cpu_pool ne_cpu_pool;
 
+/**
+ * ne_enclave_poll() - Poll functionality used for enclave out-of-band events.
+ * @file:  File associated with this poll function.
+ * @wait:  Poll table data structure.
+ *
+ * Context: Process context.
+ * Return:
+ * * Poll mask.
+ */
+static __poll_t ne_enclave_poll(struct file *file, poll_table *wait)
+{
+   __poll_t mask = 0;
+   struct ne_enclave *ne_enclave = file->private_data;
+
+   poll_wait(file, &ne_enclave->eventq, wait);
+
+   if (!ne_enclave->has_event)
+   return mask;
+
+   mask = POLLHUP;
+
+   return mask;
+}
+
+static const struct file_operations ne_enclave_fops = {
+   .owner  = THIS_MODULE,
+   .llseek = noop_llseek,
+   .poll   = ne_enclave_poll,
+};
+
+/**
+ * ne_create_vm_ioctl() - Alloc slot to be associated with an enclave. Create
+ *   enclave file descriptor to be further used for enclave
+ *   resources handling e.g. memory regions and CPUs.
+ * @pdev:  PCI device used for enclave lifetime management.
+ * @ne_pci_dev :   Private data associated with the PCI device.
+ * @slot_uid:  Generated unique slot id associated with an enclave.
+ *
+ * Context: Process context. This function is called with the ne_pci_dev enclave
+ * mutex held.
+ * Return:
+ * * Enclave fd on success.
+ * * Negative return value on failure.
+ */
+static int ne_create_vm_ioctl(struct pci_dev *pdev, struct ne_pci_dev *ne_pci_dev,
+ u64 *slot_uid)
+{
+   struct ne_pci_dev_cmd_reply cmd_reply = {};
+   int enclave_fd = -1;
+   struct file *enclave_file = NULL;
+   unsigned int i = 0;
+   struct ne_enclave *ne_enclave = NULL;
+   int rc = -EINVAL;
+   struct slot_alloc_req slot_alloc_req = {};
+
+   mutex_lock(&ne_cpu_pool.mutex);
+
+   for (i = 0; i < ne_cpu_pool.nr_parent_vm_cores; i++)
+   if (!cpumask_empty(ne_cpu_pool.avail_threads_per_core[i]))
+   break;
+
+   if (i == ne_cpu_pool.nr_parent_vm_cores) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "No CPUs available in CPU pool\n");
+
+   mutex_unlock(&ne_cpu_pool.mutex);
+
+   return -NE_ERR_NO_CPUS_AVAIL_IN_POOL;
+   }
+
+   mutex_unlock(&ne_cpu_pool.mutex);
+
+   ne_enclave = kzalloc(sizeof(*ne_enclave), GFP_KERNEL);
+   if (!ne_enclave)
+   return -ENOMEM;
+
+   mutex_lock(&ne_cpu_pool.mutex);
+
+   ne_enclave->nr_parent_vm_cores = ne_cpu_pool.nr_parent_vm_cores;
+   ne_enclave->nr_threads_per_core = ne_cpu_pool.nr_threads_per_core;
+   ne_enclave->numa_

[PATCH v8 09/18] nitro_enclaves: Add logic for setting an enclave vCPU

2020-09-04 Thread Andra Paraschiv
An enclave, before being started, has its resources set. One of its
resources is CPU.

A NE CPU pool is set and enclave CPUs are chosen from it. Offline the
CPUs from the NE CPU pool during the pool setup and online them back
during the NE CPU pool teardown. The CPU offline is necessary so that
there would not be more vCPUs than physical CPUs available to the
primary / parent VM. In that case the CPUs would be overcommitted and
would change the initial configuration of the primary / parent VM of
having dedicated vCPUs to physical CPUs.

The enclave CPUs need to be full cores and from the same NUMA node. CPU
0 and its siblings have to remain available to the primary / parent VM.
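Since the `ne_cpus` module parameter is registered with mode 0644 (patch 07 in this series), the pool can be configured at runtime through its sysfs file. A hedged example, with illustrative CPU ids, assuming a parent VM where CPUs 1-3 and their siblings 9-11 form full cores on one NUMA node:

```
# Dedicate full cores 1-3 (threads 1-3 and their siblings 9-11) to the
# NE CPU pool; requires root and the nitro_enclaves module loaded.
# CPU 0 and its siblings must stay out of the pool.
echo "1-3,9-11" > /sys/module/nitro_enclaves/parameters/ne_cpus
```

The value uses the standard kernel cpu-list format; the setup path in this patch offlines the listed CPUs, and writing an empty string tears the pool down and onlines them again.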

Add ioctl command logic for setting an enclave vCPU.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v7 -> v8

* No changes.

v6 -> v7

* Check for error return value when setting the kernel parameter string.
* Use the NE misc device parent field to get the NE PCI device.
* Update the naming and add more comments to clarify the logic of
  handling full CPU cores and dedicating them to the enclave.
* Calculate the number of threads per core instead of using
  smp_num_siblings, which is x86-specific.

v5 -> v6

* Check CPUs are from the same NUMA node before going through CPU
  siblings during the NE CPU pool setup.
* Update documentation to kernel-doc format.

v4 -> v5

* Set empty string in case of invalid NE CPU pool.
* Clear NE CPU pool mask on pool setup failure.
* Setup NE CPU cores out of the NE CPU pool.
* Early exit on NE CPU pool setup if enclave(s) already running.
* Remove sanity checks for situations that shouldn't happen unless the
  system is buggy or the logic is broken.
* Add check for maximum vCPU id possible before looking into the CPU
  pool.
* Remove log on copy_from_user() / copy_to_user() failure and on admin
  capability check for setting the NE CPU pool.
* Update the ioctl call to not create a file descriptor for the vCPU.
* Split the CPU pool usage logic in 2 separate functions - one to get a
  CPU from the pool and the other to check the given CPU is available in
  the pool.

v3 -> v4

* Setup the NE CPU pool at runtime via a sysfs file for the kernel
  parameter.
* Check enclave CPUs to be from the same NUMA node.
* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().
* Remove file ops that do nothing for now - open, ioctl and release.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
* Check if enclave state is init when setting enclave vCPU.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 702 ++
 1 file changed, 702 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index 7ad3f1eb75d4..0477b11bf15d 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -64,8 +64,11 @@
  * TODO: Update logic to create new sysfs entries instead of using
  * a kernel parameter e.g. if multiple sysfs files needed.
  */
+static int ne_set_kernel_param(const char *val, const struct kernel_param *kp);
+
 static const struct kernel_param_ops ne_cpu_pool_ops = {
.get= param_get_string,
+   .set= ne_set_kernel_param,
 };
 
 static char ne_cpus[NE_CPUS_SIZE];
@@ -103,6 +106,702 @@ struct ne_cpu_pool {
 
 static struct ne_cpu_pool ne_cpu_pool;
 
+/**
+ * ne_check_enclaves_created() - Verify if at least one enclave has been created.
+ * @void:  No parameters provided.
+ *
+ * Context: Process context.
+ * Return:
+ * * True if at least one enclave is created.
+ * * False otherwise.
+ */
+static bool ne_check_enclaves_created(void)
+{
+   struct ne_pci_dev *ne_pci_dev = NULL;
+   struct pci_dev *pdev = NULL;
+   bool ret = false;
+
+   if (!ne_misc_dev.parent)
+   return ret;
+
+   pdev = to_pci_dev(ne_misc_dev.parent);
+   if (!pdev)
+   return ret;
+
+   ne_pci_dev = pci_get_drvdata(pdev);
+   if (!ne_pci_dev)
+   return ret;
+
+   mutex_lock(&ne_pci_dev->enclaves_list_mutex);
+
+   if (!list_empty(&ne_pci_dev->enclaves_list))
+   ret = true;
+
+   mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
+
+   return ret;
+}
+
+/**
+ * ne_setup_cpu_pool() - Set the NE CPU pool after handling sanity checks such
+ *  as not sharing CPU cores with the primary / parent VM
+ *  or not using CPU 0, which should remain available for
+ *  the primary / parent VM. Offline the CPUs from the
+ *  pool after the checks passed.
+ * @ne_cpu_list:   The CPU list used for setting NE CPU pool.
+ *
+ * Context: Proces

[PATCH v8 07/18] nitro_enclaves: Init misc device providing the ioctl interface

2020-09-04 Thread Andra Paraschiv
The Nitro Enclaves driver provides an ioctl interface to the user space
for enclave lifetime management e.g. enclave creation / termination and
setting enclave resources such as memory and CPU.

This ioctl interface is mapped to a Nitro Enclaves misc device.

Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v7 -> v8

* Add define for the CID of the primary / parent VM.
* Update the NE PCI driver shutdown logic to include misc device
  deregister.

v6 -> v7

* Set the NE PCI device the parent of the NE misc device to be able to
  use it in the ioctl logic.
* Update the naming and add more comments to clarify the logic of
  handling full CPU cores and dedicating them to the enclave.

v5 -> v6

* Remove the ioctl to query API version.
* Update documentation to kernel-doc format.

v4 -> v5

* Update the size of the NE CPU pool string from 4096 to 512 chars.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Remove the NE CPU pool init during kernel module loading, as the CPU
  pool is now setup at runtime, via a sysfs file for the kernel
  parameter.
* Add minimum enclave memory size definition.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.
* Remove the WARN_ON calls.
* Remove linux/bug and linux/kvm_host includes that are not needed.
* Remove "ratelimited" from the logs that are not in the ioctl call
  paths.
* Remove file ops that do nothing for now - open and release.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Update ne_cpu_pool data structure to include the global mutex.
* Update NE misc device mode to 0660.
* Check if the CPU siblings are included in the NE CPU pool, as full CPU
  cores are given for the enclave(s).
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 135 ++
 drivers/virt/nitro_enclaves/ne_pci_dev.c  |  21 
 2 files changed, 156 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.c

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
new file mode 100644
index ..6bb05217b593
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -0,0 +1,135 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+/**
+ * DOC: Enclave lifetime management driver for Nitro Enclaves (NE).
+ * Nitro is a hypervisor that has been developed by Amazon.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ne_misc_dev.h"
+#include "ne_pci_dev.h"
+
+/**
+ * NE_CPUS_SIZE - Size for max 128 CPUs, for now, in a cpu-list string, comma
+ *   separated. The NE CPU pool includes CPUs from a single NUMA
+ *   node.
+ */
+#define NE_CPUS_SIZE   (512)
+
+/**
+ * NE_EIF_LOAD_OFFSET - The offset where to copy the Enclave Image Format (EIF)
+ * image in enclave memory.
+ */
+#define NE_EIF_LOAD_OFFSET (8 * 1024UL * 1024UL)
+
+/**
+ * NE_MIN_ENCLAVE_MEM_SIZE - The minimum memory size an enclave can be launched
+ *  with.
+ */
+#define NE_MIN_ENCLAVE_MEM_SIZE(64 * 1024UL * 1024UL)
+
+/**
+ * NE_MIN_MEM_REGION_SIZE - The minimum size of an enclave memory region.
+ */
+#define NE_MIN_MEM_REGION_SIZE (2 * 1024UL * 1024UL)
+
+/**
+ * NE_PARENT_VM_CID - The CID for the vsock device of the primary / parent VM.
+ */
+#define NE_PARENT_VM_CID   (3)
+
+/*
+ * TODO: Update logic to create new sysfs entries instead of using
+ * a kernel parameter e.g. if multiple sysfs files needed.
+ */
+static const struct kernel_param_ops ne_cpu_pool_ops = {
+   .get= param_get_string,
+};
+
+static char ne_cpus[NE_CPUS_SIZE];
+static struct kparam_string ne_cpus_arg = {
+   .maxlen = sizeof(ne_cpus),
+   .string = ne_cpus,
+};
+
+module_param_cb(ne_cpus, &ne_cpu_pool_ops, &ne_cpus_arg, 0644);
+/* https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html#cpu-lists */
+MODULE_PARM_DESC(ne_cpus, "<cpu-list> - CPU pool used for Nitro Enclaves");
+
+/**
+ * struct ne_cpu_pool - CPU pool used for Nitro Enclaves.
+ * @avail_threads_per_core:Available full CPU cores to be dedicated to
+ * enclave(s). The cpumasks from the array, indexed
+ * by core id, contain all the threads from the
+ * available cores, that are not set for created
+ * enclave(s). The full CPU cores are part of the
+ * NE CPU pool.
+ * @mutex: Mutex for the access to the NE CPU pool.
+ * @nr_parent_vm_cores :   The size of the available threads per core array.
+ *   

[PATCH v8 06/18] nitro_enclaves: Handle out-of-band PCI device events

2020-09-04 Thread Andra Paraschiv
In addition to the replies sent by the Nitro Enclaves PCI device in
response to command requests, out-of-band enclave events can happen e.g.
an enclave crashes. In this case, the Nitro Enclaves driver needs to be
aware of the event and notify the corresponding user space process that
abstracts the enclave.

Register an MSI-X interrupt vector to be used for this kind of
out-of-band events. The interrupt notifies that the state of an enclave
changed and the driver logic scans the state of each running enclave to
identify for which this notification is intended.

Create a workqueue to handle the out-of-band events. Notify the user space
enclave process, which uses a polling mechanism on the enclave fd.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v7 -> v8

* No changes.

v6 -> v7

* No changes.

v5 -> v6

* Update documentation to kernel-doc format.

v4 -> v5

* Remove sanity checks for situations that shouldn't happen unless the
  system is buggy or the logic is broken.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Return IRQ_NONE when interrupts are not handled.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call
  paths.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
---
 drivers/virt/nitro_enclaves/ne_pci_dev.c | 116 +++
 1 file changed, 116 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c b/drivers/virt/nitro_enclaves/ne_pci_dev.c
index e9e3ff882cc7..dcf529ba509d 100644
--- a/drivers/virt/nitro_enclaves/ne_pci_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -199,6 +199,88 @@ static irqreturn_t ne_reply_handler(int irq, void *args)
return IRQ_HANDLED;
 }
 
+/**
+ * ne_event_work_handler() - Work queue handler for notifying enclaves on a
+ *  state change received by the event interrupt
+ *  handler.
+ * @work:  Item containing the NE PCI device for which an out-of-band event
+ * was issued.
+ *
+ * An out-of-band event is being issued by the Nitro Hypervisor when at least
+ * one enclave is changing state without client interaction.
+ *
+ * Context: Work queue context.
+ */
+static void ne_event_work_handler(struct work_struct *work)
+{
+   struct ne_pci_dev_cmd_reply cmd_reply = {};
+   struct ne_enclave *ne_enclave = NULL;
+   struct ne_pci_dev *ne_pci_dev =
+   container_of(work, struct ne_pci_dev, notify_work);
+   int rc = -EINVAL;
+   struct slot_info_req slot_info_req = {};
+
+   mutex_lock(&ne_pci_dev->enclaves_list_mutex);
+
+   /*
+* Iterate over all enclaves registered for the Nitro Enclaves
+* PCI device and determine which enclave(s) the out-of-band
+* event corresponds to.
+*/
+   list_for_each_entry(ne_enclave, &ne_pci_dev->enclaves_list, enclave_list_entry) {
+   mutex_lock(&ne_enclave->enclave_info_mutex);
+
+   /*
+* Enclaves that were never started cannot receive out-of-band
+* events.
+*/
+   if (ne_enclave->state != NE_STATE_RUNNING)
+   goto unlock;
+
+   slot_info_req.slot_uid = ne_enclave->slot_uid;
+
+   rc = ne_do_request(ne_enclave->pdev, SLOT_INFO, &slot_info_req,
+  sizeof(slot_info_req), &cmd_reply, sizeof(cmd_reply));
+   if (rc < 0)
+   dev_err(&ne_enclave->pdev->dev, "Error in slot info [rc=%d]\n", rc);
+
+   /* Notify enclave process that the enclave state changed. */
+   if (ne_enclave->state != cmd_reply.state) {
+   ne_enclave->state = cmd_reply.state;
+
+   ne_enclave->has_event = true;
+
+   wake_up_interruptible(&ne_enclave->eventq);
+   }
+
+unlock:
+   mutex_unlock(&ne_enclave->enclave_info_mutex);
+   }
+
+   mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
+}
+
+/**
+ * ne_event_handler() - Interrupt handler for PCI device out-of-band events.
+ * This interrupt does not supply any data in the MMIO
+ * region. It notifies a change in the state of any of
+ * the launched enclaves.
+ * @irq:   Received interrupt for an out-of-band event.
+ * @args:  PCI device private data structure.
+ *
+ * Context: Interrupt context.
+ * Return:
+ * * IRQ_HANDLED on handled interrupt.
+ */
+static irqreturn_t ne_event_handler(int irq, void *args)
+{
+   struct ne_pci_dev *ne_pci_dev = (struct ne_pci_dev *)args;
+
+   queue_work(ne_pci_dev->event_wq, &ne_pci_dev->notify_work);
+
+   return IRQ_HANDLED;
+}
+
 /**
  * ne_setup_ms

[PATCH v8 05/18] nitro_enclaves: Handle PCI device command requests

2020-09-04 Thread Andra Paraschiv
The Nitro Enclaves PCI device exposes an MMIO space that this driver
uses to submit command requests and to receive command replies e.g. for
enclave creation / termination or setting enclave resources.

Add logic for handling PCI device command requests based on the given
command type.

Register an MSI-X interrupt vector for command reply notifications to
handle this type of communication event.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v7 -> v8

* Update the function signatures for the submit request and retrieve reply
  functions, as they only returned 0, no error code.
* Include command type value in the error logs of ne_do_request().

v6 -> v7

* No changes.

v5 -> v6

* Update documentation to kernel-doc format.

v4 -> v5

* Remove sanity checks for situations that shouldn't happen unless the
  system is buggy or the logic is broken.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Return IRQ_NONE when interrupts are not handled.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call
  paths.

v1 -> v2

* Add log pattern for NE.
* Remove the BUG_ON calls.
* Update goto labels to match their purpose.
* Add fix for kbuild report:
  https://lore.kernel.org/lkml/202004231644.xtmn4z1z%25...@intel.com/
---
 drivers/virt/nitro_enclaves/ne_pci_dev.c | 189 +++
 1 file changed, 189 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c b/drivers/virt/nitro_enclaves/ne_pci_dev.c
index daf8b36383f1..e9e3ff882cc7 100644
--- a/drivers/virt/nitro_enclaves/ne_pci_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -33,6 +33,172 @@ static const struct pci_device_id ne_pci_ids[] = {
 
 MODULE_DEVICE_TABLE(pci, ne_pci_ids);
 
+/**
+ * ne_submit_request() - Submit command request to the PCI device based on the
+ *  command type.
+ * @pdev:  PCI device to send the command to.
+ * @cmd_type:  Command type of the request sent to the PCI device.
+ * @cmd_request:   Command request payload.
+ * @cmd_request_size:  Size of the command request payload.
+ *
+ * Context: Process context. This function is called with the ne_pci_dev mutex held.
+ */
+static void ne_submit_request(struct pci_dev *pdev, enum ne_pci_dev_cmd_type cmd_type,
+ void *cmd_request, size_t cmd_request_size)
+{
+   struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+   memcpy_toio(ne_pci_dev->iomem_base + NE_SEND_DATA, cmd_request, cmd_request_size);
+
+   iowrite32(cmd_type, ne_pci_dev->iomem_base + NE_COMMAND);
+}
+
+/**
+ * ne_retrieve_reply() - Retrieve reply from the PCI device.
+ * @pdev:  PCI device to receive the reply from.
+ * @cmd_reply: Command reply payload.
+ * @cmd_reply_size:Size of the command reply payload.
+ *
+ * Context: Process context. This function is called with the ne_pci_dev mutex held.
+ */
+static void ne_retrieve_reply(struct pci_dev *pdev, struct ne_pci_dev_cmd_reply *cmd_reply,
+ size_t cmd_reply_size)
+{
+   struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+   memcpy_fromio(cmd_reply, ne_pci_dev->iomem_base + NE_RECV_DATA, cmd_reply_size);
+}
+
+/**
+ * ne_wait_for_reply() - Wait for a reply of a PCI device command.
+ * @pdev:  PCI device for which a reply is waited.
+ *
+ * Context: Process context. This function is called with the ne_pci_dev mutex held.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_wait_for_reply(struct pci_dev *pdev)
+{
+   struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+   int rc = -EINVAL;
+
+   /*
+* TODO: Update to _interruptible and handle interrupted wait event
+* e.g. -ERESTARTSYS, incoming signals + update timeout, if needed.
+*/
+   rc = wait_event_timeout(ne_pci_dev->cmd_reply_wait_q,
+   atomic_read(&ne_pci_dev->cmd_reply_avail) != 0,
+   msecs_to_jiffies(NE_DEFAULT_TIMEOUT_MSECS));
+   if (!rc)
+   return -ETIMEDOUT;
+
+   return 0;
+}
+
+int ne_do_request(struct pci_dev *pdev, enum ne_pci_dev_cmd_type cmd_type,
+ void *cmd_request, size_t cmd_request_size,
+ struct ne_pci_dev_cmd_reply *cmd_reply, size_t cmd_reply_size)
+{
+   struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+   int rc = -EINVAL;
+
+   if (cmd_type <= INVALID_CMD || cmd_type >= MAX_CMD) {
+   dev_err_ratelimited(&pdev->dev, "Invalid cmd type=%u\n", cmd_type);
+
+   return -EINVAL;
+   }
+
+   if (!cmd_request) {
+   dev_err_ratelimited(&pdev->dev, "Null cmd request for cmd type=%u\n",
+   cmd_typ

[PATCH v8 04/18] nitro_enclaves: Init PCI device driver

2020-09-04 Thread Andra Paraschiv
The Nitro Enclaves PCI device is used by the kernel driver as a means of
communication with the hypervisor on the host where the primary VM and
the enclaves run. It handles requests with regard to enclave lifetime.

Setup the PCI device driver and add support for MSI-X interrupts.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Alexandru Ciobotaru 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v7 -> v8

* Add NE PCI driver shutdown logic.

v6 -> v7

* No changes.

v5 -> v6

* Update documentation to kernel-doc format.

v4 -> v5

* Remove sanity checks for situations that shouldn't happen unless the
  system is buggy or the logic is broken.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Update NE PCI driver name to "nitro_enclaves".

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.
* Remove the WARN_ON calls.
* Remove linux/bug include that is not needed.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call
  paths.
* Update kzfree() calls to kfree().

v1 -> v2

* Add log pattern for NE.
* Update PCI device setup functions to receive PCI device data structure and
  then get private data from it inside the functions logic.
* Remove the BUG_ON calls.
* Add teardown function for MSI-X setup.
* Update goto labels to match their purpose.
* Implement TODO for NE PCI device disable state check.
* Update function name for NE PCI device probe / remove.
---
 drivers/virt/nitro_enclaves/ne_pci_dev.c | 298 +++
 1 file changed, 298 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/ne_pci_dev.c

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c b/drivers/virt/nitro_enclaves/ne_pci_dev.c
new file mode 100644
index ..daf8b36383f1
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -0,0 +1,298 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+/**
+ * DOC: Nitro Enclaves (NE) PCI device driver.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ne_misc_dev.h"
+#include "ne_pci_dev.h"
+
+/**
+ * NE_DEFAULT_TIMEOUT_MSECS - Default timeout to wait for a reply from
+ *   the NE PCI device.
+ */
+#define NE_DEFAULT_TIMEOUT_MSECS   (120000) /* 120 sec */
+
+static const struct pci_device_id ne_pci_ids[] = {
+   { PCI_DEVICE(PCI_VENDOR_ID_AMAZON, PCI_DEVICE_ID_NE) },
+   { 0, }
+};
+
+MODULE_DEVICE_TABLE(pci, ne_pci_ids);
+
+/**
+ * ne_setup_msix() - Setup MSI-X vectors for the PCI device.
+ * @pdev:  PCI device to setup the MSI-X for.
+ *
+ * Context: Process context.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_setup_msix(struct pci_dev *pdev)
+{
+   int nr_vecs = 0;
+   int rc = -EINVAL;
+
+   nr_vecs = pci_msix_vec_count(pdev);
+   if (nr_vecs < 0) {
+   rc = nr_vecs;
+
+   dev_err(&pdev->dev, "Error in getting vec count [rc=%d]\n", rc);
+
+   return rc;
+   }
+
+   rc = pci_alloc_irq_vectors(pdev, nr_vecs, nr_vecs, PCI_IRQ_MSIX);
+   if (rc < 0) {
+   dev_err(&pdev->dev, "Error in alloc MSI-X vecs [rc=%d]\n", rc);
+
+   return rc;
+   }
+
+   return 0;
+}
+
+/**
+ * ne_teardown_msix() - Teardown MSI-X vectors for the PCI device.
+ * @pdev:  PCI device to teardown the MSI-X for.
+ *
+ * Context: Process context.
+ */
+static void ne_teardown_msix(struct pci_dev *pdev)
+{
+   pci_free_irq_vectors(pdev);
+}
+
+/**
+ * ne_pci_dev_enable() - Select the PCI device version and enable it.
+ * @pdev:  PCI device to select version for and then enable.
+ *
+ * Context: Process context.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_pci_dev_enable(struct pci_dev *pdev)
+{
+   u8 dev_enable_reply = 0;
+   u16 dev_version_reply = 0;
+   struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+   iowrite16(NE_VERSION_MAX, ne_pci_dev->iomem_base + NE_VERSION);
+
+   dev_version_reply = ioread16(ne_pci_dev->iomem_base + NE_VERSION);
+   if (dev_version_reply != NE_VERSION_MAX) {
+   dev_err(&pdev->dev, "Error in pci dev version cmd\n");
+
+   return -EIO;
+   }
+
+   iowrite8(NE_ENABLE_ON, ne_pci_dev->iomem_base + NE_ENABLE);
+
+   dev_enable_reply = ioread8(ne_pci_dev->iomem_base + NE_ENABLE);
+   if (dev_enable_reply != NE_ENABLE_ON) {
+   dev_err(&pdev->dev, "Error in pci dev enable cmd\n");
+
+   return -EIO;
+   }
+
+   return 0;
+}
+
+/**
+ * ne_pci_dev_disable() - Disable the PCI device.
+ * @pdev:  PCI device to disable.
+ *
+ * Context: Process

[PATCH v8 03/18] nitro_enclaves: Define enclave info for internal bookkeeping

2020-09-04 Thread Andra Paraschiv
The Nitro Enclaves driver keeps an internal info per each enclave.

This is needed to be able to manage enclave resources state, enclave
notifications and have a reference of the PCI device that handles
command requests for enclave lifetime management.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v7 -> v8

* No changes.

v6 -> v7

* Update the naming and add more comments to clarify the logic of
  handling full CPU cores and dedicating them to the enclave.

v5 -> v6

* Update documentation to kernel-doc format.
* Include in the enclave memory region data structure the user space
  address and size for duplicate user space memory regions checks.

v4 -> v5

* Include enclave cores field in the enclave metadata.
* Update the vCPU ids data structure to be a cpumask instead of a list.

v3 -> v4

* Add NUMA node field for an enclave metadata as the enclave memory and
  CPUs need to be from the same NUMA node.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Add enclave memory regions and vcpus count for enclave bookkeeping.
* Update ne_state comments to reflect NE_START_ENCLAVE ioctl naming
  update.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.h | 99 +++
 1 file changed, 99 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.h

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.h b/drivers/virt/nitro_enclaves/ne_misc_dev.h
new file mode 100644
index ..a907924de7ca
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.h
@@ -0,0 +1,99 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#ifndef _NE_MISC_DEV_H_
+#define _NE_MISC_DEV_H_
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * struct ne_mem_region - Entry in the enclave user space memory regions list.
+ * @mem_region_list_entry: Entry in the list of enclave memory regions.
+ * @memory_size:   Size of the user space memory region.
+ * @nr_pages:  Number of pages that make up the memory region.
+ * @pages: Pages that make up the user space memory region.
+ * @userspace_addr:User space address of the memory region.
+ */
+struct ne_mem_region {
+   struct list_headmem_region_list_entry;
+   u64 memory_size;
+   unsigned long   nr_pages;
+   struct page **pages;
+   u64 userspace_addr;
+};
+
+/**
+ * struct ne_enclave - Per-enclave data used for enclave lifetime management.
+ * @enclave_info_mutex :   Mutex for accessing this internal state.
+ * @enclave_list_entry :   Entry in the list of created enclaves.
+ * @eventq:Wait queue used for out-of-band event notifications
+ * triggered from the PCI device event handler to
+ * the enclave process via the poll function.
+ * @has_event: Variable used to determine if the out-of-band event
+ * was triggered.
+ * @max_mem_regions:   The maximum number of memory regions that can be
+ * handled by the hypervisor.
+ * @mem_regions_list:  Enclave user space memory regions list.
+ * @mem_size:  Enclave memory size.
+ * @mm :   Enclave process abstraction mm data struct.
+ * @nr_mem_regions:Number of memory regions associated with the enclave.
+ * @nr_parent_vm_cores :   The size of the threads per core array. The
+ * total number of CPU cores available on the
+ * parent / primary VM.
+ * @nr_threads_per_core:   The number of threads that a full CPU core has.
+ * @nr_vcpus:  Number of vcpus associated with the enclave.
+ * @numa_node: NUMA node of the enclave memory and CPUs.
+ * @pdev:  PCI device used for enclave lifetime management.
+ * @slot_uid:  Slot unique id mapped to the enclave.
+ * @state: Enclave state, updated during enclave lifetime.
+ * @threads_per_core:  Enclave full CPU cores array, indexed by core 
id,
+ * consisting of cpumasks with all their threads.
+ * Full CPU cores are taken from the NE CPU pool
+ * and are available to the enclave.
+ * @vcpu_ids:  Cpumask of the vCPUs that are set for the 
enclave.
+ */
+struct ne_enclave {
+   struct mutexenclave_info_mutex;
+   struct list_headenclave_list_entry;
+   wait_queue_head_t   eventq;
+   boolhas_event;
+   u64   

[PATCH v8 02/18] nitro_enclaves: Define the PCI device interface

2020-09-04 Thread Andra Paraschiv
The Nitro Enclaves (NE) driver communicates with a new PCI device, that
is exposed to a virtual machine (VM) and handles commands meant for
handling enclaves lifetime e.g. creation, termination, setting memory
regions. The communication with the PCI device is handled using a MMIO
space and MSI-X interrupts.

This device communicates with the hypervisor on the host, where the VM
that spawned the enclave itself runs, e.g. to launch a VM that is used
for the enclave.

Define the MMIO space of the NE PCI device, the commands that are
provided by this device. Add an internal data structure used as private
data for the PCI device driver and the function for the PCI device
command requests handling.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Alexandru Ciobotaru 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v7 -> v8

* No changes.

v6 -> v7

* Update the documentation to include references to the NE PCI device id
  and MMIO bar.

v5 -> v6

* Update documentation to kernel-doc format.

v4 -> v5

* Add a TODO for including flags in the request to the NE PCI device to
  set a memory region for an enclave. It is not used for now.

v3 -> v4

* Remove the "packed" attribute and include padding in the NE data
  structures.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Update path naming to drivers/virt/nitro_enclaves.
* Update NE_ENABLE_OFF / NE_ENABLE_ON defines.
---
 drivers/virt/nitro_enclaves/ne_pci_dev.h | 327 +++
 1 file changed, 327 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/ne_pci_dev.h

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.h b/drivers/virt/nitro_enclaves/ne_pci_dev.h
new file mode 100644
index ..336fa344d630
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.h
@@ -0,0 +1,327 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#ifndef _NE_PCI_DEV_H_
+#define _NE_PCI_DEV_H_
+
+#include <linux/atomic.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/wait.h>
+
+/**
+ * DOC: Nitro Enclaves (NE) PCI device
+ */
+
+/**
+ * PCI_DEVICE_ID_NE - Nitro Enclaves PCI device id.
+ */
+#define PCI_DEVICE_ID_NE   (0xe4c1)
+/**
+ * PCI_BAR_NE - Nitro Enclaves PCI device MMIO BAR.
+ */
+#define PCI_BAR_NE (0x03)
+
+/**
+ * DOC: Device registers in the NE PCI device MMIO BAR
+ */
+
+/**
+ * NE_ENABLE - (1 byte) Register to notify the device that the driver is using
+ *	       it (Read/Write).
+ */
+#define NE_ENABLE		(0x0000)
+#define NE_ENABLE_OFF		(0x00)
+#define NE_ENABLE_ON		(0x01)
+
+/**
+ * NE_VERSION - (2 bytes) Register to select the device run-time version
+ * (Read/Write).
+ */
+#define NE_VERSION (0x0002)
+#define NE_VERSION_MAX (0x0001)
+
+/**
+ * NE_COMMAND - (4 bytes) Register to notify the device what command was
+ * requested (Write-Only).
+ */
+#define NE_COMMAND (0x0004)
+
+/**
+ * NE_EVTCNT - (4 bytes) Register to notify the driver that a reply or a device
+ *	       event is available (Read-Only):
+ *	       - Lower half  - command reply counter
+ *	       - Higher half - out-of-band device event counter
+ */
+#define NE_EVTCNT		(0x000c)
+#define NE_EVTCNT_REPLY_SHIFT	(0)
+#define NE_EVTCNT_REPLY_MASK	(0x0000ffff)
+#define NE_EVTCNT_REPLY(cnt)	(((cnt) & NE_EVTCNT_REPLY_MASK) >> \
+				NE_EVTCNT_REPLY_SHIFT)
+#define NE_EVTCNT_EVENT_SHIFT	(16)
+#define NE_EVTCNT_EVENT_MASK	(0xffff0000)
+#define NE_EVTCNT_EVENT(cnt)	(((cnt) & NE_EVTCNT_EVENT_MASK) >> \
+				NE_EVTCNT_EVENT_SHIFT)
+
+/**
+ * NE_SEND_DATA - (240 bytes) Buffer for sending the command request payload
+ *   (Read/Write).
+ */
+#define NE_SEND_DATA   (0x0010)
+
+/**
+ * NE_RECV_DATA - (240 bytes) Buffer for receiving the command reply payload
+ *   (Read-Only).
+ */
+#define NE_RECV_DATA   (0x0100)
+
+/**
+ * DOC: Device MMIO buffer sizes
+ */
+
+/**
+ * NE_SEND_DATA_SIZE / NE_RECV_DATA_SIZE - 240 bytes for send / recv buffer.
+ */
+#define NE_SEND_DATA_SIZE  (240)
+#define NE_RECV_DATA_SIZE  (240)
+
+/**
+ * DOC: MSI-X interrupt vectors
+ */
+
+/**
+ * NE_VEC_REPLY - MSI-X vector used for command reply notification.
+ */
+#define NE_VEC_REPLY   (0)
+
+/**
+ * NE_VEC_EVENT - MSI-X vector used for out-of-band events e.g. enclave crash.
+ */
+#define NE_VEC_EVENT   (1)
+
+/**
+ * enum ne_pci_dev_cmd_type - Device command types.
+ * @INVALID_CMD:   Invalid command.
+ * @ENCLAVE_START: Start an enclave, after setting its resources.
+ * @ENCLAVE_GET_SLOT:  Get the slot uid of an enclave.
+ * @ENCLAVE_STOP:  Terminate an enclave.
+ * @SLOT_ALLOC :  

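The lower-half / upper-half split of the NE_EVTCNT register described in the patch above can be exercised outside the driver. A minimal user space sketch, assuming the lower 16 bits hold the command reply counter and the upper 16 bits the out-of-band event counter, as the register comment describes; the mask values and the `ne_reply_count` / `ne_event_count` helper names are illustrative, not part of the driver:

```c
#include <stdint.h>

/* Mirror of the NE_EVTCNT layout: lower half is the command reply
 * counter, higher half is the out-of-band device event counter.
 * The mask values are assumptions consistent with that description. */
#define NE_EVTCNT_REPLY_SHIFT	(0)
#define NE_EVTCNT_REPLY_MASK	(0x0000ffffu)
#define NE_EVTCNT_REPLY(cnt)	(((cnt) & NE_EVTCNT_REPLY_MASK) >> \
				 NE_EVTCNT_REPLY_SHIFT)
#define NE_EVTCNT_EVENT_SHIFT	(16)
#define NE_EVTCNT_EVENT_MASK	(0xffff0000u)
#define NE_EVTCNT_EVENT(cnt)	(((cnt) & NE_EVTCNT_EVENT_MASK) >> \
				 NE_EVTCNT_EVENT_SHIFT)

/* Illustrative helpers: an interrupt handler would compare these
 * counters against previously cached values to tell whether a reply
 * or an out-of-band event arrived. */
static inline uint16_t ne_reply_count(uint32_t evtcnt)
{
	return (uint16_t)NE_EVTCNT_REPLY(evtcnt);
}

static inline uint16_t ne_event_count(uint32_t evtcnt)
{
	return (uint16_t)NE_EVTCNT_EVENT(evtcnt);
}
```

For example, a raw register value of 0x00020003 would decode as 3 replies and 2 events.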
[PATCH v8 01/18] nitro_enclaves: Add ioctl interface definition

2020-09-04 Thread Andra Paraschiv
The Nitro Enclaves driver handles the enclave lifetime management. This
includes enclave creation, termination and setting up its resources such
as memory and CPU.

An enclave runs alongside the VM that spawned it. It is abstracted as a
process running in the VM that launched it. The process interacts with
the NE driver, that exposes an ioctl interface for creating an enclave
and setting up its resources.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
Reviewed-by: Stefan Hajnoczi 
---
Changelog

v7 -> v8

* Add NE custom error codes for user space memory regions not backed by
  pages multiple of 2 MiB, invalid flags and enclave CID.
* Add max flag value for enclave image load info.

v6 -> v7

* Clarify in the ioctls documentation that the return value is -1 and
  errno is set on failure.
* Update the error code value for NE_ERR_INVALID_MEM_REGION_SIZE as it
  gets in user space as value 25 (ENOTTY) instead of 515. Update the
  NE custom error codes values range to not be the same as the ones
  defined in include/linux/errno.h, although these are not propagated
  to user space.

v5 -> v6

* Fix typo in the description about the NE CPU pool.
* Update documentation to kernel-doc format.
* Remove the ioctl to query API version.

v4 -> v5

* Add more details about the ioctl calls usage e.g. error codes, file
  descriptors used.
* Update the ioctl to set an enclave vCPU to not return a file
  descriptor.
* Add specific NE error codes.

v3 -> v4

* Decouple NE ioctl interface from KVM API.
* Add NE API version and the corresponding ioctl call.
* Add enclave / image load flags options.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Add ioctl for getting enclave image load metadata.
* Update NE_ENCLAVE_START ioctl name to NE_START_ENCLAVE.
* Add entry in Documentation/userspace-api/ioctl/ioctl-number.rst for NE
  ioctls.
* Update NE ioctls definition based on the updated ioctl range for major
  and minor.
---
 .../userspace-api/ioctl/ioctl-number.rst  |   5 +-
 include/linux/nitro_enclaves.h|  11 +
 include/uapi/linux/nitro_enclaves.h   | 359 ++
 3 files changed, 374 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/nitro_enclaves.h
 create mode 100644 include/uapi/linux/nitro_enclaves.h

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 2a198838fca9..5f7ff00f394e 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -328,8 +328,11 @@ Code  Seq#    Include File                                           Comments
 0xAC  00-1F  linux/raw.h
 0xAD  00                                                             Netfilter device in development:
                                                                      <mailto:ru...@rustcorp.com.au>
-0xAE  all    linux/kvm.h                                             Kernel-based Virtual Machine
+0xAE  00-1F  linux/kvm.h                                             Kernel-based Virtual Machine
                                                                      <mailto:k...@vger.kernel.org>
+0xAE  40-FF  linux/kvm.h                                             Kernel-based Virtual Machine
+                                                                     <mailto:k...@vger.kernel.org>
+0xAE  20-3F  linux/nitro_enclaves.h                                  Nitro Enclaves
 0xAF  00-1F  linux/fsl_hypervisor.h                                  Freescale hypervisor
 0xB0  all                                                            RATIO devices in development:
                                                                      <mailto:v...@ratio.de>
diff --git a/include/linux/nitro_enclaves.h b/include/linux/nitro_enclaves.h
new file mode 100644
index ..d91ef2bfdf47
--- /dev/null
+++ b/include/linux/nitro_enclaves.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#ifndef _LINUX_NITRO_ENCLAVES_H_
+#define _LINUX_NITRO_ENCLAVES_H_
+
+#include <uapi/linux/nitro_enclaves.h>
+
+#endif /* _LINUX_NITRO_ENCLAVES_H_ */
diff --git a/include/uapi/linux/nitro_enclaves.h b/include/uapi/linux/nitro_enclaves.h
new file mode 100644
index ..b945073fe544
--- /dev/null
+++ b/include/uapi/linux/nitro_enclaves.h
@@ -0,0 +1,359 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#ifndef _UAPI_LINUX_NITRO_ENCLAVES_H_
+#define _UAPI_LINUX_NITRO_ENCLAVES_H_
+
+#include <linux/types.h>
+
+/**
+ * DOC: Nitro Enclaves (NE) Kernel Driver Interface
+ */
+
+/**
+ * NE_CREATE_VM - The command is used to create a slot that is associated with
+ *   an enclave VM.

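The 0xAE 20-3F range reserved in the ioctl-number table above determines how the NE commands are encoded. A standalone sketch, assuming NE_CREATE_VM is encoded as `_IOR(0xAE, 0x20, __u64)` (returning a 64-bit slot unique id); the macro below recomputes that encoding with a plain `uint64_t` for illustration and is not the uapi header itself:

```c
#include <stdint.h>
#include <sys/ioctl.h>

/* Recompute the assumed NE_CREATE_VM encoding: magic 0xAE, command
 * number 0x20 (start of the reserved NE range), reading back a 64-bit
 * slot unique id from the kernel. */
#define NE_MAGIC		0xAE
#define NE_CREATE_VM_SKETCH	_IOR(NE_MAGIC, 0x20, uint64_t)

/* The _IOC_* accessors decompose an ioctl command into its fields. */
static inline unsigned int ne_cmd_type(unsigned long cmd)
{
	return _IOC_TYPE(cmd);
}

static inline unsigned int ne_cmd_nr(unsigned long cmd)
{
	return _IOC_NR(cmd);
}

static inline unsigned int ne_cmd_size(unsigned long cmd)
{
	return _IOC_SIZE(cmd);
}
```

This is how the table entry and the header definitions stay consistent: every NE command number must fall in 0x20-0x3F under the shared 0xAE magic.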
[PATCH v8 00/18] Add support for Nitro Enclaves

2020-09-04 Thread Andra Paraschiv
lls, and memory regions not backed by pages multiple of 2 MiB.
* Add NE PCI driver shutdown logic.
* Add check for invalid provided enclave CID to the start enclave ioctl.
* Update documentation to include info about the primary / parent VM CID for its
  vsock device. Update reference link for huge pages and include refs for the
  x86 boot protocol.
* Update sample to track the newly added NE custom error codes and match the
  latest logic for the heartbeat enclave boot check.
* v7: https://lore.kernel.org/lkml/20200817131003.56650-1-andra...@amazon.com/

v6 -> v7

* Rebase on top of v5.9-rc1.
* Use the NE misc device parent field to get the NE PCI device.
* Update the naming and add more comments to make more clear the logic of
  handling full CPU cores and dedicating them to the enclave.
* Remove, for now, the dependency on ARM64 arch in Kconfig. x86 is currently
  supported, with Arm to come afterwards. The NE kernel driver can be currently
  built for aarch64 arch.
* Clarify in the ioctls documentation that the return value is -1 and errno is
  set on failure.
* Update the error code value for NE_ERR_INVALID_MEM_REGION_SIZE as it gets in
  user space as value 25 (ENOTTY) instead of 515. Update the NE custom error
  codes values range to not be the same as the ones defined in
  include/linux/errno.h, although these are not propagated to user space.
* Update the documentation to include references to the NE PCI device id and
  MMIO bar.
* Update check for duplicate user space memory regions to cover additional
  possible scenarios.
* Calculate the number of threads per core and not use smp_num_siblings that is
  x86 specific.
* v6: https://lore.kernel.org/lkml/20200805091017.86203-1-andra...@amazon.com/

v5 -> v6

* Rebase on top of v5.8.
* Update documentation to kernel-doc format.
* Update sample to include the enclave image loading logic.
* Remove the ioctl to query API version.
* Check for invalid provided flags field via ioctl calls args.
* Check for duplicate provided user space memory regions.
* Check for aligned memory regions.
* Include, in the sample, usage info for NUMA-aware hugetlb config.
* v5: https://lore.kernel.org/lkml/20200715194540.45532-1-andra...@amazon.com/

v4 -> v5

* Rebase on top of v5.8-rc5.
* Add more details about the ioctl calls usage e.g. error codes.
* Update the ioctl to set an enclave vCPU to not return a fd.
* Add specific NE error codes.
* Split the NE CPU pool in CPU cores cpumasks.
* Remove log on copy_from_user() / copy_to_user() failure.
* Release the reference to the NE PCI device on failure paths.
* Close enclave fd on copy_to_user() failure.
* Set empty string in case of invalid NE CPU pool sysfs value.
* Early exit on NE CPU pool setup if enclave(s) already running.
* Add more sanity checks for provided vCPUs e.g. maximum possible value.
* Split logic for checking if a vCPU is in pool / getting a vCPU from pool.
* Exit without unpinning the pages on NE PCI dev request failure.
* Add check for the memory region user space address alignment.
* Update the logic to set memory region to not have a hardcoded check for 2 MiB.
* Add arch dependency for Arm / x86.
* v4: https://lore.kernel.org/lkml/20200622200329.52996-1-andra...@amazon.com/

v3 -> v4

* Rebase on top of v5.8-rc2.
* Add NE API version and the corresponding ioctl call.
* Add enclave / image load flags options.
* Decouple NE ioctl interface from KVM API.
* Remove the "packed" attribute and include padding in the NE data structures.
* Update documentation based on the changes from v4.
* Update sample to match the updates in v4.
* Remove the NE CPU pool init during NE kernel module loading.
* Setup the NE CPU pool at runtime via a sysfs file for the kernel parameter.
* Check if the enclave memory and CPUs are from the same NUMA node.
* Add minimum enclave memory size definition.
* v3: https://lore.kernel.org/lkml/20200525221334.62966-1-andra...@amazon.com/ 

v2 -> v3

* Rebase on top of v5.7-rc7.
* Add changelog to each patch in the series.
* Remove "ratelimited" from the logs that are not in the ioctl call paths.
* Update static calls sanity checks.
* Remove file ops that do nothing for now.
* Remove GPL additional wording as SPDX-License-Identifier is already in place.
* v2: https://lore.kernel.org/lkml/20200522062946.28973-1-andra...@amazon.com/

v1 -> v2

* Rebase on top of v5.7-rc6.
* Adapt codebase based on feedback from v1.
* Update ioctl number definition - major and minor.
* Add sample / documentation for the ioctl interface basic flow usage.
* Update cover letter to include more context on the NE overall.
* Add fix for the enclave / vcpu fd creation error cleanup path.
* Add fix reported by kbuild test robot .
* v1: https://lore.kernel.org/lkml/20200421184150.68011-1-andra...@amazon.com/

---

Andra Paraschiv (18):
  nitro_enclaves: Add ioctl interface definition
  nitro_enclaves: Define the PCI device interface
  nitro_enclaves: Define enclave info for internal book

[PATCH v7 18/18] MAINTAINERS: Add entry for the Nitro Enclaves driver

2020-08-17 Thread Andra Paraschiv
Signed-off-by: Andra Paraschiv 
---
Changelog

v6 -> v7

* No changes.

v5 -> v6

* No changes.

v4 -> v5

* No changes.

v3 -> v4

* No changes.

v2 -> v3

* Update file entries to be in alphabetical order.

v1 -> v2

* No changes.
---
 MAINTAINERS | 13 +
 1 file changed, 13 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index deaafb617361..06247ca41e5e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12268,6 +12268,19 @@ S: Maintained
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2.git
 F: arch/nios2/
 
+NITRO ENCLAVES (NE)
+M:     Andra Paraschiv 
+M: Alexandru Vasile 
+M: Alexandru Ciobotaru 
+L: linux-kernel@vger.kernel.org
+S: Supported
+W: https://aws.amazon.com/ec2/nitro/nitro-enclaves/
+F: Documentation/nitro_enclaves/
+F: drivers/virt/nitro_enclaves/
+F: include/linux/nitro_enclaves.h
+F: include/uapi/linux/nitro_enclaves.h
+F: samples/nitro_enclaves/
+
 NOHZ, DYNTICKS SUPPORT
 M: Frederic Weisbecker 
 M: Thomas Gleixner 
-- 
2.20.1 (Apple Git-117)




Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar 
Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in 
Romania. Registration number J22/2621/2005.



[PATCH v7 13/18] nitro_enclaves: Add logic for terminating an enclave

2020-08-17 Thread Andra Paraschiv
An enclave is associated with an fd that is returned after the enclave
creation logic is completed. This enclave fd is further used to setup
enclave resources. Once the enclave needs to be terminated, the enclave
fd is closed.

Add logic for enclave termination, that is mapped to the enclave fd
release callback. Free the internal enclave info used for bookkeeping.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v6 -> v7

* Remove the pci_dev_put() call as the NE misc device parent field is
  used now to get the NE PCI device.
* Update the naming and add more comments to make more clear the logic
  of handling full CPU cores and dedicating them to the enclave.

v5 -> v6

* Update documentation to kernel-doc format.
* Use directly put_page() instead of unpin_user_pages(), to match the
  get_user_pages() calls.

v4 -> v5

* Release the reference to the NE PCI device on enclave fd release.
* Adapt the logic to cpumask enclave vCPU ids and CPU cores.
* Remove sanity checks for situations that shouldn't happen, only if
  buggy system or broken logic at all.

v3 -> v4

* Use dev_err instead of custom NE log pattern.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().

v1 -> v2

* Add log pattern for NE.
* Remove the BUG_ON calls.
* Update goto labels to match their purpose.
* Add early exit in release() if there was a slot alloc error in the fd
  creation path.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 166 ++
 1 file changed, 166 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index be81ff5634af..787428390d94 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -1221,6 +1221,171 @@ static long ne_enclave_ioctl(struct file *file, unsigned int cmd, unsigned long
return 0;
 }
 
+/**
+ * ne_enclave_remove_all_mem_region_entries() - Remove all memory region entries
+ *						from the enclave data structure.
+ * @ne_enclave:	Private data associated with the current enclave.
+ *
+ * Context: Process context. This function is called with the ne_enclave mutex held.
+ */
+static void ne_enclave_remove_all_mem_region_entries(struct ne_enclave *ne_enclave)
+{
+	unsigned long i = 0;
+	struct ne_mem_region *ne_mem_region = NULL;
+	struct ne_mem_region *ne_mem_region_tmp = NULL;
+
+	list_for_each_entry_safe(ne_mem_region, ne_mem_region_tmp,
+				 &ne_enclave->mem_regions_list,
+				 mem_region_list_entry) {
+		list_del(&ne_mem_region->mem_region_list_entry);
+
+		for (i = 0; i < ne_mem_region->nr_pages; i++)
+			put_page(ne_mem_region->pages[i]);
+
+		kfree(ne_mem_region->pages);
+
+		kfree(ne_mem_region);
+	}
+}
+
+/**
+ * ne_enclave_remove_all_vcpu_id_entries() - Remove all vCPU id entries from
+ *					     the enclave data structure.
+ * @ne_enclave:	Private data associated with the current enclave.
+ *
+ * Context: Process context. This function is called with the ne_enclave mutex held.
+ */
+static void ne_enclave_remove_all_vcpu_id_entries(struct ne_enclave *ne_enclave)
+{
+	unsigned int cpu = 0;
+	unsigned int i = 0;
+
+	mutex_lock(&ne_cpu_pool.mutex);
+
+	for (i = 0; i < ne_enclave->nr_parent_vm_cores; i++) {
+		for_each_cpu(cpu, ne_enclave->threads_per_core[i])
+			/* Update the available NE CPU pool. */
+			cpumask_set_cpu(cpu, ne_cpu_pool.avail_threads_per_core[i]);
+
+		free_cpumask_var(ne_enclave->threads_per_core[i]);
+	}
+
+	mutex_unlock(&ne_cpu_pool.mutex);
+
+	kfree(ne_enclave->threads_per_core);
+
+	free_cpumask_var(ne_enclave->vcpu_ids);
+}
+
+/**
+ * ne_pci_dev_remove_enclave_entry() - Remove the enclave entry from the data
+ *				       structure that is part of the NE PCI
+ *				       device private data.
+ * @ne_enclave:	Private data associated with the current enclave.
+ * @ne_pci_dev:	Private data associated with the PCI device.
+ *
+ * Context: Process context. This function is called with the ne_pci_dev enclave
+ *	    mutex held.
+ */
+static void ne_pci_dev_remove_enclave_entry(struct ne_enclave *ne_enclave,
+					    struct ne_pci_dev *ne_pci_dev)
+{
+	struct ne_enclave *ne_enclave_entry = NULL;
+	struct ne_enclave *ne_enclave_entry_tmp = NULL;
+
+	list_for_each_entry_safe(ne_enclave_entry, ne_enclave_entry_tmp,
+				 &ne_pci_dev->enclaves_list, enclave_list_entry) {
+

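The teardown paths above all rely on the "safe" list iteration pattern: the next pointer is saved before the current entry is deleted and freed. A user space sketch of the same pattern, with illustrative names (the kernel list macros themselves are not used here):

```c
#include <stdlib.h>

/* User space sketch of the list_for_each_entry_safe() teardown pattern:
 * save the next pointer before freeing the current node, so iteration
 * can continue after the node is gone. */
struct node {
	struct node *next;
	int value;
};

static int free_all_nodes(struct node *head)
{
	struct node *cur = head;
	struct node *tmp = NULL;
	int freed = 0;

	while (cur) {
		tmp = cur->next;	/* save next before freeing cur */
		free(cur);
		freed++;
		cur = tmp;
	}

	return freed;
}
```

Without the saved `tmp` pointer, reading `cur->next` after `free(cur)` would be a use-after-free, which is exactly what the `_safe` list macro variant avoids in the driver code above.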
[PATCH v7 16/18] nitro_enclaves: Add sample for ioctl interface usage

2020-08-17 Thread Andra Paraschiv
Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
---
Changelog

v6 -> v7

* Track POLLNVAL as poll event in addition to POLLHUP.

v5 -> v6

* Remove "rc" mentioning when printing errno string.
* Remove the ioctl to query API version.
* Include usage info for NUMA-aware hugetlb configuration.
* Update documentation to kernel-doc format.
* Add logic for enclave image loading.

v4 -> v5

* Print enclave vCPU ids when they are created.
* Update logic to map the modified vCPU ioctl call.
* Add check for the path to the enclave image to be less than PATH_MAX.
* Update the ioctl calls error checking logic to match the NE specific
  error codes.

v3 -> v4

* Update usage details to match the updates in v4.
* Update NE ioctl interface usage.

v2 -> v3

* Remove the include directory to use the uapi from the kernel.
* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* New in v2.
---
 samples/nitro_enclaves/.gitignore|   2 +
 samples/nitro_enclaves/Makefile  |  16 +
 samples/nitro_enclaves/ne_ioctl_sample.c | 850 +++
 3 files changed, 868 insertions(+)
 create mode 100644 samples/nitro_enclaves/.gitignore
 create mode 100644 samples/nitro_enclaves/Makefile
 create mode 100644 samples/nitro_enclaves/ne_ioctl_sample.c

diff --git a/samples/nitro_enclaves/.gitignore b/samples/nitro_enclaves/.gitignore
new file mode 100644
index ..827934129c90
--- /dev/null
+++ b/samples/nitro_enclaves/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0
+ne_ioctl_sample
diff --git a/samples/nitro_enclaves/Makefile b/samples/nitro_enclaves/Makefile
new file mode 100644
index ..a3ec78fefb52
--- /dev/null
+++ b/samples/nitro_enclaves/Makefile
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+
+# Enclave lifetime management support for Nitro Enclaves (NE) - ioctl sample
+# usage.
+
+.PHONY: all clean
+
+CFLAGS += -Wall
+
+all:
+   $(CC) $(CFLAGS) -o ne_ioctl_sample ne_ioctl_sample.c -lpthread
+
+clean:
+   rm -f ne_ioctl_sample
diff --git a/samples/nitro_enclaves/ne_ioctl_sample.c b/samples/nitro_enclaves/ne_ioctl_sample.c
new file mode 100644
index ..1c4ee3132e11
--- /dev/null
+++ b/samples/nitro_enclaves/ne_ioctl_sample.c
@@ -0,0 +1,850 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+/**
+ * DOC: Sample flow of using the ioctl interface provided by the Nitro Enclaves (NE)
+ * kernel driver.
+ *
+ * Usage
+ * -----
+ *
+ * Load the nitro_enclaves module, setting also the enclave CPU pool. The
+ * enclave CPUs need to be full cores from the same NUMA node. CPU 0 and its
+ * siblings have to remain available for the primary / parent VM, so they
+ * cannot be included in the enclave CPU pool.
+ *
+ * See the cpu list section from the kernel documentation.
+ * https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html#cpu-lists
+ *
+ * insmod drivers/virt/nitro_enclaves/nitro_enclaves.ko
+ * lsmod
+ *
+ * The CPU pool can be set at runtime, after the kernel module is loaded.
+ *
+ * echo <cpu-list> > /sys/module/nitro_enclaves/parameters/ne_cpus
+ *
+ * NUMA and CPU siblings information can be found using:
+ *
+ * lscpu
+ * /proc/cpuinfo
+ *
+ * Check the online / offline CPU list. The CPUs from the pool should be
+ * offlined.
+ *
+ * lscpu
+ *
+ * Check dmesg for any warnings / errors through the NE driver lifetime / usage.
+ * The NE logs contain the "nitro_enclaves" or "pci :00:02.0" pattern.
+ *
+ * dmesg
+ *
+ * Setup hugetlbfs huge pages. The memory needs to be from the same NUMA node as
+ * the enclave CPUs.
+ * https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
+ * By default, the allocation of hugetlb pages is distributed on all possible
+ * NUMA nodes. Use the following configuration files to set the number of huge
+ * pages from a NUMA node:
+ *
+ * /sys/devices/system/node/node<X>/hugepages/hugepages-2048kB/nr_hugepages
+ * /sys/devices/system/node/node<X>/hugepages/hugepages-1048576kB/nr_hugepages
+ *
+ * or, if not on a system with multiple NUMA nodes, can also set the number
+ * of 2 MiB / 1 GiB huge pages using
+ *
+ * /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
+ * /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
+ *
+ * In this example 256 hugepages of 2 MiB are used.
+ *
+ * Build and run the NE sample.
+ *
+ * make -C samples/nitro_enclaves clean
+ * make -C samples/nitro_enclaves
+ * ./samples/nitro_enclaves/ne_ioctl_sample <path_to_enclave_image>
+ *
+ * Unload the nitro_enclaves module.
+ *
+ * rmmod nitro_enclaves
+ * lsmod
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#in

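The changelog note about tracking POLLNVAL in addition to POLLHUP comes down to how the sample classifies poll() revents on the enclave fd. A small sketch with a hypothetical helper name (the full sample source is not shown in this excerpt):

```c
#include <poll.h>

/* Treat both POLLHUP and POLLNVAL on the enclave fd as an out-of-band
 * enclave exit notification, per this revision of the sample. */
static int ne_poll_event_is_enclave_exit(short revents)
{
	return (revents & (POLLHUP | POLLNVAL)) != 0;
}
```

A caller would poll() the enclave fd returned by NE_CREATE_VM and pass the resulting revents to a helper like this one after poll() returns.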
[PATCH v7 14/18] nitro_enclaves: Add Kconfig for the Nitro Enclaves driver

2020-08-17 Thread Andra Paraschiv
Signed-off-by: Andra Paraschiv 
---
Changelog

v6 -> v7

* Remove, for now, the dependency on ARM64 arch. x86 is currently
  supported, with Arm to come afterwards. The NE kernel driver can be
  built for aarch64 arch.

v5 -> v6

* No changes.

v4 -> v5

* Add arch dependency for Arm / x86.

v3 -> v4

* Add PCI and SMP dependencies.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Update path to Kconfig to match the drivers/virt/nitro_enclaves
  directory.
* Update help in Kconfig.
---
 drivers/virt/Kconfig|  2 ++
 drivers/virt/nitro_enclaves/Kconfig | 20 
 2 files changed, 22 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/Kconfig

diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
index cbc1f25c79ab..80c5f9c16ec1 100644
--- a/drivers/virt/Kconfig
+++ b/drivers/virt/Kconfig
@@ -32,4 +32,6 @@ config FSL_HV_MANAGER
 partition shuts down.
 
 source "drivers/virt/vboxguest/Kconfig"
+
+source "drivers/virt/nitro_enclaves/Kconfig"
 endif
diff --git a/drivers/virt/nitro_enclaves/Kconfig b/drivers/virt/nitro_enclaves/Kconfig
new file mode 100644
index ..8c9387a232df
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/Kconfig
@@ -0,0 +1,20 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+
+# Amazon Nitro Enclaves (NE) support.
+# Nitro is a hypervisor that has been developed by Amazon.
+
+# TODO: Add dependency for ARM64 once NE is supported on Arm platforms. For now,
+# the NE kernel driver can be built for aarch64 arch.
+# depends on (ARM64 || X86) && HOTPLUG_CPU && PCI && SMP
+
+config NITRO_ENCLAVES
+   tristate "Nitro Enclaves Support"
+   depends on X86 && HOTPLUG_CPU && PCI && SMP
+   help
+ This driver consists of support for enclave lifetime management
+ for Nitro Enclaves (NE).
+
+ To compile this driver as a module, choose M here.
+ The module will be called nitro_enclaves.
-- 
2.20.1 (Apple Git-117)







[PATCH v7 15/18] nitro_enclaves: Add Makefile for the Nitro Enclaves driver

2020-08-17 Thread Andra Paraschiv
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v6 -> v7

* No changes.

v5 -> v6

* No changes.

v4 -> v5

* No changes.

v3 -> v4

* No changes.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Update path to Makefile to match the drivers/virt/nitro_enclaves
  directory.
---
 drivers/virt/Makefile|  2 ++
 drivers/virt/nitro_enclaves/Makefile | 11 +++
 2 files changed, 13 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/Makefile

diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
index fd331247c27a..f28425ce4b39 100644
--- a/drivers/virt/Makefile
+++ b/drivers/virt/Makefile
@@ -5,3 +5,5 @@
 
 obj-$(CONFIG_FSL_HV_MANAGER)   += fsl_hypervisor.o
 obj-y  += vboxguest/
+
+obj-$(CONFIG_NITRO_ENCLAVES)   += nitro_enclaves/
diff --git a/drivers/virt/nitro_enclaves/Makefile b/drivers/virt/nitro_enclaves/Makefile
new file mode 100644
index ..e9f4fcd1591e
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/Makefile
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+
+# Enclave lifetime management support for Nitro Enclaves (NE).
+
+obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves.o
+
+nitro_enclaves-y := ne_pci_dev.o ne_misc_dev.o
+
+ccflags-y += -Wall
-- 
2.20.1 (Apple Git-117)







[PATCH v7 17/18] nitro_enclaves: Add overview documentation

2020-08-17 Thread Andra Paraschiv
Signed-off-by: Andra Paraschiv 
---
Changelog

v6 -> v7

* No changes.

v5 -> v6

* No changes.

v4 -> v5

* No changes.

v3 -> v4

* Update doc type from .txt to .rst.
* Update documentation based on the changes from v4.

v2 -> v3

* No changes.

v1 -> v2

* New in v2.
---
 Documentation/nitro_enclaves/ne_overview.rst | 87 
 1 file changed, 87 insertions(+)
 create mode 100644 Documentation/nitro_enclaves/ne_overview.rst

diff --git a/Documentation/nitro_enclaves/ne_overview.rst b/Documentation/nitro_enclaves/ne_overview.rst
new file mode 100644
index ..9cc7a2720955
--- /dev/null
+++ b/Documentation/nitro_enclaves/ne_overview.rst
@@ -0,0 +1,87 @@
+Nitro Enclaves
+==
+
+Nitro Enclaves (NE) is a new Amazon Elastic Compute Cloud (EC2) capability
+that allows customers to carve out isolated compute environments within EC2
+instances [1].
+
+For example, an application that processes sensitive data and runs in a VM,
+can be separated from other applications running in the same VM. This
+application then runs in a separate VM than the primary VM, namely an enclave.
+
+An enclave runs alongside the VM that spawned it. This setup matches low latency
+applications' needs. The resources that are allocated for the enclave, such as
+memory and CPUs, are carved out of the primary VM. Each enclave is mapped to a
+process running in the primary VM, that communicates with the NE driver via an
+ioctl interface.
+
+In this sense, there are two components:
+
+1. An enclave abstraction process - a user space process running in the primary
+VM guest that uses the provided ioctl interface of the NE driver to spawn an
+enclave VM (that's 2 below).
+
+There is a NE emulated PCI device exposed to the primary VM. The driver for this
+new PCI device is included in the NE driver.
+
+The ioctl logic is mapped to PCI device commands e.g. the NE_START_ENCLAVE ioctl
+maps to an enclave start PCI command. The PCI device commands are then
+translated into actions taken on the hypervisor side; that's the Nitro
+hypervisor running on the host where the primary VM is running. The Nitro
+hypervisor is based on core KVM technology.
+
+2. The enclave itself - a VM running on the same host as the primary VM that
+spawned it. Memory and CPUs are carved out of the primary VM and are dedicated
+for the enclave VM. An enclave does not have persistent storage attached.
+
+The memory regions carved out of the primary VM and given to an enclave need to
+be 2 MiB / 1 GiB aligned, physically contiguous memory regions (or a multiple of
+this size e.g. 8 MiB). The memory can be allocated e.g. by using hugetlbfs from
+user space [2][3]. The memory size for an enclave needs to be at least 64 MiB.
+The enclave memory and CPUs need to be from the same NUMA node.
+
+An enclave runs on dedicated cores. CPU 0 and its CPU siblings need to remain
+available for the primary VM. A CPU pool has to be set for NE purposes by a
+user with admin capability. See the cpu list section from the kernel
+documentation [4] for how a CPU pool format looks.
+
+An enclave communicates with the primary VM via a local communication channel,
+using virtio-vsock [5]. The primary VM has a virtio-pci vsock emulated device,
+while the enclave VM has a virtio-mmio vsock emulated device. The vsock device
+uses eventfd for signaling. The enclave VM sees the usual interfaces - local
+APIC and IOAPIC - to get interrupts from virtio-vsock device. The virtio-mmio
+device is placed in memory below the typical 4 GiB.
+
+The application that runs in the enclave needs to be packaged in an enclave
+image together with the OS (e.g. kernel, ramdisk, init) that will run in the
+enclave VM. The enclave VM has its own kernel and follows the standard Linux
+boot protocol.
+
+The kernel bzImage, the kernel command line, the ramdisk(s) are part of the
+Enclave Image Format (EIF); plus an EIF header including metadata such as magic
+number, EIF version, image size and CRC.
+
+Hash values are computed for the entire enclave image (EIF), the kernel and
+ramdisk(s). That's used, for example, to check that the enclave image that is
+loaded in the enclave VM is the one that was intended to be run.
+
+These crypto measurements are included in a signed attestation document
+generated by the Nitro Hypervisor and further used to prove the identity of the
+enclave; KMS is an example of a service that NE is integrated with and that
+checks the attestation doc.
+
+The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The
+init process in the enclave connects to the vsock CID of the primary VM and a
+predefined port - 9000 - to send a heartbeat value - 0xb7. This mechanism is
+used to check in the primary VM that the enclave has booted.
+
+If the enclave VM crashes or gracefully exits, an interrupt event is received
+by the NE driver. This event is sent further to the user space enclave process
+running in the primary VM via a poll notification mechanism.

[PATCH v7 11/18] nitro_enclaves: Add logic for setting an enclave memory region

2020-08-17 Thread Andra Paraschiv
Another resource that is being set for an enclave is memory. User space
memory regions, which need to be backed by contiguous memory regions,
are associated with the enclave.

One solution for allocating / reserving contiguous memory regions, which
is used for integration, is hugetlbfs. The user space process that is
associated with the enclave passes these memory regions to the driver.

The enclave memory regions need to be from the same NUMA node as the
enclave CPUs.

Add ioctl command logic for setting user space memory region for an
enclave.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v6 -> v7

* Update check for duplicate user space memory regions to cover
  additional possible scenarios.

v5 -> v6

* Check for max number of pages allocated for the internal data
  structure for pages.
* Check for invalid memory region flags.
* Check for aligned physical memory regions.
* Update documentation to kernel-doc format.
* Check for duplicate user space memory regions.
* Use directly put_page() instead of unpin_user_pages(), to match the
  get_user_pages() calls.

v4 -> v5

* Add early exit on set memory region ioctl function call error.
* Remove log on copy_from_user() failure.
* Exit without unpinning the pages on NE PCI dev request failure as
  memory regions from the user space range may have already been added.
* Add check for the memory region user space address to be 2 MiB
  aligned.
* Update logic to not have a hardcoded check for 2 MiB memory regions.

v3 -> v4

* Check enclave memory regions are from the same NUMA node as the
  enclave CPUs.
* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
* Check if enclave max memory regions is reached when setting an enclave
  memory region.
* Check if enclave state is init when setting an enclave memory region.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 287 ++
 1 file changed, 287 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index 810c4bba424f..3d8a771bde1d 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -703,6 +703,260 @@ static int ne_add_vcpu_ioctl(struct ne_enclave *ne_enclave, u32 vcpu_id)
return 0;
 }
 
+/**
+ * ne_sanity_check_user_mem_region() - Sanity check the user space memory
+ *region received during the set user
+ *memory region ioctl call.
+ * @ne_enclave :   Private data associated with the current enclave.
+ * @mem_region :   User space memory region to be sanity checked.
+ *
+ * Context: Process context. This function is called with the ne_enclave mutex held.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_sanity_check_user_mem_region(struct ne_enclave *ne_enclave,
+   struct ne_user_memory_region mem_region)
+{
+   struct ne_mem_region *ne_mem_region = NULL;
+
+   if (ne_enclave->mm != current->mm)
+   return -EIO;
+
+   if (mem_region.memory_size & (NE_MIN_MEM_REGION_SIZE - 1)) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "User space memory size is not multiple of 2 MiB\n");
+
+   return -NE_ERR_INVALID_MEM_REGION_SIZE;
+   }
+
+   if (!IS_ALIGNED(mem_region.userspace_addr, NE_MIN_MEM_REGION_SIZE)) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "User space address is not 2 MiB aligned\n");
+
+   return -NE_ERR_UNALIGNED_MEM_REGION_ADDR;
+   }
+
+   if ((mem_region.userspace_addr & (NE_MIN_MEM_REGION_SIZE - 1)) ||
+   !access_ok((void __user *)(unsigned long)mem_region.userspace_addr,
+  mem_region.memory_size)) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Invalid user space address range\n");
+
+   return -NE_ERR_INVALID_MEM_REGION_ADDR;
+   }
+
+   list_for_each_entry(ne_mem_region, &ne_enclave->mem_regions_list,
+   mem_region_list_entry) {
+   u64 memory_size = ne_mem_region->memory_size;
+   u64 userspace_addr = ne_mem_region->userspace_addr;
+
+   if ((userspace_addr <= mem_region.userspace_addr &&
+   mem_region.userspace_addr < (userspace_addr + memory_size)) ||
+   (mem_region.userspace_addr <= userspace_addr &&

[PATCH v7 12/18] nitro_enclaves: Add logic for starting an enclave

2020-08-17 Thread Andra Paraschiv
After all the enclave resources are set, the enclave is ready to begin
running.

Add ioctl command logic for starting an enclave after all its resources,
memory regions and CPUs, have been set.

The enclave start information includes the local channel addressing -
vsock CID - and the flags associated with the enclave.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v6 -> v7

* Update the naming and add more comments to clarify the logic of
  handling full CPU cores and dedicating them to the enclave.

v5 -> v6

* Check for invalid enclave start flags.
* Update documentation to kernel-doc format.

v4 -> v5

* Add early exit on enclave start ioctl function call error.
* Move sanity checks in the enclave start ioctl function, outside of the
  switch-case block.
* Remove log on copy_from_user() / copy_to_user() failure.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Update the naming for the ioctl command from metadata to info.
* Check for minimum enclave memory size.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.

v1 -> v2

* Add log pattern for NE.
* Check if enclave state is init when starting an enclave.
* Remove the BUG_ON calls.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 109 ++
 1 file changed, 109 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index 3d8a771bde1d..be81ff5634af 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -957,6 +957,77 @@ static int ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave,
return rc;
 }
 
+/**
+ * ne_start_enclave_ioctl() - Trigger enclave start after the enclave resources,
+ *   such as memory and CPU, have been set.
+ * @ne_enclave :   Private data associated with the current enclave.
+ * @enclave_start_info :   Enclave info that includes enclave cid and flags.
+ *
+ * Context: Process context. This function is called with the ne_enclave mutex held.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_start_enclave_ioctl(struct ne_enclave *ne_enclave,
+   struct ne_enclave_start_info *enclave_start_info)
+{
+   struct ne_pci_dev_cmd_reply cmd_reply = {};
+   unsigned int cpu = 0;
+   struct enclave_start_req enclave_start_req = {};
+   unsigned int i = 0;
+   int rc = -EINVAL;
+
+   if (!ne_enclave->nr_mem_regions) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Enclave has no mem regions\n");
+
+   return -NE_ERR_NO_MEM_REGIONS_ADDED;
+   }
+
+   if (ne_enclave->mem_size < NE_MIN_ENCLAVE_MEM_SIZE) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Enclave memory is less than %ld\n",
+   NE_MIN_ENCLAVE_MEM_SIZE);
+
+   return -NE_ERR_ENCLAVE_MEM_MIN_SIZE;
+   }
+
+   if (!ne_enclave->nr_vcpus) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Enclave has no vCPUs\n");
+
+   return -NE_ERR_NO_VCPUS_ADDED;
+   }
+
+   for (i = 0; i < ne_enclave->nr_parent_vm_cores; i++)
+   for_each_cpu(cpu, ne_enclave->threads_per_core[i])
+   if (!cpumask_test_cpu(cpu, ne_enclave->vcpu_ids)) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Full CPU cores not used\n");
+
+   return -NE_ERR_FULL_CORES_NOT_USED;
+   }
+
+   enclave_start_req.enclave_cid = enclave_start_info->enclave_cid;
+   enclave_start_req.flags = enclave_start_info->flags;
+   enclave_start_req.slot_uid = ne_enclave->slot_uid;
+
+   rc = ne_do_request(ne_enclave->pdev, ENCLAVE_START, &enclave_start_req,
+  sizeof(enclave_start_req), &cmd_reply, sizeof(cmd_reply));
+   if (rc < 0) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Error in enclave start [rc=%d]\n", rc);
+
+   return rc;
+   }
+
+   ne_enclave->state = NE_STATE_RUNNING;
+
+   enclave_start_info->enclave_cid = cmd_reply.enclave_cid;
+
+   return 0;
+}
+
 /**
  * ne_enclave_ioctl() - Ioctl function provided by the enclave file.
  * @file:  File associated with this ioctl function.
@@ -1105,6 +1176,44 @@ static long ne_enclave_ioctl(struct file *file, unsigned int cmd, unsigned long
return 0;
}
 
+   case NE_START_ENCLAVE: {
+   struct ne_enclave_start_info enclave_start_

[PATCH v7 10/18] nitro_enclaves: Add logic for getting the enclave image load info

2020-08-17 Thread Andra Paraschiv
Before setting the memory regions for the enclave, the enclave image
needs to be placed in memory. After the memory regions are set, this
memory cannot be used anymore by the VM, as it is carved out.

Add ioctl command logic to get the offset in enclave memory where to
place the enclave image. Then the user space tooling copies the enclave
image in the memory using the given memory offset.

Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v6 -> v7

* No changes.

v5 -> v6

* Check for invalid enclave image load flags.

v4 -> v5

* Check for the enclave not being started when invoking this ioctl call.
* Remove log on copy_from_user() / copy_to_user() failure.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Set enclave image load offset based on flags.
* Update the naming for the ioctl command from metadata to info.

v2 -> v3

* No changes.

v1 -> v2

* New in v2.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 30 +++
 1 file changed, 30 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index 104c9646ec87..810c4bba424f 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -788,6 +788,36 @@ static long ne_enclave_ioctl(struct file *file, unsigned int cmd, unsigned long
return 0;
}
 
+   case NE_GET_IMAGE_LOAD_INFO: {
+   struct ne_image_load_info image_load_info = {};
+
+   if (copy_from_user(&image_load_info, (void __user *)arg, sizeof(image_load_info)))
+   return -EFAULT;
+
+   mutex_lock(&ne_enclave->enclave_info_mutex);
+
+   if (ne_enclave->state != NE_STATE_INIT) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Enclave is not in init state\n");
+
+   mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+   return -NE_ERR_NOT_IN_INIT_STATE;
+   }
+
+   mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+   if (image_load_info.flags == NE_EIF_IMAGE)
+   image_load_info.memory_offset = NE_EIF_LOAD_OFFSET;
+   else
+   return -EINVAL;
+
+   if (copy_to_user((void __user *)arg, &image_load_info, sizeof(image_load_info)))
+   return -EFAULT;
+
+   return 0;
+   }
+
default:
return -ENOTTY;
}
-- 
2.20.1 (Apple Git-117)







[PATCH v7 04/18] nitro_enclaves: Init PCI device driver

2020-08-17 Thread Andra Paraschiv
The Nitro Enclaves PCI device is used by the kernel driver as a means of
communication with the hypervisor on the host where the primary VM and
the enclaves run. It handles requests with regard to enclave lifetime.

Setup the PCI device driver and add support for MSI-X interrupts.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Alexandru Ciobotaru 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v6 -> v7

* No changes.

v5 -> v6

* Update documentation to kernel-doc format.

v4 -> v5

* Remove sanity checks for situations that shouldn't happen, only if
  buggy system or broken logic at all.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Update NE PCI driver name to "nitro_enclaves".

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.
* Remove the WARN_ON calls.
* Remove linux/bug include that is not needed.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call
  paths.
* Update kzfree() calls to kfree().

v1 -> v2

* Add log pattern for NE.
* Update PCI device setup functions to receive PCI device data structure and
  then get private data from it inside the functions logic.
* Remove the BUG_ON calls.
* Add teardown function for MSI-X setup.
* Update goto labels to match their purpose.
* Implement TODO for NE PCI device disable state check.
* Update function name for NE PCI device probe / remove.
---
 drivers/virt/nitro_enclaves/ne_pci_dev.c | 269 +++
 1 file changed, 269 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/ne_pci_dev.c

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c b/drivers/virt/nitro_enclaves/ne_pci_dev.c
new file mode 100644
index 000000000000..31650dcd592e
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -0,0 +1,269 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+/**
+ * DOC: Nitro Enclaves (NE) PCI device driver.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ne_misc_dev.h"
+#include "ne_pci_dev.h"
+
+/**
+ * NE_DEFAULT_TIMEOUT_MSECS - Default timeout to wait for a reply from
+ *   the NE PCI device.
+ */
+#define NE_DEFAULT_TIMEOUT_MSECS   (120000) /* 120 sec */
+
+static const struct pci_device_id ne_pci_ids[] = {
+   { PCI_DEVICE(PCI_VENDOR_ID_AMAZON, PCI_DEVICE_ID_NE) },
+   { 0, }
+};
+
+MODULE_DEVICE_TABLE(pci, ne_pci_ids);
+
+/**
+ * ne_setup_msix() - Setup MSI-X vectors for the PCI device.
+ * @pdev:  PCI device to setup the MSI-X for.
+ *
+ * Context: Process context.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_setup_msix(struct pci_dev *pdev)
+{
+   int nr_vecs = 0;
+   int rc = -EINVAL;
+
+   nr_vecs = pci_msix_vec_count(pdev);
+   if (nr_vecs < 0) {
+   rc = nr_vecs;
+
+   dev_err(&pdev->dev, "Error in getting vec count [rc=%d]\n", rc);
+
+   return rc;
+   }
+
+   rc = pci_alloc_irq_vectors(pdev, nr_vecs, nr_vecs, PCI_IRQ_MSIX);
+   if (rc < 0) {
+   dev_err(&pdev->dev, "Error in alloc MSI-X vecs [rc=%d]\n", rc);
+
+   return rc;
+   }
+
+   return 0;
+}
+
+/**
+ * ne_teardown_msix() - Teardown MSI-X vectors for the PCI device.
+ * @pdev:  PCI device to teardown the MSI-X for.
+ *
+ * Context: Process context.
+ */
+static void ne_teardown_msix(struct pci_dev *pdev)
+{
+   pci_free_irq_vectors(pdev);
+}
+
+/**
+ * ne_pci_dev_enable() - Select the PCI device version and enable it.
+ * @pdev:  PCI device to select version for and then enable.
+ *
+ * Context: Process context.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_pci_dev_enable(struct pci_dev *pdev)
+{
+   u8 dev_enable_reply = 0;
+   u16 dev_version_reply = 0;
+   struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+   iowrite16(NE_VERSION_MAX, ne_pci_dev->iomem_base + NE_VERSION);
+
+   dev_version_reply = ioread16(ne_pci_dev->iomem_base + NE_VERSION);
+   if (dev_version_reply != NE_VERSION_MAX) {
+   dev_err(&pdev->dev, "Error in pci dev version cmd\n");
+
+   return -EIO;
+   }
+
+   iowrite8(NE_ENABLE_ON, ne_pci_dev->iomem_base + NE_ENABLE);
+
+   dev_enable_reply = ioread8(ne_pci_dev->iomem_base + NE_ENABLE);
+   if (dev_enable_reply != NE_ENABLE_ON) {
+   dev_err(&pdev->dev, "Error in pci dev enable cmd\n");
+
+   return -EIO;
+   }
+
+   return 0;
+}
+
+/**
+ * ne_pci_dev_disable() - Disable the PCI device.
+ * @pdev:  PCI device to disable.
+ *
+ * Context: Process context.
+ */
+static void ne_pci_dev

[PATCH v7 09/18] nitro_enclaves: Add logic for setting an enclave vCPU

2020-08-17 Thread Andra Paraschiv
An enclave, before being started, has its resources set. One of its
resources is CPU.

A NE CPU pool is set and enclave CPUs are chosen from it. Offline the
CPUs from the NE CPU pool during the pool setup and online them back
during the NE CPU pool teardown. The CPU offline is necessary so that
there would not be more vCPUs than physical CPUs available to the
primary / parent VM. Otherwise the CPUs would be overcommitted, changing
the initial configuration of the primary / parent VM, which has vCPUs
dedicated to physical CPUs.

The enclave CPUs need to be full cores and from the same NUMA node. CPU
0 and its siblings have to remain available to the primary / parent VM.

Add ioctl command logic for setting an enclave vCPU.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
---
Changelog

v6 -> v7

* Check for error return value when setting the kernel parameter string.
* Use the NE misc device parent field to get the NE PCI device.
* Update the naming and add more comments to clarify the logic of
  handling full CPU cores and dedicating them to the enclave.
* Calculate the number of threads per core and not use smp_num_siblings
  that is x86 specific.

v5 -> v6

* Check CPUs are from the same NUMA node before going through CPU
  siblings during the NE CPU pool setup.
* Update documentation to kernel-doc format.

v4 -> v5

* Set empty string in case of invalid NE CPU pool.
* Clear NE CPU pool mask on pool setup failure.
* Setup NE CPU cores out of the NE CPU pool.
* Early exit on NE CPU pool setup if enclave(s) already running.
* Remove sanity checks for situations that shouldn't happen, only if
  buggy system or broken logic at all.
* Add check for maximum vCPU id possible before looking into the CPU
  pool.
* Remove log on copy_from_user() / copy_to_user() failure and on admin
  capability check for setting the NE CPU pool.
* Update the ioctl call to not create a file descriptor for the vCPU.
* Split the CPU pool usage logic in 2 separate functions - one to get a
  CPU from the pool and the other to check the given CPU is available in
  the pool.

v3 -> v4

* Setup the NE CPU pool at runtime via a sysfs file for the kernel
  parameter.
* Check enclave CPUs to be from the same NUMA node.
* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().
* Remove file ops that do nothing for now - open, ioctl and release.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
* Check if enclave state is init when setting enclave vCPU.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 702 ++
 1 file changed, 702 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index a824a50341dd..104c9646ec87 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -57,8 +57,11 @@
  * TODO: Update logic to create new sysfs entries instead of using
  * a kernel parameter e.g. if multiple sysfs files needed.
  */
+static int ne_set_kernel_param(const char *val, const struct kernel_param *kp);
+
 static const struct kernel_param_ops ne_cpu_pool_ops = {
.get= param_get_string,
+   .set= ne_set_kernel_param,
 };
 
 static char ne_cpus[NE_CPUS_SIZE];
@@ -96,6 +99,702 @@ struct ne_cpu_pool {
 
 static struct ne_cpu_pool ne_cpu_pool;
 
+/**
+ * ne_check_enclaves_created() - Verify if at least one enclave has been created.
+ * @void:  No parameters provided.
+ *
+ * Context: Process context.
+ * Return:
+ * * True if at least one enclave is created.
+ * * False otherwise.
+ */
+static bool ne_check_enclaves_created(void)
+{
+   struct ne_pci_dev *ne_pci_dev = NULL;
+   struct pci_dev *pdev = NULL;
+   bool ret = false;
+
+   if (!ne_misc_dev.parent)
+   return ret;
+
+   pdev = to_pci_dev(ne_misc_dev.parent);
+   if (!pdev)
+   return ret;
+
+   ne_pci_dev = pci_get_drvdata(pdev);
+   if (!ne_pci_dev)
+   return ret;
+
+   mutex_lock(&ne_pci_dev->enclaves_list_mutex);
+
+   if (!list_empty(&ne_pci_dev->enclaves_list))
+   ret = true;
+
+   mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
+
+   return ret;
+}
+
+/**
+ * ne_setup_cpu_pool() - Set the NE CPU pool after handling sanity checks such
+ *  as not sharing CPU cores with the primary / parent VM
+ *  or not using CPU 0, which should remain available for
+ *  the primary / parent VM. Offline the CPUs from the
+ *  pool after the checks passed.
+ * @ne_cpu_list:   The CPU list used for setting NE CPU pool.
+ *
+ * Context: Process context.
+ * Return:
+ * * 0 on success.
+ * * Negat

[PATCH v7 07/18] nitro_enclaves: Init misc device providing the ioctl interface

2020-08-17 Thread Andra Paraschiv
The Nitro Enclaves driver provides an ioctl interface to the user space
for enclave lifetime management e.g. enclave creation / termination and
setting enclave resources such as memory and CPU.

This ioctl interface is mapped to a Nitro Enclaves misc device.

Signed-off-by: Andra Paraschiv 
---
Changelog

v6 -> v7

* Set the NE PCI device the parent of the NE misc device to be able to
  use it in the ioctl logic.
* Update the naming and add more comments to clarify the logic of
  handling full CPU cores and dedicating them to the enclave.

v5 -> v6

* Remove the ioctl to query API version.
* Update documentation to kernel-doc format.

v4 -> v5

* Update the size of the NE CPU pool string from 4096 to 512 chars.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Remove the NE CPU pool init during kernel module loading, as the CPU
  pool is now setup at runtime, via a sysfs file for the kernel
  parameter.
* Add minimum enclave memory size definition.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.
* Remove the WARN_ON calls.
* Remove linux/bug and linux/kvm_host includes that are not needed.
* Remove "ratelimited" from the logs that are not in the ioctl call
  paths.
* Remove file ops that do nothing for now - open and release.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Update ne_cpu_pool data structure to include the global mutex.
* Update NE misc device mode to 0660.
* Check if the CPU siblings are included in the NE CPU pool, as full CPU
  cores are given for the enclave(s).
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 128 ++
 drivers/virt/nitro_enclaves/ne_pci_dev.c  |  17 +++
 2 files changed, 145 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.c

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
new file mode 100644
index 000000000000..0776a4b36c61
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -0,0 +1,128 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+/**
+ * DOC: Enclave lifetime management driver for Nitro Enclaves (NE).
+ * Nitro is a hypervisor that has been developed by Amazon.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ne_misc_dev.h"
+#include "ne_pci_dev.h"
+
+/**
+ * NE_CPUS_SIZE - Size for max 128 CPUs, for now, in a cpu-list string, comma
+ *   separated. The NE CPU pool includes CPUs from a single NUMA
+ *   node.
+ */
+#define NE_CPUS_SIZE   (512)
+
+/**
+ * NE_EIF_LOAD_OFFSET - The offset where to copy the Enclave Image Format (EIF)
+ * image in enclave memory.
+ */
+#define NE_EIF_LOAD_OFFSET (8 * 1024UL * 1024UL)
+
+/**
+ * NE_MIN_ENCLAVE_MEM_SIZE - The minimum memory size an enclave can be launched
+ *  with.
+ */
+#define NE_MIN_ENCLAVE_MEM_SIZE (64 * 1024UL * 1024UL)
+
+/**
+ * NE_MIN_MEM_REGION_SIZE - The minimum size of an enclave memory region.
+ */
+#define NE_MIN_MEM_REGION_SIZE (2 * 1024UL * 1024UL)
+
+/*
+ * TODO: Update logic to create new sysfs entries instead of using
+ * a kernel parameter e.g. if multiple sysfs files needed.
+ */
+static const struct kernel_param_ops ne_cpu_pool_ops = {
+   .get= param_get_string,
+};
+
+static char ne_cpus[NE_CPUS_SIZE];
+static struct kparam_string ne_cpus_arg = {
+   .maxlen = sizeof(ne_cpus),
+   .string = ne_cpus,
+};
+
+module_param_cb(ne_cpus, &ne_cpu_pool_ops, &ne_cpus_arg, 0644);
+/* https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html#cpu-lists */
+MODULE_PARM_DESC(ne_cpus, "<cpu-list> - CPU pool used for Nitro Enclaves");
+
+/**
+ * struct ne_cpu_pool - CPU pool used for Nitro Enclaves.
+ * @avail_threads_per_core:Available full CPU cores to be dedicated to
+ * enclave(s). The cpumasks from the array, indexed
+ * by core id, contain all the threads from the
+ * available cores, that are not set for created
+ * enclave(s). The full CPU cores are part of the
+ * NE CPU pool.
+ * @mutex: Mutex for the access to the NE CPU pool.
+ * @nr_parent_vm_cores :   The size of the available threads per core array.
+ * The total number of CPU cores available on the
+ * parent / primary VM.
+ * @nr_threads_per_core:   The number of threads that a full CPU core has.
+ * @numa_node: NUMA node of the CPUs in the pool.
+ */
+struct ne_cpu_pool {
+   cpumask_var_

[PATCH v7 05/18] nitro_enclaves: Handle PCI device command requests

2020-08-17 Thread Andra Paraschiv
The Nitro Enclaves PCI device exposes a MMIO space that this driver
uses to submit command requests and to receive command replies e.g. for
enclave creation / termination or setting enclave resources.

Add logic for handling PCI device command requests based on the given
command type.

Register an MSI-X interrupt vector for command reply notifications to
handle this type of communication event.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v6 -> v7

* No changes.

v5 -> v6

* Update documentation to kernel-doc format.

v4 -> v5

* Remove sanity checks for situations that shouldn't happen, only if
  buggy system or broken logic at all.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Return IRQ_NONE when interrupts are not handled.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call
  paths.

v1 -> v2

* Add log pattern for NE.
* Remove the BUG_ON calls.
* Update goto labels to match their purpose.
* Add fix for kbuild report:
  https://lore.kernel.org/lkml/202004231644.xtmn4z1z%25...@intel.com/
---
 drivers/virt/nitro_enclaves/ne_pci_dev.c | 204 +++
 1 file changed, 204 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c b/drivers/virt/nitro_enclaves/ne_pci_dev.c
index 31650dcd592e..77ccbc43bce3 100644
--- a/drivers/virt/nitro_enclaves/ne_pci_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -33,6 +33,187 @@ static const struct pci_device_id ne_pci_ids[] = {
 
 MODULE_DEVICE_TABLE(pci, ne_pci_ids);
 
+/**
+ * ne_submit_request() - Submit command request to the PCI device based on the
+ *  command type.
+ * @pdev:  PCI device to send the command to.
+ * @cmd_type:  Command type of the request sent to the PCI device.
+ * @cmd_request:   Command request payload.
+ * @cmd_request_size:  Size of the command request payload.
+ *
+ * Context: Process context. This function is called with the ne_pci_dev mutex held.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_submit_request(struct pci_dev *pdev, enum ne_pci_dev_cmd_type cmd_type,
+    void *cmd_request, size_t cmd_request_size)
+{
+   struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+   memcpy_toio(ne_pci_dev->iomem_base + NE_SEND_DATA, cmd_request, cmd_request_size);
+
+   iowrite32(cmd_type, ne_pci_dev->iomem_base + NE_COMMAND);
+
+   return 0;
+}
+
+/**
+ * ne_retrieve_reply() - Retrieve reply from the PCI device.
+ * @pdev:  PCI device to receive the reply from.
+ * @cmd_reply: Command reply payload.
+ * @cmd_reply_size:Size of the command reply payload.
+ *
+ * Context: Process context. This function is called with the ne_pci_dev mutex held.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_retrieve_reply(struct pci_dev *pdev, struct ne_pci_dev_cmd_reply *cmd_reply,
+    size_t cmd_reply_size)
+{
+   struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+   memcpy_fromio(cmd_reply, ne_pci_dev->iomem_base + NE_RECV_DATA, cmd_reply_size);
+
+   return 0;
+}
+
+/**
+ * ne_wait_for_reply() - Wait for a reply of a PCI device command.
+ * @pdev:  PCI device for which a reply is waited.
+ *
+ * Context: Process context. This function is called with the ne_pci_dev mutex held.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_wait_for_reply(struct pci_dev *pdev)
+{
+   struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+   int rc = -EINVAL;
+
+   /*
+* TODO: Update to _interruptible and handle interrupted wait event
+* e.g. -ERESTARTSYS, incoming signals + update timeout, if needed.
+*/
+   rc = wait_event_timeout(ne_pci_dev->cmd_reply_wait_q,
+   atomic_read(&ne_pci_dev->cmd_reply_avail) != 0,
+   msecs_to_jiffies(NE_DEFAULT_TIMEOUT_MSECS));
+   if (!rc)
+   return -ETIMEDOUT;
+
+   return 0;
+}
+
+int ne_do_request(struct pci_dev *pdev, enum ne_pci_dev_cmd_type cmd_type,
+ void *cmd_request, size_t cmd_request_size,
+ struct ne_pci_dev_cmd_reply *cmd_reply, size_t cmd_reply_size)
+{
+   struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+   int rc = -EINVAL;
+
+   if (cmd_type <= INVALID_CMD || cmd_type >= MAX_CMD) {
+   dev_err_ratelimited(&pdev->dev, "Invalid cmd type=%u\n", cmd_type);
+
+   return -EINVAL;
+   }
+
+   if (!cmd_request) {
+   dev_err_ratelimited(&pdev->dev, "Null cmd request\n");
+
+   return -EINVAL;
+   }
+
+   if (cmd_request_size > NE_SEN

[PATCH v7 06/18] nitro_enclaves: Handle out-of-band PCI device events

2020-08-17 Thread Andra Paraschiv
In addition to the replies sent by the Nitro Enclaves PCI device in
response to command requests, out-of-band enclave events can happen e.g.
an enclave crashes. In this case, the Nitro Enclaves driver needs to be
aware of the event and notify the corresponding user space process that
abstracts the enclave.

Register an MSI-X interrupt vector to be used for this kind of
out-of-band events. The interrupt notifies that the state of an enclave
changed and the driver logic scans the state of each running enclave to
identify which enclave(s) the notification is intended for.

Create a workqueue to handle the out-of-band events. Notify the user space
enclave process that is polling on the enclave fd.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Andra Paraschiv 
---
Changelog

v6 -> v7

* No changes.

v5 -> v6

* Update documentation to kernel-doc format.

v4 -> v5

* Remove sanity checks for situations that shouldn't happen; they can
  only occur on a buggy system or with broken logic.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Return IRQ_NONE when interrupts are not handled.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call
  paths.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
---
 drivers/virt/nitro_enclaves/ne_pci_dev.c | 116 +++
 1 file changed, 116 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c 
b/drivers/virt/nitro_enclaves/ne_pci_dev.c
index 77ccbc43bce3..a898fae066d9 100644
--- a/drivers/virt/nitro_enclaves/ne_pci_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -214,6 +214,88 @@ static irqreturn_t ne_reply_handler(int irq, void *args)
return IRQ_HANDLED;
 }
 
+/**
+ * ne_event_work_handler() - Work queue handler for notifying enclaves on a
+ *  state change received by the event interrupt
+ *  handler.
+ * @work:  Item containing the NE PCI device for which an out-of-band event
+ * was issued.
+ *
+ * An out-of-band event is being issued by the Nitro Hypervisor when at least
+ * one enclave is changing state without client interaction.
+ *
+ * Context: Work queue context.
+ */
+static void ne_event_work_handler(struct work_struct *work)
+{
+   struct ne_pci_dev_cmd_reply cmd_reply = {};
+   struct ne_enclave *ne_enclave = NULL;
+   struct ne_pci_dev *ne_pci_dev =
+   container_of(work, struct ne_pci_dev, notify_work);
+   int rc = -EINVAL;
+   struct slot_info_req slot_info_req = {};
+
+   mutex_lock(&ne_pci_dev->enclaves_list_mutex);
+
+   /*
+* Iterate over all enclaves registered for the Nitro Enclaves
+* PCI device and determine to which enclave(s) the out-of-band event
+* corresponds.
+*/
+   list_for_each_entry(ne_enclave, &ne_pci_dev->enclaves_list, enclave_list_entry) {
+   mutex_lock(&ne_enclave->enclave_info_mutex);
+
+   /*
+* Enclaves that were never started cannot receive out-of-band
+* events.
+*/
+   if (ne_enclave->state != NE_STATE_RUNNING)
+   goto unlock;
+
+   slot_info_req.slot_uid = ne_enclave->slot_uid;
+
+   rc = ne_do_request(ne_enclave->pdev, SLOT_INFO, &slot_info_req,
+  sizeof(slot_info_req), &cmd_reply, sizeof(cmd_reply));
+   if (rc < 0)
+   dev_err(&ne_enclave->pdev->dev, "Error in slot info [rc=%d]\n", rc);
+
+   /* Notify enclave process that the enclave state changed. */
+   if (ne_enclave->state != cmd_reply.state) {
+   ne_enclave->state = cmd_reply.state;
+
+   ne_enclave->has_event = true;
+
+   wake_up_interruptible(&ne_enclave->eventq);
+   }
+
+unlock:
+mutex_unlock(&ne_enclave->enclave_info_mutex);
+   }
+
+   mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
+}
+
+/**
+ * ne_event_handler() - Interrupt handler for PCI device out-of-band events.
+ * This interrupt does not supply any data in the MMIO
+ * region. It notifies a change in the state of any of
+ * the launched enclaves.
+ * @irq:   Received interrupt for an out-of-band event.
+ * @args:  PCI device private data structure.
+ *
+ * Context: Interrupt context.
+ * Return:
+ * * IRQ_HANDLED on handled interrupt.
+ */
+static irqreturn_t ne_event_handler(int irq, void *args)
+{
+   struct ne_pci_dev *ne_pci_dev = (struct ne_pci_dev *)args;
+
+   queue_work(ne_pci_dev->event_wq, &ne_pci_dev->notify_work);
+
+   return IRQ_HANDLED;
+}
+
 /**
  * ne_setup_msix() - Setup MSI-X vectors for the PCI device.
  * @pdev:

[PATCH v7 03/18] nitro_enclaves: Define enclave info for internal bookkeeping

2020-08-17 Thread Andra Paraschiv
The Nitro Enclaves driver keeps an internal info per each enclave.

This is needed to be able to manage enclave resources state, enclave
notifications and have a reference of the PCI device that handles
command requests for enclave lifetime management.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v6 -> v7

* Update the naming and add more comments to clarify the logic of
  handling full CPU cores and dedicating them to the enclave.

v5 -> v6

* Update documentation to kernel-doc format.
* Include in the enclave memory region data structure the user space
  address and size for duplicate user space memory regions checks.

v4 -> v5

* Include enclave cores field in the enclave metadata.
* Update the vCPU ids data structure to be a cpumask instead of a list.

v3 -> v4

* Add NUMA node field for an enclave metadata as the enclave memory and
  CPUs need to be from the same NUMA node.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Add enclave memory regions and vcpus count for enclave bookkeeping.
* Update ne_state comments to reflect NE_START_ENCLAVE ioctl naming
  update.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.h | 99 +++
 1 file changed, 99 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.h

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.h 
b/drivers/virt/nitro_enclaves/ne_misc_dev.h
new file mode 100644
index ..a907924de7ca
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.h
@@ -0,0 +1,99 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#ifndef _NE_MISC_DEV_H_
+#define _NE_MISC_DEV_H_
+
+#include <linux/cpumask.h>
+#include <linux/list.h>
+#include <linux/miscdevice.h>
+#include <linux/mm.h>
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/wait.h>
+
+/**
+ * struct ne_mem_region - Entry in the enclave user space memory regions list.
+ * @mem_region_list_entry: Entry in the list of enclave memory regions.
+ * @memory_size:   Size of the user space memory region.
+ * @nr_pages:  Number of pages that make up the memory region.
+ * @pages: Pages that make up the user space memory region.
+ * @userspace_addr:User space address of the memory region.
+ */
+struct ne_mem_region {
+   struct list_headmem_region_list_entry;
+   u64 memory_size;
+   unsigned long   nr_pages;
+   struct page **pages;
+   u64 userspace_addr;
+};
+
+/**
+ * struct ne_enclave - Per-enclave data used for enclave lifetime management.
+ * @enclave_info_mutex :   Mutex for accessing this internal state.
+ * @enclave_list_entry :   Entry in the list of created enclaves.
+ * @eventq:Wait queue used for out-of-band event 
notifications
+ * triggered from the PCI device event handler to
+ * the enclave process via the poll function.
+ * @has_event: Variable used to determine if the out-of-band 
event
+ * was triggered.
+ * @max_mem_regions:   The maximum number of memory regions that can be
+ * handled by the hypervisor.
+ * @mem_regions_list:  Enclave user space memory regions list.
+ * @mem_size:  Enclave memory size.
+ * @mm :   Enclave process abstraction mm data struct.
+ * @nr_mem_regions:Number of memory regions associated with the 
enclave.
+ * @nr_parent_vm_cores :   The size of the threads per core array. The
+ * total number of CPU cores available on the
+ * parent / primary VM.
+ * @nr_threads_per_core:   The number of threads that a full CPU core has.
+ * @nr_vcpus:  Number of vcpus associated with the enclave.
+ * @numa_node: NUMA node of the enclave memory and CPUs.
+ * @pdev:  PCI device used for enclave lifetime management.
+ * @slot_uid:  Slot unique id mapped to the enclave.
+ * @state: Enclave state, updated during enclave lifetime.
+ * @threads_per_core:  Enclave full CPU cores array, indexed by core 
id,
+ * consisting of cpumasks with all their threads.
+ * Full CPU cores are taken from the NE CPU pool
+ * and are available to the enclave.
+ * @vcpu_ids:  Cpumask of the vCPUs that are set for the 
enclave.
+ */
+struct ne_enclave {
+   struct mutexenclave_info_mutex;
+   struct list_headenclave_list_entry;
+   wait_queue_head_t   eventq;
+   boolhas_event;
+   u64 max_mem_regions;
+ 

[PATCH v7 08/18] nitro_enclaves: Add logic for creating an enclave VM

2020-08-17 Thread Andra Paraschiv
Add ioctl command logic for enclave VM creation. It triggers a slot
allocation. The enclave resources will be associated with this slot and
it will be used as an identifier for triggering enclave run.

Return a file descriptor, namely enclave fd. This is further used by the
associated user space enclave process to set enclave resources and
trigger enclave termination.

The poll function is implemented in order to notify the enclave process
when an enclave exits without a specific enclave termination command
trigger e.g. when an enclave crashes.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v6 -> v7

* Use the NE misc device parent field to get the NE PCI device.
* Update the naming and add more comments to clarify the logic of
  handling full CPU cores and dedicating them to the enclave.

v5 -> v6

* Update the code base to init the ioctl function in this patch.
* Update documentation to kernel-doc format.

v4 -> v5

* Release the reference to the NE PCI device on create VM error.
* Close enclave fd on copy_to_user() failure; rename fd to enclave fd
  while at it.
* Remove sanity checks for situations that shouldn't happen; they can
  only occur on a buggy system or with broken logic.
* Remove log on copy_to_user() failure.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.
* Add metadata for the NUMA node for the enclave memory and CPUs.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().
* Remove file ops that do nothing for now - open.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 226 ++
 1 file changed, 226 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index 0776a4b36c61..a824a50341dd 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -96,9 +96,235 @@ struct ne_cpu_pool {
 
 static struct ne_cpu_pool ne_cpu_pool;
 
+/**
+ * ne_enclave_poll() - Poll functionality used for enclave out-of-band events.
+ * @file:  File associated with this poll function.
+ * @wait:  Poll table data structure.
+ *
+ * Context: Process context.
+ * Return:
+ * * Poll mask.
+ */
+static __poll_t ne_enclave_poll(struct file *file, poll_table *wait)
+{
+   __poll_t mask = 0;
+   struct ne_enclave *ne_enclave = file->private_data;
+
+   poll_wait(file, &ne_enclave->eventq, wait);
+
+   if (!ne_enclave->has_event)
+   return mask;
+
+   mask = POLLHUP;
+
+   return mask;
+}
+
+static const struct file_operations ne_enclave_fops = {
+   .owner  = THIS_MODULE,
+   .llseek = noop_llseek,
+   .poll   = ne_enclave_poll,
+};
+
+/**
+ * ne_create_vm_ioctl() - Alloc slot to be associated with an enclave. Create
+ *   enclave file descriptor to be further used for enclave
+ *   resources handling e.g. memory regions and CPUs.
+ * @pdev:  PCI device used for enclave lifetime management.
+ * @ne_pci_dev :   Private data associated with the PCI device.
+ * @slot_uid:  Generated unique slot id associated with an enclave.
+ *
+ * Context: Process context. This function is called with the ne_pci_dev 
enclave
+ * mutex held.
+ * Return:
+ * * Enclave fd on success.
+ * * Negative return value on failure.
+ */
+static int ne_create_vm_ioctl(struct pci_dev *pdev, struct ne_pci_dev 
*ne_pci_dev,
+ u64 *slot_uid)
+{
+   struct ne_pci_dev_cmd_reply cmd_reply = {};
+   int enclave_fd = -1;
+   struct file *enclave_file = NULL;
+   unsigned int i = 0;
+   struct ne_enclave *ne_enclave = NULL;
+   int rc = -EINVAL;
+   struct slot_alloc_req slot_alloc_req = {};
+
+   mutex_lock(&ne_cpu_pool.mutex);
+
+   for (i = 0; i < ne_cpu_pool.nr_parent_vm_cores; i++)
+   if (!cpumask_empty(ne_cpu_pool.avail_threads_per_core[i]))
+   break;
+
+   if (i == ne_cpu_pool.nr_parent_vm_cores) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "No CPUs available in CPU pool\n");
+
+   mutex_unlock(&ne_cpu_pool.mutex);
+
+   return -NE_ERR_NO_CPUS_AVAIL_IN_POOL;
+   }
+
+   mutex_unlock(&ne_cpu_pool.mutex);
+
+   ne_enclave = kzalloc(sizeof(*ne_enclave), GFP_KERNEL);
+   if (!ne_enclave)
+   return -ENOMEM;
+
+   mutex_lock(&ne_cpu_pool.mutex);
+
+   ne_enclave->nr_parent_vm_cores = ne_cpu_pool.nr_parent_vm_cores;
+   ne_enclave->nr_threads_per_core = ne_cpu_pool.nr_threads_per_core;
+   ne_enclave->numa_node = ne_cpu_pool.

[PATCH v7 01/18] nitro_enclaves: Add ioctl interface definition

2020-08-17 Thread Andra Paraschiv
The Nitro Enclaves driver handles the enclave lifetime management. This
includes enclave creation, termination and setting up its resources such
as memory and CPU.

An enclave runs alongside the VM that spawned it. It is abstracted as a
process running in the VM that launched it. The process interacts with
the NE driver, that exposes an ioctl interface for creating an enclave
and setting up its resources.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
Reviewed-by: Stefan Hajnoczi 
---
Changelog

v6 -> v7

* Clarify in the ioctls documentation that the return value is -1 and
  errno is set on failure.
* Update the error code value for NE_ERR_INVALID_MEM_REGION_SIZE as it
  gets in user space as value 25 (ENOTTY) instead of 515. Update the
  NE custom error codes values range to not be the same as the ones
  defined in include/linux/errno.h, although these are not propagated
  to user space.

v5 -> v6

* Fix typo in the description about the NE CPU pool.
* Update documentation to kernel-doc format.
* Remove the ioctl to query API version.

v4 -> v5

* Add more details about the ioctl calls usage e.g. error codes, file
  descriptors used.
* Update the ioctl to set an enclave vCPU to not return a file
  descriptor.
* Add specific NE error codes.

v3 -> v4

* Decouple NE ioctl interface from KVM API.
* Add NE API version and the corresponding ioctl call.
* Add enclave / image load flags options.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Add ioctl for getting enclave image load metadata.
* Update NE_ENCLAVE_START ioctl name to NE_START_ENCLAVE.
* Add entry in Documentation/userspace-api/ioctl/ioctl-number.rst for NE
  ioctls.
* Update NE ioctls definition based on the updated ioctl range for major
  and minor.
---
 .../userspace-api/ioctl/ioctl-number.rst  |   5 +-
 include/linux/nitro_enclaves.h|  11 +
 include/uapi/linux/nitro_enclaves.h   | 337 ++
 3 files changed, 352 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/nitro_enclaves.h
 create mode 100644 include/uapi/linux/nitro_enclaves.h

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst 
b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 2a198838fca9..5f7ff00f394e 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -328,8 +328,11 @@ Code  Seq#Include File 
  Comments
 0xAC  00-1F  linux/raw.h
 0xAD  00 Netfilter 
device in development:
  
<mailto:ru...@rustcorp.com.au>
-0xAE  alllinux/kvm.h 
Kernel-based Virtual Machine
+0xAE  00-1F  linux/kvm.h 
Kernel-based Virtual Machine
  
<mailto:k...@vger.kernel.org>
+0xAE  40-FF  linux/kvm.h 
Kernel-based Virtual Machine
+ 
<mailto:k...@vger.kernel.org>
+0xAE  20-3F  linux/nitro_enclaves.h  Nitro 
Enclaves
 0xAF  00-1F  linux/fsl_hypervisor.h  Freescale 
hypervisor
 0xB0  allRATIO 
devices in development:
  
<mailto:v...@ratio.de>
diff --git a/include/linux/nitro_enclaves.h b/include/linux/nitro_enclaves.h
new file mode 100644
index ..d91ef2bfdf47
--- /dev/null
+++ b/include/linux/nitro_enclaves.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#ifndef _LINUX_NITRO_ENCLAVES_H_
+#define _LINUX_NITRO_ENCLAVES_H_
+
+#include <uapi/linux/nitro_enclaves.h>
+
+#endif /* _LINUX_NITRO_ENCLAVES_H_ */
diff --git a/include/uapi/linux/nitro_enclaves.h 
b/include/uapi/linux/nitro_enclaves.h
new file mode 100644
index ..1f81aa9f94bb
--- /dev/null
+++ b/include/uapi/linux/nitro_enclaves.h
@@ -0,0 +1,337 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#ifndef _UAPI_LINUX_NITRO_ENCLAVES_H_
+#define _UAPI_LINUX_NITRO_ENCLAVES_H_
+
+#include <linux/types.h>
+
+/**
+ * DOC: Nitro Enclaves (NE) Kernel Driver Interface
+ */
+
+/**
+ * NE_CREATE_VM - The command is used to create a slot that is associated with
+ *   an enclave VM.
+ *   The generated unique slot id is an output parameter.
+ *   The ioctl can be invoked on the /dev/nitro_enclaves fd, before
+ *   setti

[PATCH v7 02/18] nitro_enclaves: Define the PCI device interface

2020-08-17 Thread Andra Paraschiv
The Nitro Enclaves (NE) driver communicates with a new PCI device, that
is exposed to a virtual machine (VM) and handles commands meant for
handling enclaves lifetime e.g. creation, termination, setting memory
regions. The communication with the PCI device is handled using a MMIO
space and MSI-X interrupts.

This device communicates with the hypervisor on the host, where the VM
that spawned the enclave itself runs, e.g. to launch a VM that is used
for the enclave.

Define the MMIO space of the NE PCI device, the commands that are
provided by this device. Add an internal data structure used as private
data for the PCI device driver and the function for the PCI device
command requests handling.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Alexandru Ciobotaru 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v6 -> v7

* Update the documentation to include references to the NE PCI device id
  and MMIO bar.

v5 -> v6

* Update documentation to kernel-doc format.

v4 -> v5

* Add a TODO for including flags in the request to the NE PCI device to
  set a memory region for an enclave. It is not used for now.

v3 -> v4

* Remove the "packed" attribute and include padding in the NE data
  structures.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Update path naming to drivers/virt/nitro_enclaves.
* Update NE_ENABLE_OFF / NE_ENABLE_ON defines.
---
 drivers/virt/nitro_enclaves/ne_pci_dev.h | 327 +++
 1 file changed, 327 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/ne_pci_dev.h

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.h 
b/drivers/virt/nitro_enclaves/ne_pci_dev.h
new file mode 100644
index ..336fa344d630
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.h
@@ -0,0 +1,327 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#ifndef _NE_PCI_DEV_H_
+#define _NE_PCI_DEV_H_
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * DOC: Nitro Enclaves (NE) PCI device
+ */
+
+/**
+ * PCI_DEVICE_ID_NE - Nitro Enclaves PCI device id.
+ */
+#define PCI_DEVICE_ID_NE   (0xe4c1)
+/**
+ * PCI_BAR_NE - Nitro Enclaves PCI device MMIO BAR.
+ */
+#define PCI_BAR_NE (0x03)
+
+/**
+ * DOC: Device registers in the NE PCI device MMIO BAR
+ */
+
+/**
+ * NE_ENABLE - (1 byte) Register to notify the device that the driver is using
+ *it (Read/Write).
+ */
+#define NE_ENABLE  (0x0000)
+#define NE_ENABLE_OFF  (0x00)
+#define NE_ENABLE_ON   (0x01)
+
+/**
+ * NE_VERSION - (2 bytes) Register to select the device run-time version
+ * (Read/Write).
+ */
+#define NE_VERSION (0x0002)
+#define NE_VERSION_MAX (0x0001)
+
+/**
+ * NE_COMMAND - (4 bytes) Register to notify the device what command was
+ * requested (Write-Only).
+ */
+#define NE_COMMAND (0x0004)
+
+/**
+ * NE_EVTCNT - (4 bytes) Register to notify the driver that a reply or a device
+ *event is available (Read-Only):
+ *- Lower half  - command reply counter
+ *- Higher half - out-of-band device event counter
+ */
+#define NE_EVTCNT  (0x000c)
+#define NE_EVTCNT_REPLY_SHIFT  (0)
+#define NE_EVTCNT_REPLY_MASK   (0x0000ffff)
+#define NE_EVTCNT_REPLY(cnt)   (((cnt) & NE_EVTCNT_REPLY_MASK) >> \
+   NE_EVTCNT_REPLY_SHIFT)
+#define NE_EVTCNT_EVENT_SHIFT  (16)
+#define NE_EVTCNT_EVENT_MASK   (0xffff0000)
+#define NE_EVTCNT_EVENT(cnt)   (((cnt) & NE_EVTCNT_EVENT_MASK) >> \
+   NE_EVTCNT_EVENT_SHIFT)
+
+/**
+ * NE_SEND_DATA - (240 bytes) Buffer for sending the command request payload
+ *   (Read/Write).
+ */
+#define NE_SEND_DATA   (0x0010)
+
+/**
+ * NE_RECV_DATA - (240 bytes) Buffer for receiving the command reply payload
+ *   (Read-Only).
+ */
+#define NE_RECV_DATA   (0x0100)
+
+/**
+ * DOC: Device MMIO buffer sizes
+ */
+
+/**
+ * NE_SEND_DATA_SIZE / NE_RECV_DATA_SIZE - 240 bytes for send / recv buffer.
+ */
+#define NE_SEND_DATA_SIZE  (240)
+#define NE_RECV_DATA_SIZE  (240)
+
+/**
+ * DOC: MSI-X interrupt vectors
+ */
+
+/**
+ * NE_VEC_REPLY - MSI-X vector used for command reply notification.
+ */
+#define NE_VEC_REPLY   (0)
+
+/**
+ * NE_VEC_EVENT - MSI-X vector used for out-of-band events e.g. enclave crash.
+ */
+#define NE_VEC_EVENT   (1)
+
+/**
+ * enum ne_pci_dev_cmd_type - Device command types.
+ * @INVALID_CMD:   Invalid command.
+ * @ENCLAVE_START: Start an enclave, after setting its resources.
+ * @ENCLAVE_GET_SLOT:  Get the slot uid of an enclave.
+ * @ENCLAVE_STOP:  Terminate an enclave.
+ * @SLOT_ALLOC :   Allocate a slot for an enclav

[PATCH v7 00/18] Add support for Nitro Enclaves

2020-08-17 Thread Andra Paraschiv
* Remove, for now, the dependency on ARM64 arch in Kconfig. x86 is currently
  supported, with Arm to come afterwards. The NE kernel driver can be currently
  built for aarch64 arch.
* Clarify in the ioctls documentation that the return value is -1 and errno is
  set on failure.
* Update the error code value for NE_ERR_INVALID_MEM_REGION_SIZE as it gets in
  user space as value 25 (ENOTTY) instead of 515. Update the NE custom error
  codes values range to not be the same as the ones defined in
  include/linux/errno.h, although these are not propagated to user space.
* Update the documentation to include references to the NE PCI device id and
  MMIO bar.
* Update check for duplicate user space memory regions to cover additional
  possible scenarios.
* Calculate the number of threads per core and not use smp_num_siblings that is
  x86 specific.
* v6: https://lore.kernel.org/lkml/20200805091017.86203-1-andra...@amazon.com/

v5 -> v6

* Rebase on top of v5.8.
* Update documentation to kernel-doc format.
* Update sample to include the enclave image loading logic.
* Remove the ioctl to query API version.
* Check for invalid provided flags field via ioctl calls args.
* Check for duplicate provided user space memory regions.
* Check for aligned memory regions.
* Include, in the sample, usage info for NUMA-aware hugetlb config.
* v5: https://lore.kernel.org/lkml/20200715194540.45532-1-andra...@amazon.com/

v4 -> v5

* Rebase on top of v5.8-rc5.
* Add more details about the ioctl calls usage e.g. error codes.
* Update the ioctl to set an enclave vCPU to not return a fd.
* Add specific NE error codes.
* Split the NE CPU pool in CPU cores cpumasks.
* Remove log on copy_from_user() / copy_to_user() failure.
* Release the reference to the NE PCI device on failure paths.
* Close enclave fd on copy_to_user() failure.
* Set empty string in case of invalid NE CPU pool sysfs value.
* Early exit on NE CPU pool setup if enclave(s) already running.
* Add more sanity checks for provided vCPUs e.g. maximum possible value.
* Split logic for checking if a vCPU is in pool / getting a vCPU from pool.
* Exit without unpinning the pages on NE PCI dev request failure.
* Add check for the memory region user space address alignment.
* Update the logic to set memory region to not have a hardcoded check for 2 MiB.
* Add arch dependency for Arm / x86.
* v4: https://lore.kernel.org/lkml/20200622200329.52996-1-andra...@amazon.com/

v3 -> v4

* Rebase on top of v5.8-rc2.
* Add NE API version and the corresponding ioctl call.
* Add enclave / image load flags options.
* Decouple NE ioctl interface from KVM API.
* Remove the "packed" attribute and include padding in the NE data structures.
* Update documentation based on the changes from v4.
* Update sample to match the updates in v4.
* Remove the NE CPU pool init during NE kernel module loading.
* Setup the NE CPU pool at runtime via a sysfs file for the kernel parameter.
* Check if the enclave memory and CPUs are from the same NUMA node.
* Add minimum enclave memory size definition.
* v3: https://lore.kernel.org/lkml/20200525221334.62966-1-andra...@amazon.com/ 

v2 -> v3

* Rebase on top of v5.7-rc7.
* Add changelog to each patch in the series.
* Remove "ratelimited" from the logs that are not in the ioctl call paths.
* Update static calls sanity checks.
* Remove file ops that do nothing for now.
* Remove GPL additional wording as SPDX-License-Identifier is already in place.
* v2: https://lore.kernel.org/lkml/20200522062946.28973-1-andra...@amazon.com/

v1 -> v2

* Rebase on top of v5.7-rc6.
* Adapt codebase based on feedback from v1.
* Update ioctl number definition - major and minor.
* Add sample / documentation for the ioctl interface basic flow usage.
* Update cover letter to include more context on the NE overall.
* Add fix for the enclave / vcpu fd creation error cleanup path.
* Add fix reported by kbuild test robot .
* v1: https://lore.kernel.org/lkml/20200421184150.68011-1-andra...@amazon.com/

---

Andra Paraschiv (18):
  nitro_enclaves: Add ioctl interface definition
  nitro_enclaves: Define the PCI device interface
  nitro_enclaves: Define enclave info for internal bookkeeping
  nitro_enclaves: Init PCI device driver
  nitro_enclaves: Handle PCI device command requests
  nitro_enclaves: Handle out-of-band PCI device events
  nitro_enclaves: Init misc device providing the ioctl interface
  nitro_enclaves: Add logic for creating an enclave VM
  nitro_enclaves: Add logic for setting an enclave vCPU
  nitro_enclaves: Add logic for getting the enclave image load info
  nitro_enclaves: Add logic for setting an enclave memory region
  nitro_enclaves: Add logic for starting an enclave
  nitro_enclaves: Add logic for terminating an enclave
  nitro_enclaves: Add Kconfig for the Nitro Enclaves driver
  nitro_enclaves: Add Makefile for the Nitro Enclaves driver
  nitro_enclaves: Add sample for ioctl interface usage
  nitro_enclaves: Add overview documentation
  MAINTAINERS: Add entry for the Nitro Enclaves driver

[PATCH v6 13/18] nitro_enclaves: Add logic for terminating an enclave

2020-08-05 Thread Andra Paraschiv
An enclave is associated with an fd that is returned after the enclave
creation logic is completed. This enclave fd is further used to setup
enclave resources. Once the enclave needs to be terminated, the enclave
fd is closed.

Add logic for enclave termination, that is mapped to the enclave fd
release callback. Free the internal enclave info used for bookkeeping.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v5 -> v6

* Update documentation to kernel-doc format.
* Use put_page() directly instead of unpin_user_pages(), to match the
  get_user_pages() calls.

v4 -> v5

* Release the reference to the NE PCI device on enclave fd release.
* Adapt the logic to cpumask enclave vCPU ids and CPU cores.
* Remove sanity checks for situations that shouldn't happen; they can
  only occur on a buggy system or with broken logic.

v3 -> v4

* Use dev_err instead of custom NE log pattern.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().

v1 -> v2

* Add log pattern for NE.
* Remove the BUG_ON calls.
* Update goto labels to match their purpose.
* Add early exit in release() if there was a slot alloc error in the fd
  creation path.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 168 ++
 1 file changed, 168 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index c7eb1146a6c0..6a1b8a0084c4 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -1083,6 +1083,173 @@ static long ne_enclave_ioctl(struct file *file, 
unsigned int cmd, unsigned long
return 0;
 }
 
+/**
+ * ne_enclave_remove_all_mem_region_entries() - Remove all memory region 
entries
+ * from the enclave data structure.
+ * @ne_enclave :   Private data associated with the current enclave.
+ *
+ * Context: Process context. This function is called with the ne_enclave mutex 
held.
+ */
+static void ne_enclave_remove_all_mem_region_entries(struct ne_enclave 
*ne_enclave)
+{
+   unsigned long i = 0;
+   struct ne_mem_region *ne_mem_region = NULL;
+   struct ne_mem_region *ne_mem_region_tmp = NULL;
+
+   list_for_each_entry_safe(ne_mem_region, ne_mem_region_tmp,
+&ne_enclave->mem_regions_list,
+mem_region_list_entry) {
+   list_del(&ne_mem_region->mem_region_list_entry);
+
+   for (i = 0; i < ne_mem_region->nr_pages; i++)
+   put_page(ne_mem_region->pages[i]);
+
+   kfree(ne_mem_region->pages);
+
+   kfree(ne_mem_region);
+   }
+}
+
+/**
+ * ne_enclave_remove_all_vcpu_id_entries() - Remove all vCPU id entries from
+ *  the enclave data structure.
+ * @ne_enclave :   Private data associated with the current enclave.
+ *
+ * Context: Process context. This function is called with the ne_enclave mutex 
held.
+ */
+static void ne_enclave_remove_all_vcpu_id_entries(struct ne_enclave 
*ne_enclave)
+{
+   unsigned int cpu = 0;
+   unsigned int i = 0;
+
+   mutex_lock(&ne_cpu_pool.mutex);
+
+   for (i = 0; i < ne_enclave->avail_cpu_cores_size; i++) {
+   for_each_cpu(cpu, ne_enclave->avail_cpu_cores[i])
+   /* Update the available NE CPU pool. */
+   cpumask_set_cpu(cpu, ne_cpu_pool.avail_cores[i]);
+
+   free_cpumask_var(ne_enclave->avail_cpu_cores[i]);
+   }
+
+   mutex_unlock(&ne_cpu_pool.mutex);
+
+   kfree(ne_enclave->avail_cpu_cores);
+
+   free_cpumask_var(ne_enclave->vcpu_ids);
+}
+
+/**
+ * ne_pci_dev_remove_enclave_entry() - Remove the enclave entry from the data
+ *structure that is part of the NE PCI
+ *device private data.
+ * @ne_enclave :   Private data associated with the current enclave.
+ * @ne_pci_dev :   Private data associated with the PCI device.
+ *
+ * Context: Process context. This function is called with the ne_pci_dev 
enclave
+ * mutex held.
+ */
+static void ne_pci_dev_remove_enclave_entry(struct ne_enclave *ne_enclave,
+   struct ne_pci_dev *ne_pci_dev)
+{
+   struct ne_enclave *ne_enclave_entry = NULL;
+   struct ne_enclave *ne_enclave_entry_tmp = NULL;
+
+   list_for_each_entry_safe(ne_enclave_entry, ne_enclave_entry_tmp,
+                            &ne_pci_dev->enclaves_list, enclave_list_entry) {
+           if (ne_enclave_entry->slot_uid == ne_enclave->slot_uid) {
+                   list_del(&ne_enclave_entry->enclave_list_entry);
+
+   break;
+   }
+   }
+}
+
+/**
+ * ne_enclave_release() - Release function provided by the enclave file.

[PATCH v6 14/18] nitro_enclaves: Add Kconfig for the Nitro Enclaves driver

2020-08-05 Thread Andra Paraschiv
Signed-off-by: Andra Paraschiv 
---
Changelog

v5 -> v6

* No changes.

v4 -> v5

* Add arch dependency for Arm / x86.

v3 -> v4

* Add PCI and SMP dependencies.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Update path to Kconfig to match the drivers/virt/nitro_enclaves
  directory.
* Update help in Kconfig.
---
 drivers/virt/Kconfig|  2 ++
 drivers/virt/nitro_enclaves/Kconfig | 16 
 2 files changed, 18 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/Kconfig

diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
index cbc1f25c79ab..80c5f9c16ec1 100644
--- a/drivers/virt/Kconfig
+++ b/drivers/virt/Kconfig
@@ -32,4 +32,6 @@ config FSL_HV_MANAGER
 partition shuts down.
 
 source "drivers/virt/vboxguest/Kconfig"
+
+source "drivers/virt/nitro_enclaves/Kconfig"
 endif
diff --git a/drivers/virt/nitro_enclaves/Kconfig b/drivers/virt/nitro_enclaves/Kconfig
new file mode 100644
index ..78eb7293d2f7
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/Kconfig
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+
+# Amazon Nitro Enclaves (NE) support.
+# Nitro is a hypervisor that has been developed by Amazon.
+
+config NITRO_ENCLAVES
+   tristate "Nitro Enclaves Support"
+   depends on (ARM64 || X86) && HOTPLUG_CPU && PCI && SMP
+   help
+ This driver consists of support for enclave lifetime management
+ for Nitro Enclaves (NE).
+
+ To compile this driver as a module, choose M here.
+ The module will be called nitro_enclaves.
-- 
2.20.1 (Apple Git-117)




Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar 
Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in 
Romania. Registration number J22/2621/2005.



[PATCH v6 15/18] nitro_enclaves: Add Makefile for the Nitro Enclaves driver

2020-08-05 Thread Andra Paraschiv
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v5 -> v6

* No changes.

v4 -> v5

* No changes.

v3 -> v4

* No changes.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Update path to Makefile to match the drivers/virt/nitro_enclaves
  directory.
---
 drivers/virt/Makefile|  2 ++
 drivers/virt/nitro_enclaves/Makefile | 11 +++
 2 files changed, 13 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/Makefile

diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
index fd331247c27a..f28425ce4b39 100644
--- a/drivers/virt/Makefile
+++ b/drivers/virt/Makefile
@@ -5,3 +5,5 @@
 
 obj-$(CONFIG_FSL_HV_MANAGER)   += fsl_hypervisor.o
 obj-y  += vboxguest/
+
+obj-$(CONFIG_NITRO_ENCLAVES)   += nitro_enclaves/
diff --git a/drivers/virt/nitro_enclaves/Makefile b/drivers/virt/nitro_enclaves/Makefile
new file mode 100644
index ..e9f4fcd1591e
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/Makefile
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+
+# Enclave lifetime management support for Nitro Enclaves (NE).
+
+obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves.o
+
+nitro_enclaves-y := ne_pci_dev.o ne_misc_dev.o
+
+ccflags-y += -Wall
-- 
2.20.1 (Apple Git-117)







[PATCH v6 17/18] nitro_enclaves: Add overview documentation

2020-08-05 Thread Andra Paraschiv
Signed-off-by: Andra Paraschiv 
---
Changelog

v5 -> v6

* No changes.

v4 -> v5

* No changes.

v3 -> v4

* Update doc type from .txt to .rst.
* Update documentation based on the changes from v4.

v2 -> v3

* No changes.

v1 -> v2

* New in v2.
---
 Documentation/nitro_enclaves/ne_overview.rst | 87 
 1 file changed, 87 insertions(+)
 create mode 100644 Documentation/nitro_enclaves/ne_overview.rst

diff --git a/Documentation/nitro_enclaves/ne_overview.rst b/Documentation/nitro_enclaves/ne_overview.rst
new file mode 100644
index ..9cc7a2720955
--- /dev/null
+++ b/Documentation/nitro_enclaves/ne_overview.rst
@@ -0,0 +1,87 @@
+Nitro Enclaves
+==
+
+Nitro Enclaves (NE) is a new Amazon Elastic Compute Cloud (EC2) capability
+that allows customers to carve out isolated compute environments within EC2
+instances [1].
+
+For example, an application that processes sensitive data and runs in a VM
+can be separated from other applications running in the same VM. This
+application then runs in a separate VM from the primary VM, namely an enclave.
+
+An enclave runs alongside the VM that spawned it. This setup matches the needs
+of low latency applications. The resources that are allocated for the enclave, such as
+memory and CPUs, are carved out of the primary VM. Each enclave is mapped to a
+process running in the primary VM, that communicates with the NE driver via an
+ioctl interface.
+
+In this sense, there are two components:
+
+1. An enclave abstraction process - a user space process running in the primary
+VM guest that uses the provided ioctl interface of the NE driver to spawn an
+enclave VM (that's 2 below).
+
+There is a NE emulated PCI device exposed to the primary VM. The driver for
+this new PCI device is included in the NE driver.
+
+The ioctl logic is mapped to PCI device commands, e.g. the NE_START_ENCLAVE
+ioctl maps to an enclave start PCI command. The PCI device commands are then
+translated into actions taken on the hypervisor side; that's the Nitro
+hypervisor running on the host where the primary VM is running. The Nitro
+hypervisor is based on core KVM technology.
+
+2. The enclave itself - a VM running on the same host as the primary VM that
+spawned it. Memory and CPUs are carved out of the primary VM and are dedicated
+for the enclave VM. An enclave does not have persistent storage attached.
+
+The memory regions carved out of the primary VM and given to an enclave need to
+be 2 MiB / 1 GiB aligned, physically contiguous memory regions (or a multiple of
+this size, e.g. 8 MiB). The memory can be allocated e.g. by using hugetlbfs from
+user space [2][3]. The memory size for an enclave needs to be at least 64 MiB.
+The enclave memory and CPUs need to be from the same NUMA node.
+
+An enclave runs on dedicated cores. CPU 0 and its CPU siblings need to remain
+available for the primary VM. A CPU pool has to be set for NE purposes by a
+user with admin capability. See the cpu list section from the kernel
+documentation [4] for how a CPU pool format looks.
+
+An enclave communicates with the primary VM via a local communication channel,
+using virtio-vsock [5]. The primary VM has a virtio-pci vsock emulated device,
+while the enclave VM has a virtio-mmio vsock emulated device. The vsock device
+uses eventfd for signaling. The enclave VM sees the usual interfaces - local
+APIC and IOAPIC - to get interrupts from virtio-vsock device. The virtio-mmio
+device is placed in memory below the typical 4 GiB.
+
+The application that runs in the enclave needs to be packaged in an enclave
+image together with the OS (e.g. kernel, ramdisk, init) that will run in the
+enclave VM. The enclave VM has its own kernel and follows the standard Linux
+boot protocol.
+
+The kernel bzImage, the kernel command line, the ramdisk(s) are part of the
+Enclave Image Format (EIF); plus an EIF header including metadata such as magic
+number, EIF version, image size and CRC.
+
+Hash values are computed for the entire enclave image (EIF), the kernel and
+ramdisk(s). That's used, for example, to check that the enclave image that is
+loaded in the enclave VM is the one that was intended to be run.
+
+These crypto measurements are included in a signed attestation document
+generated by the Nitro Hypervisor and further used to prove the identity of the
+enclave; KMS is an example of a service that NE is integrated with and that
+checks the attestation doc.
+
+The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The
+init process in the enclave connects to the vsock CID of the primary VM and a
+predefined port - 9000 - to send a heartbeat value - 0xb7. This mechanism is
+used to check in the primary VM that the enclave has booted.
+
+If the enclave VM crashes or gracefully exits, an interrupt event is received by
+the NE driver. This event is sent further to the user space enclave process
+running in the primary VM via a poll notification mechanism. Then

[PATCH v6 16/18] nitro_enclaves: Add sample for ioctl interface usage

2020-08-05 Thread Andra Paraschiv
Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
---
Changelog

v5 -> v6

* Remove "rc" mentioning when printing errno string.
* Remove the ioctl to query API version.
* Include usage info for NUMA-aware hugetlb configuration.
* Update documentation to kernel-doc format.
* Add logic for enclave image loading.

v4 -> v5

* Print enclave vCPU ids when they are created.
* Update logic to map the modified vCPU ioctl call.
* Add check for the path to the enclave image to be less than PATH_MAX.
* Update the ioctl calls error checking logic to match the NE specific
  error codes.

v3 -> v4

* Update usage details to match the updates in v4.
* Update NE ioctl interface usage.

v2 -> v3

* Remove the include directory to use the uapi from the kernel.
* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* New in v2.
---
 samples/nitro_enclaves/.gitignore|   2 +
 samples/nitro_enclaves/Makefile  |  16 +
 samples/nitro_enclaves/ne_ioctl_sample.c | 853 +++
 3 files changed, 871 insertions(+)
 create mode 100644 samples/nitro_enclaves/.gitignore
 create mode 100644 samples/nitro_enclaves/Makefile
 create mode 100644 samples/nitro_enclaves/ne_ioctl_sample.c

diff --git a/samples/nitro_enclaves/.gitignore b/samples/nitro_enclaves/.gitignore
new file mode 100644
index ..827934129c90
--- /dev/null
+++ b/samples/nitro_enclaves/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0
+ne_ioctl_sample
diff --git a/samples/nitro_enclaves/Makefile b/samples/nitro_enclaves/Makefile
new file mode 100644
index ..a3ec78fefb52
--- /dev/null
+++ b/samples/nitro_enclaves/Makefile
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+
+# Enclave lifetime management support for Nitro Enclaves (NE) - ioctl sample
+# usage.
+
+.PHONY: all clean
+
+CFLAGS += -Wall
+
+all:
+   $(CC) $(CFLAGS) -o ne_ioctl_sample ne_ioctl_sample.c -lpthread
+
+clean:
+   rm -f ne_ioctl_sample
diff --git a/samples/nitro_enclaves/ne_ioctl_sample.c b/samples/nitro_enclaves/ne_ioctl_sample.c
new file mode 100644
index ..3305ca5696b8
--- /dev/null
+++ b/samples/nitro_enclaves/ne_ioctl_sample.c
@@ -0,0 +1,853 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+/**
+ * DOC: Sample flow of using the ioctl interface provided by the Nitro
+ * Enclaves (NE) kernel driver.
+ *
+ * Usage
+ * -
+ *
+ * Load the nitro_enclaves module, setting also the enclave CPU pool. The
+ * enclave CPUs need to be full cores from the same NUMA node. CPU 0 and its
+ * siblings have to remain available for the primary / parent VM, so they
+ * cannot be included in the enclave CPU pool.
+ *
+ * See the cpu list section from the kernel documentation.
+ * https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html#cpu-lists
+ *
+ * insmod drivers/virt/nitro_enclaves/nitro_enclaves.ko
+ * lsmod
+ *
+ * The CPU pool can be set at runtime, after the kernel module is loaded.
+ *
+ * echo  > /sys/module/nitro_enclaves/parameters/ne_cpus
+ *
+ * NUMA and CPU siblings information can be found using:
+ *
+ * lscpu
+ * /proc/cpuinfo
+ *
+ * Check the online / offline CPU list. The CPUs from the pool should be
+ * offlined.
+ *
+ * lscpu
+ *
+ * Check dmesg for any warnings / errors through the NE driver lifetime /
+ * usage. The NE logs contain the "nitro_enclaves" or "pci 0000:00:02.0" pattern.
+ *
+ * dmesg
+ *
+ * Setup hugetlbfs huge pages. The memory needs to be from the same NUMA node
+ * as the enclave CPUs.
+ * https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
+ * By default, the allocation of hugetlb pages is distributed across all possible
+ * NUMA nodes. Use the following configuration files to set the number of huge
+ * pages from a NUMA node:
+ *
+ * /sys/devices/system/node/node/hugepages/hugepages-2048kB/nr_hugepages
+ * /sys/devices/system/node/node/hugepages/hugepages-1048576kB/nr_hugepages
+ *
+ * or, if not on a system with multiple NUMA nodes, can also set the number
+ * of 2 MiB / 1 GiB huge pages using
+ *
+ * /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
+ * /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
+ *
+ * In this example 256 hugepages of 2 MiB are used.
+ *
+ * Build and run the NE sample.
+ *
+ * make -C samples/nitro_enclaves clean
+ * make -C samples/nitro_enclaves
+ * ./samples/nitro_enclaves/ne_ioctl_sample 
+ *
+ * Unload the nitro_enclaves module.
+ *
+ * rmmod nitro_enclaves
+ * lsmod
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+

[PATCH v6 18/18] MAINTAINERS: Add entry for the Nitro Enclaves driver

2020-08-05 Thread Andra Paraschiv
Signed-off-by: Andra Paraschiv 
---
Changelog

v5 -> v6

* No changes.

v4 -> v5

* No changes.

v3 -> v4

* No changes.

v2 -> v3

* Update file entries to be in alphabetical order.

v1 -> v2

* No changes.
---
 MAINTAINERS | 13 +
 1 file changed, 13 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4e2698cc7e23..0d83aaad3cd4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12127,6 +12127,19 @@ S: Maintained
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2.git
 F: arch/nios2/
 
+NITRO ENCLAVES (NE)
+M: Andra Paraschiv 
+M: Alexandru Vasile 
+M: Alexandru Ciobotaru 
+L: linux-kernel@vger.kernel.org
+S: Supported
+W: https://aws.amazon.com/ec2/nitro/nitro-enclaves/
+F: Documentation/nitro_enclaves/
+F: drivers/virt/nitro_enclaves/
+F: include/linux/nitro_enclaves.h
+F: include/uapi/linux/nitro_enclaves.h
+F: samples/nitro_enclaves/
+
 NOHZ, DYNTICKS SUPPORT
 M: Frederic Weisbecker 
 M: Thomas Gleixner 
-- 
2.20.1 (Apple Git-117)







[PATCH v6 10/18] nitro_enclaves: Add logic for getting the enclave image load info

2020-08-05 Thread Andra Paraschiv
Before setting the memory regions for the enclave, the enclave image
needs to be placed in memory. After the memory regions are set, this
memory cannot be used anymore by the VM, being carved out.

Add ioctl command logic to get the offset in enclave memory where to
place the enclave image. Then the user space tooling copies the enclave
image in the memory using the given memory offset.

Signed-off-by: Andra Paraschiv 
---
Changelog

v5 -> v6

* Check for invalid enclave image load flags.

v4 -> v5

* Check for the enclave not being started when invoking this ioctl call.
* Remove log on copy_from_user() / copy_to_user() failure.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Set enclave image load offset based on flags.
* Update the naming for the ioctl command from metadata to info.

v2 -> v3

* No changes.

v1 -> v2

* New in v2.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 30 +++
 1 file changed, 30 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index 4787bc59d39d..850e5e0ce0e9 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -652,6 +652,36 @@ static long ne_enclave_ioctl(struct file *file, unsigned int cmd, unsigned long
return 0;
}
 
+   case NE_GET_IMAGE_LOAD_INFO: {
+   struct ne_image_load_info image_load_info = {};
+
+   if (copy_from_user(&image_load_info, (void __user *)arg, sizeof(image_load_info)))
+           return -EFAULT;
+
+   mutex_lock(&ne_enclave->enclave_info_mutex);
+
+   if (ne_enclave->state != NE_STATE_INIT) {
+           dev_err_ratelimited(ne_misc_dev.this_device,
+                               "Enclave is not in init state\n");
+
+           mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+           return -NE_ERR_NOT_IN_INIT_STATE;
+   }
+
+   mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+   if (image_load_info.flags == NE_EIF_IMAGE)
+   image_load_info.memory_offset = NE_EIF_LOAD_OFFSET;
+   else
+   return -EINVAL;
+
+   if (copy_to_user((void __user *)arg, &image_load_info, sizeof(image_load_info)))
+           return -EFAULT;
+
+   return 0;
+   }
+
default:
return -ENOTTY;
}
-- 
2.20.1 (Apple Git-117)







[PATCH v6 12/18] nitro_enclaves: Add logic for starting an enclave

2020-08-05 Thread Andra Paraschiv
After all the enclave resources are set, the enclave is ready for
beginning to run.

Add ioctl command logic for starting an enclave after all its resources,
memory regions and CPUs, have been set.

The enclave start information includes the local channel addressing -
vsock CID - and the flags associated with the enclave.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
---
Changelog

v5 -> v6

* Check for invalid enclave start flags.
* Update documentation to kernel-doc format.

v4 -> v5

* Add early exit on enclave start ioctl function call error.
* Move sanity checks in the enclave start ioctl function, outside of the
  switch-case block.
* Remove log on copy_from_user() / copy_to_user() failure.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Update the naming for the ioctl command from metadata to info.
* Check for minimum enclave memory size.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.

v1 -> v2

* Add log pattern for NE.
* Check if enclave state is init when starting an enclave.
* Remove the BUG_ON calls.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 109 ++
 1 file changed, 109 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index 88536a415246..c7eb1146a6c0 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -820,6 +820,77 @@ static int ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave,
return rc;
 }
 
+/**
+ * ne_start_enclave_ioctl() - Trigger enclave start after the enclave resources,
+ *   such as memory and CPU, have been set.
+ * @ne_enclave :   Private data associated with the current enclave.
+ * @enclave_start_info :   Enclave info that includes enclave cid and flags.
+ *
+ * Context: Process context. This function is called with the ne_enclave mutex held.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_start_enclave_ioctl(struct ne_enclave *ne_enclave,
+   struct ne_enclave_start_info *enclave_start_info)
+{
+   struct ne_pci_dev_cmd_reply cmd_reply = {};
+   unsigned int cpu = 0;
+   struct enclave_start_req enclave_start_req = {};
+   unsigned int i = 0;
+   int rc = -EINVAL;
+
+   if (!ne_enclave->nr_mem_regions) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Enclave has no mem regions\n");
+
+   return -NE_ERR_NO_MEM_REGIONS_ADDED;
+   }
+
+   if (ne_enclave->mem_size < NE_MIN_ENCLAVE_MEM_SIZE) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Enclave memory is less than %ld\n",
+   NE_MIN_ENCLAVE_MEM_SIZE);
+
+   return -NE_ERR_ENCLAVE_MEM_MIN_SIZE;
+   }
+
+   if (!ne_enclave->nr_vcpus) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Enclave has no vCPUs\n");
+
+   return -NE_ERR_NO_VCPUS_ADDED;
+   }
+
+   for (i = 0; i < ne_enclave->avail_cpu_cores_size; i++)
+   for_each_cpu(cpu, ne_enclave->avail_cpu_cores[i])
+   if (!cpumask_test_cpu(cpu, ne_enclave->vcpu_ids)) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+                                   "Full CPU cores not used\n");
+
+   return -NE_ERR_FULL_CORES_NOT_USED;
+   }
+
+   enclave_start_req.enclave_cid = enclave_start_info->enclave_cid;
+   enclave_start_req.flags = enclave_start_info->flags;
+   enclave_start_req.slot_uid = ne_enclave->slot_uid;
+
+   rc = ne_do_request(ne_enclave->pdev, ENCLAVE_START, &enclave_start_req,
+                      sizeof(enclave_start_req), &cmd_reply, sizeof(cmd_reply));
+   if (rc < 0) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Error in enclave start [rc=%d]\n", rc);
+
+   return rc;
+   }
+
+   ne_enclave->state = NE_STATE_RUNNING;
+
+   enclave_start_info->enclave_cid = cmd_reply.enclave_cid;
+
+   return 0;
+}
+
 /**
  * ne_enclave_ioctl() - Ioctl function provided by the enclave file.
  * @file:  File associated with this ioctl function.
@@ -967,6 +1038,44 @@ static long ne_enclave_ioctl(struct file *file, unsigned int cmd, unsigned long
return 0;
}
 
+   case NE_START_ENCLAVE: {
+   struct ne_enclave_start_info enclave_start_info = {};
+   int rc = -EINVAL;
+
+   if (copy_from_user(&enclave_start_info, (void __user *)arg,
+                      sizeof(enclave_start_info)))
+

[PATCH v6 08/18] nitro_enclaves: Add logic for creating an enclave VM

2020-08-05 Thread Andra Paraschiv
Add ioctl command logic for enclave VM creation. It triggers a slot
allocation. The enclave resources will be associated with this slot and
it will be used as an identifier for triggering enclave run.

Return a file descriptor, namely enclave fd. This is further used by the
associated user space enclave process to set enclave resources and
trigger enclave termination.

The poll function is implemented in order to notify the enclave process
when an enclave exits without a specific enclave termination command
trigger e.g. when an enclave crashes.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v5 -> v6

* Update the code base to init the ioctl function in this patch.
* Update documentation to kernel-doc format.

v4 -> v5

* Release the reference to the NE PCI device on create VM error.
* Close enclave fd on copy_to_user() failure; rename fd to enclave fd
  while at it.
* Remove sanity checks for situations that shouldn't happen, only if
  buggy system or broken logic at all.
* Remove log on copy_to_user() failure.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.
* Add metadata for the NUMA node for the enclave memory and CPUs.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().
* Remove file ops that do nothing for now - open.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 229 ++
 1 file changed, 229 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index 472850250220..6c8c12f65666 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -87,9 +87,238 @@ struct ne_cpu_pool {
 
 static struct ne_cpu_pool ne_cpu_pool;
 
+/**
+ * ne_enclave_poll() - Poll functionality used for enclave out-of-band events.
+ * @file:  File associated with this poll function.
+ * @wait:  Poll table data structure.
+ *
+ * Context: Process context.
+ * Return:
+ * * Poll mask.
+ */
+static __poll_t ne_enclave_poll(struct file *file, poll_table *wait)
+{
+   __poll_t mask = 0;
+   struct ne_enclave *ne_enclave = file->private_data;
+
+   poll_wait(file, &ne_enclave->eventq, wait);
+
+   if (!ne_enclave->has_event)
+   return mask;
+
+   mask = POLLHUP;
+
+   return mask;
+}
+
+static const struct file_operations ne_enclave_fops = {
+   .owner  = THIS_MODULE,
+   .llseek = noop_llseek,
+   .poll   = ne_enclave_poll,
+};
+
+/**
+ * ne_create_vm_ioctl() - Alloc slot to be associated with an enclave. Create
+ *   enclave file descriptor to be further used for enclave
+ *   resources handling e.g. memory regions and CPUs.
+ * @pdev:  PCI device used for enclave lifetime management.
+ * @ne_pci_dev :   Private data associated with the PCI device.
+ * @slot_uid:  Generated unique slot id associated with an enclave.
+ *
+ * Context: Process context. This function is called with the ne_pci_dev enclave
+ * mutex held.
+ * Return:
+ * * Enclave fd on success.
+ * * Negative return value on failure.
+ */
+static int ne_create_vm_ioctl(struct pci_dev *pdev, struct ne_pci_dev *ne_pci_dev,
+ u64 *slot_uid)
+{
+   struct ne_pci_dev_cmd_reply cmd_reply = {};
+   int enclave_fd = -1;
+   struct file *enclave_file = NULL;
+   unsigned int i = 0;
+   struct ne_enclave *ne_enclave = NULL;
+   int rc = -EINVAL;
+   struct slot_alloc_req slot_alloc_req = {};
+
+   mutex_lock(&ne_cpu_pool.mutex);
+
+   for (i = 0; i < ne_cpu_pool.avail_cores_size; i++)
+   if (!cpumask_empty(ne_cpu_pool.avail_cores[i]))
+   break;
+
+   if (i == ne_cpu_pool.avail_cores_size) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "No CPUs available in CPU pool\n");
+
+   mutex_unlock(&ne_cpu_pool.mutex);
+
+   return -NE_ERR_NO_CPUS_AVAIL_IN_POOL;
+   }
+
+   mutex_unlock(&ne_cpu_pool.mutex);
+
+   ne_enclave = kzalloc(sizeof(*ne_enclave), GFP_KERNEL);
+   if (!ne_enclave)
+   return -ENOMEM;
+
+   mutex_lock(&ne_cpu_pool.mutex);
+
+   ne_enclave->avail_cpu_cores_size = ne_cpu_pool.avail_cores_size;
+   ne_enclave->numa_node = ne_cpu_pool.numa_node;
+
+   mutex_unlock(&ne_cpu_pool.mutex);
+
+   ne_enclave->avail_cpu_cores = kcalloc(ne_enclave->avail_cpu_cores_size,
+   sizeof(*ne_enclave->avail_cpu_cores), GFP_KERNEL);
+   if (!ne_enclave->avail_cpu_cores) {
+   rc = -ENOMEM;
+
+   goto free_ne_

[PATCH v6 07/18] nitro_enclaves: Init misc device providing the ioctl interface

2020-08-05 Thread Andra Paraschiv
The Nitro Enclaves driver provides an ioctl interface to the user space
for enclave lifetime management e.g. enclave creation / termination and
setting enclave resources such as memory and CPU.

This ioctl interface is mapped to a Nitro Enclaves misc device.

Signed-off-by: Andra Paraschiv 
---
Changelog

v5 -> v6

* Remove the ioctl to query API version.
* Update documentation to kernel-doc format.

v4 -> v5

* Update the size of the NE CPU pool string from 4096 to 512 chars.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Remove the NE CPU pool init during kernel module loading, as the CPU
  pool is now setup at runtime, via a sysfs file for the kernel
  parameter.
* Add minimum enclave memory size definition.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.
* Remove the WARN_ON calls.
* Remove linux/bug and linux/kvm_host includes that are not needed.
* Remove "ratelimited" from the logs that are not in the ioctl call
  paths.
* Remove file ops that do nothing for now - open and release.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Update ne_cpu_pool data structure to include the global mutex.
* Update NE misc device mode to 0660.
* Check if the CPU siblings are included in the NE CPU pool, as full CPU
  cores are given for the enclave(s).
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 121 ++
 drivers/virt/nitro_enclaves/ne_pci_dev.c  |  11 ++
 2 files changed, 132 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.c

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
new file mode 100644
index ..472850250220
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -0,0 +1,121 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+/**
+ * DOC: Enclave lifetime management driver for Nitro Enclaves (NE).
+ * Nitro is a hypervisor that has been developed by Amazon.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ne_misc_dev.h"
+#include "ne_pci_dev.h"
+
+/**
+ * NE_CPUS_SIZE - Size for max 128 CPUs, for now, in a cpu-list string, comma
+ *   separated. The NE CPU pool includes CPUs from a single NUMA
+ *   node.
+ */
+#define NE_CPUS_SIZE   (512)
+
+/**
+ * NE_EIF_LOAD_OFFSET - The offset where to copy the Enclave Image Format (EIF)
+ * image in enclave memory.
+ */
+#define NE_EIF_LOAD_OFFSET (8 * 1024UL * 1024UL)
+
+/**
+ * NE_MIN_ENCLAVE_MEM_SIZE - The minimum memory size an enclave can be launched
+ *  with.
+ */
+#define NE_MIN_ENCLAVE_MEM_SIZE(64 * 1024UL * 1024UL)
+
+/**
+ * NE_MIN_MEM_REGION_SIZE - The minimum size of an enclave memory region.
+ */
+#define NE_MIN_MEM_REGION_SIZE (2 * 1024UL * 1024UL)
+
+/*
+ * TODO: Update logic to create new sysfs entries instead of using
+ * a kernel parameter e.g. if multiple sysfs files needed.
+ */
+static const struct kernel_param_ops ne_cpu_pool_ops = {
+   .get= param_get_string,
+};
+
+static char ne_cpus[NE_CPUS_SIZE];
+static struct kparam_string ne_cpus_arg = {
+   .maxlen = sizeof(ne_cpus),
+   .string = ne_cpus,
+};
+
+module_param_cb(ne_cpus, &ne_cpu_pool_ops, &ne_cpus_arg, 0644);
+/* https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html#cpu-lists */
+MODULE_PARM_DESC(ne_cpus, " - CPU pool used for Nitro Enclaves");
+
+/**
+ * struct ne_cpu_pool - CPU pool used for Nitro Enclaves.
+ * @avail_cores:   Available CPU cores in the pool.
+ * @avail_cores_size:  The size of the available cores array.
+ * @mutex: Mutex for the access to the NE CPU pool.
+ * @numa_node: NUMA node of the CPUs in the pool.
+ */
+struct ne_cpu_pool {
+   cpumask_var_t   *avail_cores;
+   unsigned intavail_cores_size;
+   struct mutexmutex;
+   int numa_node;
+};
+
+static struct ne_cpu_pool ne_cpu_pool;
+
+static const struct file_operations ne_fops = {
+   .owner  = THIS_MODULE,
+   .llseek = noop_llseek,
+};
+
+struct miscdevice ne_misc_dev = {
+   .minor  = MISC_DYNAMIC_MINOR,
+   .name   = "nitro_enclaves",
+   .fops   = &ne_fops,
+   .mode   = 0660,
+};
+
+static int __init ne_init(void)
+{
+   mutex_init(_cpu_pool.mutex);
+
+   return pci_register_driver(&ne_pci_driver);
+}
+
+static void __exit ne_exit(void)
+{
+   pci_unregister_driver(&ne_pci_driver);
+}
+
+/* TODO: Handle actions such as reboot, kexec. */
+
+module_init(ne_init);
+module_exit(ne_exit);
+
+MODULE_AUTHOR("Amazon.com, Inc. or its affiliates");
+MODULE_DESCRIPTION("Nitro Enclaves Driver");

[PATCH v6 11/18] nitro_enclaves: Add logic for setting an enclave memory region

2020-08-05 Thread Andra Paraschiv
Another resource that is being set for an enclave is memory. User space
memory regions, that need to be backed by contiguous memory regions,
are associated with the enclave.

One solution for allocating / reserving contiguous memory regions, that
is used for integration, is hugetlbfs. The user space process that is
associated with the enclave passes to the driver these memory regions.

The enclave memory regions need to be from the same NUMA node as the
enclave CPUs.

Add ioctl command logic for setting user space memory region for an
enclave.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
---
Changelog

v5 -> v6

* Check for max number of pages allocated for the internal data
  structure for pages.
* Check for invalid memory region flags.
* Check for aligned physical memory regions.
* Update documentation to kernel-doc format.
* Check for duplicate user space memory regions.
* Use directly put_page() instead of unpin_user_pages(), to match the
  get_user_pages() calls.

v4 -> v5

* Add early exit on set memory region ioctl function call error.
* Remove log on copy_from_user() failure.
* Exit without unpinning the pages on NE PCI dev request failure as
  memory regions from the user space range may have already been added.
* Add check for the memory region user space address to be 2 MiB
  aligned.
* Update logic to not have a hardcoded check for 2 MiB memory regions.

v3 -> v4

* Check enclave memory regions are from the same NUMA node as the
  enclave CPUs.
* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
* Check if enclave max memory regions is reached when setting an enclave
  memory region.
* Check if enclave state is init when setting an enclave memory region.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 285 ++
 1 file changed, 285 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index 850e5e0ce0e9..88536a415246 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -568,6 +568,258 @@ static int ne_add_vcpu_ioctl(struct ne_enclave 
*ne_enclave, u32 vcpu_id)
return 0;
 }
 
+/**
+ * ne_sanity_check_user_mem_region() - Sanity check the user space memory
+ *region received during the set user
+ *memory region ioctl call.
+ * @ne_enclave :   Private data associated with the current enclave.
+ * @mem_region :   User space memory region to be sanity checked.
+ *
+ * Context: Process context. This function is called with the ne_enclave mutex held.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_sanity_check_user_mem_region(struct ne_enclave *ne_enclave,
+   struct ne_user_memory_region mem_region)
+{
+   struct ne_mem_region *ne_mem_region = NULL;
+
+   if (ne_enclave->mm != current->mm)
+   return -EIO;
+
+   if (mem_region.memory_size & (NE_MIN_MEM_REGION_SIZE - 1)) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "User space memory size is not multiple of 2 MiB\n");
+
+   return -NE_ERR_INVALID_MEM_REGION_SIZE;
+   }
+
+   if (!IS_ALIGNED(mem_region.userspace_addr, NE_MIN_MEM_REGION_SIZE)) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "User space address is not 2 MiB aligned\n");
+
+   return -NE_ERR_UNALIGNED_MEM_REGION_ADDR;
+   }
+
+   if ((mem_region.userspace_addr & (NE_MIN_MEM_REGION_SIZE - 1)) ||
+   !access_ok((void __user *)(unsigned long)mem_region.userspace_addr,
+  mem_region.memory_size)) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "Invalid user space address range\n");
+
+   return -NE_ERR_INVALID_MEM_REGION_ADDR;
+   }
+
+   list_for_each_entry(ne_mem_region, &ne_enclave->mem_regions_list,
+   mem_region_list_entry) {
+   u64 memory_size = ne_mem_region->memory_size;
+   u64 userspace_addr = ne_mem_region->userspace_addr;
+
+   if (userspace_addr <= mem_region.userspace_addr &&
+   mem_region.userspace_addr < (userspace_addr + memory_size)) {
+   dev_err_ratelimited(ne_misc_dev.this_device,
+   "User space memory region already used\n");
+
+   return -NE_ERR_MEM_REGION_ALREADY_USED;
+   }
+ 

[PATCH v6 09/18] nitro_enclaves: Add logic for setting an enclave vCPU

2020-08-05 Thread Andra Paraschiv
An enclave, before being started, has its resources set. One of its
resources is CPU.

A NE CPU pool is set and enclave CPUs are chosen from it. Offline the
CPUs from the NE CPU pool during the pool setup and online them back
during the NE CPU pool teardown. The CPU offline is necessary so that
there would not be more vCPUs than physical CPUs available to the
primary / parent VM. In that case the CPUs would be overcommitted and
would change the initial configuration of the primary / parent VM of
having dedicated vCPUs to physical CPUs.

The enclave CPUs need to be full cores and from the same NUMA node. CPU
0 and its siblings have to remain available to the primary / parent VM.

Add ioctl command logic for setting an enclave vCPU.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
---
Changelog

v5 -> v6

* Check CPUs are from the same NUMA node before going through CPU
  siblings during the NE CPU pool setup.
* Update documentation to kernel-doc format.

v4 -> v5

* Set empty string in case of invalid NE CPU pool.
* Clear NE CPU pool mask on pool setup failure.
* Setup NE CPU cores out of the NE CPU pool.
* Early exit on NE CPU pool setup if enclave(s) already running.
* Remove sanity checks for situations that cannot happen unless the
  system or the driver logic is buggy.
* Add check for maximum vCPU id possible before looking into the CPU
  pool.
* Remove log on copy_from_user() / copy_to_user() failure and on admin
  capability check for setting the NE CPU pool.
* Update the ioctl call to not create a file descriptor for the vCPU.
* Split the CPU pool usage logic in 2 separate functions - one to get a
  CPU from the pool and the other to check the given CPU is available in
  the pool.

v3 -> v4

* Setup the NE CPU pool at runtime via a sysfs file for the kernel
  parameter.
* Check enclave CPUs to be from the same NUMA node.
* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().
* Remove file ops that do nothing for now - open, ioctl and release.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
* Check if enclave state is init when setting enclave vCPU.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 575 ++
 1 file changed, 575 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index 6c8c12f65666..4787bc59d39d 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -57,8 +57,11 @@
  * TODO: Update logic to create new sysfs entries instead of using
  * a kernel parameter e.g. if multiple sysfs files needed.
  */
+static int ne_set_kernel_param(const char *val, const struct kernel_param *kp);
+
 static const struct kernel_param_ops ne_cpu_pool_ops = {
.get= param_get_string,
+   .set= ne_set_kernel_param,
 };
 
 static char ne_cpus[NE_CPUS_SIZE];
@@ -87,6 +90,575 @@ struct ne_cpu_pool {
 
 static struct ne_cpu_pool ne_cpu_pool;
 
+/**
+ * ne_check_enclaves_created() - Verify if at least one enclave has been created.
+ * @void:  No parameters provided.
+ *
+ * Context: Process context.
+ * Return:
+ * * True if at least one enclave is created.
+ * * False otherwise.
+ */
+static bool ne_check_enclaves_created(void)
+{
+   struct ne_pci_dev *ne_pci_dev = NULL;
+   /* TODO: Find another way to get the NE PCI device reference. */
+   struct pci_dev *pdev = pci_get_device(PCI_VENDOR_ID_AMAZON, PCI_DEVICE_ID_NE, NULL);
+   bool ret = false;
+
+   if (!pdev)
+   return ret;
+
+   ne_pci_dev = pci_get_drvdata(pdev);
+   if (!ne_pci_dev) {
+   pci_dev_put(pdev);
+
+   return ret;
+   }
+
+   mutex_lock(&ne_pci_dev->enclaves_list_mutex);
+
+   if (!list_empty(&ne_pci_dev->enclaves_list))
+   ret = true;
+
+   mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
+
+   pci_dev_put(pdev);
+
+   return ret;
+}
+
+/**
+ * ne_setup_cpu_pool() - Set the NE CPU pool after handling sanity checks such
+ *  as not sharing CPU cores with the primary / parent VM
+ *  or not using CPU 0, which should remain available for
+ *  the primary / parent VM. Offline the CPUs from the
+ *  pool after the checks passed.
+ * @ne_cpu_list:   The CPU list used for setting NE CPU pool.
+ *
+ * Context: Process context.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_setup_cpu_pool(const char *ne_cpu_list)
+{
+   int core_id = -1;
+   unsigned int cpu = 0;
+   cpumask_var_t cpu_pool = NULL;
+   unsigned int cpu_sibling = 0;
+   unsigned int i = 0;
+   int numa_node = -1;

[PATCH v6 06/18] nitro_enclaves: Handle out-of-band PCI device events

2020-08-05 Thread Andra Paraschiv
In addition to the replies sent by the Nitro Enclaves PCI device in
response to command requests, out-of-band enclave events can happen e.g.
an enclave crashes. In this case, the Nitro Enclaves driver needs to be
aware of the event and notify the corresponding user space process that
abstracts the enclave.

Register an MSI-X interrupt vector to be used for this kind of
out-of-band events. The interrupt notifies that the state of an enclave
changed and the driver logic scans each running enclave's state to
identify which one(s) the notification is intended for.

Create a workqueue to handle the out-of-band events. Notify the user
space enclave process that is using a polling mechanism on the enclave fd.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Andra Paraschiv 
---
Changelog

v5 -> v6

* Update documentation to kernel-doc format.

v4 -> v5

* Remove sanity checks for situations that cannot happen unless the
  system or the driver logic is buggy.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Return IRQ_NONE when interrupts are not handled.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call
  paths.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
---
 drivers/virt/nitro_enclaves/ne_pci_dev.c | 116 +++
 1 file changed, 116 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c 
b/drivers/virt/nitro_enclaves/ne_pci_dev.c
index 77ccbc43bce3..a898fae066d9 100644
--- a/drivers/virt/nitro_enclaves/ne_pci_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -214,6 +214,88 @@ static irqreturn_t ne_reply_handler(int irq, void *args)
return IRQ_HANDLED;
 }
 
+/**
+ * ne_event_work_handler() - Work queue handler for notifying enclaves on a
+ *  state change received by the event interrupt
+ *  handler.
+ * @work:  Item containing the NE PCI device for which an out-of-band event
+ * was issued.
+ *
+ * An out-of-band event is being issued by the Nitro Hypervisor when at least
+ * one enclave is changing state without client interaction.
+ *
+ * Context: Work queue context.
+ */
+static void ne_event_work_handler(struct work_struct *work)
+{
+   struct ne_pci_dev_cmd_reply cmd_reply = {};
+   struct ne_enclave *ne_enclave = NULL;
+   struct ne_pci_dev *ne_pci_dev =
+   container_of(work, struct ne_pci_dev, notify_work);
+   int rc = -EINVAL;
+   struct slot_info_req slot_info_req = {};
+
+   mutex_lock(&ne_pci_dev->enclaves_list_mutex);
+
+   /*
+* Iterate over all enclaves registered for the Nitro Enclaves
+* PCI device and determine which enclave(s) the out-of-band
+* event corresponds to.
+*/
+   list_for_each_entry(ne_enclave, &ne_pci_dev->enclaves_list, enclave_list_entry) {
+   mutex_lock(&ne_enclave->enclave_info_mutex);
+
+   /*
+* Enclaves that were never started cannot receive out-of-band
+* events.
+*/
+   if (ne_enclave->state != NE_STATE_RUNNING)
+   goto unlock;
+
+   slot_info_req.slot_uid = ne_enclave->slot_uid;
+
+   rc = ne_do_request(ne_enclave->pdev, SLOT_INFO, &slot_info_req,
+  sizeof(slot_info_req), &cmd_reply, sizeof(cmd_reply));
+   if (rc < 0)
+   dev_err(&ne_enclave->pdev->dev, "Error in slot info [rc=%d]\n", rc);
+
+   /* Notify enclave process that the enclave state changed. */
+   if (ne_enclave->state != cmd_reply.state) {
+   ne_enclave->state = cmd_reply.state;
+
+   ne_enclave->has_event = true;
+
+   wake_up_interruptible(&ne_enclave->eventq);
+   }
+
+unlock:
+mutex_unlock(&ne_enclave->enclave_info_mutex);
+   }
+
+   mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
+}
+
+/**
+ * ne_event_handler() - Interrupt handler for PCI device out-of-band events.
+ * This interrupt does not supply any data in the MMIO
+ * region. It notifies a change in the state of any of
+ * the launched enclaves.
+ * @irq:   Received interrupt for an out-of-band event.
+ * @args:  PCI device private data structure.
+ *
+ * Context: Interrupt context.
+ * Return:
+ * * IRQ_HANDLED on handled interrupt.
+ */
+static irqreturn_t ne_event_handler(int irq, void *args)
+{
+   struct ne_pci_dev *ne_pci_dev = (struct ne_pci_dev *)args;
+
+   queue_work(ne_pci_dev->event_wq, &ne_pci_dev->notify_work);
+
+   return IRQ_HANDLED;
+}
+
 /**
  * ne_setup_msix() - Setup MSI-X vectors for the PCI device.
  * @pdev:  PCI device to s

[PATCH v6 03/18] nitro_enclaves: Define enclave info for internal bookkeeping

2020-08-05 Thread Andra Paraschiv
The Nitro Enclaves driver keeps an internal info per each enclave.

This is needed to be able to manage enclave resources state, enclave
notifications and have a reference of the PCI device that handles
command requests for enclave lifetime management.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v5 -> v6

* Update documentation to kernel-doc format.
* Include in the enclave memory region data structure the user space
  address and size for duplicate user space memory regions checks.

v4 -> v5

* Include enclave cores field in the enclave metadata.
* Update the vCPU ids data structure to be a cpumask instead of a list.

v3 -> v4

* Add NUMA node field for an enclave metadata as the enclave memory and
  CPUs need to be from the same NUMA node.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Add enclave memory regions and vcpus count for enclave bookkeeping.
* Update ne_state comments to reflect NE_START_ENCLAVE ioctl naming
  update.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.h | 92 +++
 1 file changed, 92 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.h

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.h 
b/drivers/virt/nitro_enclaves/ne_misc_dev.h
new file mode 100644
index ..ae5882ae2e05
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.h
@@ -0,0 +1,92 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#ifndef _NE_MISC_DEV_H_
+#define _NE_MISC_DEV_H_
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * struct ne_mem_region - Entry in the enclave user space memory regions list.
+ * @mem_region_list_entry: Entry in the list of enclave memory regions.
+ * @memory_size:   Size of the user space memory region.
+ * @nr_pages:  Number of pages that make up the memory region.
+ * @pages: Pages that make up the user space memory region.
+ * @userspace_addr:User space address of the memory region.
+ */
+struct ne_mem_region {
+   struct list_headmem_region_list_entry;
+   u64 memory_size;
+   unsigned long   nr_pages;
+   struct page **pages;
+   u64 userspace_addr;
+};
+
+/**
+ * struct ne_enclave - Per-enclave data used for enclave lifetime management.
+ * @avail_cpu_cores:   Available CPU cores for the enclave.
+ * @avail_cpu_cores_size:  The size of the available cores array.
+ * @enclave_info_mutex :   Mutex for accessing this internal state.
+ * @enclave_list_entry :   Entry in the list of created enclaves.
+ * @eventq:Wait queue used for out-of-band event notifications
+ * triggered from the PCI device event handler to
+ * the enclave process via the poll function.
+ * @has_event: Variable used to determine if the out-of-band event
+ * was triggered.
+ * @max_mem_regions:   The maximum number of memory regions that can be
+ * handled by the hypervisor.
+ * @mem_regions_list:  Enclave user space memory regions list.
+ * @mem_size:  Enclave memory size.
+ * @mm :   Enclave process abstraction mm data struct.
+ * @nr_mem_regions:Number of memory regions associated with the enclave.
+ * @nr_vcpus:  Number of vcpus associated with the enclave.
+ * @numa_node: NUMA node of the enclave memory and CPUs.
+ * @pdev:  PCI device used for enclave lifetime management.
+ * @slot_uid:  Slot unique id mapped to the enclave.
+ * @state: Enclave state, updated during enclave lifetime.
+ * @vcpu_ids:  Enclave vCPUs.
+ */
+struct ne_enclave {
+   cpumask_var_t   *avail_cpu_cores;
+   unsigned intavail_cpu_cores_size;
+   struct mutexenclave_info_mutex;
+   struct list_headenclave_list_entry;
+   wait_queue_head_t   eventq;
+   boolhas_event;
+   u64 max_mem_regions;
+   struct list_headmem_regions_list;
+   u64 mem_size;
+   struct mm_struct*mm;
+   u64 nr_mem_regions;
+   u64 nr_vcpus;
+   int numa_node;
+   struct pci_dev  *pdev;
+   u64 slot_uid;
+   u16 state;
+   cpumask_var_t   vcpu_ids;
+};
+
+/**
+ * enum ne_state - States available for an enclave.
+ * @NE_STATE_INIT: The enclave has not b

[PATCH v6 05/18] nitro_enclaves: Handle PCI device command requests

2020-08-05 Thread Andra Paraschiv
The Nitro Enclaves PCI device exposes an MMIO space that this driver
uses to submit command requests and to receive command replies e.g. for
enclave creation / termination or setting enclave resources.

Add logic for handling PCI device command requests based on the given
command type.

Register an MSI-X interrupt vector for command reply notifications to
handle this type of communication event.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v5 -> v6

* Update documentation to kernel-doc format.

v4 -> v5

* Remove sanity checks for situations that cannot happen unless the
  system or the driver logic is buggy.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Return IRQ_NONE when interrupts are not handled.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call
  paths.

v1 -> v2

* Add log pattern for NE.
* Remove the BUG_ON calls.
* Update goto labels to match their purpose.
* Add fix for kbuild report:
  https://lore.kernel.org/lkml/202004231644.xtmn4z1z%25...@intel.com/
---
 drivers/virt/nitro_enclaves/ne_pci_dev.c | 204 +++
 1 file changed, 204 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c 
b/drivers/virt/nitro_enclaves/ne_pci_dev.c
index 31650dcd592e..77ccbc43bce3 100644
--- a/drivers/virt/nitro_enclaves/ne_pci_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -33,6 +33,187 @@ static const struct pci_device_id ne_pci_ids[] = {
 
 MODULE_DEVICE_TABLE(pci, ne_pci_ids);
 
+/**
+ * ne_submit_request() - Submit command request to the PCI device based on the
+ *  command type.
+ * @pdev:  PCI device to send the command to.
+ * @cmd_type:  Command type of the request sent to the PCI device.
+ * @cmd_request:   Command request payload.
+ * @cmd_request_size:  Size of the command request payload.
+ *
+ * Context: Process context. This function is called with the ne_pci_dev mutex held.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_submit_request(struct pci_dev *pdev, enum ne_pci_dev_cmd_type cmd_type,
+void *cmd_request, size_t cmd_request_size)
+{
+   struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+   memcpy_toio(ne_pci_dev->iomem_base + NE_SEND_DATA, cmd_request, cmd_request_size);
+
+   iowrite32(cmd_type, ne_pci_dev->iomem_base + NE_COMMAND);
+
+   return 0;
+}
+
+/**
+ * ne_retrieve_reply() - Retrieve reply from the PCI device.
+ * @pdev:  PCI device to receive the reply from.
+ * @cmd_reply: Command reply payload.
+ * @cmd_reply_size:Size of the command reply payload.
+ *
+ * Context: Process context. This function is called with the ne_pci_dev mutex held.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_retrieve_reply(struct pci_dev *pdev, struct ne_pci_dev_cmd_reply *cmd_reply,
+size_t cmd_reply_size)
+{
+   struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+   memcpy_fromio(cmd_reply, ne_pci_dev->iomem_base + NE_RECV_DATA, cmd_reply_size);
+
+   return 0;
+}
+
+/**
+ * ne_wait_for_reply() - Wait for a reply of a PCI device command.
+ * @pdev:  PCI device for which a reply is waited.
+ *
+ * Context: Process context. This function is called with the ne_pci_dev mutex held.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_wait_for_reply(struct pci_dev *pdev)
+{
+   struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+   int rc = -EINVAL;
+
+   /*
+* TODO: Update to _interruptible and handle interrupted wait event
+* e.g. -ERESTARTSYS, incoming signals + update timeout, if needed.
+*/
+   rc = wait_event_timeout(ne_pci_dev->cmd_reply_wait_q,
+   atomic_read(&ne_pci_dev->cmd_reply_avail) != 0,
+   msecs_to_jiffies(NE_DEFAULT_TIMEOUT_MSECS));
+   if (!rc)
+   return -ETIMEDOUT;
+
+   return 0;
+}
+
+int ne_do_request(struct pci_dev *pdev, enum ne_pci_dev_cmd_type cmd_type,
+ void *cmd_request, size_t cmd_request_size,
+ struct ne_pci_dev_cmd_reply *cmd_reply, size_t cmd_reply_size)
+{
+   struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+   int rc = -EINVAL;
+
+   if (cmd_type <= INVALID_CMD || cmd_type >= MAX_CMD) {
+   dev_err_ratelimited(&pdev->dev, "Invalid cmd type=%u\n", cmd_type);
+
+   return -EINVAL;
+   }
+
+   if (!cmd_request) {
+   dev_err_ratelimited(&pdev->dev, "Null cmd request\n");
+
+   return -EINVAL;
+   }
+
+   if (cmd_request_size > NE_SEND_DATA_SIZE) {
+  

[PATCH v6 04/18] nitro_enclaves: Init PCI device driver

2020-08-05 Thread Andra Paraschiv
The Nitro Enclaves PCI device is used by the kernel driver as a means of
communication with the hypervisor on the host where the primary VM and
the enclaves run. It handles requests with regard to enclave lifetime.

Setup the PCI device driver and add support for MSI-X interrupts.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Alexandru Ciobotaru 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v5 -> v6

* Update documentation to kernel-doc format.

v4 -> v5

* Remove sanity checks for situations that cannot happen unless the
  system or the driver logic is buggy.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Update NE PCI driver name to "nitro_enclaves".

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.
* Remove the WARN_ON calls.
* Remove linux/bug include that is not needed.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call
  paths.
* Update kzfree() calls to kfree().

v1 -> v2

* Add log pattern for NE.
* Update PCI device setup functions to receive PCI device data structure and
  then get private data from it inside the functions logic.
* Remove the BUG_ON calls.
* Add teardown function for MSI-X setup.
* Update goto labels to match their purpose.
* Implement TODO for NE PCI device disable state check.
* Update function name for NE PCI device probe / remove.
---
 drivers/virt/nitro_enclaves/ne_pci_dev.c | 269 +++
 1 file changed, 269 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/ne_pci_dev.c

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c 
b/drivers/virt/nitro_enclaves/ne_pci_dev.c
new file mode 100644
index ..31650dcd592e
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -0,0 +1,269 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+/**
+ * DOC: Nitro Enclaves (NE) PCI device driver.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ne_misc_dev.h"
+#include "ne_pci_dev.h"
+
+/**
+ * NE_DEFAULT_TIMEOUT_MSECS - Default timeout to wait for a reply from
+ *   the NE PCI device.
+ */
+#define NE_DEFAULT_TIMEOUT_MSECS   (120000) /* 120 sec */
+
+static const struct pci_device_id ne_pci_ids[] = {
+   { PCI_DEVICE(PCI_VENDOR_ID_AMAZON, PCI_DEVICE_ID_NE) },
+   { 0, }
+};
+
+MODULE_DEVICE_TABLE(pci, ne_pci_ids);
+
+/**
+ * ne_setup_msix() - Setup MSI-X vectors for the PCI device.
+ * @pdev:  PCI device to setup the MSI-X for.
+ *
+ * Context: Process context.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_setup_msix(struct pci_dev *pdev)
+{
+   int nr_vecs = 0;
+   int rc = -EINVAL;
+
+   nr_vecs = pci_msix_vec_count(pdev);
+   if (nr_vecs < 0) {
+   rc = nr_vecs;
+
+   dev_err(&pdev->dev, "Error in getting vec count [rc=%d]\n", rc);
+
+   return rc;
+   }
+
+   rc = pci_alloc_irq_vectors(pdev, nr_vecs, nr_vecs, PCI_IRQ_MSIX);
+   if (rc < 0) {
+   dev_err(&pdev->dev, "Error in alloc MSI-X vecs [rc=%d]\n", rc);
+
+   return rc;
+   }
+
+   return 0;
+}
+
+/**
+ * ne_teardown_msix() - Teardown MSI-X vectors for the PCI device.
+ * @pdev:  PCI device to teardown the MSI-X for.
+ *
+ * Context: Process context.
+ */
+static void ne_teardown_msix(struct pci_dev *pdev)
+{
+   pci_free_irq_vectors(pdev);
+}
+
+/**
+ * ne_pci_dev_enable() - Select the PCI device version and enable it.
+ * @pdev:  PCI device to select version for and then enable.
+ *
+ * Context: Process context.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_pci_dev_enable(struct pci_dev *pdev)
+{
+   u8 dev_enable_reply = 0;
+   u16 dev_version_reply = 0;
+   struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+   iowrite16(NE_VERSION_MAX, ne_pci_dev->iomem_base + NE_VERSION);
+
+   dev_version_reply = ioread16(ne_pci_dev->iomem_base + NE_VERSION);
+   if (dev_version_reply != NE_VERSION_MAX) {
+   dev_err(&pdev->dev, "Error in pci dev version cmd\n");
+
+   return -EIO;
+   }
+
+   iowrite8(NE_ENABLE_ON, ne_pci_dev->iomem_base + NE_ENABLE);
+
+   dev_enable_reply = ioread8(ne_pci_dev->iomem_base + NE_ENABLE);
+   if (dev_enable_reply != NE_ENABLE_ON) {
+   dev_err(&pdev->dev, "Error in pci dev enable cmd\n");
+
+   return -EIO;
+   }
+
+   return 0;
+}
+
+/**
+ * ne_pci_dev_disable() - Disable the PCI device.
+ * @pdev:  PCI device to disable.
+ *
+ * Context: Process context.
+ */
+static void ne_pci_dev_disable(struct pci_dev *pdev)
+{
+   u

[PATCH v6 02/18] nitro_enclaves: Define the PCI device interface

2020-08-05 Thread Andra Paraschiv
The Nitro Enclaves (NE) driver communicates with a new PCI device that
is exposed to a virtual machine (VM) and handles commands meant for
managing enclave lifetime e.g. creation, termination, setting memory
regions. The communication with the PCI device is handled using an MMIO
space and MSI-X interrupts.

This device communicates with the hypervisor on the host, where the VM
that spawned the enclave itself runs, e.g. to launch a VM that is used
for the enclave.

Define the MMIO space of the NE PCI device, the commands that are
provided by this device. Add an internal data structure used as private
data for the PCI device driver and the function for the PCI device
command requests handling.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Alexandru Ciobotaru 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v5 -> v6

* Update documentation to kernel-doc format.

v4 -> v5

* Add a TODO for including flags in the request to the NE PCI device to
  set a memory region for an enclave. It is not used for now.

v3 -> v4

* Remove the "packed" attribute and include padding in the NE data
  structures.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Update path naming to drivers/virt/nitro_enclaves.
* Update NE_ENABLE_OFF / NE_ENABLE_ON defines.
---
 drivers/virt/nitro_enclaves/ne_pci_dev.h | 321 +++
 1 file changed, 321 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/ne_pci_dev.h

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.h 
b/drivers/virt/nitro_enclaves/ne_pci_dev.h
new file mode 100644
index ..bfae8af4bf06
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.h
@@ -0,0 +1,321 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#ifndef _NE_PCI_DEV_H_
+#define _NE_PCI_DEV_H_
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * DOC: Nitro Enclaves (NE) PCI device
+ */
+
+#define PCI_DEVICE_ID_NE   (0xe4c1)
+#define PCI_BAR_NE (0x03)
+
+/**
+ * DOC: Device registers
+ */
+
+/**
+ * NE_ENABLE - (1 byte) Register to notify the device that the driver is using
+ *it (Read/Write).
+ */
+#define NE_ENABLE  (0x0000)
+#define NE_ENABLE_OFF  (0x00)
+#define NE_ENABLE_ON   (0x01)
+
+/**
+ * NE_VERSION - (2 bytes) Register to select the device run-time version
+ * (Read/Write).
+ */
+#define NE_VERSION (0x0002)
+#define NE_VERSION_MAX (0x0001)
+
+/**
+ * NE_COMMAND - (4 bytes) Register to notify the device what command was
+ * requested (Write-Only).
+ */
+#define NE_COMMAND (0x0004)
+
+/**
+ * NE_EVTCNT - (4 bytes) Register to notify the driver that a reply or a device
+ *event is available (Read-Only):
+ *- Lower half  - command reply counter
+ *- Higher half - out-of-band device event counter
+ */
+#define NE_EVTCNT  (0x000c)
+#define NE_EVTCNT_REPLY_SHIFT  (0)
+#define NE_EVTCNT_REPLY_MASK   (0x0000ffff)
+#define NE_EVTCNT_REPLY(cnt)   (((cnt) & NE_EVTCNT_REPLY_MASK) >> \
+   NE_EVTCNT_REPLY_SHIFT)
+#define NE_EVTCNT_EVENT_SHIFT  (16)
+#define NE_EVTCNT_EVENT_MASK   (0xffff0000)
+#define NE_EVTCNT_EVENT(cnt)   (((cnt) & NE_EVTCNT_EVENT_MASK) >> \
+   NE_EVTCNT_EVENT_SHIFT)
+
+/**
+ * NE_SEND_DATA - (240 bytes) Buffer for sending the command request payload
+ *   (Read/Write).
+ */
+#define NE_SEND_DATA   (0x0010)
+
+/**
+ * NE_RECV_DATA - (240 bytes) Buffer for receiving the command reply payload
+ *   (Read-Only).
+ */
+#define NE_RECV_DATA   (0x0100)
+
+/**
+ * DOC: Device MMIO buffer sizes
+ */
+
+/**
+ * NE_SEND_DATA_SIZE / NE_RECV_DATA_SIZE - 240 bytes for send / recv buffer.
+ */
+#define NE_SEND_DATA_SIZE  (240)
+#define NE_RECV_DATA_SIZE  (240)
+
+/**
+ * DOC: MSI-X interrupt vectors
+ */
+
+/**
+ * NE_VEC_REPLY - MSI-X vector used for command reply notification.
+ */
+#define NE_VEC_REPLY   (0)
+
+/**
+ * NE_VEC_EVENT - MSI-X vector used for out-of-band events e.g. enclave crash.
+ */
+#define NE_VEC_EVENT   (1)
+
+/**
+ * enum ne_pci_dev_cmd_type - Device command types.
+ * @INVALID_CMD:   Invalid command.
+ * @ENCLAVE_START: Start an enclave, after setting its resources.
+ * @ENCLAVE_GET_SLOT:  Get the slot uid of an enclave.
+ * @ENCLAVE_STOP:  Terminate an enclave.
+ * @SLOT_ALLOC :   Allocate a slot for an enclave.
+ * @SLOT_FREE: Free the slot allocated for an enclave.
+ * @SLOT_ADD_MEM:  Add a memory region to an enclave slot.
+ * @SLOT_ADD_VCPU: Add a vCPU to an enclave slot.
+ * @SLOT_COUNT :   Get the number of allocat

[PATCH v6 00/18] Add support for Nitro Enclaves

2020-08-05 Thread Andra Paraschiv
nfo for NUMA-aware hugetlb config.
* v5: https://lore.kernel.org/lkml/20200715194540.45532-1-andra...@amazon.com/

v4 -> v5

* Rebase on top of v5.8-rc5.
* Add more details about the ioctl calls usage e.g. error codes.
* Update the ioctl to set an enclave vCPU to not return a fd.
* Add specific NE error codes.
* Split the NE CPU pool in CPU cores cpumasks.
* Remove log on copy_from_user() / copy_to_user() failure.
* Release the reference to the NE PCI device on failure paths.
* Close enclave fd on copy_to_user() failure.
* Set empty string in case of invalid NE CPU pool sysfs value.
* Early exit on NE CPU pool setup if enclave(s) already running.
* Add more sanity checks for provided vCPUs e.g. maximum possible value.
* Split logic for checking if a vCPU is in pool / getting a vCPU from pool.
* Exit without unpinning the pages on NE PCI dev request failure.
* Add check for the memory region user space address alignment.
* Update the logic to set memory region to not have a hardcoded check for 2 MiB.
* Add arch dependency for Arm / x86.
* v4: https://lore.kernel.org/lkml/20200622200329.52996-1-andra...@amazon.com/

v3 -> v4

* Rebase on top of v5.8-rc2.
* Add NE API version and the corresponding ioctl call.
* Add enclave / image load flags options.
* Decouple NE ioctl interface from KVM API.
* Remove the "packed" attribute and include padding in the NE data structures.
* Update documentation based on the changes from v4.
* Update sample to match the updates in v4.
* Remove the NE CPU pool init during NE kernel module loading.
* Setup the NE CPU pool at runtime via a sysfs file for the kernel parameter.
* Check if the enclave memory and CPUs are from the same NUMA node.
* Add minimum enclave memory size definition.
* v3: https://lore.kernel.org/lkml/20200525221334.62966-1-andra...@amazon.com/ 

v2 -> v3

* Rebase on top of v5.7-rc7.
* Add changelog to each patch in the series.
* Remove "ratelimited" from the logs that are not in the ioctl call paths.
* Update static calls sanity checks.
* Remove file ops that do nothing for now.
* Remove GPL additional wording as SPDX-License-Identifier is already in place.
* v2: https://lore.kernel.org/lkml/20200522062946.28973-1-andra...@amazon.com/

v1 -> v2

* Rebase on top of v5.7-rc6.
* Adapt codebase based on feedback from v1.
* Update ioctl number definition - major and minor.
* Add sample / documentation for the ioctl interface basic flow usage.
* Update cover letter to include more context on the NE overall.
* Add fix for the enclave / vcpu fd creation error cleanup path.
* Add fix reported by kbuild test robot .
* v1: https://lore.kernel.org/lkml/20200421184150.68011-1-andra...@amazon.com/

---

Andra Paraschiv (18):
  nitro_enclaves: Add ioctl interface definition
  nitro_enclaves: Define the PCI device interface
  nitro_enclaves: Define enclave info for internal bookkeeping
  nitro_enclaves: Init PCI device driver
  nitro_enclaves: Handle PCI device command requests
  nitro_enclaves: Handle out-of-band PCI device events
  nitro_enclaves: Init misc device providing the ioctl interface
  nitro_enclaves: Add logic for creating an enclave VM
  nitro_enclaves: Add logic for setting an enclave vCPU
  nitro_enclaves: Add logic for getting the enclave image load info
  nitro_enclaves: Add logic for setting an enclave memory region
  nitro_enclaves: Add logic for starting an enclave
  nitro_enclaves: Add logic for terminating an enclave
  nitro_enclaves: Add Kconfig for the Nitro Enclaves driver
  nitro_enclaves: Add Makefile for the Nitro Enclaves driver
  nitro_enclaves: Add sample for ioctl interface usage
  nitro_enclaves: Add overview documentation
  MAINTAINERS: Add entry for the Nitro Enclaves driver

 Documentation/nitro_enclaves/ne_overview.rst  |   87 +
 .../userspace-api/ioctl/ioctl-number.rst  |5 +-
 MAINTAINERS   |   13 +
 drivers/virt/Kconfig  |2 +
 drivers/virt/Makefile |2 +
 drivers/virt/nitro_enclaves/Kconfig   |   16 +
 drivers/virt/nitro_enclaves/Makefile  |   11 +
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 1517 +
 drivers/virt/nitro_enclaves/ne_misc_dev.h |   92 +
 drivers/virt/nitro_enclaves/ne_pci_dev.c  |  600 +++
 drivers/virt/nitro_enclaves/ne_pci_dev.h  |  321 
 include/linux/nitro_enclaves.h|   11 +
 include/uapi/linux/nitro_enclaves.h   |  327 
 samples/nitro_enclaves/.gitignore |2 +
 samples/nitro_enclaves/Makefile   |   16 +
 samples/nitro_enclaves/ne_ioctl_sample.c  |  853 +
 16 files changed, 3874 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/nitro_enclaves/ne_overview.rst
 create mode 100644 drivers/virt/nitro_enclaves/Kconfig
 create mode 100644 drivers/virt/nitro_enclaves/Makefile
 create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.c
 create mode 100644 drivers/virt/n

[PATCH v6 01/18] nitro_enclaves: Add ioctl interface definition

2020-08-05 Thread Andra Paraschiv
The Nitro Enclaves driver handles the enclave lifetime management. This
includes enclave creation, termination and setting up its resources such
as memory and CPU.

An enclave runs alongside the VM that spawned it. It is abstracted as a
process running in the VM that launched it. The process interacts with
the NE driver, that exposes an ioctl interface for creating an enclave
and setting up its resources.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
Reviewed-by: Stefan Hajnoczi 
---
Changelog

v5 -> v6

* Fix typo in the description about the NE CPU pool.
* Update documentation to kernel-doc format.
* Remove the ioctl to query API version.

v4 -> v5

* Add more details about the ioctl calls usage e.g. error codes, file
  descriptors used.
* Update the ioctl to set an enclave vCPU to not return a file
  descriptor.
* Add specific NE error codes.

v3 -> v4

* Decouple NE ioctl interface from KVM API.
* Add NE API version and the corresponding ioctl call.
* Add enclave / image load flags options.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Add ioctl for getting enclave image load metadata.
* Update NE_ENCLAVE_START ioctl name to NE_START_ENCLAVE.
* Add entry in Documentation/userspace-api/ioctl/ioctl-number.rst for NE
  ioctls.
* Update NE ioctls definition based on the updated ioctl range for major
  and minor.
---
 .../userspace-api/ioctl/ioctl-number.rst  |   5 +-
 include/linux/nitro_enclaves.h|  11 +
 include/uapi/linux/nitro_enclaves.h   | 327 ++
 3 files changed, 342 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/nitro_enclaves.h
 create mode 100644 include/uapi/linux/nitro_enclaves.h

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 59472cd6a11d..783440c6719b 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -328,8 +328,11 @@ Code  Seq#    Include File                                           Comments
 0xAC  00-1F  linux/raw.h
 0xAD  00     Netfilter device in development:
                                                                      <mailto:ru...@rustcorp.com.au>
-0xAE  all    linux/kvm.h                                             Kernel-based Virtual Machine
+0xAE  00-1F  linux/kvm.h                                             Kernel-based Virtual Machine
                                                                      <mailto:k...@vger.kernel.org>
+0xAE  40-FF  linux/kvm.h                                             Kernel-based Virtual Machine
+                                                                     <mailto:k...@vger.kernel.org>
+0xAE  20-3F  linux/nitro_enclaves.h                                  Nitro Enclaves
 0xAF  00-1F  linux/fsl_hypervisor.h                                  Freescale hypervisor
 0xB0  all    RATIO devices in development:
                                                                      <mailto:v...@ratio.de>
diff --git a/include/linux/nitro_enclaves.h b/include/linux/nitro_enclaves.h
new file mode 100644
index ..d91ef2bfdf47
--- /dev/null
+++ b/include/linux/nitro_enclaves.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#ifndef _LINUX_NITRO_ENCLAVES_H_
+#define _LINUX_NITRO_ENCLAVES_H_
+
+#include <uapi/linux/nitro_enclaves.h>
+
+#endif /* _LINUX_NITRO_ENCLAVES_H_ */
diff --git a/include/uapi/linux/nitro_enclaves.h b/include/uapi/linux/nitro_enclaves.h
new file mode 100644
index ..87b4ab0fca18
--- /dev/null
+++ b/include/uapi/linux/nitro_enclaves.h
@@ -0,0 +1,327 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#ifndef _UAPI_LINUX_NITRO_ENCLAVES_H_
+#define _UAPI_LINUX_NITRO_ENCLAVES_H_
+
+#include <linux/types.h>
+
+/**
+ * DOC: Nitro Enclaves (NE) Kernel Driver Interface
+ */
+
+/**
+ * NE_CREATE_VM - The command is used to create a slot that is associated with
+ *   an enclave VM.
+ *   The generated unique slot id is an output parameter.
+ *   The ioctl can be invoked on the /dev/nitro_enclaves fd, before
+ *   setting any resources, such as memory and vCPUs, for an
+ *   enclave. Memory and vCPUs are set for the slot mapped to an enclave.
+ *   A NE CPU pool has to be set before calling this function. The
+ *   pool can be set after the NE driver load, using
+ *   /sys/module/nitro_enclaves/parameters/ne_cpus.
+ *   Its format is detailed in the cpu-

[PATCH v5 18/18] MAINTAINERS: Add entry for the Nitro Enclaves driver

2020-07-15 Thread Andra Paraschiv
Signed-off-by: Andra Paraschiv 
---
Changelog

v4 -> v5

* No changes.

v3 -> v4

* No changes.

v2 -> v3

* Update file entries to be in alphabetical order.

v1 -> v2

* No changes.
---
 MAINTAINERS | 13 +
 1 file changed, 13 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index b4a43a9e7fbc..a1789a8df546 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12116,6 +12116,19 @@ S: Maintained
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2.git
 F: arch/nios2/
 
+NITRO ENCLAVES (NE)
+M: Andra Paraschiv 
+M: Alexandru Vasile 
+M: Alexandru Ciobotaru 
+L: linux-kernel@vger.kernel.org
+S: Supported
+W: https://aws.amazon.com/ec2/nitro/nitro-enclaves/
+F: Documentation/nitro_enclaves/
+F: drivers/virt/nitro_enclaves/
+F: include/linux/nitro_enclaves.h
+F: include/uapi/linux/nitro_enclaves.h
+F: samples/nitro_enclaves/
+
 NOHZ, DYNTICKS SUPPORT
 M: Frederic Weisbecker 
 M: Thomas Gleixner 
-- 
2.20.1 (Apple Git-117)







[PATCH v5 15/18] nitro_enclaves: Add Makefile for the Nitro Enclaves driver

2020-07-15 Thread Andra Paraschiv
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v4 -> v5

* No changes.

v3 -> v4

* No changes.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Update path to Makefile to match the drivers/virt/nitro_enclaves
  directory.
---
 drivers/virt/Makefile|  2 ++
 drivers/virt/nitro_enclaves/Makefile | 11 +++
 2 files changed, 13 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/Makefile

diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
index fd331247c27a..f28425ce4b39 100644
--- a/drivers/virt/Makefile
+++ b/drivers/virt/Makefile
@@ -5,3 +5,5 @@
 
 obj-$(CONFIG_FSL_HV_MANAGER)   += fsl_hypervisor.o
 obj-y  += vboxguest/
+
+obj-$(CONFIG_NITRO_ENCLAVES)   += nitro_enclaves/
diff --git a/drivers/virt/nitro_enclaves/Makefile 
b/drivers/virt/nitro_enclaves/Makefile
new file mode 100644
index ..e9f4fcd1591e
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/Makefile
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+
+# Enclave lifetime management support for Nitro Enclaves (NE).
+
+obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves.o
+
+nitro_enclaves-y := ne_pci_dev.o ne_misc_dev.o
+
+ccflags-y += -Wall
-- 
2.20.1 (Apple Git-117)






