Re: [PATCH] nitro_enclaves: set master in the procedure of NE probe

2021-01-20 Thread Paraschiv, Andra-Irina



On 19/01/2021 05:30, Longpeng(Mike) wrote:

According to the PCI spec:
   Bus Master Enable – Controls the ability of a PCI Express
   Endpoint to issue Memory and I/O Read/Write Requests, and
   the ability of a Root or Switch Port to forward Memory and
   I/O Read/Write Requests in the Upstream direction

Set BusMaster to make the driver PCI conformant.


Could you update the commit title and message body to better reflect the
why and the what of the change? One option can be:


nitro_enclaves: Set Bus Master for the NE PCI device

Enable Bus Master for the NE PCI device, according to the PCI spec
for submitting memory or I/O requests:
  Bus Master Enable ...



Please include the changelog in the commit message for the next revision(s).

+ Greg in CC, as the patches for the Nitro Enclaves kernel driver are
first included in the char-misc tree, then in linux-next and finally in
the mainline.




Signed-off-by: Longpeng(Mike) 
---
  drivers/virt/nitro_enclaves/ne_pci_dev.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c b/drivers/virt/nitro_enclaves/ne_pci_dev.c
index b9c1de4..143207e 100644
--- a/drivers/virt/nitro_enclaves/ne_pci_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -480,6 +480,8 @@ static int ne_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		goto free_ne_pci_dev;
 	}

+   pci_set_master(pdev);


I was looking if we need the reverse for this - pci_clear_master() [1] - 
on the error and remove / shutdown codebase paths, but 
pci_disable_device() seems to include the bus master disable logic [2][3].
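
For context, a minimal sketch of the probe / teardown pairing being described
(hypothetical error label; pci_disable_device() takes care of disabling Bus
Master, so no explicit pci_clear_master() call is needed on the error path):

    static int ne_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
    {
            int rc;

            rc = pci_enable_device(pdev);
            if (rc < 0)
                    return rc;

            /* Let the device issue memory / I/O read-write requests upstream. */
            pci_set_master(pdev);

            rc = pci_request_regions_exclusive(pdev, "nitro_enclaves");
            if (rc < 0)
                    goto disable_dev;

            return 0;

    disable_dev:
            /* pci_disable_device() also disables Bus Master on the device. */
            pci_disable_device(pdev);

            return rc;
    }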


Thanks,
Andra

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pci.c?h=v5.11-rc4#n4312
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pci.c?h=v5.11-rc4#n2104
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pci.c?h=v5.11-rc4#n4242



+
 	rc = pci_request_regions_exclusive(pdev, "nitro_enclaves");
 	if (rc < 0) {
 		dev_err(&pdev->dev, "Error in pci request regions [rc=%d]\n", rc);
--
1.8.3.1








Re: [PATCH net-next v4 0/5] vsock: Add flags field in the vsock address

2020-12-14 Thread Paraschiv, Andra-Irina



On 14/12/2020 19:09, Stefano Garzarella wrote:


On Mon, Dec 14, 2020 at 06:11:17PM +0200, Andra Paraschiv wrote:
vsock enables communication between virtual machines and the host they are
running on. Nested VMs can be setup to use vsock channels, as the multi
transport support has been available in the mainline since the v5.5 Linux
kernel has been released.

Implicitly, if no host->guest vsock transport is loaded, all the vsock packets
are forwarded to the host. This behavior can be used to setup communication
channels between sibling VMs that are running on the same host. One example can
be the vsock channels that can be established within AWS Nitro Enclaves
(see Documentation/virt/ne_overview.rst).

To be able to explicitly mark a connection as being used for a certain use case,
add a flags field in the vsock address data structure. The value of the flags
field is taken into consideration when the vsock transport is assigned. This way
can distinguish between different use cases, such as nested VMs / local
communication and sibling VMs.

The flags field can be set in the user space application connect logic. On the
listen path, the field can be set in the kernel space logic.
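
As an illustration of the user space connect path mentioned above, a minimal
sketch (hypothetical function name, CID and port values; error handling
omitted):

    #include <string.h>
    #include <sys/socket.h>
    #include <linux/vm_sockets.h>

    int vsock_connect_via_host(unsigned int sibling_cid, unsigned int port)
    {
            struct sockaddr_vm addr;
            int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

            /* Zero the whole address, including svm_zero, for binary compatibility. */
            memset(&addr, 0, sizeof(addr));
            addr.svm_family = AF_VSOCK;
            addr.svm_cid = sibling_cid;
            addr.svm_port = port;
            /* Ask for the packets to always be forwarded to the host. */
            addr.svm_flags = VMADDR_FLAG_TO_HOST;

            return connect(fd, (struct sockaddr *)&addr, sizeof(addr));
    }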


I reviewed and tested all the patches, great job!



Thanks for checking it out.

Andra





Thank you.

Andra

---

Patch Series Changelog

The patch series is built on top of v5.10.

GitHub repo branch for the latest version of the patch series:

* https://github.com/andraprs/linux/tree/vsock-flag-sibling-comm-v4

v3 -> v4

* Rebase on top of v5.10.
* Add check for supported flag values.
* Update the "svm_flags" field to be 1 byte instead of 2 bytes.
* v3: https://lore.kernel.org/lkml/20201211103241.17751-1-andra...@amazon.com/

v2 -> v3

* Rebase on top of v5.10-rc7.
* Add "svm_flags" as a new field, not reusing "svm_reserved1".
* Update comments to mention when the "VMADDR_FLAG_TO_HOST" flag is set in the
  connect and listen paths.
* Update bitwise check logic to not compare result to the flag value.
* v2: https://lore.kernel.org/lkml/20201204170235.84387-1-andra...@amazon.com/

v1 -> v2

* Update the vsock flag naming to "VMADDR_FLAG_TO_HOST".
* Use bitwise operators to setup and check the vsock flag.
* Set the vsock flag on the receive path in the vsock transport assignment
  logic.
* Merge the checks for the g2h transport assignment in one "if" block.
* v1: https://lore.kernel.org/lkml/20201201152505.19445-1-andra...@amazon.com/


---

Andra Paraschiv (5):
 vm_sockets: Add flags field in the vsock address data structure
 vm_sockets: Add VMADDR_FLAG_TO_HOST vsock flag
 vsock_addr: Check for supported flag values
 af_vsock: Set VMADDR_FLAG_TO_HOST flag on the receive path
 af_vsock: Assign the vsock transport considering the vsock address
   flags

include/uapi/linux/vm_sockets.h | 26 +-
net/vmw_vsock/af_vsock.c    | 21 +++--
net/vmw_vsock/vsock_addr.c  |  4 +++-
3 files changed, 47 insertions(+), 4 deletions(-)

--
2.20.1 (Apple Git-117)






Re: [PATCH net-next v3 0/4] vsock: Add flags field in the vsock address

2020-12-14 Thread Paraschiv, Andra-Irina



On 14/12/2020 10:13, Stefano Garzarella wrote:


On Sat, Dec 12, 2020 at 09:16:08AM -0800, Jakub Kicinski wrote:

On Fri, 11 Dec 2020 16:24:13 +0100 Stefano Garzarella wrote:

On Fri, Dec 11, 2020 at 12:32:37PM +0200, Andra Paraschiv wrote:
>vsock enables communication between virtual machines and the host they are
>running on. Nested VMs can be setup to use vsock channels, as the multi
>transport support has been available in the mainline since the v5.5 Linux
>kernel has been released.
>
>Implicitly, if no host->guest vsock transport is loaded, all the vsock packets
>are forwarded to the host. This behavior can be used to setup communication
>channels between sibling VMs that are running on the same host. One example can
>be the vsock channels that can be established within AWS Nitro Enclaves
>(see Documentation/virt/ne_overview.rst).
>
>To be able to explicitly mark a connection as being used for a certain use case,
>add a flags field in the vsock address data structure. The value of the flags
>field is taken into consideration when the vsock transport is assigned. This way
>can distinguish between different use cases, such as nested VMs / local
>communication and sibling VMs.
>
>The flags field can be set in the user space application connect logic. On the
>listen path, the field can be set in the kernel space logic.
>

I reviewed all the patches and they are in a good shape!

Maybe the last thing to add is a flags check in vsock_addr_validate(),
to avoid specifying flags that we don't know how to handle.
For example if in the future we add new flags that this version of the
kernel is not able to satisfy, we should return an error to the
application.

I mean something like this:

 diff --git a/net/vmw_vsock/vsock_addr.c b/net/vmw_vsock/vsock_addr.c
 index 909de26cb0e7..73bb1d2fa526 100644
 --- a/net/vmw_vsock/vsock_addr.c
 +++ b/net/vmw_vsock/vsock_addr.c
 @@ -22,6 +22,8 @@ EXPORT_SYMBOL_GPL(vsock_addr_init);

  int vsock_addr_validate(const struct sockaddr_vm *addr)
  {
 +	unsigned short svm_valid_flags = VMADDR_FLAG_TO_HOST;
 +
 	if (!addr)
 		return -EFAULT;

 @@ -31,6 +33,9 @@ int vsock_addr_validate(const struct sockaddr_vm *addr)

 	if (addr->svm_zero[0] != 0)
 		return -EINVAL;


Strictly speaking this check should be superseded by the check below
(AKA removed). We used to check svm_zero[0], with the new field added
this now checks svm_zero[2]. Old applications may have not initialized
svm_zero[2] (we're talking about binary compatibility here, apps built
with old headers).


 +	if (addr->svm_flags & ~svm_valid_flags)
 +		return -EINVAL;


The flags should also probably be one byte (we can define a "more
flags" flag to unlock further bytes) - otherwise on big endian the
new flag will fall into svm_zero[1] so the v3 improvements are moot
for big endian, right?


Right, I assumed the entire svm_zero[] was zeroed out, but we can't be
sure.

So, I agree to change the svm_flags to 1 byte (__u8), and remove the
superseded check that you pointed out.
With these changes we should be fully binary compatible.
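
For reference, the direction settled on here would give a layout along
these lines (a sketch only; the exact field placement is in the v4 series
itself):

    struct sockaddr_vm {
            __kernel_sa_family_t svm_family;
            unsigned short svm_reserved1;
            unsigned int svm_port;
            unsigned int svm_cid;
            __u8 svm_flags;         /* 1 byte, same offset on little and big endian */
            unsigned char svm_zero[sizeof(struct sockaddr) -
                                   sizeof(sa_family_t) -
                                   sizeof(unsigned short) -
                                   sizeof(unsigned int) -
                                   sizeof(unsigned int) -
                                   sizeof(__u8)];
    };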



Here we go, sent out v4:

https://lore.kernel.org/lkml/20201214161122.37717-1-andra...@amazon.com/

Thank you both.

Andra




 return 0;
  }
  EXPORT_SYMBOL_GPL(vsock_addr_validate);











Re: [PATCH net-next v3 0/4] vsock: Add flags field in the vsock address

2020-12-11 Thread Paraschiv, Andra-Irina



On 11/12/2020 17:24, Stefano Garzarella wrote:


Hi Andra,

On Fri, Dec 11, 2020 at 12:32:37PM +0200, Andra Paraschiv wrote:
vsock enables communication between virtual machines and the host they are
running on. Nested VMs can be setup to use vsock channels, as the multi
transport support has been available in the mainline since the v5.5 Linux
kernel has been released.

Implicitly, if no host->guest vsock transport is loaded, all the vsock packets
are forwarded to the host. This behavior can be used to setup communication
channels between sibling VMs that are running on the same host. One example can
be the vsock channels that can be established within AWS Nitro Enclaves
(see Documentation/virt/ne_overview.rst).

To be able to explicitly mark a connection as being used for a certain use case,
add a flags field in the vsock address data structure. The value of the flags
field is taken into consideration when the vsock transport is assigned. This way
can distinguish between different use cases, such as nested VMs / local
communication and sibling VMs.

The flags field can be set in the user space application connect logic. On the
listen path, the field can be set in the kernel space logic.



I reviewed all the patches and they are in a good shape!


Hi Stefano,

Thanks for the overall review and for the reconfirmation of the Rb for 
the vsock address data structure changes.




Maybe the last thing to add is a flags check in vsock_addr_validate(),
to avoid specifying flags that we don't know how to handle.


I can add this validation as a new patch in the series, next revision.

Thanks,
Andra



For example if in the future we add new flags that this version of the
kernel is not able to satisfy, we should return an error to the
application.

I mean something like this:

    diff --git a/net/vmw_vsock/vsock_addr.c b/net/vmw_vsock/vsock_addr.c
    index 909de26cb0e7..73bb1d2fa526 100644
    --- a/net/vmw_vsock/vsock_addr.c
    +++ b/net/vmw_vsock/vsock_addr.c
    @@ -22,6 +22,8 @@ EXPORT_SYMBOL_GPL(vsock_addr_init);

     int vsock_addr_validate(const struct sockaddr_vm *addr)
     {
    +	unsigned short svm_valid_flags = VMADDR_FLAG_TO_HOST;
    +
    	if (!addr)
    		return -EFAULT;

    @@ -31,6 +33,9 @@ int vsock_addr_validate(const struct sockaddr_vm *addr)

    	if (addr->svm_zero[0] != 0)
    		return -EINVAL;

    +	if (addr->svm_flags & ~svm_valid_flags)
    +		return -EINVAL;
    +
    	return 0;
     }
     EXPORT_SYMBOL_GPL(vsock_addr_validate);


Thanks,
Stefano


Thank you.

Andra

---

Patch Series Changelog

The patch series is built on top of v5.10-rc7.

GitHub repo branch for the latest version of the patch series:

* https://github.com/andraprs/linux/tree/vsock-flag-sibling-comm-v3

v2 -> v3

* Rebase on top of v5.10-rc7.
* Add "svm_flags" as a new field, not reusing "svm_reserved1".
* Update comments to mention when the "VMADDR_FLAG_TO_HOST" flag is set in the
  connect and listen paths.
* Update bitwise check logic to not compare result to the flag value.
* v2: https://lore.kernel.org/lkml/20201204170235.84387-1-andra...@amazon.com/

v1 -> v2

* Update the vsock flag naming to "VMADDR_FLAG_TO_HOST".
* Use bitwise operators to setup and check the vsock flag.
* Set the vsock flag on the receive path in the vsock transport assignment
  logic.
* Merge the checks for the g2h transport assignment in one "if" block.
* v1: https://lore.kernel.org/lkml/20201201152505.19445-1-andra...@amazon.com/


---

Andra Paraschiv (4):
 vm_sockets: Add flags field in the vsock address data structure
 vm_sockets: Add VMADDR_FLAG_TO_HOST vsock flag
 af_vsock: Set VMADDR_FLAG_TO_HOST flag on the receive path
 af_vsock: Assign the vsock transport considering the vsock address
   flags

include/uapi/linux/vm_sockets.h | 25 -
net/vmw_vsock/af_vsock.c    | 21 +++--
2 files changed, 43 insertions(+), 3 deletions(-)

--
2.20.1 (Apple Git-117)






Re: [PATCH net-next v2 1/4] vm_sockets: Include flags field in the vsock address data structure

2020-12-10 Thread Paraschiv, Andra-Irina



On 09/12/2020 19:30, Jakub Kicinski wrote:

On Wed, 9 Dec 2020 17:17:56 +0200 Paraschiv, Andra-Irina wrote:

I agree that could be a problem, but here some considerations:
- I checked some applications (qemu-guest-agent, ncat, iperf-vsock) and
   all use the same pattern: allocate memory, initialize all the
   sockaddr_vm to zero (to be sure to initialize the svm_zero), set the
   cid and port fields.
   So we should be safe, but of course it may not always be true.

- For now the issue could affect only nested VMs. We introduced this
   support one year ago, so it's something new and maybe we don't cause
   too many problems.

As an alternative, what about using 1 or 2 bytes from svm_zero[]?
These must be set at zero, even if we only check the first byte in the
kernel.

Thanks for the follow-up info.

We can also consider the "svm_zero" option and could use 2 bytes from
that field for "svm_flags", keeping the same "unsigned short" type.

Or use svm_zero as a gate for interpreting other fields?
If svm_zero[0]* == something start checking the value of reserved1?
* in practice the name can be unioned to something more palatable ;)


Thanks for the shared option, that could be one case to reuse the 
reserved field, with a two phase check logic.


I'll give a try to the option of having a new field "svm_flags" and the
"svm_zero" field updated, and then send out a new revision. Just let me
know if there are other updates needed / questions in the meantime.



struct sockaddr_vm {
	__kernel_sa_family_t svm_family;
	unsigned short svm_reserved1;
	unsigned int svm_port;
	unsigned int svm_cid;
	unsigned short svm_flags;
	unsigned char svm_zero[sizeof(struct sockaddr) -
			       sizeof(sa_family_t) -
			       sizeof(unsigned short) -
			       sizeof(unsigned int) - sizeof(unsigned int) -
			       sizeof(unsigned short)];
};


Thanks,
Andra





Re: [PATCH net-next v2 1/4] vm_sockets: Include flags field in the vsock address data structure

2020-12-09 Thread Paraschiv, Andra-Irina



On 09/12/2020 12:48, Stefano Garzarella wrote:


On Tue, Dec 08, 2020 at 10:42:22AM -0800, Jakub Kicinski wrote:

On Tue, 8 Dec 2020 20:23:24 +0200 Paraschiv, Andra-Irina wrote:

>> --- a/include/uapi/linux/vm_sockets.h
>> +++ b/include/uapi/linux/vm_sockets.h
>> @@ -145,7 +145,7 @@
>>
>>  struct sockaddr_vm {
>>  	__kernel_sa_family_t svm_family;
>> -	unsigned short svm_reserved1;
>> +	unsigned short svm_flags;
>>  	unsigned int svm_port;
>>  	unsigned int svm_cid;
>>  	unsigned char svm_zero[sizeof(struct sockaddr) -
> Since this is a uAPI header I gotta ask - are you 100% sure that it's
> okay to rename this field?
>
> I didn't grasp from just reading the patches whether this is a uAPI or
> just internal kernel flag, seems like the former from the reading of
> the comment in patch 2. In which case what guarantees that existing
> users don't pass in garbage since the kernel doesn't check it was 0?

That's always good to double-check the uapi changes don't break / assume
something, thanks for bringing this up. :)

Sure, let's go through the possible options step by step. Let me know if
I get anything wrong and if I can help with clarifications.

There is the "svm_reserved1" field that is not used in the kernel
codebase. It is set to 0 on the receive (listen) path as part of the
vsock address initialization [1][2]. The "svm_family" and "svm_zero"
fields are checked as part of the address validation [3].

Now, with the current change to "svm_flags", the flow is the following:

* On the receive (listen) path, the remote address structure is
initialized as part of the vsock address init logic [2]. Then patch 3/4
of this series sets the "VMADDR_FLAG_TO_HOST" flag given a set of
conditions (local and remote CID > VMADDR_CID_HOST).

* On the connect path, the userspace logic can set the "svm_flags"
field. It can be set to 0 or 1 (VMADDR_FLAG_TO_HOST); or any other value
greater than 1. If the "VMADDR_FLAG_TO_HOST" flag is set, all the vsock
packets are then forwarded to the host.

* When the vsock transport is assigned, the "svm_flags" field is
checked, and if it has the "VMADDR_FLAG_TO_HOST" flag set, it goes on
with a guest->host transport (patch 4/4 of this series). Otherwise, no
other specific flag value is currently used.

Given all these points, the question remains what happens if the
"svm_flags" field is set on the connect path to a value higher than 1
(maybe a bogus one, not intended so). And it includes the
"VMADDR_FLAG_TO_HOST" value (the single flag set and specifically used
for now, but we should also account for any further possible flags). In
this case, all the vsock packets would be forwarded to the host and
maybe not intended so, having a bogus value for the flags field. Is this
the possible case you are referring to?


Correct. What if user basically declared the structure on the stack,
and only initialized the fields the kernel used to check?

This problem needs to be at the very least discussed in the commit
message.



I agree that could be a problem, but here some considerations:
- I checked some applications (qemu-guest-agent, ncat, iperf-vsock) and
  all use the same pattern: allocate memory, initialize all the
  sockaddr_vm to zero (to be sure to initialize the svm_zero), set the
  cid and port fields.
  So we should be safe, but of course it may not always be true.

- For now the issue could affect only nested VMs. We introduced this
  support one year ago, so it's something new and maybe we don't cause
  too many problems.

As an alternative, what about using 1 or 2 bytes from svm_zero[]?
These must be set at zero, even if we only check the first byte in the
kernel.


Thanks for the follow-up info.

We can also consider the "svm_zero" option and could use 2 bytes from 
that field for "svm_flags", keeping the same "unsigned short" type.


Thanks,
Andra





Re: [PATCH net-next v2 1/4] vm_sockets: Include flags field in the vsock address data structure

2020-12-08 Thread Paraschiv, Andra-Irina



On 07/12/2020 23:29, Jakub Kicinski wrote:

On Fri, 4 Dec 2020 19:02:32 +0200 Andra Paraschiv wrote:

diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h
index fd0ed7221645d..46735376a57a8 100644
--- a/include/uapi/linux/vm_sockets.h
+++ b/include/uapi/linux/vm_sockets.h
@@ -145,7 +145,7 @@

 struct sockaddr_vm {
 	__kernel_sa_family_t svm_family;
-	unsigned short svm_reserved1;
+	unsigned short svm_flags;
 	unsigned int svm_port;
 	unsigned int svm_cid;
 	unsigned char svm_zero[sizeof(struct sockaddr) -

Since this is a uAPI header I gotta ask - are you 100% sure that it's
okay to rename this field?

I didn't grasp from just reading the patches whether this is a uAPI or
just internal kernel flag, seems like the former from the reading of
the comment in patch 2. In which case what guarantees that existing
users don't pass in garbage since the kernel doesn't check it was 0?


That's always good to double-check the uapi changes don't break / assume
something, thanks for bringing this up. :)

Sure, let's go through the possible options step by step. Let me know if
I get anything wrong and if I can help with clarifications.

There is the "svm_reserved1" field that is not used in the kernel
codebase. It is set to 0 on the receive (listen) path as part of the
vsock address initialization [1][2]. The "svm_family" and "svm_zero"
fields are checked as part of the address validation [3].

Now, with the current change to "svm_flags", the flow is the following:

* On the receive (listen) path, the remote address structure is
initialized as part of the vsock address init logic [2]. Then patch 3/4
of this series sets the "VMADDR_FLAG_TO_HOST" flag given a set of
conditions (local and remote CID > VMADDR_CID_HOST).

* On the connect path, the userspace logic can set the "svm_flags"
field. It can be set to 0 or 1 (VMADDR_FLAG_TO_HOST); or any other value
greater than 1. If the "VMADDR_FLAG_TO_HOST" flag is set, all the vsock
packets are then forwarded to the host.

* When the vsock transport is assigned, the "svm_flags" field is
checked, and if it has the "VMADDR_FLAG_TO_HOST" flag set, it goes on
with a guest->host transport (patch 4/4 of this series). Otherwise, no
other specific flag value is currently used.

Given all these points, the question remains what happens if the
"svm_flags" field is set on the connect path to a value higher than 1
(maybe a bogus one, not intended so). And it includes the
"VMADDR_FLAG_TO_HOST" value (the single flag set and specifically used
for now, but we should also account for any further possible flags). In
this case, all the vsock packets would be forwarded to the host and
maybe not intended so, having a bogus value for the flags field. Is this
the possible case you are referring to?


Thanks,
Andra

[1] https://man7.org/linux/man-pages/man7/vsock.7.html
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/vmw_vsock/vsock_addr.c#n14
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/vmw_vsock/vsock_addr.c#n23






Re: [PATCH net-next v2 4/4] af_vsock: Assign the vsock transport considering the vsock address flags

2020-12-07 Thread Paraschiv, Andra-Irina



On 07/12/2020 12:00, Stefano Garzarella wrote:


On Fri, Dec 04, 2020 at 07:02:35PM +0200, Andra Paraschiv wrote:

The vsock flags field can be set in the connect and (listen) receive
paths.

When the vsock transport is assigned, the remote CID is used to
distinguish between types of connection.

Use the vsock flags value (in addition to the CID) from the remote
address to decide which vsock transport to assign. For the sibling VMs
use case, all the vsock packets need to be forwarded to the host, so
always assign the guest->host transport if the VMADDR_FLAG_TO_HOST flag
is set. For the other use cases, the vsock transport assignment logic is
not changed.

Changelog

v1 -> v2

* Use bitwise operator to check the vsock flag.
* Use the updated "VMADDR_FLAG_TO_HOST" flag naming.
* Merge the checks for the g2h transport assignment in one "if" block.

Signed-off-by: Andra Paraschiv 
---
net/vmw_vsock/af_vsock.c | 9 +++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 83d035eab0b05..66e643c3b5f85 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -421,7 +421,8 @@ static void vsock_deassign_transport(struct vsock_sock *vsk)
  * The vsk->remote_addr is used to decide which transport to use:
  *  - remote CID == VMADDR_CID_LOCAL or g2h->local_cid or VMADDR_CID_HOST if
  *    g2h is not loaded, will use local transport;
- *  - remote CID <= VMADDR_CID_HOST will use guest->host transport;
+ *  - remote CID <= VMADDR_CID_HOST or h2g is not loaded or remote flags field
+ *    includes VMADDR_FLAG_TO_HOST flag value, will use guest->host transport;
  *  - remote CID > VMADDR_CID_HOST will use host->guest transport;
  */
 int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
@@ -429,6 +430,7 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
 	const struct vsock_transport *new_transport;
 	struct sock *sk = sk_vsock(vsk);
 	unsigned int remote_cid = vsk->remote_addr.svm_cid;
+	unsigned short remote_flags;
 	int ret;

 	/* If the packet is coming with the source and destination CIDs higher
@@ -443,6 +445,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
 	    vsk->remote_addr.svm_cid > VMADDR_CID_HOST)
 		vsk->remote_addr.svm_flags |= VMADDR_FLAG_TO_HOST;

+	remote_flags = vsk->remote_addr.svm_flags;
+
 	switch (sk->sk_type) {
 	case SOCK_DGRAM:
 		new_transport = transport_dgram;
@@ -450,7 +454,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
 	case SOCK_STREAM:
 		if (vsock_use_local_transport(remote_cid))
 			new_transport = transport_local;
-		else if (remote_cid <= VMADDR_CID_HOST || !transport_h2g)
+		else if (remote_cid <= VMADDR_CID_HOST || !transport_h2g ||
+			 (remote_flags & VMADDR_FLAG_TO_HOST) == VMADDR_FLAG_TO_HOST)


Maybe "remote_flags & VMADDR_FLAG_TO_HOST" should be enough, but the
patch is okay:

Reviewed-by: Stefano Garzarella 


Done, updated to have only the bitwise logic, without the comparison.

Thanks,
Andra




 			new_transport = transport_g2h;
 		else
 			new_transport = transport_h2g;
--
2.20.1 (Apple Git-117)






Re: [PATCH net-next v2 2/4] vm_sockets: Add VMADDR_FLAG_TO_HOST vsock flag

2020-12-07 Thread Paraschiv, Andra-Irina



On 07/12/2020 11:59, Stefano Garzarella wrote:


On Fri, Dec 04, 2020 at 07:02:33PM +0200, Andra Paraschiv wrote:

Add VMADDR_FLAG_TO_HOST vsock flag that is used to setup a vsock
connection where all the packets are forwarded to the host.

Then, using this type of vsock channel, vsock communication between
sibling VMs can be built on top of it.

Changelog

v1 -> v2

* New patch in v2, it was split from the first patch in the series.
* Remove the default value for the vsock flags field.
* Update the naming for the vsock flag to "VMADDR_FLAG_TO_HOST".

Signed-off-by: Andra Paraschiv 
---
include/uapi/linux/vm_sockets.h | 15 +++
1 file changed, 15 insertions(+)

diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h
index 46735376a57a8..72e1a3d05682d 100644
--- a/include/uapi/linux/vm_sockets.h
+++ b/include/uapi/linux/vm_sockets.h
@@ -114,6 +114,21 @@

 #define VMADDR_CID_HOST 2

+/* The current default use case for the vsock channel is the following:
+ * local vsock communication between guest and host and nested VMs setup.
+ * In addition to this, implicitly, the vsock packets are forwarded to the host
+ * if no host->guest vsock transport is set.
+ *
+ * Set this flag value in the sockaddr_vm corresponding field if the vsock
+ * packets need to be always forwarded to the host. Using this behavior,
+ * vsock communication between sibling VMs can be setup.


Maybe we can add a sentence saying that this flag is set on the remote
peer address for an incoming connection when it is routed from the host
(local CID and remote CID > VMADDR_CID_HOST).


Sure, I can make it more clear when it is set e.g. in user space 
(connect path) and in kernel space (listen path).
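
For reference, the kernel space (listen path) setting being discussed is the
check added by patch 3/4 of this series, roughly:

    /* Receive (listen) path: both CIDs higher than VMADDR_CID_HOST means
     * the connection request was routed via the host, so mark the remote
     * peer address accordingly.
     */
    if (vsk->local_addr.svm_cid > VMADDR_CID_HOST &&
        vsk->remote_addr.svm_cid > VMADDR_CID_HOST)
            vsk->remote_addr.svm_flags |= VMADDR_FLAG_TO_HOST;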


Thanks,
Andra




+ *
+ * This way can explicitly distinguish between vsock channels created for
+ * different use cases, such as nested VMs (or local communication between
+ * guest and host) and sibling VMs.
+ */
+#define VMADDR_FLAG_TO_HOST 0x0001
+
/* Invalid vSockets version. */

#define VM_SOCKETS_INVALID_VERSION -1U
--
2.20.1 (Apple Git-117)






Re: [PATCH net-next v2 1/4] vm_sockets: Include flags field in the vsock address data structure

2020-12-07 Thread Paraschiv, Andra-Irina



On 07/12/2020 11:59, Stefano Garzarella wrote:


On Fri, Dec 04, 2020 at 07:02:32PM +0200, Andra Paraschiv wrote:

vsock enables communication between virtual machines and the host they
are running on. With the multi transport support (guest->host and
host->guest), nested VMs can also use vsock channels for communication.

In addition to this, by default, all the vsock packets are forwarded to
the host, if no host->guest transport is loaded. This behavior can be
implicitly used for enabling vsock communication between sibling VMs.

Add a flags field in the vsock address data structure that can be used
to explicitly mark the vsock connection as being targeted for a certain
type of communication. This way, can distinguish between different use
cases such as nested VMs and sibling VMs.

Use the already available "svm_reserved1" field and mark it as a flags
field instead. This field can be set when initializing the vsock address
variable used for the connect() call.

Changelog

v1 -> v2

* Update the field name to "svm_flags".
* Split the current patch in 2 patches.


Usually the changelog goes after the 3 dashes, but I'm not sure there is
a strict rule :-)


Yup, I've seen both ways. I'd rather keep the changelog in the commit 
message, for further reference after merge, in the commits.




Anyway the patch LGTM:

Reviewed-by: Stefano Garzarella 


Thank you.

Andra





Signed-off-by: Andra Paraschiv 
---
include/uapi/linux/vm_sockets.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h
index fd0ed7221645d..46735376a57a8 100644
--- a/include/uapi/linux/vm_sockets.h
+++ b/include/uapi/linux/vm_sockets.h
@@ -145,7 +145,7 @@

 struct sockaddr_vm {
 	__kernel_sa_family_t svm_family;
-	unsigned short svm_reserved1;
+	unsigned short svm_flags;
 	unsigned int svm_port;
 	unsigned int svm_cid;
 	unsigned char svm_zero[sizeof(struct sockaddr) -
--
2.20.1 (Apple Git-117)






Re: [PATCH net-next v2 0/4] vsock: Add flags field in the vsock address

2020-12-07 Thread Paraschiv, Andra-Irina



On 07/12/2020 12:05, Stefano Garzarella wrote:


Hi Andra,

On Fri, Dec 04, 2020 at 07:02:31PM +0200, Andra Paraschiv wrote:
vsock enables communication between virtual machines and the host they are
running on. Nested VMs can be setup to use vsock channels, as the multi
transport support has been available in the mainline since the v5.5 Linux
kernel has been released.

Implicitly, if no host->guest vsock transport is loaded, all the vsock packets
are forwarded to the host. This behavior can be used to setup communication
channels between sibling VMs that are running on the same host. One example can
be the vsock channels that can be established within AWS Nitro Enclaves
(see Documentation/virt/ne_overview.rst).

To be able to explicitly mark a connection as being used for a certain use case,
add a flags field in the vsock address data structure. The "svm_reserved1" field
has been repurposed to be the flags field. The value of the flags will then be
taken into consideration when the vsock transport is assigned. This way can
distinguish between different use cases, such as nested VMs / local communication
and sibling VMs.


the series seems in good shape, I left some minor comments.
I ran my test suite (vsock_test, iperf3, nc) with nested VMs (QEMU/KVM),
and everything looks good.


Thanks, Stefano, for review and checking it out for the nested case as well.

I'll send out v3 including the addressed feedback and the Rb tags.



Note: I'll be offline today and tomorrow, so I may miss followups.


Ok, np, thanks for the heads-up.

Andra





Re: [PATCH net-next v1 1/3] vm_sockets: Include flag field in the vsock address data structure

2020-12-03 Thread Paraschiv, Andra-Irina



On 03/12/2020 15:38, Stefano Garzarella wrote:


On Thu, Dec 03, 2020 at 12:32:08PM +0200, Paraschiv, Andra-Irina wrote:



On 03/12/2020 11:21, Stefan Hajnoczi wrote:

On Tue, Dec 01, 2020 at 05:25:03PM +0200, Andra Paraschiv wrote:

vsock enables communication between virtual machines and the host they
are running on. With the multi transport support (guest->host and
host->guest), nested VMs can also use vsock channels for communication.

In addition to this, by default, all the vsock packets are forwarded to
the host, if no host->guest transport is loaded. This behavior can be
implicitly used for enabling vsock communication between sibling VMs.

Add a flag field in the vsock address data structure that can be used to
explicitly mark the vsock connection as being targeted for a certain
type of communication. This way, can distinguish between nested VMs and
sibling VMs use cases and can also setup them at the same time. Till
now, could either have nested VMs or sibling VMs at a time using the
vsock communication stack.

Use the already available "svm_reserved1" field and mark it as a flag
field instead. This flag can be set when initializing the vsock address
variable used for the connect() call.

Signed-off-by: Andra Paraschiv 
---
 include/uapi/linux/vm_sockets.h | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h
index fd0ed7221645d..58da5a91413ac 100644
--- a/include/uapi/linux/vm_sockets.h
+++ b/include/uapi/linux/vm_sockets.h
@@ -114,6 +114,22 @@
 #define VMADDR_CID_HOST 2
+/* This sockaddr_vm flag value covers the current default use case:
+ * local vsock communication between guest and host and nested VMs setup.
+ * In addition to this, implicitly, the vsock packets are forwarded to the host
+ * if no host->guest vsock transport is set.
+ */
+#define VMADDR_FLAG_DEFAULT_COMMUNICATION   0x0000
+
+/* Set this flag value in the sockaddr_vm corresponding field if the vsock
+ * channel needs to be setup between two sibling VMs running on the same host.
+ * This way can explicitly distinguish between vsock channels created for nested
+ * VMs (or local communication between guest and host) and the ones created for
+ * sibling VMs. And vsock channels for multiple use cases (nested / sibling VMs)
+ * can be setup at the same time.
+ */
+#define VMADDR_FLAG_SIBLING_VMS_COMMUNICATION   0x0001
vsock has the h2g and g2h concept. It would be more general to call this
flag VMADDR_FLAG_G2H or less cryptically VMADDR_FLAG_TO_HOST.


I agree, VMADDR_FLAG_TO_HOST is more general and it's clearer that it's
up to the host where to forward the packet (sibling if supported, or
whatever).


Ok, then VMADDR_FLAG_TO_HOST it is. :) I also updated the commit 
messages / comments to reflect this more general angle, with one of the 
current use cases being guest to guest communication.


Thanks,
Andra





Thanks for the feedback, Stefan.

I can update the naming to be more general, such as "_TO_HOST", and
keep the use cases (e.g. guest <-> host / nested / sibling VMs
communication) mentioned in the comments so that they would relate more
to the motivation behind it.

Andra



That way it just tells the driver in which direction to send packets
without implying that sibling communication is possible (it's not
allowed by default on any transport).

I don't have a strong opinion on this but wanted to suggest the idea.

Stefan







Re: [PATCH net-next v1 1/3] vm_sockets: Include flag field in the vsock address data structure

2020-12-03 Thread Paraschiv, Andra-Irina




On 03/12/2020 11:21, Stefan Hajnoczi wrote:

On Tue, Dec 01, 2020 at 05:25:03PM +0200, Andra Paraschiv wrote:

vsock enables communication between virtual machines and the host they
are running on. With the multi transport support (guest->host and
host->guest), nested VMs can also use vsock channels for communication.

In addition to this, by default, all the vsock packets are forwarded to
the host, if no host->guest transport is loaded. This behavior can be
implicitly used for enabling vsock communication between sibling VMs.

Add a flag field in the vsock address data structure that can be used to
explicitly mark the vsock connection as being targeted for a certain
type of communication. This way, can distinguish between nested VMs and
sibling VMs use cases and can also setup them at the same time. Till
now, could either have nested VMs or sibling VMs at a time using the
vsock communication stack.

Use the already available "svm_reserved1" field and mark it as a flag
field instead. This flag can be set when initializing the vsock address
variable used for the connect() call.

Signed-off-by: Andra Paraschiv 
---
  include/uapi/linux/vm_sockets.h | 18 +-
  1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h
index fd0ed7221645d..58da5a91413ac 100644
--- a/include/uapi/linux/vm_sockets.h
+++ b/include/uapi/linux/vm_sockets.h
@@ -114,6 +114,22 @@
  
  #define VMADDR_CID_HOST 2
  
+/* This sockaddr_vm flag value covers the current default use case:
+ * local vsock communication between guest and host and nested VMs setup.
+ * In addition to this, implicitly, the vsock packets are forwarded to the host
+ * if no host->guest vsock transport is set.
+ */
+#define VMADDR_FLAG_DEFAULT_COMMUNICATION  0x0000
+
+/* Set this flag value in the sockaddr_vm corresponding field if the vsock
+ * channel needs to be setup between two sibling VMs running on the same host.
+ * This way can explicitly distinguish between vsock channels created for nested
+ * VMs (or local communication between guest and host) and the ones created for
+ * sibling VMs. And vsock channels for multiple use cases (nested / sibling VMs)
+ * can be setup at the same time.
+ */
+#define VMADDR_FLAG_SIBLING_VMS_COMMUNICATION  0x0001

vsock has the h2g and g2h concept. It would be more general to call this
flag VMADDR_FLAG_G2H or less cryptically VMADDR_FLAG_TO_HOST.


Thanks for the feedback, Stefan.

I can update the naming to be more general, such as "_TO_HOST", and
keep the use cases (e.g. guest <-> host / nested / sibling VMs
communication) mentioned in the comments so that they would relate more
to the motivation behind it.


Andra



That way it just tells the driver in which direction to send packets
without implying that sibling communication is possible (it's not
allowed by default on any transport).

I don't have a strong opinion on this but wanted to suggest the idea.

Stefan








Re: [PATCH net-next v1 0/3] vsock: Add flag field in the vsock address

2020-12-02 Thread Paraschiv, Andra-Irina



On 02/12/2020 15:37, Stefano Garzarella wrote:


Hi Andra,

On Tue, Dec 01, 2020 at 05:25:02PM +0200, Andra Paraschiv wrote:
vsock enables communication between virtual machines and the host they are
running on. Nested VMs can be setup to use vsock channels, as the multi
transport support has been available in the mainline since the v5.5 Linux
kernel has been released.

Implicitly, if no host->guest vsock transport is loaded, all the vsock packets
are forwarded to the host. This behavior can be used to setup communication
channels between sibling VMs that are running on the same host. One example can
be the vsock channels that can be established within AWS Nitro Enclaves
(see Documentation/virt/ne_overview.rst).

To be able to explicitly mark a connection as being used for a certain use case,
add a flag field in the vsock address data structure. The "svm_reserved1" field
has been repurposed to be the flag field. The value of the flag will then be
taken into consideration when the vsock transport is assigned.

This way can distinguish between nested VMs / local communication and sibling
VMs use cases. And can also setup one or more types of communication at the same
time.



Another thing worth mentioning is that for now it is not supported in
vhost-vsock, since we are discarding every packet not addressed to the
host.


Right, thanks for the follow-up.



What we should do would be:
- add a new IOCTL to vhost-vsock to enable sibling communication, by
  default I'd like to leave it disabled

- allow sibling forwarding only if both guests have sibling
  communication enabled and we should implement some kind of filtering
  or network namespace support to allow the communication only between a
  subset of VMs


Do you have plans to work on it?


Nope, not yet. But I can take some time in the second part of December / 
beginning of January for this. And we can catch up in the meantime if 
there is something blocking or more clarifications are needed to make it 
work.


Thanks,
Andra




Otherwise I put it in my to-do list and hope I have time to do it (maybe
next month).

Thanks,
Stefano








Re: [PATCH net-next v1 3/3] af_vsock: Assign the vsock transport considering the vsock address flag

2020-12-01 Thread Paraschiv, Andra-Irina



On 01/12/2020 18:23, Stefano Garzarella wrote:


On Tue, Dec 01, 2020 at 05:25:05PM +0200, Andra Paraschiv wrote:

The vsock flag has been set in the connect and (listen) receive paths.

When the vsock transport is assigned, the remote CID is used to
distinguish between types of connection.

Use the vsock flag (in addition to the CID) from the remote address to
decide which vsock transport to assign. For the sibling VMs use case,
all the vsock packets need to be forwarded to the host, so always assign
the guest->host transport if the vsock flag is set. For the other use
cases, the vsock transport assignment logic is not changed.

Signed-off-by: Andra Paraschiv 
---
net/vmw_vsock/af_vsock.c | 15 +++
1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index d10916ab45267..bafc1cb20abd4 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -419,16 +419,21 @@ static void vsock_deassign_transport(struct vsock_sock *vsk)
  * (e.g. during the connect() or when a connection request on a listener
  * socket is received).
  * The vsk->remote_addr is used to decide which transport to use:
- *  - remote CID == VMADDR_CID_LOCAL or g2h->local_cid or VMADDR_CID_HOST if
- *    g2h is not loaded, will use local transport;
- *  - remote CID <= VMADDR_CID_HOST will use guest->host transport;
- *  - remote CID > VMADDR_CID_HOST will use host->guest transport;
+ *  - remote flag == VMADDR_FLAG_SIBLING_VMS_COMMUNICATION, will always
+ *    forward the vsock packets to the host and use guest->host transport;
+ *  - otherwise, going forward with the remote flag default value:
+ *    - remote CID == VMADDR_CID_LOCAL or g2h->local_cid or VMADDR_CID_HOST
+ *      if g2h is not loaded, will use local transport;
+ *    - remote CID <= VMADDR_CID_HOST or h2g is not loaded, will use
+ *      guest->host transport;
+ *    - remote CID > VMADDR_CID_HOST will use host->guest transport;
  */
 int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
 {
 	const struct vsock_transport *new_transport;
 	struct sock *sk = sk_vsock(vsk);
 	unsigned int remote_cid = vsk->remote_addr.svm_cid;
+	unsigned short remote_flag = vsk->remote_addr.svm_flag;
 	int ret;

 	switch (sk->sk_type) {
@@ -438,6 +443,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
 	case SOCK_STREAM:
 		if (vsock_use_local_transport(remote_cid))
 			new_transport = transport_local;
+		else if (remote_flag == VMADDR_FLAG_SIBLING_VMS_COMMUNICATION)


Other flags can be added, so here we should use the bitwise AND
operator to check if this flag is set.

And what about merging with the next if clause?



Indeed, I'll update the codebase to use the bitwise operator. Then I can 
also merge all the checks corresponding to the g2h transport in a single 
if block.


Thanks,
Andra




+			new_transport = transport_g2h;
 		else if (remote_cid <= VMADDR_CID_HOST || !transport_h2g)
 			new_transport = transport_g2h;
 		else
--
2.20.1 (Apple Git-117)










Re: [PATCH net-next v1 2/3] virtio_transport_common: Set sibling VMs flag on the receive path

2020-12-01 Thread Paraschiv, Andra-Irina



On 01/12/2020 18:22, Stefano Garzarella wrote:


On Tue, Dec 01, 2020 at 05:25:04PM +0200, Andra Paraschiv wrote:

The vsock flag can be set during the connect() setup logic, when
initializing the vsock address data structure variable. Then the vsock
transport is assigned, also considering this flag.

The vsock transport is also assigned on the (listen) receive path. The
flag needs to be set considering the use case.

Set the vsock flag of the remote address to the one targeted for sibling
VMs communication if the following conditions are met:

* The source CID of the packet is higher than VMADDR_CID_HOST.
* The destination CID of the packet is higher than VMADDR_CID_HOST.

Signed-off-by: Andra Paraschiv 
---
net/vmw_vsock/virtio_transport_common.c | 8 
1 file changed, 8 insertions(+)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 5956939eebb78..871c84e0916b1 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1062,6 +1062,14 @@ virtio_transport_recv_listen(struct sock *sk, struct virtio_vsock_pkt *pkt,
 	vsock_addr_init(&vchild->remote_addr, le64_to_cpu(pkt->hdr.src_cid),
 			le32_to_cpu(pkt->hdr.src_port));



Maybe is better to create an helper function that other transports can
use for the same purpose or we can put this code in the
vsock_assign_transport() and set this flag only when the 'psk' argument
is not NULL (this is the case when it's called by the transports when we
receive a new connection request and 'psk' is the listener socket).

The second way should allow us to support all the transports without
touching them.


Ack, I was wondering about the other transports such as vmci or hyperv.

I can move the logic below in the codebase that assigns the transport, 
after checking 'psk'.
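
A sketch of that direction, with the check moved into
vsock_assign_transport() and gated on the 'psk' listener argument (using
the field and flag names as they ended up after the renames later in the
series):

    /* psk is not NULL only when called from a transport for a new
     * connection request on a listener socket, i.e. the receive path;
     * setting the flag here covers all transports without touching them.
     */
    if (psk && vsk->local_addr.svm_cid > VMADDR_CID_HOST &&
        vsk->remote_addr.svm_cid > VMADDR_CID_HOST)
            vsk->remote_addr.svm_flags |= VMADDR_FLAG_TO_HOST;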




+	/* If the packet is coming with the source and destination CIDs higher
+	 * than VMADDR_CID_HOST, then a vsock channel should be established for
+	 * sibling VMs communication.
+	 */
+	if (vchild->local_addr.svm_cid > VMADDR_CID_HOST &&
+	    vchild->remote_addr.svm_cid > VMADDR_CID_HOST)
+		vchild->remote_addr.svm_flag = VMADDR_FLAG_SIBLING_VMS_COMMUNICATION;


svm_flag is always initialized to 0 in vsock_addr_init(), so this
assignment is the first one and it's okay, but to avoid future issues
I'd use |= here to set the flag.


Fair point. I was thinking more towards exclusive flag values
(purposes), but that's fine with the bitwise operator if we would get a
set of flag values together. I will also update the field name to
'svm_flags'; let me know if we should keep the previous one or there is
a better option.


Thanks,
Andra




+
 	ret = vsock_assign_transport(vchild, vsk);
 	/* Transport assigned (looking at remote_addr) must be the same
 	 * where we received the request.
--
2.20.1 (Apple Git-117)






Re: [PATCH net-next v1 1/3] vm_sockets: Include flag field in the vsock address data structure

2020-12-01 Thread Paraschiv, Andra-Irina



On 01/12/2020 18:09, Stefano Garzarella wrote:


On Tue, Dec 01, 2020 at 05:25:03PM +0200, Andra Paraschiv wrote:

vsock enables communication between virtual machines and the host they
are running on. With the multi transport support (guest->host and
host->guest), nested VMs can also use vsock channels for communication.

In addition to this, by default, all the vsock packets are forwarded to
the host, if no host->guest transport is loaded. This behavior can be
implicitly used for enabling vsock communication between sibling VMs.

Add a flag field in the vsock address data structure that can be used to
explicitly mark the vsock connection as being targeted for a certain
type of communication. This way, can distinguish between nested VMs and
sibling VMs use cases and can also setup them at the same time. Till
now, could either have nested VMs or sibling VMs at a time using the
vsock communication stack.

Use the already available "svm_reserved1" field and mark it as a flag
field instead. This flag can be set when initializing the vsock address
variable used for the connect() call.


Maybe we can split this patch in 2 patches, one to rename the svm_flag
and one to add the new flags.


Sure, I can split this in 2 patches, to have a bit more separation of 
duties.






Signed-off-by: Andra Paraschiv 
---
include/uapi/linux/vm_sockets.h | 18 +-
1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h
index fd0ed7221645d..58da5a91413ac 100644
--- a/include/uapi/linux/vm_sockets.h
+++ b/include/uapi/linux/vm_sockets.h
@@ -114,6 +114,22 @@

 #define VMADDR_CID_HOST 2

+/* This sockaddr_vm flag value covers the current default use case:
+ * local vsock communication between guest and host and nested VMs setup.
+ * In addition to this, implicitly, the vsock packets are forwarded to the host
+ * if no host->guest vsock transport is set.
+ */
+#define VMADDR_FLAG_DEFAULT_COMMUNICATION 0x0000


I think we don't need this macro, since the next one can be used to
check if it is a sibling communication (flag 0x1 set) or not (flag 0x1
not set).


Right, the flag value itself is not particularly used, as by default it
comes as 0. It was more for sharing the cases this covers. But I can
remove the define and keep this kind of info, with regard to the default
case, in the commit message / comments.





+
+/* Set this flag value in the sockaddr_vm corresponding field if the vsock
+ * channel needs to be setup between two sibling VMs running on the same host.
+ * This way can explicitly distinguish between vsock channels created for nested
+ * VMs (or local communication between guest and host) and the ones created for
+ * sibling VMs. And vsock channels for multiple use cases (nested / sibling VMs)
+ * can be setup at the same time.
+ */
+#define VMADDR_FLAG_SIBLING_VMS_COMMUNICATION 0x0001


What do you think if we shorten it to VMADDR_FLAG_SIBLING?



Yup, this seems ok as well for me. I'll update the naming.

Thanks,
Andra




+
 /* Invalid vSockets version. */

 #define VM_SOCKETS_INVALID_VERSION -1U
@@ -145,7 +161,7 @@

 struct sockaddr_vm {
 	__kernel_sa_family_t svm_family;
-	unsigned short svm_reserved1;
+	unsigned short svm_flag;
 	unsigned int svm_port;
 	unsigned int svm_cid;
 	unsigned char svm_zero[sizeof(struct sockaddr) -
--
2.20.1 (Apple Git-117)






Re: [PATCH net-next v1 0/3] vsock: Add flag field in the vsock address

2020-12-01 Thread Paraschiv, Andra-Irina



On 01/12/2020 18:27, Stefano Garzarella wrote:



Hi Andra,

On Tue, Dec 01, 2020 at 05:25:02PM +0200, Andra Paraschiv wrote:
vsock enables communication between virtual machines and the host they are
running on. Nested VMs can be setup to use vsock channels, as the multi
transport support has been available in the mainline since the v5.5 Linux
kernel has been released.

Implicitly, if no host->guest vsock transport is loaded, all the vsock packets
are forwarded to the host. This behavior can be used to setup communication
channels between sibling VMs that are running on the same host. One example can
be the vsock channels that can be established within AWS Nitro Enclaves
(see Documentation/virt/ne_overview.rst).

To be able to explicitly mark a connection as being used for a certain use case,
add a flag field in the vsock address data structure. The "svm_reserved1" field
has been repurposed to be the flag field. The value of the flag will then be
taken into consideration when the vsock transport is assigned.

This way can distinguish between nested VMs / local communication and sibling
VMs use cases. And can also setup one or more types of communication at the same
time.


Thanks for working on this, I've left you a few comments, but I think this
is the right way to support nested and sibling communication together.


Hi Stefano,

Thanks also for taking the time to review, and to both you and Stefan for
sharing an overview of this proposed option.


I'm going through the comments and will send out the v2 of the patch 
series as I have the changes done and validated.


Thanks,
Andra





Re: [PATCH v2] nitro_enclaves: Fixup type and simplify logic of the poll mask setup

2020-11-02 Thread Paraschiv, Andra-Irina




On 02/11/2020 19:50, Alexander Graf wrote:



On 02.11.20 18:36, Andra Paraschiv wrote:

Update the assigned value of the poll result to be EPOLLHUP instead of
POLLHUP to match the __poll_t type.

While at it, simplify the logic of setting the mask result of the poll
function.

Changelog

v1 -> v2

* Simplify the mask setting logic from the poll function.

Signed-off-by: Andra Paraschiv 
Reported-by: kernel test robot 


Reviewed-by: Alexander Graf 




Greg, let me know if there is anything remaining to be done for this 
patch. Otherwise, can you please add the patch to the char-misc tree.


Thanks,
Andra




---
  drivers/virt/nitro_enclaves/ne_misc_dev.c | 6 ++
  1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c

index f06622b48d695..f1964ea4b8269 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -1505,10 +1505,8 @@ static __poll_t ne_enclave_poll(struct file 
*file, poll_table *wait)

    poll_wait(file, &ne_enclave->eventq, wait);
  -    if (!ne_enclave->has_event)
-    return mask;
-
-    mask = POLLHUP;
+    if (ne_enclave->has_event)
+    mask |= EPOLLHUP;
    return mask;
  }
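
On the user space side, the (E)POLLHUP notification that this patch sets is 
what an enclave manager waits for to learn that an enclave exited. A minimal 
sketch, assuming an enclave_fd previously returned by the NE_CREATE_VM ioctl:

#include <poll.h>
#include <stdio.h>

/* Block until the NE driver signals an enclave exit event, reported as
 * POLLHUP in revents, e.g. when the enclave crashes or gracefully exits.
 */
static int wait_for_enclave_exit(int enclave_fd)
{
	struct pollfd pfd = {
		.fd	= enclave_fd,
		.events	= POLLIN,
	};

	if (poll(&pfd, 1, -1) < 0)
		return -1;

	if (pfd.revents & POLLHUP)
		printf("Enclave exited or crashed\n");

	return 0;
}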









Re: [PATCH v1] nitro_enclaves: Fixup type of the poll result assigned value

2020-11-02 Thread Paraschiv, Andra-Irina




On 02/11/2020 18:16, Alexander Graf wrote:



On 14.10.20 11:05, Andra Paraschiv wrote:

Update the assigned value of the poll result to be EPOLLHUP instead of
POLLHUP to match the __poll_t type.

Signed-off-by: Andra Paraschiv 
Reported-by: kernel test robot 
---
  drivers/virt/nitro_enclaves/ne_misc_dev.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c

index f06622b48d69..9148566455e8 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -1508,7 +1508,7 @@ static __poll_t ne_enclave_poll(struct file 
*file, poll_table *wait)

  if (!ne_enclave->has_event)
  return mask;
  -    mask = POLLHUP;
+    mask = EPOLLHUP;


That whole function looks a bit ... convoluted? How about this? I 
guess you could trim it down even further, but this looks quite 
readable to me:


diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c

index f06622b48d69..5b7f45e2eb4c 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -1505,10 +1505,8 @@ static __poll_t ne_enclave_poll(struct file 
*file, poll_table *wait)


 poll_wait(file, &ne_enclave->eventq, wait);

-    if (!ne_enclave->has_event)
-    return mask;
-
-    mask = POLLHUP;
+    if (ne_enclave->has_event)
+    mask |= POLLHUP;

 return mask;
 }



Good point, I updated the logic and sent the v2 of the patch.

https://lore.kernel.org/lkml/20201102173622.32169-1-andra...@amazon.com/

Thank you.

Andra







Re: [PATCH v9 14/18] nitro_enclaves: Add Kconfig for the Nitro Enclaves driver

2020-09-22 Thread Paraschiv, Andra-Irina



On 22/09/2020 19:20, Greg KH wrote:

On Tue, Sep 22, 2020 at 05:13:02PM +0300, Paraschiv, Andra-Irina wrote:


On 21/09/2020 15:34, Paraschiv, Andra-Irina wrote:


On 14/09/2020 20:23, Paraschiv, Andra-Irina wrote:


On 14/09/2020 18:59, Greg KH wrote:

On Fri, Sep 11, 2020 at 05:11:37PM +0300, Andra Paraschiv wrote:

Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 

I can't take patches without any changelog text at all, sorry.

Same for a few other patches in this series :(


I can move the changelog text before the Sob tag(s) for all the
patches. I also can add a summary phrase in the commit message for
the commits like this one that have only the commit title and Sob &
Rb tags.

Would these updates to the commit messages match the expectations?

Let me know if there is remaining feedback to discuss that I should include
as updates in v10. Otherwise, I can send the new revision with the
updated commit messages.

Thanks for review.

Here we go, I published v10, including the updated commit messages and
rebased on top of v5.9-rc6.

https://lore.kernel.org/lkml/20200921121732.44291-1-andra...@amazon.com/

Any additional feedback, open to discuss.

If all looks good, we can move forward as we've talked before, to have
the patch series on the char-misc branch and target v5.10-rc1.

Thanks for merging the patch series on the char-misc-testing branch and for
the review sessions we've had.

Let's see how it all goes next; if there is anything to be done in the meantime
(e.g. anything not coming via auto-generated mails), just let me know.

Will do, thanks for sticking with this and cleaning it up to look a lot
better than the original submission.


And this also came with a couple of lessons learnt that I've applied or 
will apply for other pieces of the codebase as well, either in the Linux 
kernel or in other projects.




Now comes the real work, maintaining it for the next 10 years :)


I agree, maintenance is equally important. There is an ongoing process 
to make sure the whole ecosystem continues to work.


Andra





Re: [PATCH v9 14/18] nitro_enclaves: Add Kconfig for the Nitro Enclaves driver

2020-09-22 Thread Paraschiv, Andra-Irina



On 21/09/2020 15:34, Paraschiv, Andra-Irina wrote:



On 14/09/2020 20:23, Paraschiv, Andra-Irina wrote:



On 14/09/2020 18:59, Greg KH wrote:

On Fri, Sep 11, 2020 at 05:11:37PM +0300, Andra Paraschiv wrote:

Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 

I can't take patches without any changelog text at all, sorry.

Same for a few other patches in this series :(



I can move the changelog text before the Sob tag(s) for all the 
patches. I also can add a summary phrase in the commit message for 
the commits like this one that have only the commit title and Sob & 
Rb tags.


Would these updates to the commit messages match the expectations?

Let me know if there is remaining feedback to discuss that I should include 
as updates in v10. Otherwise, I can send the new revision with the 
updated commit messages.


Thanks for review.


Here we go, I published v10, including the updated commit messages and 
rebased on top of v5.9-rc6.


https://lore.kernel.org/lkml/20200921121732.44291-1-andra...@amazon.com/

Any additional feedback, open to discuss.

If all looks good, we can move forward as we've talked before, to have 
the patch series on the char-misc branch and target v5.10-rc1.


Thanks for merging the patch series on the char-misc-testing branch and 
for the review sessions we've had.


Let's see how it all goes next; if there is anything to be done in the 
meantime (e.g. anything not coming via auto-generated mails), just let me know.


Andra





Re: [PATCH v9 14/18] nitro_enclaves: Add Kconfig for the Nitro Enclaves driver

2020-09-21 Thread Paraschiv, Andra-Irina



On 14/09/2020 20:23, Paraschiv, Andra-Irina wrote:



On 14/09/2020 18:59, Greg KH wrote:

On Fri, Sep 11, 2020 at 05:11:37PM +0300, Andra Paraschiv wrote:

Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 

I can't take patches without any changelog text at all, sorry.

Same for a few other patches in this series :(



I can move the changelog text before the Sob tag(s) for all the 
patches. I also can add a summary phrase in the commit message for the 
commits like this one that have only the commit title and Sob & Rb tags.


Would these updates to the commit messages match the expectations?

Let me know if there is remaining feedback to discuss that I should include 
as updates in v10. Otherwise, I can send the new revision with the 
updated commit messages.


Thanks for review.


Here we go, I published v10, including the updated commit messages and 
rebased on top of v5.9-rc6.


https://lore.kernel.org/lkml/20200921121732.44291-1-andra...@amazon.com/

Any additional feedback, open to discuss.

If all looks good, we can move forward as we've talked before, to have 
the patch series on the char-misc branch and target v5.10-rc1.


Thanks,
Andra





Re: [PATCH v9 14/18] nitro_enclaves: Add Kconfig for the Nitro Enclaves driver

2020-09-14 Thread Paraschiv, Andra-Irina



On 14/09/2020 18:59, Greg KH wrote:

On Fri, Sep 11, 2020 at 05:11:37PM +0300, Andra Paraschiv wrote:

Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 

I can't take patches without any changelog text at all, sorry.

Same for a few other patches in this series :(



I can move the changelog text before the Sob tag(s) for all the patches. 
I also can add a summary phrase in the commit message for the commits 
like this one that have only the commit title and Sob & Rb tags.


Would these updates to the commit messages match the expectations?

Let me know if there is remaining feedback to discuss that I should include 
as updates in v10. Otherwise, I can send the new revision with the updated 
commit messages.


Thanks for review.

Andra





Re: [PATCH v8 17/18] nitro_enclaves: Add overview documentation

2020-09-11 Thread Paraschiv, Andra-Irina



On 07/09/2020 18:13, Paraschiv, Andra-Irina wrote:



On 07/09/2020 17:08, Greg KH wrote:

On Mon, Sep 07, 2020 at 04:43:11PM +0300, Paraschiv, Andra-Irina wrote:


On 07/09/2020 12:01, Greg KH wrote:

On Fri, Sep 04, 2020 at 08:37:17PM +0300, Andra Paraschiv wrote:

Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v7 -> v8

* Add info about the primary / parent VM CID value.
* Update reference link for huge pages.
* Add reference link for the x86 boot protocol.
* Add license mention and update doc title / chapter formatting.

v6 -> v7

* No changes.

v5 -> v6

* No changes.

v4 -> v5

* No changes.

v3 -> v4

* Update doc type from .txt to .rst.
* Update documentation based on the changes from v4.

v2 -> v3

* No changes.

v1 -> v2

* New in v2.
---
   Documentation/nitro_enclaves/ne_overview.rst | 95 


   1 file changed, 95 insertions(+)
   create mode 100644 Documentation/nitro_enclaves/ne_overview.rst

A whole new subdir, for a single driver, and not tied into the kernel
documentation build process at all?  Not good :(


Would the "virt" directory be a better option for this doc file?

Yes.


Alright, I'll update the doc file location, the index file and the 
MAINTAINERS entry to reflect the new doc file location.




I sent out a new revision that includes the updates based on your 
feedback. Thanks for review.


To be aware of this beforehand, what would be the further necessary 
steps (e.g. linux-next branch, additional review and / or sanity checks) 
to consider for targeting the next merge window?


Thanks,
Andra





Re: [PATCH v8 17/18] nitro_enclaves: Add overview documentation

2020-09-11 Thread Paraschiv, Andra-Irina



On 11/09/2020 18:12, Greg KH wrote:


On Fri, Sep 11, 2020 at 05:56:10PM +0300, Paraschiv, Andra-Irina wrote:


On 07/09/2020 18:13, Paraschiv, Andra-Irina wrote:


On 07/09/2020 17:08, Greg KH wrote:

On Mon, Sep 07, 2020 at 04:43:11PM +0300, Paraschiv, Andra-Irina wrote:

On 07/09/2020 12:01, Greg KH wrote:

On Fri, Sep 04, 2020 at 08:37:17PM +0300, Andra Paraschiv wrote:

Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v7 -> v8

* Add info about the primary / parent VM CID value.
* Update reference link for huge pages.
* Add reference link for the x86 boot protocol.
* Add license mention and update doc title / chapter formatting.

v6 -> v7

* No changes.

v5 -> v6

* No changes.

v4 -> v5

* No changes.

v3 -> v4

* Update doc type from .txt to .rst.
* Update documentation based on the changes from v4.

v2 -> v3

* No changes.

v1 -> v2

* New in v2.
---
Documentation/nitro_enclaves/ne_overview.rst | 95

1 file changed, 95 insertions(+)
create mode 100644 Documentation/nitro_enclaves/ne_overview.rst

A whole new subdir, for a single driver, and not tied into the kernel
documentation build process at all?  Not good :(


Would the "virt" directory be a better option for this doc file?

Yes.

Alright, I'll update the doc file location, the index file and the
MAINTAINERS entry to reflect the new doc file location.


I sent out a new revision that includes the updates based on your feedback.
Thanks for review.

To be aware of this beforehand, what would be the further necessary steps
(e.g. linux-next branch, additional review and / or sanity checks) to
consider for targeting the next merge window?

If all looks good, I can just suck it into my char-misc branch to get it
into 5.10-rc1.  I'll look at the series next week, thanks.



Ok, let's do this way then, thanks for info.

Andra





Re: [PATCH v8 15/18] nitro_enclaves: Add Makefile for the Nitro Enclaves driver

2020-09-07 Thread Paraschiv, Andra-Irina



On 07/09/2020 12:00, Greg KH wrote:



On Fri, Sep 04, 2020 at 08:37:15PM +0300, Andra Paraschiv wrote:

Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v7 -> v8

* No changes.

v6 -> v7

* No changes.

v5 -> v6

* No changes.

v4 -> v5

* No changes.

v3 -> v4

* No changes.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
   already in place.

v1 -> v2

* Update path to Makefile to match the drivers/virt/nitro_enclaves
   directory.
---
  drivers/virt/Makefile|  2 ++
  drivers/virt/nitro_enclaves/Makefile | 11 +++
  2 files changed, 13 insertions(+)
  create mode 100644 drivers/virt/nitro_enclaves/Makefile

diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
index fd331247c27a..f28425ce4b39 100644
--- a/drivers/virt/Makefile
+++ b/drivers/virt/Makefile
@@ -5,3 +5,5 @@

  obj-$(CONFIG_FSL_HV_MANAGER) += fsl_hypervisor.o
  obj-y    += vboxguest/
+
+obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves/
diff --git a/drivers/virt/nitro_enclaves/Makefile 
b/drivers/virt/nitro_enclaves/Makefile
new file mode 100644
index ..e9f4fcd1591e
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/Makefile
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+
+# Enclave lifetime management support for Nitro Enclaves (NE).
+
+obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves.o
+
+nitro_enclaves-y := ne_pci_dev.o ne_misc_dev.o
+
+ccflags-y += -Wall

That flag is _really_ risky over time, are you _SURE_ that all new
versions of clang and gcc will never produce any warnings?  People work
to fix up build warnings quite quickly for new compilers, you shouldn't
prevent the code from being built at all just for that, right?



That would also need -Werror, to have warnings treated as errors and 
prevent the codebase from building. If it's about something more, just let 
me know.


Wouldn't this apply to the samples directory as well?

I could remove the -Wall flag and keep it for development validation 
purposes on my side, to solve at least the warnings that I would further see.


Thanks,
Andra





Re: [PATCH v8 15/18] nitro_enclaves: Add Makefile for the Nitro Enclaves driver

2020-09-07 Thread Paraschiv, Andra-Irina



On 07/09/2020 17:08, Greg KH wrote:


On Mon, Sep 07, 2020 at 04:35:23PM +0300, Paraschiv, Andra-Irina wrote:


On 07/09/2020 12:00, Greg KH wrote:


On Fri, Sep 04, 2020 at 08:37:15PM +0300, Andra Paraschiv wrote:

Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v7 -> v8

* No changes.

v6 -> v7

* No changes.

v5 -> v6

* No changes.

v4 -> v5

* No changes.

v3 -> v4

* No changes.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
already in place.

v1 -> v2

* Update path to Makefile to match the drivers/virt/nitro_enclaves
directory.
---
   drivers/virt/Makefile|  2 ++
   drivers/virt/nitro_enclaves/Makefile | 11 +++
   2 files changed, 13 insertions(+)
   create mode 100644 drivers/virt/nitro_enclaves/Makefile

diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
index fd331247c27a..f28425ce4b39 100644
--- a/drivers/virt/Makefile
+++ b/drivers/virt/Makefile
@@ -5,3 +5,5 @@

   obj-$(CONFIG_FSL_HV_MANAGER) += fsl_hypervisor.o
   obj-y    += vboxguest/
+
+obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves/
diff --git a/drivers/virt/nitro_enclaves/Makefile 
b/drivers/virt/nitro_enclaves/Makefile
new file mode 100644
index ..e9f4fcd1591e
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/Makefile
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+
+# Enclave lifetime management support for Nitro Enclaves (NE).
+
+obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves.o
+
+nitro_enclaves-y := ne_pci_dev.o ne_misc_dev.o
+
+ccflags-y += -Wall

That flag is _really_ risky over time, are you _SURE_ that all new
versions of clang and gcc will never produce any warnings?  People work
to fix up build warnings quite quickly for new compilers, you shouldn't
prevent the code from being built at all just for that, right?


That would also need -Werror, to have warnings treated as errors and prevent
the codebase from building. If it's about something more, just let me know.

No, you are right, Werror would be needed here too.

W=1 gives you -Wall if you really want that, no need to add it by hand.


Good, we are on the same page then. :) Thanks for the heads-up with 
regard to the W=1 option.
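
For reference, the same extra warnings can still be requested per build 
without hard-coding the flag in the Makefile, by asking kbuild for them 
when compiling this directory, e.g.:

    make W=1 drivers/virt/nitro_enclaves/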


Andra





Re: [PATCH v8 17/18] nitro_enclaves: Add overview documentation

2020-09-07 Thread Paraschiv, Andra-Irina



On 07/09/2020 17:08, Greg KH wrote:

On Mon, Sep 07, 2020 at 04:43:11PM +0300, Paraschiv, Andra-Irina wrote:


On 07/09/2020 12:01, Greg KH wrote:

On Fri, Sep 04, 2020 at 08:37:17PM +0300, Andra Paraschiv wrote:

Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v7 -> v8

* Add info about the primary / parent VM CID value.
* Update reference link for huge pages.
* Add reference link for the x86 boot protocol.
* Add license mention and update doc title / chapter formatting.

v6 -> v7

* No changes.

v5 -> v6

* No changes.

v4 -> v5

* No changes.

v3 -> v4

* Update doc type from .txt to .rst.
* Update documentation based on the changes from v4.

v2 -> v3

* No changes.

v1 -> v2

* New in v2.
---
   Documentation/nitro_enclaves/ne_overview.rst | 95 
   1 file changed, 95 insertions(+)
   create mode 100644 Documentation/nitro_enclaves/ne_overview.rst

A whole new subdir, for a single driver, and not tied into the kernel
documentation build process at all?  Not good :(


Would the "virt" directory be a better option for this doc file?

Yes.


Alright, I'll update the doc file location, the index file and the 
MAINTAINERS entry to reflect the new doc file location.


Thanks,
Andra





Re: [PATCH v8 17/18] nitro_enclaves: Add overview documentation

2020-09-07 Thread Paraschiv, Andra-Irina



On 07/09/2020 12:01, Greg KH wrote:


On Fri, Sep 04, 2020 at 08:37:17PM +0300, Andra Paraschiv wrote:

Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v7 -> v8

* Add info about the primary / parent VM CID value.
* Update reference link for huge pages.
* Add reference link for the x86 boot protocol.
* Add license mention and update doc title / chapter formatting.

v6 -> v7

* No changes.

v5 -> v6

* No changes.

v4 -> v5

* No changes.

v3 -> v4

* Update doc type from .txt to .rst.
* Update documentation based on the changes from v4.

v2 -> v3

* No changes.

v1 -> v2

* New in v2.
---
  Documentation/nitro_enclaves/ne_overview.rst | 95 
  1 file changed, 95 insertions(+)
  create mode 100644 Documentation/nitro_enclaves/ne_overview.rst

A whole new subdir, for a single driver, and not tied into the kernel
documentation build process at all?  Not good :(



Would the "virt" directory be a better option for this doc file?

I can then add the doc name to the corresponding index file as well.

Thanks,
Andra





Re: [PATCH v8 09/18] nitro_enclaves: Add logic for setting an enclave vCPU

2020-09-07 Thread Paraschiv, Andra-Irina



On 07/09/2020 11:58, Greg KH wrote:


On Fri, Sep 04, 2020 at 08:37:09PM +0300, Andra Paraschiv wrote:

An enclave, before being started, has its resources set. One of its
resources is CPU.

A NE CPU pool is set and enclave CPUs are chosen from it. Offline the
CPUs from the NE CPU pool during the pool setup and online them back
during the NE CPU pool teardown. The CPU offline is necessary so that
there would not be more vCPUs than physical CPUs available to the
primary / parent VM. In that case the CPUs would be overcommitted and
would change the initial configuration of the primary / parent VM of
having dedicated vCPUs to physical CPUs.

The enclave CPUs need to be full cores and from the same NUMA node. CPU
0 and its siblings have to remain available to the primary / parent VM.

Add ioctl command logic for setting an enclave vCPU.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v7 -> v8

* No changes.

v6 -> v7

* Check for error return value when setting the kernel parameter string.
* Use the NE misc device parent field to get the NE PCI device.
* Update the naming and add more comments to make more clear the logic
   of handling full CPU cores and dedicating them to the enclave.
* Calculate the number of threads per core and not use smp_num_siblings
   that is x86 specific.

v5 -> v6

* Check CPUs are from the same NUMA node before going through CPU
   siblings during the NE CPU pool setup.
* Update documentation to kernel-doc format.

v4 -> v5

* Set empty string in case of invalid NE CPU pool.
* Clear NE CPU pool mask on pool setup failure.
* Setup NE CPU cores out of the NE CPU pool.
* Early exit on NE CPU pool setup if enclave(s) already running.
* Remove sanity checks for situations that shouldn't happen, only if
   buggy system or broken logic at all.
* Add check for maximum vCPU id possible before looking into the CPU
   pool.
* Remove log on copy_from_user() / copy_to_user() failure and on admin
   capability check for setting the NE CPU pool.
* Update the ioctl call to not create a file descriptor for the vCPU.
* Split the CPU pool usage logic in 2 separate functions - one to get a
   CPU from the pool and the other to check the given CPU is available in
   the pool.

v3 -> v4

* Setup the NE CPU pool at runtime via a sysfs file for the kernel
   parameter.
* Check enclave CPUs to be from the same NUMA node.
* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().
* Remove file ops that do nothing for now - open, ioctl and release.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
* Check if enclave state is init when setting enclave vCPU.
---
  drivers/virt/nitro_enclaves/ne_misc_dev.c | 702 ++
  1 file changed, 702 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index 7ad3f1eb75d4..0477b11bf15d 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -64,8 +64,11 @@
   * TODO: Update logic to create new sysfs entries instead of using
   * a kernel parameter e.g. if multiple sysfs files needed.
   */
+static int ne_set_kernel_param(const char *val, const struct kernel_param *kp);
+
  static const struct kernel_param_ops ne_cpu_pool_ops = {
    .get    = param_get_string,
+   .set    = ne_set_kernel_param,
  };

  static char ne_cpus[NE_CPUS_SIZE];
@@ -103,6 +106,702 @@ struct ne_cpu_pool {

  static struct ne_cpu_pool ne_cpu_pool;

+/**
+ * ne_check_enclaves_created() - Verify if at least one enclave has been created.
+ * @void:	No parameters provided.
+ *
+ * Context: Process context.
+ * Return:
+ * * True if at least one enclave is created.
+ * * False otherwise.
+ */
+static bool ne_check_enclaves_created(void)
+{
+ struct ne_pci_dev *ne_pci_dev = NULL;
+ struct pci_dev *pdev = NULL;
+ bool ret = false;
+
+ if (!ne_misc_dev.parent)

How can that be the case?

I wouldn't rely on the misc device's internals to be something that you
count on for proper operation of your code, right?


Given the option that you shared in the previous patch, to have a field 
(in an updated ne_misc_dev data structure) for the data we want to keep 
(ne_pci_dev) and not rely on the misc device's parent field, this logic 
would count on that particular field instead.


Thanks,
Andra
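
As a side note on the offlining step that the quoted commit message 
describes, a minimal sketch of taking the pool CPUs offline could look as 
below. It assumes the remove_cpu() helper from linux/cpu.h (with add_cpu() 
used analogously on pool teardown); it is a sketch of the approach, not 
the exact merged code.

#include <linux/cpu.h>
#include <linux/cpumask.h>

/* Offline every CPU in the NE CPU pool so the full cores can be
 * dedicated to enclaves instead of the primary / parent VM.
 */
static int ne_offline_pool_cpus(const struct cpumask *pool)
{
	unsigned int cpu;
	int rc;

	for_each_cpu(cpu, pool) {
		rc = remove_cpu(cpu);
		if (rc != 0)
			return rc;
	}

	return 0;
}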







Re: [PATCH v8 08/18] nitro_enclaves: Add logic for creating an enclave VM

2020-09-07 Thread Paraschiv, Andra-Irina



On 07/09/2020 11:57, Greg KH wrote:


On Fri, Sep 04, 2020 at 08:37:08PM +0300, Andra Paraschiv wrote:

+static long ne_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+ switch (cmd) {
+ case NE_CREATE_VM: {
+ int enclave_fd = -1;
+ struct file *enclave_file = NULL;
+ struct ne_pci_dev *ne_pci_dev = NULL;
+ struct pci_dev *pdev = to_pci_dev(ne_misc_dev.parent);

That call is really "risky".  You "know" that the misc device's parent
is a specific PCI device, that just happens to be your pci device,
right?


Correct, that's how the misc device's parent is assigned: to point to 
a particular PCI device, namely the NE PCI device.




But why not just have your misc device hold the pointer to the structure
you really want, so you don't have to mess with the device tree in any
way, and you always "know" you have the correct pointer?  It should save
you this two-step lookup all the time, right?



That would help, yes, to keep the pointer directly to the ne_pci_dev 
data structure. The catch is that the misc device's parent field is a 
struct device pointer. I can create a new internal data structure that 
keeps the miscdevice data structure and a pointer to the ne_pci_dev.
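
One possible shape for such an internal structure, as a sketch of the 
approach discussed above (the names are illustrative, not necessarily the 
merged ones):

#include <linux/miscdevice.h>

#include "ne_pci_dev.h"

/* Bundle the misc device together with the driver-private PCI device
 * data, so the ioctl path no longer has to walk the device tree.
 */
struct ne_devs {
	struct miscdevice	*ne_misc_dev;
	struct ne_pci_dev	*ne_pci_dev;
};

static struct ne_devs ne_devs;

/* The PCI probe path fills in ne_devs.ne_pci_dev once, and ne_ioctl()
 * then simply reads it:
 *
 *	struct ne_pci_dev *ne_pci_dev = ne_devs.ne_pci_dev;
 */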


Thanks,
Andra





Re: [PATCH v7 00/18] Add support for Nitro Enclaves

2020-09-04 Thread Paraschiv, Andra-Irina



On 04/09/2020 19:13, Greg KH wrote:

On Mon, Aug 31, 2020 at 11:19:19AM +0300, Paraschiv, Andra-Irina wrote:


On 19/08/2020 14:26, Greg KH wrote:

On Wed, Aug 19, 2020 at 01:15:59PM +0200, Alexander Graf wrote:

On 17.08.20 15:09, Andra Paraschiv wrote:

Nitro Enclaves (NE) is a new Amazon Elastic Compute Cloud (EC2) capability
that allows customers to carve out isolated compute environments within EC2
instances [1].

For example, an application that processes sensitive data and runs in a VM,
can be separated from other applications running in the same VM. This
application then runs in a separate VM than the primary VM, namely an enclave.

An enclave runs alongside the VM that spawned it. This setup matches low latency
applications needs. The resources that are allocated for the enclave, such as
memory and CPUs, are carved out of the primary VM. Each enclave is mapped to a
process running in the primary VM, that communicates with the NE driver via an
ioctl interface.

In this sense, there are two components:

1. An enclave abstraction process - a user space process running in the primary
VM guest that uses the provided ioctl interface of the NE driver to spawn an
enclave VM (that's 2 below).

There is a NE emulated PCI device exposed to the primary VM. The driver for this
new PCI device is included in the NE driver.

The ioctl logic is mapped to PCI device commands e.g. the NE_START_ENCLAVE ioctl
maps to an enclave start PCI command. The PCI device commands are then
translated into  actions taken on the hypervisor side; that's the Nitro
hypervisor running on the host where the primary VM is running. The Nitro
hypervisor is based on core KVM technology.

2. The enclave itself - a VM running on the same host as the primary VM that
spawned it. Memory and CPUs are carved out of the primary VM and are dedicated
for the enclave VM. An enclave does not have persistent storage attached.

The memory regions carved out of the primary VM and given to an enclave need to
be aligned 2 MiB / 1 GiB physically contiguous memory regions (or multiple of
this size e.g. 8 MiB). The memory can be allocated e.g. by using hugetlbfs from
user space [2][3]. The memory size for an enclave needs to be at least 64 MiB.
The enclave memory and CPUs need to be from the same NUMA node.

An enclave runs on dedicated cores. CPU 0 and its CPU siblings need to remain
available for the primary VM. A CPU pool has to be set for NE purposes by an
user with admin capability. See the cpu list section from the kernel
documentation [4] for how a CPU pool format looks.

An enclave communicates with the primary VM via a local communication channel,
using virtio-vsock [5]. The primary VM has virtio-pci vsock emulated device,
while the enclave VM has a virtio-mmio vsock emulated device. The vsock device
uses eventfd for signaling. The enclave VM sees the usual interfaces - local
APIC and IOAPIC - to get interrupts from virtio-vsock device. The virtio-mmio
device is placed in memory below the typical 4 GiB.

The application that runs in the enclave needs to be packaged in an enclave
image together with the OS ( e.g. kernel, ramdisk, init ) that will run in the
enclave VM. The enclave VM has its own kernel and follows the standard Linux
boot protocol.

The kernel bzImage, the kernel command line, the ramdisk(s) are part of the
Enclave Image Format (EIF); plus an EIF header including metadata such as magic
number, eif version, image size and CRC.

Hash values are computed for the entire enclave image (EIF), the kernel and
ramdisk(s). That's used, for example, to check that the enclave image that is
loaded in the enclave VM is the one that was intended to be run.

These crypto measurements are included in a signed attestation document
generated by the Nitro Hypervisor and further used to prove the identity of the
enclave; KMS is an example of service that NE is integrated with and that checks
the attestation doc.

The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The
init process in the enclave connects to the vsock CID of the primary VM and a
predefined port - 9000 - to send a heartbeat value - 0xb7. This mechanism is
used to check in the primary VM that the enclave has booted.

If the enclave VM crashes or gracefully exits, an interrupt event is received by
the NE driver. This event is sent further to the user space enclave process
running in the primary VM via a poll notification mechanism. Then the user space
enclave process can exit.

Thank you.


This version reads very well, thanks a lot Andra!

Greg, would you mind to have another look over it?

Will do, it's in my to-review queue, behind lots of other patches...


I have a set of updates that can be included in a new revision, v8 e.g. new
NE custom error codes for invalid flags / enclave CID, "shutdown" function
for the NE PCI device driver, a couple more checks wrt invalid flags and
enclave vsock CID, documentation and sample updates. There is also the
option to have these updates as follow-up patches.
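
As a side note on the heartbeat mechanism from the cover letter above (the 
enclave init process connecting to the vsock CID of the primary VM on the 
predefined port 9000 and sending the 0xb7 value), the parent-side boot check 
could be sketched as below; a minimal sketch, not the actual user space 
tooling:

#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

/* Listen on the predefined vsock port and wait for the 0xb7 heartbeat
 * byte that the enclave init process sends once the enclave has booted.
 */
static int wait_for_enclave_heartbeat(void)
{
	struct sockaddr_vm addr = {
		.svm_family	= AF_VSOCK,
		.svm_port	= 9000,
		.svm_cid	= VMADDR_CID_ANY,
	};
	uint8_t hb = 0;
	int conn_fd = -1;
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, 1) < 0)
		goto out;

	conn_fd = accept(fd, NULL, NULL);
	if (conn_fd < 0)
		goto out;

	if (read(conn_fd, &hb, sizeof(hb)) != sizeof(hb))
		hb = 0;

	close(conn_fd);
out:
	close(fd);

	return hb == 0xb7 ? 0 : -1;
}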

Re: [PATCH v7 00/18] Add support for Nitro Enclaves

2020-08-31 Thread Paraschiv, Andra-Irina



On 19/08/2020 14:26, Greg KH wrote:


On Wed, Aug 19, 2020 at 01:15:59PM +0200, Alexander Graf wrote:


On 17.08.20 15:09, Andra Paraschiv wrote:

Nitro Enclaves (NE) is a new Amazon Elastic Compute Cloud (EC2) capability
that allows customers to carve out isolated compute environments within EC2
instances [1].

For example, an application that processes sensitive data and runs in a VM,
can be separated from other applications running in the same VM. This
application then runs in a separate VM than the primary VM, namely an enclave.

An enclave runs alongside the VM that spawned it. This setup matches low latency
applications needs. The resources that are allocated for the enclave, such as
memory and CPUs, are carved out of the primary VM. Each enclave is mapped to a
process running in the primary VM, that communicates with the NE driver via an
ioctl interface.

In this sense, there are two components:

1. An enclave abstraction process - a user space process running in the primary
VM guest that uses the provided ioctl interface of the NE driver to spawn an
enclave VM (that's 2 below).

There is a NE emulated PCI device exposed to the primary VM. The driver for this
new PCI device is included in the NE driver.

The ioctl logic is mapped to PCI device commands e.g. the NE_START_ENCLAVE ioctl
maps to an enclave start PCI command. The PCI device commands are then
translated into  actions taken on the hypervisor side; that's the Nitro
hypervisor running on the host where the primary VM is running. The Nitro
hypervisor is based on core KVM technology.

2. The enclave itself - a VM running on the same host as the primary VM that
spawned it. Memory and CPUs are carved out of the primary VM and are dedicated
for the enclave VM. An enclave does not have persistent storage attached.

The memory regions carved out of the primary VM and given to an enclave need to
be aligned 2 MiB / 1 GiB physically contiguous memory regions (or multiple of
this size e.g. 8 MiB). The memory can be allocated e.g. by using hugetlbfs from
user space [2][3]. The memory size for an enclave needs to be at least 64 MiB.
The enclave memory and CPUs need to be from the same NUMA node.

An enclave runs on dedicated cores. CPU 0 and its CPU siblings need to remain
available for the primary VM. A CPU pool has to be set for NE purposes by an
user with admin capability. See the cpu list section from the kernel
documentation [4] for how a CPU pool format looks.

An enclave communicates with the primary VM via a local communication channel,
using virtio-vsock [5]. The primary VM has virtio-pci vsock emulated device,
while the enclave VM has a virtio-mmio vsock emulated device. The vsock device
uses eventfd for signaling. The enclave VM sees the usual interfaces - local
APIC and IOAPIC - to get interrupts from virtio-vsock device. The virtio-mmio
device is placed in memory below the typical 4 GiB.

The application that runs in the enclave needs to be packaged in an enclave
image together with the OS ( e.g. kernel, ramdisk, init ) that will run in the
enclave VM. The enclave VM has its own kernel and follows the standard Linux
boot protocol.

The kernel bzImage, the kernel command line, the ramdisk(s) are part of the
Enclave Image Format (EIF); plus an EIF header including metadata such as magic
number, eif version, image size and CRC.

Hash values are computed for the entire enclave image (EIF), the kernel and
ramdisk(s). That's used, for example, to check that the enclave image that is
loaded in the enclave VM is the one that was intended to be run.

These crypto measurements are included in a signed attestation document
generated by the Nitro Hypervisor and further used to prove the identity of the
enclave; KMS is an example of service that NE is integrated with and that checks
the attestation doc.

The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The
init process in the enclave connects to the vsock CID of the primary VM and a
predefined port - 9000 - to send a heartbeat value - 0xb7. This mechanism is
used to check in the primary VM that the enclave has booted.

If the enclave VM crashes or gracefully exits, an interrupt event is received by
the NE driver. This event is sent further to the user space enclave process
running in the primary VM via a poll notification mechanism. Then the user space
enclave process can exit.

Thank you.


This version reads very well, thanks a lot Andra!

Greg, would you mind to have another look over it?

Will do, it's in my to-review queue, behind lots of other patches...



I have a set of updates that can be included in a new revision, v8 e.g. 
new NE custom error codes for invalid flags / enclave CID, "shutdown" 
function for the NE PCI device driver, a couple more checks wrt invalid 
flags and enclave vsock CID, documentation and sample updates. There is 
also the option to have these updates as follow-up patches.


Greg, let me know what would work fine for you with regard to the 

Re: [PATCH v7 00/18] Add support for Nitro Enclaves

2020-08-19 Thread Paraschiv, Andra-Irina



On 19/08/2020 14:26, Greg KH wrote:


On Wed, Aug 19, 2020 at 01:15:59PM +0200, Alexander Graf wrote:


On 17.08.20 15:09, Andra Paraschiv wrote:

Nitro Enclaves (NE) is a new Amazon Elastic Compute Cloud (EC2) capability
that allows customers to carve out isolated compute environments within EC2
instances [1].

For example, an application that processes sensitive data and runs in a VM,
can be separated from other applications running in the same VM. This
application then runs in a separate VM than the primary VM, namely an enclave.

An enclave runs alongside the VM that spawned it. This setup matches low latency
applications needs. The resources that are allocated for the enclave, such as
memory and CPUs, are carved out of the primary VM. Each enclave is mapped to a
process running in the primary VM, that communicates with the NE driver via an
ioctl interface.

In this sense, there are two components:

1. An enclave abstraction process - a user space process running in the primary
VM guest that uses the provided ioctl interface of the NE driver to spawn an
enclave VM (that's 2 below).

There is a NE emulated PCI device exposed to the primary VM. The driver for this
new PCI device is included in the NE driver.

The ioctl logic is mapped to PCI device commands e.g. the NE_START_ENCLAVE ioctl
maps to an enclave start PCI command. The PCI device commands are then
translated into  actions taken on the hypervisor side; that's the Nitro
hypervisor running on the host where the primary VM is running. The Nitro
hypervisor is based on core KVM technology.

2. The enclave itself - a VM running on the same host as the primary VM that
spawned it. Memory and CPUs are carved out of the primary VM and are dedicated
for the enclave VM. An enclave does not have persistent storage attached.

The memory regions carved out of the primary VM and given to an enclave need to
be aligned 2 MiB / 1 GiB physically contiguous memory regions (or multiple of
this size e.g. 8 MiB). The memory can be allocated e.g. by using hugetlbfs from
user space [2][3]. The memory size for an enclave needs to be at least 64 MiB.
The enclave memory and CPUs need to be from the same NUMA node.

An enclave runs on dedicated cores. CPU 0 and its CPU siblings need to remain
available for the primary VM. A CPU pool has to be set for NE purposes by an
user with admin capability. See the cpu list section from the kernel
documentation [4] for how a CPU pool format looks.

An enclave communicates with the primary VM via a local communication channel,
using virtio-vsock [5]. The primary VM has virtio-pci vsock emulated device,
while the enclave VM has a virtio-mmio vsock emulated device. The vsock device
uses eventfd for signaling. The enclave VM sees the usual interfaces - local
APIC and IOAPIC - to get interrupts from virtio-vsock device. The virtio-mmio
device is placed in memory below the typical 4 GiB.

The application that runs in the enclave needs to be packaged in an enclave
image together with the OS ( e.g. kernel, ramdisk, init ) that will run in the
enclave VM. The enclave VM has its own kernel and follows the standard Linux
boot protocol.

The kernel bzImage, the kernel command line, the ramdisk(s) are part of the
Enclave Image Format (EIF); plus an EIF header including metadata such as magic
number, eif version, image size and CRC.

Hash values are computed for the entire enclave image (EIF), the kernel and
ramdisk(s). That's used, for example, to check that the enclave image that is
loaded in the enclave VM is the one that was intended to be run.

These crypto measurements are included in a signed attestation document
generated by the Nitro Hypervisor and further used to prove the identity of the
enclave; KMS is an example of service that NE is integrated with and that checks
the attestation doc.

The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The
init process in the enclave connects to the vsock CID of the primary VM and a
predefined port - 9000 - to send a heartbeat value - 0xb7. This mechanism is
used to check in the primary VM that the enclave has booted.

If the enclave VM crashes or gracefully exits, an interrupt event is received by
the NE driver. This event is sent further to the user space enclave process
running in the primary VM via a poll notification mechanism. Then the user space
enclave process can exit.

Thank you.


This version reads very well, thanks a lot Andra!


Glad that the review experience has been improved and the patch series 
is in better shape.




Greg, would you mind to have another look over it?

Will do, it's in my to-review queue, behind lots of other patches...




Thanks both for taking time to go through the patch series.

Andra
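
Relatedly, the enclave memory regions described in the cover letter above 
(2 MiB / 1 GiB aligned, physically contiguous, e.g. backed by huge pages) 
can be allocated from user space roughly as below; a sketch assuming huge 
pages were reserved beforehand, e.g. via /proc/sys/vm/nr_hugepages:

#include <stddef.h>
#include <sys/mman.h>

/* Allocate one huge-page-backed enclave memory region; with the default
 * huge page size of 2 MiB on x86, this also satisfies the minimum region
 * size from the series.
 */
static void *alloc_enclave_region(size_t size)
{
	void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	return mem == MAP_FAILED ? NULL : mem;
}

/* e.g. alloc_enclave_region(2 * 1024UL * 1024UL) for one 2 MiB region. */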





Re: [PATCH v6 10/18] nitro_enclaves: Add logic for getting the enclave image load info

2020-08-11 Thread Paraschiv, Andra-Irina




On 10/08/2020 12:57, Alexander Graf wrote:



On 05.08.20 11:10, Andra Paraschiv wrote:

Before setting the memory regions for the enclave, the enclave image
needs to be placed in memory. After the memory regions are set, this
memory cannot be used anymore by the VM, being carved out.

Add ioctl command logic to get the offset in enclave memory where to
place the enclave image. Then the user space tooling copies the enclave
image in the memory using the given memory offset.

Signed-off-by: Andra Paraschiv 


Reviewed-by: Alexander Graf 


Thanks for the review. I included the tag for this one and the next two patches.

Andra
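
For context, the user space flow that the quoted commit message describes 
could be sketched as below. The uapi names (NE_GET_IMAGE_LOAD_INFO, struct 
ne_image_load_info, NE_EIF_IMAGE) are the ones used in this series and are 
assumed here rather than guaranteed:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/nitro_enclaves.h>

/* Ask the driver where the enclave image should land in enclave memory,
 * then copy the EIF at that offset inside the user space mapping of the
 * enclave memory region(s).
 */
static int load_eif(int enclave_fd, void *region_mapping,
		    const void *eif, size_t eif_size)
{
	struct ne_image_load_info info = {
		.flags = NE_EIF_IMAGE,
	};

	if (ioctl(enclave_fd, NE_GET_IMAGE_LOAD_INFO, &info) < 0)
		return -1;

	/* info.memory_offset is 8 MiB (NE_EIF_LOAD_OFFSET) in this series. */
	memcpy((char *)region_mapping + info.memory_offset, eif, eif_size);

	return 0;
}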






Re: [PATCH v6 09/18] nitro_enclaves: Add logic for setting an enclave vCPU

2020-08-11 Thread Paraschiv, Andra-Irina




On 10/08/2020 10:33, Alexander Graf wrote:



On 05.08.20 11:10, Andra Paraschiv wrote:

An enclave, before being started, has its resources set. One of its
resources is CPU.

A NE CPU pool is set and enclave CPUs are chosen from it. Offline the
CPUs from the NE CPU pool during the pool setup and online them back
during the NE CPU pool teardown. The CPU offline is necessary so that
there would not be more vCPUs than physical CPUs available to the
primary / parent VM. In that case the CPUs would be overcommitted and
would change the initial configuration of the primary / parent VM of
having dedicated vCPUs to physical CPUs.

The enclave CPUs need to be full cores and from the same NUMA node. CPU
0 and its siblings have to remain available to the primary / parent VM.

Add ioctl command logic for setting an enclave vCPU.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
---
Changelog

v5 -> v6

* Check CPUs are from the same NUMA node before going through CPU
   siblings during the NE CPU pool setup.
* Update documentation to kernel-doc format.

v4 -> v5

* Set empty string in case of invalid NE CPU pool.
* Clear NE CPU pool mask on pool setup failure.
* Setup NE CPU cores out of the NE CPU pool.
* Early exit on NE CPU pool setup if enclave(s) already running.
* Remove sanity checks for situations that shouldn't happen, only if
   buggy system or broken logic at all.
* Add check for maximum vCPU id possible before looking into the CPU
   pool.
* Remove log on copy_from_user() / copy_to_user() failure and on admin
   capability check for setting the NE CPU pool.
* Update the ioctl call to not create a file descriptor for the vCPU.
* Split the CPU pool usage logic in 2 separate functions - one to get a
   CPU from the pool and the other to check the given CPU is 
available in

   the pool.

v3 -> v4

* Setup the NE CPU pool at runtime via a sysfs file for the kernel
   parameter.
* Check enclave CPUs to be from the same NUMA node.
* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().
* Remove file ops that do nothing for now - open, ioctl and release.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
* Check if enclave state is init when setting enclave vCPU.
---
  drivers/virt/nitro_enclaves/ne_misc_dev.c | 575 ++
  1 file changed, 575 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c

index 6c8c12f65666..4787bc59d39d 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -57,8 +57,11 @@
   * TODO: Update logic to create new sysfs entries instead of using
   * a kernel parameter e.g. if multiple sysfs files needed.
   */
+static int ne_set_kernel_param(const char *val, const struct kernel_param *kp);
+
  static const struct kernel_param_ops ne_cpu_pool_ops = {
  .get    = param_get_string,
+    .set    = ne_set_kernel_param,
  };
    static char ne_cpus[NE_CPUS_SIZE];
@@ -87,6 +90,575 @@ struct ne_cpu_pool {
    static struct ne_cpu_pool ne_cpu_pool;
  +/**
+ * ne_check_enclaves_created() - Verify if at least one enclave has been created.
+ * @void:    No parameters provided.
+ *
+ * Context: Process context.
+ * Return:
+ * * True if at least one enclave is created.
+ * * False otherwise.
+ */
+static bool ne_check_enclaves_created(void)
+{
+    struct ne_pci_dev *ne_pci_dev = NULL;
+    /* TODO: Find another way to get the NE PCI device reference. */
+    struct pci_dev *pdev = pci_get_device(PCI_VENDOR_ID_AMAZON, 
PCI_DEVICE_ID_NE, NULL);


Can we just make this a global ref count instead?


I think we can use the misc dev parent approach here as well, mentioned 
in the previous patches in this series. If no parent dev is set, early exit.





+    bool ret = false;
+
+    if (!pdev)
+    return ret;
+
+    ne_pci_dev = pci_get_drvdata(pdev);
+    if (!ne_pci_dev) {
+    pci_dev_put(pdev);
+
+    return ret;
+    }
+
+    mutex_lock(&ne_pci_dev->enclaves_list_mutex);
+
+    if (!list_empty(_pci_dev->enclaves_list))
+    ret = true;
+
+    mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
+
+    pci_dev_put(pdev);
+
+    return ret;
+}
+
+/**
+ * ne_setup_cpu_pool() - Set the NE CPU pool after handling sanity checks such
+ * as not sharing CPU cores with the primary / parent VM
+ * or not using CPU 0, which should remain available for
+ * the primary / parent VM. Offline the CPUs from the
+ * pool after the checks passed.
+ * @ne_cpu_list:    The CPU list used for setting NE CPU pool.
+ *
+ * Context: Process context.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_setup_cpu_pool(const char *ne_cpu_list)
+{
+    int 

Re: [PATCH v6 08/18] nitro_enclaves: Add logic for creating an enclave VM

2020-08-11 Thread Paraschiv, Andra-Irina




On 10/08/2020 09:11, Alexander Graf wrote:



On 05.08.20 11:10, Andra Paraschiv wrote:

Add ioctl command logic for enclave VM creation. It triggers a slot
allocation. The enclave resources will be associated with this slot and
it will be used as an identifier for triggering enclave run.

Return a file descriptor, namely enclave fd. This is further used by the
associated user space enclave process to set enclave resources and
trigger enclave termination.

The poll function is implemented in order to notify the enclave process
when an enclave exits without a specific enclave termination command
trigger e.g. when an enclave crashes.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v5 -> v6

* Update the code base to init the ioctl function in this patch.
* Update documentation to kernel-doc format.

v4 -> v5

* Release the reference to the NE PCI device on create VM error.
* Close enclave fd on copy_to_user() failure; rename fd to enclave fd
   while at it.
* Remove sanity checks for situations that shouldn't happen, only if
   buggy system or broken logic at all.
* Remove log on copy_to_user() failure.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.
* Add metadata for the NUMA node for the enclave memory and CPUs.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().
* Remove file ops that do nothing for now - open.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
---
  drivers/virt/nitro_enclaves/ne_misc_dev.c | 229 ++
  1 file changed, 229 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c

index 472850250220..6c8c12f65666 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c


[...]


+/**
+ * ne_ioctl() - Ioctl function provided by the NE misc device.
+ * @file:    File associated with this ioctl function.
+ * @cmd:    The command that is set for the ioctl call.
+ * @arg:    The argument that is provided for the ioctl call.
+ *
+ * Context: Process context.
+ * Return:
+ * * Ioctl result (e.g. enclave file descriptor) on success.
+ * * Negative return value on failure.
+ */
+static long ne_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+    switch (cmd) {
+    case NE_CREATE_VM: {
+    int enclave_fd = -1;
+    struct file *enclave_file = NULL;
+    struct ne_pci_dev *ne_pci_dev = NULL;
+    /* TODO: Find another way to get the NE PCI device reference. */
+    struct pci_dev *pdev = pci_get_device(PCI_VENDOR_ID_AMAZON,
+  PCI_DEVICE_ID_NE, NULL);


This should go away if you set misc_dev.parent.


+    int rc = -EINVAL;
+    u64 slot_uid = 0;
+
+    ne_pci_dev = pci_get_drvdata(pdev);
+
+    mutex_lock(&ne_pci_dev->enclaves_list_mutex);
+
+    enclave_fd = ne_create_vm_ioctl(pdev, ne_pci_dev, &slot_uid);
+    if (enclave_fd < 0) {
+    rc = enclave_fd;
+
+    mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
+
+    pci_dev_put(pdev);


This should also disappear.


Correct. I'll follow the misc dev parent approach to get the PCI device 
and include all the necessary code base updates in v7.


Thanks,
Andra






Re: [PATCH v6 07/18] nitro_enclaves: Init misc device providing the ioctl interface

2020-08-11 Thread Paraschiv, Andra-Irina




On 10/08/2020 08:22, Alexander Graf wrote:



On 05.08.20 11:10, Andra Paraschiv wrote:

The Nitro Enclaves driver provides an ioctl interface to the user space
for enclave lifetime management e.g. enclave creation / termination and
setting enclave resources such as memory and CPU.

This ioctl interface is mapped to a Nitro Enclaves misc device.

Signed-off-by: Andra Paraschiv 
---
Changelog

v5 -> v6

* Remove the ioctl to query API version.
* Update documentation to kernel-doc format.

v4 -> v5

* Update the size of the NE CPU pool string from 4096 to 512 chars.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Remove the NE CPU pool init during kernel module loading, as the CPU
   pool is now setup at runtime, via a sysfs file for the kernel
   parameter.
* Add minimum enclave memory size definition.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
   already in place.
* Remove the WARN_ON calls.
* Remove linux/bug and linux/kvm_host includes that are not needed.
* Remove "ratelimited" from the logs that are not in the ioctl call
   paths.
* Remove file ops that do nothing for now - open and release.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Update ne_cpu_pool data structure to include the global mutex.
* Update NE misc device mode to 0660.
* Check if the CPU siblings are included in the NE CPU pool, as full CPU
   cores are given for the enclave(s).
---
  drivers/virt/nitro_enclaves/ne_misc_dev.c | 121 ++
  drivers/virt/nitro_enclaves/ne_pci_dev.c  |  11 ++
  2 files changed, 132 insertions(+)
  create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.c

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c

new file mode 100644
index ..472850250220
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -0,0 +1,121 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+/**
+ * DOC: Enclave lifetime management driver for Nitro Enclaves (NE).
+ * Nitro is a hypervisor that has been developed by Amazon.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ne_misc_dev.h"
+#include "ne_pci_dev.h"
+
+/**
+ * NE_CPUS_SIZE - Size for max 128 CPUs, for now, in a cpu-list 
string, comma

+ *  separated. The NE CPU pool includes CPUs from a single NUMA
+ *  node.
+ */
+#define NE_CPUS_SIZE    (512)
+
+/**
+ * NE_EIF_LOAD_OFFSET - The offset where to copy the Enclave Image Format (EIF)
+ *    image in enclave memory.
+ */
+#define NE_EIF_LOAD_OFFSET    (8 * 1024UL * 1024UL)
+
+/**
+ * NE_MIN_ENCLAVE_MEM_SIZE - The minimum memory size an enclave can be launched
+ * with.
+ */
+#define NE_MIN_ENCLAVE_MEM_SIZE    (64 * 1024UL * 1024UL)
+
+/**
+ * NE_MIN_MEM_REGION_SIZE - The minimum size of an enclave memory region.
+ */
+#define NE_MIN_MEM_REGION_SIZE    (2 * 1024UL * 1024UL)
+
+/*
+ * TODO: Update logic to create new sysfs entries instead of using
+ * a kernel parameter e.g. if multiple sysfs files needed.
+ */
+static const struct kernel_param_ops ne_cpu_pool_ops = {
+    .get    = param_get_string,
+};
+
+static char ne_cpus[NE_CPUS_SIZE];
+static struct kparam_string ne_cpus_arg = {
+    .maxlen    = sizeof(ne_cpus),
+    .string    = ne_cpus,
+};
+
+module_param_cb(ne_cpus, &ne_cpu_pool_ops, &ne_cpus_arg, 0644);
+/* 
https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html#cpu-lists 
*/
+MODULE_PARM_DESC(ne_cpus, "<cpu-list> - CPU pool used for Nitro Enclaves");

+
+/**
+ * struct ne_cpu_pool - CPU pool used for Nitro Enclaves.
+ * @avail_cores:    Available CPU cores in the pool.
+ * @avail_cores_size:    The size of the available cores array.
+ * @mutex:    Mutex for the access to the NE CPU pool.
+ * @numa_node:    NUMA node of the CPUs in the pool.
+ */
+struct ne_cpu_pool {
+    cpumask_var_t    *avail_cores;
+    unsigned int    avail_cores_size;
+    struct mutex    mutex;
+    int    numa_node;
+};
+
+static struct ne_cpu_pool ne_cpu_pool;
+
+static const struct file_operations ne_fops = {
+    .owner    = THIS_MODULE,
+    .llseek    = noop_llseek,
+};
+
+struct miscdevice ne_misc_dev = {
+    .minor    = MISC_DYNAMIC_MINOR,
+    .name    = "nitro_enclaves",
+    .fops    = &ne_fops,
+    .mode    = 0660,
+};
+
+static int __init ne_init(void)
+{
+    mutex_init(&ne_cpu_pool.mutex);
+
+    return pci_register_driver(&ne_pci_driver);
+}
+
+static void __exit ne_exit(void)
+{
+    pci_unregister_driver(&ne_pci_driver);
+}
+
+/* TODO: Handle actions such as reboot, kexec. */
+
+module_init(ne_init);
+module_exit(ne_exit);
+
+MODULE_AUTHOR("Amazon.com, Inc. or its affiliates");
+MODULE_DESCRIPTION("Nitro Enclaves Driver");
+MODULE_LICENSE("GPL v2");

Re: [PATCH v6 15/18] nitro_enclaves: Add Makefile for the Nitro Enclaves driver

2020-08-05 Thread Paraschiv, Andra-Irina




On 05/08/2020 17:23, kernel test robot wrote:


Hi Andra,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linux/master]
[also build test ERROR on linus/master v5.8 next-20200805]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Andra-Paraschiv/Add-support-for-Nitro-Enclaves/20200805-171942
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
bcf876870b95592b52519ed4aafcf9d95999bc9c
config: arm64-allyesconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
 wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
 chmod +x ~/bin/make.cross
 # save the attached .config to linux build tree
 COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross 
ARCH=arm64


Removed, for now, the dependency on ARM64 arch. x86 is currently 
supported, with Arm to come afterwards.
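For reference, when the Arm support comes back, the sibling count could
presumably be derived from the topology masks (linux/topology.h) instead
of the x86-only smp_num_siblings symbol. A sketch of that direction, not
the code in this series:

    /* Hypothetical helper: number of hardware siblings of a given CPU. */
    static unsigned int ne_cpu_sibling_count(unsigned int cpu)
    {
        return cpumask_weight(topology_sibling_cpumask(cpu));
    }

    ne_cpu_pool.avail_cores_size = nr_cpu_ids / ne_cpu_sibling_count(0);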


Thanks,
Andra



If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

drivers/virt/nitro_enclaves/ne_misc_dev.c: In function 'ne_setup_cpu_pool':

drivers/virt/nitro_enclaves/ne_misc_dev.c:245:46: error: 'smp_num_siblings' 
undeclared (first use in this function); did you mean 'cpu_sibling'?

  245 |  ne_cpu_pool.avail_cores_size = nr_cpu_ids / smp_num_siblings;
  |  ^~~~
  |  cpu_sibling
drivers/virt/nitro_enclaves/ne_misc_dev.c:245:46: note: each undeclared 
identifier is reported only once for each function it appears in
drivers/virt/nitro_enclaves/ne_misc_dev.c: In function 'ne_enclave_ioctl':
drivers/virt/nitro_enclaves/ne_misc_dev.c:928:54: error: 'smp_num_siblings' 
undeclared (first use in this function)
  928 |   if (vcpu_id >= (ne_enclave->avail_cpu_cores_size * 
smp_num_siblings)) {
  |  
^~~~

vim +245 drivers/virt/nitro_enclaves/ne_misc_dev.c

7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  130
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  131  /**
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  132   * ne_setup_cpu_pool() - Set 
the NE CPU pool after handling sanity checks such
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  133   *  as not 
sharing CPU cores with the primary / parent VM
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  134   *  or not 
using CPU 0, which should remain available for
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  135   *  the 
primary / parent VM. Offline the CPUs from the
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  136   *  pool 
after the checks passed.
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  137   * @ne_cpu_list:   The CPU 
list used for setting NE CPU pool.
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  138   *
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  139   * Context: Process context.
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  140   * Return:
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  141   * * 0 on success.
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  142   * * Negative return value on 
failure.
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  143   */
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  144  static int 
ne_setup_cpu_pool(const char *ne_cpu_list)
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  145  {
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  146 int core_id = -1;
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  147 unsigned int cpu = 0;
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  148 cpumask_var_t cpu_pool 
= NULL;
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  149 unsigned int 
cpu_sibling = 0;
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  150 unsigned int i = 0;
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  151 int numa_node = -1;
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  152 int rc = -EINVAL;
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  153
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  154 if (!ne_cpu_list)
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  155 return 0;
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  156
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  157 if (!zalloc_cpumask_var(&cpu_pool, GFP_KERNEL))
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  158 return -ENOMEM;
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  159
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  160 mutex_lock(&ne_cpu_pool.mutex);
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  161
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  162 rc = 
cpulist_parse(ne_cpu_list, cpu_pool);
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05 

Re: [PATCH v5 01/18] nitro_enclaves: Add ioctl interface definition

2020-07-24 Thread Paraschiv, Andra-Irina



On 24/07/2020 02:04, Alexander Graf wrote:



On 23.07.20 20:21, Paraschiv, Andra-Irina wrote:



On 23/07/2020 13:54, Greg KH wrote:

On Thu, Jul 23, 2020 at 12:23:56PM +0300, Paraschiv, Andra-Irina wrote:


On 22/07/2020 12:57, Greg KH wrote:
On Wed, Jul 22, 2020 at 11:27:29AM +0300, Paraschiv, Andra-Irina 
wrote:

+#ifndef _UAPI_LINUX_NITRO_ENCLAVES_H_
+#define _UAPI_LINUX_NITRO_ENCLAVES_H_
+
+#include <linux/types.h>
+
+/* Nitro Enclaves (NE) Kernel Driver Interface */
+
+#define NE_API_VERSION (1)

Why do you need this version?  It shouldn't be needed, right?
The version is used as a way for the user space tooling to sync 
on the
features set provided by the driver e.g. in case an older version 
of the
driver is available on the system and the user space tooling 
expects a set

of features that is not included in that driver version.

That is guaranteed to get out of sync instantly with different distro
kernels backporting random things, combined with stable kernel patch
updates and the like.

Just use the normal api interfaces instead, don't try to "version"
anything, it will not work, trust us :)

If an ioctl returns -ENOTTY then hey, it's not present and your
userspace code can handle it that way.
Correct, there could be a variety of kernel versions and user space 
tooling
either in the original form, customized or written from scratch. 
And ENOTTY

signals an ioctl not available or e.g. EINVAL (or custom error) if the
parameter field value is not valid within a certain version. We 
have these

in place, that's good. :)

However, I was thinking, for example, of an ioctl flow usage where 
a certain
order needs to be followed e.g. create a VM, add resources to a VM, 
start a

VM.

Let's say, for a use case wrt new features, ioctl A (create a VM) 
succeeds,

ioctl B (add memory to the VM) succeeds, ioctl C (add CPU to the VM)
succeeds and ioctl D (add any other type of resource before 
starting the VM)

fails because it is not supported.

We would not need to call ioctl A to C and go through their underlying 
logic to realize ioctl D support is not there and roll back all the 
changes done till then within ioctl A to C logic. Of course, there could 
be ioctl A followed by ioctl D, and we would need to roll back ioctl A 
changes, but I shared a more 
lengthy call chain that can be an option as well.

I think you are overthinking this.

If your interface is this complex, you have much larger issues as you
ALWAYS have to be able to handle error conditions properly, even if the
API is "supported".


True, the error paths need to be handled correctly on the kernel driver 
and on the user space logic side, independent of supported features 
or not. Cannot assume that all ioctl callers are behaving correctly 
or there are no errors in the system.


What I wanted to cover with that example is more towards the user 
space logic using new features, either early exiting before even 
trying the ioctl call flow path or going through part of the flow 
till getting the error e.g. ENOTTY for one of the ioctl calls.


If we need an API to query for new features, we can add it once we add 
new features, no? The absence of the query API will indicate that no 
additional features are available.


So yes, sorry, oversight on my side :(. I agree with Greg: there 
really is no need for a version query API as of today.


No problem. I can remove the versioning logic for now, although I think 
it would have been fine to have it from the beginning if we want to move 
further with a version query API in the end.


The overall discussion here was more towards having the versioning logic 
or not at all, either within the current code base or while getting to 
new features.








Perhaps your API is showing to be too complex?

Also, where is the userspace code for all of this?  Did I miss a 
link to

it in the patches somewhere?


Nope, you didn't miss any references to it. The codebase for the user 
space code is not publicly available for now, but it will be 
available on GitHub once the whole project is GA. And I'll include 
the refs, once available, in the NE kernel driver documentation.


Patch 16/18 contains an example user space to drive the ioctl interface.


Yup, and the flow mentioned in the previous mail is included in the 
ioctl usage sample.


Thanks,
Andra



The code base Andra is referring to above is going to be a more 
complete framework to drive Nitro Enclaves that also consumes this 
kernel API.



Alex





Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar 
Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in 
Romania. Registration number J22/2621/2005.


Re: [PATCH v5 01/18] nitro_enclaves: Add ioctl interface definition

2020-07-23 Thread Paraschiv, Andra-Irina



On 23/07/2020 13:54, Greg KH wrote:

On Thu, Jul 23, 2020 at 12:23:56PM +0300, Paraschiv, Andra-Irina wrote:


On 22/07/2020 12:57, Greg KH wrote:

On Wed, Jul 22, 2020 at 11:27:29AM +0300, Paraschiv, Andra-Irina wrote:

+#ifndef _UAPI_LINUX_NITRO_ENCLAVES_H_
+#define _UAPI_LINUX_NITRO_ENCLAVES_H_
+
+#include <linux/types.h>
+
+/* Nitro Enclaves (NE) Kernel Driver Interface */
+
+#define NE_API_VERSION (1)

Why do you need this version?  It shouldn't be needed, right?

The version is used as a way for the user space tooling to sync on the
features set provided by the driver e.g. in case an older version of the
driver is available on the system and the user space tooling expects a set
of features that is not included in that driver version.

That is guaranteed to get out of sync instantly with different distro
kernels backporting random things, combined with stable kernel patch
updates and the like.

Just use the normal api interfaces instead, don't try to "version"
anything, it will not work, trust us :)

If an ioctl returns -ENOTTY then hey, it's not present and your
userspace code can handle it that way.

Correct, there could be a variety of kernel versions and user space tooling
either in the original form, customized or written from scratch. And ENOTTY
signals an ioctl not available or e.g. EINVAL (or custom error) if the
parameter field value is not valid within a certain version. We have these
in place, that's good. :)

However, I was thinking, for example, of an ioctl flow usage where a certain
order needs to be followed e.g. create a VM, add resources to a VM, start a
VM.

Let's say, for a use case wrt new features, ioctl A (create a VM) succeeds,
ioctl B (add memory to the VM) succeeds, ioctl C (add CPU to the VM)
succeeds and ioctl D (add any other type of resource before starting the VM)
fails because it is not supported.

We would not need to call ioctl A to C and go through their underlying logic to
realize ioctl D support is not there and roll back all the changes done till
then within ioctl A to C logic. Of course, there could be ioctl A followed
by ioctl D, and we would need to roll back ioctl A changes, but I shared a more
lengthy call chain that can be an option as well.

I think you are overthinking this.

If your interface is this complex, you have much larger issues as you
ALWAYS have to be able to handle error conditions properly, even if the
API is "supported".


True, the error paths need to be handled correctly on the kernel driver and 
on the user space logic side, independent of supported features or not. 
Cannot assume that all ioctl callers are behaving correctly or there are 
no errors in the system.


What I wanted to cover with that example is more towards the user space 
logic using new features, either early exiting before even trying the 
ioctl call flow path or going through part of the flow till getting the 
error e.g. ENOTTY for one of the ioctl calls.
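
As a sketch of that early-exit probe on the user space side
(NE_NEW_FEATURE_IOCTL is a hypothetical future command, not something in
this series):

    int rc = ioctl(enclave_fd, NE_NEW_FEATURE_IOCTL, &args);
    if (rc < 0 && errno == ENOTTY) {
        /* Older driver, the command is not implemented; bail out */
        /* before creating the VM and adding resources to it. */
    }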




Perhaps your API is showing to be too complex?

Also, where is the userspace code for all of this?  Did I miss a link to
it in the patches somewhere?


Nope, you didn't miss any references to it. The codebase for the user 
space code is not publicly available for now, but it will be available 
on GitHub once the whole project is GA. And I'll include the refs, once 
available, in the NE kernel driver documentation.


I can summarize here the ioctl interface usage flow (a minimal C sketch 
follows the steps below); let me know if I can help with more clarifications:


Enclave creation

* Open the misc device (/dev/nitro_enclaves) and get the device fd.
* Using the device fd, call NE_GET_API_VERSION to check the API version.
* Using the device fd, call NE_CREATE_VM and get an enclave fd.
* Using the enclave fd, call NE_GET_IMAGE_LOAD_INFO to get the offset in 
the enclave memory where to place the enclave image. Enclave memory 
regions consist of hugetlbfs huge pages. Place the enclave image in 
enclave memory.
* Using the enclave fd, call NE_SET_USER_MEMORY_REGION to set a memory 
region for an enclave. Repeat this step for all enclave memory regions.
* Using the enclave fd, call NE_ADD_VCPU to add a vCPU for an enclave. 
Repeat this step for all enclave vCPUs. The CPUs are part of the NE CPU 
pool, set using a sysfs file before starting to create an enclave.

* Using the enclave fd, call NE_START_ENCLAVE to start an enclave.


Enclave termination

* Close the enclave fd.
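
Condensed into C, the above looks roughly like this (a minimal sketch
only; error handling is omitted, the struct layouts and flag names follow
the series' uapi/linux/nitro_enclaves.h, and mmap_addr stands for a
hugetlbfs-backed mapping set up beforehand):

    int ne_dev_fd = open("/dev/nitro_enclaves", O_RDWR | O_CLOEXEC);
    unsigned long slot_uid = 0;
    int enclave_fd = ioctl(ne_dev_fd, NE_CREATE_VM, &slot_uid);

    struct ne_image_load_info load_info = { .flags = NE_EIF_IMAGE };
    ioctl(enclave_fd, NE_GET_IMAGE_LOAD_INFO, &load_info);
    /* Copy the EIF image at load_info.memory_offset in enclave memory. */

    struct ne_user_memory_region region = {
        .flags = NE_DEFAULT_MEMORY_REGION,
        .memory_size = 2 * 1024 * 1024, /* one 2 MiB hugetlbfs page */
        .userspace_addr = (__u64)(unsigned long)mmap_addr,
    };
    ioctl(enclave_fd, NE_SET_USER_MEMORY_REGION, &region); /* per region */

    __u32 vcpu_id = 0; /* 0 lets the driver pick a CPU from the NE pool */
    ioctl(enclave_fd, NE_ADD_VCPU, &vcpu_id); /* per vCPU */

    struct ne_enclave_start_info start_info = { .flags = 0 };
    ioctl(enclave_fd, NE_START_ENCLAVE, &start_info);

    /* Later on, close(enclave_fd) terminates the enclave. */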


Thanks,
Andra



good luck!

greg k-h







Re: [PATCH v5 01/18] nitro_enclaves: Add ioctl interface definition

2020-07-23 Thread Paraschiv, Andra-Irina



On 22/07/2020 12:57, Greg KH wrote:

On Wed, Jul 22, 2020 at 11:27:29AM +0300, Paraschiv, Andra-Irina wrote:

+#ifndef _UAPI_LINUX_NITRO_ENCLAVES_H_
+#define _UAPI_LINUX_NITRO_ENCLAVES_H_
+
+#include <linux/types.h>
+
+/* Nitro Enclaves (NE) Kernel Driver Interface */
+
+#define NE_API_VERSION (1)

Why do you need this version?  It shouldn't be needed, right?

The version is used as a way for the user space tooling to sync on the
features set provided by the driver e.g. in case an older version of the
driver is available on the system and the user space tooling expects a set
of features that is not included in that driver version.

That is guaranteed to get out of sync instantly with different distro
kernels backporting random things, combined with stable kernel patch
updates and the like.

Just use the normal api interfaces instead, don't try to "version"
anything, it will not work, trust us :)

If an ioctl returns -ENOTTY then hey, it's not present and your
userspace code can handle it that way.


Correct, there could be a variety of kernel versions and user space 
tooling either in the original form, customized or written from scratch. 
And ENOTTY signals an ioctl not available or e.g. EINVAL (or custom 
error) if the parameter field value is not valid within a certain 
version. We have these in place, that's good. :)


However, I was thinking, for example, of an ioctl flow usage where a 
certain order needs to be followed e.g. create a VM, add resources to a 
VM, start a VM.


Let's say, for a use case wrt new features, ioctl A (create a VM) 
succeeds, ioctl B (add memory to the VM) succeeds, ioctl C (add CPU to 
the VM) succeeds and ioctl D (add any other type of resource before 
starting the VM) fails because it is not supported.


We would not need to call ioctl A to C and go through their underlying 
logic to realize ioctl D support is not there and roll back all the 
changes done till then within ioctl A to C logic. Of course, there could 
be ioctl A followed by ioctl D, and we would need to roll back ioctl A 
changes, but I shared a more lengthy call chain that can be an option as 
well.


Thanks,
Andra





Re: [PATCH v5 01/18] nitro_enclaves: Add ioctl interface definition

2020-07-22 Thread Paraschiv, Andra-Irina



On 21/07/2020 15:12, Greg KH wrote:

On Wed, Jul 15, 2020 at 10:45:23PM +0300, Andra Paraschiv wrote:

The Nitro Enclaves driver handles the enclave lifetime management. This
includes enclave creation, termination and setting up its resources such
as memory and CPU.

An enclave runs alongside the VM that spawned it. It is abstracted as a
process running in the VM that launched it. The process interacts with
the NE driver, that exposes an ioctl interface for creating an enclave
and setting up its resources.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
Reviewed-by: Alexander Graf 
---
Changelog

v4 -> v5

* Add more details about the ioctl calls usage e.g. error codes, file
   descriptors used.
* Update the ioctl to set an enclave vCPU to not return a file
   descriptor.
* Add specific NE error codes.

v3 -> v4

* Decouple NE ioctl interface from KVM API.
* Add NE API version and the corresponding ioctl call.
* Add enclave / image load flags options.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
   already in place.

v1 -> v2

* Add ioctl for getting enclave image load metadata.
* Update NE_ENCLAVE_START ioctl name to NE_START_ENCLAVE.
* Add entry in Documentation/userspace-api/ioctl/ioctl-number.rst for NE
   ioctls.
* Update NE ioctls definition based on the updated ioctl range for major
   and minor.
---
  .../userspace-api/ioctl/ioctl-number.rst  |   5 +-
  include/linux/nitro_enclaves.h|  11 +
  include/uapi/linux/nitro_enclaves.h   | 244 ++
  3 files changed, 259 insertions(+), 1 deletion(-)
  create mode 100644 include/linux/nitro_enclaves.h
  create mode 100644 include/uapi/linux/nitro_enclaves.h

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst 
b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 59472cd6a11d..783440c6719b 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -328,8 +328,11 @@ Code  Seq#    Include File                  Comments
 0xAC  00-1F  linux/raw.h
 0xAD  00     Netfilter device                     in development:
-0xAE  all    linux/kvm.h                          Kernel-based Virtual Machine
+0xAE  00-1F  linux/kvm.h                          Kernel-based Virtual Machine
+0xAE  40-FF  linux/kvm.h                          Kernel-based Virtual Machine
+0xAE  20-3F  linux/nitro_enclaves.h               Nitro Enclaves
 0xAF  00-1F  linux/fsl_hypervisor.h               Freescale hypervisor
 0xB0  all                                         RATIO devices in development:

diff --git a/include/linux/nitro_enclaves.h b/include/linux/nitro_enclaves.h
new file mode 100644
index ..d91ef2bfdf47
--- /dev/null
+++ b/include/linux/nitro_enclaves.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#ifndef _LINUX_NITRO_ENCLAVES_H_
+#define _LINUX_NITRO_ENCLAVES_H_
+
+#include <uapi/linux/nitro_enclaves.h>
+
+#endif /* _LINUX_NITRO_ENCLAVES_H_ */
diff --git a/include/uapi/linux/nitro_enclaves.h 
b/include/uapi/linux/nitro_enclaves.h
new file mode 100644
index ..cf1192f6e923
--- /dev/null
+++ b/include/uapi/linux/nitro_enclaves.h
@@ -0,0 +1,244 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#ifndef _UAPI_LINUX_NITRO_ENCLAVES_H_
+#define _UAPI_LINUX_NITRO_ENCLAVES_H_
+
+#include <linux/types.h>
+
+/* Nitro Enclaves (NE) Kernel Driver Interface */
+
+#define NE_API_VERSION (1)

Why do you need this version?  It shouldn't be needed, right?


The version is used as a way for the user space tooling to sync on the 
features set provided by the driver e.g. in case an older version of the 
driver is available on the system and the user space tooling expects a 
set of features that is not included in that driver version.







+
+/**
+ * The command is used to get the version of the NE API. This way the user 
space
+ * processes can be aware of the feature sets provided by the NE kernel driver.
+ *
+ * The NE API version is returned as result of this ioctl call.
+ *
+ * The ioctl can be invoked on the /dev/nitro_enclaves fd, independent of
+ * enclaves already created / started or not.
+ *
+ * No errors are returned.
+ */
+#define NE_GET_API_VERSION _IO(0xAE, 

Re: [PATCH v5 05/18] nitro_enclaves: Handle PCI device command requests

2020-07-22 Thread Paraschiv, Andra-Irina




On 21/07/2020 13:17, Alexander Graf wrote:



On 15.07.20 21:45, Andra Paraschiv wrote:

The Nitro Enclaves PCI device exposes a MMIO space that this driver
uses to submit command requests and to receive command replies e.g. for
enclave creation / termination or setting enclave resources.

Add logic for handling PCI device command requests based on the given
command type.

Register an MSI-X interrupt vector for command reply notifications to
handle this type of communication events.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Andra Paraschiv 

Fix issue reported in:
https://lore.kernel.org/lkml/202004231644.xtmn4z1z%25...@intel.com/

Reported-by: kbuild test robot 


This means that the overall patch is a fix that was reported by the 
test rebot. I doubt that's what you mean. Just remove the line.


I wanted to mention that the patch includes also the fix for the report. 
I moved these details to the changelog.





Signed-off-by: Andra Paraschiv 


Reviewed-by: Alexander Graf 


Thanks for review.

Andra






Re: [PATCH v5 04/18] nitro_enclaves: Init PCI device driver

2020-07-21 Thread Paraschiv, Andra-Irina




On 20/07/2020 17:24, Alexander Graf wrote:



On 15.07.20 21:45, Andra Paraschiv wrote:

The Nitro Enclaves PCI device is used by the kernel driver as a means of
communication with the hypervisor on the host where the primary VM and
the enclaves run. It handles requests with regard to enclave lifetime.

Setup the PCI device driver and add support for MSI-X interrupts.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Alexandru Ciobotaru 
Signed-off-by: Andra Paraschiv 


Reviewed-by: Alexander Graf 


Included the Rb, thanks for review.

Andra






Re: [PATCH v5 01/18] nitro_enclaves: Add ioctl interface definition

2020-07-16 Thread Paraschiv, Andra-Irina




On 16/07/2020 11:30, Stefan Hajnoczi wrote:

On Wed, Jul 15, 2020 at 10:45:23PM +0300, Andra Paraschiv wrote:

+ * A NE CPU pool has be set before calling this function. The pool can be set

s/has be/has to be/


Fixed.



Thanks, this looks good!

Reviewed-by: Stefan Hajnoczi 


Thanks for review, glad to hear it's in a better shape.

Andra






Re: [PATCH v4 11/18] nitro_enclaves: Add logic for enclave memory region set

2020-07-09 Thread Paraschiv, Andra-Irina




On 09/07/2020 11:40, Alexander Graf wrote:



On 09.07.20 09:36, Paraschiv, Andra-Irina wrote:



On 06/07/2020 13:46, Alexander Graf wrote:



On 22.06.20 22:03, Andra Paraschiv wrote:
Another resource that is being set for an enclave is memory. User 
space

memory regions, that need to be backed by contiguous memory regions,
are associated with the enclave.

One solution for allocating / reserving contiguous memory regions, 
that

is used for integration, is hugetlbfs. The user space process that is
associated with the enclave passes to the driver these memory regions.

The enclave memory regions need to be from the same NUMA node as the
enclave CPUs.

Add ioctl command logic for setting user space memory region for an
enclave.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
---
Changelog

v3 -> v4

* Check enclave memory regions are from the same NUMA node as the
   enclave CPUs.
* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
* Check if enclave max memory regions is reached when setting an 
enclave

   memory region.
* Check if enclave state is init when setting an enclave memory 
region.

---
  drivers/virt/nitro_enclaves/ne_misc_dev.c | 257 
++

  1 file changed, 257 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c

index cfdefa52ed2a..17ccb6cdbd75 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -476,6 +476,233 @@ static int ne_create_vcpu_ioctl(struct 
ne_enclave *ne_enclave, u32 vcpu_id)

  return rc;
  }
  +/**
+ * ne_sanity_check_user_mem_region - Sanity check the userspace 
memory

+ * region received during the set user memory region ioctl call.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ * @mem_region: user space memory region to be sanity checked.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_sanity_check_user_mem_region(struct ne_enclave 
*ne_enclave,

+    struct ne_user_memory_region *mem_region)
+{
+    if (ne_enclave->mm != current->mm)
+    return -EIO;
+
+    if ((mem_region->memory_size % NE_MIN_MEM_REGION_SIZE) != 0) {
+    dev_err_ratelimited(ne_misc_dev.this_device,
+    "Mem size not multiple of 2 MiB\n");
+
+    return -EINVAL;


Can we make this an error that gets propagated to user space 
explicitly? I'd rather have a clear error return value of this 
function than a random message in dmesg.


We can make this, will add memory checks specific NE error codes, as 
for the other call paths in the series e.g. enclave CPU(s) setup.





+    }
+
+    if ((mem_region->userspace_addr & (NE_MIN_MEM_REGION_SIZE - 
1)) ||


This logic already relies on the fact that NE_MIN_MEM_REGION_SIZE is 
a power of two. Can you do the same above on the memory_size check?


Done.



+    !access_ok((void __user *)(unsigned 
long)mem_region->userspace_addr,

+   mem_region->memory_size)) {
+    dev_err_ratelimited(ne_misc_dev.this_device,
+    "Invalid user space addr range\n");
+
+    return -EINVAL;


Same comment again. Return different errors for different 
conditions, so that user space has a chance to print proper errors 
to its users.


Also, don't we have to check alignment of userspace_addr as well?



Would need an alignment check for 2 MiB at least, yes.


+    }
+
+    return 0;
+}
+
+/**
+ * ne_set_user_memory_region_ioctl - Add user space memory region 
to the slot

+ * associated with the current enclave.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ * @mem_region: user space memory region to be associated with the 
given slot.

+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_set_user_memory_region_ioctl(struct ne_enclave 
*ne_enclave,

+    struct ne_user_memory_region *mem_region)
+{
+    struct ne_pci_dev_cmd_reply cmd_reply = {};
+    long gup_rc = 0;
+    unsigned long i = 0;
+    struct ne_mem_region *ne_mem_region = NULL;
+    unsigned long nr_phys_contig_mem_regions = 0;
+    unsigned long nr_pinned_pages = 0;
+    struct page **phys_contig_mem_regions = NULL;
+    int rc = -EINVAL;
+    struct slot_add_mem_req slot_add_mem_req = {};
+
+    rc = ne_sanity_check_user_mem_region(ne_enclave, mem_region);
+    if (rc < 0)
+    return rc;
+
+    ne_mem_region = kzalloc(sizeof(*ne_mem_region), GFP_KERNEL);
+    if (!ne_mem_region)
+    return 

Re: [PATCH v4 11/18] nitro_enclaves: Add logic for enclave memory region set

2020-07-09 Thread Paraschiv, Andra-Irina




On 06/07/2020 13:46, Alexander Graf wrote:



On 22.06.20 22:03, Andra Paraschiv wrote:

Another resource that is being set for an enclave is memory. User space
memory regions, that need to be backed by contiguous memory regions,
are associated with the enclave.

One solution for allocating / reserving contiguous memory regions, that
is used for integration, is hugetlbfs. The user space process that is
associated with the enclave passes to the driver these memory regions.

The enclave memory regions need to be from the same NUMA node as the
enclave CPUs.

Add ioctl command logic for setting user space memory region for an
enclave.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
---
Changelog

v3 -> v4

* Check enclave memory regions are from the same NUMA node as the
   enclave CPUs.
* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
* Check if enclave max memory regions is reached when setting an enclave
   memory region.
* Check if enclave state is init when setting an enclave memory region.
---
  drivers/virt/nitro_enclaves/ne_misc_dev.c | 257 ++
  1 file changed, 257 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c

index cfdefa52ed2a..17ccb6cdbd75 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -476,6 +476,233 @@ static int ne_create_vcpu_ioctl(struct 
ne_enclave *ne_enclave, u32 vcpu_id)

  return rc;
  }
  +/**
+ * ne_sanity_check_user_mem_region - Sanity check the userspace memory
+ * region received during the set user memory region ioctl call.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ * @mem_region: user space memory region to be sanity checked.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_sanity_check_user_mem_region(struct ne_enclave 
*ne_enclave,

+    struct ne_user_memory_region *mem_region)
+{
+    if (ne_enclave->mm != current->mm)
+    return -EIO;
+
+    if ((mem_region->memory_size % NE_MIN_MEM_REGION_SIZE) != 0) {
+    dev_err_ratelimited(ne_misc_dev.this_device,
+    "Mem size not multiple of 2 MiB\n");
+
+    return -EINVAL;


Can we make this an error that gets propagated to user space 
explicitly? I'd rather have a clear error return value of this 
function than a random message in dmesg.


We can make this, will add memory checks specific NE error codes, as for 
the other call paths in the series e.g. enclave CPU(s) setup.





+    }
+
+    if ((mem_region->userspace_addr & (NE_MIN_MEM_REGION_SIZE - 1)) ||


This logic already relies on the fact that NE_MIN_MEM_REGION_SIZE is a 
power of two. Can you do the same above on the memory_size check?


Done.



+    !access_ok((void __user *)(unsigned 
long)mem_region->userspace_addr,

+   mem_region->memory_size)) {
+    dev_err_ratelimited(ne_misc_dev.this_device,
+    "Invalid user space addr range\n");
+
+    return -EINVAL;


Same comment again. Return different errors for different conditions, 
so that user space has a chance to print proper errors to its users.


Also, don't we have to check alignment of userspace_addr as well?



Would need an alignment check for 2 MiB at least, yes.
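
For reference, both checks can then lean on the same power-of-two
property, along these lines (a sketch of the direction, not necessarily
the exact v5 code):

    /* NE_MIN_MEM_REGION_SIZE is a power of two, so a mask replaces '%'. */
    if (mem_region->memory_size & (NE_MIN_MEM_REGION_SIZE - 1))
        return -EINVAL; /* v5 moves this to an NE-specific error code */

    /* 2 MiB alignment check for the user space address as well. */
    if (mem_region->userspace_addr & (NE_MIN_MEM_REGION_SIZE - 1))
        return -EINVAL;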


+    }
+
+    return 0;
+}
+
+/**
+ * ne_set_user_memory_region_ioctl - Add user space memory region to 
the slot

+ * associated with the current enclave.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ * @mem_region: user space memory region to be associated with the 
given slot.

+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_set_user_memory_region_ioctl(struct ne_enclave 
*ne_enclave,

+    struct ne_user_memory_region *mem_region)
+{
+    struct ne_pci_dev_cmd_reply cmd_reply = {};
+    long gup_rc = 0;
+    unsigned long i = 0;
+    struct ne_mem_region *ne_mem_region = NULL;
+    unsigned long nr_phys_contig_mem_regions = 0;
+    unsigned long nr_pinned_pages = 0;
+    struct page **phys_contig_mem_regions = NULL;
+    int rc = -EINVAL;
+    struct slot_add_mem_req slot_add_mem_req = {};
+
+    rc = ne_sanity_check_user_mem_region(ne_enclave, mem_region);
+    if (rc < 0)
+    return rc;
+
+    ne_mem_region = kzalloc(sizeof(*ne_mem_region), GFP_KERNEL);
+    if (!ne_mem_region)
+    return -ENOMEM;
+
+    /*
+ * TODO: Update nr_pages value to handle contiguous virtual address
+ * ranges mapped to non-contiguous physical regions. Hugetlbfs 
can 

Re: [PATCH v4 09/18] nitro_enclaves: Add logic for enclave vcpu creation

2020-07-08 Thread Paraschiv, Andra-Irina




On 06/07/2020 13:12, Alexander Graf wrote:



On 22.06.20 22:03, Andra Paraschiv wrote:

An enclave, before being started, has its resources set. One of its
resources is CPU.

The NE CPU pool is set for choosing CPUs for enclaves from it. Offline
the CPUs from the NE CPU pool during the pool setup and online them back
during the NE CPU pool teardown.

The enclave CPUs need to be full cores and from the same NUMA node. CPU
0 and its siblings have to remain available to the primary / parent VM.

Add ioctl command logic for enclave vCPU creation. Return as result a
file descriptor that is associated with the enclave vCPU.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
---
Changelog

v3 -> v4

* Setup the NE CPU pool at runtime via a sysfs file for the kernel
   parameter.
* Check enclave CPUs to be from the same NUMA node.
* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().
* Remove file ops that do nothing for now - open, ioctl and release.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
* Check if enclave state is init when setting enclave vcpu.
---
  drivers/virt/nitro_enclaves/ne_misc_dev.c | 491 ++
  1 file changed, 491 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c

index f70496813033..d6777008f685 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -39,7 +39,11 @@
   * TODO: Update logic to create new sysfs entries instead of using
   * a kernel parameter e.g. if multiple sysfs files needed.
   */
+static int ne_set_kernel_param(const char *val, const struct 
kernel_param *kp);

+
  static const struct kernel_param_ops ne_cpu_pool_ops = {
+    .get = param_get_string,
+    .set = ne_set_kernel_param,
  };
    static char ne_cpus[PAGE_SIZE];
@@ -60,6 +64,485 @@ struct ne_cpu_pool {
    static struct ne_cpu_pool ne_cpu_pool;
  +static const struct file_operations ne_enclave_vcpu_fops = {
+    .owner    = THIS_MODULE,
+    .llseek    = noop_llseek,
+};


Do we really need an fd for an object without operations? I think the 
general flow to add CPUs from the pool to the VM is very sensible. But 
I don't think we really need an fd as return value from that operation.


Not particularly now; I kept it here for any potential further use cases 
where we will need one, and to make sure we take into account a stable 
interface and the possibility for extensions.


As we've discussed, we can have, as an option for further extensions, 
another ioctl which returns an fd; I will update the current ioctl to 
keep the logic of adding a vCPU w/o generating an fd.
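
From the user space side, adding a vCPU then stays a plain ioctl on the
enclave fd, with no new fd handed back. A sketch (in the later revisions,
a vcpu_id of 0 asks the driver to choose a CPU from the pool):

    __u32 vcpu_id = 0; /* 0 = let the driver pick a CPU from the NE pool */
    if (ioctl(enclave_fd, NE_ADD_VCPU, &vcpu_id) < 0)
        /* handle the error */;
    /* On success, vcpu_id holds the CPU id that was actually assigned. */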





+
+/**
+ * ne_check_enclaves_created - Verify if at least one enclave has 
been created.

+ *
+ * @pdev: PCI device used for enclave lifetime management.
+ *
+ * @returns: true if at least one enclave is created, false otherwise.
+ */
+static bool ne_check_enclaves_created(struct pci_dev *pdev)
+{
+    struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+    if (!ne_pci_dev)
+    return false;


Please pass in the ne_pci_dev into this function directly.


Updated the function signature.





+
+    mutex_lock(&ne_pci_dev->enclaves_list_mutex);
+
+    if (list_empty(&ne_pci_dev->enclaves_list)) {
+    mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
+
+    return false;


If you make this a return variable, you save on the unlock duplication.


Updated the logic to use a ret var.
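
I.e., something along these lines, with a single unlock path (a sketch of
the reworked shape):

    static bool ne_check_enclaves_created(struct ne_pci_dev *ne_pci_dev)
    {
        bool ret = false;

        if (!ne_pci_dev)
            return ret;

        mutex_lock(&ne_pci_dev->enclaves_list_mutex);

        if (!list_empty(&ne_pci_dev->enclaves_list))
            ret = true;

        mutex_unlock(&ne_pci_dev->enclaves_list_mutex);

        return ret;
    }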




+    }
+
+    mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
+
+    return true;
+}
+
+/**
+ * ne_setup_cpu_pool - Set the NE CPU pool after handling sanity 
checks such as
+ * not sharing CPU cores with the primary / parent VM or not using 
CPU 0, which
+ * should remain available for the primary / parent VM. Offline the 
CPUs from

+ * the pool after the checks passed.
+ *
+ * @pdev: PCI device used for enclave lifetime management.
+ * @ne_cpu_list: the CPU list used for setting NE CPU pool.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_setup_cpu_pool(struct pci_dev *pdev, const char 
*ne_cpu_list)

+{
+    unsigned int cpu = 0;
+    unsigned int cpu_sibling = 0;
+    int numa_node = -1;
+    int rc = -EINVAL;
+
+    if (!capable(CAP_SYS_ADMIN)) {
+    dev_err(&pdev->dev, "No admin capability for CPU pool setup\n");


No need to print anything here. It only gives non-admin users a chance 
to spill the kernel log. If non-admin users can write at all? Can they?


Also, isn't this at the wrong abstraction level? I would expect such a 
check to happen on the file write function, not here.


Removed the log. Non-admin users don't have the permission to write, 
that's the default file permission set. I wanted to guard the offline / 

Re: [PATCH v4 16/18] nitro_enclaves: Add sample for ioctl interface usage

2020-07-07 Thread Paraschiv, Andra-Irina




On 06/07/2020 14:39, Alexander Graf wrote:



On 22.06.20 22:03, Andra Paraschiv wrote:

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
---
Changelog

v3 -> v4

* Update usage details to match the updates in v4.
* Update NE ioctl interface usage.

v2 -> v3

* Remove the include directory to use the uapi from the kernel.
* Remove the GPL additional wording as SPDX-License-Identifier is
   already in place.

v1 -> v2

* New in v2.
---
  samples/nitro_enclaves/.gitignore    |   2 +
  samples/nitro_enclaves/Makefile  |  16 +
  samples/nitro_enclaves/ne_ioctl_sample.c | 520 +++
  3 files changed, 538 insertions(+)
  create mode 100644 samples/nitro_enclaves/.gitignore
  create mode 100644 samples/nitro_enclaves/Makefile
  create mode 100644 samples/nitro_enclaves/ne_ioctl_sample.c

diff --git a/samples/nitro_enclaves/.gitignore 
b/samples/nitro_enclaves/.gitignore

new file mode 100644
index ..827934129c90
--- /dev/null
+++ b/samples/nitro_enclaves/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0
+ne_ioctl_sample
diff --git a/samples/nitro_enclaves/Makefile 
b/samples/nitro_enclaves/Makefile

new file mode 100644
index ..a3ec78fefb52
--- /dev/null
+++ b/samples/nitro_enclaves/Makefile
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights 
Reserved.

+
+# Enclave lifetime management support for Nitro Enclaves (NE) - 
ioctl sample

+# usage.
+
+.PHONY: all clean
+
+CFLAGS += -Wall
+
+all:
+    $(CC) $(CFLAGS) -o ne_ioctl_sample ne_ioctl_sample.c -lpthread
+
+clean:
+    rm -f ne_ioctl_sample
diff --git a/samples/nitro_enclaves/ne_ioctl_sample.c 
b/samples/nitro_enclaves/ne_ioctl_sample.c

new file mode 100644
index ..572143d55d77
--- /dev/null
+++ b/samples/nitro_enclaves/ne_ioctl_sample.c
@@ -0,0 +1,520 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights 
Reserved.

+ */
+
+/**
+ * Sample flow of using the ioctl interface provided by the Nitro 
Enclaves (NE)

+ * kernel driver.
+ *
+ * Usage
+ * -
+ *
+ * Load the nitro_enclaves module, setting also the enclave CPU 
pool. The
+ * enclave CPUs need to be full cores from the same NUMA node. CPU 0 
and its
+ * siblings have to remain available for the primary / parent VM, so 
they

+ * cannot be included in the enclave CPU pool.
+ *
+ * See the cpu list section from the kernel documentation.
+ * 
https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html

+ *
+ *    insmod drivers/virt/nitro_enclaves/nitro_enclaves.ko
+ *    lsmod
+ *
+ *    The CPU pool can be set at runtime, after the kernel module is 
loaded.

+ *
+ *    echo <cpu-list> > /sys/module/nitro_enclaves/parameters/ne_cpus
+ *
+ *    NUMA and CPU siblings information can be found using
+ *
+ *    lscpu
+ *    /proc/cpuinfo
+ *
+ * Check the online / offline CPU list. The CPUs from the pool 
should be

+ * offlined.
+ *
+ *    lscpu
+ *
+ * Check dmesg for any warnings / errors through the NE driver 
lifetime / usage.
+ * The NE logs contain the "nitro_enclaves" or "pci :00:02.0" 
pattern.

+ *
+ *    dmesg
+ *
+ * Setup hugetlbfs huge pages. The memory needs to be from the same 
NUMA node as

+ * the enclave CPUs.
+ * https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
+ *
+ *    echo <nr_hugepages> > /proc/sys/vm/nr_hugepages
+ *
+ *    or set the number of 2 MiB / 1 GiB hugepages using
+ *
+ *    /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
+ *    /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
+ *
+ *    In this example 256 hugepages of 2 MiB are used.
+ *
+ * Build and run the NE sample.
+ *
+ *    make -C samples/nitro_enclaves clean
+ *    make -C samples/nitro_enclaves
+ *    ./samples/nitro_enclaves/ne_ioctl_sample <path_to_enclave_image>
+ *
+ * Unload the nitro_enclaves module.
+ *
+ *    rmmod nitro_enclaves
+ *    lsmod
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include <linux/nitro_enclaves.h>
+#include <linux/vm_sockets.h>
+
+/* Nitro Enclaves (NE) misc device that provides the ioctl 
interface. */

+#define NE_DEV_NAME "/dev/nitro_enclaves"
+#define NE_EXPECTED_API_VERSION (1)
+
+/* Timeout in seconds / milliseconds for each poll event. */
+#define NE_POLL_WAIT_TIME (60)
+#define NE_POLL_WAIT_TIME_MS (NE_POLL_WAIT_TIME * 1000)
+
+/* Amount of time in seconds for the process to keep the enclave 
alive. */

+#define NE_SLEEP_TIME (300)
+
+/* Enclave vCPUs metadata. */
+#define NE_DEFAULT_NR_VCPUS (2)
+
+/* Enclave memory metadata */
+
+/* Min memory size - 2 MiB */
+#define NE_MIN_MEM_REGION_SIZE (2 * 1024 * 1024)
+
+/* 256 memory regions of 2 MiB */
+#define NE_DEFAULT_NR_MEM_REGIONS (256)
+
+/* Vsock addressing for enclave image loading heartbeat. */
+#define NE_IMAGE_LOAD_VSOCK_CID (3)
+#define NE_IMAGE_LOAD_VSOCK_PORT (9000)
+#define NE_IMAGE_LOAD_HEARTBEAT_VALUE (0xb7)
+

Re: [PATCH v4 12/18] nitro_enclaves: Add logic for enclave start

2020-07-07 Thread Paraschiv, Andra-Irina




On 06/07/2020 14:21, Alexander Graf wrote:



On 22.06.20 22:03, Andra Paraschiv wrote:

After all the enclave resources are set, the enclave is ready for
beginning to run.

Add ioctl command logic for starting an enclave after all its resources,
memory regions and CPUs, have been set.

The enclave start information includes the local channel addressing -
vsock CID - and the flags associated with the enclave.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
---
Changelog

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Update the naming for the ioctl command from metadata to info.
* Check for minimum enclave memory size.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.

v1 -> v2

* Add log pattern for NE.
* Check if enclave state is init when starting an enclave.
* Remove the BUG_ON calls.
---
  drivers/virt/nitro_enclaves/ne_misc_dev.c | 114 ++
  1 file changed, 114 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c

index 17ccb6cdbd75..d9794f327169 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -703,6 +703,45 @@ static int 
ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave,

  return rc;
  }
  +/**
+ * ne_start_enclave_ioctl - Trigger enclave start after the enclave 
resources,

+ * such as memory and CPU, have been set.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ * @enclave_start_info: enclave info that includes enclave cid and 
flags.

+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_start_enclave_ioctl(struct ne_enclave *ne_enclave,
+    struct ne_enclave_start_info *enclave_start_info)
+{
+    struct ne_pci_dev_cmd_reply cmd_reply = {};
+    struct enclave_start_req enclave_start_req = {};
+    int rc = -EINVAL;
+
+    enclave_start_req.enclave_cid = enclave_start_info->enclave_cid;
+    enclave_start_req.flags = enclave_start_info->flags;
+    enclave_start_req.slot_uid = ne_enclave->slot_uid;


I think it's easier to read if you do the initialization straight in 
the variable declaation:


  struct enclave_start_req enclave_start_req = {
    .enclave_cid = enclave_start_info->cid,
    .flags = enclave_start_info->flags,
    .slot_uid = ne_enclave->slot_uid,
  };


Good point. In v5, I moved a couple of sanity checks from the ioctl 
switch case block in this function, so this would not apply wrt the 
updated codebase. But I'll keep this suggestion as reference for other 
cases.





+
+    rc = ne_do_request(ne_enclave->pdev, ENCLAVE_START, &enclave_start_req,
+   sizeof(enclave_start_req), &cmd_reply,
+   sizeof(cmd_reply));
+    if (rc < 0) {
+    dev_err_ratelimited(ne_misc_dev.this_device,
+    "Error in enclave start [rc=%d]\n", rc);
+
+    return rc;
+    }
+
+    ne_enclave->state = NE_STATE_RUNNING;
+
+    enclave_start_info->enclave_cid = cmd_reply.enclave_cid;
+
+    return 0;
+}
+
  static long ne_enclave_ioctl(struct file *file, unsigned int cmd,
   unsigned long arg)
  {
@@ -818,6 +857,81 @@ static long ne_enclave_ioctl(struct file *file, 
unsigned int cmd,

  return rc;
  }
  +    case NE_START_ENCLAVE: {
+    struct ne_enclave_start_info enclave_start_info = {};
+    int rc = -EINVAL;
+
+    if (copy_from_user(&enclave_start_info, (void *)arg,
+   sizeof(enclave_start_info))) {
+    dev_err_ratelimited(ne_misc_dev.this_device,
+    "Error in copy from user\n");


No need to print anything here


Done.




+
+    return -EFAULT;
+    }
+
+    mutex_lock(&ne_enclave->enclave_info_mutex);
+
+    if (ne_enclave->state != NE_STATE_INIT) {
+    dev_err_ratelimited(ne_misc_dev.this_device,
+    "Enclave isn't in init state\n");
+
+ mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+    return -EINVAL;


Can this be its own return value instead?


Yes, it should be, and this would help with bubbling up to user space the 
reason for the failure in more detail.


I started to define a set of NE error codes and update the failure paths 
(e.g. this one and the others mentioned below) to use those error codes.
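
The general idea is a set of UAPI-visible codes that start past the
common errno values, e.g. (names and values below are illustrative; the
actual set lands in the next revision's uapi header):

    /* NE-specific error codes, outside the common errno range. */
    #define NE_ERR_NOT_IN_INIT_STATE      (256) /* enclave not in init state */
    #define NE_ERR_NO_MEM_REGIONS_ADDED   (257) /* start with no memory set */
    #define NE_ERR_ENCLAVE_MEM_MIN_SIZE   (258) /* below the 64 MiB minimum */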





+    }
+
+    if (!ne_enclave->nr_mem_regions) {
+    dev_err_ratelimited(ne_misc_dev.this_device,
+    "Enclave has no mem regions\n");
+
+ mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+    return -ENOMEM;
+    }
+
+    if (ne_enclave->mem_size < NE_MIN_ENCLAVE_MEM_SIZE) {
+    dev_err_ratelimited(ne_misc_dev.this_device,
+    "Enclave memory is less than %ld\n",
+    NE_MIN_ENCLAVE_MEM_SIZE);
+
+ mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+    return -ENOMEM;
+    }
+
+   

Re: [PATCH v4 13/18] nitro_enclaves: Add logic for enclave termination

2020-07-06 Thread Paraschiv, Andra-Irina




On 06/07/2020 14:26, Alexander Graf wrote:



On 22.06.20 22:03, Andra Paraschiv wrote:

An enclave is associated with an fd that is returned after the enclave
creation logic is completed. This enclave fd is further used to setup
enclave resources. Once the enclave needs to be terminated, the enclave
fd is closed.

Add logic for enclave termination, that is mapped to the enclave fd
release callback. Free the internal enclave info used for bookkeeping.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 


Reviewed-by: Alexander Graf 


Added. Thanks for review.

Andra






Re: [PATCH v4 15/18] nitro_enclaves: Add Makefile for the Nitro Enclaves driver

2020-07-06 Thread Paraschiv, Andra-Irina




On 06/07/2020 14:30, Alexander Graf wrote:



On 22.06.20 22:03, Andra Paraschiv wrote:

Signed-off-by: Andra Paraschiv 


Reviewed-by: Alexander Graf 


Added. Thank you.

Andra






Re: [PATCH v4 14/18] nitro_enclaves: Add Kconfig for the Nitro Enclaves driver

2020-07-06 Thread Paraschiv, Andra-Irina




On 06/07/2020 14:28, Alexander Graf wrote:



On 22.06.20 22:03, Andra Paraschiv wrote:

Signed-off-by: Andra Paraschiv 
---
Changelog

v3 -> v4

* Add PCI and SMP dependencies.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
   already in place.

v1 -> v2

* Update path to Kconfig to match the drivers/virt/nitro_enclaves
   directory.
* Update help in Kconfig.
---
  drivers/virt/Kconfig    |  2 ++
  drivers/virt/nitro_enclaves/Kconfig | 16 
  2 files changed, 18 insertions(+)
  create mode 100644 drivers/virt/nitro_enclaves/Kconfig

diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
index cbc1f25c79ab..80c5f9c16ec1 100644
--- a/drivers/virt/Kconfig
+++ b/drivers/virt/Kconfig
@@ -32,4 +32,6 @@ config FSL_HV_MANAGER
   partition shuts down.
    source "drivers/virt/vboxguest/Kconfig"
+
+source "drivers/virt/nitro_enclaves/Kconfig"
  endif
diff --git a/drivers/virt/nitro_enclaves/Kconfig 
b/drivers/virt/nitro_enclaves/Kconfig

new file mode 100644
index ..69e41aad
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/Kconfig
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights 
Reserved.

+
+# Amazon Nitro Enclaves (NE) support.
+# Nitro is a hypervisor that has been developed by Amazon.
+
+config NITRO_ENCLAVES
+    tristate "Nitro Enclaves Support"
+    depends on HOTPLUG_CPU && PCI && SMP


Let's also depend on ARM64 || X86, so that we don't burden all of the 
other archs that are not available in EC2 today with an additional 
config option to think about.


Included the arch specs.
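
Presumably ending up as something like:

    config NITRO_ENCLAVES
        tristate "Nitro Enclaves Support"
        depends on (ARM64 || X86) && HOTPLUG_CPU && PCI && SMP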

Thanks,
Andra






Re: [PATCH v4 10/18] nitro_enclaves: Add logic for enclave image load info

2020-07-06 Thread Paraschiv, Andra-Irina




On 06/07/2020 13:16, Alexander Graf wrote:



On 22.06.20 22:03, Andra Paraschiv wrote:

Before setting the memory regions for the enclave, the enclave image
needs to be placed in memory. After the memory regions are set, this
memory cannot be used anymore by the VM, being carved out.

Add ioctl command logic to get the offset in enclave memory where to
place the enclave image. Then the user space tooling copies the enclave
image in the memory using the given memory offset.

Signed-off-by: Andra Paraschiv 
---
Changelog

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Set enclave image load offset based on flags.
* Update the naming for the ioctl command from metadata to info.

v2 -> v3

* No changes.

v1 -> v2

* New in v2.
---
  drivers/virt/nitro_enclaves/ne_misc_dev.c | 25 +++
  1 file changed, 25 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c

index d6777008f685..cfdefa52ed2a 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -536,6 +536,31 @@ static long ne_enclave_ioctl(struct file *file, 
unsigned int cmd,

  return rc;
  }
  +    case NE_GET_IMAGE_LOAD_INFO: {
+    struct ne_image_load_info image_load_info = {};
+
+    if (copy_from_user(&image_load_info, (void *)arg,
+   sizeof(image_load_info))) {
+    dev_err_ratelimited(ne_misc_dev.this_device,
+    "Error in copy from user\n");


The -EFAULT tells you all you need. Just remove this print.


Removed the log from here and the other occurrences in the patch series.

Thanks,
Andra




+
+    return -EFAULT;
+    }
+
+    if (image_load_info.flags == NE_EIF_IMAGE)
+    image_load_info.memory_offset = NE_EIF_LOAD_OFFSET;
+
+    if (copy_to_user((void *)arg, &image_load_info,
+ sizeof(image_load_info))) {
+    dev_err_ratelimited(ne_misc_dev.this_device,
+    "Error in copy to user\n");


Same here.


Alex


+
+    return -EFAULT;
+    }
+
+    return 0;
+    }
+
  default:
  return -ENOTTY;
  }









Re: [PATCH v4 08/18] nitro_enclaves: Add logic for enclave vm creation

2020-07-06 Thread Paraschiv, Andra-Irina




On 06/07/2020 10:53, Alexander Graf wrote:



On 22.06.20 22:03, Andra Paraschiv wrote:

Add ioctl command logic for enclave VM creation. It triggers a slot
allocation. The enclave resources will be associated with this slot and
it will be used as an identifier for triggering enclave run.

Return a file descriptor, namely enclave fd. This is further used by the
associated user space enclave process to set enclave resources and
trigger enclave termination.

The poll function is implemented in order to notify the enclave process
when an enclave exits without a specific enclave termination command
trigger e.g. when an enclave crashes.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 


Reviewed-by: Alexander Graf 


Added. Thank you.
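
As a side note on the poll behavior from the commit message, from user
space this looks roughly like (a sketch):

    struct pollfd pfd = { .fd = enclave_fd, .events = POLLIN };
    if (poll(&pfd, 1, timeout_ms) > 0 && (pfd.revents & POLLHUP))
        /* The enclave exited without an explicit termination request, */
        /* e.g. it crashed; clean up the user space state. */;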

Andra






Re: [PATCH v4 07/18] nitro_enclaves: Init misc device providing the ioctl interface

2020-07-06 Thread Paraschiv, Andra-Irina




On 06/07/2020 11:01, Alexander Graf wrote:



On 06.07.20 09:49, Paraschiv, Andra-Irina wrote:



On 06/07/2020 10:13, Alexander Graf wrote:



On 22.06.20 22:03, Andra Paraschiv wrote:
The Nitro Enclaves driver provides an ioctl interface to the user 
space
for enclave lifetime management e.g. enclave creation / termination 
and

setting enclave resources such as memory and CPU.

This ioctl interface is mapped to a Nitro Enclaves misc device.

Signed-off-by: Andra Paraschiv 
---
Changelog

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Remove the NE CPU pool init during kernel module loading, as the CPU
   pool is now setup at runtime, via a sysfs file for the kernel
   parameter.
* Add minimum enclave memory size definition.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
   already in place.
* Remove the WARN_ON calls.
* Remove linux/bug and linux/kvm_host includes that are not needed.
* Remove "ratelimited" from the logs that are not in the ioctl call
   paths.
* Remove file ops that do nothing for now - open and release.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Update ne_cpu_pool data structure to include the global mutex.
* Update NE misc device mode to 0660.
* Check if the CPU siblings are included in the NE CPU pool, as 
full CPU

   cores are given for the enclave(s).
---
  drivers/virt/nitro_enclaves/ne_misc_dev.c | 133 
++

  drivers/virt/nitro_enclaves/ne_pci_dev.c  |  11 ++
  2 files changed, 144 insertions(+)
  create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.c

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c

new file mode 100644
index ..628fb10c2b36
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -0,0 +1,133 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights 
Reserved.

+ */
+
+/**
+ * Enclave lifetime management driver for Nitro Enclaves (NE).
+ * Nitro is a hypervisor that has been developed by Amazon.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ne_misc_dev.h"
+#include "ne_pci_dev.h"
+
+#define NE_EIF_LOAD_OFFSET (8 * 1024UL * 1024UL)
+
+#define NE_MIN_ENCLAVE_MEM_SIZE (64 * 1024UL * 1024UL)
+
+#define NE_MIN_MEM_REGION_SIZE (2 * 1024UL * 1024UL)
+
+/*
+ * TODO: Update logic to create new sysfs entries instead of using
+ * a kernel parameter e.g. if multiple sysfs files needed.
+ */
+static const struct kernel_param_ops ne_cpu_pool_ops = {


Adding an empty ops struct looks very odd. If you fill it in a later 
patch, please indicate so in a comment here.


True, I already updated this in v5, to have the .get function here 
and the .set one in a later patch.





+};
+
+static char ne_cpus[PAGE_SIZE];


PAGE_SIZE is a bit excessive, no? Even if you list every single CPU 
of a 256 CPU system you are <1024.


It is a bit too much, I was thinking of it while declaring this. I 
can update to 1024 in v5.


The largest NUMA node CPU count I'm aware of today is 64. Since we 
limit the pool to a single node, we can't go beyond that. Let's be a 
bit future proof and double that number: 128. Then we get to 401 
characters if you pass in every single CPU as comma separated. I would 
seriously hope most people would just pass ranges though.


So how about we make it 512 for now?


We can set it like this, I changed to 512 and updated the comment as well.

Thanks,
Andra
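For reference, roughly where this settles - a sketch of the 512-byte buffer 
plus the parameter registration, with the .set handler still to come in a 
later patch and the exact wiring assumed, not copied from v5:

#include <linux/moduleparam.h>

#define NE_CPUS_SIZE (512)

static char ne_cpus[NE_CPUS_SIZE];

static const struct kernel_param_ops ne_cpu_pool_ops = {
        .get = param_get_string,
        /* The .set handler parsing the CPU list is added in a later patch. */
};

static struct kparam_string ne_cpus_arg = {
        .maxlen = sizeof(ne_cpus),
        .string = ne_cpus,
};

module_param_cb(ne_cpus, &ne_cpu_pool_ops, &ne_cpus_arg, 0644);
MODULE_PARM_DESC(ne_cpus, "<cpu-list> - CPU pool used for Nitro Enclaves");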






Re: [PATCH v4 07/18] nitro_enclaves: Init misc device providing the ioctl interface

2020-07-06 Thread Paraschiv, Andra-Irina




On 06/07/2020 10:13, Alexander Graf wrote:



On 22.06.20 22:03, Andra Paraschiv wrote:

The Nitro Enclaves driver provides an ioctl interface to the user space
for enclave lifetime management e.g. enclave creation / termination and
setting enclave resources such as memory and CPU.

This ioctl interface is mapped to a Nitro Enclaves misc device.

Signed-off-by: Andra Paraschiv 
---
Changelog

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Remove the NE CPU pool init during kernel module loading, as the CPU
   pool is now setup at runtime, via a sysfs file for the kernel
   parameter.
* Add minimum enclave memory size definition.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
   already in place.
* Remove the WARN_ON calls.
* Remove linux/bug and linux/kvm_host includes that are not needed.
* Remove "ratelimited" from the logs that are not in the ioctl call
   paths.
* Remove file ops that do nothing for now - open and release.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Update ne_cpu_pool data structure to include the global mutex.
* Update NE misc device mode to 0660.
* Check if the CPU siblings are included in the NE CPU pool, as full CPU
   cores are given for the enclave(s).
---
  drivers/virt/nitro_enclaves/ne_misc_dev.c | 133 ++
  drivers/virt/nitro_enclaves/ne_pci_dev.c  |  11 ++
  2 files changed, 144 insertions(+)
  create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.c

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c

new file mode 100644
index ..628fb10c2b36
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -0,0 +1,133 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights 
Reserved.

+ */
+
+/**
+ * Enclave lifetime management driver for Nitro Enclaves (NE).
+ * Nitro is a hypervisor that has been developed by Amazon.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ne_misc_dev.h"
+#include "ne_pci_dev.h"
+
+#define NE_EIF_LOAD_OFFSET (8 * 1024UL * 1024UL)
+
+#define NE_MIN_ENCLAVE_MEM_SIZE (64 * 1024UL * 1024UL)
+
+#define NE_MIN_MEM_REGION_SIZE (2 * 1024UL * 1024UL)
+
+/*
+ * TODO: Update logic to create new sysfs entries instead of using
+ * a kernel parameter e.g. if multiple sysfs files needed.
+ */
+static const struct kernel_param_ops ne_cpu_pool_ops = {


Adding an empty ops struct looks very odd. If you fill it in a later 
patch, please indicate so in a comment here.


True, I already updated this in v5, to have the .get function here and 
the .set one in a later patch.





+};
+
+static char ne_cpus[PAGE_SIZE];


PAGE_SIZE is a bit excessive, no? Even if you list every single CPU of 
a 256 CPU system you are <1024.


It is a bit too much, I was thinking of it while declaring this. I can 
update to 1024 in v5.


Thank you.

Andra






Re: [PATCH v4 06/18] nitro_enclaves: Handle out-of-band PCI device events

2020-07-04 Thread Paraschiv, Andra-Irina




On 02/07/2020 18:24, Alexander Graf wrote:



On 22.06.20 22:03, Andra Paraschiv wrote:

In addition to the replies sent by the Nitro Enclaves PCI device in
response to command requests, out-of-band enclave events can happen e.g.
an enclave crashes. In this case, the Nitro Enclaves driver needs to be
aware of the event and notify the corresponding user space process that
abstracts the enclave.

Register an MSI-X interrupt vector to be used for this kind of
out-of-band events. The interrupt notifies that the state of an enclave
changed and the driver logic scans the state of each running enclave to
identify for which this notification is intended.

Create a workqueue to handle the out-of-band events. Notify user space
enclave process that is using a polling mechanism on the enclave fd.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Andra Paraschiv 
---
Changelog

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Return IRQ_NONE when interrupts are not handled.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call
   paths.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
---
  drivers/virt/nitro_enclaves/ne_pci_dev.c | 122 +++
  1 file changed, 122 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c 
b/drivers/virt/nitro_enclaves/ne_pci_dev.c

index c24230cfe7c0..9a137862cade 100644
--- a/drivers/virt/nitro_enclaves/ne_pci_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -239,6 +239,93 @@ static irqreturn_t ne_reply_handler(int irq, 
void *args)

  return IRQ_HANDLED;
  }
  +/**
+ * ne_event_work_handler - Work queue handler for notifying enclaves on
+ * a state change received by the event interrupt handler.
+ *
+ * An out-of-band event is being issued by the Nitro Hypervisor when 
at least

+ * one enclave is changing state without client interaction.
+ *
+ * @work: item containing the Nitro Enclaves PCI device for which an
+ *  out-of-band event was issued.
+ */
+static void ne_event_work_handler(struct work_struct *work)
+{
+    struct ne_pci_dev_cmd_reply cmd_reply = {};
+    struct ne_enclave *ne_enclave = NULL;
+    struct ne_pci_dev *ne_pci_dev =
+    container_of(work, struct ne_pci_dev, notify_work);
+    int rc = -EINVAL;
+    struct slot_info_req slot_info_req = {};
+
+    if (!ne_pci_dev)
+    return;


How?


Removed this check and the one below. Thank you.

Andra




+
+    mutex_lock(&ne_pci_dev->enclaves_list_mutex);
+
+    /*
+ * Iterate over all enclaves registered for the Nitro Enclaves
+ * PCI device and determine which enclave(s) the out-of-band event
+ * corresponds to.
+ */
+    list_for_each_entry(ne_enclave, &ne_pci_dev->enclaves_list,
+    enclave_list_entry) {
+    mutex_lock(&ne_enclave->enclave_info_mutex);
+
+    /*
+ * Enclaves that were never started cannot receive out-of-band
+ * events.
+ */
+    if (ne_enclave->state != NE_STATE_RUNNING)
+    goto unlock;
+
+    slot_info_req.slot_uid = ne_enclave->slot_uid;
+
+    rc = ne_do_request(ne_enclave->pdev, SLOT_INFO, &slot_info_req,
+   sizeof(slot_info_req), &cmd_reply,
+   sizeof(cmd_reply));
+    if (rc < 0)
+    dev_err(&ne_enclave->pdev->dev,
+    "Error in slot info [rc=%d]\n", rc);
+
+    /* Notify enclave process that the enclave state changed. */
+    if (ne_enclave->state != cmd_reply.state) {
+    ne_enclave->state = cmd_reply.state;
+
+    ne_enclave->has_event = true;
+
+    wake_up_interruptible(&ne_enclave->eventq);
+    }
+
+unlock:
+ mutex_unlock(&ne_enclave->enclave_info_mutex);
+    }
+
+    mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
+}
+
+/**
+ * ne_event_handler - Interrupt handler for PCI device out-of-band
+ * events. This interrupt does not supply any data in the MMIO region.
+ * It notifies a change in the state of any of the launched enclaves.
+ *
+ * @irq: received interrupt for an out-of-band event.
+ * @args: PCI device private data structure.
+ *
+ * @returns: IRQ_HANDLED on handled interrupt, IRQ_NONE otherwise.
+ */
+static irqreturn_t ne_event_handler(int irq, void *args)
+{
+    struct ne_pci_dev *ne_pci_dev = (struct ne_pci_dev *)args;
+
+    if (!ne_pci_dev)
+    return IRQ_NONE;


How can this happen?


Alex


+
+    queue_work(ne_pci_dev->event_wq, &ne_pci_dev->notify_work);
+
+    return IRQ_HANDLED;
+}
+
  /**
   * ne_setup_msix - Setup MSI-X vectors for the PCI device.
   *
@@ -284,8 +371,37 @@ static int ne_setup_msix(struct pci_dev *pdev)
  goto free_irq_vectors;
  }
  +    ne_pci_dev->event_wq = 
create_singlethread_workqueue("ne_pci_dev_wq");

+    if (!ne_pci_dev->event_wq) {
+    rc = -ENOMEM;
+
+    dev_err(>dev, "Cannot get wq for dev events [rc=%d]\n",
+    rc);
+
+    goto 

Re: [PATCH v4 05/18] nitro_enclaves: Handle PCI device command requests

2020-07-04 Thread Paraschiv, Andra-Irina




On 02/07/2020 18:19, Alexander Graf wrote:



On 22.06.20 22:03, Andra Paraschiv wrote:

The Nitro Enclaves PCI device exposes a MMIO space that this driver
uses to submit command requests and to receive command replies e.g. for
enclave creation / termination or setting enclave resources.

Add logic for handling PCI device command requests based on the given
command type.

Register an MSI-X interrupt vector for command reply notifications to
handle this type of communication events.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Andra Paraschiv 

Fix issue reported in:
https://lore.kernel.org/lkml/202004231644.xtmn4z1z%25...@intel.com/

Reported-by: kbuild test robot 
Signed-off-by: Andra Paraschiv 
---
Changelog

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Return IRQ_NONE when interrupts are not handled.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call
   paths.

v1 -> v2

* Add log pattern for NE.
* Remove the BUG_ON calls.
* Update goto labels to match their purpose.
* Add fix for kbuild report.
---
  drivers/virt/nitro_enclaves/ne_pci_dev.c | 232 +++
  1 file changed, 232 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c 
b/drivers/virt/nitro_enclaves/ne_pci_dev.c

index 235fa3ecbee2..c24230cfe7c0 100644
--- a/drivers/virt/nitro_enclaves/ne_pci_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -27,6 +27,218 @@ static const struct pci_device_id ne_pci_ids[] = {
    MODULE_DEVICE_TABLE(pci, ne_pci_ids);
  +/**
+ * ne_submit_request - Submit command request to the PCI device 
based on the

+ * command type.
+ *
+ * This function gets called with the ne_pci_dev mutex held.
+ *
+ * @pdev: PCI device to send the command to.
+ * @cmd_type: command type of the request sent to the PCI device.
+ * @cmd_request: command request payload.
+ * @cmd_request_size: size of the command request payload.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_submit_request(struct pci_dev *pdev,
+ enum ne_pci_dev_cmd_type cmd_type,
+ void *cmd_request, size_t cmd_request_size)
+{
+    struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+    if (!ne_pci_dev || !ne_pci_dev->iomem_base)
+    return -EINVAL;


How can this ever happen?


Removed this one and the next checks in v5 of the patch series.

Thanks,
Andra




+
+    memcpy_toio(ne_pci_dev->iomem_base + NE_SEND_DATA, cmd_request,
+    cmd_request_size);
+
+    iowrite32(cmd_type, ne_pci_dev->iomem_base + NE_COMMAND);
+
+    return 0;
+}
+
+/**
+ * ne_retrieve_reply - Retrieve reply from the PCI device.
+ *
+ * This function gets called with the ne_pci_dev mutex held.
+ *
+ * @pdev: PCI device to receive the reply from.
+ * @cmd_reply: command reply payload.
+ * @cmd_reply_size: size of the command reply payload.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_retrieve_reply(struct pci_dev *pdev,
+ struct ne_pci_dev_cmd_reply *cmd_reply,
+ size_t cmd_reply_size)
+{
+    struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+    if (!ne_pci_dev || !ne_pci_dev->iomem_base)
+    return -EINVAL;


Same.


+
+    memcpy_fromio(cmd_reply, ne_pci_dev->iomem_base + NE_RECV_DATA,
+  cmd_reply_size);
+
+    return 0;
+}
+
+/**
+ * ne_wait_for_reply - Wait for a reply of a PCI command.
+ *
+ * This function gets called with the ne_pci_dev mutex held.
+ *
+ * @pdev: PCI device for which a reply is waited.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_wait_for_reply(struct pci_dev *pdev)
+{
+    struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+    int rc = -EINVAL;


Unused assignment?


+
+    if (!ne_pci_dev)
+    return -EINVAL;


Same.


+
+    /*
+ * TODO: Update to _interruptible and handle interrupted wait event
+ * e.g. -ERESTARTSYS, incoming signals + add / update timeout.
+ */
+    rc = wait_event_timeout(ne_pci_dev->cmd_reply_wait_q,
+ atomic_read(&ne_pci_dev->cmd_reply_avail) != 0,
+    msecs_to_jiffies(NE_DEFAULT_TIMEOUT_MSECS));
+    if (!rc)
+    return -ETIMEDOUT;
+
+    return 0;
+}
+
+int ne_do_request(struct pci_dev *pdev, enum ne_pci_dev_cmd_type 
cmd_type,

+  void *cmd_request, size_t cmd_request_size,
+  struct ne_pci_dev_cmd_reply *cmd_reply, size_t 
cmd_reply_size)

+{
+    struct ne_pci_dev *ne_pci_dev = NULL;
+    int rc = -EINVAL;
+
+    if (!pdev)
+    return -ENODEV;


When can this happen?


+
+    ne_pci_dev = pci_get_drvdata(pdev);
+    if (!ne_pci_dev || !ne_pci_dev->iomem_base)
+    return -EINVAL;


Same


+
+    if (cmd_type <= INVALID_CMD || cmd_type >= MAX_CMD) {
+    dev_err_ratelimited(>dev, "Invalid cmd type=%u\n",
+    cmd_type);
+
+    return -EINVAL;
+    }
+
+    if (!cmd_request) 
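Condensing the helpers quoted above, ne_do_request() is a submit / wait / 
retrieve sequence serialized by the device mutex - a sketch with the 
validation elided and the mutex field name assumed:

        mutex_lock(&ne_pci_dev->pci_dev_mutex);

        atomic_set(&ne_pci_dev->cmd_reply_avail, 0);

        rc = ne_submit_request(pdev, cmd_type, cmd_request, cmd_request_size);
        if (rc < 0)
                goto unlock_mutex;

        rc = ne_wait_for_reply(pdev);
        if (rc < 0)
                goto unlock_mutex;

        /* Reply payload and completion code land in the MMIO region. */
        rc = ne_retrieve_reply(pdev, cmd_reply, cmd_reply_size);

unlock_mutex:
        mutex_unlock(&ne_pci_dev->pci_dev_mutex);

        return rc;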

Re: [PATCH v4 04/18] nitro_enclaves: Init PCI device driver

2020-07-04 Thread Paraschiv, Andra-Irina




On 02/07/2020 18:09, Alexander Graf wrote:



On 22.06.20 22:03, Andra Paraschiv wrote:

The Nitro Enclaves PCI device is used by the kernel driver as a means of
communication with the hypervisor on the host where the primary VM and
the enclaves run. It handles requests with regard to enclave lifetime.

Setup the PCI device driver and add support for MSI-X interrupts.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Alexandru Ciobotaru 
Signed-off-by: Andra Paraschiv 
---
Changelog

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Update NE PCI driver name to "nitro_enclaves".

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
   already in place.
* Remove the WARN_ON calls.
* Remove linux/bug include that is not needed.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call
   paths.
* Update kzfree() calls to kfree().

v1 -> v2

* Add log pattern for NE.
* Update PCI device setup functions to receive PCI device data 
structure and

   then get private data from it inside the functions logic.
* Remove the BUG_ON calls.
* Add teardown function for MSI-X setup.
* Update goto labels to match their purpose.
* Implement TODO for NE PCI device disable state check.
* Update function name for NE PCI device probe / remove.
---
  drivers/virt/nitro_enclaves/ne_pci_dev.c | 261 +++
  1 file changed, 261 insertions(+)
  create mode 100644 drivers/virt/nitro_enclaves/ne_pci_dev.c

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c 
b/drivers/virt/nitro_enclaves/ne_pci_dev.c

new file mode 100644
index ..235fa3ecbee2
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -0,0 +1,261 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights 
Reserved.

+ */
+
+/* Nitro Enclaves (NE) PCI device driver. */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ne_misc_dev.h"
+#include "ne_pci_dev.h"
+
+#define NE_DEFAULT_TIMEOUT_MSECS (120000) /* 120 sec */
+
+static const struct pci_device_id ne_pci_ids[] = {
+    { PCI_DEVICE(PCI_VENDOR_ID_AMAZON, PCI_DEVICE_ID_NE) },
+    { 0, }
+};
+
+MODULE_DEVICE_TABLE(pci, ne_pci_ids);
+
+/**
+ * ne_setup_msix - Setup MSI-X vectors for the PCI device.
+ *
+ * @pdev: PCI device to setup the MSI-X for.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_setup_msix(struct pci_dev *pdev)
+{
+    struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+    int nr_vecs = 0;
+    int rc = -EINVAL;
+
+    if (!ne_pci_dev)
+    return -EINVAL;
+
+    nr_vecs = pci_msix_vec_count(pdev);
+    if (nr_vecs < 0) {
+    rc = nr_vecs;
+
+    dev_err(>dev, "Error in getting vec count [rc=%d]\n", 
rc);

+
+    return rc;
+    }
+
+    rc = pci_alloc_irq_vectors(pdev, nr_vecs, nr_vecs, PCI_IRQ_MSIX);
+    if (rc < 0) {
+    dev_err(>dev, "Error in alloc MSI-X vecs [rc=%d]\n", rc);
+
+    return rc;
+    }
+
+    return 0;
+}
+
+/**
+ * ne_teardown_msix - Teardown MSI-X vectors for the PCI device.
+ *
+ * @pdev: PCI device to teardown the MSI-X for.
+ */
+static void ne_teardown_msix(struct pci_dev *pdev)
+{
+    struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+    if (!ne_pci_dev)
+    return;
+
+    pci_free_irq_vectors(pdev);
+}
+
+/**
+ * ne_pci_dev_enable - Select PCI device version and enable it.
+ *
+ * @pdev: PCI device to select version for and then enable.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_pci_dev_enable(struct pci_dev *pdev)
+{
+    u8 dev_enable_reply = 0;
+    u16 dev_version_reply = 0;
+    struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+    if (!ne_pci_dev || !ne_pci_dev->iomem_base)
+    return -EINVAL;


How can this ever happen?


This check and the following one are part of the checks added earlier for 
situations that shouldn't happen, e.g. a buggy system or broken logic. 
Removed the checks.


Thanks,
Andra




+
+    iowrite16(NE_VERSION_MAX, ne_pci_dev->iomem_base + NE_VERSION);
+
+    dev_version_reply = ioread16(ne_pci_dev->iomem_base + NE_VERSION);
+    if (dev_version_reply != NE_VERSION_MAX) {
+    dev_err(>dev, "Error in pci dev version cmd\n");
+
+    return -EIO;
+    }
+
+    iowrite8(NE_ENABLE_ON, ne_pci_dev->iomem_base + NE_ENABLE);
+
+    dev_enable_reply = ioread8(ne_pci_dev->iomem_base + NE_ENABLE);
+    if (dev_enable_reply != NE_ENABLE_ON) {
+    dev_err(>dev, "Error in pci dev enable cmd\n");
+
+    return -EIO;
+    }
+
+    return 0;
+}
+
+/**
+ * ne_pci_dev_disable - Disable PCI device.
+ *
+ * @pdev: PCI device to disable.
+ */
+static void ne_pci_dev_disable(struct pci_dev *pdev)
+{
+    u8 dev_disable_reply = 0;
+    struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+    const unsigned int sleep_time = 10; /* 10 ms */
+    unsigned int 

Re: [PATCH v4 03/18] nitro_enclaves: Define enclave info for internal bookkeeping

2020-07-04 Thread Paraschiv, Andra-Irina




On 02/07/2020 18:24, Alexander Graf wrote:



On 22.06.20 22:03, Andra Paraschiv wrote:

The Nitro Enclaves driver keeps an internal info per each enclave.

This is needed to be able to manage enclave resources state, enclave
notifications and have a reference of the PCI device that handles
command requests for enclave lifetime management.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Andra Paraschiv 


Reviewed-by: Alexander Graf 


Added. Thank you.

Andra






Re: [PATCH v4 02/18] nitro_enclaves: Define the PCI device interface

2020-07-04 Thread Paraschiv, Andra-Irina




On 02/07/2020 18:24, Alexander Graf wrote:



On 22.06.20 22:03, Andra Paraschiv wrote:

The Nitro Enclaves (NE) driver communicates with a new PCI device, that
is exposed to a virtual machine (VM) and handles commands meant for
handling enclaves lifetime e.g. creation, termination, setting memory
regions. The communication with the PCI device is handled using a MMIO
space and MSI-X interrupts.

This device communicates with the hypervisor on the host, where the VM
that spawned the enclave itself run, e.g. to launch a VM that is used
for the enclave.

Define the MMIO space of the PCI device, the commands that are
provided by this device. Add an internal data structure used as private
data for the PCI device driver and the functions for the PCI device init
/ uninit and command requests handling.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Alexandru Ciobotaru 
Signed-off-by: Andra Paraschiv 


Reviewed-by: Alexander Graf 


Added. Thank you.

Andra






Re: [PATCH v4 01/18] nitro_enclaves: Add ioctl interface definition

2020-07-04 Thread Paraschiv, Andra-Irina




On 02/07/2020 18:24, Alexander Graf wrote:



On 22.06.20 22:03, Andra Paraschiv wrote:

The Nitro Enclaves driver handles the enclave lifetime management. This
includes enclave creation, termination and setting up its resources such
as memory and CPU.

An enclave runs alongside the VM that spawned it. It is abstracted as a
process running in the VM that launched it. The process interacts with
the NE driver, that exposes an ioctl interface for creating an enclave
and setting up its resources.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 


Reviewed-by: Alexander Graf 


Added. Thanks for reviewing the group of patches so far.

Andra






Re: [PATCH v4 07/18] nitro_enclaves: Init misc device providing the ioctl interface

2020-06-30 Thread Paraschiv, Andra-Irina



On 30/06/2020 11:05, Greg KH wrote:


On Mon, Jun 29, 2020 at 08:45:25PM +0300, Paraschiv, Andra-Irina wrote:


On 29/06/2020 19:20, Greg KH wrote:

On Mon, Jun 22, 2020 at 11:03:18PM +0300, Andra Paraschiv wrote:

+static int __init ne_init(void)
+{
+ struct pci_dev *pdev = pci_get_device(PCI_VENDOR_ID_AMAZON,
+   PCI_DEVICE_ID_NE, NULL);
+ int rc = -EINVAL;
+
+ if (!pdev)
+ return -ENODEV;

Ick, that's a _very_ old-school way of binding to a pci device.  Please
just be a "real" pci driver and your probe function will be called if
your hardware is present (or when it shows up.)  To do it this way
prevents your driver from being auto-loaded for when your hardware is
seen in the system, as well as lots of other things.

This check is mainly here in the case any codebase is added before the pci
driver register call below.

What do you mean by "codebase"?  You control this driver, just do all of
the logic in the probe() function, no need to do this in the module init
call.


And if we log any error with dev_err() instead of pr_err() before the driver
register.

Don't do that.


That check was only for logging purposes, if done with dev_err(). I removed
the check in v5.

Again, don't do it :)


+
+ if (!zalloc_cpumask_var(&ne_cpu_pool.avail, GFP_KERNEL))
+ return -ENOMEM;
+
+ mutex_init(&ne_cpu_pool.mutex);
+
+ rc = pci_register_driver(&ne_pci_driver);

Nice, you did it right here, but why the above crazy test?


+ if (rc < 0) {
+ dev_err(&pdev->dev,
+ "Error in pci register driver [rc=%d]\n", rc);
+
+ goto free_cpumask;
+ }
+
+ return 0;

You leaked a reference on that pci device, didn't you?  Not good :(

Yes, the pci get device call needs its pair - pci_dev_put(). I added it here
and for the other occurrences where it was missing.

Again, just don't do it and then you don't have to worry about any of
this.


Yup, already started this morning to check & update where we can go 
without this call to get a PCI device reference, as a follow-up to what 
we discussed yesterday.


Thanks,
Andra
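With the lookup gone, the module init boils down to registration only - a 
sketch, all device handling deferred to ne_pci_probe():

static int __init ne_init(void)
{
        int rc;

        if (!zalloc_cpumask_var(&ne_cpu_pool.avail, GFP_KERNEL))
                return -ENOMEM;

        mutex_init(&ne_cpu_pool.mutex);

        /* No pci_get_device(): probe() runs when the device shows up. */
        rc = pci_register_driver(&ne_pci_driver);
        if (rc < 0) {
                pr_err("Error in pci register driver [rc=%d]\n", rc);
                free_cpumask_var(ne_cpu_pool.avail);
        }

        return rc;
}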





Re: [PATCH v4 07/18] nitro_enclaves: Init misc device providing the ioctl interface

2020-06-29 Thread Paraschiv, Andra-Irina



On 29/06/2020 19:20, Greg KH wrote:


On Mon, Jun 22, 2020 at 11:03:18PM +0300, Andra Paraschiv wrote:

+static int __init ne_init(void)
+{
+ struct pci_dev *pdev = pci_get_device(PCI_VENDOR_ID_AMAZON,
+   PCI_DEVICE_ID_NE, NULL);
+ int rc = -EINVAL;
+
+ if (!pdev)
+ return -ENODEV;

Ick, that's a _very_ old-school way of binding to a pci device.  Please
just be a "real" pci driver and your probe function will be called if
your hardware is present (or when it shows up.)  To do it this way
prevents your driver from being auto-loaded for when your hardware is
seen in the system, as well as lots of other things.


This check is mainly here in the case any codebase is added before the 
pci driver register call below.


And if we log any error with dev_err() instead of pr_err() before the 
driver register.


That check was only for logging purposes, if done with dev_err(). I 
removed the check in v5.



+
+ if (!zalloc_cpumask_var(&ne_cpu_pool.avail, GFP_KERNEL))
+ return -ENOMEM;
+
+ mutex_init(&ne_cpu_pool.mutex);
+
+ rc = pci_register_driver(&ne_pci_driver);

Nice, you did it right here, but why the above crazy test?


+ if (rc < 0) {
+ dev_err(&pdev->dev,
+ "Error in pci register driver [rc=%d]\n", rc);
+
+ goto free_cpumask;
+ }
+
+ return 0;

You leaked a reference on that pci device, didn't you?  Not good :(


Yes, the pci get device call needs its pair - pci_dev_put(). I added it 
here and for the other occurrences where it was missing.


Thanks for review.

Andra





Re: [PATCH v4 01/18] nitro_enclaves: Add ioctl interface definition

2020-06-25 Thread Paraschiv, Andra-Irina




On 25/06/2020 16:29, Stefan Hajnoczi wrote:

On Wed, Jun 24, 2020 at 05:02:54PM +0300, Paraschiv, Andra-Irina wrote:

On 23/06/2020 11:56, Stefan Hajnoczi wrote:

On Mon, Jun 22, 2020 at 11:03:12PM +0300, Andra Paraschiv wrote:

+/* User memory region flags */
+
+/* Memory region for enclave general usage. */
+#define NE_DEFAULT_MEMORY_REGION (0x00)
+
+/* Memory region to be set for an enclave (write). */
+struct ne_user_memory_region {
+   /**
+* Flags to determine the usage for the memory region (write).
+*/
+   __u64 flags;

Where is the write flag defined?

I guess it's supposed to be:

#define NE_USER_MEMORY_REGION_FLAG_WRITE (0x01)

For now, the flags field is included in the NE ioctl interface for
extensions, it is not part of the NE PCI device interface yet.

...

Ah, and just as a note, that "read" / "write" in parentheses means that a
certain data structure / field is read / written by user space. I updated to
use "in" / "out" instead of "read" / "write" in v5.

Oops, I got confused. I thought "(write)" was an example of a flag that
can be set on the memory region. Now I realize "write" means this field
is an input to the ioctl. :)

Thanks for updating the docs.


I was thinking this may be the case. :) Should be less confusing now, 
with the "in / out" updates.


Thanks also for feedback.

Andra
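For illustration, the annotation style applied to the quoted struct - a 
sketch only, with the fields beyond flags assumed from later revisions of 
the series:

#include <linux/types.h>

/* Memory region to be set for an enclave. */
struct ne_user_memory_region {
        /* Flags to determine the usage for the memory region (in). */
        __u64 flags;

        /* The size, in bytes, of the memory region (in). */
        __u64 memory_size;

        /* The start address of the memory region in user space (in). */
        __u64 userspace_addr;
};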






Re: [PATCH v4 17/18] nitro_enclaves: Add overview documentation

2020-06-25 Thread Paraschiv, Andra-Irina




On 25/06/2020 16:10, Stefan Hajnoczi wrote:

On Wed, Jun 24, 2020 at 05:39:39PM +0300, Paraschiv, Andra-Irina wrote:


On 23/06/2020 11:59, Stefan Hajnoczi wrote:

On Mon, Jun 22, 2020 at 11:03:28PM +0300, Andra Paraschiv wrote:

+The kernel bzImage, the kernel command line, the ramdisk(s) are part of the
+Enclave Image Format (EIF); plus an EIF header including metadata such as magic
+number, eif version, image size and CRC.
+
+Hash values are computed for the entire enclave image (EIF), the kernel and
+ramdisk(s). That's used, for example, to check that the enclave image that is
+loaded in the enclave VM is the one that was intended to be run.
+
+These crypto measurements are included in a signed attestation document
+generated by the Nitro Hypervisor and further used to prove the identity of the
+enclave; KMS is an example of service that NE is integrated with and that 
checks
+the attestation doc.
+
+The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The
+init process in the enclave connects to the vsock CID of the primary VM and a
+predefined port - 9000 - to send a heartbeat value - 0xb7. This mechanism is
+used to check in the primary VM that the enclave has booted.
+
+If the enclave VM crashes or gracefully exits, an interrupt event is received 
by
+the NE driver. This event is sent further to the user space enclave process
+running in the primary VM via a poll notification mechanism. Then the user 
space
+enclave process can exit.
+
+[1] https://aws.amazon.com/ec2/nitro/nitro-enclaves/
+[2] https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
+[3] https://lwn.net/Articles/807108/
+[4] https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
+[5] https://man7.org/linux/man-pages/man7/vsock.7.html

Is the EIF specification and the attestation protocol available?

For now, they are not publicly available. Once the refs are available (e.g.
AWS documentation, GitHub documentation), I'll include them in the kernel
documentation as well.

As a note here, the NE project is currently in preview
(https://aws.amazon.com/ec2/nitro/nitro-enclaves/) and part of the
documentation / codebase will be publicly available when NE is generally
available (GA). This will be in addition to the ones already publicly
available, like the NE kernel driver.

Let me know if I can help with any particular questions / clarifications.

Thanks!


You are welcome.

Andra
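The heartbeat described in the quoted doc is a plain AF_VSOCK connect from 
the enclave init process - a minimal sketch, with the parent VM CID assumed 
to be known to the caller:

#include <sys/socket.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

#define NE_HEARTBEAT_PORT  (9000)
#define NE_HEARTBEAT_VALUE (0xb7)

/* Sketch of the boot heartbeat sent by the enclave init process. */
static int ne_send_heartbeat(unsigned int parent_vm_cid)
{
        struct sockaddr_vm addr = {
                .svm_family = AF_VSOCK,
                .svm_cid = parent_vm_cid,
                .svm_port = NE_HEARTBEAT_PORT,
        };
        unsigned char value = NE_HEARTBEAT_VALUE;
        int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

        if (fd < 0)
                return -1;

        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
            write(fd, &value, sizeof(value)) < 0) {
                close(fd);
                return -1;
        }

        close(fd);
        return 0;
}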






Re: [PATCH v4 17/18] nitro_enclaves: Add overview documentation

2020-06-24 Thread Paraschiv, Andra-Irina




On 23/06/2020 11:59, Stefan Hajnoczi wrote:

On Mon, Jun 22, 2020 at 11:03:28PM +0300, Andra Paraschiv wrote:

+The kernel bzImage, the kernel command line, the ramdisk(s) are part of the
+Enclave Image Format (EIF); plus an EIF header including metadata such as magic
+number, eif version, image size and CRC.
+
+Hash values are computed for the entire enclave image (EIF), the kernel and
+ramdisk(s). That's used, for example, to check that the enclave image that is
+loaded in the enclave VM is the one that was intended to be run.
+
+These crypto measurements are included in a signed attestation document
+generated by the Nitro Hypervisor and further used to prove the identity of the
+enclave; KMS is an example of service that NE is integrated with and that 
checks
+the attestation doc.
+
+The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The
+init process in the enclave connects to the vsock CID of the primary VM and a
+predefined port - 9000 - to send a heartbeat value - 0xb7. This mechanism is
+used to check in the primary VM that the enclave has booted.
+
+If the enclave VM crashes or gracefully exits, an interrupt event is received 
by
+the NE driver. This event is sent further to the user space enclave process
+running in the primary VM via a poll notification mechanism. Then the user 
space
+enclave process can exit.
+
+[1] https://aws.amazon.com/ec2/nitro/nitro-enclaves/
+[2] https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
+[3] https://lwn.net/Articles/807108/
+[4] https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
+[5] https://man7.org/linux/man-pages/man7/vsock.7.html

Is the EIF specification and the attestation protocol available?


For now, they are not publicly available. Once the refs are available 
(e.g. AWS documentation, GitHub documentation), I'll include them in the 
kernel documentation as well.


As a note here, the NE project is currently in preview 
(https://aws.amazon.com/ec2/nitro/nitro-enclaves/) and part of the 
documentation / codebase will be publicly available when NE is generally 
available (GA). This will be in addition to the ones already publicly 
available, like the NE kernel driver.


Let me know if I can help with any particular questions / clarifications.

Thanks,
Andra






Re: [PATCH v4 01/18] nitro_enclaves: Add ioctl interface definition

2020-06-24 Thread Paraschiv, Andra-Irina




On 23/06/2020 11:56, Stefan Hajnoczi wrote:

On Mon, Jun 22, 2020 at 11:03:12PM +0300, Andra Paraschiv wrote:

diff --git a/include/uapi/linux/nitro_enclaves.h 
b/include/uapi/linux/nitro_enclaves.h
new file mode 100644
index ..3270eb939a97
--- /dev/null
+++ b/include/uapi/linux/nitro_enclaves.h
@@ -0,0 +1,137 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#ifndef _UAPI_LINUX_NITRO_ENCLAVES_H_
+#define _UAPI_LINUX_NITRO_ENCLAVES_H_
+
+#include 
+
+/* Nitro Enclaves (NE) Kernel Driver Interface */
+
+#define NE_API_VERSION (1)
+
+/**
+ * The command is used to get the version of the NE API. This way the user 
space
+ * processes can be aware of the feature sets provided by the NE kernel driver.
+ *
+ * The NE API version is returned as result of this ioctl call.
+ */
+#define NE_GET_API_VERSION _IO(0xAE, 0x20)
+
+/**
+ * The command is used to create a slot that is associated with an enclave VM.
+ *
+ * The generated unique slot id is a read parameter of this command. An enclave
+ * file descriptor is returned as result of this ioctl call. The enclave fd can
+ * be further used with ioctl calls to set vCPUs and memory regions, then start
+ * the enclave.
+ */
+#define NE_CREATE_VM _IOR(0xAE, 0x21, __u64)

Information that would be useful for the ioctls:

1. Which fd the ioctl must be invoked on (/dev/nitro-enclaves, enclave fd, vCPU 
fd)

2. Errnos and their meanings

3. Which state(s) the ioctls may be invoked in (e.g. enclave 
created/started/etc)


I'll include this info in v5. Indeed, that's useful for the user space 
tooling that interacts with the kernel driver, in addition to the code 
review itself and future refs, to understand how it works.





+/* User memory region flags */
+
+/* Memory region for enclave general usage. */
+#define NE_DEFAULT_MEMORY_REGION (0x00)
+
+/* Memory region to be set for an enclave (write). */
+struct ne_user_memory_region {
+   /**
+* Flags to determine the usage for the memory region (write).
+*/
+   __u64 flags;

Where is the write flag defined?

I guess it's supposed to be:

   #define NE_USER_MEMORY_REGION_FLAG_WRITE (0x01)


For now, the flags field is included in the NE ioctl interface for 
extensions, it is not part of the NE PCI device interface yet.


The enclave image is copied into enclave memory before the enclave 
memory is carved out of the primary / parent VM. After carving it out 
(when the command request to add memory is sent to the PCI device and it 
is successfully completed), there will be faults if the enclave memory 
is written from the primary / parent VM.


Ah, and just as a note, that "read" / "write" in parentheses means that 
a certain data structure / field is read / written by user space. I 
updated to use "in" / "out" instead of "read" / "write" in v5.


Thank you.

Andra







Re: [PATCH v3 01/18] nitro_enclaves: Add ioctl interface definition

2020-06-05 Thread Paraschiv, Andra-Irina




On 05/06/2020 11:15, Stefan Hajnoczi wrote:

On Mon, Jun 01, 2020 at 10:20:18AM +0300, Paraschiv, Andra-Irina wrote:


On 01/06/2020 06:02, Benjamin Herrenschmidt wrote:

On Wed, 2020-05-27 at 09:49 +0100, Stefan Hajnoczi wrote:

What about feature bits or a API version number field? If you add
features to the NE driver, how will userspace detect them?

Even if you intend to always compile userspace against the exact kernel
headers that the program will run on, it can still be useful to have an
API version for informational purposes and to easily prevent user
errors (running a new userspace binary on an old kernel where the API is
different).

Finally, reserved struct fields may come in handy in the future. That
way userspace and the kernel don't need to explicitly handle multiple
struct sizes.

Beware, Greg might disagree :)

That said, yes, at least a way to query the API version would be
useful.

I see there are several thoughts with regard to extension possibilities. :)

I added an ioctl for getting the API version, we have now a way to query
that info. Also, I updated the sample in this patch series to check for the
API version.

Great. The ideas are orthogonal and not all of them need to be used
together. As long as there is a way of extending the API cleanly in the
future then extensions can be made without breaking userspace.


Agree, as we achieve the ultimate goal of having a stable interface, 
open for extensions without breaking changes.


Thanks,
Andra
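On the user space side, the query is then a one-liner - a sketch, assuming 
an open /dev/nitro_enclaves fd and the NE_API_VERSION definition from the 
series:

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/nitro_enclaves.h>

/* Sketch: refuse to run against an older driver API. */
static int ne_check_api(int ne_dev_fd)
{
        int version = ioctl(ne_dev_fd, NE_GET_API_VERSION);

        if (version < NE_API_VERSION) {
                fprintf(stderr, "NE API version %d too old\n", version);
                return -1;
        }

        return 0;
}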






Re: [PATCH v3 01/18] nitro_enclaves: Add ioctl interface definition

2020-06-01 Thread Paraschiv, Andra-Irina



On 01/06/2020 06:02, Benjamin Herrenschmidt wrote:

On Wed, 2020-05-27 at 09:49 +0100, Stefan Hajnoczi wrote:

What about feature bits or a API version number field? If you add
features to the NE driver, how will userspace detect them?

Even if you intend to always compile userspace against the exact kernel
headers that the program will run on, it can still be useful to have an
API version for informational purposes and to easily prevent user
errors (running a new userspace binary on an old kernel where the API is
different).

Finally, reserved struct fields may come in handy in the future. That
way userspace and the kernel don't need to explicitly handle multiple
struct sizes.

Beware, Greg might disagree :)

That said, yes, at least a way to query the API version would be
useful.


I see there are several thoughts with regard to extension possibilities. :)

I added an ioctl for getting the API version, we have now a way to query 
that info. Also, I updated the sample in this patch series to check for 
the API version.


Thanks,
Andra





Re: [PATCH v3 02/18] nitro_enclaves: Define the PCI device interface

2020-06-01 Thread Paraschiv, Andra-Irina



On 01/06/2020 05:59, Benjamin Herrenschmidt wrote:

On Wed, 2020-05-27 at 00:21 +0200, Greg KH wrote:

There are a couple of data structures with more than one member and multiple
field sizes. And for the ones that are not, gathered as feedback from
previous rounds of review that should consider adding a "flags" field in
there for further extensibility.

Please do not do that in ioctls.  Just create new calls instead of
trying to "extend" existing ones.  It's always much easier.


I can modify to have "__packed" instead of the attribute callout.

Make sure you even need that, as I don't think you do for structures
like the above one, right?

Hrm, my impression (granted I only just started to look at this code)
is that these are protocol messages with the PCI devices, not strictly
just ioctl arguments (though they do get conveyed via such ioctls).

Andra-Irina, did I get that right ? :-)


Correct, these data structures with the "__packed" attribute map the 
messages (requests / replies) for the communication with the NE PCI device.


The data structures from the ioctl commands are not directly used as 
part of the communication with the NE PCI device, but several fields of 
them e.g. enclave start flags. Some of the fields from the NE PCI device 
data structures e.g. the physical address of a memory region (gpa) are 
set by the internal kernel logic.




That said, I still think that by carefully ordering the fields and
using explicit padding, we can avoid the need for the packed attribute.


Regarding your question in the previous mail from this thread and the 
mention above on the same topic, that should be possible. IIRC, there 
were 2 data structures remaining with "__packed" attribute.


Thank you, Ben.

Andra
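As an aside, the field-ordering point can be made concrete with a 
hypothetical pair of layouts (not structs from the series):

/* With __packed: 9 bytes, at the cost of unaligned access on some ABIs. */
struct cmd_reply_packed {
        u64 slot_uid;
        u8  state;
} __packed;

/*
 * Without __packed: the compiler would insert 7 invisible tail pad bytes
 * anyway, so spell them out and the 16-byte layout stays deterministic
 * on every ABI.
 */
struct cmd_reply_padded {
        u64 slot_uid;
        u8  state;
        u8  pad[7];
};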





Re: [PATCH v3 04/18] nitro_enclaves: Init PCI device driver

2020-06-01 Thread Paraschiv, Andra-Irina



On 01/06/2020 05:55, Benjamin Herrenschmidt wrote:

On Tue, 2020-05-26 at 21:35 +0300, Paraschiv, Andra-Irina wrote:

This was needed to have an identifier for the overall NE logic - PCI
dev, ioctl and misc dev.

The ioctl and misc dev logic has pr_* logs, but I can update them to
dev_* with misc dev, then remove this prefix.

Or #define pr_fmt, but dev_ is better.


Yep, the codebase now includes dev_* usage overall.

Thanks,
Andra





Re: [PATCH v3 01/18] nitro_enclaves: Add ioctl interface definition

2020-05-28 Thread Paraschiv, Andra-Irina




On 27/05/2020 11:49, Stefan Hajnoczi wrote:

On Tue, May 26, 2020 at 01:13:17AM +0300, Andra Paraschiv wrote:

The Nitro Enclaves driver handles the enclave lifetime management. This
includes enclave creation, termination and setting up its resources such
as memory and CPU.

An enclave runs alongside the VM that spawned it. It is abstracted as a
process running in the VM that launched it. The process interacts with
the NE driver, that exposes an ioctl interface for creating an enclave
and setting up its resources.

Include part of the KVM ioctls in the provided ioctl interface, with
additional NE ioctl commands that e.g. triggers the enclave run.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
---
Changelog

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is already in
place.

v1 -> v2

* Add ioctl for getting enclave image load metadata.
* Update NE_ENCLAVE_START ioctl name to NE_START_ENCLAVE.
* Add entry in Documentation/userspace-api/ioctl/ioctl-number.rst for NE ioctls.
* Update NE ioctls definition based on the updated ioctl range for major and
minor.
---
  .../userspace-api/ioctl/ioctl-number.rst  |  5 +-
  include/linux/nitro_enclaves.h| 11 
  include/uapi/linux/nitro_enclaves.h   | 65 +++
  3 files changed, 80 insertions(+), 1 deletion(-)
  create mode 100644 include/linux/nitro_enclaves.h
  create mode 100644 include/uapi/linux/nitro_enclaves.h

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst 
b/Documentation/userspace-api/ioctl/ioctl-number.rst
index f759edafd938..8a19b5e871d3 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -325,8 +325,11 @@ Code  Seq#  Include File            Comments
 0xAC  00-1F  linux/raw.h
 0xAD  00     Netfilter device in development:
 
-0xAE  all    linux/kvm.h             Kernel-based Virtual Machine
+0xAE  00-1F  linux/kvm.h             Kernel-based Virtual Machine
 
+0xAE  40-FF  linux/kvm.h             Kernel-based Virtual Machine
+
+0xAE  20-3F  linux/nitro_enclaves.h  Nitro Enclaves
 0xAF  00-1F  linux/fsl_hypervisor.h  Freescale hypervisor
 0xB0  all    RATIO devices in development:
 

Reusing KVM ioctls seems a little hacky. Even the ioctls that are used
by this driver don't use all the fields or behave in the same way as
kvm.ko.

For example, the memory regions slot number is not used by Nitro
Enclaves.

It would be cleaner to define NE-specific ioctls instead.


Indeed, a couple of fields / parameters are not used during the KVM 
ioctl calls for now.


Will adapt the logic to follow a similar model of creating VMs and 
adding resources, with NE ioctls.





diff --git a/include/linux/nitro_enclaves.h b/include/linux/nitro_enclaves.h
new file mode 100644
index ..d91ef2bfdf47
--- /dev/null
+++ b/include/linux/nitro_enclaves.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#ifndef _LINUX_NITRO_ENCLAVES_H_
+#define _LINUX_NITRO_ENCLAVES_H_
+
+#include 
+
+#endif /* _LINUX_NITRO_ENCLAVES_H_ */
diff --git a/include/uapi/linux/nitro_enclaves.h 
b/include/uapi/linux/nitro_enclaves.h
new file mode 100644
index ..3413352baf32
--- /dev/null
+++ b/include/uapi/linux/nitro_enclaves.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#ifndef _UAPI_LINUX_NITRO_ENCLAVES_H_
+#define _UAPI_LINUX_NITRO_ENCLAVES_H_
+
+#include 
+#include 
+
+/* Nitro Enclaves (NE) Kernel Driver Interface */
+
+/**
+ * The command is used to get information needed for in-memory enclave image
+ * loading e.g. offset in enclave memory to start placing the enclave image.
+ *
+ * The image load metadata is an in / out data structure. It includes info
+ * provided by the caller - flags - and returns the offset in enclave memory
+ * where to start placing the enclave image.
+ */
+#define NE_GET_IMAGE_LOAD_METADATA _IOWR(0xAE, 0x20, struct 
image_load_metadata)
+
+/**
+ * The command is used to trigger enclave start after the enclave resources,
+ * such as memory and CPU, have been set.
+ 

Re: [PATCH v3 07/18] nitro_enclaves: Init misc device providing the ioctl interface

2020-05-28 Thread Paraschiv, Andra-Irina



On 28/05/2020 16:12, Greg KH wrote:


On Thu, May 28, 2020 at 03:01:36PM +0200, Alexander Graf wrote:


On 27.05.20 00:24, Greg KH wrote:

On Tue, May 26, 2020 at 03:44:30PM +0200, Alexander Graf wrote:


On 26.05.20 15:17, Greg KH wrote:

On Tue, May 26, 2020 at 02:44:18PM +0200, Alexander Graf wrote:


On 26.05.20 14:33, Greg KH wrote:

On Tue, May 26, 2020 at 01:42:41PM +0200, Alexander Graf wrote:


On 26.05.20 08:51, Greg KH wrote:

On Tue, May 26, 2020 at 01:13:23AM +0300, Andra Paraschiv wrote:

+#define NE "nitro_enclaves: "

Again, no need for this.


+#define NE_DEV_NAME "nitro_enclaves"

KBUILD_MODNAME?


+#define NE_IMAGE_LOAD_OFFSET (8 * 1024UL * 1024UL)
+
+static char *ne_cpus;
+module_param(ne_cpus, charp, 0644);
+MODULE_PARM_DESC(ne_cpus, "<cpu-list> - CPU pool used for Nitro Enclaves");

Again, please do not do this.

I actually asked her to put this one in specifically.

The concept of this parameter is very similar to isolcpus= and maxcpus= in
that it takes CPUs away from Linux and instead donates them to the
underlying hypervisor, so that it can spawn enclaves using them.

From an admin's point of view, this is a setting I would like to keep
persisted across reboots. How would this work with sysfs?

How about just as the "initial" ioctl command to set things up?  Don't
grab any cpu pools until asked to.  Otherwise, what happens when you
load this module on a system that can't support it?

That would give any user with access to the enclave device the ability to
remove CPUs from the system. That's clearly a CAP_ADMIN task in my book.

Ok, what's wrong with that?

Would you want random users to get the ability to hot unplug CPUs from your
system? At unlimited quantity? I don't :).

A random user, no, but one with admin rights, why not?  They can do that
already today on your system, this isn't new.


Hence this whole split: The admin defines the CPU Pool, users can safely
consume this pool to spawn enclaves from it.

But having the admin define that at module load / boot time, is a major
pain.  What tools do they have that allow them to do that easily?

The normal toolbox: editing /etc/default/grub, adding an /etc/modprobe.d/
file.

Editing grub files is horrid, come on...

It's not editing grub files, it's editing template config files that then
are used as input for grub config file generation :).


When but at module load / boot time would you define it? I really don't want
to have a device node that in theory "the world" can use which then allows
any user on the system to hot unplug every CPU but 0 from my system.

But you have that already when the PCI device is found, right?  What is
the initial interface to the driver?  What's wrong with using that?

Or am I really missing something as to how this all fits together with
the different pieces?  Seeing the patches as-is doesn't really provide a
good overview, sorry.

Ok, let me walk you through the core donation process.

Imagine you have a parent VM with 8 cores. Every one of those virtual cores
is 1:1 mapped to a physical core.

You enumerate the PCI device, you start working with it. None of that
changes your topology.

You now create an enclave spanning 2 cores. The hypervisor will remove the
1:1 map for those 2 cores and instead mark them as "free floating" on the
remaining 6 cores. It then uses the 2 freed up cores and creates a 1:1 map
for the enclave's 2 cores

To ensure that we still see a realistic mapping of core topology, we need to
remove those 2 cores from the parent VM's scope of execution. The way this
is done today is by going through CPU offlining.

The first and obvious option would be to offline all respective CPUs when an
enclave gets created. But unprivileged users should be able to spawn
enclaves. So how do I ensure that my unprivileged user doesn't just offline
all of my parent VM's CPUs?

The option implemented here is that we fold this into a two-stage approach.
The admin reserves a "pool" of cores for enclaves to use. Unprivileged users
can then consume cores from that pool, but not more than those.

That way, unprivileged users have no influence over whether a core is
enabled or not. They can only consume the pool of cores that was dedicated
for enclave use.

It also has the big advantage that you don't have dynamically changing CPU
topology in your system. Long living processes that adjust their environment
to the topology can still do so, without cores getting pulled out under
their feet.
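
Concretely, the admin half of this two-stage split is just the ne_cpus 
module parameter - e.g. an "options nitro_enclaves ne_cpus=2-5" line in 
/etc/modprobe.d - and the driver half boils down to a set handler along 
these lines; a sketch assuming the cpulist helpers and the remove_cpu() 
hotplug API, not the literal code from the series:

/* Sketch of the CPU pool setup path; locking and error unwind trimmed. */
static int ne_set_cpu_pool(const char *val)
{
        unsigned int cpu;
        int rc;

        rc = cpulist_parse(val, ne_cpu_pool.avail);
        if (rc < 0)
                return rc;

        /* Offline the donated CPUs so the parent VM stops scheduling on them. */
        for_each_cpu(cpu, ne_cpu_pool.avail) {
                rc = remove_cpu(cpu);
                if (rc != 0)
                        return rc;
        }

        return 0;
}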

Ok, that makes more sense, but:


So I really don't think an ioctl would be a great user experience. Same for
a sysfs file - although that's probably slightly better than the ioctl.

You already are using ioctls to control this thing, right?  What's wrong
with "one more"? :)

So what we *could* do is add an ioctl to set the pool size which then does a
CAP_ADMIN check. That however means you now are in priority hell:

A user that wants to spawn an enclave as part of an nginx service would need
to create another service to 

Re: [PATCH v3 02/18] nitro_enclaves: Define the PCI device interface

2020-05-28 Thread Paraschiv, Andra-Irina



On 27/05/2020 01:21, Greg KH wrote:

On Tue, May 26, 2020 at 08:01:36PM +0300, Paraschiv, Andra-Irina wrote:


On 26/05/2020 09:44, Greg KH wrote:

On Tue, May 26, 2020 at 01:13:18AM +0300, Andra Paraschiv wrote:

+struct enclave_get_slot_req {
+   /* Context ID (CID) for the enclave vsock device. */
+   u64 enclave_cid;
+} __attribute__ ((__packed__));

Can you really "pack" a single member structure?

Anyway, we have better ways to specify this instead of the "raw"
__attribute__ option.  But first see if you really need any of these, at
first glance, I do not think you do at all, and they can all be removed.

There are a couple of data structures with more than one member and multiple
field sizes. And for the ones that are not, gathered as feedback from
previous rounds of review that should consider adding a "flags" field in
there for further extensibility.

Please do not do that in ioctls.  Just create new calls instead of
trying to "extend" existing ones.  It's always much easier.


I can modify to have "__packed" instead of the attribute callout.

Make sure you even need that, as I don't think you do for structures
like the above one, right?


For the ones like the above, not, I just customized the usage of "__packed".

Thanks,
Andra





Re: [PATCH v3 04/18] nitro_enclaves: Init PCI device driver

2020-05-28 Thread Paraschiv, Andra-Irina



On 27/05/2020 01:19, Greg KH wrote:

On Tue, May 26, 2020 at 09:35:33PM +0300, Paraschiv, Andra-Irina wrote:


On 26/05/2020 09:48, Greg KH wrote:

On Tue, May 26, 2020 at 01:13:20AM +0300, Andra Paraschiv wrote:

The Nitro Enclaves PCI device is used by the kernel driver as a means of
communication with the hypervisor on the host where the primary VM and
the enclaves run. It handles requests with regard to enclave lifetime.

Setup the PCI device driver and add support for MSI-X interrupts.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Alexandru Ciobotaru 
Signed-off-by: Andra Paraschiv 
---
Changelog

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is already in
place.
* Remove the WARN_ON calls.
* Remove linux/bug include that is not needed.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call paths.
* Update kzfree() calls to kfree().

v1 -> v2

* Add log pattern for NE.
* Update PCI device setup functions to receive PCI device data structure and
then get private data from it inside the functions logic.
* Remove the BUG_ON calls.
* Add teardown function for MSI-X setup.
* Update goto labels to match their purpose.
* Implement TODO for NE PCI device disable state check.
* Update function name for NE PCI device probe / remove.
---
   drivers/virt/nitro_enclaves/ne_pci_dev.c | 252 +++
   1 file changed, 252 insertions(+)
   create mode 100644 drivers/virt/nitro_enclaves/ne_pci_dev.c

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c 
b/drivers/virt/nitro_enclaves/ne_pci_dev.c
new file mode 100644
index ..0b66166787b6
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -0,0 +1,252 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+/* Nitro Enclaves (NE) PCI device driver. */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ne_misc_dev.h"
+#include "ne_pci_dev.h"
+
+#define DEFAULT_TIMEOUT_MSECS (120000) /* 120 sec */
+
+#define NE "nitro_enclaves: "

Why is this needed?  The dev_* functions should give you all the
information that you need to properly describe the driver and device in
question.  No extra "prefixes" should be needed at all.

This was needed to have an identifier for the overall NE logic - PCI dev,
ioctl and misc dev.

Why?  They are all different "devices", and refer to different
interfaces.  Stick to what the dev_* gives you for them.  You probably
want to stick with the pci dev for almost all of those anyway.


The ioctl and misc dev logic has pr_* logs, but I can update them to dev_*
with misc dev, then remove this prefix.

That would be good, thanks.


That's already in v4, the pr_* logs are now replaced with dev_*.

Thanks,
Andra
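A minimal before/after sketch of that logging change (message text
illustrative):

/* Before: a hand-rolled module prefix on a plain printk. */
pr_err(NE "Error in pci dev enable [rc=%d]\n", rc);

/* After: the device core prints the driver and device name for us. */
dev_err(&pdev->dev, "Error in pci dev enable [rc=%d]\n", rc);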





Re: [PATCH v3 04/18] nitro_enclaves: Init PCI device driver

2020-05-26 Thread Paraschiv, Andra-Irina



On 26/05/2020 09:48, Greg KH wrote:

On Tue, May 26, 2020 at 01:13:20AM +0300, Andra Paraschiv wrote:

The Nitro Enclaves PCI device is used by the kernel driver as a means of
communication with the hypervisor on the host where the primary VM and
the enclaves run. It handles requests with regard to enclave lifetime.

Setup the PCI device driver and add support for MSI-X interrupts.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Alexandru Ciobotaru 
Signed-off-by: Andra Paraschiv 
---
Changelog

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is already in
place.
* Remove the WARN_ON calls.
* Remove linux/bug include that is not needed.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call paths.
* Update kzfree() calls to kfree().

v1 -> v2

* Add log pattern for NE.
* Update PCI device setup functions to receive PCI device data structure and
then get private data from it inside the functions logic.
* Remove the BUG_ON calls.
* Add teardown function for MSI-X setup.
* Update goto labels to match their purpose.
* Implement TODO for NE PCI device disable state check.
* Update function name for NE PCI device probe / remove.
---
  drivers/virt/nitro_enclaves/ne_pci_dev.c | 252 +++
  1 file changed, 252 insertions(+)
  create mode 100644 drivers/virt/nitro_enclaves/ne_pci_dev.c

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c 
b/drivers/virt/nitro_enclaves/ne_pci_dev.c
new file mode 100644
index ..0b66166787b6
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -0,0 +1,252 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+/* Nitro Enclaves (NE) PCI device driver. */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ne_misc_dev.h"
+#include "ne_pci_dev.h"
+
+#define DEFAULT_TIMEOUT_MSECS (120000) /* 120 sec */
+
+#define NE "nitro_enclaves: "

Why is this needed?  The dev_* functions should give you all the
information that you need to properly describe the driver and device in
question.  No extra "prefixes" should be needed at all.


This was needed to have an identifier for the overall NE logic - PCI 
dev, ioctl and misc dev.


The ioctl and misc dev logic has pr_* logs, but I can update them to 
dev_* with misc dev, then remove this prefix.


Thank you.

Andra





Re: [PATCH v3 03/18] nitro_enclaves: Define enclave info for internal bookkeeping

2020-05-26 Thread Paraschiv, Andra-Irina



On 26/05/2020 09:46, Greg KH wrote:

On Tue, May 26, 2020 at 01:13:19AM +0300, Andra Paraschiv wrote:

+/* Nitro Enclaves (NE) misc device */
+extern struct miscdevice ne_miscdevice;

Why does your misc device need to be in a .h file?

Having the patch series like this (add random .h files, and then start
to use them), is hard to review.  Would you want to try to review a
series written in this way?


The misc device is registered / unregistered while having the NE PCI 
device probe / remove, as a dependency to actually having a PCI device 
working to expose a misc device.


The way the codebase is split in files is mainly the ioctl logic / misc 
device in one file and the PCI device logic in another file; thus not 
have all the codebase in a single big file. Given the misc device 
(un)register logic above, the misc device needs to be available to the 
PCI device setup logic.


Andra
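A sketch of the split described above (an assumption drawn from the
discussion, not the exact v3 code): the misc device is defined next to the
ioctl logic and only (un)registered from the PCI device paths.

#include <linux/miscdevice.h>
#include <linux/pci.h>

extern struct miscdevice ne_miscdevice;	/* defined in ne_misc_dev.c */

static int ne_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	int rc;

	/* ... PCI enable / BAR / MSI-X setup elided ... */

	rc = misc_register(&ne_miscdevice);
	if (rc < 0)
		return rc;	/* the real code also unwinds the PCI setup */

	return 0;
}

static void ne_pci_remove(struct pci_dev *pdev)
{
	misc_deregister(&ne_miscdevice);

	/* ... PCI teardown elided ... */
}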






Re: [PATCH v3 02/18] nitro_enclaves: Define the PCI device interface

2020-05-26 Thread Paraschiv, Andra-Irina



On 26/05/2020 09:44, Greg KH wrote:

On Tue, May 26, 2020 at 01:13:18AM +0300, Andra Paraschiv wrote:

+struct enclave_get_slot_req {
+   /* Context ID (CID) for the enclave vsock device. */
+   u64 enclave_cid;
+} __attribute__ ((__packed__));

Can you really "pack" a single member structure?

Anyway, we have better ways to specify this instead of the "raw"
__attribute__ option.  But first see if you really need any of these, at
first glance, I do not think you do at all, and they can all be removed.


There are a couple of data structures with more than one member and 
multiple field sizes. And for the ones that are not, the feedback 
gathered from previous rounds of review was to consider adding a 
"flags" field in there for further extensibility.


I can modify to have "__packed" instead of the attribute callout.

Thanks,
Andra





Re: [PATCH v3 07/18] nitro_enclaves: Init misc device providing the ioctl interface

2020-05-26 Thread Paraschiv, Andra-Irina



On 26/05/2020 15:33, Greg KH wrote:

On Tue, May 26, 2020 at 01:42:41PM +0200, Alexander Graf wrote:


On 26.05.20 08:51, Greg KH wrote:

On Tue, May 26, 2020 at 01:13:23AM +0300, Andra Paraschiv wrote:

+#define NE "nitro_enclaves: "

Again, no need for this.


+#define NE_DEV_NAME "nitro_enclaves"

KBUILD_MODNAME?


+#define NE_IMAGE_LOAD_OFFSET (8 * 1024UL * 1024UL)
+
+static char *ne_cpus;
+module_param(ne_cpus, charp, 0644);
+MODULE_PARM_DESC(ne_cpus, "<cpu-list> - CPU pool used for Nitro Enclaves");

Again, please do not do this.

I actually asked her to put this one in specifically.

The concept of this parameter is very similar to isolcpus= and maxcpus= in
that it takes CPUs away from Linux and instead donates them to the
underlying hypervisor, so that it can spawn enclaves using them.

From an admin's point of view, this is a setting I would like to keep
persisted across reboots. How would this work with sysfs?

How about just as the "initial" ioctl command to set things up?  Don't
grab any cpu pools until asked to.  Otherwise, what happens when you
load this module on a system that can't support it?

module parameters are a major pain, you know this :)


So yes, let's give everyone in CC the chance to review v3 properly first
before v4 goes out.


And get them to sign off on it too, showing they agree with the design
decisions here :)

I would expect a Reviewed-by tag as a result from the above would satisfy
this? :)

That would be most appreciated.


With regard to reviewing, yes, the patch series went through several 
rounds before being submitted upstream.


I posted v3 shortly after v2 to have more visibility into the changelog 
for each patch in addition to the cover letter changelog. But no major 
design changes in there. :)


Thank you.

Andra






Re: [PATCH v2 07/18] nitro_enclaves: Init misc device providing the ioctl interface

2020-05-26 Thread Paraschiv, Andra-Irina



On 26/05/2020 09:42, Greg KH wrote:

On Mon, May 25, 2020 at 11:49:50PM +0300, Paraschiv, Andra-Irina wrote:


On 22/05/2020 10:07, Greg KH wrote:

On Fri, May 22, 2020 at 09:29:35AM +0300, Andra Paraschiv wrote:

+static char *ne_cpus;
+module_param(ne_cpus, charp, 0644);
+MODULE_PARM_DESC(ne_cpus, "<cpu-list> - CPU pool used for Nitro Enclaves");

This is not the 1990's, don't use module parameters if you can help it.
Why is this needed, and where is it documented?

This is a CPU pool that can be set by the root user and that includes CPUs
set aside to be used for the enclave(s) setup; these CPUs are offlined. From
this CPU pool, the kernel logic chooses the CPUs that are set for the
created enclave(s).

The cpu-list format matches the one documented here:

https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html

I've also thought of having a sysfs entry for the setup of this enclave CPU
pool.

Ok, but again, do not use a module parameter, they are hard to use,
tough to document, and global.  All things we moved away from a long
time ago.  Please use something else for this (sysfs, configfs, etc.)
instead.


Alright, got it, will move on then with the other option for v4.

Thank you.

Andra
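A hedged sketch of what the sysfs alternative could look like, reusing the
driver's ne_cpu_pool bookkeeping (the struct with an `avail` cpumask and a
mutex, as quoted elsewhere in the series); the attribute wiring and exact
semantics here are an assumption, not the final v4 code. Writing a cpu-list
replaces the pool:

static ssize_t ne_cpus_store(struct device *dev,
			     struct device_attribute *attr,
			     const char *buf, size_t count)
{
	int rc;

	mutex_lock(&ne_cpu_pool.mutex);
	rc = cpulist_parse(buf, ne_cpu_pool.avail);
	mutex_unlock(&ne_cpu_pool.mutex);

	return rc < 0 ? rc : count;
}
static DEVICE_ATTR_WO(ne_cpus);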






Re: [PATCH v2 16/18] nitro_enclaves: Add sample for ioctl interface usage

2020-05-26 Thread Paraschiv, Andra-Irina



On 26/05/2020 09:41, Greg KH wrote:

On Mon, May 25, 2020 at 11:57:26PM +0300, Paraschiv, Andra-Irina wrote:


On 22/05/2020 10:08, Greg KH wrote:

On Fri, May 22, 2020 at 09:29:44AM +0300, Andra Paraschiv wrote:

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 

I know I don't take commits with no changelog text :(

Included in v3 the changelog for each patch in the series, in addition to
the one in the cover letter; where no changes, I just mentioned that. :)

But you didn't cc: me on that version :(


I just added you to the CC list; from v4 onward this should be all good wrt 
this.


Thanks,
Andra






Re: [PATCH v2 16/18] nitro_enclaves: Add sample for ioctl interface usage

2020-05-25 Thread Paraschiv, Andra-Irina



On 22/05/2020 10:11, Greg KH wrote:

On Fri, May 22, 2020 at 09:29:44AM +0300, Andra Paraschiv wrote:

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 

No changelog?


I included the changelog in v3.




---
  samples/nitro_enclaves/.gitignore |   2 +
  samples/nitro_enclaves/Makefile   |  28 +
  .../include/linux/nitro_enclaves.h|  23 +
  .../include/uapi/linux/nitro_enclaves.h   |  77 +++

Why are you not using the uapi files from the kernel itself?  How are
you going to keep these in sync?


Yeah, the uapi files should be used, I just removed the include folder 
from here.


Thank you.

Andra





Re: [PATCH v2 17/18] nitro_enclaves: Add overview documentation

2020-05-25 Thread Paraschiv, Andra-Irina



On 22/05/2020 10:09, Greg KH wrote:

On Fri, May 22, 2020 at 09:29:45AM +0300, Andra Paraschiv wrote:

Signed-off-by: Andra Paraschiv 

No changelog?

I included the changelog in v3.

Thanks,
Andra






Re: [PATCH v2 15/18] nitro_enclaves: Add Makefile for the Nitro Enclaves driver

2020-05-25 Thread Paraschiv, Andra-Irina



On 22/05/2020 10:09, Greg KH wrote:

On Fri, May 22, 2020 at 09:29:43AM +0300, Andra Paraschiv wrote:

Signed-off-by: Andra Paraschiv 

Changelog is needed


I included it in v3.

Thanks,
Andra






Re: [PATCH v2 14/18] nitro_enclaves: Add Kconfig for the Nitro Enclaves driver

2020-05-25 Thread Paraschiv, Andra-Irina



On 22/05/2020 10:09, Greg KH wrote:

On Fri, May 22, 2020 at 09:29:42AM +0300, Andra Paraschiv wrote:

Signed-off-by: Andra Paraschiv 

changelog is needed.


I included it in v3.

Thanks,
Andra






Re: [PATCH v2 16/18] nitro_enclaves: Add sample for ioctl interface usage

2020-05-25 Thread Paraschiv, Andra-Irina



On 22/05/2020 10:08, Greg KH wrote:

On Fri, May 22, 2020 at 09:29:44AM +0300, Andra Paraschiv wrote:

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 

I know I don't take commits with no changelog text :(


Included in v3 the changelog for each patch in the series, in addition 
to the one in the cover letter; where no changes, I just mentioned that. :)


Thank you.

Andra





Re: [PATCH v2 08/18] nitro_enclaves: Add logic for enclave vm creation

2020-05-25 Thread Paraschiv, Andra-Irina



On 22/05/2020 10:08, Greg KH wrote:

On Fri, May 22, 2020 at 09:29:36AM +0300, Andra Paraschiv wrote:

Add ioctl command logic for enclave VM creation. It triggers a slot
allocation. The enclave resources will be associated with this slot and
it will be used as an identifier for triggering enclave run.

Return a file descriptor, namely enclave fd. This is further used by the
associated user space enclave process to set enclave resources and
trigger enclave termination.

The poll function is implemented in order to notify the enclave process
when an enclave exits without a specific enclave termination command
trigger e.g. when an enclave crashes.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
---
  drivers/virt/nitro_enclaves/ne_misc_dev.c | 169 ++
  1 file changed, 169 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index e1866fac8220..1036221238f4 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -63,6 +63,146 @@ struct ne_cpu_pool {
  
  static struct ne_cpu_pool ne_cpu_pool;
  
+static int ne_enclave_open(struct inode *node, struct file *file)

+{
+   return 0;
+}

Again, if a file operation does nothing, don't even provide it.


I removed open() in v3.

Thank you.

Andra





Re: [PATCH v2 07/18] nitro_enclaves: Init misc device providing the ioctl interface

2020-05-25 Thread Paraschiv, Andra-Irina



On 22/05/2020 10:07, Greg KH wrote:

On Fri, May 22, 2020 at 09:29:35AM +0300, Andra Paraschiv wrote:

+static char *ne_cpus;
+module_param(ne_cpus, charp, 0644);
+MODULE_PARM_DESC(ne_cpus, "<cpu-list> - CPU pool used for Nitro Enclaves");

This is not the 1990's, don't use module parameters if you can help it.
Why is this needed, and where is it documented?


This is a CPU pool that can be set by the root user and that includes 
CPUs set aside to be used for the enclave(s) setup; these CPUs are 
offlined. From this CPU pool, the kernel logic chooses the CPUs that are 
set for the created enclave(s).


The cpu-list format matches the one documented here:

https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html

I've also thought of having a sysfs entry for the setup of this enclave 
CPU pool.





+/* CPU pool used for Nitro Enclaves. */
+struct ne_cpu_pool {
+   /* Available CPUs in the pool. */
+   cpumask_var_t avail;
+   struct mutex mutex;
+};
+
+static struct ne_cpu_pool ne_cpu_pool;
+
+static int ne_open(struct inode *node, struct file *file)
+{
+   return 0;
+}

If open does nothing, just don't even provide it.


I removed this and other file ops occurrences that do nothing for now.
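The trimmed file_operations after dropping the no-op handlers would then
look like this (minimal sketch); the VFS treats a missing .open as "always
succeeds" and a missing .release as a no-op:

static const struct file_operations ne_fops = {
	.owner		= THIS_MODULE,
	.llseek		= noop_llseek,
	.unlocked_ioctl	= ne_ioctl,
};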




+
+static long ne_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+   switch (cmd) {
+
+   default:
+   return -ENOTTY;
+   }
+
+   return 0;
+}

Same for ioctl.


This logic is completed in the next patch in the series.




+
+static int ne_release(struct inode *inode, struct file *file)
+{
+   return 0;
+}

Same for release.


Done, I removed it for now.




+
+static const struct file_operations ne_fops = {
+   .owner  = THIS_MODULE,
+   .llseek = noop_llseek,
+   .unlocked_ioctl = ne_ioctl,
+   .open   = ne_open,
+   .release= ne_release,
+};
+
+struct miscdevice ne_miscdevice = {
+   .minor  = MISC_DYNAMIC_MINOR,
+   .name   = NE_DEV_NAME,
+   .fops   = &ne_fops,
+   .mode   = 0660,
+};
+
+static int __init ne_init(void)
+{
+   unsigned int cpu = 0;
+   unsigned int cpu_sibling = 0;
+   int rc = -EINVAL;
+
+   memset(&ne_cpu_pool, 0, sizeof(ne_cpu_pool));

Why did you just set a structure to 0 that was already initialized by
the system to 0?  Are you sure about this?


True, this is not needed. Removed the memset() call.




+
+   if (!zalloc_cpumask_var(&ne_cpu_pool.avail, GFP_KERNEL))
+   return -ENOMEM;
+
+   mutex_init(&ne_cpu_pool.mutex);
+
+   rc = cpulist_parse(ne_cpus, ne_cpu_pool.avail);
+   if (rc < 0) {
+   pr_err_ratelimited(NE "Error in cpulist parse [rc=%d]\n", rc);

Again, drop all ratelimited stuff please.


Updated to pr_err().

Thank you.

Andra





Re: [PATCH v2 04/18] nitro_enclaves: Init PCI device driver

2020-05-25 Thread Paraschiv, Andra-Irina



On 24/05/2020 09:32, Greg KH wrote:

On Sat, May 23, 2020 at 10:25:25PM +0200, Alexander Graf wrote:

Hey Greg,

On 22.05.20 09:04, Greg KH wrote:

On Fri, May 22, 2020 at 09:29:32AM +0300, Andra Paraschiv wrote:

+/**
+ * ne_setup_msix - Setup MSI-X vectors for the PCI device.
+ *
+ * @pdev: PCI device to setup the MSI-X for.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_setup_msix(struct pci_dev *pdev)
+{
+ struct ne_pci_dev *ne_pci_dev = NULL;
+ int nr_vecs = 0;
+ int rc = -EINVAL;
+
+ if (WARN_ON(!pdev))
+ return -EINVAL;

How can this ever happen?  If it can not, don't test for it.  If it can,
don't warn for it as that will crash systems that do panic-on-warn, just
test and return an error.

I think the point here is to catch situations that should never happen, but
keep a sanity check in, in case they do happen. This would've usually been a
BUG_ON, but people tend to dislike those these days because they can bring
down your system ...

Same for WARN_ON when you run with panic-on-warn enabled :(


So in this particular case here I agree that it's a bit silly to check
whether pdev is != NULL. In other device code internal APIs though it's not
quite as clear of a cut. I by far prefer code that tells me it's broken over
reverse engineering stray pointer accesses ...

For static calls where you control the callers, don't do checks like
this.  Otherwise the kernel would just be full of these all over the
place and things would slow down.  It's just not needed.


+ ne_pci_dev = pci_get_drvdata(pdev);
+ if (WARN_ON(!ne_pci_dev))
+ return -EINVAL;

Same here, don't use WARN_ON if at all possible.


+
+ nr_vecs = pci_msix_vec_count(pdev);
+ if (nr_vecs < 0) {
+ rc = nr_vecs;
+
+ dev_err_ratelimited(&pdev->dev,
+ NE "Error in getting vec count [rc=%d]\n",
+ rc);
+

Why ratelimited, can this happen over and over and over?

In this particular function, no, so here it really should just be dev_err.
Other functions are implicitly callable from user space through an ioctl,
which means they really need to stay rate limited.

Think through these as the driver seems to ONLY use these ratelimited
calls right now, which is not correct.

Also, if a user can create a printk, that almost always is not a good
idea.  But yes, those should be ratelimited.


I updated the static calls checks and removed the WARN_ONs. And 
ratelimited is used now only in the ioctl call paths.


Thank you both.

Andra
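A sketch of the agreed-on shape for such a setup function after this
review - no WARN_ON checks on internal callers and plain dev_err() outside
the ioctl paths (a sketch, not the exact v3 code):

static int ne_setup_msix(struct pci_dev *pdev)
{
	int nr_vecs;
	int rc;

	nr_vecs = pci_msix_vec_count(pdev);
	if (nr_vecs < 0) {
		dev_err(&pdev->dev, "Error in getting vec count [rc=%d]\n",
			nr_vecs);

		return nr_vecs;
	}

	rc = pci_alloc_irq_vectors(pdev, nr_vecs, nr_vecs, PCI_IRQ_MSIX);
	if (rc < 0) {
		dev_err(&pdev->dev, "Error in alloc MSI-X vecs [rc=%d]\n", rc);

		return rc;
	}

	return 0;
}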





Re: [PATCH v2 04/18] nitro_enclaves: Init PCI device driver

2020-05-25 Thread Paraschiv, Andra-Irina



On 22/05/2020 10:04, Greg KH wrote:

On Fri, May 22, 2020 at 09:29:32AM +0300, Andra Paraschiv wrote:

+/**
+ * ne_setup_msix - Setup MSI-X vectors for the PCI device.
+ *
+ * @pdev: PCI device to setup the MSI-X for.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_setup_msix(struct pci_dev *pdev)
+{
+   struct ne_pci_dev *ne_pci_dev = NULL;
+   int nr_vecs = 0;
+   int rc = -EINVAL;
+
+   if (WARN_ON(!pdev))
+   return -EINVAL;

How can this ever happen?  If it can not, don't test for it.  If it can,
don't warn for it as that will crash systems that do panic-on-warn, just
test and return an error.


It shouldn't happen, only used this and subsequent occurrences in the 
other functions for sanity checks.


I removed the WARN_ONs from the patch series and updated the static 
calls checks.





+
+   ne_pci_dev = pci_get_drvdata(pdev);
+   if (WARN_ON(!ne_pci_dev))
+   return -EINVAL;

Same here, don't use WARN_ON if at all possible.


Done.




+
+   nr_vecs = pci_msix_vec_count(pdev);
+   if (nr_vecs < 0) {
+   rc = nr_vecs;
+
+   dev_err_ratelimited(&pdev->dev,
+   NE "Error in getting vec count [rc=%d]\n",
+   rc);
+

Why ratelimited, can this happen over and over and over?


That's correct, not in this case. I updated the dev_err / pr_err 
calls to include ratelimited only in the call paths of the ioctl commands.





+   return rc;
+   }
+
+   rc = pci_alloc_irq_vectors(pdev, nr_vecs, nr_vecs, PCI_IRQ_MSIX);
+   if (rc < 0) {
+   dev_err_ratelimited(&pdev->dev,
+   NE "Error in alloc MSI-X vecs [rc=%d]\n",
+   rc);

Same here.


Updated to dev_err().




+
+   return rc;
+   }
+
+   return 0;
+}
+
+/**
+ * ne_teardown_msix - Teardown MSI-X vectors for the PCI device.
+ *
+ * @pdev: PCI device to teardown the MSI-X for.
+ */
+static void ne_teardown_msix(struct pci_dev *pdev)
+{
+   struct ne_pci_dev *ne_pci_dev = NULL;

=NULL not needed.


Updated to init via pci_get_drvdata() directly.




+
+   if (WARN_ON(!pdev))
+   return;

Again, you control the callers, how can this ever be true?


Sanity check, should never happen.




+
+   ne_pci_dev = pci_get_drvdata(pdev);
+   if (WARN_ON(!ne_pci_dev))
+   return;

Again, same thing.  I'm just going to let you fix up all instances of
this pattern from now on and not call it out again.


Yep, I updated all the occurrences.




+
+   pci_free_irq_vectors(pdev);
+}
+
+/**
+ * ne_pci_dev_enable - Select PCI device version and enable it.
+ *
+ * @pdev: PCI device to select version for and then enable.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_pci_dev_enable(struct pci_dev *pdev)
+{
+   u8 dev_enable_reply = 0;
+   u16 dev_version_reply = 0;
+   struct ne_pci_dev *ne_pci_dev = NULL;
+
+   if (WARN_ON(!pdev))
+   return -EINVAL;
+
+   ne_pci_dev = pci_get_drvdata(pdev);
+   if (WARN_ON(!ne_pci_dev) || WARN_ON(!ne_pci_dev->iomem_base))
+   return -EINVAL;
+
+   iowrite16(NE_VERSION_MAX, ne_pci_dev->iomem_base + NE_VERSION);
+
+   dev_version_reply = ioread16(ne_pci_dev->iomem_base + NE_VERSION);
+   if (dev_version_reply != NE_VERSION_MAX) {
+   dev_err_ratelimited(&pdev->dev,
+   NE "Error in pci dev version cmd\n");

Same here, why all the ratelimited stuff?  Should just be dev_err(),
right?


True, I modified to dev_err().




+
+   return -EIO;
+   }
+
+   iowrite8(NE_ENABLE_ON, ne_pci_dev->iomem_base + NE_ENABLE);
+
+   dev_enable_reply = ioread8(ne_pci_dev->iomem_base + NE_ENABLE);
+   if (dev_enable_reply != NE_ENABLE_ON) {
+   dev_err_ratelimited(&pdev->dev,
+   NE "Error in pci dev enable cmd\n");
+
+   return -EIO;
+   }
+
+   return 0;
+}
+
+/**
+ * ne_pci_dev_disable - Disable PCI device.
+ *
+ * @pdev: PCI device to disable.
+ */
+static void ne_pci_dev_disable(struct pci_dev *pdev)
+{
+   u8 dev_disable_reply = 0;
+   struct ne_pci_dev *ne_pci_dev = NULL;
+   const unsigned int sleep_time = 10; // 10 ms
+   unsigned int sleep_time_count = 0;
+
+   if (WARN_ON(!pdev))
+   return;
+
+   ne_pci_dev = pci_get_drvdata(pdev);
+   if (WARN_ON(!ne_pci_dev) || WARN_ON(!ne_pci_dev->iomem_base))
+   return;
+
+   iowrite8(NE_ENABLE_OFF, ne_pci_dev->iomem_base + NE_ENABLE);
+
+   /*
+* Check for NE_ENABLE_OFF in a loop, to handle cases when the device
+* state is not immediately set to disabled and going through a
+* transitory state of disabling.
+*/
+   while (sleep_time_count < DEFAULT_TIMEOUT_MSECS) {
+   

Re: [PATCH v2 18/18] MAINTAINERS: Add entry for the Nitro Enclaves driver

2020-05-22 Thread Paraschiv, Andra-Irina



On 22/05/2020 10:03, Joe Perches wrote:

On Fri, 2020-05-22 at 09:29 +0300, Andra Paraschiv wrote:

trivia:


diff --git a/MAINTAINERS b/MAINTAINERS

[]

@@ -11956,6 +11956,19 @@ S: Maintained
  T:	git git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2.git
  F:	arch/nios2/
  
+NITRO ENCLAVES (NE)

+M: Andra Paraschiv 
+M: Alexandru Vasile 
+M: Alexandru Ciobotaru 
+L: linux-kernel@vger.kernel.org
+S: Supported
+W: https://aws.amazon.com/ec2/nitro/nitro-enclaves/
+F: include/linux/nitro_enclaves.h
+F: include/uapi/linux/nitro_enclaves.h
+F: drivers/virt/nitro_enclaves/
+F: samples/nitro_enclaves/
+F: Documentation/nitro_enclaves/

Please keep the F: entries in case sensitive alphabetic order

F:  Documentation/nitro_enclaves/
F:  drivers/virt/nitro_enclaves/
F:  include/linux/nitro_enclaves.h
F:  include/uapi/linux/nitro_enclaves.h
F:  samples/nitro_enclaves/


Done, I updated the entry in v3.

Thank you, Joe.

Andra





Re: [PATCH v2 01/18] nitro_enclaves: Add ioctl interface definition

2020-05-22 Thread Paraschiv, Andra-Irina



On 22/05/2020 10:00, Greg KH wrote:


On Fri, May 22, 2020 at 09:29:29AM +0300, Andra Paraschiv wrote:

--- /dev/null
+++ b/include/uapi/linux/nitro_enclaves.h
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <https://www.gnu.org/licenses/>.
+ */

Note, if you have the SPDX line, you can get rid of all of the
boilerplate "GPL text" as well.  We have been doing that for lots of
kernel files, no need to add more to increase our workload.


Ack. Thanks for the heads-up, Greg.

Removed in v3 from all the new files in this patch series.

Andra
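With the boilerplate dropped, the top of such a file reduces to just the
SPDX line plus the copyright notice, e.g.:

/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
/*
 * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
 */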






Re: [PATCH v2 00/18] Add support for Nitro Enclaves

2020-05-22 Thread Paraschiv, Andra-Irina



On 22/05/2020 09:29, Andra Paraschiv wrote:

Nitro Enclaves (NE) is a new Amazon Elastic Compute Cloud (EC2) capability
that allows customers to carve out isolated compute environments within EC2
instances [1].

For example, an application that processes sensitive data and runs in a VM,
can be separated from other applications running in the same VM. This
application then runs in a separate VM than the primary VM, namely an enclave.

An enclave runs alongside the VM that spawned it. This setup matches low latency
applications needs. The resources that are allocated for the enclave, such as
memory and CPU, are carved out of the primary VM. Each enclave is mapped to a
process running in the primary VM, that communicates with the NE driver via an
ioctl interface.

In this sense, there are two components:

1. An enclave abstraction process - a user space process running in the primary
VM guest that uses the provided ioctl interface of the NE driver to spawn an
enclave VM (that's 2 below).

How does all this get to an enclave VM running on the host?

There is a NE emulated PCI device exposed to the primary VM. The driver for this
new PCI device is included in the current patch series.

The ioctl logic is mapped to PCI device commands e.g. the NE_START_ENCLAVE ioctl
maps to an enclave start PCI command or the KVM_SET_USER_MEMORY_REGION maps to
an add memory PCI command. The PCI device commands are then translated into
actions taken on the hypervisor side; that's the Nitro hypervisor running on the
host where the primary VM is running. The Nitro hypervisor is based on core KVM
technology.

2. The enclave itself - a VM running on the same host as the primary VM that
spawned it. Memory and CPUs are carved out of the primary VM and are dedicated
for the enclave VM. An enclave does not have persistent storage attached.

An enclave communicates with the primary VM via a local communication channel,
using virtio-vsock [2]. The primary VM has virtio-pci vsock emulated device,
while the enclave VM has a virtio-mmio vsock emulated device. The vsock device
uses eventfd for signaling. The enclave VM sees the usual interfaces - local
APIC and IOAPIC - to get interrupts from virtio-vsock device. The virtio-mmio
device is placed in memory below the typical 4 GiB.

The application that runs in the enclave needs to be packaged in an enclave
image together with the OS ( e.g. kernel, ramdisk, init ) that will run in the
enclave VM. The enclave VM has its own kernel and follows the standard Linux
boot protocol.

The kernel bzImage, the kernel command line, the ramdisk(s) are part of the
Enclave Image Format (EIF); plus an EIF header including metadata such as magic
number, eif version, image size and CRC.


Adding here that we've also considered FIT image format [1] as an option.

Andra

[1] https://github.com/u-boot/u-boot/tree/master/doc/uImage.FIT



Hash values are computed for the entire enclave image (EIF), the kernel and
ramdisk(s). That's used, for example, to check that the enclave image that is
loaded in the enclave VM is the one that was intended to be run.

These crypto measurements are included in a signed attestation document
generated by the Nitro Hypervisor and further used to prove the identity of the
enclave; KMS is an example of service that NE is integrated with and that checks
the attestation doc.

The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The
init process in the enclave connects to the vsock CID of the primary VM and a
predefined port - 9000 - to send a heartbeat value - 0xb7. This mechanism is
used to check in the primary VM that the enclave has booted.
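For illustration, a minimal user space sketch of the parent-side heartbeat
check described above (function and macro names are hypothetical; the real
check lives in the enclave tooling):

#include <sys/socket.h>
#include <linux/vm_sockets.h>
#include <unistd.h>

#define NE_HEARTBEAT_PORT	9000
#define NE_HEARTBEAT_VALUE	0xb7

static int ne_wait_heartbeat(void)
{
	struct sockaddr_vm addr = {
		.svm_family = AF_VSOCK,
		.svm_cid = VMADDR_CID_ANY,
		.svm_port = NE_HEARTBEAT_PORT,
	};
	unsigned char hb = 0;
	int fd, conn;

	fd = socket(AF_VSOCK, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, 1) < 0) {
		close(fd);
		return -1;
	}

	conn = accept(fd, NULL, NULL);
	if (conn >= 0) {
		read(conn, &hb, sizeof(hb));	/* expect 0xb7 from init */
		close(conn);
	}
	close(fd);

	return hb == NE_HEARTBEAT_VALUE ? 0 : -1;
}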

If the enclave VM crashes or gracefully exits, an interrupt event is received by
the NE driver. This event is sent further to the user space enclave process
running in the primary VM via a poll notification mechanism. Then the user space
enclave process can exit.

The following patch series covers the NE driver for enclave lifetime management.
It provides an ioctl interface to the user space and includes the NE PCI device
driver that is the means of communication with the hypervisor running on the
host where the primary VM and the enclave are launched.

The proposed solution follows the KVM model and uses KVM ioctls to be able
to create and set resources for enclaves. Additional NE ioctl commands, besides
the ones provided by KVM, are used to start an enclave and get memory offset for
in-memory enclave image loading.
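A heavily hedged sketch of the memory-setup part of that flow;
KVM_SET_USER_MEMORY_REGION is the command named in this cover letter, while
the surrounding enclave fd creation and start commands changed across
revisions of the series and are elided here:

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Map a user space buffer into the enclave, KVM-model style. */
static int ne_add_memory(int enclave_fd, void *uva, unsigned long size)
{
	struct kvm_userspace_memory_region mem_region = {
		.slot = 0,
		.flags = 0,
		.guest_phys_addr = 0,
		.memory_size = size,
		.userspace_addr = (unsigned long)uva,
	};

	return ioctl(enclave_fd, KVM_SET_USER_MEMORY_REGION, &mem_region);
}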

Thank you.

Andra

[1] https://aws.amazon.com/ec2/nitro/nitro-enclaves/
[2] http://man7.org/linux/man-pages/man7/vsock.7.html

---

Patch Series Changelog

The patch series is built on top of v5.7-rc6.

v1 -> v2

* Rebase on top of v5.7-rc6.
* Adapt codebase based on feedback from v1.
* Update ioctl number definition - major and minor.
* Add sample / documentation for the ioctl interface basic flow usage.
* Update cover letter to include more context on the NE overall.
* Add fix for the enclave / vcpu fd 

Re: [PATCH v1 00/15] Add support for Nitro Enclaves

2020-05-11 Thread Paraschiv, Andra-Irina



On 10/05/2020 12:57, Li Qiang wrote:



Paraschiv, Andra-Irina <andra...@amazon.com> wrote on Fri, 24 Apr 2020 at 22:03:




On 24/04/2020 12:59, Tian, Kevin wrote:
>
    >> From: Paraschiv, Andra-Irina
>> Sent: Thursday, April 23, 2020 9:20 PM
>>
>> On 22/04/2020 00:46, Paolo Bonzini wrote:
>>> On 21/04/20 20:41, Andra Paraschiv wrote:
>>>> An enclave communicates with the primary VM via a local communication
>>>> channel, using virtio-vsock [2]. An enclave does not have a disk or a
>>>> network device attached.
>>> Is it possible to have a sample of this in the samples/ directory?
>> I can add in v2 a sample file including the basic flow of how to use the
>> ioctl interface to create / terminate an enclave.
>>
>> Then we can update / build on top of it based on the ongoing discussions on
>> the patch series and the received feedback.
>>
>>> I am interested especially in:
>>>
>>> - the initial CPU state: CPL0 vs. CPL3, initial program counter, etc.
>>>
>>> - the communication channel; does the enclave see the usual local APIC
>>> and IOAPIC interfaces in order to get interrupts from virtio-vsock, and
>>> where is the virtio-vsock device (virtio-mmio I suppose) placed in memory?
>>>
>>> - what the enclave is allowed to do: can it change privilege levels,
>>> what happens if the enclave performs an access to nonexistent memory,
>>> etc.
>>> - whether there are special hypercall interfaces for the enclave
>> An enclave is a VM, running on the same host as the primary VM that
>> launched the enclave. They are siblings.
>>
>> Here we need to think of two components:
>>
>> 1. An enclave abstraction process - a process running in the primary VM
>> guest, that uses the provided ioctl interface of the Nitro Enclaves
>> kernel driver to spawn an enclave VM (that's 2 below).
>>
>> How does all this get to an enclave VM running on the host?
>>
>> There is a Nitro Enclaves emulated PCI device exposed to the primary VM.
>> The driver for this new PCI device is included in the current patch series.
>>
>> The ioctl logic is mapped to PCI device commands e.g. the
>> NE_ENCLAVE_START ioctl maps to an enclave start PCI command or the
>> KVM_SET_USER_MEMORY_REGION maps to an add memory PCI command. The PCI
>> device commands are then translated into actions taken on the hypervisor
>> side; that's the Nitro hypervisor running on the host where the primary
>> VM is running.
>>
>> 2. The enclave itself - a VM running on the same host as the primary VM
>> that spawned it.
>>
>> The enclave VM has no persistent storage or network interface attached,
>> it uses its own memory and CPUs + its virtio-vsock emulated device for
>> communication with the primary VM.
> sounds like a firecracker VM?

It's a VM crafted for enclave needs.

>
>> The memory and CPUs are carved out of the primary VM, they are dedicated
>> for the enclave. The Nitro hypervisor running on the host ensures memory
>> and CPU isolation between the primary VM and the enclave VM.
> In last paragraph, you said that the enclave VM uses its own memory and
> CPUs. Then here, you said the memory/CPUs are carved out and dedicated
> from the primary VM. Can you elaborate which one is accurate? or a mixed
> model?

Memory and CPUs are carved out of the primary VM and are dedicated for
the enclave VM. I mentioned above as "its own" in the sense that the
primary VM doesn't use these carved out resources while the enclave is
running, as they are dedicated to the enclave.

Hope that now it's more clear.

>
>>
>> These two components need to reflect the same state e.g. when the
>> enclave abstraction process (1) is terminated, the enclave VM (2) is
>> terminated as well.
>>
>> With regard to the communication channel, the primary VM has its own
>> emulated virtio-vsock PCI device. The enclave VM has its own emulated
>> virtio-vsock device as well. This channel is used, for example, to fetch
>> data in the enclave and then process it. An application that sets up the
>> vsock socket and connects or listens, depending on the use
 

Re: [PATCH v1 00/15] Add support for Nitro Enclaves

2020-05-11 Thread Paraschiv, Andra-Irina



On 10/05/2020 14:02, Herrenschmidt, Benjamin wrote:

On Sat, 2020-05-09 at 21:21 +0200, Pavel Machek wrote:

On Fri 2020-05-08 10:00:27, Paraschiv, Andra-Irina wrote:


On 07/05/2020 20:44, Pavel Machek wrote:

Hi!


it uses its own memory and CPUs + its virtio-vsock emulated device for
communication with the primary VM.

The memory and CPUs are carved out of the primary VM, they are dedicated
for the enclave. The Nitro hypervisor running on the host ensures memory
and CPU isolation between the primary VM and the enclave VM.

These two components need to reflect the same state e.g. when the
enclave abstraction process (1) is terminated, the enclave VM (2) is
terminated as well.

With regard to the communication channel, the primary VM has its own
emulated virtio-vsock PCI device. The enclave VM has its own emulated
virtio-vsock device as well. This channel is used, for example, to fetch
data in the enclave and then process it. An application that sets up the
vsock socket and connects or listens, depending on the use case, is then
developed to use this channel; this happens on both ends - primary VM
and enclave VM.

Let me know if further clarifications are needed.

Thanks, this is all useful.  However can you please clarify the
low-level details here?

Is the virtual machine manager open-source? If so, I guess pointer for sources
would be useful.

Hi Pavel,

Thanks for reaching out.

The VMM that is used for the primary / parent VM is not open source.

Do we want to merge code that opensource community can not test?

Hehe.. this isn't quite the story Pavel :)

We merge support for proprietary hypervisors, this is no different. You
can test it, well at least you'll be able to ... when AWS deploys the
functionality. You don't need the hypervisor itself to be open source.

In fact, in this case, it's not even low level invasive arch code like
some of the above can be. It's a driver for a PCI device :-) Granted a
virtual one. We merge drivers for PCI devices routinely without the RTL
or firmware of those devices being open source.

So yes, we probably want this if it's going to be a useful feature to
users when running on AWS EC2. (Disclaimer: I work for AWS these days).


Indeed, it will be available for checking out how it works.

The discussions are ongoing here on the LKML - understanding the 
context, clarifying items, sharing feedback and coming with codebase 
updates and a basic example flow of the ioctl interface usage. This all 
helps with the path towards merging.


Thanks, Ben, for the follow-up.

Andra






Re: [PATCH v1 00/15] Add support for Nitro Enclaves

2020-05-08 Thread Paraschiv, Andra-Irina



On 07/05/2020 20:44, Pavel Machek wrote:


Hi!


it uses its own memory and CPUs + its virtio-vsock emulated device for
communication with the primary VM.

The memory and CPUs are carved out of the primary VM, they are dedicated
for the enclave. The Nitro hypervisor running on the host ensures memory
and CPU isolation between the primary VM and the enclave VM.

These two components need to reflect the same state e.g. when the
enclave abstraction process (1) is terminated, the enclave VM (2) is
terminated as well.

With regard to the communication channel, the primary VM has its own
emulated virtio-vsock PCI device. The enclave VM has its own emulated
virtio-vsock device as well. This channel is used, for example, to fetch
data in the enclave and then process it. An application that sets up the
vsock socket and connects or listens, depending on the use case, is then
developed to use this channel; this happens on both ends - primary VM
and enclave VM.

Let me know if further clarifications are needed.

Thanks, this is all useful.  However can you please clarify the
low-level details here?

Is the virtual machine manager open-source? If so, I guess pointer for sources
would be useful.


Hi Pavel,

Thanks for reaching out.

The VMM that is used for the primary / parent VM is not open source.

Andra





