Re: [Patch v4 1/2] cgroup: svm: Add Encryption ID controller

2021-01-27 Thread Tejun Heo
Hello,

On Tue, Jan 26, 2021 at 05:11:59PM -0800, Vipin Sharma wrote:
> Sounds good, we can have a single top level stat file
> 
> misc.stat
>   Shows how many are supported on the host:
>   $ cat misc.stat
>   sev 500
>   sev_es 10
> 
> If total value of some resource is 0 then it will be considered inactive and
> won't show in misc.{stat, current, max}
> 
> We discussed earlier, instead of having "stat" file we should show
> "current" and "capacity" files in the root but I think we can just have stat
> at top showing total resources to keep it consistent with other cgroup
> files.

Let's do misc.capacity and show only the entries which have their resources
initialized.

Thanks.

-- 
tejun


Re: [Patch v4 1/2] cgroup: svm: Add Encryption ID controller

2021-01-26 Thread David Rientjes
On Thu, 21 Jan 2021, Sean Christopherson wrote:

> True, but the expected dual-usage is more about backwards compatibility than
> anything else.  Running an SEV-ES VM requires a heavily enlightened guest 
> vBIOS
> and kernel, which means that a VM that was created as an SEV guest cannot 
> easily
> be converted to an SEV-ES guest, and it would require cooperation from the 
> guest
> (if it's even feasible?).
> 
> SEV-SNP, another incremental enhancement (on SEV-ES), further strengthens the
> argument for SEV and SEV-* coexistenence.  SEV-SNP and SEV-ES will share the
> same ASID range, so the question is really, "do we expect to run SEV guests 
> and
> any flavor of SEV-* guests on the same platform".  And due to SEV-* not being
> directly backward compatible with SEV, the answer will eventually be "yes", as
> we'll want to keep running existing SEV guest while also spinning up new SEV-*
> guests.
> 

Agreed, cloud providers will most certainly want to run both SEV and SEV-* 
guests on the same platform.

> That being said, it's certainly possible to abstract the different key types
> between AMD and Intel (assuming s390 won't use the cgroup due to it's plethora
> of keys).  TDX private keys are equivalent to SEV-ES ASIDs, and MKTME keys (if
> the kernel ever gains a user) could be thrown into the same bucket as SEV IDs,
> albeit with some minor mental gymnastics.
> 
> E.g. this mapping isn't horrendous:
> 
>   encrpytion_ids.basic.*   == SEV   == MKTME
>   encrpytion_ids.enhanced.*== SEV-* == TDX
> 
> The names will likely be a bit vague, but I don't think they'll be so far off
> that it'd be impossible for someone with SEV/TDX knowledge to glean their 
> intent.
> And realistically, if anyone gets to the point where they care about 
> controlling
> SEV or TDX IDs, they've already plowed through hundreds of pages of dense
> documentation; having to read a few lines of cgroup docs to understand basic 
> vs.
> enhanced probably won't faze them at all.
> 

The abstraction makes sense for both AMD and Intel offerings today.  It 
makes me wonder if we want a read-only 
encryption_ids.{basic,enhanced}.type file to describe the underlying 
technology ("SEV-ES/SEV-SNP", "TDX", etc).  Since the technology is 
discoverable by other means and we are assuming one encryption type per 
pool of encryption ids, we likely don't need this.

I'm slightly concerned about extensibility if there is to be an 
incremental enhancement atop SEV-* or TDX with yet another pool of 
encryption ids.  (For example, when we only had hugepages, this name was 
perfect; then we got 1GB pages which became "gigantic pages", so are 512GB 
pages "enormous"? :)  I could argue (encryption_ids.basic.*,
encryption_ids.enhanced.*) should map to 
(encryption_ids.legacy.*, encryption_ids.*) but that's likely 
bikeshedding.

Thomas: does encryption_ids.{basic,enhanced}.* make sense for ASID 
partitioning?

Tejun: if this makes sense for legacy SEV and SEV-* per Thomas, and this 
is now abstracted to be technology (vendor) neutral, does this make sense 
to you?


Re: [Patch v4 1/2] cgroup: svm: Add Encryption ID controller

2021-01-26 Thread Vipin Sharma
On Tue, Jan 26, 2021 at 05:01:04PM -0500, Tejun Heo wrote:
> The whole thing seems pretty immature to me and I agree with you that coming
> up with an abstraction at this stage feels risky.
> 
> I'm leaning towards creating a misc controller to shove these things into:
> 
> * misc.max and misc.current: nested keyed files listing max and current
>   usage for the cgroup.
> 
> * Have an API to activate or update a given resource with total resource
>   count. I'd much prefer the resource list to be in the controller itself
>   rather than being through some dynamic API just so that there is some
>   review in what keys get added.
> 
> * Top level cgroup lists which resource is active and how many are
>   available.

Sounds good, we can have a single top level stat file

misc.stat
  Shows how many are supported on the host:
  $ cat misc.stat
  sev 500
  sev_es 10

If total value of some resource is 0 then it will be considered inactive and
won't show in misc.{stat, current, max}

We discussed earlier, instead of having "stat" file we should show
"current" and "capacity" files in the root but I think we can just have stat
at top showing total resources to keep it consistent with other cgroup
files.

Thanks
Vipin



Re: [Patch v4 1/2] cgroup: svm: Add Encryption ID controller

2021-01-26 Thread Tejun Heo
On Tue, Jan 26, 2021 at 05:01:04PM -0500, Tejun Heo wrote:
> * misc.max and misc.current: nested keyed files listing max and current
>   usage for the cgroup.

Keyed files, not nested keyed files.

Thanks.

-- 
tejun


Re: [Patch v4 1/2] cgroup: svm: Add Encryption ID controller

2021-01-26 Thread Tejun Heo
Hello,

On Tue, Jan 26, 2021 at 12:49:14PM -0800, David Rientjes wrote:
> > SEV-SNP, another incremental enhancement (on SEV-ES), further strengthens 
> > the
> > argument for SEV and SEV-* coexistenence.  SEV-SNP and SEV-ES will share the
> > same ASID range, so the question is really, "do we expect to run SEV guests 
> > and
> > any flavor of SEV-* guests on the same platform".  And due to SEV-* not 
> > being
> > directly backward compatible with SEV, the answer will eventually be "yes", 
> > as
> > we'll want to keep running existing SEV guest while also spinning up new 
> > SEV-*
> > guests.
> > 
> 
> Agreed, cloud providers will most certainly want to run both SEV and SEV-* 
> guests on the same platform.

Am I correct in thinking that the reason why these IDs are limited is
because they need to be embedded into the page table entries? If so, we
aren't talking about that many IDs and having to divide the already small
pool into disjoint purposes doesn't seem like a particularly smart use of
those bits. It is what it is, I guess.

> I'm slightly concerned about extensibility if there is to be an 
> incremental enhancement atop SEV-* or TDX with yet another pool of 
> encryption ids.  (For example, when we only had hugepages, this name was 
> perfect; then we got 1GB pages which became "gigantic pages", so are 512GB 
> pages "enormous"? :)  I could argue (encryption_ids.basic.*,
> encryption_ids.enhanced.*) should map to 
> (encryption_ids.legacy.*, encryption_ids.*) but that's likely 
> bikeshedding.
> 
> Thomas: does encryption_ids.{basic,enhanced}.* make sense for ASID 
> partitioning?
> 
> Tejun: if this makes sense for legacy SEV and SEV-* per Thomas, and this 
> is now abstracted to be technology (vendor) neutral, does this make sense 
> to you?

The whole thing seems pretty immature to me and I agree with you that coming
up with an abstraction at this stage feels risky.

I'm leaning towards creating a misc controller to shove these things into:

* misc.max and misc.current: nested keyed files listing max and current
  usage for the cgroup.

* Have an API to activate or update a given resource with total resource
  count. I'd much prefer the resource list to be in the controller itself
  rather than being through some dynamic API just so that there is some
  review in what keys get added.

* Top level cgroup lists which resource is active and how many are
  available.

So, behavior-wise, not that different from the proposed code. Just made
generic into a misc controller. Would that work?

Thanks.

-- 
tejun


Re: [Patch v4 1/2] cgroup: svm: Add Encryption ID controller

2021-01-21 Thread Sean Christopherson
On Thu, Jan 21, 2021, Tom Lendacky wrote:
> On 1/21/21 9:55 AM, Tejun Heo wrote:
> > Hello,
> > 
> > On Thu, Jan 21, 2021 at 08:55:07AM -0600, Tom Lendacky wrote:
> > > The hardware will allow any SEV capable ASID to be run as SEV-ES, however,
> > > the SEV firmware will not allow the activation of an SEV-ES VM to be
> > > assigned to an ASID greater than or equal to the SEV minimum ASID value. 
> > > The
> > > reason for the latter is to prevent an !SEV-ES ASID starting out as an
> > > SEV-ES guest and then disabling the SEV-ES VMCB bit that is used by VMRUN.
> > > This would result in the downgrading of the security of the VM without the
> > > VM realizing it.
> > > 
> > > As a result, you have a range of ASIDs that can only run SEV-ES VMs and a
> > > range of ASIDs that can only run SEV VMs.
> > 
> > I see. That makes sense. What's the downside of SEV-ES compared to SEV w/o
> > ES? Are there noticeable performance / feature penalties or is the split
> > mostly for backward compatibility?
> 
> SEV-ES is an incremental enhancement of SEV where the register state of the
> guest is protected/encrypted. As with a lot of performance questions, the
> answer is ...it depends. 

True, but the expected dual-usage is more about backwards compatibility than
anything else.  Running an SEV-ES VM requires a heavily enlightened guest vBIOS
and kernel, which means that a VM that was created as an SEV guest cannot easily
be converted to an SEV-ES guest, and it would require cooperation from the guest
(if it's even feasible?).

SEV-SNP, another incremental enhancement (on SEV-ES), further strengthens the
argument for SEV and SEV-* coexistenence.  SEV-SNP and SEV-ES will share the
same ASID range, so the question is really, "do we expect to run SEV guests and
any flavor of SEV-* guests on the same platform".  And due to SEV-* not being
directly backward compatible with SEV, the answer will eventually be "yes", as
we'll want to keep running existing SEV guest while also spinning up new SEV-*
guests.

That being said, it's certainly possible to abstract the different key types
between AMD and Intel (assuming s390 won't use the cgroup due to it's plethora
of keys).  TDX private keys are equivalent to SEV-ES ASIDs, and MKTME keys (if
the kernel ever gains a user) could be thrown into the same bucket as SEV IDs,
albeit with some minor mental gymnastics.

E.g. this mapping isn't horrendous:

  encrpytion_ids.basic.*   == SEV   == MKTME
  encrpytion_ids.enhanced.*== SEV-* == TDX

The names will likely be a bit vague, but I don't think they'll be so far off
that it'd be impossible for someone with SEV/TDX knowledge to glean their 
intent.
And realistically, if anyone gets to the point where they care about controlling
SEV or TDX IDs, they've already plowed through hundreds of pages of dense
documentation; having to read a few lines of cgroup docs to understand basic vs.
enhanced probably won't faze them at all.


Re: [Patch v4 1/2] cgroup: svm: Add Encryption ID controller

2021-01-21 Thread Vipin Sharma
On Wed, Jan 20, 2021 at 06:32:56PM -0500, Tejun Heo wrote:
> I don't know how many times I have to repeat the same point to get it
> across. For any question about actual abstraction, you haven't provided any
> kind of actual research or analysis and just keep pushing the same thing
> over and over again. Maybe the situation is such that it makes sense to
> change the rule but that needs substantial justifications. I've been asking
> to see whether there are such justifications but all I've been getting are
> empty answers. Until such discussions take place, please consider the series
> nacked and please excuse if I don't respond promptly in this thread.

I am sorry Tejun that you felt your feedback and questions are being ignored
or not answered properly by me. It was not my intent. Let me try again.

I am not able to come up with an abstraction for underlying the hardware
like we have for memory, cpu, and io with their respective cgroup
controllers, because each vendor is solving VM security issue in
different ways. For example:

s390 is using Ultravisor (UV) to disable access to the VMs memory from
the host.  All KVM interaction with their Protected Virtual Machines
(PVM) are handled through UV APIs. Here an encrypted guest image is
loaded first which is decrypted by UV and then UV disallows access to
PVMs memory and register state from KVM or other PVMs. PVMs are assigned
IDs known as secure execution IDs (SEID).  These IDs are not scarce
resource on the host.

AMD is encrypting runtime memory of a VM using an hardware AES engine in
the memory controller and keys are managed by an Arm based coprocessor
inside the CPU, for encryption and decryption of the data flow between
CPU and memory.  Their offering is known as Secure Encrypted
Virtualization (SEV). There are also two more enhanced offerings SEV-ES,
(memory + guest register state encryption), SEV-SNP (SEV-ES + memory
integrity protection + TCB rollback) in later generation of CPUs. At any
time only a limited number of IDs can be used simultaneously in the
processor. Initially only SEV IDs we available on the CPUs but in the
later generations of CPUs with the addition of SEV-ES, IDs were divided
in two groups SEV ASIDs for SEV guests, and SEV-ES ASIDs for SEV-ES and
SEV-SNP VMs. SEV firmware doesn't allow SEV ASIDs to launch SEV-ES and
SEV-SNP VMs. Ideally, I think its better to use SEV-SNP as it provides
highest protection but support in vmm and guest kernels are not there
yet. Also, old HW will not be able to run SEV-ES or SEV-SNP as they can
only run SEV ASIDs. I dont have data in terms of drawbacks running VM on
SEV-SNP in terms of speed and cost but I think it will be dependent on
workloads.

Intel has come up with Trusted Domain Extension (TDX) for their secure
VMs offering. They allow a VM to use multiple keys for private pages and
for pages shared with other VMs. Overall, this is called as Multi-Key
Total Memory Encryption (MKTME). A fixed number of encryption keys are
supported in MKTME engine. During execution these keys are identified
using KeyIDs which are present in upper bits of platform physical
addresses.

Only limited form of abstraction present here is that all are providing
a way to have secure VMs and processes, either through single key
encryption, multiple key encryptions or access denial.

A common abstraction of different underlying security behavior/approach
can mislead users in giving impression that all secure VMs/processes are
same. In my opinion, this kind of thing can work when we talk about
memory, cpu, etc, but for security related stuff will do more harm to
the end user than the benefit of simplicity of abstraction. The name of
the underlying feature also tells what kind of security guarantees a
user can expect on the platform for a VM and what kind is used.

Taking a step back, in the current scenario, we have some global shared
resources which are limited for SEV, SEV-ES, and TDX. There is also a
need for tracking and controlling on all 4 features for now. This is a
case for some kind of cgroup behavior to limit and control an aggregate
of processes using these system resources. After all, "cgroup is a
mechanism to organize processes hierarchically and distribute system
resources along the hierarchy in a controlled and configurable manner."

We are using SEV in KVM and outside KVM also for other products on
horizon. As cgroups are commonly used in many infrastructures for
resource control, scheduling, and tracking, this patch is helping us in
allocating jobs in the infrastructure along with memory, cpu and other
constraints in a coherent way.

If you feel encryption id cgroup is not good for long term or a
too specific use case then may be there should be a common cgroup which
can be a home for this kind and other kind of future resources where
there is need to limit a global resource allocation but are not abstract
or cannot be abstracted as the other existing cgroups. My current patch
is very generic and with 

Re: [Patch v4 1/2] cgroup: svm: Add Encryption ID controller

2021-01-21 Thread Tom Lendacky

On 1/21/21 9:55 AM, Tejun Heo wrote:

Hello,

On Thu, Jan 21, 2021 at 08:55:07AM -0600, Tom Lendacky wrote:

The hardware will allow any SEV capable ASID to be run as SEV-ES, however,
the SEV firmware will not allow the activation of an SEV-ES VM to be
assigned to an ASID greater than or equal to the SEV minimum ASID value. The
reason for the latter is to prevent an !SEV-ES ASID starting out as an
SEV-ES guest and then disabling the SEV-ES VMCB bit that is used by VMRUN.
This would result in the downgrading of the security of the VM without the
VM realizing it.

As a result, you have a range of ASIDs that can only run SEV-ES VMs and a
range of ASIDs that can only run SEV VMs.


I see. That makes sense. What's the downside of SEV-ES compared to SEV w/o
ES? Are there noticeable performance / feature penalties or is the split
mostly for backward compatibility?


SEV-ES is an incremental enhancement of SEV where the register state of 
the guest is protected/encrypted. As with a lot of performance questions, 
the answer is ...it depends. With SEV-ES, there is additional overhead 
associated with a world switch (VMRUN/VMEXIT) to restore and save 
additional register state. Also, exit events are now divided up into 
automatic exits (AE) and non-automatic exits (NAE). NAE events result in a 
new #VC exception being generated where the guest is then required to use 
the VMGEXIT instruction to communicate only the state necessary to perform 
the operation. A CPUID instruction is a good example, where a shared page 
is used to communicate required state to the hypervisor to perform the 
CPUID emulation, which then returns the results back through the shared 
page to the guest. So it all depends on how often the workload in question 
performs operations that result in a VMEXIT of the vCPU, etc.


Thanks,
Tom



Thanks.



Re: [Patch v4 1/2] cgroup: svm: Add Encryption ID controller

2021-01-21 Thread Tejun Heo
Hello,

On Thu, Jan 21, 2021 at 08:55:07AM -0600, Tom Lendacky wrote:
> The hardware will allow any SEV capable ASID to be run as SEV-ES, however,
> the SEV firmware will not allow the activation of an SEV-ES VM to be
> assigned to an ASID greater than or equal to the SEV minimum ASID value. The
> reason for the latter is to prevent an !SEV-ES ASID starting out as an
> SEV-ES guest and then disabling the SEV-ES VMCB bit that is used by VMRUN.
> This would result in the downgrading of the security of the VM without the
> VM realizing it.
> 
> As a result, you have a range of ASIDs that can only run SEV-ES VMs and a
> range of ASIDs that can only run SEV VMs.

I see. That makes sense. What's the downside of SEV-ES compared to SEV w/o
ES? Are there noticeable performance / feature penalties or is the split
mostly for backward compatibility?

Thanks.

-- 
tejun


Re: [Patch v4 1/2] cgroup: svm: Add Encryption ID controller

2021-01-21 Thread Tom Lendacky

On 1/20/21 10:40 AM, Tejun Heo wrote:

Hello,

On Tue, Jan 19, 2021 at 11:13:51PM -0800, Vipin Sharma wrote:

Can you please elaborate? I skimmed through the amd manual and it seemed to
say that SEV-ES ASIDs are superset of SEV but !SEV-ES ASIDs. What's the use
case for mixing those two?


For example, customers can be given options for which kind of protection they
want to choose for their workloads based on factors like data protection
requirement, cost, speed, etc.


So, I'm looking for is a bit more in-depth analysis than that. ie. What's
the downside of SEV && !SEV-ES and is the disticntion something inherently
useful?


In terms of features SEV-ES is superset of SEV but that doesn't mean SEV
ASIDs are superset of SEV ASIDs. SEV ASIDs cannot be used for SEV-ES VMs
and similarly SEV-ES ASIDs cannot be used for SEV VMs. Once a system is
booted, based on the BIOS settings each type will have their own
capacity and that number cannot be changed until the next boot and BIOS
changes.


Here's an excerpt from the AMD's system programming manual, section 15.35.2:

   On some systems, there is a limitation on which ASID values can be used on
   SEV guests that are run with SEV-ES disabled. While SEV-ES may be enabled
   on any valid SEV ASID (as defined by CPUID Fn8000_001F[ECX]), there are
   restrictions on which ASIDs may be used for SEV guests with SEV- ES
   disabled. CPUID Fn8000_001F[EDX] indicates the minimum ASID value that
   must be used for an SEV-enabled, SEV-ES-disabled guest. For example, if
   CPUID Fn8000_001F[EDX] returns the value 5, then any VMs which use ASIDs
   1-4 and which enable SEV must also enable SEV-ES.


The hardware will allow any SEV capable ASID to be run as SEV-ES, however, 
the SEV firmware will not allow the activation of an SEV-ES VM to be 
assigned to an ASID greater than or equal to the SEV minimum ASID value. 
The reason for the latter is to prevent an !SEV-ES ASID starting out as an 
SEV-ES guest and then disabling the SEV-ES VMCB bit that is used by VMRUN. 
This would result in the downgrading of the security of the VM without the 
VM realizing it.


As a result, you have a range of ASIDs that can only run SEV-ES VMs and a 
range of ASIDs that can only run SEV VMs.


Thanks,
Tom




We are not mixing the two types of ASIDs, they are separate and used
separately.


Maybe in practice, the key management on the BIOS side is implemented in a
more restricted way but at least the processor manual says differently.


I'm very reluctant to ack vendor specific interfaces for a few reasons but
most importantly because they usually indicate abstraction and/or the
underlying feature not being sufficiently developed and they tend to become
baggages after a while. So, here are my suggestions:


My first patch was only for SEV, but soon we got comments that this can
be abstracted and used by TDX and SEID for their use cases.

I see this patch as providing an abstraction for simple accounting of
resources used for creating secure execution contexts. Here, secure
execution is achieved through different means. SEID, TDX, and SEV
provide security using different features and capabilities. I am not
sure if we will reach a point where all three and other vendors will use
the same approach and technology for this purpose.

Instead of each one coming up with their own resource tracking for their
features, this patch is providing a common framework and cgroup for
tracking these resources.


What's implemented is a shared place where similar things can be thrown in
bu from user's perspective the underlying hardware feature isn't really
abstracted. It's just exposing whatever hardware knobs there are. If you
look at any other cgroup controllers, nothing is exposing this level of
hardware dependent details and I'd really like to keep it that way.

So, what I'm asking for is more in-depth analysis of the landscape and
inherent differences among different vendor implementations to see whether
there can be better approaches or we should just wait and see.


* If there can be a shared abstraction which hopefully makes intuitive
   sense, that'd be ideal. It doesn't have to be one knob but it shouldn't be
   something arbitrary to specific vendors.


I think we should see these as features provided on a host. Tasks can
be executed securely on a host with the guarantees provided by the
specific feature (SEV, SEV-ES, TDX, SEID) used by the task.

I don't think each H/W vendor can agree to a common set of security
guarantees and approach.


Do TDX and SEID have multiple key types tho?


* If we aren't there yet and vendor-specific interface is a must, attach
   that part to an interface which is already vendor-aware.

Sorry, I don't understand this approach. Can you please give more
details about it?


Attaching the interface to kvm side, most likely, instead of exposing the
feature through cgroup.

Thanks.



Re: [Patch v4 1/2] cgroup: svm: Add Encryption ID controller

2021-01-20 Thread Vipin Sharma
On Wed, Jan 20, 2021 at 11:40:18AM -0500, Tejun Heo wrote:
> Hello,
> 
> On Tue, Jan 19, 2021 at 11:13:51PM -0800, Vipin Sharma wrote:
> > > Can you please elaborate? I skimmed through the amd manual and it seemed 
> > > to
> > > say that SEV-ES ASIDs are superset of SEV but !SEV-ES ASIDs. What's the 
> > > use
> > > case for mixing those two?
> > 
> > For example, customers can be given options for which kind of protection 
> > they
> > want to choose for their workloads based on factors like data protection
> > requirement, cost, speed, etc.
> 
> So, I'm looking for is a bit more in-depth analysis than that. ie. What's
> the downside of SEV && !SEV-ES and is the disticntion something inherently
> useful?

I will leave this for AMD folks to respond, as they can give much better
answer than me.

> > > I'm very reluctant to ack vendor specific interfaces for a few reasons but
> > > most importantly because they usually indicate abstraction and/or the
> > > underlying feature not being sufficiently developed and they tend to 
> > > become
> > > baggages after a while. So, here are my suggestions:
> > 
> > My first patch was only for SEV, but soon we got comments that this can
> > be abstracted and used by TDX and SEID for their use cases.
> > 
> > I see this patch as providing an abstraction for simple accounting of
> > resources used for creating secure execution contexts. Here, secure
> > execution is achieved through different means. SEID, TDX, and SEV
> > provide security using different features and capabilities. I am not
> > sure if we will reach a point where all three and other vendors will use
> > the same approach and technology for this purpose.
> > 
> > Instead of each one coming up with their own resource tracking for their
> > features, this patch is providing a common framework and cgroup for
> > tracking these resources.
> 
> What's implemented is a shared place where similar things can be thrown in
> bu from user's perspective the underlying hardware feature isn't really
> abstracted. It's just exposing whatever hardware knobs there are. If you
> look at any other cgroup controllers, nothing is exposing this level of
> hardware dependent details and I'd really like to keep it that way.

RDMA cgroup expose hardware details to users. In rdma.{max, current}
interface files we can see actual hardware names. Only difference
compared to Encryption ID cgroup is that latter is exposing that detail
via file names.

Will you prefer that encryption ID cgroup do things similar to RDMA
cgroup? It can have 3 files
1. encids.capacity (only on root)
   Shows features (SEV, SEV-ES, TDX, SEID) available along with capacity
   on the host.
   $ cat encids.capacity
   sev 400
   sev-es 100

2. encids.max (only on non-root)
   Allows setting of the max value of a feature in the cgroup.
   It will only show max for features shown in the capacity file.
   $ cat encids.max
   sev max
   sev-es 100

3. encids.current (all levels)
   Shows total getting used in the cgroup and its descendants.
   $ cat encids.current
   sev 3
   sev-es 0

> 
> So, what I'm asking for is more in-depth analysis of the landscape and
> inherent differences among different vendor implementations to see whether
> there can be better approaches or we should just wait and see.
> 
> > > * If there can be a shared abstraction which hopefully makes intuitive
> > >   sense, that'd be ideal. It doesn't have to be one knob but it shouldn't 
> > > be
> > >   something arbitrary to specific vendors.
> > 
> > I think we should see these as features provided on a host. Tasks can
> > be executed securely on a host with the guarantees provided by the
> > specific feature (SEV, SEV-ES, TDX, SEID) used by the task.
> > 
> > I don't think each H/W vendor can agree to a common set of security
> > guarantees and approach.
> 
> Do TDX and SEID have multiple key types tho?
To my limited knowledge I don't think so. I don't know their future
plans.

> 
> > > * If we aren't there yet and vendor-specific interface is a must, attach
> > >   that part to an interface which is already vendor-aware.
> > Sorry, I don't understand this approach. Can you please give more
> > details about it?
> 
> Attaching the interface to kvm side, most likely, instead of exposing the
> feature through cgroup.
I am little confused, do you mean moving files from the kernel/cgroup/
to kvm related directories or you are recommending not to use cgroup at
all?  I hope it is the former :)

Only issue with this is that TDX is not limited to KVM, they have
potential use cases for MKTME without KVM.

Thanks
Vipin


Re: [Patch v4 1/2] cgroup: svm: Add Encryption ID controller

2021-01-20 Thread Tejun Heo
Hello,

On Wed, Jan 20, 2021 at 03:18:29PM -0800, Vipin Sharma wrote:
> RDMA cgroup expose hardware details to users. In rdma.{max, current}
> interface files we can see actual hardware names. Only difference

No, what's shown is the device name followed by resources which are commonly
defined for all rdma devices. The format is the same as io controller
interface files.

> compared to Encryption ID cgroup is that latter is exposing that detail
> via file names.
> 
> Will you prefer that encryption ID cgroup do things similar to RDMA
> cgroup? It can have 3 files

I don't know how many times I have to repeat the same point to get it
across. For any question about actual abstraction, you haven't provided any
kind of actual research or analysis and just keep pushing the same thing
over and over again. Maybe the situation is such that it makes sense to
change the rule but that needs substantial justifications. I've been asking
to see whether there are such justifications but all I've been getting are
empty answers. Until such discussions take place, please consider the series
nacked and please excuse if I don't respond promptly in this thread.

> > Attaching the interface to kvm side, most likely, instead of exposing the
> > feature through cgroup.
> I am little confused, do you mean moving files from the kernel/cgroup/
> to kvm related directories or you are recommending not to use cgroup at
> all?  I hope it is the former :)
> 
> Only issue with this is that TDX is not limited to KVM, they have
> potential use cases for MKTME without KVM.

There are ways to integrate with cgroup through other interfaces - e.g. take
a look at how bpf works with cgroups. Here, it isn't ideal but may work out
if things actually require a lot of hardware dependent bits. There's also
RDT which exists outside of cgroup for similar reasons.

Thanks.

-- 
tejun


Re: [Patch v4 1/2] cgroup: svm: Add Encryption ID controller

2021-01-20 Thread Tejun Heo
Hello,

On Tue, Jan 19, 2021 at 11:13:51PM -0800, Vipin Sharma wrote:
> > Can you please elaborate? I skimmed through the amd manual and it seemed to
> > say that SEV-ES ASIDs are superset of SEV but !SEV-ES ASIDs. What's the use
> > case for mixing those two?
> 
> For example, customers can be given options for which kind of protection they
> want to choose for their workloads based on factors like data protection
> requirement, cost, speed, etc.

So, I'm looking for is a bit more in-depth analysis than that. ie. What's
the downside of SEV && !SEV-ES and is the disticntion something inherently
useful?

> In terms of features SEV-ES is superset of SEV but that doesn't mean SEV
> ASIDs are superset of SEV ASIDs. SEV ASIDs cannot be used for SEV-ES VMs
> and similarly SEV-ES ASIDs cannot be used for SEV VMs. Once a system is
> booted, based on the BIOS settings each type will have their own
> capacity and that number cannot be changed until the next boot and BIOS
> changes.

Here's an excerpt from the AMD's system programming manual, section 15.35.2:

  On some systems, there is a limitation on which ASID values can be used on
  SEV guests that are run with SEV-ES disabled. While SEV-ES may be enabled
  on any valid SEV ASID (as defined by CPUID Fn8000_001F[ECX]), there are
  restrictions on which ASIDs may be used for SEV guests with SEV- ES
  disabled. CPUID Fn8000_001F[EDX] indicates the minimum ASID value that
  must be used for an SEV-enabled, SEV-ES-disabled guest. For example, if
  CPUID Fn8000_001F[EDX] returns the value 5, then any VMs which use ASIDs
  1-4 and which enable SEV must also enable SEV-ES.

> We are not mixing the two types of ASIDs, they are separate and used
> separately.

Maybe in practice, the key management on the BIOS side is implemented in a
more restricted way but at least the processor manual says differently.

> > I'm very reluctant to ack vendor specific interfaces for a few reasons but
> > most importantly because they usually indicate abstraction and/or the
> > underlying feature not being sufficiently developed and they tend to become
> > baggages after a while. So, here are my suggestions:
> 
> My first patch was only for SEV, but soon we got comments that this can
> be abstracted and used by TDX and SEID for their use cases.
> 
> I see this patch as providing an abstraction for simple accounting of
> resources used for creating secure execution contexts. Here, secure
> execution is achieved through different means. SEID, TDX, and SEV
> provide security using different features and capabilities. I am not
> sure if we will reach a point where all three and other vendors will use
> the same approach and technology for this purpose.
> 
> Instead of each one coming up with their own resource tracking for their
> features, this patch is providing a common framework and cgroup for
> tracking these resources.

What's implemented is a shared place where similar things can be thrown in
bu from user's perspective the underlying hardware feature isn't really
abstracted. It's just exposing whatever hardware knobs there are. If you
look at any other cgroup controllers, nothing is exposing this level of
hardware dependent details and I'd really like to keep it that way.

So, what I'm asking for is more in-depth analysis of the landscape and
inherent differences among different vendor implementations to see whether
there can be better approaches or we should just wait and see.

> > * If there can be a shared abstraction which hopefully makes intuitive
> >   sense, that'd be ideal. It doesn't have to be one knob but it shouldn't be
> >   something arbitrary to specific vendors.
> 
> I think we should see these as features provided on a host. Tasks can
> be executed securely on a host with the guarantees provided by the
> specific feature (SEV, SEV-ES, TDX, SEID) used by the task.
> 
> I don't think each H/W vendor can agree to a common set of security
> guarantees and approach.

Do TDX and SEID have multiple key types tho?

> > * If we aren't there yet and vendor-specific interface is a must, attach
> >   that part to an interface which is already vendor-aware.
> Sorry, I don't understand this approach. Can you please give more
> details about it?

Attaching the interface to kvm side, most likely, instead of exposing the
feature through cgroup.

Thanks.

-- 
tejun


Re: [Patch v4 1/2] cgroup: svm: Add Encryption ID controller

2021-01-19 Thread Vipin Sharma
On Tue, Jan 19, 2021 at 10:51:24AM -0500, Tejun Heo wrote:
> Hello,
> 
> On Fri, Jan 15, 2021 at 08:32:19PM -0800, Vipin Sharma wrote:
> > SEV-ES has stronger memory encryption gurantees compared to SEV, apart
> > from encrypting the application memory it also encrypts register state
> > among other things. In a single host ASIDs can be distributed between
> > these two types by BIOS settings.
> > 
> > Currently, Google Cloud has Confidential VM machines offering using SEV.
> > ASIDs are not compatible between SEV and SEV-ES, so a VM running on SEV
> > cannot run on SEV-ES and vice versa
> > 
> > There are use cases for both types of VMs getting used in future.
> 
> Can you please elaborate? I skimmed through the amd manual and it seemed to
> say that SEV-ES ASIDs are superset of SEV but !SEV-ES ASIDs. What's the use
> case for mixing those two?

For example, customers can be given options for which kind of protection they
want to choose for their workloads based on factors like data protection
requirement, cost, speed, etc.

In terms of features SEV-ES is superset of SEV but that doesn't mean SEV
ASIDs are superset of SEV ASIDs. SEV ASIDs cannot be used for SEV-ES VMs
and similarly SEV-ES ASIDs cannot be used for SEV VMs. Once a system is
booted, based on the BIOS settings each type will have their own
capacity and that number cannot be changed until the next boot and BIOS
changes.

We are not mixing the two types of ASIDs, they are separate and used
separately.

> 
> > > > > > Other ID types can be easily added in the controller in the same 
> > > > > > way.
> > > > > 
> > > > > I'm not sure this is necessarily a good thing.
> > > > 
> > > > This is to just say that when Intel and PowerPC changes are ready it
> > > > won't be difficult for them to add their controller.
> > > 
> > > I'm not really enthused about having per-hardware-type control knobs. None
> > > of other controllers behave that way. Unless it can be abstracted into
> > > something common, I'm likely to object.
> > 
> > There was a discussion in Patch v1 and consensus was to have individual
> > files because it makes kernel implementation extremely simple.
> > 
> > https://lore.kernel.org/lkml/alpine.deb.2.23.453.2011131615510.333...@chino.kir.corp.google.com/#t
> 
> I'm very reluctant to ack vendor specific interfaces for a few reasons but
> most importantly because they usually indicate abstraction and/or the
> underlying feature not being sufficiently developed and they tend to become
> baggages after a while. So, here are my suggestions:

My first patch was only for SEV, but soon we got comments that this can
be abstracted and used by TDX and SEID for their use cases.

I see this patch as providing an abstraction for simple accounting of
resources used for creating secure execution contexts. Here, secure
execution is achieved through different means. SEID, TDX, and SEV
provide security using different features and capabilities. I am not
sure if we will reach a point where all three and other vendors will use
the same approach and technology for this purpose.

Instead of each one coming up with their own resource tracking for their
features, this patch is providing a common framework and cgroup for
tracking these resources.

> 
> * If there can be a shared abstraction which hopefully makes intuitive
>   sense, that'd be ideal. It doesn't have to be one knob but it shouldn't be
>   something arbitrary to specific vendors.

I think we should see these as features provided on a host. Tasks can
be executed securely on a host with the guarantees provided by the
specific feature (SEV, SEV-ES, TDX, SEID) used by the task.

I don't think each H/W vendor can agree to a common set of security
guarantees and approach.

> 
> * If we aren't there yet and vendor-specific interface is a must, attach
>   that part to an interface which is already vendor-aware.
Sorry, I don't understand this approach. Can you please give more
details about it?

Thanks
Vipin


Re: [Patch v4 1/2] cgroup: svm: Add Encryption ID controller

2021-01-19 Thread Tejun Heo
Hello,

On Fri, Jan 15, 2021 at 08:32:19PM -0800, Vipin Sharma wrote:
> SEV-ES has stronger memory encryption gurantees compared to SEV, apart
> from encrypting the application memory it also encrypts register state
> among other things. In a single host ASIDs can be distributed between
> these two types by BIOS settings.
> 
> Currently, Google Cloud has Confidential VM machines offering using SEV.
> ASIDs are not compatible between SEV and SEV-ES, so a VM running on SEV
> cannot run on SEV-ES and vice versa
> 
> There are use cases for both types of VMs getting used in future.

Can you please elaborate? I skimmed through the amd manual and it seemed to
say that SEV-ES ASIDs are superset of SEV but !SEV-ES ASIDs. What's the use
case for mixing those two?

> > > > > Other ID types can be easily added in the controller in the same way.
> > > > 
> > > > I'm not sure this is necessarily a good thing.
> > > 
> > > This is to just say that when Intel and PowerPC changes are ready it
> > > won't be difficult for them to add their controller.
> > 
> > I'm not really enthused about having per-hardware-type control knobs. None
> > of other controllers behave that way. Unless it can be abstracted into
> > something common, I'm likely to object.
> 
> There was a discussion in Patch v1 and consensus was to have individual
> files because it makes kernel implementation extremely simple.
> 
> https://lore.kernel.org/lkml/alpine.deb.2.23.453.2011131615510.333...@chino.kir.corp.google.com/#t

I'm very reluctant to ack vendor specific interfaces for a few reasons but
most importantly because they usually indicate abstraction and/or the
underlying feature not being sufficiently developed and they tend to become
baggages after a while. So, here are my suggestions:

* If there can be a shared abstraction which hopefully makes intuitive
  sense, that'd be ideal. It doesn't have to be one knob but it shouldn't be
  something arbitrary to specific vendors.

* If we aren't there yet and vendor-specific interface is a must, attach
  that part to an interface which is already vendor-aware.

> This information is not available anywhere else in the system. Only
> other way to get this value is to use CPUID instruction (0x801F) of
> the processor. Which also has disdvantage if sev module in kernel
> doesn't use all of the available ASIDs for its work (right now it uses
> all) then there will be a mismatch between what user get through their
> code and what is actually getting used in the kernel by sev.
> 
> In cgroup v2, I didn't see current files for other cgroups in root
> folder that is why I didn't show that file in root folder.
> 
> Will you be fine if I show two files in the root, something like:
> 
> encids.sev.capacity
> encids.sev.current
> 
> In non root folder, it will be:
> encids.sev.max
> encids.sev.current
> 
> I still prefer encids.sev.stat, as it won't repeat same information in
> each cgroup but let me know what you think.

Yeah, this will be a first and I was mostly wondering about the same number
appearing under different files / names on root and !root cgroups. I'm
leaning more towards capacity/current but let me think about it a bit more.

Thank you.

-- 
tejun


Re: [Patch v4 1/2] cgroup: svm: Add Encryption ID controller

2021-01-15 Thread Vipin Sharma
On Fri, Jan 15, 2021 at 10:43:32PM -0500, Tejun Heo wrote:
> On Fri, Jan 15, 2021 at 02:18:40PM -0800, Vipin Sharma wrote:
> > > * Why is .sev a separate namespace? Isn't the controller supposed to cover
> > >   encryption ids across different implementations? It's not like multiple
> > >   types of IDs can be in use on the same machine, right?
> > > 
> > 
> > On AMD platform we have two types SEV and SEV-ES which can exists
> > simultaneously and they have their own quota.
> 
> Can you please give a brief explanation of the two and lay out a scenario
> where the two are being used / allocated disjointly?
> 

SEV-ES has stronger memory encryption gurantees compared to SEV, apart
from encrypting the application memory it also encrypts register state
among other things. In a single host ASIDs can be distributed between
these two types by BIOS settings.

Currently, Google Cloud has Confidential VM machines offering using SEV.
ASIDs are not compatible between SEV and SEV-ES, so a VM running on SEV
cannot run on SEV-ES and vice versa

There are use cases for both types of VMs getting used in future.

> > > > Other ID types can be easily added in the controller in the same way.
> > > 
> > > I'm not sure this is necessarily a good thing.
> > 
> > This is to just say that when Intel and PowerPC changes are ready it
> > won't be difficult for them to add their controller.
> 
> I'm not really enthused about having per-hardware-type control knobs. None
> of other controllers behave that way. Unless it can be abstracted into
> something common, I'm likely to object.

There was a discussion in Patch v1 and consensus was to have individual
files because it makes kernel implementation extremely simple.

https://lore.kernel.org/lkml/alpine.deb.2.23.453.2011131615510.333...@chino.kir.corp.google.com/#t

> 
> > > > +static int enc_id_cg_stat_show(struct seq_file *sf, void *v)
> > > > +{
> > > > +   unsigned long flags;
> > > > +   enum encryption_id_type type = seq_cft(sf)->private;
> > > > +
> > > > +   spin_lock_irqsave(_id_cg_lock, flags);
> > > > +
> > > > +   seq_printf(sf, "total %u\n", enc_id_capacity[type]);
> > > > +   seq_printf(sf, "used %u\n", root_cg.res[type].usage);
> > > 
> > > Dup with .current and no need to show total on every cgroup, right?
> > 
> > This is for the stat file which will only be seen in the root cgroup
> > directory.  It is to know overall picture for the resource, what is the
> > total capacity and what is the current usage. ".current" file is not
> > shown on the root cgroup.
> 
> Ah, missed the flags. It's odd for the usage to be presented in two
> different ways tho. I think it'd make more sense w/ cgroup.current at root
> level. Is the total number available somewhere else in the system?

This information is not available anywhere else in the system. Only
other way to get this value is to use CPUID instruction (0x801F) of
the processor. Which also has disdvantage if sev module in kernel
doesn't use all of the available ASIDs for its work (right now it uses
all) then there will be a mismatch between what user get through their
code and what is actually getting used in the kernel by sev.

In cgroup v2, I didn't see current files for other cgroups in root
folder that is why I didn't show that file in root folder.

Will you be fine if I show two files in the root, something like:

encids.sev.capacity
encids.sev.current

In non root folder, it will be:
encids.sev.max
encids.sev.current

I still prefer encids.sev.stat, as it won't repeat same information in
each cgroup but let me know what you think.

Thanks


Re: [Patch v4 1/2] cgroup: svm: Add Encryption ID controller

2021-01-15 Thread Tejun Heo
On Fri, Jan 15, 2021 at 02:18:40PM -0800, Vipin Sharma wrote:
> > * Why is .sev a separate namespace? Isn't the controller supposed to cover
> >   encryption ids across different implementations? It's not like multiple
> >   types of IDs can be in use on the same machine, right?
> > 
> 
> On AMD platform we have two types SEV and SEV-ES which can exists
> simultaneously and they have their own quota.

Can you please give a brief explanation of the two and lay out a scenario
where the two are being used / allocated disjointly?

> > > Other ID types can be easily added in the controller in the same way.
> > 
> > I'm not sure this is necessarily a good thing.
> 
> This is to just say that when Intel and PowerPC changes are ready it
> won't be difficult for them to add their controller.

I'm not really enthused about having per-hardware-type control knobs. None
of other controllers behave that way. Unless it can be abstracted into
something common, I'm likely to object.

> > > +static int enc_id_cg_stat_show(struct seq_file *sf, void *v)
> > > +{
> > > + unsigned long flags;
> > > + enum encryption_id_type type = seq_cft(sf)->private;
> > > +
> > > + spin_lock_irqsave(_id_cg_lock, flags);
> > > +
> > > + seq_printf(sf, "total %u\n", enc_id_capacity[type]);
> > > + seq_printf(sf, "used %u\n", root_cg.res[type].usage);
> > 
> > Dup with .current and no need to show total on every cgroup, right?
> 
> This is for the stat file which will only be seen in the root cgroup
> directory.  It is to know overall picture for the resource, what is the
> total capacity and what is the current usage. ".current" file is not
> shown on the root cgroup.

Ah, missed the flags. It's odd for the usage to be presented in two
different ways tho. I think it'd make more sense w/ cgroup.current at root
level. Is the total number available somewhere else in the system?

Thanks.

-- 
tejun


Re: [Patch v4 1/2] cgroup: svm: Add Encryption ID controller

2021-01-15 Thread Vipin Sharma
On Fri, Jan 15, 2021 at 03:59:25PM -0500, Tejun Heo wrote:
> Hello,
> 
> On Thu, Jan 07, 2021 at 05:28:45PM -0800, Vipin Sharma wrote:
> > 1. encrpytion_ids.sev.max
> > Sets the maximum usage of SEV IDs in the cgroup.
> > 2. encryption_ids.sev.current
> > Current usage of SEV IDs in the cgroup and its children.
> > 3. encryption_ids.sev.stat
> > Shown only at the root cgroup. Displays total SEV IDs available
> > on the platform and current usage count.
> 
> Sorry, should have raised these earlier:
> 
> * Can we shorten the name to encids?

Sure.

> 
> * Why is .sev a separate namespace? Isn't the controller supposed to cover
>   encryption ids across different implementations? It's not like multiple
>   types of IDs can be in use on the same machine, right?
> 

On AMD platform we have two types SEV and SEV-ES which can exists
simultaneously and they have their own quota.

> > Other ID types can be easily added in the controller in the same way.
> 
> I'm not sure this is necessarily a good thing.

This is to just say that when Intel and PowerPC changes are ready it
won't be difficult for them to add their controller.

> 
> > +/**
> > + * enc_id_cg_uncharge_hierarchy() - Uncharge the enryption ID cgroup 
> > hierarchy.
> > + * @start_cg: Starting cgroup.
> > + * @stop_cg: cgroup at which uncharge stops.
> > + * @type: type of encryption ID to uncharge.
> > + * @amount: Charge amount.
> > + *
> > + * Uncharge the cgroup tree from the given start cgroup to the stop cgroup.
> > + *
> > + * Context: Any context. Expects enc_id_cg_lock to be held by the caller.
> > + */
> > +static void enc_id_cg_uncharge_hierarchy(struct encryption_id_cgroup 
> > *start_cg,
> > +struct encryption_id_cgroup *stop_cg,
> > +enum encryption_id_type type,
> > +unsigned int amount)
> > +{
> > +   struct encryption_id_cgroup *i;
> > +
> > +   lockdep_assert_held(_id_cg_lock);
> > +
> > +   for (i = start_cg; i != stop_cg; i = parent_enc(i)) {
> > +   WARN_ON_ONCE(i->res[type].usage < amount);
> > +   i->res[type].usage -= amount;
> > +   }
> > +   css_put(_cg->css);
> 
> I'm curious whether this is necessary given that a css can't be destroyed
> while tasks are attached. Are there cases where this wouldn't hold true? If
> so, it'd be great to have some comments on how that can happen.

We are not moving charges when a task moves out. This can lead us to the
cases where all of the tasks in the cgroup have moved out but it
still has charges. In that scenarios cgroup can be deleted. Taking a
reference will make sure cgroup is atleast present internally.

Also, struct encryption_ic_cgroup has a reference to the cgroup which is
used during uncharge call to correctly identify from which cgroup charge
should be deducted.

> 
> > +/**
> > + * enc_id_cg_max_write() - Update the maximum limit of the cgroup.
> > + * @of: Handler for the file.
> > + * @buf: Data from the user. It should be either "max", 0, or a positive
> > + *  integer.
> > + * @nbytes: Number of bytes of the data.
> > + * @off: Offset in the file.
> > + *
> > + * Uses cft->private value to determine for which enryption ID type 
> > results be
> > + * shown.
> > + *
> > + * Context: Any context. Takes and releases enc_id_cg_lock.
> > + * Return:
> > + * * >= 0 - Number of bytes processed in the input.
> > + * * -EINVAL - If buf is not valid.
> > + * * -ERANGE - If number is bigger than unsigned int capacity.
> > + * * -EBUSY - If usage can become more than max limit.
> 
> The aboves are stale, right?

-EBUSY is not valid anymore. We can now set max to be less than the usage. I
will remove it in the next patch.

> 
> > +static int enc_id_cg_stat_show(struct seq_file *sf, void *v)
> > +{
> > +   unsigned long flags;
> > +   enum encryption_id_type type = seq_cft(sf)->private;
> > +
> > +   spin_lock_irqsave(_id_cg_lock, flags);
> > +
> > +   seq_printf(sf, "total %u\n", enc_id_capacity[type]);
> > +   seq_printf(sf, "used %u\n", root_cg.res[type].usage);
> 
> Dup with .current and no need to show total on every cgroup, right?

This is for the stat file which will only be seen in the root cgroup
directory.  It is to know overall picture for the resource, what is the
total capacity and what is the current usage. ".current" file is not
shown on the root cgroup.

This information is good for resource allocation in the cloud
infrastructure.

> 
> Thanks.
> 
> -- 
> tejun


Re: [Patch v4 1/2] cgroup: svm: Add Encryption ID controller

2021-01-15 Thread Tejun Heo
Hello,

On Thu, Jan 07, 2021 at 05:28:45PM -0800, Vipin Sharma wrote:
> 1. encrpytion_ids.sev.max
>   Sets the maximum usage of SEV IDs in the cgroup.
> 2. encryption_ids.sev.current
>   Current usage of SEV IDs in the cgroup and its children.
> 3. encryption_ids.sev.stat
>   Shown only at the root cgroup. Displays total SEV IDs available
>   on the platform and current usage count.

Sorry, should have raised these earlier:

* Can we shorten the name to encids?

* Why is .sev a separate namespace? Isn't the controller supposed to cover
  encryption ids across different implementations? It's not like multiple
  types of IDs can be in use on the same machine, right?

> Other ID types can be easily added in the controller in the same way.

I'm not sure this is necessarily a good thing.

> +/**
> + * enc_id_cg_uncharge_hierarchy() - Uncharge the enryption ID cgroup 
> hierarchy.
> + * @start_cg: Starting cgroup.
> + * @stop_cg: cgroup at which uncharge stops.
> + * @type: type of encryption ID to uncharge.
> + * @amount: Charge amount.
> + *
> + * Uncharge the cgroup tree from the given start cgroup to the stop cgroup.
> + *
> + * Context: Any context. Expects enc_id_cg_lock to be held by the caller.
> + */
> +static void enc_id_cg_uncharge_hierarchy(struct encryption_id_cgroup 
> *start_cg,
> +  struct encryption_id_cgroup *stop_cg,
> +  enum encryption_id_type type,
> +  unsigned int amount)
> +{
> + struct encryption_id_cgroup *i;
> +
> + lockdep_assert_held(_id_cg_lock);
> +
> + for (i = start_cg; i != stop_cg; i = parent_enc(i)) {
> + WARN_ON_ONCE(i->res[type].usage < amount);
> + i->res[type].usage -= amount;
> + }
> + css_put(_cg->css);

I'm curious whether this is necessary given that a css can't be destroyed
while tasks are attached. Are there cases where this wouldn't hold true? If
so, it'd be great to have some comments on how that can happen.

> +/**
> + * enc_id_cg_max_write() - Update the maximum limit of the cgroup.
> + * @of: Handler for the file.
> + * @buf: Data from the user. It should be either "max", 0, or a positive
> + *integer.
> + * @nbytes: Number of bytes of the data.
> + * @off: Offset in the file.
> + *
> + * Uses cft->private value to determine for which enryption ID type results 
> be
> + * shown.
> + *
> + * Context: Any context. Takes and releases enc_id_cg_lock.
> + * Return:
> + * * >= 0 - Number of bytes processed in the input.
> + * * -EINVAL - If buf is not valid.
> + * * -ERANGE - If number is bigger than unsigned int capacity.
> + * * -EBUSY - If usage can become more than max limit.

The aboves are stale, right?

> +static int enc_id_cg_stat_show(struct seq_file *sf, void *v)
> +{
> + unsigned long flags;
> + enum encryption_id_type type = seq_cft(sf)->private;
> +
> + spin_lock_irqsave(_id_cg_lock, flags);
> +
> + seq_printf(sf, "total %u\n", enc_id_capacity[type]);
> + seq_printf(sf, "used %u\n", root_cg.res[type].usage);

Dup with .current and no need to show total on every cgroup, right?

Thanks.

-- 
tejun


Re: [Patch v4 1/2] cgroup: svm: Add Encryption ID controller

2021-01-13 Thread Brijesh Singh



On 1/7/2021 7:28 PM, Vipin Sharma wrote:

Hardware memory encryption is available on multiple generic CPUs. For
example AMD has Secure Encrypted Virtualization (SEV) and SEV -
Encrypted State (SEV-ES).

These memory encryptions are useful in creating encrypted virtual
machines (VMs) and user space programs.

There are limited number of encryption IDs that can be used
simultaneously on a machine for encryption. This generates a need for
the system admin to track, limit, allocate resources, and optimally
schedule VMs and user workloads in the cloud infrastructure. Some
malicious programs can exhaust all of these resources on a host causing
starvation of other workloads.

Encryption ID controller allows control of these resources using
Cgroups.

Controller is enabled by CGROUP_ENCRYPTION_IDS config option.
Encryption controller provide 3 interface files for each encryption ID
type. For example, in SEV:

1. encrpytion_ids.sev.max
Sets the maximum usage of SEV IDs in the cgroup.
2. encryption_ids.sev.current
Current usage of SEV IDs in the cgroup and its children.
3. encryption_ids.sev.stat
Shown only at the root cgroup. Displays total SEV IDs available
on the platform and current usage count.

Other ID types can be easily added in the controller in the same way.

Signed-off-by: Vipin Sharma 
Reviewed-by: David Rientjes 
Reviewed-by: Dionna Glaze 



Acked-by: Brijesh Singh 


---
  arch/x86/kvm/svm/sev.c|  52 +++-
  include/linux/cgroup_subsys.h |   4 +
  include/linux/encryption_ids_cgroup.h |  72 +
  include/linux/kvm_host.h  |   4 +
  init/Kconfig  |  14 +
  kernel/cgroup/Makefile|   1 +
  kernel/cgroup/encryption_ids.c| 422 ++
  7 files changed, 557 insertions(+), 12 deletions(-)
  create mode 100644 include/linux/encryption_ids_cgroup.h
  create mode 100644 kernel/cgroup/encryption_ids.c

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 9858d5ae9ddd..1924ab2eaf11 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -14,6 +14,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -86,10 +87,18 @@ static bool __sev_recycle_asids(int min_asid, int max_asid)
return true;
  }
  
-static int sev_asid_new(struct kvm_sev_info *sev)

+static int sev_asid_new(struct kvm *kvm)
  {
-   int pos, min_asid, max_asid;
+   int pos, min_asid, max_asid, ret;
+   struct kvm_sev_info *sev = _kvm_svm(kvm)->sev_info;
bool retry = true;
+   enum encryption_id_type type;
+
+   type = sev->es_active ? ENCRYPTION_ID_SEV_ES : ENCRYPTION_ID_SEV;
+
+   ret = enc_id_cg_try_charge(kvm, type, 1);
+   if (ret)
+   return ret;
  
  	mutex_lock(_bitmap_lock);
  
@@ -107,7 +116,8 @@ static int sev_asid_new(struct kvm_sev_info *sev)

goto again;
}
mutex_unlock(_bitmap_lock);
-   return -EBUSY;
+   ret = -EBUSY;
+   goto e_uncharge;
}
  
  	__set_bit(pos, sev_asid_bitmap);

@@ -115,6 +125,9 @@ static int sev_asid_new(struct kvm_sev_info *sev)
mutex_unlock(_bitmap_lock);
  
  	return pos + 1;

+e_uncharge:
+   enc_id_cg_uncharge(kvm, type, 1);
+   return ret;
  }
  
  static int sev_get_asid(struct kvm *kvm)

@@ -124,14 +137,16 @@ static int sev_get_asid(struct kvm *kvm)
return sev->asid;
  }
  
-static void sev_asid_free(int asid)

+static void sev_asid_free(struct kvm *kvm)
  {
struct svm_cpu_data *sd;
int cpu, pos;
+   struct kvm_sev_info *sev = _kvm_svm(kvm)->sev_info;
+   enum encryption_id_type type;
  
  	mutex_lock(_bitmap_lock);
  
-	pos = asid - 1;

+   pos = sev->asid - 1;
__set_bit(pos, sev_reclaim_asid_bitmap);
  
  	for_each_possible_cpu(cpu) {

@@ -140,6 +155,9 @@ static void sev_asid_free(int asid)
}
  
  	mutex_unlock(_bitmap_lock);

+
+   type = sev->es_active ? ENCRYPTION_ID_SEV_ES : ENCRYPTION_ID_SEV;
+   enc_id_cg_uncharge(kvm, type, 1);
  }
  
  static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)

@@ -184,22 +202,22 @@ static int sev_guest_init(struct kvm *kvm, struct 
kvm_sev_cmd *argp)
if (unlikely(sev->active))
return ret;
  
-	asid = sev_asid_new(sev);

+   asid = sev_asid_new(kvm);
if (asid < 0)
return ret;
+   sev->asid = asid;
  
  	ret = sev_platform_init(>error);

if (ret)
goto e_free;
  
  	sev->active = true;

-   sev->asid = asid;
INIT_LIST_HEAD(>regions_list);
  
  	return 0;
  
  e_free:

-   sev_asid_free(asid);
+   sev_asid_free(kvm);
return ret;
  }
  
@@ -1240,12 +1258,12 @@ void sev_vm_destroy(struct kvm *kvm)

mutex_unlock(>lock);
  
  	sev_unbind_asid(kvm, sev->handle);

-   sev_asid_free(sev->asid);
+   sev_asid_free(kvm);