Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-27 Thread Jarkko Sakkinen
On Mon Feb 26, 2024 at 11:56 PM EET, Dave Hansen wrote:
> On 2/26/24 13:48, Haitao Huang wrote:
> > In case of overcomitting, i.e., sum of limits greater than the EPC
> > capacity, if one group has a fault, and its usage is not above its own
> > limit (try_charge() passes), yet total usage of the system has exceeded
> > the capacity, whether we do global reclaim or just reclaim pages in the
> > current faulting group.
>
> I don't see _any_ reason to limit reclaim to the current faulting cgroup.
>
> >> Last, what is the simplest (least amount of code) thing that the SGX
> >> cgroup controller could implement here?
> > 
> > I still think the current approach of doing global reclaim is reasonable
> > and simple: try_charge() checks cgroup limit and reclaim within the
> > group if needed, then do EPC page allocation, reclaim globally if
> > allocation fails due to global usage reaches the capacity.
> > 
> > I'm not sure how not doing global reclaiming in this case would bring
> > any benefit.
> I tend to agree.
>
> Kai, I think your examples sound a little bit contrived.  Have actual
> users expressed a strong intent for doing anything with this series
> other than limiting bad actors from eating all the EPC?

I'd consider this from the viewpoint is there anything in the user space
visible portion of the patch set that would limit tuning the performance
later on, if required let's say by a workload that acts sub-optimally.

If not, then most of performance related issues can be only identified
by actual use of the code.

BR, Jarkko



Re: Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-27 Thread Michal Koutný
Hello.

On Mon, Feb 26, 2024 at 03:48:18PM -0600, Haitao Huang 
 wrote:
> In case of overcomitting, i.e., sum of limits greater than the EPC capacity,
> if one group has a fault, and its usage is not above its own limit
> (try_charge() passes), yet total usage of the system has exceeded the
> capacity, whether we do global reclaim or just reclaim pages in the current
> faulting group.
> 
> > Also, what does the core mm memcg code do?
> > 
> I'm not sure. I'll try to find out but it'd be appreciated if someone more
> knowledgeable can comment on this. memcg also has the protection mechanism
> (i.e., min, low settings) to guarantee some allocation per group so its
> approach might not be applicable to misc controller here.

I only follow the discussion superficially but it'd be nice to have
analogous mechanisms in memcg and sgx controller.

The memory limits are rather simple -- when allocating new memory, the
tightest limit of ancestor applies and reclaim is applied to whole
subtree of such an ancestor (not necessearily the originating cgroup of
the allocation). Overcommit is admited, competition among siblings is
resolved on the first comes, first served basis.

The memory protections are an additional (and in a sense orthogoal)
mechanism. They can be interpretted as limits that are enforced not at
the time of allocation but at the time of reclaim (and reclaim is
triggered independetly, either global or with the limits above). The
consequence is that the protection code must do some normalization to
evaluate overcommited values sensibly.

(Historically, there was also memory.soft_limit_in_bytes, which combined
"protection" and global reclaim but it turned out broken. I suggest
reading Documentation/admin-guide/cgroup-v2.rst section Controller
Issues and Remedies/Memory for more details and as cautionary example.)

HTH,
Michal


signature.asc
Description: PGP signature


Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-26 Thread Huang, Kai




On 27/02/2024 11:38 am, Dave Hansen wrote:

On 2/26/24 14:34, Huang, Kai wrote:

So I am trying to get the actual downside of doing per-cgroup reclaim or
the full reason that we choose global reclaim.


Take the most extreme example:

while (hit_global_sgx_limit())
reclaim_from_this(cgroup);

You eventually end up with all of 'cgroup's pages gone and handed out to
other users on the system who stole them all.  Other users might cause
you to go over the global limit.  *They* should be paying part of the
cost, not just you and your cgroup.


Yeah likely we will need another layer of logic to decide when to do 
global reclaim.  I agree that is downside and is unnecessary for this 
patchset.


Thanks for the comments.



Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-26 Thread Huang, Kai




On 27/02/2024 11:31 am, Dave Hansen wrote:

On 2/26/24 14:24, Huang, Kai wrote:

What is the downside of doing per-group reclaim when try_charge()
succeeds for the enclave but failed to allocate EPC page?

Could you give an complete answer why you choose to use global reclaim
for the above case?


There are literally two different limits at play.  There's the limit
that the cgroup imposes and then the actual physical limit.

Hitting the cgroup limit induces cgroup reclaim.

Hitting the physical limit induces global reclaim.

Maybe I'm just being dense, but I fail to understand why you would want
to entangle those two different concepts more than absolutely necessary.


OK.  Yes I agree doing per-cgroup reclaim when hitting physical limit 
would bring another layer of consideration of when to do global reclaim, 
which is not necessary now.




Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-26 Thread Dave Hansen
On 2/26/24 14:34, Huang, Kai wrote:
> So I am trying to get the actual downside of doing per-cgroup reclaim or
> the full reason that we choose global reclaim.

Take the most extreme example:

while (hit_global_sgx_limit())
reclaim_from_this(cgroup);

You eventually end up with all of 'cgroup's pages gone and handed out to
other users on the system who stole them all.  Other users might cause
you to go over the global limit.  *They* should be paying part of the
cost, not just you and your cgroup.



Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-26 Thread Huang, Kai





Kai, I think your examples sound a little bit contrived.  Have actual
users expressed a strong intent for doing anything with this series
other than limiting bad actors from eating all the EPC?

I am not sure about this.  I am also trying to get a full picture.

I asked because I didn't quite like the duplicated code change in 
try_charge() in this patch and in sgx_alloc_epc_page().  I think using 
per-group reclaim we can unify the code (I have even started to write 
the code) and I don't see the downside of doing so.


So I am trying to get the actual downside of doing per-cgroup reclaim or 
the full reason that we choose global reclaim.





Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-26 Thread Dave Hansen
On 2/26/24 14:24, Huang, Kai wrote:
> What is the downside of doing per-group reclaim when try_charge()
> succeeds for the enclave but failed to allocate EPC page?
> 
> Could you give an complete answer why you choose to use global reclaim
> for the above case?

There are literally two different limits at play.  There's the limit
that the cgroup imposes and then the actual physical limit.

Hitting the cgroup limit induces cgroup reclaim.

Hitting the physical limit induces global reclaim.

Maybe I'm just being dense, but I fail to understand why you would want
to entangle those two different concepts more than absolutely necessary.



Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-26 Thread Huang, Kai




On 27/02/2024 10:18 am, Haitao Huang wrote:

On Mon, 26 Feb 2024 05:36:02 -0600, Huang, Kai  wrote:


On Sun, 2024-02-25 at 22:03 -0600, Haitao Huang wrote:
On Sun, 25 Feb 2024 19:38:26 -0600, Huang, Kai  
wrote:


>
>
> On 24/02/2024 6:00 am, Haitao Huang wrote:
> > On Fri, 23 Feb 2024 04:18:18 -0600, Huang, Kai 
> > wrote:
> >
> > > > >
> > > > Right. When code reaches to here, we already passed reclaim per
> > > > cgroup.
> > >
> > > Yes if try_charge() failed we must do pre-cgroup reclaim.
> > >
> > > > The cgroup may not at or reach limit but system has run out of
> > > > physical
> > > > EPC.
> > > >
> > >
> > > But after try_charge() we can still choose to reclaim from the 
current

> > > group,
> > > but not necessarily have to be global, right?  I am not sure 
whether I

> > > am
> > > missing something, but could you elaborate why we should choose to
> > > reclaim from
> > > the global?
> > >
> >  Once try_charge is done and returns zero that means the cgroup 
usage
> > is charged and it's not over usage limit. So you really can't 
reclaim
> > from that cgroup if allocation failed. The only  thing you can do 
is to

> > reclaim globally.
>
> Sorry I still cannot establish the logic here.
>
> Let's say the sum of all cgroups are greater than the physical EPC, 
and
> elclave(s) in each cgroup could potentially fault w/o reaching 
cgroup's

> limit.
>
> In this case, when enclave(s) in one cgroup faults, why we cannot
> reclaim from the current cgroup, but have to reclaim from global?
>
> Is there any real downside of the former, or you just want to 
follow the

> reclaim logic w/o cgroup at all?
>
> IIUC, there's at least one advantage of reclaim from the current 
group,

> that faults of enclave(s) in one group won't impact other enclaves in
> other cgroups.  E.g., in this way other enclaves in other groups may
> never need to trigger faults.
>
> Or perhaps I am missing anything?
>
The use case here is that user knows it's OK for group A to borrow some
pages from group B for some time without impact much performance, vice
versa. That's why the user is overcomitting so system can run more
enclave/groups. Otherwise, if she is concerned about impact of A on 
B, she

could lower limit for A so it never interfere or interfere less with B
(assume the lower limit is still high enough to run all enclaves in A),
and sacrifice some of A's performance. Or if she does not want any
interference between groups, just don't over-comit. So we don't really
lose anything here.


But if we reclaim from the same group, seems we could enable a user 
case that

allows the admin to ensure certain group won't be impacted at all, while
allowing other groups to over-commit?

E.g., let's say we have 100M physical EPC.  And let's say the admin 
wants to run
some performance-critical enclave(s) which costs 50M EPC w/o being 
impacted.
The admin also wants to run other enclaves which could cost 100M EPC 
in total

but EPC swapping among them is acceptable.

If we choose to reclaim from the current EPC cgroup, then seems to 
that the
admin can achieve the above by setting up 2 groups with group1 having 
50M limit
and group2 having 100M limit, and then run performance-critical 
enclave(s) in

group1 and others in group2?  Or am I missing anything?



The more important groups should have limits higher than or equal to 
peak usage to ensure no impact.


Yes.  But if you do global reclaim there's no guarantee of this 
regardless of the limit setting.  It depends on setting of limits of 
other groups.


The less important groups should have lower limits than its peak usage 
to avoid impacting higher priority groups.


Yeah, but depending on how low the limit is, the try_charge() can still 
succeed but physical EPC is already running out.


Are you saying we should always expect the admin to set limits of groups 
not exceeding the physical EPC?



The limit is the maximum usage allowed.

By setting group2 limit to 100M, you are allowing it to use 100M. So as 
soon as it gets up and consume 100M, group1 can not even load any 
enclave if we only reclaim per-cgroup and do not do global reclaim.


I kinda forgot, but I think SGX supports swapping out EPC of an enclave 
before EINIT?  Also, with SGX2 the initial enclave can take less EPC to 
be loaded.





If we choose to do global reclaim, then we cannot achieve that.



You can achieve this by setting group 2 limit to 50M. No need to 
overcommiting to the system.
Group 2 will swap as soon as it hits 50M, which is the maximum it can 
consume so no impact to group 1.


Right.  We can achieve this by doing so.  But as said above, you are 
depending on setting up the limit to do per-cgorup reclaim.


So, back to the question:

What is the downside of doing per-group reclaim when try_charge() 
succeeds for the enclave but failed to allocate EPC page?


Could you give an complete answer why you choose to use global reclaim 
for the above case?




Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-26 Thread Dave Hansen
On 2/26/24 13:48, Haitao Huang wrote:
> In case of overcomitting, i.e., sum of limits greater than the EPC
> capacity, if one group has a fault, and its usage is not above its own
> limit (try_charge() passes), yet total usage of the system has exceeded
> the capacity, whether we do global reclaim or just reclaim pages in the
> current faulting group.

I don't see _any_ reason to limit reclaim to the current faulting cgroup.

>> Last, what is the simplest (least amount of code) thing that the SGX
>> cgroup controller could implement here?
> 
> I still think the current approach of doing global reclaim is reasonable
> and simple: try_charge() checks cgroup limit and reclaim within the
> group if needed, then do EPC page allocation, reclaim globally if
> allocation fails due to global usage reaches the capacity.
> 
> I'm not sure how not doing global reclaiming in this case would bring
> any benefit.
I tend to agree.

Kai, I think your examples sound a little bit contrived.  Have actual
users expressed a strong intent for doing anything with this series
other than limiting bad actors from eating all the EPC?




Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-26 Thread Haitao Huang

Hi Dave,

On Mon, 26 Feb 2024 08:04:54 -0600, Dave Hansen   
wrote:



On 2/26/24 03:36, Huang, Kai wrote:
In case of overcomitting, even if we always reclaim from the same  
cgroup
for each fault, one group may still interfere the other: e.g.,  
consider an

extreme case in that group A used up almost all EPC at the time group B
has a fault, B has to fail allocation and kill enclaves.
If the admin allows group A to use almost all EPC, to me it's fair to  
say he/she
doesn't want to run anything inside B at all and it is acceptable  
enclaves in B

to be killed.


Folks, I'm having a really hard time following this thread.  It sounds
like there's disagreement about when to do system-wide reclaim.  Could
someone remind me of the choices that we have?  (A proposed patch would
go a _long_ way to helping me understand)



In case of overcomitting, i.e., sum of limits greater than the EPC  
capacity, if one group has a fault, and its usage is not above its own  
limit (try_charge() passes), yet total usage of the system has exceeded  
the capacity, whether we do global reclaim or just reclaim pages in the  
current faulting group.



Also, what does the core mm memcg code do?

I'm not sure. I'll try to find out but it'd be appreciated if someone more  
knowledgeable can comment on this. memcg also has the protection mechanism  
(i.e., min, low settings) to guarantee some allocation per group so its  
approach might not be applicable to misc controller here.



Last, what is the simplest (least amount of code) thing that the SGX
cgroup controller could implement here?




I still think the current approach of doing global reclaim is reasonable  
and simple: try_charge() checks cgroup limit and reclaim within the group  
if needed, then do EPC page allocation, reclaim globally if allocation  
fails due to global usage reaches the capacity.


I'm not sure how not doing global reclaiming in this case would bring any  
benefit. Please see my response to Kai's example cases.


Thanks
Haitao



Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-26 Thread Haitao Huang

On Mon, 26 Feb 2024 05:36:02 -0600, Huang, Kai  wrote:


On Sun, 2024-02-25 at 22:03 -0600, Haitao Huang wrote:
On Sun, 25 Feb 2024 19:38:26 -0600, Huang, Kai   
wrote:


>
>
> On 24/02/2024 6:00 am, Haitao Huang wrote:
> > On Fri, 23 Feb 2024 04:18:18 -0600, Huang, Kai 
> > wrote:
> >
> > > > >
> > > > Right. When code reaches to here, we already passed reclaim per
> > > > cgroup.
> > >
> > > Yes if try_charge() failed we must do pre-cgroup reclaim.
> > >
> > > > The cgroup may not at or reach limit but system has run out of
> > > > physical
> > > > EPC.
> > > >
> > >
> > > But after try_charge() we can still choose to reclaim from the  
current

> > > group,
> > > but not necessarily have to be global, right?  I am not sure  
whether I

> > > am
> > > missing something, but could you elaborate why we should choose to
> > > reclaim from
> > > the global?
> > >
> >  Once try_charge is done and returns zero that means the cgroup  
usage
> > is charged and it's not over usage limit. So you really can't  
reclaim
> > from that cgroup if allocation failed. The only  thing you can do  
is to

> > reclaim globally.
>
> Sorry I still cannot establish the logic here.
>
> Let's say the sum of all cgroups are greater than the physical EPC,  
and
> elclave(s) in each cgroup could potentially fault w/o reaching  
cgroup's

> limit.
>
> In this case, when enclave(s) in one cgroup faults, why we cannot
> reclaim from the current cgroup, but have to reclaim from global?
>
> Is there any real downside of the former, or you just want to follow  
the

> reclaim logic w/o cgroup at all?
>
> IIUC, there's at least one advantage of reclaim from the current  
group,

> that faults of enclave(s) in one group won't impact other enclaves in
> other cgroups.  E.g., in this way other enclaves in other groups may
> never need to trigger faults.
>
> Or perhaps I am missing anything?
>
The use case here is that user knows it's OK for group A to borrow some
pages from group B for some time without impact much performance, vice
versa. That's why the user is overcomitting so system can run more
enclave/groups. Otherwise, if she is concerned about impact of A on B,  
she

could lower limit for A so it never interfere or interfere less with B
(assume the lower limit is still high enough to run all enclaves in A),
and sacrifice some of A's performance. Or if she does not want any
interference between groups, just don't over-comit. So we don't really
lose anything here.


But if we reclaim from the same group, seems we could enable a user case  
that

allows the admin to ensure certain group won't be impacted at all, while
allowing other groups to over-commit?

E.g., let's say we have 100M physical EPC.  And let's say the admin  
wants to run
some performance-critical enclave(s) which costs 50M EPC w/o being  
impacted.
The admin also wants to run other enclaves which could cost 100M EPC in  
total

but EPC swapping among them is acceptable.

If we choose to reclaim from the current EPC cgroup, then seems to that  
the
admin can achieve the above by setting up 2 groups with group1 having  
50M limit
and group2 having 100M limit, and then run performance-critical  
enclave(s) in

group1 and others in group2?  Or am I missing anything?



The more important groups should have limits higher than or equal to peak  
usage to ensure no impact.
The less important groups should have lower limits than its peak usage to  
avoid impacting higher priority groups.

The limit is the maximum usage allowed.

By setting group2 limit to 100M, you are allowing it to use 100M. So as  
soon as it gets up and consume 100M, group1 can not even load any enclave  
if we only reclaim per-cgroup and do not do global reclaim.



If we choose to do global reclaim, then we cannot achieve that.



You can achieve this by setting group 2 limit to 50M. No need to  
overcommiting to the system.
Group 2 will swap as soon as it hits 50M, which is the maximum it can  
consume so no impact to group 1.






In case of overcomitting, even if we always reclaim from the same cgroup
for each fault, one group may still interfere the other: e.g., consider  
an

extreme case in that group A used up almost all EPC at the time group B
has a fault, B has to fail allocation and kill enclaves.


If the admin allows group A to use almost all EPC, to me it's fair to  
say he/she
doesn't want to run anything inside B at all and it is acceptable  
enclaves in B

to be killed.


I don't think so. The user just knows group A + B peak usages higher than  
system capacity. And she is OK for them to share some of the pages  
dynamically. So kernel should allow one borrow from the other at a  
particular instance when one group has higher demand. And later doing the  
opposite. IOW, the favor goes both ways.


Haitao



Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-26 Thread Dave Hansen
On 2/26/24 03:36, Huang, Kai wrote:
>> In case of overcomitting, even if we always reclaim from the same cgroup  
>> for each fault, one group may still interfere the other: e.g., consider an  
>> extreme case in that group A used up almost all EPC at the time group B  
>> has a fault, B has to fail allocation and kill enclaves.
> If the admin allows group A to use almost all EPC, to me it's fair to say 
> he/she
> doesn't want to run anything inside B at all and it is acceptable enclaves in 
> B
> to be killed.

Folks, I'm having a really hard time following this thread.  It sounds
like there's disagreement about when to do system-wide reclaim.  Could
someone remind me of the choices that we have?  (A proposed patch would
go a _long_ way to helping me understand)

Also, what does the core mm memcg code do?

Last, what is the simplest (least amount of code) thing that the SGX
cgroup controller could implement here?




Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-26 Thread Huang, Kai
On Sun, 2024-02-25 at 22:03 -0600, Haitao Huang wrote:
> On Sun, 25 Feb 2024 19:38:26 -0600, Huang, Kai  wrote:
> 
> > 
> > 
> > On 24/02/2024 6:00 am, Haitao Huang wrote:
> > > On Fri, 23 Feb 2024 04:18:18 -0600, Huang, Kai   
> > > wrote:
> > > 
> > > > > > 
> > > > > Right. When code reaches to here, we already passed reclaim per  
> > > > > cgroup.
> > > > 
> > > > Yes if try_charge() failed we must do pre-cgroup reclaim.
> > > > 
> > > > > The cgroup may not at or reach limit but system has run out of  
> > > > > physical
> > > > > EPC.
> > > > > 
> > > > 
> > > > But after try_charge() we can still choose to reclaim from the current  
> > > > group,
> > > > but not necessarily have to be global, right?  I am not sure whether I  
> > > > am
> > > > missing something, but could you elaborate why we should choose to  
> > > > reclaim from
> > > > the global?
> > > > 
> > >  Once try_charge is done and returns zero that means the cgroup usage  
> > > is charged and it's not over usage limit. So you really can't reclaim  
> > > from that cgroup if allocation failed. The only  thing you can do is to  
> > > reclaim globally.
> > 
> > Sorry I still cannot establish the logic here.
> > 
> > Let's say the sum of all cgroups are greater than the physical EPC, and  
> > elclave(s) in each cgroup could potentially fault w/o reaching cgroup's  
> > limit.
> > 
> > In this case, when enclave(s) in one cgroup faults, why we cannot  
> > reclaim from the current cgroup, but have to reclaim from global?
> > 
> > Is there any real downside of the former, or you just want to follow the  
> > reclaim logic w/o cgroup at all?
> > 
> > IIUC, there's at least one advantage of reclaim from the current group,  
> > that faults of enclave(s) in one group won't impact other enclaves in  
> > other cgroups.  E.g., in this way other enclaves in other groups may  
> > never need to trigger faults.
> > 
> > Or perhaps I am missing anything?
> > 
> The use case here is that user knows it's OK for group A to borrow some  
> pages from group B for some time without impact much performance, vice  
> versa. That's why the user is overcomitting so system can run more  
> enclave/groups. Otherwise, if she is concerned about impact of A on B, she  
> could lower limit for A so it never interfere or interfere less with B  
> (assume the lower limit is still high enough to run all enclaves in A),  
> and sacrifice some of A's performance. Or if she does not want any  
> interference between groups, just don't over-comit. So we don't really  
> lose anything here.

But if we reclaim from the same group, seems we could enable a user case that
allows the admin to ensure certain group won't be impacted at all, while
allowing other groups to over-commit?

E.g., let's say we have 100M physical EPC.  And let's say the admin wants to run
some performance-critical enclave(s) which costs 50M EPC w/o being impacted. 
The admin also wants to run other enclaves which could cost 100M EPC in total
but EPC swapping among them is acceptable.

If we choose to reclaim from the current EPC cgroup, then seems to that the
admin can achieve the above by setting up 2 groups with group1 having 50M limit
and group2 having 100M limit, and then run performance-critical enclave(s) in
group1 and others in group2?  Or am I missing anything?

If we choose to do global reclaim, then we cannot achieve that.

> 
> In case of overcomitting, even if we always reclaim from the same cgroup  
> for each fault, one group may still interfere the other: e.g., consider an  
> extreme case in that group A used up almost all EPC at the time group B  
> has a fault, B has to fail allocation and kill enclaves.

If the admin allows group A to use almost all EPC, to me it's fair to say he/she
doesn't want to run anything inside B at all and it is acceptable enclaves in B
to be killed.




Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-25 Thread Haitao Huang

On Sun, 25 Feb 2024 19:38:26 -0600, Huang, Kai  wrote:




On 24/02/2024 6:00 am, Haitao Huang wrote:
On Fri, 23 Feb 2024 04:18:18 -0600, Huang, Kai   
wrote:



>
Right. When code reaches to here, we already passed reclaim per  
cgroup.


Yes if try_charge() failed we must do pre-cgroup reclaim.

The cgroup may not at or reach limit but system has run out of  
physical

EPC.



But after try_charge() we can still choose to reclaim from the current  
group,
but not necessarily have to be global, right?  I am not sure whether I  
am
missing something, but could you elaborate why we should choose to  
reclaim from

the global?

 Once try_charge is done and returns zero that means the cgroup usage  
is charged and it's not over usage limit. So you really can't reclaim  
from that cgroup if allocation failed. The only  thing you can do is to  
reclaim globally.


Sorry I still cannot establish the logic here.

Let's say the sum of all cgroups are greater than the physical EPC, and  
elclave(s) in each cgroup could potentially fault w/o reaching cgroup's  
limit.


In this case, when enclave(s) in one cgroup faults, why we cannot  
reclaim from the current cgroup, but have to reclaim from global?


Is there any real downside of the former, or you just want to follow the  
reclaim logic w/o cgroup at all?


IIUC, there's at least one advantage of reclaim from the current group,  
that faults of enclave(s) in one group won't impact other enclaves in  
other cgroups.  E.g., in this way other enclaves in other groups may  
never need to trigger faults.


Or perhaps I am missing anything?

The use case here is that user knows it's OK for group A to borrow some  
pages from group B for some time without impact much performance, vice  
versa. That's why the user is overcomitting so system can run more  
enclave/groups. Otherwise, if she is concerned about impact of A on B, she  
could lower limit for A so it never interfere or interfere less with B  
(assume the lower limit is still high enough to run all enclaves in A),  
and sacrifice some of A's performance. Or if she does not want any  
interference between groups, just don't over-comit. So we don't really  
lose anything here.


In case of overcomitting, even if we always reclaim from the same cgroup  
for each fault, one group may still interfere the other: e.g., consider an  
extreme case in that group A used up almost all EPC at the time group B  
has a fault, B has to fail allocation and kill enclaves.


Haitao



Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-25 Thread Huang, Kai




On 24/02/2024 6:00 am, Haitao Huang wrote:

On Fri, 23 Feb 2024 04:18:18 -0600, Huang, Kai  wrote:


>
Right. When code reaches to here, we already passed reclaim per cgroup.


Yes if try_charge() failed we must do pre-cgroup reclaim.


The cgroup may not at or reach limit but system has run out of physical
EPC.



But after try_charge() we can still choose to reclaim from the current 
group,

but not necessarily have to be global, right?  I am not sure whether I am
missing something, but could you elaborate why we should choose to 
reclaim from

the global?



Once try_charge is done and returns zero that means the cgroup usage is 
charged and it's not over usage limit. So you really can't reclaim from 
that cgroup if allocation failed. The only  thing you can do is to 
reclaim globally.


Sorry I still cannot establish the logic here.

Let's say the sum of all cgroups are greater than the physical EPC, and 
elclave(s) in each cgroup could potentially fault w/o reaching cgroup's 
limit.


In this case, when enclave(s) in one cgroup faults, why we cannot 
reclaim from the current cgroup, but have to reclaim from global?


Is there any real downside of the former, or you just want to follow the 
reclaim logic w/o cgroup at all?


IIUC, there's at least one advantage of reclaim from the current group, 
that faults of enclave(s) in one group won't impact other enclaves in 
other cgroups.  E.g., in this way other enclaves in other groups may 
never need to trigger faults.


Or perhaps I am missing anything?



Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-23 Thread Haitao Huang

On Fri, 23 Feb 2024 04:18:18 -0600, Huang, Kai  wrote:


>
Right. When code reaches to here, we already passed reclaim per cgroup.


Yes if try_charge() failed we must do pre-cgroup reclaim.


The cgroup may not at or reach limit but system has run out of physical
EPC.



But after try_charge() we can still choose to reclaim from the current  
group,

but not necessarily have to be global, right?  I am not sure whether I am
missing something, but could you elaborate why we should choose to  
reclaim from

the global?



Once try_charge is done and returns zero that means the cgroup usage is  
charged and it's not over usage limit. So you really can't reclaim from  
that cgroup if allocation failed. The only  thing you can do is to reclaim  
globally.


This could happen when the sum of limits of all cgroups is greater than  
the physical EPC, i.e., user is overcommitting.


Thanks

Haitao



Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-23 Thread Huang, Kai
> > 
> Right. When code reaches to here, we already passed reclaim per cgroup.  

Yes if try_charge() failed we must do pre-cgroup reclaim.

> The cgroup may not at or reach limit but system has run out of physical  
> EPC.
> 

But after try_charge() we can still choose to reclaim from the current group,
but not necessarily have to be global, right?  I am not sure whether I am
missing something, but could you elaborate why we should choose to reclaim from
the global?



Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-22 Thread Haitao Huang

On Thu, 22 Feb 2024 15:26:05 -0600, Huang, Kai  wrote:




On 23/02/2024 6:09 am, Haitao Huang wrote:
On Wed, 21 Feb 2024 05:06:02 -0600, Huang, Kai   
wrote:





-int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg)
+int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, bool  
reclaim)

 {
-return misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg,  
PAGE_SIZE);

+for (;;) {
+if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg,
+PAGE_SIZE))
+break;
+
+if (sgx_epc_cgroup_lru_empty(epc_cg->cg))
+return -ENOMEM;
+
+if (signal_pending(current))
+return -ERESTARTSYS;
+
+if (!reclaim) {
+queue_work(sgx_epc_cg_wq, _cg->reclaim_work);
+return -EBUSY;
+}
+
+if (!sgx_epc_cgroup_reclaim_pages(epc_cg->cg, false))
+/* All pages were too young to reclaim, try again a  
little later */

+schedule();
+}
+
+return 0;
 }



Seems this code change is 90% similar to the existing code in the
sgx_alloc_epc_page():

...
for ( ; ; ) {
page = __sgx_alloc_epc_page();
if (!IS_ERR(page)) {
page->owner = owner;
break;
}

if (list_empty(_active_page_list))
return ERR_PTR(-ENOMEM);

if (!reclaim) {
page = ERR_PTR(-EBUSY);
break;
}

if (signal_pending(current)) {
page = ERR_PTR(-ERESTARTSYS);
break;
}

sgx_reclaim_pages();
cond_resched();
}
...

Is it better to move the logic/code change in try_charge() out to
sgx_alloc_epc_page() to unify them?

IIUC, the logic is quite similar: When you either failed to allocate  
one page,
or failed to charge one page, you try to reclaim EPC page(s) from the  
current

EPC cgroup, either directly or indirectly.

No?

 Only these lines are the same:
 if (!reclaim) {
 page = ERR_PTR(-EBUSY);
 break;
 }
  if (signal_pending(current)) {
 page = ERR_PTR(-ERESTARTSYS);
 break;
 }
 In sgx_alloc_epc_page() we do global reclamation but here we do  
per-cgroup reclamation.


But why?  If we failed to allocate, shouldn't we try to reclaim from the  
_current_ EPC cgroup instead of global?  E.g., I thought one enclave in  
one EPC cgroup requesting insane amount of EPC shouldn't impact enclaves  
inside other cgroups?


Right. When code reaches to here, we already passed reclaim per cgroup.  
The cgroup may not at or reach limit but system has run out of physical  
EPC.


Thanks
Haitao



Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-22 Thread Huang, Kai




On 23/02/2024 6:09 am, Haitao Huang wrote:

On Wed, 21 Feb 2024 05:06:02 -0600, Huang, Kai  wrote:




-int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg)
+int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, bool 
reclaim)

 {
-    return misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, 
PAGE_SIZE);

+    for (;;) {
+    if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg,
+    PAGE_SIZE))
+    break;
+
+    if (sgx_epc_cgroup_lru_empty(epc_cg->cg))
+    return -ENOMEM;
+
+    if (signal_pending(current))
+    return -ERESTARTSYS;
+
+    if (!reclaim) {
+    queue_work(sgx_epc_cg_wq, _cg->reclaim_work);
+    return -EBUSY;
+    }
+
+    if (!sgx_epc_cgroup_reclaim_pages(epc_cg->cg, false))
+    /* All pages were too young to reclaim, try again a 
little later */

+    schedule();
+    }
+
+    return 0;
 }



Seems this code change is 90% similar to the existing code in the
sgx_alloc_epc_page():

...
for ( ; ; ) {
    page = __sgx_alloc_epc_page();
    if (!IS_ERR(page)) {
    page->owner = owner;
    break;
    }

    if (list_empty(_active_page_list))
    return ERR_PTR(-ENOMEM);

    if (!reclaim) {
    page = ERR_PTR(-EBUSY);
    break;
    }

    if (signal_pending(current)) {
    page = ERR_PTR(-ERESTARTSYS);
    break;
    }

    sgx_reclaim_pages();
    cond_resched();
    }
...

Is it better to move the logic/code change in try_charge() out to
sgx_alloc_epc_page() to unify them?

IIUC, the logic is quite similar: When you either failed to allocate 
one page,
or failed to charge one page, you try to reclaim EPC page(s) from the 
current

EPC cgroup, either directly or indirectly.

No?


Only these lines are the same:
     if (!reclaim) {
     page = ERR_PTR(-EBUSY);
     break;
     }

     if (signal_pending(current)) {
     page = ERR_PTR(-ERESTARTSYS);
     break;
     }

In sgx_alloc_epc_page() we do global reclamation but here we do 
per-cgroup reclamation. 


But why?  If we failed to allocate, shouldn't we try to reclaim from the 
_current_ EPC cgroup instead of global?  E.g., I thought one enclave in 
one EPC cgroup requesting insane amount of EPC shouldn't impact enclaves 
inside other cgroups?


That's why the logic of other lines is different
though they look similar due to similar function names. For the global 
reclamation we need consider case in that cgroup is not enabled. 
Similarly list_empty(_active_page_list) would have to be changed to 
check root cgroup if cgroups enabled otherwise check global LRU.  The 
(!reclaim) case is also different.  


W/o getting clear on my above question, so far I am not convinced why 
such difference cannot be hide inside wrapper function(s).


So I don't see an obvious good way

to abstract those to get meaningful savings.

Thanks
Haitao




Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-22 Thread Haitao Huang

On Wed, 21 Feb 2024 05:06:02 -0600, Huang, Kai  wrote:




-int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg)
+int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, bool  
reclaim)

 {
-   return misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE);
+   for (;;) {
+   if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg,
+   PAGE_SIZE))
+   break;
+
+   if (sgx_epc_cgroup_lru_empty(epc_cg->cg))
+   return -ENOMEM;
+
+   if (signal_pending(current))
+   return -ERESTARTSYS;
+
+   if (!reclaim) {
+   queue_work(sgx_epc_cg_wq, _cg->reclaim_work);
+   return -EBUSY;
+   }
+
+   if (!sgx_epc_cgroup_reclaim_pages(epc_cg->cg, false))
+   /* All pages were too young to reclaim, try again a 
little later */
+   schedule();
+   }
+
+   return 0;
 }



Seems this code change is 90% similar to the existing code in the
sgx_alloc_epc_page():

...
for ( ; ; ) {
page = __sgx_alloc_epc_page();
if (!IS_ERR(page)) {
page->owner = owner;
break;
}

if (list_empty(_active_page_list))
return ERR_PTR(-ENOMEM);

if (!reclaim) {
page = ERR_PTR(-EBUSY);
break;
}

if (signal_pending(current)) {
page = ERR_PTR(-ERESTARTSYS);
break;
}

sgx_reclaim_pages();
cond_resched();
}
...

Is it better to move the logic/code change in try_charge() out to
sgx_alloc_epc_page() to unify them?

IIUC, the logic is quite similar: When you either failed to allocate one  
page,
or failed to charge one page, you try to reclaim EPC page(s) from the  
current

EPC cgroup, either directly or indirectly.

No?


Only these lines are the same:
if (!reclaim) {
page = ERR_PTR(-EBUSY);
break;
}

if (signal_pending(current)) {
page = ERR_PTR(-ERESTARTSYS);
break;
}

In sgx_alloc_epc_page() we do global reclamation but here we do per-cgroup  
reclamation. That's why the logic of other lines is different though they  
look similar due to similar function names. For the global reclamation we  
need consider case in that cgroup is not enabled. Similarly  
list_empty(_active_page_list) would have to be changed to check root  
cgroup if cgroups enabled otherwise check global LRU.  The (!reclaim) case  
is also different.  So I don't see an obvious good way to abstract those  
to get meaningful savings.


Thanks
Haitao



Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-21 Thread Huang, Kai

> -int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg)
> +int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, bool reclaim)
>  {
> - return misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE);
> + for (;;) {
> + if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg,
> + PAGE_SIZE))
> + break;
> +
> + if (sgx_epc_cgroup_lru_empty(epc_cg->cg))
> + return -ENOMEM;
> +
> + if (signal_pending(current))
> + return -ERESTARTSYS;
> +
> + if (!reclaim) {
> + queue_work(sgx_epc_cg_wq, _cg->reclaim_work);
> + return -EBUSY;
> + }
> +
> + if (!sgx_epc_cgroup_reclaim_pages(epc_cg->cg, false))
> + /* All pages were too young to reclaim, try again a 
> little later */
> + schedule();
> + }
> +
> + return 0;
>  }
>  

Seems this code change is 90% similar to the existing code in the
sgx_alloc_epc_page():

...
for ( ; ; ) {
page = __sgx_alloc_epc_page();
if (!IS_ERR(page)) {
page->owner = owner;
break;
}

if (list_empty(_active_page_list))
return ERR_PTR(-ENOMEM);

if (!reclaim) {
page = ERR_PTR(-EBUSY);
break;
}

if (signal_pending(current)) {
page = ERR_PTR(-ERESTARTSYS);
break;
}

sgx_reclaim_pages();
cond_resched();
}
...

Is it better to move the logic/code change in try_charge() out to
sgx_alloc_epc_page() to unify them?

IIUC, the logic is quite similar: When you either failed to allocate one page,
or failed to charge one page, you try to reclaim EPC page(s) from the current
EPC cgroup, either directly or indirectly.

No?


Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-19 Thread Jarkko Sakkinen
On Mon Feb 19, 2024 at 3:12 PM UTC, Haitao Huang wrote:
> On Tue, 13 Feb 2024 19:52:25 -0600, Jarkko Sakkinen   
> wrote:
>
> > On Tue Feb 13, 2024 at 1:15 AM EET, Haitao Huang wrote:
> >> Hi Jarkko
> >>
> >> On Mon, 12 Feb 2024 13:55:46 -0600, Jarkko Sakkinen 
> >> wrote:
> >>
> >> > On Mon Feb 5, 2024 at 11:06 PM EET, Haitao Huang wrote:
> >> >> From: Kristen Carlson Accardi 
> >> >>
> >> >> When the EPC usage of a cgroup is near its limit, the cgroup needs to
> >> >> reclaim pages used in the same cgroup to make room for new  
> >> allocations.
> >> >> This is analogous to the behavior that the global reclaimer is  
> >> triggered
> >> >> when the global usage is close to total available EPC.
> >> >>
> >> >> Add a Boolean parameter for sgx_epc_cgroup_try_charge() to indicate
> >> >> whether synchronous reclaim is allowed or not. And trigger the
> >> >> synchronous/asynchronous reclamation flow accordingly.
> >> >>
> >> >> Note at this point, all reclaimable EPC pages are still tracked in  
> >> the
> >> >> global LRU and per-cgroup LRUs are empty. So no per-cgroup  
> >> reclamation
> >> >> is activated yet.
> >> >>
> >> >> Co-developed-by: Sean Christopherson  
> >> 
> >> >> Signed-off-by: Sean Christopherson 
> >> >> Signed-off-by: Kristen Carlson Accardi 
> >> >> Co-developed-by: Haitao Huang 
> >> >> Signed-off-by: Haitao Huang 
> >> >> ---
> >> >> V7:
> >> >> - Split this out from the big patch, #10 in V6. (Dave, Kai)
> >> >> ---
> >> >>  arch/x86/kernel/cpu/sgx/epc_cgroup.c | 26 --
> >> >>  arch/x86/kernel/cpu/sgx/epc_cgroup.h |  4 ++--
> >> >>  arch/x86/kernel/cpu/sgx/main.c   |  2 +-
> >> >>  3 files changed, 27 insertions(+), 5 deletions(-)
> >> >>
> >> >> diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
> >> >> b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
> >> >> index d399fda2b55e..abf74fdb12b4 100644
> >> >> --- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
> >> >> +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
> >> >> @@ -184,13 +184,35 @@ static void
> >> >> sgx_epc_cgroup_reclaim_work_func(struct work_struct *work)
> >> >>  /**
> >> >>   * sgx_epc_cgroup_try_charge() - try to charge cgroup for a single  
> >> EPC
> >> >> page
> >> >>   * @epc_cg:The EPC cgroup to be charged for the page.
> >> >> + * @reclaim:   Whether or not synchronous reclaim is allowed
> >> >>   * Return:
> >> >>   * * %0 - If successfully charged.
> >> >>   * * -errno - for failures.
> >> >>   */
> >> >> -int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg)
> >> >> +int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, bool
> >> >> reclaim)
> >> >>  {
> >> >> -   return misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg,  
> >> PAGE_SIZE);
> >> >> +   for (;;) {
> >> >> +   if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg,
> >> >> +   PAGE_SIZE))
> >> >> +   break;
> >> >> +
> >> >> +   if (sgx_epc_cgroup_lru_empty(epc_cg->cg))
> >> >> +   return -ENOMEM;
> >> >> + + if (signal_pending(current))
> >> >> +   return -ERESTARTSYS;
> >> >> +
> >> >> +   if (!reclaim) {
> >> >> +   queue_work(sgx_epc_cg_wq, 
> >> >> _cg->reclaim_work);
> >> >> +   return -EBUSY;
> >> >> +   }
> >> >> +
> >> >> +   if (!sgx_epc_cgroup_reclaim_pages(epc_cg->cg, false))
> >> >> +   /* All pages were too young to reclaim, try 
> >> >> again a little later  
> >> */
> >> >> +   schedule();
> >> >
> >> > This will be total pain to backtrack after a while when something
> >> > needs to be changed so there definitely should be inline comments
> >> > addressing each branch condition.
> >> >
> >> > I'd rethink this as:
> >> >
> >> > 1. Create static __sgx_epc_cgroup_try_charge() for addressing single
> >> >iteration with the new "reclaim" parameter.
> >> > 2. Add a new sgx_epc_group_try_charge_reclaim() function.
> >> >
> >> > There's a bit of redundancy with sgx_epc_cgroup_try_charge() and
> >> > sgx_epc_cgroup_try_charge_reclaim() because both have almost the
> >> > same loop calling internal __sgx_epc_cgroup_try_charge() with
> >> > different parameters. That is totally acceptable.
> >> >
> >> > Please also add my suggested-by.
> >> >
> >> > BR, Jarkko
> >> >
> >> > BR, Jarkko
> >> >
> >> For #2:
> >> The only caller of this function, sgx_alloc_epc_page(), has the same
> >> boolean which is passed into this this function.
> >
> > I know. This would be good opportunity to fix that up. Large patch
> > sets should try to make the space for its feature best possible and
> > thus also clean up the code base overally.
> >
> >> If we separate it into sgx_epc_cgroup_try_charge() and
> >> sgx_epc_cgroup_try_charge_reclaim(), then the caller has to have the
> >> if/else branches. So separation here seems not help?
> >
> > Of course it does. It makes the code in 

Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-19 Thread Haitao Huang
On Tue, 13 Feb 2024 19:52:25 -0600, Jarkko Sakkinen   
wrote:



On Tue Feb 13, 2024 at 1:15 AM EET, Haitao Huang wrote:

Hi Jarkko

On Mon, 12 Feb 2024 13:55:46 -0600, Jarkko Sakkinen 
wrote:

> On Mon Feb 5, 2024 at 11:06 PM EET, Haitao Huang wrote:
>> From: Kristen Carlson Accardi 
>>
>> When the EPC usage of a cgroup is near its limit, the cgroup needs to
>> reclaim pages used in the same cgroup to make room for new  
allocations.
>> This is analogous to the behavior that the global reclaimer is  
triggered

>> when the global usage is close to total available EPC.
>>
>> Add a Boolean parameter for sgx_epc_cgroup_try_charge() to indicate
>> whether synchronous reclaim is allowed or not. And trigger the
>> synchronous/asynchronous reclamation flow accordingly.
>>
>> Note at this point, all reclaimable EPC pages are still tracked in  
the
>> global LRU and per-cgroup LRUs are empty. So no per-cgroup  
reclamation

>> is activated yet.
>>
>> Co-developed-by: Sean Christopherson  


>> Signed-off-by: Sean Christopherson 
>> Signed-off-by: Kristen Carlson Accardi 
>> Co-developed-by: Haitao Huang 
>> Signed-off-by: Haitao Huang 
>> ---
>> V7:
>> - Split this out from the big patch, #10 in V6. (Dave, Kai)
>> ---
>>  arch/x86/kernel/cpu/sgx/epc_cgroup.c | 26 --
>>  arch/x86/kernel/cpu/sgx/epc_cgroup.h |  4 ++--
>>  arch/x86/kernel/cpu/sgx/main.c   |  2 +-
>>  3 files changed, 27 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
>> b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
>> index d399fda2b55e..abf74fdb12b4 100644
>> --- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
>> +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
>> @@ -184,13 +184,35 @@ static void
>> sgx_epc_cgroup_reclaim_work_func(struct work_struct *work)
>>  /**
>>   * sgx_epc_cgroup_try_charge() - try to charge cgroup for a single  
EPC

>> page
>>   * @epc_cg:   The EPC cgroup to be charged for the page.
>> + * @reclaim:  Whether or not synchronous reclaim is allowed
>>   * Return:
>>   * * %0 - If successfully charged.
>>   * * -errno - for failures.
>>   */
>> -int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg)
>> +int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, bool
>> reclaim)
>>  {
>> -	return misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg,  
PAGE_SIZE);

>> +  for (;;) {
>> +  if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg,
>> +  PAGE_SIZE))
>> +  break;
>> +
>> +  if (sgx_epc_cgroup_lru_empty(epc_cg->cg))
>> +  return -ENOMEM;
>> + +if (signal_pending(current))
>> +  return -ERESTARTSYS;
>> +
>> +  if (!reclaim) {
>> +  queue_work(sgx_epc_cg_wq, _cg->reclaim_work);
>> +  return -EBUSY;
>> +  }
>> +
>> +  if (!sgx_epc_cgroup_reclaim_pages(epc_cg->cg, false))
>> +			/* All pages were too young to reclaim, try again a little later  
*/

>> +  schedule();
>
> This will be total pain to backtrack after a while when something
> needs to be changed so there definitely should be inline comments
> addressing each branch condition.
>
> I'd rethink this as:
>
> 1. Create static __sgx_epc_cgroup_try_charge() for addressing single
>iteration with the new "reclaim" parameter.
> 2. Add a new sgx_epc_group_try_charge_reclaim() function.
>
> There's a bit of redundancy with sgx_epc_cgroup_try_charge() and
> sgx_epc_cgroup_try_charge_reclaim() because both have almost the
> same loop calling internal __sgx_epc_cgroup_try_charge() with
> different parameters. That is totally acceptable.
>
> Please also add my suggested-by.
>
> BR, Jarkko
>
> BR, Jarkko
>
For #2:
The only caller of this function, sgx_alloc_epc_page(), has the same
boolean which is passed into this this function.


I know. This would be good opportunity to fix that up. Large patch
sets should try to make the space for its feature best possible and
thus also clean up the code base overally.


If we separate it into sgx_epc_cgroup_try_charge() and
sgx_epc_cgroup_try_charge_reclaim(), then the caller has to have the
if/else branches. So separation here seems not help?


Of course it does. It makes the code in that location self-documenting
and easier to remember what it does.

BR, Jarkko



Please let me know if this aligns with your suggestion.


static int ___sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg)
{
if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg,
PAGE_SIZE))
return 0;

if (sgx_epc_cgroup_lru_empty(epc_cg->cg))
return -ENOMEM;

if (signal_pending(current))
return -ERESTARTSYS;

return -EBUSY;
}

/**
 * sgx_epc_cgroup_try_charge() - try to charge cgroup for a single page
 * @epc_cg: The EPC cgroup to be charged for the page.
 *
 * Try 

Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-13 Thread Jarkko Sakkinen
On Tue Feb 13, 2024 at 1:15 AM EET, Haitao Huang wrote:
> Hi Jarkko
>
> On Mon, 12 Feb 2024 13:55:46 -0600, Jarkko Sakkinen   
> wrote:
>
> > On Mon Feb 5, 2024 at 11:06 PM EET, Haitao Huang wrote:
> >> From: Kristen Carlson Accardi 
> >>
> >> When the EPC usage of a cgroup is near its limit, the cgroup needs to
> >> reclaim pages used in the same cgroup to make room for new allocations.
> >> This is analogous to the behavior that the global reclaimer is triggered
> >> when the global usage is close to total available EPC.
> >>
> >> Add a Boolean parameter for sgx_epc_cgroup_try_charge() to indicate
> >> whether synchronous reclaim is allowed or not. And trigger the
> >> synchronous/asynchronous reclamation flow accordingly.
> >>
> >> Note at this point, all reclaimable EPC pages are still tracked in the
> >> global LRU and per-cgroup LRUs are empty. So no per-cgroup reclamation
> >> is activated yet.
> >>
> >> Co-developed-by: Sean Christopherson 
> >> Signed-off-by: Sean Christopherson 
> >> Signed-off-by: Kristen Carlson Accardi 
> >> Co-developed-by: Haitao Huang 
> >> Signed-off-by: Haitao Huang 
> >> ---
> >> V7:
> >> - Split this out from the big patch, #10 in V6. (Dave, Kai)
> >> ---
> >>  arch/x86/kernel/cpu/sgx/epc_cgroup.c | 26 --
> >>  arch/x86/kernel/cpu/sgx/epc_cgroup.h |  4 ++--
> >>  arch/x86/kernel/cpu/sgx/main.c   |  2 +-
> >>  3 files changed, 27 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c  
> >> b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
> >> index d399fda2b55e..abf74fdb12b4 100644
> >> --- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
> >> +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
> >> @@ -184,13 +184,35 @@ static void  
> >> sgx_epc_cgroup_reclaim_work_func(struct work_struct *work)
> >>  /**
> >>   * sgx_epc_cgroup_try_charge() - try to charge cgroup for a single EPC  
> >> page
> >>   * @epc_cg:   The EPC cgroup to be charged for the page.
> >> + * @reclaim:  Whether or not synchronous reclaim is allowed
> >>   * Return:
> >>   * * %0 - If successfully charged.
> >>   * * -errno - for failures.
> >>   */
> >> -int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg)
> >> +int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, bool  
> >> reclaim)
> >>  {
> >> -  return misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE);
> >> +  for (;;) {
> >> +  if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg,
> >> +  PAGE_SIZE))
> >> +  break;
> >> +
> >> +  if (sgx_epc_cgroup_lru_empty(epc_cg->cg))
> >> +  return -ENOMEM;
> >> + +if (signal_pending(current))
> >> +  return -ERESTARTSYS;
> >> +
> >> +  if (!reclaim) {
> >> +  queue_work(sgx_epc_cg_wq, _cg->reclaim_work);
> >> +  return -EBUSY;
> >> +  }
> >> +
> >> +  if (!sgx_epc_cgroup_reclaim_pages(epc_cg->cg, false))
> >> +  /* All pages were too young to reclaim, try again a 
> >> little later */
> >> +  schedule();
> >
> > This will be total pain to backtrack after a while when something
> > needs to be changed so there definitely should be inline comments
> > addressing each branch condition.
> >
> > I'd rethink this as:
> >
> > 1. Create static __sgx_epc_cgroup_try_charge() for addressing single
> >iteration with the new "reclaim" parameter.
> > 2. Add a new sgx_epc_group_try_charge_reclaim() function.
> >
> > There's a bit of redundancy with sgx_epc_cgroup_try_charge() and
> > sgx_epc_cgroup_try_charge_reclaim() because both have almost the
> > same loop calling internal __sgx_epc_cgroup_try_charge() with
> > different parameters. That is totally acceptable.
> >
> > Please also add my suggested-by.
> >
> > BR, Jarkko
> >
> > BR, Jarkko
> >
> For #2:
> The only caller of this function, sgx_alloc_epc_page(), has the same  
> boolean which is passed into this this function.

I know. This would be good opportunity to fix that up. Large patch
sets should try to make the space for its feature best possible and
thus also clean up the code base overally.

> If we separate it into sgx_epc_cgroup_try_charge() and  
> sgx_epc_cgroup_try_charge_reclaim(), then the caller has to have the  
> if/else branches. So separation here seems not help?

Of course it does. It makes the code in that location self-documenting
and easier to remember what it does.

BR, Jarkko



Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-12 Thread Haitao Huang

Hi Jarkko

On Mon, 12 Feb 2024 13:55:46 -0600, Jarkko Sakkinen   
wrote:



On Mon Feb 5, 2024 at 11:06 PM EET, Haitao Huang wrote:

From: Kristen Carlson Accardi 

When the EPC usage of a cgroup is near its limit, the cgroup needs to
reclaim pages used in the same cgroup to make room for new allocations.
This is analogous to the behavior that the global reclaimer is triggered
when the global usage is close to total available EPC.

Add a Boolean parameter for sgx_epc_cgroup_try_charge() to indicate
whether synchronous reclaim is allowed or not. And trigger the
synchronous/asynchronous reclamation flow accordingly.

Note at this point, all reclaimable EPC pages are still tracked in the
global LRU and per-cgroup LRUs are empty. So no per-cgroup reclamation
is activated yet.

Co-developed-by: Sean Christopherson 
Signed-off-by: Sean Christopherson 
Signed-off-by: Kristen Carlson Accardi 
Co-developed-by: Haitao Huang 
Signed-off-by: Haitao Huang 
---
V7:
- Split this out from the big patch, #10 in V6. (Dave, Kai)
---
 arch/x86/kernel/cpu/sgx/epc_cgroup.c | 26 --
 arch/x86/kernel/cpu/sgx/epc_cgroup.h |  4 ++--
 arch/x86/kernel/cpu/sgx/main.c   |  2 +-
 3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c  
b/arch/x86/kernel/cpu/sgx/epc_cgroup.c

index d399fda2b55e..abf74fdb12b4 100644
--- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
@@ -184,13 +184,35 @@ static void  
sgx_epc_cgroup_reclaim_work_func(struct work_struct *work)

 /**
  * sgx_epc_cgroup_try_charge() - try to charge cgroup for a single EPC  
page

  * @epc_cg:The EPC cgroup to be charged for the page.
+ * @reclaim:   Whether or not synchronous reclaim is allowed
  * Return:
  * * %0 - If successfully charged.
  * * -errno - for failures.
  */
-int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg)
+int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, bool  
reclaim)

 {
-   return misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE);
+   for (;;) {
+   if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg,
+   PAGE_SIZE))
+   break;
+
+   if (sgx_epc_cgroup_lru_empty(epc_cg->cg))
+   return -ENOMEM;
+ + if (signal_pending(current))
+   return -ERESTARTSYS;
+
+   if (!reclaim) {
+   queue_work(sgx_epc_cg_wq, _cg->reclaim_work);
+   return -EBUSY;
+   }
+
+   if (!sgx_epc_cgroup_reclaim_pages(epc_cg->cg, false))
+   /* All pages were too young to reclaim, try again a 
little later */
+   schedule();


This will be total pain to backtrack after a while when something
needs to be changed so there definitely should be inline comments
addressing each branch condition.

I'd rethink this as:

1. Create static __sgx_epc_cgroup_try_charge() for addressing single
   iteration with the new "reclaim" parameter.
2. Add a new sgx_epc_group_try_charge_reclaim() function.

There's a bit of redundancy with sgx_epc_cgroup_try_charge() and
sgx_epc_cgroup_try_charge_reclaim() because both have almost the
same loop calling internal __sgx_epc_cgroup_try_charge() with
different parameters. That is totally acceptable.

Please also add my suggested-by.

BR, Jarkko

BR, Jarkko


For #2:
The only caller of this function, sgx_alloc_epc_page(), has the same  
boolean which is passed into this this function.


If we separate it into sgx_epc_cgroup_try_charge() and  
sgx_epc_cgroup_try_charge_reclaim(), then the caller has to have the  
if/else branches. So separation here seems not help?



For #1:
If we don't do #2, It seems overkill at the moment for such a short  
function.


How about we add inline comments for each branch for now, and if later  
there are more branches and the function become too long we add  
__sgx_epc_cgroup_try_charge() as you suggested?


Thanks
Haitao



Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-12 Thread Jarkko Sakkinen
On Mon Feb 5, 2024 at 11:06 PM EET, Haitao Huang wrote:
> From: Kristen Carlson Accardi 
>
> When the EPC usage of a cgroup is near its limit, the cgroup needs to
> reclaim pages used in the same cgroup to make room for new allocations.
> This is analogous to the behavior that the global reclaimer is triggered
> when the global usage is close to total available EPC.
>
> Add a Boolean parameter for sgx_epc_cgroup_try_charge() to indicate
> whether synchronous reclaim is allowed or not. And trigger the
> synchronous/asynchronous reclamation flow accordingly.
>
> Note at this point, all reclaimable EPC pages are still tracked in the
> global LRU and per-cgroup LRUs are empty. So no per-cgroup reclamation
> is activated yet.
>
> Co-developed-by: Sean Christopherson 
> Signed-off-by: Sean Christopherson 
> Signed-off-by: Kristen Carlson Accardi 
> Co-developed-by: Haitao Huang 
> Signed-off-by: Haitao Huang 
> ---
> V7:
> - Split this out from the big patch, #10 in V6. (Dave, Kai)
> ---
>  arch/x86/kernel/cpu/sgx/epc_cgroup.c | 26 --
>  arch/x86/kernel/cpu/sgx/epc_cgroup.h |  4 ++--
>  arch/x86/kernel/cpu/sgx/main.c   |  2 +-
>  3 files changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c 
> b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
> index d399fda2b55e..abf74fdb12b4 100644
> --- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
> +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
> @@ -184,13 +184,35 @@ static void sgx_epc_cgroup_reclaim_work_func(struct 
> work_struct *work)
>  /**
>   * sgx_epc_cgroup_try_charge() - try to charge cgroup for a single EPC page
>   * @epc_cg:  The EPC cgroup to be charged for the page.
> + * @reclaim: Whether or not synchronous reclaim is allowed
>   * Return:
>   * * %0 - If successfully charged.
>   * * -errno - for failures.
>   */
> -int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg)
> +int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, bool reclaim)
>  {
> - return misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE);
> + for (;;) {
> + if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg,
> + PAGE_SIZE))
> + break;
> +
> + if (sgx_epc_cgroup_lru_empty(epc_cg->cg))
> + return -ENOMEM;
> + +   if (signal_pending(current))
> + return -ERESTARTSYS;
> +
> + if (!reclaim) {
> + queue_work(sgx_epc_cg_wq, _cg->reclaim_work);
> + return -EBUSY;
> + }
> +
> + if (!sgx_epc_cgroup_reclaim_pages(epc_cg->cg, false))
> + /* All pages were too young to reclaim, try again a 
> little later */
> + schedule();

This will be total pain to backtrack after a while when something
needs to be changed so there definitely should be inline comments
addressing each branch condition.

I'd rethink this as:

1. Create static __sgx_epc_cgroup_try_charge() for addressing single
   iteration with the new "reclaim" parameter.
2. Add a new sgx_epc_group_try_charge_reclaim() function.

There's a bit of redundancy with sgx_epc_cgroup_try_charge() and
sgx_epc_cgroup_try_charge_reclaim() because both have almost the
same loop calling internal __sgx_epc_cgroup_try_charge() with
different parameters. That is totally acceptable.

Please also add my suggested-by.

BR, Jarkko

BR, Jarkko