Re: [PATCH v10 08/14] x86/sgx: Add basic EPC reclamation flow for cgroup
On Fri Apr 5, 2024 at 6:07 AM EEST, Huang, Kai wrote: > On Thu, 2024-04-04 at 12:05 -0500, Haitao Huang wrote: > > > > -static inline int sgx_cgroup_try_charge(struct sgx_cgroup *sgx_cg) > > > > +static inline int sgx_cgroup_try_charge(struct sgx_cgroup *sgx_cg, > > > > enum sgx_reclaim r) > > > > > > Is the @r here intentional for shorter typing? > > > > > > > yes :-) > > Will speel out to make it consistent if that's the concern. > > I kinda prefer the full name to match the CONFIG_CGROUP_SGX_EPC on case. You > can put the 'enum sgx_reclaim reclaim' parameter into the new line if needed: Why don't cgroups for SGX get always enabled when SGX and cgroups support are enabled? BR, Jarkko
Re: [PATCH v10 08/14] x86/sgx: Add basic EPC reclamation flow for cgroup
On Thu, 2024-04-04 at 12:05 -0500, Haitao Huang wrote: > > > -static inline int sgx_cgroup_try_charge(struct sgx_cgroup *sgx_cg) > > > +static inline int sgx_cgroup_try_charge(struct sgx_cgroup *sgx_cg, > > > enum sgx_reclaim r) > > > > Is the @r here intentional for shorter typing? > > > > yes :-) > Will speel out to make it consistent if that's the concern. I kinda prefer the full name to match the CONFIG_CGROUP_SGX_EPC on case. You can put the 'enum sgx_reclaim reclaim' parameter into the new line if needed: static inline int sgx_cgroup_try_charge(struct sgx_cgroup *sgx_cg, enum sgx_reclaim reclaim) { return 0; }
Re: [PATCH v10 08/14] x86/sgx: Add basic EPC reclamation flow for cgroup
On Thu, 2024-04-04 at 12:05 -0500, Haitao Huang wrote: > > Please also mention why "leaving asynchronous reclamation to later > > patch(es)" is > > fine. E.g., it won't break anything I suppose. > > > > Right. Pages are still in the global list at the moment and only global > reclaiming is active until the "turn on" patch. Separating out is really > just for the purpose of review IMHO. Sounds good to me. Thanks.
Re: [PATCH v10 08/14] x86/sgx: Add basic EPC reclamation flow for cgroup
Hi Kai, Thanks for your suggestions. I'll adopt most of it as it. Minor details below. On Wed, 03 Apr 2024 08:08:28 -0500, Huang, Kai wrote: On Wed, 2024-03-27 at 17:22 -0700, Haitao Huang wrote: From: Kristen Carlson Accardi When a cgroup usage reaches its limit, and it is to be charged, i.e., sgx_cgroup_try_charge() called for new allocations, the cgroup needs to reclaim pages from its LRU or LRUs of its descendants to make room for any new allocations. This patch adds the basic building block for the per-cgroup reclamation flow and use it for synchronous reclamation in sgx_cgroup_try_charge(). It's better to firstly mention _why_ we need this first: Currently in the EPC page allocation, the kernel simply fails the allocation when the current EPC cgroup fails to charge due to its usage reaching limit. This is not ideal. When that happens, a better way is to reclaim EPC page(s) from the current EPC cgroup (and/or its descendants) to reduce its usage so the new allocation can succeed. Add the basic building blocks to support the per-cgroup reclamation flow ... ok First, modify sgx_reclaim_pages() to let callers to pass in the LRU from which pages are reclaimed, so it can be reused by both the global and cgroup reclaimers. Also return the number of pages attempted, so a cgroup reclaimer can use it to track reclamation progress from its descendants. IMHO you are jumping too fast to the implementation details. Better to have some more background: " Currently the kernel only has one place to reclaim EPC pages: the global EPC LRU list. To support the "per-cgroup" EPC reclaim, maintain an LRU list for each EPC cgroup, and introduce a "cgroup" variant function to reclaim EPC page(s) from a given EPC cgroup (and its descendants). " ok For the global reclaimer, replace all call sites of sgx_reclaim_pages() with calls to a newly created wrapper, sgx_reclaim_pages_global(), which just calls sgx_reclaim_pages() with the global LRU passed in. For cgroup reclamation, implement a basic reclamation flow, encapsulated in the top-level function, sgx_cgroup_reclaim_pages(). It performs a pre-order walk on a given cgroup subtree, and calls sgx_reclaim_pages() at each node passing in the LRU of that node. It keeps track of total attempted pages and stops the walk if desired number of pages are attempted. Then it's time to jump to implementation details: " Currently the kernel does the global EPC reclaim in sgx_reclaim_page(). It always tries to reclaim EPC pages in batch of SGX_NR_TO_SCAN (16) pages. Specifically, it always "scans", or "isolates" SGX_NR_TO_SCAN pages from the global LRU, and then tries to reclaim these pages at once for better performance. Use similar way to implement the "cgroup" variant EPC reclaim, but keep the implementation simple: 1) change sgx_reclaim_pages() to take an LRU as input, and return the pages that are "scanned" (but not actually reclaimed); 2) loop the given EPC cgroup and its descendants and do the new sgx_reclaim_pages() until SGX_NR_TO_SCAN pages are "scanned". This implementation always tries to reclaim SGX_NR_TO_SCAN pages from the LRU of the given EPC cgroup, and only moves to its descendants when there's no enough reclaimable EPC pages to "scan" in its LRU. It should be enough for most cases. " ok Then I think it's better to explain why "alternatives" are not chosen: " Note, this simple implementation doesn't _exactly_ mimic the current global EPC reclaim (which always tries to do the actual reclaim in batch of SGX_NR_TO_SCAN pages): when LRUs have less than SGX_NR_TO_SCAN reclaimable pages, the actual reclaim of EPC pages will be split into smaller batches _across_ multiple LRUs with each being smaller than SGX_NR_TO_SCAN pages. A more precise way to mimic the current global EPC reclaim would be to have a new function to only "scan" (or "isolate") SGX_NR_TO_SCAN pages _across_ the given EPC cgroup _AND_ its descendants, and then do the actual reclaim in one batch. But this is unnecessarily complicated at this stage. Alternatively, the current sgx_reclaim_pages() could be changed to return the actual "reclaimed" pages, but not "scanned" pages. However this solution also has cons: " : I recall you mentioned "unable to control latency of each reclaim" etc, but IIUC one could be: This approach may result in higher chance of "reclaiming EPC pages from descendants but not the root/given EPC cgorup", e.g., when all EPC pages in the root EPC cgroup are all young while these in its descendants are not. This may not be desired. Makes sense? Agree with the flow. The con is that this function may block too long that is unacceptable for some callers like synchronous flow which only needs some minimal (e.g., one page to pass try_charge()) to make forward progress. Convention is to call this function loops to ensure caller's condition is met, i.e., the way the
Re: [PATCH v10 08/14] x86/sgx: Add basic EPC reclamation flow for cgroup
On Wed, 2024-03-27 at 17:22 -0700, Haitao Huang wrote: > From: Kristen Carlson Accardi > > When a cgroup usage reaches its limit, and it is to be charged, i.e., > sgx_cgroup_try_charge() called for new allocations, the cgroup needs to > reclaim pages from its LRU or LRUs of its descendants to make room for > any new allocations. This patch adds the basic building block for the > per-cgroup reclamation flow and use it for synchronous reclamation in > sgx_cgroup_try_charge(). It's better to firstly mention _why_ we need this first: Currently in the EPC page allocation, the kernel simply fails the allocation when the current EPC cgroup fails to charge due to its usage reaching limit. This is not ideal. When that happens, a better way is to reclaim EPC page(s) from the current EPC cgroup (and/or its descendants) to reduce its usage so the new allocation can succeed. Add the basic building blocks to support the per-cgroup reclamation flow ... > > First, modify sgx_reclaim_pages() to let callers to pass in the LRU from > which pages are reclaimed, so it can be reused by both the global and > cgroup reclaimers. Also return the number of pages attempted, so a > cgroup reclaimer can use it to track reclamation progress from its > descendants. IMHO you are jumping too fast to the implementation details. Better to have some more background: " Currently the kernel only has one place to reclaim EPC pages: the global EPC LRU list. To support the "per-cgroup" EPC reclaim, maintain an LRU list for each EPC cgroup, and introduce a "cgroup" variant function to reclaim EPC page(s) from a given EPC cgroup (and its descendants). " > > For the global reclaimer, replace all call sites of sgx_reclaim_pages() > with calls to a newly created wrapper, sgx_reclaim_pages_global(), which > just calls sgx_reclaim_pages() with the global LRU passed in. > > For cgroup reclamation, implement a basic reclamation flow, encapsulated > in the top-level function, sgx_cgroup_reclaim_pages(). It performs a > pre-order walk on a given cgroup subtree, and calls sgx_reclaim_pages() > at each node passing in the LRU of that node. It keeps track of total > attempted pages and stops the walk if desired number of pages are > attempted. Then it's time to jump to implementation details: " Currently the kernel does the global EPC reclaim in sgx_reclaim_page(). It always tries to reclaim EPC pages in batch of SGX_NR_TO_SCAN (16) pages. Specifically, it always "scans", or "isolates" SGX_NR_TO_SCAN pages from the global LRU, and then tries to reclaim these pages at once for better performance. Use similar way to implement the "cgroup" variant EPC reclaim, but keep the implementation simple: 1) change sgx_reclaim_pages() to take an LRU as input, and return the pages that are "scanned" (but not actually reclaimed); 2) loop the given EPC cgroup and its descendants and do the new sgx_reclaim_pages() until SGX_NR_TO_SCAN pages are "scanned". This implementation always tries to reclaim SGX_NR_TO_SCAN pages from the LRU of the given EPC cgroup, and only moves to its descendants when there's no enough reclaimable EPC pages to "scan" in its LRU. It should be enough for most cases. " Then I think it's better to explain why "alternatives" are not chosen: " Note, this simple implementation doesn't _exactly_ mimic the current global EPC reclaim (which always tries to do the actual reclaim in batch of SGX_NR_TO_SCAN pages): when LRUs have less than SGX_NR_TO_SCAN reclaimable pages, the actual reclaim of EPC pages will be split into smaller batches _across_ multiple LRUs with each being smaller than SGX_NR_TO_SCAN pages. A more precise way to mimic the current global EPC reclaim would be to have a new function to only "scan" (or "isolate") SGX_NR_TO_SCAN pages _across_ the given EPC cgroup _AND_ its descendants, and then do the actual reclaim in one batch. But this is unnecessarily complicated at this stage. Alternatively, the current sgx_reclaim_pages() could be changed to return the actual "reclaimed" pages, but not "scanned" pages. However this solution also has cons: " : I recall you mentioned "unable to control latency of each reclaim" etc, but IIUC one could be: This approach may result in higher chance of "reclaiming EPC pages from descendants but not the root/given EPC cgorup", e.g., when all EPC pages in the root EPC cgroup are all young while these in its descendants are not. This may not be desired. Makes sense? > > Finally, pass a parameter to sgx_cgroup_try_charge() to indicate whether > a synchronous reclamation is allowed. If the caller allows and cgroup > usage is at its limit, trigger the synchronous reclamation by calling > sgx_cgroup_reclaim_pages() in a loop with cond_resched() in between > iterations. This isn't needed IMHO as you can easily see in the code, and there's no "design choices" here. General rule: focus on explaining "why", and "design choices", but not implementation details, which can be seen in the