On 2/10/21 12:10 AM, Peter Zijlstra wrote:
> On Tue, Feb 09, 2021 at 11:09:27PM +, Luck, Tony wrote:
>>> +#define X86_BUG_NUMA_SHARES_LLC X86_BUG(25) /* CPU may
>>> enumerate an LLC shared by multiple NUMA nodes */
>>
>> During internal review I wondered why this is a "BUG" rather t
On 2/10/21 2:21 AM, Joerg Roedel wrote:
> + /* Store to memory and keep it in the registers */
> + movl %eax, rva(sev_check_data)(%ebp)
> + movl %ebx, rva(sev_check_data+4)(%ebp)
> +
> + /* Enable paging to see if encryption is active */
> + movl %cr0, %edx /* Back
On 2/10/21 2:21 AM, Joerg Roedel wrote:
> +1: rdrand %eax
> + jnc 1b
> +2: rdrand %ebx
> + jnc 2b
> +
> + /* Store to memory and keep it in the registers */
> + movl %eax, rva(sev_check_data)(%ebp)
> + movl %ebx, rva(sev_check_data+4)(%ebp)
> +
> + /* Ena
On 2/10/21 12:05 AM, Peter Zijlstra wrote:
>> +if (IS_ENABLED(CONFIG_NUMA))
>> +set_cpu_bug(c, X86_BUG_NUMA_SHARES_LLC);
>> }
> This seems wrong too, it shouldn't be allowed pre SKX. And ideally only
> be allowed when SNC is enabled.
Originally, this just added a few more models t
On 2/2/21 3:42 AM, Oscar Salvador wrote:
>> +static int __meminit migrate_on_reclaim_callback(struct notifier_block
>> *self,
>> + unsigned long action, void
>> *arg)
>> +{
>> +switch (action) {
>> +case MEM_GOING_OFFLINE:
>> +/*
>>
On 1/29/21 1:04 PM, Yang Shi wrote:
>> @@ -1527,7 +1527,7 @@ retry:
>> nr_succeeded += nr_subpages;
> It seems the above line is missed. The THP accounting change was
> merged in v5.9 before I submitted this patch.
Thanks for reporting that. Ying found and
rnel's
> direct mapping range.
Looks fine to me:
Acked-by: Dave Hansen
On 2/8/21 8:16 AM, Jing Liu wrote:
> -#define XSTATE_COMPACTION_ENABLED (1ULL << 63)
> -
> static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
> {
> struct xregs_state *xsave = &vcpu->arch.guest_fpu->state.xsave;
> @@ -4494,7 +4492,8 @@ static void load_xsave(struct kvm_vcpu *vcpu, u8 *
> This has been shown in tests:
>
> [ +0.08] WARNING: CPU: 3 PID: 7620 at kernel/rcu/srcutree.c:374
> cleanup_srcu_struct+0xed/0x100
>
> This is essentially a use-after free, although SRCU notices it as
> an SRCU cleanup in an invalid context.
...
Acked-by: Dave Hansen
On 2/7/21 12:44 PM, Alexei Starovoitov wrote:
>>> It probably is an item on some Intel manager's to-enable list. So far,
>>> the CET enablement concentrates only on userspace but dhansen might know
>>> more about future plans. CCed.
>> It's definitely on our radar to look at after CET userspace.
>
On 2/7/21 12:29 PM, Kirill A. Shutemov wrote:
>> Couldn't you just have one big helper that takes *all* the registers
>> that get used in any TDVMCALL and sets all the rcx bits? The users
>> could just pass 0's for the things they don't use.
>>
>> Then you've got the ugly inline asm in one place.
On 2/7/21 10:15 AM, Linus Torvalds wrote:
> On Sun, Feb 7, 2021 at 9:58 AM Borislav Petkov wrote:
>> It probably is an item on some Intel manager's to-enable list. So far,
>> the CET enablement concentrates only on userspace but dhansen might know
>> more about future plans. CCed.
> I think the ne
On 2/7/21 9:58 AM, Borislav Petkov wrote:
> On Sun, Feb 07, 2021 at 09:49:18AM -0800, Linus Torvalds wrote:
>> On Sun, Feb 7, 2021 at 2:40 AM Borislav Petkov wrote:
>>> - Disable CET instrumentation in the kernel so that gcc doesn't add
>>> ENDBR64 to kernel code and thus confuse tracing.
>> So th
On 2/7/21 6:13 AM, Kirill A. Shutemov wrote:
>>> + /* Allow to pass R10, R11, R12, R13, R14 and R15 down to the VMM */
>>> + rcx = BIT(10) | BIT(11) | BIT(12) | BIT(13) | BIT(14) | BIT(15);
>>> +
>>> + asm volatile(TDCALL
>>> + : "=a"(ret), "=r"(r10), "=r"(r1
to get a page aligned kernel address to use.
>
> In addition add a comment to document the alignment requirements so that
> others like myself don't attempt to 'fix' this again.
Looks good:
Acked-by: Dave Hansen
encl->refcount when encl_mm->encl is established. Release
this reference when encl_mm is freed. This ensures that 'encl' outlives
'encl_mm'.
Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
Cc: Dave Hansen
Signed-off-by: Jarkko Sakkinen
---
b/arch/x86/kern
On 2/4/21 12:19 PM, Kees Cook wrote:
>> (e) A page where the processor observed a Write=1 PTE, started a write, set
>> Dirty=1, but then observed a Write=0 PTE. That's possible today, but
>> will not happen on processors that support shadow stack.
> What happens for "e" with/without CET? I
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 25a068b8e9a4eb193d755d58efcb3c98928636e0
Gitweb:
https://git.kernel.org/tip/25a068b8e9a4eb193d755d58efcb3c98928636e0
Author: Dave Hansen
AuthorDate: Thu, 05 Mar 2020 09:47:08 -08:00
Committer
...
> Reported-by: Jan Kiszka
> Cc: x...@kernel.org
> Cc: Peter Zijlstra
Don't know how I managed to miss it in the first place, but:
Signed-off-by: Dave Hansen
On 2/3/21 1:54 PM, Yu, Yu-cheng wrote:
> On 1/29/2021 10:56 AM, Yu, Yu-cheng wrote:
>> On 1/29/2021 9:07 AM, Dave Hansen wrote:
>>> On 1/27/21 1:25 PM, Yu-cheng Yu wrote:
>>>> + if (!IS_ENABLED(CONFIG_X86_CET))
>>>> + return -EOPNOTSUPP;
>
On 1/30/21 11:20 AM, Jarkko Sakkinen wrote:
...
> Example scenario would be such that all removals "side-channel" through
> the notifier callback. Then mmu_notifier_unregister() gets called
> exactly zero times. No MMU notifier srcu sync would be then happening.
>
> NOTE: There's a bunch of other examp
On 2/3/21 3:23 AM, Borislav Petkov wrote:
>> -/*
>> - * 'XSAVES' implies two different things:
>> - * 1. saving of supervisor/system state
>> - * 2. using the compacted format
>> - *
>> - * Use this function when dealing with the compacted format so
>> - * that it is obvious which aspect of 'XSAVES
On 2/2/21 9:46 AM, Yang Shi wrote:
> On Mon, Feb 1, 2021 at 11:13 AM Dave Hansen wrote:
>> On 1/29/21 12:46 PM, Yang Shi wrote:
>> ...
>>>> int next_demotion_node(int node)
>>>> {
>>>> - return node_demotion[node];
>>>>
On 2/2/21 2:45 PM, Yang Shi wrote:
>> Should we keep it simple for now and only try to demote those pages that are
>> free of cpusets and memory policies?
>> Actually, demoting those pages to a CPU or a NUMA node that does not fall
>> into
>> their set, would violate those constraints right?
> Yes
On 2/2/21 10:56 AM, Yang Shi wrote:
>>
>> /* If we have no swap space, do not bother scanning anon pages. */
>> - if (!sc->may_swap || mem_cgroup_get_nr_swap_pages(memcg) <= 0) {
>> + if (!sc->may_swap || !can_reclaim_anon_pages(memcg, pgdat->node_id))
>> {
> Just one minor thi
On 2/1/21 5:37 PM, ira.we...@intel.com wrote:
> kmap is inefficient and we are trying to reduce the usage in the kernel.
> There is no readily apparent reason why the initp_page page needs to be
> allocated and kmap'ed() but sigstruct needs to be page aligned and token
> 512 byte aligned.
Hi Ira,
On 2/2/21 10:22 AM, Yang Shi wrote:
>> +static struct page *alloc_demote_page(struct page *page, unsigned long node)
>> +{
>> +struct migration_target_control mtc = {
>> + /*
>> +* Fail quickly and quietly. Page will likely
>> +* just be discar
On 2/1/21 3:05 PM, Yu, Yu-cheng wrote:
>>>
>>
>> Wait a sec... What about *THIS* series? Will *THIS* series give us
>> oopses when userspace blasts a new XSAVE buffer in with NT_X86_XSTATE?
>>
>
> Fortunately, CET states are supervisor states. NT_x86_XSTATE has only
> user states.
Ahhh, good p
On 2/1/21 2:43 PM, Yu, Yu-cheng wrote:
> On 1/29/2021 2:53 PM, Dave Hansen wrote:
>> On 1/29/21 2:35 PM, Yu, Yu-cheng wrote:
>>>> Andy Cooper just mentioned on IRC about this nugget in the spec:
>>>>
>>>> XRSTORS on CET state will do reserved bit a
On 1/27/21 1:25 PM, Yu-cheng Yu wrote:
> To deliver a signal, create a shadow stack restore token and put a restore
> token and the signal restorer address on the shadow stack. For sigreturn,
> verify the token and restore the shadow stack pointer.
>
> Introduce WRUSS, which is a kernel-mode inst
On 1/29/21 12:46 PM, Yang Shi wrote:
...
>> int next_demotion_node(int node)
>> {
>> - return node_demotion[node];
>> + /*
>> +* node_demotion[] is updated without excluding
>> +* this function from running. READ_ONCE() avoids
>> +* reading multiple, inconsist
On 1/30/21 5:19 PM, David Rientjes wrote:
> On Mon, 25 Jan 2021, Dave Hansen wrote:
>
>> diff -puN mm/migrate.c~0006-node-Define-and-export-memory-migration-path
>> mm/migrate.c
>> --- a/mm/migrate.c~0006-node-Define-and-export-memory-migration-path
>> 2021-
On 1/29/21 2:35 PM, Yu, Yu-cheng wrote:
>> Andy Cooper just mentioned on IRC about this nugget in the spec:
>>
>> XRSTORS on CET state will do reserved bit and canonicality
>> checks on the state in similar manner as done by the WRMSR to
>> these state elements.
>>
>> We're using copy_k
On 1/27/21 1:25 PM, Yu-cheng Yu wrote:
> @@ -135,6 +135,8 @@ enum xfeature {
> #define XFEATURE_MASK_PT (1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR)
> #define XFEATURE_MASK_PKRU (1 << XFEATURE_PKRU)
> #define XFEATURE_MASK_PASID (1 << XFEATURE_PASID)
> +#define XFEATURE
On 1/29/21 11:58 AM, Andy Lutomirski wrote:
>> Did any CPUs ever get released that have this? If so, name them. If
>> not, time to change this to 2021, I think.
> Zen 3 :)
In that case is there any reason to keep the "depends on CPU_SUP_INTEL"?
On 1/27/21 1:25 PM, Yu-cheng Yu wrote:
> + help
> + Control-flow protection is a hardware security hardening feature
> + that detects function-return address or jump target changes by
> + malicious code.
It's not really one feature. I also think it's not worth talking about
On 1/29/21 10:56 AM, Yu, Yu-cheng wrote:
> On 1/29/2021 9:07 AM, Dave Hansen wrote:
>> On 1/27/21 1:25 PM, Yu-cheng Yu wrote:
>>> + u64 buf[3] = {0, 0, 0};
Doesn't the compiler zero these if you initialize it to anything? In
other words, doesn't t
On 1/27/21 1:25 PM, Yu-cheng Yu wrote:
> arch_prctl(ARCH_X86_CET_STATUS, u64 *args)
> Get CET feature status.
>
> The parameter 'args' is a pointer to a user buffer. The kernel returns
> the following information:
>
> *args = shadow stack/IBT status
> *(args + 1) = shadow sta
On 1/28/21 8:33 AM, Zi Yan wrote:
>> One of the many lasting (as we don't coalesce back) sources for
>> huge page splits is tracing as the granular page
>> attribute/permission changes would force the kernel to split code
>> segments mapped to huge pages to smaller ones thereby increasing
>> the pr
On 1/28/21 4:58 AM, Jarkko Sakkinen wrote:
> The most trivial example of a race condition can be demonstrated by this
> sequence where mm_list contains just one entry:
>
> CPU A CPU B
> -> sgx_release()
> -> sgx_mmu_notifier_release()
>
On 1/27/21 2:50 PM, Saravanan D wrote:
> +#if defined(__x86_64__)
We don't use __x86_64__ in the kernel. This should be CONFIG_X86.
> +#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
> + "direct_map_2M_splits",
> +#else
> + "direct_map_4M_splits",
> +#endif
> + "direct_map_1G_s
On 1/27/21 1:03 PM, Tejun Heo wrote:
>> The lifetime split event information will be displayed at the bottom of
>> /proc/vmstat
>>
>> swap_ra 0
>> swap_ra_hit 0
>> direct_map_2M_splits 139
>> direct_map_4M_splits 0
>> direct_map_1G_splits 7
>> nr_unstable 0
>>
>
> This looks great to me.
ges()
]
Signed-off-by: Yang Shi
Signed-off-by: Dave Hansen
Cc: David Rientjes
Cc: Huang Ying
Cc: Dan Williams
Cc: David Hildenbrand
Cc: osalvador
--
Changes since 202010:
* remove unused scan-control 'demoted' field
---
b/include/linux/vm_event_item.h |2 ++
From: Dave Hansen
Anonymous pages are kept on their own LRU(s). These lists could theoretically
always be scanned and maintained. But, without swap, there is currently
nothing the kernel can *do* with the results of a scanned, sorted LRU for
anonymous pages.
A check for '!total_swap_
From: Dave Hansen
It is currently not obvious that the RECLAIM_* bits are part of the
uapi since they are defined in vmscan.c. Move them to a uapi header
to make it obvious.
This should have no functional impact.
Signed-off-by: Dave Hansen
Reviewed-by: Ben Widawsky
Acked-by: David
The full series is also available here:
https://github.com/hansendc/linux/tree/automigrate-20210122
The meat of this patch is in:
[PATCH 08/13] mm/migrate: demote pages during reclaim
Which also has the most changes since the last post. This version is
mostly to address revie
From: Dave Hansen
Global reclaim aims to reduce the amount of memory used on
a given node or set of nodes. Migrating pages to another
node serves this purpose.
memcg reclaim is different. Its goal is to reduce the
total memory consumption of the entire memcg, across all
nodes. Migration
From: Dave Hansen
Prepare for the kernel to auto-migrate pages to other memory nodes
with a user-defined node migration table. This allows creating a single
migration target for each NUMA node to enable the kernel to do NUMA
page migrations instead of simply reclaiming colder pages. A node
with
From: Dave Hansen
Some method is obviously needed to enable reclaim-based migration.
Just like traditional autonuma, there will be some workloads that
will benefit like workloads with more "static" configurations where
hot pages stay hot and cold pages stay cold. If pages come a
From: Dave Hansen
I went to go add a new RECLAIM_* mode for the zone_reclaim_mode
sysctl. Like a good kernel developer, I also went to go update the
documentation. I noticed that the bits in the documentation didn't
match the bits in the #defines.
The VM never explicitly check
On 1/25/21 4:53 PM, Tejun Heo wrote:
>> This would be a lot more useful if you could reset the counters. Then
>> just reset them from userspace at boot. Adding read-write debugfs
>> exports for these should be pretty trivial.
> While this would work for hands-on cases, I'm a bit worried that this
On 1/25/21 12:32 PM, Tejun Heo wrote:
> On Mon, Jan 25, 2021 at 12:15:51PM -0800, Dave Hansen wrote:
>>> DirectMap4k: 3505112 kB
>>> DirectMap2M: 19464192 kB
>>> DirectMap1G: 12582912 kB
>>> DirectMap2MSplits: 1705
>>> DirectMap1GSplits:
ion rename]
Signed-off-by: Vishal Verma
Signed-off-by: Dave Hansen
Cc: Yang Shi
Cc: David Rientjes
Cc: Huang Ying
Cc: Dan Williams
Cc: David Hildenbrand
Cc: osalvador
--
Changes from Dave 10/2020:
* remove 'total_swap_pages' modification
Changes from Dave 06/2020:
* rename recla
From: Dave Hansen
This is mostly derived from a patch from Yang Shi:
https://lore.kernel.org/linux-mm/1560468577-101178-10-git-send-email-yang@linux.alibaba.com/
Add code to the reclaim path (shrink_page_list()) to "demote" data
to another NUMA node instead of disc
From: Dave Hansen
RECLAIM_ZONE was assumed to be unused because it was never explicitly
used in the kernel. However, there were a number of places where it
was checked implicitly by checking 'node_reclaim_mode' for a zero
value.
These zero checks are not great because it is not ob
From: Dave Hansen
Reclaim-based migration is attempting to optimize data placement in
memory based on the system topology. If the system changes, so must
the migration ordering.
The implementation here is pretty simple and entirely unoptimized. On
any memory or CPU hotplug events, assume
account how many pages are reclaimed (demoted) since page
reclaim behavior depends on this. Add *nr_succeeded parameter to make
migrate_pages() return how many pages are demoted successfully for all
cases.
Signed-off-by: Yang Shi
Signed-off-by: Dave Hansen
Cc: David Rientjes
Cc: Huang Ying
Cc
From: Dave Hansen
When memory fills up on a node, memory contents can be
automatically migrated to another node. The biggest problems are
knowing when to migrate and to where the migration should be
targeted.
The most straightforward way to generate the "to where" list
would be to
On 1/25/21 12:11 PM, Saravanan D wrote:
> Numerous hugepage splits in the linear mapping would give
> admins the signal to narrow down the sluggishness caused by TLB
> miss/reload.
>
> One of the many lasting (as we don't coalesce back) sources for huge page
> splits is tracing as the granular pag
> Usually, the compiler is better at making code efficient than humans. I
> find that coding it in the most human-readable way is best unless I
> *know* the compiler is unable to generate god code.
"good code", even.
I really want a "god code" compiler, though. :)
On 1/21/21 12:16 PM, Yu, Yu-cheng wrote:
>
>>> @@ -343,6 +349,16 @@ static inline pte_t pte_mkold(pte_t pte)
>>> static inline pte_t pte_wrprotect(pte_t pte)
>>> {
>>> + /*
>>> + * Blindly clearing _PAGE_RW might accidentally create
>>> + * a shadow stack PTE (RW=0, Dirty=1). Mov
On 1/19/21 10:44 PM, Yejune Deng wrote:
> In fpstate_sanitize_xstate(), use memset and offsetof instead of '= 0',
> and use sizeof instead of a constant.
What's the benefit to doing this? Saving 4 lines of code?
Your suggestions are not obviously wrong at a glance, but they're also
not obviously
On 1/15/21 6:04 PM, Eric Biggers wrote:
> On Fri, Jan 15, 2021 at 04:20:44PM -0800, Dave Hansen wrote:
>> On 1/15/21 4:14 PM, Dey, Megha wrote:
>>> Also, I do not know of any cores that implement PCLMULQDQ and not AES-NI.
>> That's true, but it's also possible
On 1/15/21 4:14 PM, Dey, Megha wrote:
> Also, I do not know of any cores that implement PCLMULQDQ and not AES-NI.
That's true, but it's also possible that a hypervisor could enumerate
support for PCLMULQDQ and not AES-NI. In general, we've tried to
implement x86 CPU features independently, even i
On 1/14/21 9:54 AM, Jarkko Sakkinen wrote:
> On Tue, Jan 12, 2021 at 04:24:01PM -0800, Dave Hansen wrote:
>> We need a bit more information here as well. What's the relationship
>> between NUMA nodes and sections? How does the BIOS tell us which NUMA
>> nodes a section
On 12/16/20 5:50 AM, Jarkko Sakkinen wrote:
> Create a pointer array for each NUMA node with the references to the
> contained EPC sections. Use this in __sgx_alloc_epc_page() to knock the
> current NUMA node before the others.
It makes it harder to comment when I'm not on cc.
Hint, hint... ;)
W
SGX ioctl() calls are serialized with a lock. It's a weird open-coded
lock that is not even called a "lock". That makes it a weird beast,
but Sean has convinced me it's a good idea without better alternatives.
Give the lock bit a better name, and document what it is actually trying
to do.
Cc: Se
On 1/6/21 10:19 PM, Tony W Wang-oc wrote:
> + /*
> + * These CPUs declare support for SSE4.2 instruction sets but
> + * have a low-performance CRC32C instruction implementation.
> + */
> + if (c->x86 == 0x6 || (c->x86 == 0x7 && c->x86_model <= 0x3b))
> + set_cpu_cap(c
were allowed. Guess you
learn something new every day. This looks fine to me, it removes the
exact dup that Ingo appears to have added.
Acked-by: Dave Hansen
estigation into why we're suddenly seeing this now.
I agree that ridding ourselves of open-coded free_page()'s is a good
idea, but this patch itself needs to be around for stable anyway. So,
Acked-by: Dave Hansen
On 1/4/21 12:11 PM, David Hildenbrand wrote:
>> Yeah, it certainly can't be the default, but it *is* useful for
>> thing where we know that there are no cache benefits to zeroing
>> close to where the memory is allocated.
>>
>> The trick is opting into it somehow, either in a process or a VMA.
>>
On 1/4/21 11:27 AM, Matthew Wilcox wrote:
> On Mon, Jan 04, 2021 at 11:19:13AM -0800, Dave Hansen wrote:
>> On 12/21/20 8:30 AM, Liang Li wrote:
>>> --- a/include/linux/page-flags.h
>>> +++ b/include/linux/page-flags.h
>>> @@ -137,6 +137,9 @@ enum pageflags {
On 12/21/20 8:30 AM, Liang Li wrote:
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -137,6 +137,9 @@ enum pageflags {
> #endif
> #ifdef CONFIG_64BIT
> PG_arch_2,
> +#endif
> +#ifdef CONFIG_PREZERO_PAGE
> + PG_zero,
> #endif
> __NR_PAGEFLAGS,
I don't t
On 12/18/20 11:42 AM, Ira Weiny wrote:
> Another problem would be if the kmap and kunmap happened in different
> contexts... :-/ I don't think that is done either but I don't know for
> certain.
It would be really nice to put together some surveillance patches to
help become more certain about t
On 12/17/20 8:10 PM, Ira Weiny wrote:
> On Thu, Dec 17, 2020 at 12:41:50PM -0800, Dave Hansen wrote:
>> On 11/6/20 3:29 PM, ira.we...@intel.com wrote:
>>> void disable_TSC(void)
>>> @@ -644,6 +668,8 @@ void __switch_to_xtra(struct task_struct *prev_p,
>
On 12/16/20 9:41 AM, Chang S. Bae wrote:
> +config CRYPTO_AES_KL
> + tristate "AES cipher algorithms (AES-KL)"
> + depends on X86_KEYLOCKER
> + select CRYPTO_AES_NI_INTEL
> + help
> + Use AES Key Locker instructions for AES algorithm.
> +
> + AES cipher algorithms (FIPS-
On 11/6/20 3:29 PM, ira.we...@intel.com wrote:
> + /* Arm for context switch test */
> + write(fd, "1", 1);
> +
> + /* Context switch out... */
> + sleep(4);
> +
> + /* Check msr restored */
> + write(fd, "2", 1);
These are al
On 11/6/20 3:29 PM, ira.we...@intel.com wrote:
> void disable_TSC(void)
> @@ -644,6 +668,8 @@ void __switch_to_xtra(struct task_struct *prev_p, struct
> task_struct *next_p)
>
> if ((tifp ^ tifn) & _TIF_SLD)
> switch_to_sld(tifn);
> +
> + pks_sched_in();
> }
Does the s
INIT_PGD_PAGE_COUNT (2 * INIT_PGD_PAGE_TABLES)
> #else
> -#define INIT_PGD_PAGE_COUNT 12
> +#define INIT_PGD_PAGE_COUNT (4 * INIT_PGD_PAGE_TABLES)
> #endif
> +
Lorenzo, thanks for the patch. That was a very nice changelog, and it
all seems sane to me, especially with Kirill's ack.
Acked-by: Dave Hansen
On 12/7/20 4:40 AM, Vladimir Kondratiev wrote:
> Recursive do_exit() is a symptom of compromised kernel integrity.
> For safety critical systems, it may be better to
> panic() in this case to minimize risk.
This changelog is still woefully inadequate. It doesn't really describe
the problem which is
On 12/6/20 5:10 AM, Vladimir Kondratiev wrote:
> A double fault detected in do_exit() is a symptom of compromised
> integrity. For safety critical systems, it may be better to
> panic() in this case to minimize risk.
Does this fix a real problem that you have observed in practice?
Or, is this a gener
On 12/3/20 1:19 AM, Borislav Petkov wrote:
> On Tue, Nov 10, 2020 at 08:21:51AM -0800, Yu-cheng Yu wrote:
>> Before introducing _PAGE_COW for non-hardware memory management purposes in
>> the next patch, rename _PAGE_DIRTY to _PAGE_DIRTY_HW and _PAGE_BIT_DIRTY to
>> _PAGE_BIT_DIRTY_HW to make meani
On 11/30/20 3:16 PM, Yu, Yu-cheng wrote:
>>
>> Do we have any other spots in the kernel where we care about:
>>
>> boot_cpu_has(X86_FEATURE_SHSTK) ||
>> boot_cpu_has(X86_FEATURE_IBT)
>>
>> ? If so, we could also address this by declaring a software-defined
>> X86_FEATURE_CET and then setti
On 11/30/20 7:25 AM, Lai Jiangshan wrote:
> The commit 825d0b73cd752("x86/mm/pti: Handle unaligned address gracefully
> in pti_clone_pagetable()") handles unaligned address well for unmapped
> PUD/PMD etc. But unaligned addresses for pmd_large() or PTI_CLONE_PMD also
> need to be handled.
That 82
On 11/30/20 10:06 AM, Yu, Yu-cheng wrote:
>>> + if (!boot_cpu_has(X86_FEATURE_SHSTK) &&
>>> + !boot_cpu_has(X86_FEATURE_IBT))
>>> + xfeatures_mask_all &= ~BIT_ULL(i);
>>> + } else {
>>> + if ((xsave_cpuid_features[i] == -1) ||
>>
>> Where d
On 11/25/20 9:32 PM, Huang Ying wrote:
> --- a/man2/set_mempolicy.2
> +++ b/man2/set_mempolicy.2
> @@ -113,6 +113,11 @@ A nonempty
> .I nodemask
> specifies node IDs that are relative to the set of
> node IDs allowed by the process's current cpuset.
> +.TP
> +.BR MPOL_F_NUMA_BALANCING " (since L
On 11/10/20 8:21 AM, Yu-cheng Yu wrote:
> Control-flow Enforcement Technology (CET) adds five MSRs. Introduce
> them and their XSAVES supervisor states:
>
> MSR_IA32_U_CET (user-mode CET settings),
> MSR_IA32_PL3_SSP (user-mode Shadow Stack pointer),
> MSR_IA32_PL0_SSP (kernel-mode Sh
>>> {
>>> - int cpu, hk_flags;
>>> + static DEFINE_SPINLOCK(spread_lock);
>>> + static bool used[MAX_NUMNODES];
>>
>> I thought I mentioned this last time. How large is this array? How
>> large would it be if it were a nodemask_t? Would this be less code if
>
> Apologies that I forgot to
On 11/30/20 7:25 AM, Lai Jiangshan wrote:
> --- a/arch/x86/mm/pti.c
> +++ b/arch/x86/mm/pti.c
> @@ -321,10 +321,10 @@ pti_clone_pgtable(unsigned long start, unsigned long
> end,
> break;
>
> pgd = pgd_offset_k(addr);
> - if (WARN_ON(pgd_none(*pgd))
On 11/21/20 7:12 AM, Dr. Greg wrote:
>> Important Kernel Touch Points
>> =
>>
>> This implementation is picky and will decline to work on hardware which
>> is locked to Intel's root of trust.
> Given that this driver is no longer locked to the Intel trust root, by
> virt
On 11/17/20 6:54 PM, Shaokun Zhang wrote:
> From: Yuqi Jin
>
> In multi-processor and NUMA systems, an I/O driver will find the CPU cores
> to which an IRQ shall be bound. When the CPU cores in the local NUMA node
> have been used up, it is better to find the node closest to the local NUMA
> node for performance,
The following commit has been merged into the x86/sgx branch of tip:
Commit-ID: 67655b57f8f59467506463055d9a8398d2836377
Gitweb:
https://git.kernel.org/tip/67655b57f8f59467506463055d9a8398d2836377
Author: Dave Hansen
AuthorDate: Mon, 16 Nov 2020 14:25:31 -08:00
Committer
On 11/16/20 9:54 AM, Dave Hansen wrote:
>> ENCLS instructions must be serialized for a given enclave, but holding
>> encl->lock for an entire ioctl() will result in deadlock due to an enclave
>> triggering reclaim on itself.
>>
>> Building an enclave must also be
From: Dave Hansen
Short Version:
The SGX section->laundry_list structure is effectively thread-local,
but declared next to some shared structures. Its semantics are clear
as mud. Fix that. No functional changes. Compile tested only.
Long Version:
The SGX hardware keeps per-page metad
On 11/14/20 8:01 PM, Hillf Danton wrote:
> On Fri, 13 Nov 2020 00:01:22 +0200 Jarkko Sakkinen wrote:
> +
> +static unsigned long sgx_get_unmapped_area(struct file *file,
> +unsigned long addr,
> +unsigned long len,
> +
On 11/14/20 12:42 AM, Hillf Danton wrote:
> On Fri, 13 Nov 2020 00:01:16 +0200 Jarkko Sakkinen wrote:
>> + */
>> +static void sgx_sanitize_section(struct sgx_epc_section *section)
>> +{
>> +struct sgx_epc_page *page;
>> +LIST_HEAD(dirty);
>> +int ret;
>> +
>> +while (!list_empty(&sectio
Hillf, I noticed that you removed a bunch of folks from cc, including
me. Was there a reason for that? I haven't been seeing your feedback
on these patches at all.
On 11/14/20 8:40 PM, Hillf Danton wrote:
> On Fri, 13 Nov 2020 00:01:23 +0200 Jarkko Sakkinen wrote:
>> +long sgx_ioctl(struct file
On 11/16/20 8:55 AM, Borislav Petkov wrote:
> On Fri, Nov 13, 2020 at 12:01:11AM +0200, Jarkko Sakkinen wrote:
>> Sean Christopherson is a major contributor to this series. However, he
>> has left Intel and his @intel.com address will soon be bouncing. He
>> does not have an email he wants us to s
On 11/16/20 8:32 AM, Matthew Wilcox wrote:
>>
>> That's really the best we can do from software without digging into
>> microarchitecture-specific events.
> I mean this is perf. Digging into microarch specific events is what it
> does ;-)
Yeah, totally.
But, if we see a bunch of 4k TLB hit event
On 11/16/20 7:54 AM, Matthew Wilcox wrote:
> It gets even more complicated with CPUs with multiple levels of TLB
> which support different TLB entry sizes. My CPU reports:
>
> TLB info
> Instruction TLB: 2M/4M pages, fully associative, 8 entries
> Instruction TLB: 4K pages, 8-way associative, 6