Re: [PATCH] x86, sched: Allow NUMA nodes to share an LLC on Intel platforms

2021-02-10 Thread Dave Hansen
On 2/10/21 12:10 AM, Peter Zijlstra wrote: > On Tue, Feb 09, 2021 at 11:09:27PM +, Luck, Tony wrote: >>> +#define X86_BUG_NUMA_SHARES_LLC X86_BUG(25) /* CPU may >>> enumerate an LLC shared by multiple NUMA nodes */ >> >> During internal review I wondered why this is a "BUG" rather t

Re: [PATCH 6/7] x86/boot/compressed/64: Check SEV encryption in 32-bit boot-path

2021-02-10 Thread Dave Hansen
On 2/10/21 2:21 AM, Joerg Roedel wrote: > + /* Store to memory and keep it in the registers */ > + movl %eax, rva(sev_check_data)(%ebp) > + movl %ebx, rva(sev_check_data+4)(%ebp) > + > + /* Enable paging to see if encryption is active */ > + movl %cr0, %edx /* Back

Re: [PATCH 6/7] x86/boot/compressed/64: Check SEV encryption in 32-bit boot-path

2021-02-10 Thread Dave Hansen
On 2/10/21 2:21 AM, Joerg Roedel wrote: > +1: rdrand %eax > + jnc 1b > +2: rdrand %ebx > + jnc 2b > + > + /* Store to memory and keep it in the registers */ > + movl %eax, rva(sev_check_data)(%ebp) > + movl %ebx, rva(sev_check_data+4)(%ebp) > + > + /* Ena

Re: [PATCH] x86, sched: Allow NUMA nodes to share an LLC on Intel platforms

2021-02-10 Thread Dave Hansen
On 2/10/21 12:05 AM, Peter Zijlstra wrote: >> + if (IS_ENABLED(CONFIG_NUMA)) >> + set_cpu_bug(c, X86_BUG_NUMA_SHARES_LLC); >> } > This seems wrong too, it shouldn't be allowed pre SKX. And ideally only > be allowed when SNC is enabled. Originally, this just added a few more models t

Re: [RFC][PATCH 06/13] mm/migrate: update migration order during on hotplug events

2021-02-09 Thread Dave Hansen
On 2/2/21 3:42 AM, Oscar Salvador wrote: >> +static int __meminit migrate_on_reclaim_callback(struct notifier_block >> *self, >> + unsigned long action, void >> *arg) >> +{ >> +switch (action) { >> +case MEM_GOING_OFFLINE: >> +/* >>

Re: [RFC][PATCH 07/13] mm/migrate: make migrate_pages() return nr_succeeded

2021-02-09 Thread Dave Hansen
On 1/29/21 1:04 PM, Yang Shi wrote: >> @@ -1527,7 +1527,7 @@ retry: >> nr_succeeded += nr_subpages; > It seems the above line is missed. The THP accounting change was > merged in v5.9 before I submitted this patch. Thanks for reporting that. Ying found and

Re: [PATCH V6] x86/mm: Tracking linear mapping split events

2021-02-08 Thread Dave Hansen
rnel's > direct mapping range. Looks fine to me: Acked-by: Dave Hansen

Re: [PATCH v1] kvm: x86: Revise guest_fpu xcomp_bv field

2021-02-08 Thread Dave Hansen
On 2/8/21 8:16 AM, Jing Liu wrote: > -#define XSTATE_COMPACTION_ENABLED (1ULL << 63) > - > static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu) > { > struct xregs_state *xsave = &vcpu->arch.guest_fpu->state.xsave; > @@ -4494,7 +4492,8 @@ static void load_xsave(struct kvm_vcpu *vcpu, u8 *

Re: [PATCH v8] x86/sgx: Maintain encl->refcount for each encl->mm_list entry

2021-02-07 Thread Dave Hansen
> This has been shown in tests: > > [ +0.08] WARNING: CPU: 3 PID: 7620 at kernel/rcu/srcutree.c:374 > cleanup_srcu_struct+0xed/0x100 > > This is essentially a use-after free, although SRCU notices it as > an SRCU cleanup in an invalid context. ... Acked-by: Dave Hansen

Re: [GIT PULL] x86/urgent for v5.11-rc7

2021-02-07 Thread Dave Hansen
On 2/7/21 12:44 PM, Alexei Starovoitov wrote: >>> It probably is an item on some Intel manager's to-enable list. So far, >>> the CET enablement concentrates only on userspace but dhansen might know >>> more about future plans. CCed. >> It's definitely on our radar to look at after CET userspace. >

Re: [RFC v1 09/26] x86/tdx: Handle CPUID via #VE

2021-02-07 Thread Dave Hansen
On 2/7/21 12:29 PM, Kirill A. Shutemov wrote: >> Couldn't you just have one big helper that takes *all* the registers >> that get used in any TDVMCALL and sets all the rcx bits? The users >> could just pass 0's for the things they don't use. >> >> Then you've got the ugly inline asm in one place.

Re: [GIT PULL] x86/urgent for v5.11-rc7

2021-02-07 Thread Dave Hansen
On 2/7/21 10:15 AM, Linus Torvalds wrote: > On Sun, Feb 7, 2021 at 9:58 AM Borislav Petkov wrote: >> It probably is an item on some Intel manager's to-enable list. So far, >> the CET enablement concentrates only on userspace but dhansen might know >> more about future plans. CCed. > I think the ne

Re: [GIT PULL] x86/urgent for v5.11-rc7

2021-02-07 Thread Dave Hansen
On 2/7/21 9:58 AM, Borislav Petkov wrote: > On Sun, Feb 07, 2021 at 09:49:18AM -0800, Linus Torvalds wrote: >> On Sun, Feb 7, 2021 at 2:40 AM Borislav Petkov wrote: >>> - Disable CET instrumentation in the kernel so that gcc doesn't add >>> ENDBR64 to kernel code and thus confuse tracing. >> So th

Re: [RFC v1 09/26] x86/tdx: Handle CPUID via #VE

2021-02-07 Thread Dave Hansen
On 2/7/21 6:13 AM, Kirill A. Shutemov wrote: >>> + /* Allow to pass R10, R11, R12, R13, R14 and R15 down to the VMM */ >>> + rcx = BIT(10) | BIT(11) | BIT(12) | BIT(13) | BIT(14) | BIT(15); >>> + >>> + asm volatile(TDCALL >>> + : "=a"(ret), "=r"(r10), "=r"(r1
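
A minimal user-space sketch of the register mask quoted above, for readers who want to check the arithmetic. BIT() here is a local stand-in for the kernel macro, and tdvmcall_reg_mask() is a hypothetical helper name that does not come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's BIT() macro; this definition is an assumption. */
#define BIT(n) (UINT64_C(1) << (n))

/* Hypothetical helper: one mask bit per register number, R10 through R15. */
static uint64_t tdvmcall_reg_mask(void)
{
	uint64_t rcx = BIT(10) | BIT(11) | BIT(12) | BIT(13) | BIT(14) | BIT(15);

	/* Bits 10..15 are contiguous, so the combined mask is 0xfc00. */
	assert(rcx == UINT64_C(0xfc00));
	return rcx;
}
```

Building the mask in one place like this is in the spirit of the "one big helper" suggestion elsewhere in the thread, though the real helper's shape is not shown in the snippet.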

Re: [PATCH v4] x86: Remove unnecessary kmap() from sgx_ioc_enclave_init()

2021-02-05 Thread Dave Hansen
to get a page aligned kernel address to use. > > In addition add a comment to document the alignment requirements so that > others like myself don't attempt to 'fix' this again. Looks good: Acked-by: Dave Hansen

Re: [PATCH 2/2] x86/sgx: Maintain encl->refcount for each encl->mm_list entry

2021-02-05 Thread Dave Hansen
encl->refcount when encl_mm->encl is established. Release this reference when encl_mm is freed. This ensures that 'encl' outlives 'encl_mm'. Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer") Cc: Dave Hansen Signed-off-by: Jarkko Sakkinen --- b/arch/x86/kern

Re: [PATCH v19 08/25] x86/mm: Introduce _PAGE_COW

2021-02-04 Thread Dave Hansen
On 2/4/21 12:19 PM, Kees Cook wrote: >> (e) A page where the processor observed a Write=1 PTE, started a write, set >> Dirty=1, but then observed a Write=0 PTE. That's possible today, but >> will not happen on processors that support shadow stack. > What happens for "e" with/without CET? I

[tip: x86/urgent] x86/apic: Add extra serialization for non-serializing MSRs

2021-02-04 Thread tip-bot2 for Dave Hansen
The following commit has been merged into the x86/urgent branch of tip: Commit-ID: 25a068b8e9a4eb193d755d58efcb3c98928636e0 Gitweb: https://git.kernel.org/tip/25a068b8e9a4eb193d755d58efcb3c98928636e0 Author: Dave Hansen AuthorDate: Thu, 05 Mar 2020 09:47:08 -08:00 Committer

Re: [RFC][PATCH 2/2] x86: add extra serialization for non-serializing MSRs

2021-02-04 Thread Dave Hansen
... > Reported-by: Jan Kiszka > Cc: x...@kernel.org > Cc: Peter Zijlstra Don't know how I managed to miss it in the first place, but: Signed-off-by: Dave Hansen

Re: [PATCH v18 24/25] x86/cet/shstk: Add arch_prctl functions for shadow stack

2021-02-03 Thread Dave Hansen
On 2/3/21 1:54 PM, Yu, Yu-cheng wrote: > On 1/29/2021 10:56 AM, Yu, Yu-cheng wrote: >> On 1/29/2021 9:07 AM, Dave Hansen wrote: >>> On 1/27/21 1:25 PM, Yu-cheng Yu wrote: >>>> +    if (!IS_ENABLED(CONFIG_X86_CET)) >>>> +    return -EOPNOTSUPP; >

Re: [PATCH v5] x86/sgx: Fix use-after-free in sgx_mmu_notifier_release()

2021-02-03 Thread Dave Hansen
On 1/30/21 11:20 AM, Jarkko Sakkinen wrote: ... > An example scenario would be such that all removals "side-channel" through > the notifier callback. Then mmu_notifier_unregister() gets called > exactly zero times. No MMU notifier srcu sync would be then happening. > > NOTE: There's a bunch of other examp

Re: [PATCH] x86/fpu: Use consistent test for X86_FEATURE_XSAVES

2021-02-03 Thread Dave Hansen
On 2/3/21 3:23 AM, Borislav Petkov wrote: >> -/* >> - * 'XSAVES' implies two different things: >> - * 1. saving of supervisor/system state >> - * 2. using the compacted format >> - * >> - * Use this function when dealing with the compacted format so >> - * that it is obvious which aspect of 'XSAVES

Re: [RFC][PATCH 05/13] mm/numa: automatically generate node migration order

2021-02-02 Thread Dave Hansen
On 2/2/21 9:46 AM, Yang Shi wrote: > On Mon, Feb 1, 2021 at 11:13 AM Dave Hansen wrote: >> On 1/29/21 12:46 PM, Yang Shi wrote: >> ... >>>> int next_demotion_node(int node) >>>> { >>>> - return node_demotion[node]; >>>>

Re: [RFC][PATCH 08/13] mm/migrate: demote pages during reclaim

2021-02-02 Thread Dave Hansen
On 2/2/21 2:45 PM, Yang Shi wrote: >> Should we keep it simple for now and only try to demote those pages that are >> free of cpusets and memory policies? >> Actually, demoting those pages to a CPU or a NUMA node that does not fall >> into >> their set, would violate those constraints right? > Yes

Re: [RFC][PATCH 11/13] mm/vmscan: Consider anonymous pages without swap

2021-02-02 Thread Dave Hansen
On 2/2/21 10:56 AM, Yang Shi wrote: >> >> /* If we have no swap space, do not bother scanning anon pages. */ >> - if (!sc->may_swap || mem_cgroup_get_nr_swap_pages(memcg) <= 0) { >> + if (!sc->may_swap || !can_reclaim_anon_pages(memcg, pgdat->node_id)) >> { > Just one minor thi

Re: [PATCH] x86: Remove unnecessary kmap() from sgx_ioc_enclave_init()

2021-02-02 Thread Dave Hansen
On 2/1/21 5:37 PM, ira.we...@intel.com wrote: > kmap is inefficient and we are trying to reduce the usage in the kernel. > There is no readily apparent reason why the initp_page page needs to be > allocated and kmap'ed() but sigstruct needs to be page aligned and token > 512 byte aligned. Hi Ira,

Re: [RFC][PATCH 08/13] mm/migrate: demote pages during reclaim

2021-02-02 Thread Dave Hansen
On 2/2/21 10:22 AM, Yang Shi wrote: >> +static struct page *alloc_demote_page(struct page *page, unsigned long node) >> +{ >> +struct migration_target_control mtc = { >> + /* >> +* Fail quickly and quietly. Page will likely >> +* just be discar

Re: [PATCH v18 05/25] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states

2021-02-01 Thread Dave Hansen
On 2/1/21 3:05 PM, Yu, Yu-cheng wrote: >>> >> >> Wait a sec...  What about *THIS* series?  Will *THIS* series give us >> oopses when userspace blasts a new XSAVE buffer in with NT_X86_XSTATE? >> > > Fortunately, CET states are supervisor states.  NT_x86_XSTATE has only > user states. Ahhh, good p

Re: [PATCH v18 05/25] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states

2021-02-01 Thread Dave Hansen
On 2/1/21 2:43 PM, Yu, Yu-cheng wrote: > On 1/29/2021 2:53 PM, Dave Hansen wrote: >> On 1/29/21 2:35 PM, Yu, Yu-cheng wrote: >>>> Andy Cooper just mentioned on IRC about this nugget in the spec: >>>> >>>>  XRSTORS on CET state will do reserved bit a

Re: [PATCH v18 21/25] x86/cet/shstk: Handle signals for shadow stack

2021-02-01 Thread Dave Hansen
On 1/27/21 1:25 PM, Yu-cheng Yu wrote: > To deliver a signal, create a shadow stack restore token and put a restore > token and the signal restorer address on the shadow stack. For sigreturn, > verify the token and restore the shadow stack pointer. > > Introduce WRUSS, which is a kernel-mode inst

Re: [RFC][PATCH 05/13] mm/numa: automatically generate node migration order

2021-02-01 Thread Dave Hansen
On 1/29/21 12:46 PM, Yang Shi wrote: ... >> int next_demotion_node(int node) >> { >> - return node_demotion[node]; >> + /* >> +* node_demotion[] is updated without excluding >> +* this function from running. READ_ONCE() avoids >> +* reading multiple, inconsist
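
A rough user-space sketch of the single-read lookup discussed above. The table size, its contents and read_once_int() are assumptions made for illustration; the kernel uses its own READ_ONCE() macro, which is only approximated here by a volatile access.

```c
#include <assert.h>

#define MAX_NODES_SKETCH 4      /* hypothetical table size */
#define NO_NODE_SKETCH   (-1)   /* hypothetical "no demotion target" marker */

/* Illustrative table: node 0 demotes to 1, node 1 to 2, the rest to nothing. */
static int node_demotion[MAX_NODES_SKETCH] = {
	1, 2, NO_NODE_SKETCH, NO_NODE_SKETCH
};

/*
 * Stand-in for READ_ONCE(): the volatile access makes the compiler perform
 * exactly one load, which it may not duplicate or elide.
 */
static int read_once_int(const int *p)
{
	return *(const volatile int *)p;
}

/* Look up the demotion target with a single read of the table entry. */
static int next_demotion_node(int node)
{
	assert(node >= 0 && node < MAX_NODES_SKETCH);
	return read_once_int(&node_demotion[node]);
}
```

The point of the single read is that a caller acts on one snapshot of the entry even if a writer updates the table concurrently; it says nothing about whether that snapshot is the newest value.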

Re: [RFC][PATCH 04/13] mm/numa: node demotion data structure and lookup

2021-02-01 Thread Dave Hansen
On 1/30/21 5:19 PM, David Rientjes wrote: > On Mon, 25 Jan 2021, Dave Hansen wrote: > >> diff -puN mm/migrate.c~0006-node-Define-and-export-memory-migration-path >> mm/migrate.c >> --- a/mm/migrate.c~0006-node-Define-and-export-memory-migration-path >> 2021-

Re: [PATCH v18 05/25] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states

2021-01-31 Thread Dave Hansen
On 1/29/21 2:35 PM, Yu, Yu-cheng wrote: >> Andy Cooper just mentioned on IRC about this nugget in the spec: >> >> XRSTORS on CET state will do reserved bit and canonicality >> checks on the state in similar manner as done by the WRMSR to >> these state elements. >> >> We're using copy_k

Re: [NEEDS-REVIEW] [PATCH v18 05/25] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states

2021-01-29 Thread Dave Hansen
On 1/27/21 1:25 PM, Yu-cheng Yu wrote: > @@ -135,6 +135,8 @@ enum xfeature { > #define XFEATURE_MASK_PT (1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR) > #define XFEATURE_MASK_PKRU (1 << XFEATURE_PKRU) > #define XFEATURE_MASK_PASID (1 << XFEATURE_PASID) > +#define XFEATURE

Re: [PATCH v18 02/25] x86/cet/shstk: Add Kconfig option for user-mode control-flow protection

2021-01-29 Thread Dave Hansen
On 1/29/21 11:58 AM, Andy Lutomirski wrote: >> Did any CPUs ever get released that have this? If so, name them. If >> not, time to change this to 2021, I think. > Zen 3 :) In that case is there any reason to keep the "depends on CPU_SUP_INTEL"?

Re: [PATCH v18 02/25] x86/cet/shstk: Add Kconfig option for user-mode control-flow protection

2021-01-29 Thread Dave Hansen
On 1/27/21 1:25 PM, Yu-cheng Yu wrote: > + help > + Control-flow protection is a hardware security hardening feature > + that detects function-return address or jump target changes by > + malicious code. It's not really one feature. I also think it's not worth talking about

Re: [PATCH v18 24/25] x86/cet/shstk: Add arch_prctl functions for shadow stack

2021-01-29 Thread Dave Hansen
On 1/29/21 10:56 AM, Yu, Yu-cheng wrote: > On 1/29/2021 9:07 AM, Dave Hansen wrote: >> On 1/27/21 1:25 PM, Yu-cheng Yu wrote: >>> +    u64 buf[3] = {0, 0, 0}; Doesn't the compiler zero these if you initialize it to anything? In other words, doesn't t
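
A short standalone check of the initializer question raised above, as a sketch only. In C, elements without an explicit initializer are zeroed when at least one initializer is given, so a single 0 is enough. The function names are hypothetical and uint64_t stands in for the kernel's u64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if every element of p[0..n-1] is zero, 0 otherwise. */
static int all_zero(const uint64_t *p, size_t n)
{
	assert(p != NULL);
	for (size_t i = 0; i < n; i++) {
		if (p[i] != 0)
			return 0;
	}
	return 1;
}

/* One explicit initializer; the remaining two elements are zeroed by rule. */
static int short_initializer_zeroes_rest(void)
{
	uint64_t buf[3] = {0};

	return all_zero(buf, 3);
}
```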

Re: [PATCH v18 24/25] x86/cet/shstk: Add arch_prctl functions for shadow stack

2021-01-29 Thread Dave Hansen
On 1/27/21 1:25 PM, Yu-cheng Yu wrote: > arch_prctl(ARCH_X86_CET_STATUS, u64 *args) > Get CET feature status. > > The parameter 'args' is a pointer to a user buffer. The kernel returns > the following information: > > *args = shadow stack/IBT status > *(args + 1) = shadow sta

Re: [PATCH V5] x86/mm: Tracking linear mapping split events

2021-01-28 Thread Dave Hansen
On 1/28/21 8:33 AM, Zi Yan wrote: >> One of the many lasting (as we don't coalesce back) sources for >> huge page splits is tracing as the granular page >> attribute/permission changes would force the kernel to split code >> segments mapped to huge pages to smaller ones thereby increasing >> the pr

Re: [PATCH v5] x86/sgx: Fix use-after-free in sgx_mmu_notifier_release()

2021-01-28 Thread Dave Hansen
On 1/28/21 4:58 AM, Jarkko Sakkinen wrote: > The most trivial example of a race condition can be demonstrated by this > sequence where mm_list contains just one entry: > > CPU A CPU B > -> sgx_release() > -> sgx_mmu_notifier_release() >

Re: [PATCH V3] x86/mm: Tracking linear mapping split events

2021-01-27 Thread Dave Hansen
On 1/27/21 2:50 PM, Saravanan D wrote: > +#if defined(__x86_64__) We don't use __x86_64__ in the kernel. This should be CONFIG_X86. > +#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE) > + "direct_map_2M_splits", > +#else > + "direct_map_4M_splits", > +#endif > + "direct_map_1G_s

Re: [PATCH V2] x86/mm: Tracking linear mapping split events

2021-01-27 Thread Dave Hansen
On 1/27/21 1:03 PM, Tejun Heo wrote: >> The lifetime split event information will be displayed at the bottom of >> /proc/vmstat >> >> swap_ra 0 >> swap_ra_hit 0 >> direct_map_2M_splits 139 >> direct_map_4M_splits 0 >> direct_map_1G_splits 7 >> nr_unstable 0 >> > > This looks great to me.

[RFC][PATCH 09/13] mm/vmscan: add page demotion counter

2021-01-26 Thread Dave Hansen
ges() ] Signed-off-by: Yang Shi Signed-off-by: Dave Hansen Cc: David Rientjes Cc: Huang Ying Cc: Dan Williams Cc: David Hildenbrand Cc: osalvador -- Changes since 202010: * remove unused scan-control 'demoted' field --- b/include/linux/vm_event_item.h |2 ++

[RFC][PATCH 10/13] mm/vmscan: add helper for querying ability to age anonymous pages

2021-01-26 Thread Dave Hansen
From: Dave Hansen Anonymous pages are kept on their own LRU(s). These lists could theoretically always be scanned and maintained. But, without swap, there is currently nothing the kernel can *do* with the results of a scanned, sorted LRU for anonymous pages. A check for '!total_swap_

[RFC][PATCH 02/13] mm/vmscan: move RECLAIM* bits to uapi header

2021-01-26 Thread Dave Hansen
From: Dave Hansen It is currently not obvious that the RECLAIM_* bits are part of the uapi since they are defined in vmscan.c. Move them to a uapi header to make it obvious. This should have no functional impact. Signed-off-by: Dave Hansen Reviewed-by: Ben Widawsky Acked-by: David

[RFC][PATCH 00/13] [v5] Migrate Pages in lieu of discard

2021-01-26 Thread Dave Hansen
The full series is also available here: https://github.com/hansendc/linux/tree/automigrate-20210122 The meat of this patch is in: [PATCH 08/13] mm/migrate: demote pages during reclaim Which also has the most changes since the last post. This version is mostly to address revie

[RFC][PATCH 12/13] mm/vmscan: never demote for memcg reclaim

2021-01-26 Thread Dave Hansen
From: Dave Hansen Global reclaim aims to reduce the amount of memory used on a given node or set of nodes. Migrating pages to another node serves this purpose. memcg reclaim is different. Its goal is to reduce the total memory consumption of the entire memcg, across all nodes. Migration

[RFC][PATCH 04/13] mm/numa: node demotion data structure and lookup

2021-01-26 Thread Dave Hansen
From: Dave Hansen Prepare for the kernel to auto-migrate pages to other memory nodes with a user defined node migration table. This allows creating single migration target for each NUMA node to enable the kernel to do NUMA page migrations instead of simply reclaiming colder pages. A node with

[RFC][PATCH 13/13] mm/migrate: new zone_reclaim_mode to enable reclaim migration

2021-01-26 Thread Dave Hansen
From: Dave Hansen Some method is obviously needed to enable reclaim-based migration. Just like traditional autonuma, there will be some workloads that will benefit like workloads with more "static" configurations where hot pages stay hot and cold pages stay cold. If pages come a

[RFC][PATCH 01/13] mm/vmscan: restore zone_reclaim_mode ABI

2021-01-26 Thread Dave Hansen
From: Dave Hansen I went to go add a new RECLAIM_* mode for the zone_reclaim_mode sysctl. Like a good kernel developer, I also went to go update the documentation. I noticed that the bits in the documentation didn't match the bits in the #defines. The VM never explicitly check

Re: [PATCH] x86/mm: Tracking linear mapping split events since boot

2021-01-26 Thread Dave Hansen
On 1/25/21 4:53 PM, Tejun Heo wrote: >> This would be a lot more useful if you could reset the counters. Then >> just reset them from userspace at boot. Adding read-write debugfs >> exports for these should be pretty trivial. > While this would work for hands-on cases, I'm a bit worried that this

Re: [PATCH] x86/mm: Tracking linear mapping split events since boot

2021-01-26 Thread Dave Hansen
On 1/25/21 12:32 PM, Tejun Heo wrote: > On Mon, Jan 25, 2021 at 12:15:51PM -0800, Dave Hansen wrote: >>> DirectMap4k: 3505112 kB >>> DirectMap2M: 19464192 kB >>> DirectMap1G: 12582912 kB >>> DirectMap2MSplits: 1705 >>> DirectMap1GSplits:

[RFC][PATCH 11/13] mm/vmscan: Consider anonymous pages without swap

2021-01-26 Thread Dave Hansen
ion rename] Signed-off-by: Vishal Verma Signed-off-by: Dave Hansen Cc: Yang Shi Cc: David Rientjes Cc: Huang Ying Cc: Dan Williams Cc: David Hildenbrand Cc: osalvador -- Changes from Dave 10/2020: * remove 'total_swap_pages' modification Changes from Dave 06/2020: * rename recla

[RFC][PATCH 08/13] mm/migrate: demote pages during reclaim

2021-01-26 Thread Dave Hansen
From: Dave Hansen This is mostly derived from a patch from Yang Shi: https://lore.kernel.org/linux-mm/1560468577-101178-10-git-send-email-yang@linux.alibaba.com/ Add code to the reclaim path (shrink_page_list()) to "demote" data to another NUMA node instead of disc

[RFC][PATCH 03/13] mm/vmscan: replace implicit RECLAIM_ZONE checks with explicit checks

2021-01-25 Thread Dave Hansen
From: Dave Hansen RECLAIM_ZONE was assumed to be unused because it was never explicitly used in the kernel. However, there were a number of places where it was checked implicitly by checking 'node_reclaim_mode' for a zero value. These zero checks are not great because it is not ob
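
To illustrate why a bare zero check is fragile, here is a hedged sketch. The three RECLAIM_* bit values and the extra bit are assumptions made for the example, and both function names are hypothetical; none of this is taken from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values assumed for illustration only. */
#define RECLAIM_ZONE          (1 << 0)
#define RECLAIM_WRITE         (1 << 1)
#define RECLAIM_UNMAP         (1 << 2)
#define RECLAIM_NEWBIT_SKETCH (1 << 3)  /* hypothetical bit added later */

/* Implicit form: any nonzero mode counts as "reclaim enabled". */
static bool reclaim_enabled_implicit(int mode)
{
	return mode != 0;
}

/* Explicit form: only the named reclaim bits count. */
static bool reclaim_enabled_explicit(int mode)
{
	const int mask = RECLAIM_ZONE | RECLAIM_WRITE | RECLAIM_UNMAP;

	/* The new bit must not overlap the bits being tested. */
	assert((mask & RECLAIM_NEWBIT_SKETCH) == 0);
	return (mode & mask) != 0;
}
```

With only the three original bits in play the two forms agree. They diverge as soon as an unrelated bit is set on its own, which is the case the explicit check handles.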

[RFC][PATCH 06/13] mm/migrate: update migration order during on hotplug events

2021-01-25 Thread Dave Hansen
From: Dave Hansen Reclaim-based migration is attempting to optimize data placement in memory based on the system topology. If the system changes, so must the migration ordering. The implementation here is pretty simple and entirely unoptimized. On any memory or CPU hotplug events, assume

[RFC][PATCH 07/13] mm/migrate: make migrate_pages() return nr_succeeded

2021-01-25 Thread Dave Hansen
account how many pages are reclaimed (demoted) since page reclaim behavior depends on this. Add *nr_succeeded parameter to make migrate_pages() return how many pages are demoted successfully for all cases. Signed-off-by: Yang Shi Signed-off-by: Dave Hansen Cc: David Rientjes Cc: Huang Ying Cc

[RFC][PATCH 05/13] mm/numa: automatically generate node migration order

2021-01-25 Thread Dave Hansen
From: Dave Hansen When memory fills up on a node, memory contents can be automatically migrated to another node. The biggest problems are knowing when to migrate and to where the migration should be targeted. The most straightforward way to generate the "to where" list would be to

Re: [PATCH] x86/mm: Tracking linear mapping split events since boot

2021-01-25 Thread Dave Hansen
On 1/25/21 12:11 PM, Saravanan D wrote: > Numerous hugepage splits in the linear mapping would give > admins the signal to narrow down the sluggishness caused by TLB > miss/reload. > > One of the many lasting (as we don't coalesce back) sources for huge page > splits is tracing as the granular pag

Re: [PATCH v17 08/26] x86/mm: Introduce _PAGE_COW

2021-01-21 Thread Dave Hansen
> Usually, the compiler is better at making code efficient than humans. I > find that coding it in the most human-readable way is best unless I > *know* the compiler is unable to generate god code. "good code", even. I really want a "god code" compiler, though. :)

Re: [PATCH v17 08/26] x86/mm: Introduce _PAGE_COW

2021-01-21 Thread Dave Hansen
On 1/21/21 12:16 PM, Yu, Yu-cheng wrote: > >>> @@ -343,6 +349,16 @@ static inline pte_t pte_mkold(pte_t pte) >>>     static inline pte_t pte_wrprotect(pte_t pte) >>>   { >>> +    /* >>> + * Blindly clearing _PAGE_RW might accidentally create >>> + * a shadow stack PTE (RW=0, Dirty=1).  Mov

Re: [PATCH] x86/fpu/xstate: calculate the number by sizeof and offsetof

2021-01-19 Thread Dave Hansen
On 1/19/21 10:44 PM, Yejune Deng wrote: > In fpstate_sanitize_xstate(), use memset and offsetof instead of '= 0', > and use sizeof instead of a constant. What's the benefit to doing this? Saving 4 lines of code? Your suggestions are not obviously wrong at a glance, but they're also not obviously
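
For comparison, a sketch of the two styles being weighed above, using a made-up structure. struct fx_sketch and its field layout are assumptions and do not reproduce the kernel's FXSAVE structure; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical structure; not the kernel's layout. */
struct fx_sketch {
	uint16_t cwd, swd, twd, fop;
	uint64_t rip, rdp;
	uint32_t mxcsr;
};

/* Style 1: clear each field by name. */
static void clear_by_assignment(struct fx_sketch *fx)
{
	fx->swd = 0;
	fx->twd = 0;
	fx->fop = 0;
	fx->rip = 0;
	fx->rdp = 0;
}

/* Style 2: clear the byte range from swd up to, but not including, mxcsr. */
static void clear_by_memset(struct fx_sketch *fx)
{
	size_t start = offsetof(struct fx_sketch, swd);
	size_t end = offsetof(struct fx_sketch, mxcsr);

	assert(end > start);
	memset((unsigned char *)fx + start, 0, end - start);
}

/* Return 1 if both styles leave the same field values behind. */
static int both_forms_agree(void)
{
	struct fx_sketch a, b;

	memset(&a, 0xff, sizeof(a));
	memset(&b, 0xff, sizeof(b));
	clear_by_assignment(&a);
	clear_by_memset(&b);

	return a.cwd == b.cwd && a.swd == b.swd && a.twd == b.twd &&
	       a.fop == b.fop && a.rip == b.rip && a.rdp == b.rdp &&
	       a.mxcsr == b.mxcsr &&
	       a.cwd == 0xffff && a.mxcsr == 0xffffffffu;
}
```

The memset form depends on field order staying put, while the assignment form names what it clears.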

Re: [RFC V1 3/7] crypto: ghash - Optimized GHASH computations

2021-01-15 Thread Dave Hansen
On 1/15/21 6:04 PM, Eric Biggers wrote: > On Fri, Jan 15, 2021 at 04:20:44PM -0800, Dave Hansen wrote: >> On 1/15/21 4:14 PM, Dey, Megha wrote: >>> Also, I do not know of any cores that implement PCLMULQDQ and not AES-NI. >> That's true, but it's also possible

Re: [RFC V1 3/7] crypto: ghash - Optimized GHASH computations

2021-01-15 Thread Dave Hansen
On 1/15/21 4:14 PM, Dey, Megha wrote: > Also, I do not know of any cores that implement PCLMULQDQ and not AES-NI. That's true, but it's also possible that a hypervisor could enumerate support for PCLMULQDQ and not AES-NI. In general, we've tried to implement x86 CPU features independently, even i

Re: [PATCH RFC] x86/sgx: Add trivial NUMA allocation

2021-01-14 Thread Dave Hansen
On 1/14/21 9:54 AM, Jarkko Sakkinen wrote: > On Tue, Jan 12, 2021 at 04:24:01PM -0800, Dave Hansen wrote: >> We need a bit more information here as well. What's the relationship >> between NUMA nodes and sections? How does the BIOS tell us which NUMA >> nodes a section

Re: [PATCH RFC] x86/sgx: Add trivial NUMA allocation

2021-01-12 Thread Dave Hansen
On 12/16/20 5:50 AM, Jarkko Sakkinen wrote: > Create a pointer array for each NUMA node with the references to the > contained EPC sections. Use this in __sgx_alloc_epc_page() to knock the > current NUMA node before the others. It makes it harder to comment when I'm not on cc. Hint, hint... ;) W

[PATCH] x86/sgx: rename and document SGX bit lock

2021-01-12 Thread Dave Hansen
SGX ioctl() calls are serialized with a lock. It's a weird open-coded lock that is not even called a "lock". That makes it a weird beast, but Sean has convinced me it's a good idea without better alternatives. Give the lock bit a better name, and document what it is actually trying to do. Cc: Se

Re: [PATCH v1 2/3] x86/cpu: Set low performance CRC32C flag on some Zhaoxin CPUs

2021-01-07 Thread Dave Hansen
On 1/6/21 10:19 PM, Tony W Wang-oc wrote: > + /* > + * These CPUs declare support SSE4.2 instruction sets but > + * having low performance CRC32C instruction implementation. > + */ > + if (c->x86 == 0x6 || (c->x86 == 0x7 && c->x86_model <= 0x3b)) > + set_cpu_cap(c

Re: [PATCH 2/2] x86/mm: Remove duplicate definition of _PAGE_PAT_LARGE

2021-01-05 Thread Dave Hansen
were allowed. Guess you learn something new every day. This looks fine to me, it removes the exact dup that Ingo appears to have added. Acked-by: Dave Hansen

Re: [PATCH] x86/mm: Fix leak of pmd ptlock

2021-01-05 Thread Dave Hansen
estigation into why we're suddenly seeing this now. I agree that ridding ourselves of open-coded free_page()'s is a good idea, but this patch itself needs to be around for stable anyway. So, Acked-by: Dave Hansen

Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO

2021-01-04 Thread Dave Hansen
On 1/4/21 12:11 PM, David Hildenbrand wrote: >> Yeah, it certainly can't be the default, but it *is* useful for >> thing where we know that there are no cache benefits to zeroing >> close to where the memory is allocated. >> >> The trick is opting into it somehow, either in a process or a VMA. >>

Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO

2021-01-04 Thread Dave Hansen
On 1/4/21 11:27 AM, Matthew Wilcox wrote: > On Mon, Jan 04, 2021 at 11:19:13AM -0800, Dave Hansen wrote: >> On 12/21/20 8:30 AM, Liang Li wrote: >>> --- a/include/linux/page-flags.h >>> +++ b/include/linux/page-flags.h >>> @@ -137,6 +137,9 @@ enum pageflags {

Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO

2021-01-04 Thread Dave Hansen
On 12/21/20 8:30 AM, Liang Li wrote: > --- a/include/linux/page-flags.h > +++ b/include/linux/page-flags.h > @@ -137,6 +137,9 @@ enum pageflags { > #endif > #ifdef CONFIG_64BIT > PG_arch_2, > +#endif > +#ifdef CONFIG_PREZERO_PAGE > + PG_zero, > #endif > __NR_PAGEFLAGS, I don't t

Re: [PATCH V3 04/10] x86/pks: Preserve the PKRS MSR on context switch

2020-12-18 Thread Dave Hansen
On 12/18/20 11:42 AM, Ira Weiny wrote: > Another problem would be if the kmap and kunmap happened in different > contexts... :-/ I don't think that is done either but I don't know for > certain. It would be really nice to put together some surveillance patches to help become more certain about t

Re: [NEEDS-REVIEW] [PATCH V3 04/10] x86/pks: Preserve the PKRS MSR on context switch

2020-12-18 Thread Dave Hansen
On 12/17/20 8:10 PM, Ira Weiny wrote: > On Thu, Dec 17, 2020 at 12:41:50PM -0800, Dave Hansen wrote: >> On 11/6/20 3:29 PM, ira.we...@intel.com wrote: >>> void disable_TSC(void) >>> @@ -644,6 +668,8 @@ void __switch_to_xtra(struct task_struct *prev_p, >

Re: [NEEDS-REVIEW] [RFC PATCH 7/8] crypto: x86/aes-kl - Support AES algorithm using Key Locker instructions

2020-12-17 Thread Dave Hansen
On 12/16/20 9:41 AM, Chang S. Bae wrote: > +config CRYPTO_AES_KL > + tristate "AES cipher algorithms (AES-KL)" > + depends on X86_KEYLOCKER > + select CRYPTO_AES_NI_INTEL > + help > + Use AES Key Locker instructions for AES algorithm. > + > + AES cipher algorithms (FIPS-

Re: [PATCH V3 10/10] x86/pks: Add PKS test code

2020-12-17 Thread Dave Hansen
On 11/6/20 3:29 PM, ira.we...@intel.com wrote: > + /* Arm for context switch test */ > + write(fd, "1", 1); > + > + /* Context switch out... */ > + sleep(4); > + > + /* Check msr restored */ > + write(fd, "2", 1); These are al

Re: [NEEDS-REVIEW] [PATCH V3 04/10] x86/pks: Preserve the PKRS MSR on context switch

2020-12-17 Thread Dave Hansen
On 11/6/20 3:29 PM, ira.we...@intel.com wrote: > void disable_TSC(void) > @@ -644,6 +668,8 @@ void __switch_to_xtra(struct task_struct *prev_p, struct > task_struct *next_p) > > if ((tifp ^ tifn) & _TIF_SLD) > switch_to_sld(tifn); > + > + pks_sched_in(); > } Does the s

Re: [PATCH] x86/mm: increase pgt_buf size for 5-level page tables

2020-12-16 Thread Dave Hansen
INIT_PGD_PAGE_COUNT (2 * INIT_PGD_PAGE_TABLES) > #else > -#define INIT_PGD_PAGE_COUNT 12 > +#define INIT_PGD_PAGE_COUNT (4 * INIT_PGD_PAGE_TABLES) > #endif > + Lorenzo, thanks for the patch. That was a very nice changelog, and it all seems sane to me, especially with Kirill's ack. Acked-by: Dave Hansen

Re: [NEEDS-REVIEW] [RFC PATCH] do_exit(): panic() recursion detected

2020-12-07 Thread Dave Hansen
On 12/7/20 4:40 AM, Vladimir Kondratiev wrote: > Recursive do_exit() is symptom of compromised kernel integrity. > For safety critical systems, it may be better to > panic() in this case to minimize risk. This changelog is still woefully inadequate. It doesn't really describe the problem which is

Re: [NEEDS-REVIEW] [PATCH] do_exit(): panic() when double fault detected

2020-12-06 Thread Dave Hansen
On 12/6/20 5:10 AM, Vladimir Kondratiev wrote: > Double fault detected in do_exit() is symptom of integrity > compromised. For safety critical systems, it may be better to > panic() in this case to minimize risk. Does this fix a real problem that you have observed in practice? Or, is this a gener

Re: [PATCH v15 06/26] x86/mm: Change _PAGE_DIRTY to _PAGE_DIRTY_HW

2020-12-03 Thread Dave Hansen
On 12/3/20 1:19 AM, Borislav Petkov wrote: > On Tue, Nov 10, 2020 at 08:21:51AM -0800, Yu-cheng Yu wrote: >> Before introducing _PAGE_COW for non-hardware memory management purposes in >> the next patch, rename _PAGE_DIRTY to _PAGE_DIRTY_HW and _PAGE_BIT_DIRTY to >> _PAGE_BIT_DIRTY_HW to make meani

Re: [PATCH v15 03/26] x86/fpu/xstate: Introduce CET MSR XSAVES supervisor states

2020-12-01 Thread Dave Hansen
On 11/30/20 3:16 PM, Yu, Yu-cheng wrote: >> >> Do we have any other spots in the kernel where we care about: >> >> boot_cpu_has(X86_FEATURE_SHSTK) || >> boot_cpu_has(X86_FEATURE_IBT) >> >> ?  If so, we could also address this by declaring a software-defined >> X86_FEATURE_CET and then setti

Re: [PATCH 1/2] x86/mm/pti: Check unaligned address for pmd clone in pti_clone_pagetable()

2020-12-01 Thread Dave Hansen
On 11/30/20 7:25 AM, Lai Jiangshan wrote: > The commit 825d0b73cd752("x86/mm/pti: Handle unaligned address gracefully > in pti_clone_pagetable()") handles unaligned address well for unmapped > PUD/PMD etc. But unaligned address for pmd_large() or PTI_CLONE_PMD is also > needed to be aware. That 82

Re: [NEEDS-REVIEW] [PATCH v15 03/26] x86/fpu/xstate: Introduce CET MSR XSAVES supervisor states

2020-11-30 Thread Dave Hansen
On 11/30/20 10:06 AM, Yu, Yu-cheng wrote: >>> +    if (!boot_cpu_has(X86_FEATURE_SHSTK) && >>> +    !boot_cpu_has(X86_FEATURE_IBT)) >>> +    xfeatures_mask_all &= ~BIT_ULL(i); >>> +    } else { >>> +    if ((xsave_cpuid_features[i] == -1) || >> >> Where d

Re: [PATCH -v6 2/3] NOT kernel/man-pages man2/set_mempolicy.2: Add mode flag MPOL_F_NUMA_BALANCING

2020-11-30 Thread Dave Hansen
On 11/25/20 9:32 PM, Huang Ying wrote: > --- a/man2/set_mempolicy.2 > +++ b/man2/set_mempolicy.2 > @@ -113,6 +113,11 @@ A nonempty > .I nodemask > specifies node IDs that are relative to the set of > node IDs allowed by the process's current cpuset. > +.TP > +.BR MPOL_F_NUMA_BALANCING " (since L

Re: [NEEDS-REVIEW] [PATCH v15 03/26] x86/fpu/xstate: Introduce CET MSR XSAVES supervisor states

2020-11-30 Thread Dave Hansen
On 11/10/20 8:21 AM, Yu-cheng Yu wrote: > Control-flow Enforcement Technology (CET) adds five MSRs. Introduce > them and their XSAVES supervisor states: > > MSR_IA32_U_CET (user-mode CET settings), > MSR_IA32_PL3_SSP (user-mode Shadow Stack pointer), > MSR_IA32_PL0_SSP (kernel-mode Sh

Re: [PATCH v7] lib: optimize cpumask_local_spread()

2020-11-30 Thread Dave Hansen
>>> { >>> - int cpu, hk_flags; >>> + static DEFINE_SPINLOCK(spread_lock); >>> + static bool used[MAX_NUMNODES]; >> >> I thought I mentioned this last time. How large is this array? How >> large would it be if it were a nodemask_t? Would this be less code if > > Apologies that I forgot to

Re: [PATCH 2/2] x86/mm/pti: warn and stop when pti_clone_pagetable() is on 1G page

2020-11-30 Thread Dave Hansen
On 11/30/20 7:25 AM, Lai Jiangshan wrote: > --- a/arch/x86/mm/pti.c > +++ b/arch/x86/mm/pti.c > @@ -321,10 +321,10 @@ pti_clone_pgtable(unsigned long start, unsigned long > end, > break; > > pgd = pgd_offset_k(addr); > - if (WARN_ON(pgd_none(*pgd))

Re: [PATCH v40 00/24] Intel SGX foundations

2020-11-21 Thread Dave Hansen
On 11/21/20 7:12 AM, Dr. Greg wrote: >> Important Kernel Touch Points >> = >> >> This implementation is picky and will decline to work on hardware which >> is locked to Intel's root of trust. > Given that this driver is no longer locked to the Intel trust root, by > virt

Re: [PATCH v7] lib: optimize cpumask_local_spread()

2020-11-20 Thread Dave Hansen
On 11/17/20 6:54 PM, Shaokun Zhang wrote: > From: Yuqi Jin > > In multi-processor and NUMA system, I/O driver will find cpu cores that > which shall be bound IRQ. When cpu cores in the local numa have been > used up, it is better to find the node closest to the local numa node > for performance,

[tip: x86/sgx] x86/sgx: Clarify 'laundry_list' locking

2020-11-18 Thread tip-bot2 for Dave Hansen
The following commit has been merged into the x86/sgx branch of tip: Commit-ID: 67655b57f8f59467506463055d9a8398d2836377 Gitweb: https://git.kernel.org/tip/67655b57f8f59467506463055d9a8398d2836377 Author: Dave Hansen AuthorDate: Mon, 16 Nov 2020 14:25:31 -08:00 Committer

Re: [PATCH v41 12/24] x86/sgx: Add SGX_IOC_ENCLAVE_CREATE

2020-11-16 Thread Dave Hansen
On 11/16/20 9:54 AM, Dave Hansen wrote: >> ENCLS instructions must be serialized for a given enclave, but holding >> encl->lock for an entire ioctl() will result in deadlock due to an enclave >> triggering reclaim on itself. >> >> Building an enclave must also be

[PATCH] x86/sgx: clarify 'laundry_list' locking

2020-11-16 Thread Dave Hansen
From: Dave Hansen Short Version: The SGX section->laundry_list structure is effectively thread-local, but declared next to some shared structures. Its semantics are clear as mud. Fix that. No functional changes. Compile tested only. Long Version: The SGX hardware keeps per-page metad

Re: [PATCH v41 11/24] x86/sgx: Add SGX misc driver interface

2020-11-16 Thread Dave Hansen
On 11/14/20 8:01 PM, Hillf Danton wrote: > On Fri, 13 Nov 2020 00:01:22 +0200 Jarkko Sakkinen wrote: > + > +static unsigned long sgx_get_unmapped_area(struct file *file, > +unsigned long addr, > +unsigned long len, > +

Re: [PATCH v41 05/24] x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sections

2020-11-16 Thread Dave Hansen
On 11/14/20 12:42 AM, Hillf Danton wrote: > On Fri, 13 Nov 2020 00:01:16 +0200 Jarkko Sakkinen wrote: >> + */ >> +static void sgx_sanitize_section(struct sgx_epc_section *section) >> +{ >> +struct sgx_epc_page *page; >> +LIST_HEAD(dirty); >> +int ret; >> + >> +while (!list_empty(&sectio

Re: [PATCH v41 12/24] x86/sgx: Add SGX_IOC_ENCLAVE_CREATE

2020-11-16 Thread Dave Hansen
Hillf, I noticed that you removed a bunch of folks from cc, including me. Was there a reason for that? I haven't been seeing your feedback on these patches at all. On 11/14/20 8:40 PM, Hillf Danton wrote: > On Fri, 13 Nov 2020 00:01:23 +0200 Jarkko Sakkinen wrote: >> +long sgx_ioctl(struct file

Re: [PATCH v41 00/24] Intel SGX foundations

2020-11-16 Thread Dave Hansen
On 11/16/20 8:55 AM, Borislav Petkov wrote: > On Fri, Nov 13, 2020 at 12:01:11AM +0200, Jarkko Sakkinen wrote: >> Sean Christopherson is a major contributor to this series. However, he >> has left Intel and his @intel.com address will soon be bouncing. He >> does not have a email he wants us to s

Re: [PATCH 0/5] perf/mm: Fix PERF_SAMPLE_*_PAGE_SIZE

2020-11-16 Thread Dave Hansen
On 11/16/20 8:32 AM, Matthew Wilcox wrote: >> >> That's really the best we can do from software without digging into >> microarchitecture-specific events. > I mean this is perf. Digging into microarch specific events is what it > does ;-) Yeah, totally. But, if we see a bunch of 4k TLB hit event

Re: [PATCH 0/5] perf/mm: Fix PERF_SAMPLE_*_PAGE_SIZE

2020-11-16 Thread Dave Hansen
On 11/16/20 7:54 AM, Matthew Wilcox wrote: > It gets even more complicated with CPUs with multiple levels of TLB > which support different TLB entry sizes. My CPU reports: > > TLB info > Instruction TLB: 2M/4M pages, fully associative, 8 entries > Instruction TLB: 4K pages, 8-way associative, 6
