On 9/24/20 12:28 PM, Sean Christopherson wrote:
> On Thu, Sep 24, 2020 at 02:11:37PM -0500, Haitao Huang wrote:
>> On Wed, 23 Sep 2020 08:50:56 -0500, Jarkko Sakkinen
>> wrote:
>>> I'll categorically deny noexec in the next patch set version.
>>>
>>> /Jarkko
>> There are use cases supported curren
On 9/23/20 7:33 AM, Jarkko Sakkinen wrote:
> The consequence is that enclaves are best created with an ioctl API and the
> access control can be based only to the origin of the source file for the
> enclave data, i.e. on VMA file pointer and page permissions. For example,
> this could be done with
On 9/23/20 3:47 PM, Andy Lutomirski wrote:
> On Wed, Sep 23, 2020 at 3:20 PM Yu, Yu-cheng wrote:
>> On 9/23/2020 3:08 PM, Dave Hansen wrote:
>>> On 9/23/20 3:06 PM, Yu, Yu-cheng wrote:
>>>> I think I'll add a check here for (r + 8) >= TASK_SIZE_MAX.
From: Dave Hansen
Tianshu Qiu has three MAINTAINERS entries, and one typo. After being
notified of the typo a few months ago, they didn't act, so here's a
patch.
Tianshu, an ack would be appreciated.
Signed-off-by: Dave Hansen
Cc: Tianshu Qiu
Cc: Shawn Tu
Cc: Bingbu Cao
Cc
On 9/21/20 10:27 PM, Feng Tang wrote:
> +static void parse_text(void)
> +{
> + FILE *file;
> + char *line = NULL;
> + size_t len = 0;
> + int ret;
> +
> + file = fopen("cpuid.txt", "r");
> + if (!file) {
> + printf("Error in opening 'cpuid.txt'\n");
> +
On 9/22/20 5:58 AM, Jarkko Sakkinen wrote:
> Intel Software Guard eXtensions (SGX) allows creation of executable blobs
> called enclaves, of which page permissions are defined when the enclave
"of which" => "for which"
> is first loaded. Once an enclave is loaded and initialized, it can be
> mappe
On 9/21/20 3:30 PM, Yu, Yu-cheng wrote:
> +config X86_INTEL_BRANCH_TRACKING_USER
> +prompt "Intel Indirect Branch Tracking for user-mode"
Take the "Intel " and "INTEL_" out, please. It will only cause us all
pain later if some of our x86 compatriots decide to implement this.
> If the kernel
> Here is my CET slides for LPC 2020:
>
> https://gitlab.com/cet-software/cet-smoke-test/-/wikis/uploads/09431a51248858e6f716a59065d732e2/CET-LPC-2020.pdf
>
> which may have answers for most questions.
Hi H.J.,
I know you're not super familiar with our kernel development process,
which might be
On 9/18/20 2:06 PM, H.J. Lu wrote:
> On Fri, Sep 18, 2020 at 2:00 PM Pavel Machek wrote:
>> On Fri 2020-09-18 12:32:57, Dave Hansen wrote:
>>> On 9/18/20 12:23 PM, Yu-cheng Yu wrote:
>>>> Emulation of the legacy vsyscall page is required by some programs
>>&
On 9/18/20 12:23 PM, Yu-cheng Yu wrote:
> Emulation of the legacy vsyscall page is required by some programs
> built before 2013. Newer programs after 2013 don't use it.
> Disable vsyscall emulation when Control-flow Enforcement (CET) is
> enabled to enhance security.
How does this "enhance secur
On 9/17/20 2:58 PM, Liang, Kan wrote:
> The user space perf tool looks like a better place for this kind of
> warning. The perf tool knows the total number of the samples. It also
> knows the number of the page size 0 samples. We can set a threshold,
> e.g., 90%. If 90% of the samples have the page
On 9/17/20 6:52 AM, kan.li...@linux.intel.com wrote:
> + mm = current->mm;
> + if (!mm) {
> + /*
> + * For kernel threads and the like, use init_mm so that
> + * we can find kernel memory.
> + */
> + mm = &init_mm;
> + }
I
On 9/15/20 3:17 AM, Jarkko Sakkinen wrote:
> OK, spotted the regression, sorry about this. I'll fix it for v38, which
> I'm sending soon given the email server issues with v37.
I'm going to cry uncle on the mail quantity too. Someone is going to
think the mail relays are mining bitcoin.
Especial
On 9/15/20 12:08 PM, Yu-cheng Yu wrote:
> On Mon, 2020-09-14 at 17:12 -0700, Yu, Yu-cheng wrote:
>> On 9/14/2020 7:50 AM, Dave Hansen wrote:
>>> On 9/11/20 3:59 PM, Yu-cheng Yu wrote:
>>> ...
>>>> Here are the changes if we take the mprotect(PROT_SHSTK)
On 9/15/20 7:32 AM, Matthew Wilcox wrote:
> On Tue, Sep 15, 2020 at 08:59:23PM +0800, Muchun Song wrote:
>> This patch series will free some vmemmap pages(struct page structures)
>> associated with each hugetlbpage when preallocated to save memory.
> It would be lovely to be able to do this. Unfor
On 9/14/20 11:31 AM, Andy Lutomirski wrote:
> No matter what we do, the effects of calling vfork() are going to be a
> bit odd with SHSTK enabled. I suppose we could disallow this, but
> that seems likely to cause its own issues.
What's odd about it? If you're a vfork()'d child, you can't touch
On 9/11/20 3:59 PM, Yu-cheng Yu wrote:
...
> Here are the changes if we take the mprotect(PROT_SHSTK) approach.
> Any comments/suggestions?
I still don't like it. :)
I'll also be much happier when there's a proper changelog to accompany
this which also spells out the alternatives and why they suc
On 9/11/20 1:10 PM, Krish Sadhukhan wrote:
...
>>> +#define X86_FEATURE_HW_CACHE_COHERENCY (11*32+ 7) /* AMD
>>> hardware-enforced cache coherency */
>> That's an awfully generic name. We generally have "hardware-enforced
>> cache coherency" already everywhere. :)
>>
>> This probably needs to say
On 9/11/20 12:25 PM, Krish Sadhukhan wrote:
>
> diff --git a/arch/x86/include/asm/cpufeatures.h
> b/arch/x86/include/asm/cpufeatures.h
> index 81335e6fe47d..0e5b27ee5931 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -293,6 +293,7 @@
> #define X
On 9/11/20 3:09 AM, David Hildenbrand wrote:
> Maybe we can derive the actual DIMMs from some ACPI tables (SRAT?),
> instead of relying on e820/"System RAM resources" - I have no clue.
It's actually really hard to map a DIMM to a physical address.
Interleaving can mean that one page actually spans
On 9/10/20 3:20 AM, David Hildenbrand wrote:
> While I'd love to rip it out completely, I think it would break old
> lsmem/chmem completely - and I assume that's not acceptable. I was
> wondering what would be considered safe to do now/in the future:
>
> 1. Make it always return 0 (just as if "scl
On 9/10/20 3:20 AM, David Hildenbrand wrote:
> I was just exploring how /sys/devices/system/memory/memoryX/phys_device
> is/was used. It's one of these interfaces that most probably never
> should have been added but now we are stuck with it.
While I'm all for cleanups, what specific problems is p
On 9/9/20 4:25 PM, Yu, Yu-cheng wrote:
> On 9/9/2020 4:11 PM, Dave Hansen wrote:
>> On 9/9/20 4:07 PM, Yu, Yu-cheng wrote:
>>> What if a writable mapping is passed to madvise(MADV_SHSTK)? Should
>>> that be rejected?
>>
>> It doesn't matter to me.
On 9/9/20 4:07 PM, Yu, Yu-cheng wrote:
> What if a writable mapping is passed to madvise(MADV_SHSTK)? Should
> that be rejected?
It doesn't matter to me. Even if it's readable, it _stops_ being even
directly readable after it's a shadow stack, right? I don't think
writes are special in any way.
On 9/9/20 3:08 PM, Yu, Yu-cheng wrote:
> After looking at this more, I found the changes are more similar to
> mprotect() than madvise(). We are going to change an anonymous mapping
> to a read-only mapping, and add the VM_SHSTK flag to it. Would an
> x86-specific mprotect(PROT_SHSTK) make more s
On 9/9/20 5:29 AM, Gerald Schaefer wrote:
> This only works well as long there are real pagetable pointers involved,
> that can also be used for iteration. For gup_fast, or any other future
> pagetable walkers using the READ_ONCE logic w/o lock, that is not true.
> There are pointers involved to lo
On 9/7/20 11:00 AM, Gerald Schaefer wrote:
> Commit 1a42010cdc26 ("s390/mm: convert to the generic get_user_pages_fast
> code") introduced a subtle but severe bug on s390 with gup_fast, due to
> dynamic page table folding.
Would it be fair to say that the "fake" page table entries s390
allocates o
On 9/7/20 11:00 AM, Gerald Schaefer wrote:
> x86:
> add/remove: 0/0 grow/shrink: 2/0 up/down: 10/0 (10)
> Function old new delta
> vmemmap_populate 587 592 +5
> munlock_vma_pages_range 556 561
On 9/8/20 10:50 AM, Yu, Yu-cheng wrote:
> What about this:
>
> - Do not add any new syscall or arch_prctl for creating a new shadow stack.
>
> - Add a new arch_prctl that can turn an anonymous mapping to a shadow
> stack mapping.
>
> This allows the application to do whatever is necessary. It c
On 9/7/20 6:40 AM, Marco Elver wrote:
> +The most important parameter is KFENCE's sample interval, which can be set
> via
> +the kernel boot parameter ``kfence.sample_interval`` in milliseconds. The
> +sample interval determines the frequency with which heap allocations will be
> +guarded by KFENC
On 9/3/20 9:32 AM, Andy Lutomirski wrote:
>> Taking the config register out of the init state is illogical, as is
>> writing to SSP while the config register is in its init state.
> What's so special about the INIT state? It's optimized by XSAVES, but
> it's just a number, right? So taking the re
On 9/3/20 9:15 AM, Andy Lutomirski wrote:
> On Thu, Sep 3, 2020 at 9:12 AM Dave Hansen wrote:
>>
>> On 9/3/20 9:09 AM, Yu, Yu-cheng wrote:
>>> If the debugger is going to write an MSR, only in the third case would
>>> this make a slight sense. For example, if th
On 9/3/20 9:09 AM, Yu, Yu-cheng wrote:
> If the debugger is going to write an MSR, only in the third case would
> this make a slight sense. For example, if the system has CET enabled,
> but the task does not have CET enabled, and GDB is writing to a CET MSR.
> But still, this is strange to me.
I
On 9/2/20 9:35 PM, Andy Lutomirski wrote:
>> + fpu__prepare_read(fpu);
>> + cetregs = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
>> + if (!cetregs)
>> + return -EFAULT;
> Can this branch ever be hit without a kernel bug? If yes, I think
On 9/2/20 9:52 AM, Borislav Petkov wrote:
>> I was *really* hoping that we could eventually feed kcpuid and the
>> X86_FEATURE_* bits from the same source.
> But X86_FEATURE_* won't be all bits in all CPUID leafs - only the ones the
> kernel has enabled/use for/needs/...
>
> Also you have CPUID fi
On 9/2/20 9:45 AM, pet...@infradead.org wrote:
> On Thu, Aug 27, 2020 at 03:49:03PM +0800, Feng Tang wrote:
>> End users frequently want to know what features their processor
>> supports, independent of what the kernel supports.
>>
>> /proc/cpuinfo is great. It is omnipresent and since it is provid
On 9/2/20 8:40 AM, Borislav Petkov wrote:
> When you need to add a new leaf, you simply extend the text file and the
> tool parses it anew and has its all CPUID info uptodate. This way you
> won't even have to recompile it. Adding new CPUID leafs would be adding new
> lines to the file.
>
> For ex
On 9/1/20 10:45 AM, Andy Lutomirski wrote:
>>> For arm64 (and sparc etc.) we continue to use the regular mmap/mprotect
>>> family of calls. One or two additional arch-specific mmap flags are
>>> sufficient for now.
>>>
>>> Is x86 definitely not going to fit within those calls?
>> That can work for
On 8/29/20 3:46 AM, Shuo A Liu wrote:
> On Fri 28.Aug'20 at 12:25:59 +0200, Greg Kroah-Hartman wrote:
>> On Tue, Aug 25, 2020 at 10:45:05AM +0800, shuo.a@intel.com wrote:
>>> +static long acrn_dev_ioctl(struct file *filp, unsigned int cmd,
>>> + unsigned long ioctl_param)
>>> +{
>
On 8/26/20 11:49 AM, Yu, Yu-cheng wrote:
>> I would expect things like Go and various JITs to call it directly.
>>
>> If we wanted to be fancy and add a potentially more widely useful
>> syscall, how about:
>>
>> mmap_special(void *addr, size_t length, int prot, int flags, int type);
>>
>> Where ty
On 8/25/20 2:04 PM, Yu, Yu-cheng wrote:
>>> I think this is more arch-specific. Even if it becomes a new syscall,
>>> we still need to pass the same parameters.
>>
>> Right, but without the copying in and out of memory.
>>
> Linux-api is already on the Cc list. Do we need to add more people to
>
On 8/25/20 11:43 AM, Yu, Yu-cheng wrote:
>>> arch_prctl(ARCH_X86_CET_MMAP_SHSTK, u64 *args)
>>> Allocate a new shadow stack.
>>>
>>> The parameter 'args' is a pointer to a user buffer.
>>>
>>> *args = desired size
>>> *(args + 1) = MAP_32BIT or MAP_POPULATE
>>>
>>> On retur
On 8/25/20 10:59 AM, Andrew Cooper wrote:
> If I've read the TDX spec/whitepaper properly, the main hypervisor can
> write to all the encrypted pages. This will destroy data, break the
> MAC, and yields #PF inside the SEAM hypervisor, or the TD when the cache
> line is next referenced.
I think yo
On 8/24/20 9:39 PM, Sean Christopherson wrote:
> +Andy
>
> On Mon, Aug 24, 2020 at 02:52:01PM +0100, Andrew Cooper wrote:
>> And to help with coordination, here is something prepared (slightly)
>> earlier.
>>
>> https://docs.google.com/document/d/1hWejnyDkjRRAW-JEsRjA5c9CKLOPc6VKJQsuvODlQEI/edit?u
On 8/20/20 12:05 PM, Tom Lendacky wrote:
>> I added a quick hack to save TSC_AUX to a new variable in the SVM
>> struct and then restore it right after VMEXIT (just after where GS is
>> restored in svm_vcpu_enter_exit()) and my guest is no longer crashing.
>
> Sorry, I mean my host is no longer cr
On 8/20/20 1:06 AM, Huang, Ying wrote:
>> +/* Migrate pages selected for demotion */
>> +nr_reclaimed += demote_page_list(&ret_pages, &demote_pages, pgdat, sc);
>> +
>> pgactivate = stat->nr_activate[0] + stat->nr_activate[1];
>>
>> mem_cgroup_uncharge_list(&free_pages);
>> _
>
From: Dave Hansen
Some method is obviously needed to enable reclaim-based migration.
Just like traditional autonuma, there will be some workloads that
will benefit like workloads with more "static" configurations where
hot pages stay hot and cold pages stay cold. If pages come a
From: Dave Hansen
When memory fills up on a node, memory contents can be
automatically migrated to another node. The biggest problems are
knowing when to migrate and to where the migration should be
targeted.
The most straightforward way to generate the "to where" list
would be to
ges()
]
Signed-off-by: Yang Shi
Signed-off-by: Dave Hansen
Cc: David Rientjes
Cc: Huang Ying
Cc: Dan Williams
---
b/include/linux/vm_event_item.h |2 ++
b/mm/vmscan.c |6 ++
b/mm/vmstat.c |2 ++
3 files changed, 10 insertions(+)
diff -
From: Dave Hansen
This is mostly derived from a patch from Yang Shi:
https://lore.kernel.org/linux-mm/1560468577-101178-10-git-send-email-yang@linux.alibaba.com/
Add code to the reclaim path (shrink_page_list()) to "demote" data
to another NUMA node instead of disc
From: Dave Hansen
Reclaim-based migration is attempting to optimize data placement in
memory based on the system topology. If the system changes, so must
the migration ordering.
The implementation here is pretty simple and entirely unoptimized. On
any memory or CPU hotplug events, assume
From: Dave Hansen
Global reclaim aims to reduce the amount of memory used on
a given node or set of nodes. Migrating pages to another
node serves this purpose.
memcg reclaim is different. Its goal is to reduce the
total memory consumption of the entire memcg, across all
nodes. Migration
account how many pages are reclaimed (demoted) since page
reclaim behavior depends on this. Add *nr_succeeded parameter to make
migrate_pages() return how many pages are demoted successfully for all
cases.
Signed-off-by: Yang Shi
Signed-off-by: Dave Hansen
Cc: David Rientjes
Cc: Huang Ying
Cc
-by: Dave Hansen
Cc: Yang Shi
Cc: David Rientjes
Cc: Huang Ying
Cc: Dan Williams
--
Changes from Dave 06/2020:
* rename reclaim_anon_pages()->can_reclaim_anon_pages()
Note: Keith's Intel SoB is commented out because he is no
longer at Intel and his @intel.com mail will bounce
---
e migrations?
* Migration failures will result in pages being unreclaimable.
Need to be able to fall back to normal reclaim.
Cc: Yang Shi
Cc: David Rientjes
Cc: Huang Ying
Cc: Dan Williams
--
Dave Hansen (5):
mm/numa: node demotion data structure and lookup
mm/vmscan: At
From: Dave Hansen
Prepare for the kernel to auto-migrate pages to other memory nodes
with a user defined node migration table. This allows creating single
migration target for each NUMA node to enable the kernel to do NUMA
page migrations instead of simply reclaiming colder pages. A node
with
On 8/14/20 10:46 AM, Andy Lutomirski wrote:
> I'm a little unconvinced about the security benefits. As far as I
> know, UC memory will not end up in cache by any means (unless
> aliased), but it's going to be tough to do much with UC data with
> anything resembling reasonable performance without d
From: Dave Hansen
Greg has challenged some recent driver submitters on their license
choices. He was correct to do so, as the choices in these instances
did not always advance the aims of the submitters.
But, this left submitters (and the folks who help them pick licenses)
a bit confused
On 8/12/20 1:23 AM, Greg KH wrote:
> On Tue, Aug 11, 2020 at 10:17:48AM -0700, Dave Hansen wrote:
>> But, this left submitters (and the folks who help them pick licenses)
>> a bit confused. They have read things like
>> Documentation/process/license-rules.rst which says:
On 8/12/20 6:39 AM, Liang, Kan wrote:
> I searched the vma_mmu_pagesize(). It seems that PowerPC is the only
> one that defines a 'strong' function. In other words, the MMUPageSize
> and KernelPageSize are the same for X86. However, it seems not true
> for the above compound page cases. Is it a bug
Resend. Something appears to have eaten this on the way to LKML
(at least) the last time.
--
From: Dave Hansen
Greg has challenged some recent driver submitters on their license
choices. He was correct to do so, as the choices in these instances
did not always advance the aims of the
On 8/10/20 2:24 PM, Kan Liang wrote:
> +static u64 __perf_get_page_size(struct mm_struct *mm, unsigned long addr)
> +{
> + struct page *page;
> + pgd_t *pgd;
> + p4d_t *p4d;
> + pud_t *pud;
> + pmd_t *pmd;
> + pte_t *pte;
> +
> + pgd = pgd_offset(mm, addr);
> + if (p
... adding Kirill
On 8/7/20 1:40 AM, Joerg Roedel wrote:
> + lvl = "p4d";
> + p4d = p4d_alloc(&init_mm, pgd, addr);
> + if (!p4d)
> + goto failed;
>
> + /*
> + * With 5-level paging the P4D level is not folded. So t
On 8/6/20 4:04 PM, Ricardo Neri wrote:
>* CPUID is the conventional way, but it's nasty: it doesn't
>* exist on some 486-like CPUs, and it usually exits to a
>* hypervisor.
>*
>* The SERIALIZE instruction is the most straightforward way to
>* do this
On 8/6/20 12:25 PM, Ricardo Neri wrote:
> static inline void sync_core(void)
> {
> /*
> - * There are quite a few ways to do this. IRET-to-self is nice
> + * Hardware can do this for us if SERIALIZE is available. Otherwise,
> + * there are quite a few ways to do this. IRET-
On 8/3/20 10:16 AM, Andy Lutomirski wrote:
> - TILE: genuinely per-thread, but it's expensive so it's
> lazy-loadable. But the lazy-load mechanism reuses #NM, and it's not
> fully disambiguated from the other use of #NM. So it sort of works,
> but it's gross.
For those playing along at home, the
On 8/3/20 8:12 AM, Andy Lutomirski wrote:
> I could easily be convinced that the PASID fixup is so trivial and so
> obviously free of misfiring in a way that causes an infinite loop that
> this code is fine. But I think we first need to answer the bigger
> question of why we're doing a lazy fixup
On 7/31/20 4:34 PM, Andy Lutomirski wrote:
>> Thomas suggested to provide a reason for the #GP caused by executing ENQCMD
>> without a valid PASID value programmed. #GP error codes are 16 bits and all
>> 16 bits are taken. Refer to SDM Vol 3, Chapter 16.13 for details. The other
>> choice was to re
On 7/23/20 4:15 PM, Arvind Sankar wrote:
> This #define is not used anywhere, and has the wrong value on x86_64.
Yeah, it certainly is unused.
> I tried digging into the history a bit, but it seems to have been unused
> even in the initial merge of sparsemem in v2.6.13, when it was first
> define
On 7/23/20 9:56 AM, Sean Christopherson wrote:
> On Thu, Jul 23, 2020 at 09:41:37AM -0700, Dave Hansen wrote:
>> On 7/23/20 9:25 AM, Sean Christopherson wrote:
>>> How would people feel about taking the above two patches (02 and 03 in the
>>> series) through
On 7/23/20 10:08 AM, Andy Lutomirski wrote:
> Suppose some kernel code (a syscall or kernel thread) changes PKRS
> then takes a page fault. The page fault handler needs a fresh PKRS.
> Then the page fault handler (say a VMA’s .fault handler) changes
> PKRS. The we get an interrupt. The interrupt *
On 7/23/20 9:25 AM, Sean Christopherson wrote:
> How would people feel about taking the above two patches (02 and 03 in the
> series) through the KVM tree to enable KVM virtualization of CET before the
> kernel itself gains CET support? I.e. add the MSR and feature bits, along
> with the XSAVES co
On 7/23/20 9:18 AM, Fenghua Yu wrote:
> The PKRS MSR has been preserved in thread_info during kernel entry. We
> don't need to preserve it in another place (i.e. idtentry_state).
I'm missing how the PKRS MSR gets preserved in thread_info. Could you
explain the mechanism by which this happens and
On 7/18/20 11:24 AM, Yu-cheng Yu wrote:
> On Sat, 2020-07-18 at 11:00 -0700, Andy Lutomirski wrote:
>> On Sat, Jul 18, 2020 at 10:58 AM Yu-cheng Yu wrote:
>>> Hi,
>>>
>>> My shadow stack tests start to have random shadow stack pointer corruption
>>> after
>>> v5.7 (excluding). The symptom looks
On 7/17/20 1:54 AM, Peter Zijlstra wrote:
> This is unbelievable junk...
Ouch!
This is from the original user pkeys implementation.
> How about something like:
>
> u32 update_pkey_reg(u32 pk_reg, int pkey, unsigned int flags)
> {
> int pkey_shift = pkey * PKR_BITS_PER_PKEY;
>
> pk_
On 7/14/20 5:51 PM, Sean Christopherson wrote:
> To do the above table, KVM will also need to update
> itlb_multihit_kvm_mitigation
> when it is unloaded, which seems rather silly. That's partly why I suggested
> keying off CR4.VMXE as it doesn't require poking directly into KVM. E.g. the
> enti
On 7/14/20 2:04 PM, Pawan Gupta wrote:
>> I see three inputs and four possible states (sorry for the ugly table,
>> it was this or a spreadsheet :):
>>
>> X86_FEATURE_VMX CONFIG_KVM_*hpage split ResultReason
>> N x xNot Affected No VMX
On 7/14/20 12:17 PM, Pawan Gupta wrote:
> On Tue, Jul 14, 2020 at 07:57:53AM -0700, Dave Hansen wrote:
>> Let's stick to things which are at least static per reboot. Checking
>> for X86_FEATURE_VMX or even CONFIG_KVM_INTEL seems like a good stopping
>> point. "
On 7/14/20 12:29 PM, Peter Zijlstra wrote:
> On Tue, Jul 14, 2020 at 12:06:16PM -0700, Ira Weiny wrote:
>> On Tue, Jul 14, 2020 at 10:44:51AM +0200, Peter Zijlstra wrote:
>>> So, if I followed along correctly, you're proposing to do a WRMSR per
>>> k{,un}map{_atomic}(), sounds like excellent perfor
On 7/14/20 11:53 AM, Ira Weiny wrote:
>>> The PKRS MSR is defined as a per-core register.
Just to be clear, PKRS is a per-logical-processor register, just like
PKRU. The "per-core" thing here is a typo.
On 7/13/20 6:45 PM, Sean Christopherson wrote:
> This is all kinds of backwards. Virtualization being disabled in hardware
> is very, very different than KVM not being loaded. One requires at the
> very least a kernel reboot to change, the other does not.
That's a very good point.
It's a pretty
On 7/8/20 2:51 AM, tip-bot2 for Kan Liang wrote:
> diff --git a/arch/x86/include/asm/cpufeatures.h
> b/arch/x86/include/asm/cpufeatures.h
> index 02dabc9..72ba4c5 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -366,6 +366,7 @@
> #define X86_FEATU
On 7/9/20 9:07 AM, Andy Lutomirski wrote:
> On Thu, Jul 9, 2020 at 8:56 AM Dave Hansen wrote:
>> On 7/9/20 8:44 AM, Andersen, John wrote:
>>> Bits which are allowed to be pinned default to WP for CR0 and SMEP,
>>> SMAP, and UMIP for CR4.
>> I
On 7/9/20 8:44 AM, Andersen, John wrote:
>
> Bits which are allowed to be pinned default to WP for CR0 and SMEP,
> SMAP, and UMIP for CR4.
I think it also makes sense to have FSGSBASE in this set.
I know it hasn't been tested, but I think we should do the legwork to
test it. If
On 7/7/20 2:12 PM, Sean Christopherson wrote:
Let's say Intel loses its marbles and adds a CR4 bit that lets userspace
write to kernel memory. Linux won't set it, but an attacker would go
after it, first thing.
> That's an orthogonal to pinning. KVM never lets the guest set CR4 bit
On 7/6/20 11:22 AM, Dave Jiang wrote:
> +What:/sys/bus/dsa/devices/dsa/pasid_enabled
> +Date:Jul 5, 2020
> +KernelVersion: 5.9.0
> +Contact: dmaeng...@vger.kernel.org
> +Description: To indicate if PASID (process address space identifier) is
> +
On 7/2/20 4:28 AM, Huang, Ying wrote:
>> But, when the bit was removed (bit 0) the _other_ bit locations also
>> got changed. That's not OK because the bit values are documented to
>> mean one specific thing and users surely rely on them meaning that one
>> thing and not changing from kernel to ke
On 7/1/20 1:04 PM, Ben Widawsky wrote:
>> +static inline bool node_reclaim_enabled(void)
>> +{
>> +/* Is any node_reclaim_mode bit set? */
>> +return node_reclaim_mode & (RECLAIM_ZONE|RECLAIM_WRITE|RECLAIM_UNMAP);
>> +}
>> +
>> extern void check_move_unevictable_pages(struct pagevec *pvec)
On 6/30/20 1:22 AM, Huang, Ying wrote:
>> +/*
>> + * To avoid cycles in the migration "graph", ensure
>> + * that migration sources are not future targets by
>> + * setting them in 'used_targets'.
>> + *
>> + * But, do this only once per pass so that multiple
>> + * sour
On 7/1/20 1:54 AM, Huang, Ying wrote:
> Why can not we just bind the memory of the application to node 0, 2, 3
> via mbind() or cpuset.mems? Then the application can allocate memory
> directly from PMEM. And if we bind the memory of the application via
> mbind() to node 0, we can only allocate me
On 6/30/20 5:47 PM, David Rientjes wrote:
> On Mon, 29 Jun 2020, Dave Hansen wrote:
>> From: Dave Hansen
>>
>> If a memory node has a preferred migration path to demote cold pages,
>> attempt to move those inactive pages to that migration node before
>> rec
On 6/30/20 10:50 AM, Yang Shi wrote:
> So, I'm supposed you need check if node_reclaim is enabled before doing
> migration in shrink_page_list() and also need make node reclaim to adopt
> the new mode.
>
> Please refer to
> https://lore.kernel.org/linux-mm/1560468577-101178-6-git-send-email-yang..
On 7/1/20 8:46 AM, Ben Widawsky wrote:
>> +/*
>> + * These bit locations are exposed in the vm.zone_reclaim_mode sysctl
>> + * ABI. New bits are OK, but existing bits can never change.
>> + */
>> +#define RECLAIM_ZONE (1<<0)/* Run shrink_inactive_list on the zone
>> */
>> +#define RECLAI
On 6/30/20 1:46 PM, Peter Xu wrote:
> Use the general page fault accounting by passing regs into handle_mm_fault().
...
> - /*
> - * Major/minor page fault accounting. If any of the events
> - * returned VM_FAULT_MAJOR, we account it as a major fault.
> - */
> - if (major) {
From: Dave Hansen
RECLAIM_ZONE was assumed to be unused because it was never explicitly
used in the kernel. However, there were a number of places where it
was checked implicitly by checking 'node_reclaim_mode' for a zero
value.
These zero checks are not great because it is not ob
A previous cleanup accidentally changed the vm.zone_reclaim_mode ABI.
This series restores the ABI and then reorganizes the code to make
the ABI more obvious. Since the single-patch v1[1], I've:
* Restored the RECLAIM_ZONE naming, comment and Documentation now
that the implicit checks for it
From: Dave Hansen
It is currently not obvious that the RECLAIM_* bits are part of the
uapi since they are defined in vmscan.c. Move them to a uapi header
to make it obvious.
This should have no functional impact.
Signed-off-by: Dave Hansen
Cc: Ben Widawsky
Cc: Alex Shi
Cc: Daniel Wagner
From: Dave Hansen
I went to go add a new RECLAIM_* mode for the zone_reclaim_mode
sysctl. Like a good kernel developer, I also went to go update the
documentation. I noticed that the bits in the documentation didn't
match the bits in the #defines.
The VM never explicitly check
On 6/30/20 7:47 PM, Andrew Morton wrote:
>> Oh, that's a very good point. There are a couple of those around. Let
>> me circle back and update the documentation and the variable name. I'll
>> send out another version.
> Was the omission of cc:stable deliberate?
Nope, it was an accidental sta...
On 6/30/20 10:41 PM, David Rientjes wrote:
> Maybe the strongest advantage of the node abstraction is the ability to
> use autonuma and migrate_pages()/move_pages() API for moving pages
> explicitly? Mempolicies could be used for migration to "top-tier" memory,
> i.e. ZONE_NORMAL or ZONE_MOVABL