[PATCH 01/23] x86, kaiser: prepare assembly for entry/exit CR3 switching

2017-10-31 Thread Dave Hansen
keep CR3 during the NMI. It will not be clobbered by the C NMI handlers that get called. Signed-off-by: Dave Hansen Cc: Moritz Lipp Cc: Daniel Gruss Cc: Michael Schwarz Cc: Andy Lutomirski Cc: Linus Torvalds Cc: Kees Cook Cc: Hugh Dickins Cc: x...@kernel.org --- b/arch/x86/entry/calling.h

[PATCH 18/23] x86, mm: Move CR3 construction functions

2017-10-31 Thread Dave Hansen
tlbflush.h, so just move the CR3 building over to tlbflush.h. Signed-off-by: Dave Hansen Cc: Moritz Lipp Cc: Daniel Gruss Cc: Michael Schwarz Cc: Andy Lutomirski Cc: Linus Torvalds Cc: Kees Cook Cc: Hugh Dickins Cc: x...@kernel.org --- b/arch/x86/include/asm/mmu_c

[PATCH 23/23] x86, kaiser: add Kconfig

2017-10-31 Thread Dave Hansen
pped_start, and the items carefully gathered into that section for user-mapping on SMP, dispersed elsewhere on UP. Signed-off-by: Dave Hansen Cc: Moritz Lipp Cc: Daniel Gruss Cc: Michael Schwarz Cc: Andy Lutomirski Cc: Linus Torvalds Cc: Kees Cook Cc: Hugh Dickins Cc: x...@ke

[PATCH 19/23] x86, mm: remove hard-coded ASID limit checks

2017-10-31 Thread Dave Hansen
First, it's nice to remove the magic numbers. Second, KAISER is going to eat up half of the available ASID space. We do not use it today, but we need to at least spell out this new restriction. Signed-off-by: Dave Hansen Cc: Moritz Lipp Cc: Daniel Gruss Cc: Michael Schwarz Cc:
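
To put a number on "half of the available ASID space": the PCID field is the low 12 bits of CR3, so reserving a single bit halves the count of usable values. The sketch below is only an illustration of that arithmetic; `HW_ASID_BITS` and `asids_available()` are hypothetical names, not identifiers taken from the patch.

```c
#include <assert.h>

/* The PCID field occupies the low 12 bits of CR3 on x86-64. */
#define HW_ASID_BITS 12

/*
 * Hypothetical helper: how many distinct hardware ASID values remain
 * once `consumed_bits` bits of the field are reserved for something
 * else (for example, one bit to tell the user copy of the page tables
 * from the kernel copy).
 */
static unsigned int asids_available(unsigned int consumed_bits)
{
	assert(consumed_bits <= HW_ASID_BITS);
	return 1u << (HW_ASID_BITS - consumed_bits);
}
```

With no bits reserved this gives 4096 values; reserving one bit leaves 2048. Any limit check written against a hard-coded constant would have to be revisited when that bit is taken, which is presumably why spelling the restriction out matters.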

[PATCH 21/23] x86, pcid, kaiser: allow flushing for future ASID switches

2017-10-31 Thread Dave Hansen
notice that we had 'all_other_ctxs_invalid' marked, and go invalidate all of the cpu_tlbstate.ctxs[] entries. This ensures that any future context switches will do a full flush of the TLB so they pick up the changes. Signed-off-by: Dave Hansen Cc: Moritz Lipp Cc: Daniel Gruss Cc: Micha

[PATCH 09/23] x86, kaiser: allow NX to be set in p4d/pgd

2017-10-31 Thread Dave Hansen
We protect the user portion of the kernel page tables with the NX bit to cripple it. But, that trips the p4d/pgd_bad() checks. Make sure it does not do that. Signed-off-by: Dave Hansen Cc: Moritz Lipp Cc: Daniel Gruss Cc: Michael Schwarz Cc: Andy Lutomirski Cc: Linus Torvalds Cc: Kees Cook

[PATCH 22/23] x86, kaiser: use PCID feature to make user and kernel switches faster

2017-10-31 Thread Dave Hansen
ts (and ignore the context-switch TLB preservation), then the deficiency of not having INVPCID becomes much less onerous. Signed-off-by: Dave Hansen Cc: Moritz Lipp Cc: Daniel Gruss Cc: Michael Schwarz Cc: Andy Lutomirski Cc: Linus Torvalds Cc: Kees Cook Cc: Hugh Dickins Cc: x...@kernel.org ---

[PATCH 17/23] x86, kaiser: map virtually-addressed performance monitoring buffers

2017-10-31 Thread Dave Hansen
area. Signed-off-by: Hugh Dickins Signed-off-by: Dave Hansen Cc: Moritz Lipp Cc: Daniel Gruss Cc: Michael Schwarz Cc: Andy Lutomirski Cc: Linus Torvalds Cc: Kees Cook Cc: Hugh Dickins Cc: x...@kernel.org --- b/arch/x86/events/intel/ds.c | 57

[PATCH 20/23] x86, mm: put mmu-to-h/w ASID translation in one place

2017-10-31 Thread Dave Hansen
ch hardware ASID to flush for the userspace mapping. Signed-off-by: Dave Hansen Cc: Moritz Lipp Cc: Daniel Gruss Cc: Michael Schwarz Cc: Andy Lutomirski Cc: Linus Torvalds Cc: Kees Cook Cc: Hugh Dickins Cc: x...@kernel.org --- b/arch/x86/include/asm/tlbflush.h | 30 ++

[PATCH 12/23] x86, kaiser: map dynamically-allocated LDTs

2017-10-31 Thread Dave Hansen
Normally, a process just has a NULL mm->context.ldt. But, we have a syscall for a process to set a new one. If a process does that, we need to map the new LDT. The original KAISER patch missed this case. Signed-off-by: Dave Hansen Cc: Moritz Lipp Cc: Daniel Gruss Cc: Michael Schwarz

[PATCH 15/23] x86, kaiser: map trace interrupt entry

2017-10-31 Thread Dave Hansen
d the same page. That also generally does not hurt anything, but it can make things hard to debug because random build alignment can cause things to fail. This was missed in the original KAISER patch. Signed-off-by: Dave Hansen Cc: Moritz Lipp Cc: Daniel Gruss Cc: Michael Schwarz Cc: Andy Lutomi

[PATCH 14/23] x86, kaiser: map entry stack variables

2017-10-31 Thread Dave Hansen
ister. You can only 'MOV' to it from another register, which means we need to clobber a register in order to do any CR3 manipulation. User-mapping these variables allows us to obtain a safe stack *before* we switch the CR3 value. Signed-off-by: Dave Hansen Cc: Moritz Lipp Cc: Daniel

[PATCH 16/23] x86, kaiser: map debug IDT tables

2017-10-31 Thread Dave Hansen
The IDT table it references are another structure where the CPU references a virtual address. It also obviously needs these to handle an interrupt in userspace, so these need to be mapped into the user copy of the page tables. Signed-off-by: Dave Hansen Cc: Moritz Lipp Cc: Daniel Gruss Cc

[PATCH 13/23] x86, kaiser: map espfix structures

2017-10-31 Thread Dave Hansen
witch over to the kernel copy, we would need some temporary storage which is in short supply at this point. The original KAISER patch missed this case. Signed-off-by: Dave Hansen Cc: Moritz Lipp Cc: Daniel Gruss Cc: Michael Schwarz Cc: Andy Lutomirski Cc: Linus Torvalds Cc: Kees Cook Cc: Hu

[PATCH 08/23] x86, kaiser: only populate shadow page tables for userspace

2017-10-31 Thread Dave Hansen
KAISER has two copies of the page tables: one for the kernel and one for when we are running in userspace. There is also a kernel portion of each of the page tables: the part that *maps* the kernel. The kernel portion is relatively static and uses pre-populated PGDs. Nobody ever calls set_pgd()

[PATCH 07/23] x86, kaiser: unmap kernel from userspace page tables (core patch)

2017-10-31 Thread Dave Hansen
eir patch. Some of their code has been broken out into other patches in this series, but their SoB was only retained here. Signed-off-by: Moritz Lipp Signed-off-by: Daniel Gruss Signed-off-by: Michael Schwarz Signed-off-by: Dave Hansen Cc: Moritz Lipp Cc: Daniel Gruss Cc: Michael Schwarz Cc:

[PATCH 10/23] x86, kaiser: make sure static PGDs are 8k in size

2017-10-31 Thread Dave Hansen
last PGD. Signed-off-by: Dave Hansen Cc: Moritz Lipp Cc: Daniel Gruss Cc: Michael Schwarz Cc: Andy Lutomirski Cc: Linus Torvalds Cc: Kees Cook Cc: Hugh Dickins Cc: x...@kernel.org --- b/arch/x86/kernel/head_64.S | 16 1 file changed, 16 insertions(+) diff -puN arch
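
The excerpt above is cut off before it gives the reason for the 8k size. One plausible reading, stated here as an assumption rather than as what the patch says, is that the kernel PGD and its user copy sit in two adjacent 4k pages of an 8k-aligned block, so that one can be found from the other by setting or clearing a single address bit. A minimal userspace sketch of that address arithmetic, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096ul

/*
 * Assumed layout: [ kernel PGD (4k) | user PGD (4k) ], 8k-aligned.
 * Both helpers are hypothetical and operate on plain addresses.
 */
static uintptr_t kernel_to_user_pgd(uintptr_t kernel_pgd)
{
	/* The kernel copy must start on an 8k boundary for this to work. */
	assert((kernel_pgd & (2 * PAGE_SIZE - 1)) == 0);
	return kernel_pgd | PAGE_SIZE;
}

static uintptr_t user_to_kernel_pgd(uintptr_t user_pgd)
{
	/* The user copy is the second page of the pair. */
	assert((user_pgd & PAGE_SIZE) != 0);
	return user_pgd & ~PAGE_SIZE;
}
```

Under that assumption, a statically allocated PGD that is only 4k in size would have no second page for the user copy, which would explain why the static PGDs need to grow.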

[PATCH 11/23] x86, kaiser: map GDT into user page tables

2017-10-31 Thread Dave Hansen
The GDT is used to control the x86 segmentation mechanism. It must be virtually mapped when switching segments or at IRET time when switching between userspace and kernel. The original KAISER patch did not do this. I have no idea how it ever worked. Signed-off-by: Dave Hansen Cc: Moritz Lipp

[PATCH 06/23] x86, kaiser: introduce user-mapped percpu areas

2017-10-31 Thread Dave Hansen
switching in and out of the kernel and a good subset of *those* are per-cpu data. This patch creates a new kind of per-cpu data that is mapped and can be used no matter which copy of the page tables we are using. Thanks to Hugh Dickins for cleanups to this code. Signed-off-by: Dave Hansen Cc: Moritz

[PATCH 05/23] x86, mm: document X86_CR4_PGE toggling behavior

2017-10-31 Thread Dave Hansen
The comment says it all here. The problem here is that the X86_CR4_PGE bit affects all PCIDs in a way that is totally obscure. This makes it easier for someone to find if grepping for PCID-related stuff and documents the hardware behavior that we are depending on. Signed-off-by: Dave Hansen

[PATCH 03/23] x86, kaiser: disable global pages

2017-10-31 Thread Dave Hansen
-fault attack: http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf Signed-off-by: Dave Hansen Cc: Moritz Lipp Cc: Daniel Gruss Cc: Michael Schwarz Cc: Andy Lutomirski Cc: Linus Torvalds Cc: Kees Cook Cc: Hugh Dickins Cc: x...@kernel.org --- b/arch/x86/Kconfig

[PATCH 02/23] x86, kaiser: do not set _PAGE_USER for init_mm page tables

2017-10-31 Thread Dave Hansen
init_mm is for kernel-exclusive use. If someone is allocating page tables in it, do not set _PAGE_USER on them. This ensures that we do *not* set NX on these page tables in the KAISER code. Signed-off-by: Dave Hansen Cc: Moritz Lipp Cc: Daniel Gruss Cc: Michael Schwarz Cc: Andy Lutomirski

[PATCH 04/23] x86, tlb: make CR4-based TLB flushes more robust

2017-10-31 Thread Dave Hansen
Our CR4-based TLB flush currently requires global pages to be supported *and* enabled. But, we really only need for them to be supported. Make the code more robust by allowing X86_CR4_PGE to clear as well as set. This change was suggested by Kirill Shutemov. Signed-off-by: Dave Hansen Cc
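
The idea of letting X86_CR4_PGE "clear as well as set" can be sketched as a toggle followed by a restore. The code below is a userspace illustration only: `fake_cr4` and `fake_write_cr4()` are stand-ins invented for this sketch in place of the real register accessors, and the function name is hypothetical.

```c
#include <assert.h>

#define X86_CR4_PGE (1ul << 7)	/* CR4.PGE is bit 7 */

/* Stand-ins for the real CR4 register so the sketch runs in userspace. */
static unsigned long fake_cr4;
static unsigned int fake_cr4_writes;

static void fake_write_cr4(unsigned long val)
{
	fake_cr4 = val;
	fake_cr4_writes++;
}

/*
 * Flip PGE, then write the original value back. On real hardware a
 * change to CR4.PGE invalidates the TLB, so the flip is what flushes;
 * using XOR means it does not matter whether PGE started set or clear.
 */
static void flush_tlb_by_toggling_pge(void)
{
	unsigned long cr4 = fake_cr4;

	fake_write_cr4(cr4 ^ X86_CR4_PGE);	/* toggle PGE */
	fake_write_cr4(cr4);			/* restore the original value */

	/* Sanity check: the register ends up exactly where it started. */
	assert(fake_cr4 == cr4);
}
```

The older approach of clearing the bit and then setting it again only changes CR4 on the first write if PGE was enabled to begin with; the toggle changes it on both writes in either case.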

Re: [kernel-hardening] Re: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode

2017-10-31 Thread Dave Hansen
Hi Folks, I've fixed some bugs and updated the KAISER patch set on top of the work that was done here. My new version is posted here: https://marc.info/?l=linux-kernel&m=150948911429162&w=2

Re: [PATCH 00/23] KAISER: unmap most of the kernel from userspace page tables

2017-10-31 Thread Dave Hansen
On 10/31/2017 04:27 PM, Linus Torvalds wrote: > Inconveniently, the people you cc'd on the actual patches did *not* > get cc'd with this 00/23 cover letter email. Urg, sorry about that. > (a) is this on top of Andy's entry cleanups? > > If not, that probably needs to be sorted out. It is

Re: [PATCH 00/23] KAISER: unmap most of the kernel from userspace page tables

2017-10-31 Thread Dave Hansen
On 10/31/2017 04:44 PM, Dave Hansen wrote: >> That seems insane. Why isn't only the top level shadowed, and >> then lower levels are shared between the shadowed and the "kernel" >> page tables? > There are obviously two PGDs. The userspace half of the PGD

Re: [PATCH 01/23] x86, kaiser: prepare assembly for entry/exit CR3 switching

2017-10-31 Thread Dave Hansen
On 10/31/2017 05:43 PM, Brian Gerst wrote: >> >> + RESTORE_CR3 save_reg=%r14 >> + >> testl %ebx, %ebx /* swapgs needed? */ >> jnz nmi_restore >> nmi_swapgs: >> _ > This all needs to be conditional on a config option. Something with > this amount of

Re: [PATCH 21/23] x86, pcid, kaiser: allow flushing for future ASID switches

2017-11-01 Thread Dave Hansen
On 11/01/2017 01:03 AM, Andy Lutomirski wrote: >> This ensures that any future context switches will do a full flush >> of the TLB so they pick up the changes. > I'm confused. What was wrong with the old code? I guess I just don't > see what the problem is that is solved by this patch. Instead o

Re: [PATCH 00/23] KAISER: unmap most of the kernel from userspace page tables

2017-11-01 Thread Dave Hansen
On 10/31/2017 04:27 PM, Linus Torvalds wrote: > So even if you don't want to have global pages for normal kernel > entries, you don't want to just make _PAGE_GLOBAL be defined as zero. > You'd want to just use _PAGE_GLOBAL conditionally. I implemented this, then did a quick test with some cod

Re: [PATCH 00/23] KAISER: unmap most of the kernel from userspace page tables

2017-11-01 Thread Dave Hansen
On 11/01/2017 09:08 AM, Linus Torvalds wrote: > On Tue, Oct 31, 2017 at 4:44 PM, Dave Hansen > wrote: >> On 10/31/2017 04:27 PM, Linus Torvalds wrote: >>> (c) am I reading the code correctly, and the shadow page tables are >>> *completely* duplicated? >>>

Re: [PATCH 01/23] x86, kaiser: prepare assembly for entry/exit CR3 switching

2017-11-01 Thread Dave Hansen
On 11/01/2017 11:18 AM, Borislav Petkov wrote: >> +.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req >> +movq %cr3, %r\scratch_reg >> +movq %r\scratch_reg, \save_reg > > So one of the args gets passed as "ax", for example, which then gets > completed to a register wit

Re: [PATCH 00/23] KAISER: unmap most of the kernel from userspace page tables

2017-11-01 Thread Dave Hansen
On 11/01/2017 11:27 AM, Linus Torvalds wrote: > So I'd like to see not just the comments about this, but I'd like to > see the code itself actually making that very clear. Have *code* that > verifies that nobody ever tries to use this on a user address (because > that would *completely* screw up al

Re: [PATCH 21/23] x86, pcid, kaiser: allow flushing for future ASID switches

2017-11-01 Thread Dave Hansen
On 11/01/2017 01:31 PM, Andy Lutomirski wrote: > On Wed, Nov 1, 2017 at 7:17 AM, Dave Hansen > wrote: >> On 11/01/2017 01:03 AM, Andy Lutomirski wrote: >>>> This ensures that any future context switches will do a full flush >>>> of the TLB so they pick up the

Re: [PATCH 21/23] x86, pcid, kaiser: allow flushing for future ASID switches

2017-11-01 Thread Dave Hansen
On 11/01/2017 02:04 PM, Andy Lutomirski wrote: > Aha! That wasn't at all clear to me from the changelog. Can I make a > totally different suggestion? Add a new function > __flush_tlb_one_kernel() and use it for kernel addresses. I'll look into this.

Re: [PATCH 02/23] x86, kaiser: do not set _PAGE_USER for init_mm page tables

2017-11-01 Thread Dave Hansen
On 11/01/2017 02:28 PM, Thomas Gleixner wrote: > On Wed, 1 Nov 2017, Andy Lutomirski wrote: >> The vsyscall page is _PAGE_USER and lives in init_mm via the fixmap. > > Groan, forgot about that abomination, but still there is no point in having > it marked PAGE_USER in the init_mm at all, kaiser or

Re: [PATCH 03/23] x86, kaiser: disable global pages

2017-11-01 Thread Dave Hansen
On 11/01/2017 02:18 PM, Thomas Gleixner wrote: > On Tue, 31 Oct 2017, Dave Hansen wrote: >> --- a/arch/x86/include/asm/pgtable_types.h~kaiser-prep-disable-global-pages >> 2017-10-31 15:03:49.314064402 -0700 >> +++ b/arch/x86/include/asm/pgtable_types.h 2017-10-31 15:03:4

Re: [PATCH 00/23] KAISER: unmap most of the kernel from userspace page tables

2017-11-01 Thread Dave Hansen
On 11/01/2017 01:54 AM, Ingo Molnar wrote: > Beyond the inevitable cavalcade of (solvable) problems that will pop up > during > review, one major item I'd like to see addressed is runtime configurability: > it > should be possible to switch between a CR3-flushing and a regular syscall and > pa

Re: [PATCH 04/23] x86, tlb: make CR4-based TLB flushes more robust

2017-11-01 Thread Dave Hansen
On 11/01/2017 04:18 AM, Andy Lutomirski wrote: >>> How about just adding a VM_WARN_ON_ONCE, then? >> What's wrong with xor? The function will continue to work this way even if >> CR4.PGE is disabled. > That's true. OTOH, since no one is actually proposing doing that, > there's an argument that peo

Re: [PATCH 04/23] x86, tlb: make CR4-based TLB flushes more robust

2017-11-01 Thread Dave Hansen
On 11/01/2017 02:25 PM, Thomas Gleixner wrote: >> cr4 = this_cpu_read(cpu_tlbstate.cr4); >> -/* clear PGE */ >> -native_write_cr4(cr4 & ~X86_CR4_PGE); >> -/* write old PGE again and flush TLBs */ >> +/* >> + * This function is only called on systems that support X86_CR4_PGE

Re: [PATCH 01/23] x86, kaiser: prepare assembly for entry/exit CR3 switching

2017-11-01 Thread Dave Hansen
On 11/01/2017 02:01 PM, Thomas Gleixner wrote: > On Tue, 31 Oct 2017, Dave Hansen wrote: >> >> +pushq %rdi >> +SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi >> +popq %rdi > > Can you please have a macro variant which does: > > SWITCH_TO_KER

Re: KAISER memory layout (Re: [PATCH 06/23] x86, kaiser: introduce user-mapped percpu areas)

2017-11-02 Thread Dave Hansen
On 11/02/2017 02:41 AM, Andy Lutomirski wrote: > > - The GDT array. > - The IDT. > - The vsyscall page. We can make this be _PAGE_USER. > - The TSS. > - The per-cpu entry stack. Let's make it one page with guard pages > on either side. This can replace rsp_scratch. > - cpu_current_top_of_

Re: [PATCH 02/23] x86, kaiser: do not set _PAGE_USER for init_mm page tables

2017-11-02 Thread Dave Hansen
On 11/02/2017 04:33 AM, Thomas Gleixner wrote: > So for the problem at hand, I'd suggest we disable the vsyscall stuff if > CONFIG_KAISER=y and be done with it. Just to be clear, are we suggesting to just disable LEGACY_VSYSCALL_NATIVE if KAISER=y, and allow LEGACY_VSYSCALL_EMULATE? Or, do we just

Re: [PATCH 00/23] KAISER: unmap most of the kernel from userspace page tables

2017-11-02 Thread Dave Hansen
On 11/02/2017 12:01 PM, Will Deacon wrote: > On Tue, Oct 31, 2017 at 03:31:46PM -0700, Dave Hansen wrote: >> KAISER makes it harder to defeat KASLR, but makes syscalls and >> interrupts slower. These patches are based on work from a team at >> Graz University of Technology

Re: [intel-sgx-kernel-dev] [PATCH v4 06/12] fs/pipe.c: export create_pipe_files() and replace_fd()

2017-10-24 Thread Dave Hansen
On 10/24/2017 06:39 AM, Jarkko Sakkinen wrote: > On Sun, Oct 22, 2017 at 10:09:16PM -0700, Dave Hansen wrote: >> On 10/22/2017 07:55 PM, Jarkko Sakkinen wrote: >>> On Fri, Oct 20, 2017 at 07:32:42AM -0700, Dave Hansen wrote: >>>> I've always been curious, and the

Re: [PATCH 2/2] Add /proc/PID/{smaps, numa_maps} support for DAX

2017-10-25 Thread Dave Hansen
On 10/25/2017 02:30 AM, Michal Hocko wrote: >> >> 7f6c1780-7f6c17e0 rw-s 00:06 20559 /dev/dax12.0 >> Size: 6144 kB >> . >> . >> . >> Ptes@2MB:6144 kB > This says how but it doesn't tell why and who is going to use the > information and what for.

Re: [tip:x86/mm] x86/mm: Add support for early encryption/decryption of memory

2017-10-25 Thread Dave Hansen
On 07/18/2017 03:51 AM, tip-bot for Tom Lendacky wrote: > +/* > + * This routine does not change the underlying encryption setting of the > + * page(s) that map this memory. It assumes that eventually the memory is > + * meant to be accessed as either encrypted or decrypted but the contents > + * a

Re: [PATCH v4] Add /proc/PID/smaps support for DAX

2017-10-26 Thread Dave Hansen
I'm honestly not understanding what problem this solves. Could you, perhaps, do a before and after of smaps with and without this patch? > +/* page structure behind DAX mappings is NOT compound page > + * when it's a huge page mappings, so introduce new API to > + * account for both PMD and PUD m

Re: [tip:x86/mm] x86/mm: Add support for early encryption/decryption of memory

2017-10-26 Thread Dave Hansen
On 10/26/2017 06:05 AM, Tom Lendacky wrote: >>> >>> +static void __init __sme_early_enc_dec(resource_size_t paddr, >>> +   unsigned long size, bool enc) >>> +{ >>> +    void *src, *dst; >>> +    size_t len; >>> + >>> +    if (!sme_me_mask) >>> +    return; >>> + >>> +    loc

Re: [PATCH 03/18] x86/asm/64: Move SWAPGS into the common iret-to-usermode path

2017-10-26 Thread Dave Hansen
On 10/26/2017 06:52 AM, Brian Gerst wrote: > On Thu, Oct 26, 2017 at 4:26 AM, Andy Lutomirski wrote: >> All of the code paths that ended up doing IRET to usermode did >> SWAPGS immediately beforehand. Move the SWAPGS into the common >> code. >> >> Signed-off-by: Andy Lutomirski > >> +GLOBAL(swa

Re: [PATCH 2/2] Add /proc/PID/{smaps, numa_maps} support for DAX

2017-10-26 Thread Dave Hansen
On 10/26/2017 07:16 AM, Michal Hocko wrote: >> The original motivation was for DAX. They have parallel large page >> infrastructure separate from hugetlbfs and THP. Their constraints about >> when they can use large pages differ from the normal mm cases, so it is >> hard to tell when large pages

Re: [PATCHv3 1/2] proc: mm: export PTE sizes directly in smaps

2017-10-26 Thread Dave Hansen
On 10/26/2017 07:19 AM, Michal Hocko wrote: >> The current vm_normal_page implementation doesn't pick up pages with a DEVMAP pfn. >> The second patch fixes this and exports DAX mappings into the counters introduced >> in the >> first patch. >> >> IMO, the user cares more about how much persistent memory they use

Re: [PATCH 2/2] Add /proc/PID/{smaps, numa_maps} support for DAX

2017-10-26 Thread Dave Hansen
On 10/26/2017 07:31 AM, Michal Hocko wrote: > On Thu 26-10-17 07:24:14, Dave Hansen wrote: >> Actually, I don't remember whether it was tooling or just confused >> humans. I *think* Dan was trying to write test cases for huge page DAX >> support and couldn't figure

Re: [RFC 4/7] x86/asm: Fix assumptions that the HW TSS is at the beginning of cpu_tss

2017-11-13 Thread Dave Hansen
On 11/10/2017 08:05 PM, Andy Lutomirski wrote: > -struct tss_struct doublefault_tss __cacheline_aligned = { > - .x86_tss = { > - .sp0 = STACK_START, > - .ss0 = __KERNEL_DS, > - .ldt = 0, ... > +struct x86_hw_tss doublefault_ts

Re: [PATCH] mm: show stats for non-default hugepage sizes in /proc/meminfo

2017-11-13 Thread Dave Hansen
On 11/13/2017 08:03 AM, Roman Gushchin wrote: > To solve this problem, let's display stats for all hugepage sizes. > To provide the backward compatibility let's save the existing format > for the default size, and add a prefix (e.g. 1G_) for non-default sizes. Is there something keeping you from u

Re: [PATCH] mm: show stats for non-default hugepage sizes in /proc/meminfo

2017-11-13 Thread Dave Hansen
On 11/13/2017 10:11 AM, Roman Gushchin wrote: > On Mon, Nov 13, 2017 at 09:06:32AM -0800, Dave Hansen wrote: >> On 11/13/2017 08:03 AM, Roman Gushchin wrote: >>> To solve this problem, let's display stats for all hugepage sizes. >>> To provide the backward compa

Re: [RFC 1/7] x86/asm/64: Allocate and enable the SYSENTER stack

2017-11-13 Thread Dave Hansen
On 11/10/2017 08:05 PM, Andy Lutomirski wrote: > This will simplify some future code changes that will want some > temporary stack space in more places. It also lets us get rid of a > SWAPGS_UNSAFE_STACK user. > > This does not depend on CONFIG_IA32_EMULATION because we'll want the > stack space

Re: [RFC 5/7] x86/asm: Rearrange struct cpu_tss to enlarge SYSENTER_stack and fix alignment

2017-11-13 Thread Dave Hansen
On 11/10/2017 08:05 PM, Andy Lutomirski wrote: > struct tss_struct { > /* > + * Space for the temporary SYSENTER stack. Used for the entry > + * trampoline as well. Size it such that tss_struct ends up > + * as a multiple of PAGE_SIZE. This calculation assumes that > +

Re: [RFC 6/7] x86/asm: Remap the TSS into the cpu entry area

2017-11-13 Thread Dave Hansen
On 11/10/2017 08:05 PM, Andy Lutomirski wrote: > diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h > index fbc9b7f4e35e..8a9ba5553cab 100644 > --- a/arch/x86/include/asm/fixmap.h > +++ b/arch/x86/include/asm/fixmap.h > @@ -52,6 +52,13 @@ extern unsigned long __FIXADDR_TOP;

Re: [PATCH] mm: show stats for non-default hugepage sizes in /proc/meminfo

2017-11-13 Thread Dave Hansen
On 11/13/2017 11:10 AM, Johannes Weiner wrote: > Maybe a simple summary counter for everything set aside by the hugetlb > subsystem - default and non-default page sizes, whether they're used > or only reserved etc.? Yeah, one line is a lot more sane than 5 lines times all the extra sizes. It'll j

Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL

2017-11-13 Thread Dave Hansen
On 11/12/2017 07:52 PM, Andy Lutomirski wrote: > On Fri, Nov 10, 2017 at 3:04 PM, Dave Hansen > wrote: >> On 11/10/2017 02:06 PM, Andy Lutomirski wrote: >>> I have nothing against disabling native. I object to breaking the >>> weird binary tracing behavior in the

Re: [PATCH v6 03/11] mm, x86: Add support for eXclusive Page Frame Ownership (XPFO)

2017-11-13 Thread Dave Hansen
On 11/09/2017 05:09 PM, Tycho Andersen wrote: > which I guess is from the additional flags in grow_dev_page() somewhere down > the stack. Anyway... it seems this is a kernel allocation that's using > MIGRATE_MOVABLE, so perhaps we need some more fine tuned heuristic than just > all MOVABLE allocati

Re: [PATCH v6 03/11] mm, x86: Add support for eXclusive Page Frame Ownership (XPFO)

2017-11-13 Thread Dave Hansen
On 11/13/2017 02:20 PM, Dave Hansen wrote: > On 11/09/2017 05:09 PM, Tycho Andersen wrote: >> which I guess is from the additional flags in grow_dev_page() somewhere down >> the stack. Anyway... it seems this is a kernel allocation that's using >> MIGRATE_MOVABLE, so perh

Re: [PATCH 18/30] x86, kaiser: map virtually-addressed performance monitoring buffers

2017-11-14 Thread Dave Hansen
On 11/14/2017 10:20 AM, Peter Zijlstra wrote: > On Fri, Nov 10, 2017 at 11:31:39AM -0800, Dave Hansen wrote: >> static int alloc_ds_buffer(int cpu) >> { >> +struct debug_store *ds = per_cpu_ptr(&cpu_debug_store, cpu); >> >> +memset(ds, 0, sizeo

Re: [PATCH] mm: show total hugetlb memory consumption in /proc/meminfo

2017-11-14 Thread Dave Hansen
Do we get an update for Documentation/vm/hugetlbpage.txt to spell out what our shiny, new and intentionally-ambiguous entry is supposed to mean and be used for?

Re: [intel-sgx-kernel-dev] [PATCH RFC v3 07/12] intel_sgx: driver for Intel Software Guard Extensions

2017-11-14 Thread Dave Hansen
On 11/14/2017 01:05 PM, Jarkko Sakkinen wrote: > I've started writing a patch to make all this happen and it is > progressing really well. I'm planning to include this change to v6. > As it simplifies code I'm going to squash it as part of the initial > driver patch. > > How does this sound? Soun

Re: [kernel-hardening] Re: [PATCH v6 03/11] mm, x86: Add support for eXclusive Page Frame Ownership (XPFO)

2017-11-14 Thread Dave Hansen
On 11/14/2017 04:33 PM, Tycho Andersen wrote: >> >> void set_bh_page(struct buffer_head *bh, >> ... >> bh->b_data = page_address(page) + offset; > Ah, yes. I guess there will be many bugs like this :). Anyway, I'll > try to cook up a patch. It won't catch all the bugs, but it might be handy t

Re: [PATCH v6 03/11] mm, x86: Add support for eXclusive Page Frame Ownership (XPFO)

2017-11-14 Thread Dave Hansen
On 11/14/2017 07:44 PM, Matthew Wilcox wrote: > On Mon, Nov 13, 2017 at 02:46:25PM -0800, Dave Hansen wrote: >> On 11/13/2017 02:20 PM, Dave Hansen wrote: >>> On 11/09/2017 05:09 PM, Tycho Andersen wrote: >>>> which I guess is from the additional flags in grow_d

[RFC][PATCH] x86, kaiser: do not require mapping process kernel stacks

2017-11-03 Thread Dave Hansen
With the KAISER code that I posted a few days ago, we map and unmap each of the kernel stacks when they are created. That's slow and it is also the single largest thing still mapped into the user address space. This patch is on top of Andy's new trampoline stack code[1] plus the previous KAISER

[RFC][PATCH] x86, sched: allow topologies where NUMA nodes share an LLC

2017-11-06 Thread Dave Hansen
From: Dave Hansen Intel's Skylake Server CPUs have a different LLC topology than previous generations. When in Sub-NUMA-Clustering (SNC) mode, the package is divided into two "slices", each containing half the cores, half the LLC, and one memory controller and each slice is enum

Re: [PATCH] x86, syscalls: use SYSCALL_DEFINE() macros for sys_modify_ldt()

2017-10-18 Thread Dave Hansen
On 10/18/2017 06:17 AM, Ingo Molnar wrote: > I have added your: > > Signed-off-by: Dave Hansen > > let me know if that's OK. Yes, that's OK.

[PATCH] [v3] x86, syscalls: use SYSCALL_DEFINE() macros for sys_modify_ldt()

2017-10-18 Thread Dave Hansen
Changes from v2: * Fixed UML compile error from not including syscall header Changes from v1: * Added cast to (unsigned int) * Added comments -- From: Dave Hansen We do not have tracepoints for sys_modify_ldt() because we define it directly instead of using the normal SYSCALL_DEFINEx

Re: [PATCH 02/18] x86/asm/64: Split the iret-to-user and iret-to-kernel paths

2017-10-27 Thread Dave Hansen
On 10/26/2017 01:26 AM, Andy Lutomirski wrote: > +GLOBAL(restore_regs_and_return_to_usermode) > +#ifdef CONFIG_DEBUG_ENTRY > + testl $3, CS(%rsp) > + jnz 1f > + ud2 A nit from the mere mortals in the audience: Could we start commenting or make a constant for the user segment bits
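
For readers wondering what the `testl $3, CS(%rsp)` is checking: the low two bits of a segment selector hold its privilege level, so a nonzero result means the saved CS belongs to user mode. A small sketch of the same test with named constants, in the spirit of the request above; `RPL_MASK` and `cs_is_user()` are names made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Low two bits of a segment selector: the requested privilege level. */
#define RPL_MASK 0x3u

/*
 * Mirrors `testl $3, CS(%rsp); jnz ...`: any nonzero privilege level in
 * the saved CS selector is treated as "came from user mode".
 */
static bool cs_is_user(unsigned int cs)
{
	assert(cs <= 0xffffu);	/* selectors are 16 bits wide */
	return (cs & RPL_MASK) != 0;
}
```

For example, a selector of 0x33 has privilege level 3 and counts as user, while 0x10 has privilege level 0 and counts as kernel.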

Re: [PATCH 03/18] x86/asm/64: Move SWAPGS into the common iret-to-usermode path

2017-10-27 Thread Dave Hansen
On 10/26/2017 01:26 AM, Andy Lutomirski wrote: > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S > index 493e5e234d36..1909a4e42b81 100644 > --- a/arch/x86/entry/entry_64.S > +++ b/arch/x86/entry/entry_64.S > @@ -254,7 +254,7 @@ return_from_SYSCALL_64: > movq RCX(%rsp),

Re: [PATCH 14/18] x86/boot/64: Stop initializing TSS.sp0 at boot

2017-10-27 Thread Dave Hansen
On 10/26/2017 01:26 AM, Andy Lutomirski wrote: > --- a/arch/x86/kernel/process.c > +++ b/arch/x86/kernel/process.c > @@ -48,7 +48,8 @@ > */ > __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = { > .x86_tss = { > - .sp0 = TOP_OF_INIT_STACK, > + /*

Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

2015-10-05 Thread Dave Hansen
On 10/05/2015 08:22 AM, Theodore Ts'o wrote: ... > I've bisected it down to commit 998ef75ddb: "fs: do not prefault > sys_write() user buffer pages". I've confirmed that 4.3-rc2 fails as > detailed below, but with 998ef75ddb reverted, the problem goes away. ... > Before commit 998ef75ddb, if we ne

Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

2015-10-05 Thread Dave Hansen
On 10/05/2015 08:58 AM, Linus Torvalds wrote: ... > Dave, mind sharing the micro-benchmark or perhaps even just a kernel > profile of it? How is that "iov_iter_fault_in_readable()" so > noticeable? It really shouldn't be a big deal. The micro was just plugging this test: https://www.sr71.

Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

2015-10-05 Thread Dave Hansen
I managed to catch the condition in an ftrace. Full spew is below. We can see that the iov_iter_copy_from_user_atomic() "failed" and ended up with a copied=0 which we can see in the ext4_journalled_write_end() tracepoint as "copied 0". So we're in this code with copied=0 and len=4096: > static

Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

2015-10-05 Thread Dave Hansen
On 10/05/2015 01:22 PM, Linus Torvalds wrote: > On Mon, Oct 5, 2015 at 5:23 PM, Dave Hansen > wrote: >> One thing I've been noticing on Skylake is that barriers (implicit and >> explicit) are showing up more in profiles. > > Ahh, you're on skylake? Yup. >

Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

2015-10-05 Thread Dave Hansen
> On Mon, Oct 5, 2015 at 10:18 PM, Linus Torvalds > wrote: >> >> Your ext4 patch may well fix the issue, and be the right thing to do >> (_regardless_ of the revert, in fact - while it might make the revert >> unnecessary, it might also be a good idea even if we do revert). > > Thinking a bit mor

Re: [PATCH 26/26] x86, pkeys: Documentation

2015-10-06 Thread Dave Hansen
On 10/03/2015 12:27 AM, Ingo Molnar wrote: > - I'd also suggest providing an initial value with the 'alloc' call. It's > true >that user-space can do this itself in assembly, OTOH there's no reason not > to >provide a C interface for this. You mean an initial value for the rights regi

Re: [PATCH 26/26] x86, pkeys: Documentation

2015-10-07 Thread Dave Hansen
On 10/03/2015 01:17 AM, Ingo Molnar wrote: > Right now the native x86 PTE format allows two protection related bits for > user-space pages: > > _PAGE_BIT_RW: if 0 the page is read-only, if 1 then it's > read-write > _PAGE_BIT_NX: if 0 the page is executab

Re: [PATCH 26/26] x86, pkeys: Documentation

2015-10-07 Thread Dave Hansen
On 10/07/2015 01:39 PM, Andy Lutomirski wrote: > On Wed, Oct 7, 2015 at 1:24 PM, Dave Hansen wrote: >> On 10/03/2015 01:17 AM, Ingo Molnar wrote: >>> Right now the native x86 PTE format allows two protection related bits for >>> user-space pages: >>> >>>

Re: [PATCH 09/11] x86, fpu: correct and check XSAVE xstate size calculations

2015-08-28 Thread Dave Hansen
On 08/27/2015 09:54 PM, Ingo Molnar wrote: > > * Dave Hansen wrote: > >> +static int xfeature_is_supervisor(int xfeature_nr) >> +{ >> +/* >> + * We currently do not support supervisor states, but if >> + * we did, we could find out like this.

Re: [PATCH 11/11] x86, fpu: check CPU-provided sizes against struct declarations

2015-08-28 Thread Dave Hansen
On 08/27/2015 10:25 PM, Ingo Molnar wrote: > * Dave Hansen wrote: >> @@ -447,6 +492,14 @@ static void do_extra_xstate_size_checks( >> paranoid_xstate_size += xfeature_size(i); >> } >> XSTATE_WARN_ON(paranoid_xstate_size != xstate_size); >> +

[PATCH] e1000: fix e1000e_disable_aspm_locked() warning

2015-08-31 Thread Dave Hansen
From: Dave Hansen I have a .config with CONFIG_PM disabled. I get the following whenever compiling the e1000 driver: ...net/ethernet/intel/e1000e/netdev.c:6450:13: warning: 'e1000e_disable_aspm_locked' defined but not used [-Wunused-function] static void e1000e_disable_aspm_loc

[PATCH] xhci: fix warning when CONFIG_PM disabled.

2015-08-31 Thread Dave Hansen
From: Dave Hansen I have a .config with CONFIG_PM disabled. I get the following whenever compiling the xhci driver: drivers/usb/host/xhci-pci.c:192:13: warning: ‘xhci_pme_quirk’ defined but not used [-Wunused-function] Looks like we just need to move xhci_pme_quirk() to be underneath

[PATCH 01/15] x86, fpu: print xfeature buffer size in decimal

2015-08-31 Thread Dave Hansen
From: Dave Hansen This is utterly a personal taste thing, but I find it way easier to read structure sizes in decimal than in hex. Signed-off-by: Dave Hansen Cc: Ingo Molnar Cc: x...@kernel.org Cc: Borislav Petkov Cc: Fenghua Yu Cc: Tim Chen Cc: linux-kernel@vger.kernel.org --- b/arch

[PATCH 15/15] x86, fpu: check CPU-provided sizes against struct declarations

2015-08-31 Thread Dave Hansen
From: Dave Hansen Changes from v2: * remove XSTATE_RESERVED check, since it is gone now -- From: Dave Hansen We now have C structures defined for each of the XSAVE state components that we support. This patch adds checks during our verification pass to ensure that the CPU-provided data

[PATCH 07/15] x86, fpu: rework XSTATE_* macros to remove magic '2'

2015-08-31 Thread Dave Hansen
From: Dave Hansen The 'xstate.c' code has a bunch of references to '2'. This is because we have a lot more work to do for the "extended" xstates than the "legacy" ones and state component 2 is the first "extended" state. Th

[PATCH 14/15] x86, fpu: check to ensure increasing-offset xstate offsets

2015-08-31 Thread Dave Hansen
From: Dave Hansen The xstate CPUID leaves enumerate where each state component is inside the XSAVE buffer, along with the size of the entire buffer. Our new XSAVE sanity-checking code extrapolates an expected _total_ buffer size by looking at the last component that it encounters. That method
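The check described in this snippet can be illustrated with a small user-space sketch. It assumes the per-component offsets and sizes have already been read from CPUID into an array; the struct and function names below are hypothetical and are not taken from the patch. The idea is only that "offset + size of the last component" is a valid total if every component starts at a higher offset than the one before it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-component layout, standing in for what CPUID reports. */
struct xcomp_layout {
	uint32_t offset;	/* byte offset inside the XSAVE buffer */
	uint32_t size;		/* size of the component in bytes */
};

/*
 * Return the extrapolated total buffer size (offset + size of the last
 * component), or 0 if the offsets are not strictly increasing, in which
 * case the last component seen is not guaranteed to be the highest one.
 */
static uint32_t xcomp_expected_size(const struct xcomp_layout *c, size_t nr)
{
	uint32_t end = 0;

	for (size_t i = 0; i < nr; i++) {
		if (i > 0 && c[i].offset <= c[i - 1].offset)
			return 0;	/* out-of-order component */
		end = c[i].offset + c[i].size;
	}
	return end;
}
```

A caller would compare the returned value against the total size that the CPU enumerates and warn on a mismatch or on a 0 return.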

[PATCH 10/15] x86, fpu: rework MPX 'xstate' types

2015-08-31 Thread Dave Hansen
From: Dave Hansen MPX includes two separate "extended state components". There is no real need to have an 'mpx_struct' because we never really manage the states together. We also separate out the actual data in 'mpx_bndcsr_state' from the padding. We will short

[PATCH 13/15] x86, fpu: correct and check XSAVE xstate size calculations

2015-08-31 Thread Dave Hansen
From: Dave Hansen Note: our xsaves support is currently broken and disabled. This patch does not fix it, but it is an incremental improvement. This might be useful to someone backporting the entire set of XSAVES patches at some point, but it should not be backported alone. Ingo said he wanted

[PATCH 12/15] x86, fpu: add C structures for AVX-512 state components

2015-08-31 Thread Dave Hansen
From: Dave Hansen AVX-512 has 3 separate state components: 1. opmask registers 2. zmm upper half of registers 0-15 3. new zmm registers (16-31) This patch adds C structures for the three components along with a few comments mostly lifted from the SDM to explain what they do. This will allow
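As a rough illustration of the three components listed in this snippet, the sketch below declares one structure per component and checks the sizes at compile time. The type and field names are hypothetical and need not match the patch; the register counts and widths (eight 64-bit opmask registers, the upper 256 bits of zmm0-15, and sixteen full 512-bit registers zmm16-31) are the architectural ones.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical convenience types for wide registers. */
struct reg_256_bit { uint8_t regbytes[256 / 8]; };
struct reg_512_bit { uint8_t regbytes[512 / 8]; };

/* 1. opmask registers k0-k7, 64 bits each */
struct avx512_opmask_sketch {
	uint64_t opmask_reg[8];
};

/* 2. upper 256 bits of zmm0-zmm15 (the lower halves live in the SSE/AVX areas) */
struct avx512_zmm_upper_sketch {
	struct reg_256_bit zmm_upper[16];
};

/* 3. the new registers zmm16-zmm31, all 512 bits of each */
struct avx512_hi16_sketch {
	struct reg_512_bit hi16_zmm[16];
};

/* Compile-time size checks, in the spirit of the series' sanity checking. */
static_assert(sizeof(struct avx512_opmask_sketch) == 64, "opmask size");
static_assert(sizeof(struct avx512_zmm_upper_sketch) == 512, "zmm upper size");
static_assert(sizeof(struct avx512_hi16_sketch) == 1024, "hi16 zmm size");
```

Having a C type per component is what lets later checks compare the CPU-provided size of each component against sizeof() of its declaration.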

[PATCH 00/15] [v3] x86, fpu: XSAVE cleanups and sanity checks

2015-08-31 Thread Dave Hansen
Changes in v3: * rework XSTATE_* macros using Ingo's suggested naming * change state size printk to be in decimal * add some more sanity-checking to detect and work around an undersized 'xregs_state' * remove "nr_" from some of the names used. Changes in v2: * remove references to Processo

[PATCH 11/15] x86, fpu: rework YMM definition

2015-08-31 Thread Dave Hansen
From: Dave Hansen We are about to rework all of the "extended state" definitions. This makes the 'ymm' naming consistent with the AVX-512 types we will introduce later. We also add a convenience type: "reg_128_bit" so that we do not have to spell out our arithmeti

[PATCH 08/15] x86, fpu: remove xfeature_nr

2015-08-31 Thread Dave Hansen
From: Dave Hansen xfeature_nr ended up being initialized too late for me to use it in the "xsave size sanity check" patch which is later in the series. I tried to move around its initialization but realized that it was just as easy to get rid of it. We only have 9 XFEATURES.

[PATCH 04/15] x86, fpu: kill LWP support

2015-08-31 Thread Dave Hansen
From: Dave Hansen LightWeight Profiling was evidently an AMD profiling feature that we never got around to implementing. Remove the references to it. Signed-off-by: Dave Hansen Cc: Ingo Molnar Cc: x...@kernel.org Cc: Borislav Petkov Cc: Fenghua Yu Cc: Tim Chen Cc: linux-kernel

[PATCH 09/15] x86, fpu: add helper xfeature_enabled() instead of test_bit()

2015-08-31 Thread Dave Hansen
From: Dave Hansen We currently use test_bit() in a few places to see if an xfeature is enabled. It ends up being a bit ugly because 'xfeatures_mask' is a u64 and test_bit wants an 'unsigned long' so it requires a cast. The *_bit() functions are also technically atomic, w
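A helper of the kind this snippet describes might look like the user-space sketch below. It assumes the enabled features are kept in a single 64-bit mask; the variable and function names carry a "_sketch" suffix to mark them as stand-ins, and the body is an assumption rather than the code from the patch. A plain shift-and-mask on the u64 avoids the cast to 'unsigned long' that test_bit() would need.

```c
#include <stdint.h>

/* Stand-in for the kernel's 'xfeatures_mask'; one bit per state component. */
static uint64_t xfeatures_mask_sketch;

/* Return 1 if state component 'xfeature_nr' is set in the mask, else 0. */
static int xfeature_enabled_sketch(int xfeature_nr)
{
	return !!(xfeatures_mask_sketch & (1ULL << xfeature_nr));
}
```

Using 1ULL keeps the shift 64 bits wide regardless of the width of 'unsigned long' on the build target.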

[PATCH 06/15] x86, fpu: rename XFEATURES_NR_MAX

2015-08-31 Thread Dave Hansen
From: Dave Hansen This is a logical follow-on to the last patch. It makes the XFEATURE_MAX naming consistent with the other enum values. This is what Ingo suggested. Signed-off-by: Dave Hansen Cc: Ingo Molnar Cc: x...@kernel.org Cc: Borislav Petkov Cc: Fenghua Yu Cc: Tim Chen Cc: linux

[PATCH 05/15] x86, fpu: XSAVE macro renames

2015-08-31 Thread Dave Hansen
From: Dave Hansen There are two concepts that have some confusing naming: 1. Extended State Component numbers (currently called XFEATURE_BIT_*) 2. Extended State Component masks (currently called XSTATE_*) The numbers are (currently) from 0-9. State component 3 is the bounds registers
