Re: [RFC PATCH v2 4/4] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

2018-12-07 Thread Dave Hansen
On 12/7/18 10:15 AM, Jethro Beekman wrote: > This is not sufficient to support the Fortanix SGX ABI calling > convention, which was designed to be mostly compatible with the SysV > 64-bit calling convention. The following registers need to be passed in > to an enclave from userspace: RDI, RSI,

Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()

2018-12-06 Thread Dave Hansen
On 12/6/18 3:28 PM, Logan Gunthorpe wrote: > I didn't think this was meant to describe actual real world performance > between all of the links. If that's the case all of this seems like a > pipe dream to me. The HMAT discussions (that I was a part of at least) settled on just trying to describe

Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()

2018-12-06 Thread Dave Hansen
On 12/6/18 3:28 PM, Logan Gunthorpe wrote: > These patches are really tied to world view #1. But, the HMAT is really > tied to world view #1. Whoops, should have been "the HMAT is really tied to world view #2"

Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()

2018-12-06 Thread Dave Hansen
On 12/6/18 2:39 PM, Jerome Glisse wrote: > No if the 4 sockets are connect in a ring fashion ie: > Socket0 - Socket1 >| | > Socket3 - Socket2 > > Then you have 4 links: > link0: socket0 socket1 > link1: socket1 socket2 > link3: socket2 socket3 > link4: socket3
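
As a rough illustration of the ring described in the quoted text, the sketch below enumerates the links of a ring of sockets. The helper name `ring_links` is hypothetical and not part of the HMS patchset; it only assumes that each socket is linked to its next neighbour and that the ring has at least three sockets.

```python
def ring_links(num_sockets):
    """Return the point-to-point links of a ring of `num_sockets` sockets.

    Hypothetical helper for illustration only. Socket i is linked to
    socket (i + 1) % num_sockets, so the last socket links back to socket 0.
    """
    return [(i, (i + 1) % num_sockets) for i in range(num_sockets)]


# Four sockets in a ring yield four links, matching the count in the quote.
for index, (a, b) in enumerate(ring_links(4)):
    print(f"link{index}: socket{a} socket{b}")
```

The sketch numbers the links consecutively from 0, which is a choice made here and differs from the labels used in the quoted message.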

Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()

2018-12-06 Thread Dave Hansen
On 12/6/18 12:11 PM, Logan Gunthorpe wrote: >> My concern with having folks do per-program parsing, *and* having a huge >> amount of data to parse makes it unusable. The largest systems will >> literally have hundreds of thousands of objects in /sysfs, even in a >> single directory. That makes

Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()

2018-12-06 Thread Dave Hansen
On 12/6/18 11:20 AM, Jerome Glisse wrote: >>> For case 1 you can pre-parse stuff but this can be done by helper library >> How would that work? Would each user/container/whatever do this once? >> Where would they keep the pre-parsed stuff? How do they manage their >> cache if the topology

Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()

2018-12-06 Thread Dave Hansen
On 12/5/18 9:53 AM, Jerome Glisse wrote: > No so there is 2 kinds of applications: > 1) average one: i am using device {1, 3, 9} give me best memory for >those devices ... > > For case 1 you can pre-parse stuff but this can be done by helper library How would that work? Would each

Re: [RFC PATCH 3/4] x86/traps: Attempt to fixup exceptions in vDSO before signaling

2018-12-06 Thread Dave Hansen
On 12/5/18 3:20 PM, Sean Christopherson wrote: > @@ -223,6 +224,10 @@ do_trap_no_signal(struct task_struct *tsk, int trapnr, > const char *str, > tsk->thread.error_code = error_code; > tsk->thread.trap_nr = trapnr; > > + if (user_mode(regs) && > +

Re: [RFC PATCH 2/4] x86/fault: Attempt to fixup unhandled #PF in vDSO before signaling

2018-12-06 Thread Dave Hansen
> #define CREATE_TRACE_POINTS > #include > @@ -928,6 +929,9 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned > long error_code, > if (address >= TASK_SIZE_MAX) > error_code |= X86_PF_PROT; > > + if (fixup_vdso_exception(regs,

Re: [GIT PULL] x86: remove Intel MPX

2018-12-05 Thread Dave Hansen
On 12/5/18 10:42 AM, Konrad Rzeszutek Wilk wrote: > On Wed, Dec 05, 2018 at 08:44:43AM -0800, Dave Hansen wrote: >> Hi x86 maintainers, >> >> Please pull from: >> >> git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-mpx.git >> mpx-remove

Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()

2018-12-05 Thread Dave Hansen
On 12/4/18 6:13 PM, Jerome Glisse wrote: > On Tue, Dec 04, 2018 at 05:06:49PM -0800, Dave Hansen wrote: >> OK, but there are 1024*1024 matrix cells on a system with 1024 >> proximity domains (ACPI term for NUMA node). So it sounds like you are >> proposing a milli
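
The arithmetic in the quoted text can be checked with a short sketch. It assumes only what the quote states, namely one matrix cell per ordered pair of proximity domains; the function name `matrix_cells` is made up for this example.

```python
def matrix_cells(proximity_domains):
    """Number of cells in a full domain-by-domain matrix.

    Illustrative only: one cell per (initiator, target) pair, so the
    count grows with the square of the number of proximity domains.
    """
    return proximity_domains * proximity_domains


print(matrix_cells(1024))  # 1048576, i.e. about a million cells
```

The quadratic growth is the point of the objection: doubling the number of domains quadruples the number of cells.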

[GIT PULL] x86: remove Intel MPX

2018-12-05 Thread Dave Hansen
Hi x86 maintainers, Please pull from: git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-mpx.git mpx-remove There is only one commit, removing the Intel MPX implementation from the tree. The benefits of keeping the feature in the tree are not worth the ongoing maintenance cost.

Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()

2018-12-04 Thread Dave Hansen
On 12/4/18 4:15 PM, Jerome Glisse wrote: > On Tue, Dec 04, 2018 at 03:54:22PM -0800, Dave Hansen wrote: >> Basically, is sysfs the right place to even expose this much data? > > I definitely want to avoid the memoryX mistake. So i do not want to > see one link directory per

Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()

2018-12-04 Thread Dave Hansen
On 12/4/18 1:57 PM, Jerome Glisse wrote: > Fully correct mind if i steal that perfect summary description next time > i post ? I am so bad at explaining thing :) Go for it! > Intention is to allow program to do everything they do with mbind() today > and tomorrow with the HMAT patchset and on

Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()

2018-12-04 Thread Dave Hansen
On 12/3/18 3:34 PM, jgli...@redhat.com wrote: > This patchset use the above scheme to expose system topology through > sysfs under /sys/bus/hms/ with: > - /sys/bus/hms/devices/v%version-%id-target/ : a target memory, > each has a UID and you can usual value in that folder (node id, >

Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()

2018-12-04 Thread Dave Hansen
On 12/4/18 10:49 AM, Jerome Glisse wrote: >> Also, could you add a simple, example program for how someone might use >> this? I got lost in all the new sysfs and ioctl gunk. Can you >> characterize how this would work with the *existing* NUMA interfaces that >> we have? > That is the issue i can

Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()

2018-12-04 Thread Dave Hansen
On 12/4/18 10:49 AM, Jerome Glisse wrote: > Policy is same kind of story, this email is long enough now :) But > i can write one down if you want. Yes, please. I'd love to see the code. We'll do the same on the "HMAT" side and we can compare notes.

Re: [patch V2 27/28] x86/speculation: Add seccomp Spectre v2 user space protection mode

2018-12-04 Thread Dave Hansen
> static const char * const spectre_v2_user_strings[] = { > [SPECTRE_V2_USER_NONE] = "User space: Vulnerable", > [SPECTRE_V2_USER_STRICT]= "User space: Mitigation: STIBP > protection", > [SPECTRE_V2_USER_PRCTL] = "User space: Mitigation: STIBP via >

Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()

2018-12-04 Thread Dave Hansen
On 12/3/18 3:34 PM, jgli...@redhat.com wrote: > This means that it is no longer sufficient to consider a flat view > for each node in a system but for maximum performance we need to > account for all of this new memory but also for system topology. > This is why this proposal is unlike the HMAT

Re: [PATCH] x86/mpx: pass 'mm' to kernel_managing_mpx_tables() in mpx_notify_unmap()

2018-12-03 Thread Dave Hansen
On 12/3/18 12:43 PM, Jarkko Sakkinen wrote: > If mm is not the same as current->mm, mpx_notify_unmap() will yield > invalid results and at worst will lead to a crash if it gets called by > a kthread. It's also worth noting that this does not fix any actual, end-user-visible bug today. It really

Re: [PATCH v2 3/5] generic/pgtable: Introduce set_pte_safe()

2018-12-03 Thread Dave Hansen
On 11/30/18 4:35 PM, Dan Williams wrote: > +/* > + * The _safe versions of set_{pte,pmd,pud,p4d,pgd} validate that the > + * entry was not populated previously. I.e. for cases where a flush-tlb > + * is elided, double-check that there is no stale mapping to shoot down. > + */ Functionally these

Re: [PATCH v2 5/5] x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init()

2018-12-03 Thread Dave Hansen
On 12/2/18 9:04 AM, Dan Williams wrote: >> This patch on it's own doesn't apply to any of the stable trees, does it >> maybe depend on some of the previous patches in this series? > It does not strictly depend on them, but it does need to be rebased > without them. The minimum patch for -stable

Re: [PATCH 2/4] x86/mm/cpa: Fix cpa_flush_array()

2018-11-30 Thread Dave Hansen
> +void __cpa_flush_array(void *data) > { > - unsigned int i, level; > + struct cpa_data *cpa = data; > + unsigned int i; > > - if (__cpa_flush_range(baddr, numpages, cache)) > + for (i = 0; i < cpa->numpages; i++) > + __flush_tlb_one_kernel(__cpa_addr(cpa, i));

Re: MPX is broken for 32bit programs on a 64bit host

2018-11-29 Thread Dave Hansen
On 11/29/18 6:17 AM, Sebastian Andrzej Siewior wrote: > This is broken since v4.12-rc1. This is known [1] since April this year. > Should I send a removal patch for MPX or is someone actually going to > fix this? Or do we wait for gcc-9 to be released? I've got a git tree prepared to do MPX

Re: [PATCH] x86/mm/dump_pagetables: Change to use DEFINE_SHOW_ATTRIBUTE macro

2018-11-28 Thread Dave Hansen
On 11/27/18 2:50 PM, Kees Cook wrote: > On Mon, Nov 19, 2018 at 9:06 AM, Dave Hansen wrote: >> On 11/19/18 7:43 AM, Yangtao Li wrote: >>> -static const struct file_operations ptdump_curusr_fops = { >>> - .owner = THIS_MODULE, >>> -

Re: [PATCH 0/7] ACPI HMAT memory sysfs representation

2018-11-26 Thread Dave Hansen
On 11/26/18 7:38 AM, Anshuman Khandual wrote: > On 11/24/2018 12:51 AM, Dave Hansen wrote: >> On 11/22/18 10:42 PM, Anshuman Khandual wrote: >>> Are we willing to go in the direction for inclusion of a new system >>> call, subset of it appears on sysfs etc ? My pri

Re: [PATCH 0/7] ACPI HMAT memory sysfs representation

2018-11-26 Thread Dave Hansen
On 11/23/18 1:13 PM, Dan Williams wrote: >> A new system call makes total sense to me. I have the same concern >> about the completeness of what's exposed in sysfs, I just don't see a >> _route_ to completeness with sysfs itself. Thus, the minimalist >> approach as a first step. > Outside of

Re: [PATCH 0/7] ACPI HMAT memory sysfs representation

2018-11-23 Thread Dave Hansen
On 11/22/18 10:42 PM, Anshuman Khandual wrote: > Are we willing to go in the direction for inclusion of a new system > call, subset of it appears on sysfs etc ? My primary concern is not > how the attribute information appears on the sysfs but lack of it's > completeness. A new system call makes

Re: [PATCH 0/7] ACPI HMAT memory sysfs representation

2018-11-22 Thread Dave Hansen
On 11/22/18 3:52 AM, Anshuman Khandual wrote: >> >> It sounds like the subset that's being exposed is insufficient for yo >> We did that because we think doing anything but a subset in sysfs will >> just blow up sysfs: MAX_NUMNODES is as high as 1024, so if we have 4 >> attributes, that's at

Re: [Patch v6 14/16] x86/speculation: Use STIBP to restrict speculation on non-dumpable task

2018-11-21 Thread Dave Hansen
On 11/20/18 5:27 PM, Linus Torvalds wrote: > Also, "dumpable" in general is pretty oddly defined to be used for this. > > The same (privileged) process can be dumpable or not depending on how > it was started (ie if it was started by a regular user and became > trusted through suid, it's not

Re: [PATCH] x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init()

2018-11-19 Thread Dave Hansen
On 11/19/18 3:19 PM, Dan Williams wrote: > Andy wondered why a path that can sleep was using __flush_tlb_all() [1] > and Dave confirmed the expectation for TLB flush is for modifying / > invalidating existing pte entries, but not initial population [2]. I _think_ this is OK. But, could we

Re: [Patch v5 11/16] x86/speculation: Add Spectre v2 app to app protection modes

2018-11-19 Thread Dave Hansen
On 11/19/18 3:01 PM, Thomas Gleixner wrote: >> Yes, it wouldn't make sense for having just one of those if a task >> is worried about attack from user space. >> >> I'll document it. > What? IBPB makes tons of sense even without STIBP. I'm lost. :) I don't think anyone is talking about using

Re: [Patch v5 11/16] x86/speculation: Add Spectre v2 app to app protection modes

2018-11-19 Thread Dave Hansen
On 11/19/18 3:16 PM, Andrea Arcangeli wrote: > So you may want to ask why it wasn't written as your "any" vs "any" email: Presumably because the authors really and truly meant what they said. I was not being as careful in my wording as they were. :) There is nothing in the spec that says that

Re: [Patch v5 11/16] x86/speculation: Add Spectre v2 app to app protection modes

2018-11-19 Thread Dave Hansen
On 11/19/18 11:32 AM, Andrea Arcangeli wrote: > The specs don't say if by making it immune from BTB mistraining, it > also could prevent to mistrain the BTB in order to attack what's > outside the SECCOMP jail. Probably it won't and I doubt we can rely on > it even if some implementation could do

Re: [PATCH 0/7] ACPI HMAT memory sysfs representation

2018-11-19 Thread Dave Hansen
On 11/18/18 9:44 PM, Anshuman Khandual wrote: > IIUC NUMA re-work in principle involves these functional changes > > 1. Enumerating compute and memory nodes in heterogeneous environment > (short/medium term) This patch set _does_ that, though. > 2. Enumerating memory node attributes as seen

Re: [PATCH v2] x86/fpu: Disable BH while while loading FPU registers in __fpu__restore_sig()

2018-11-19 Thread Dave Hansen
On 11/19/18 9:27 AM, Borislav Petkov wrote: >>> I was really hoping for code comments. :) >> I though we agreed to make those in the larger series because those >> comments in __fpu__restore_sig() would be removed anyway (as part of the >> series). > Also, over local_bh_disable() does not really

Re: [PATCH v2] x86/fpu: Disable BH while while loading FPU registers in __fpu__restore_sig()

2018-11-19 Thread Dave Hansen
On 11/19/18 8:04 AM, Sebastian Andrzej Siewior wrote: > v1…v2: A more verbose commit as message. I was really hoping for code comments. :)

Re: [PATCH] x86/mm/dump_pagetables: Change to use DEFINE_SHOW_ATTRIBUTE macro

2018-11-19 Thread Dave Hansen
On 11/19/18 7:43 AM, Yangtao Li wrote: > -static const struct file_operations ptdump_curusr_fops = { > - .owner = THIS_MODULE, > - .open = ptdump_open_curusr, > - .read = seq_read, > - .llseek = seq_lseek, > - .release=

Re: [PATCH] x86/fpu: Disable BH while while loading FPU registers in __fpu__restore_sig()

2018-11-19 Thread Dave Hansen
On 11/19/18 7:06 AM, Sebastian Andrzej Siewior wrote: > On 2018-11-19 07:04:35 [-0800], Dave Hansen wrote: >> Does the local_bh_disable() itself survive? > Not in __fpu__restore_sig(). I do have: > | static inline void __fpregs_changes_begin(void) > | { > |

Re: [PATCH] x86/fpu: Disable BH while while loading FPU registers in __fpu__restore_sig()

2018-11-19 Thread Dave Hansen
On 11/19/18 3:41 AM, Sebastian Andrzej Siewior wrote: > On 2018-11-12 09:48:08 [-0800], Dave Hansen wrote: >> On 11/12/18 7:56 AM, Sebastian Andrzej Siewior wrote: >>> Use local_bh_disable() around the restore sequence to avoid the race. BH >>> needs to be disabled b

Re: STIBP by default.. Revert?

2018-11-18 Thread Dave Hansen
> On Nov 18, 2018, at 2:17 PM, Jiri Kosina wrote: > > It's probably not just browsers, but anything running JITed sandboxed > code. So the most straightforward way might be the prctl() approach, where > userspace would claim "I do care about this, please fix it up for me". So > prctl() +

Re: [PATCH v17 07/23] x86/mm: x86/sgx: Add new 'PF_SGX' page fault error code bit

2018-11-16 Thread Dave Hansen
On 11/15/18 5:01 PM, Jarkko Sakkinen wrote: > The SGX bit is set in the #PF error code if and only if the fault is > detected by the Enclave Page Cache Map (EPCM), a hardware-managed > table that enforces the paging permissions defined by the enclave, > e.g. to prevent the kernel from changing the

Re: [PATCH v17 06/23] x86/cpu/intel: Detect SGX support and update caps appropriately

2018-11-16 Thread Dave Hansen
On 11/15/18 5:01 PM, Jarkko Sakkinen wrote: > +static void detect_sgx(struct cpuinfo_x86 *c) > +{ > + unsigned long long fc; > + > + rdmsrl(MSR_IA32_FEATURE_CONTROL, fc); > + if (!(fc & FEATURE_CONTROL_LOCKED)) { > + pr_err_once("sgx: IA32_FEATURE_CONTROL MSR is not

Re: [PATCH v17 03/23] x86/cpufeatures: Add SGX sub-features (as Linux-defined bits)

2018-11-16 Thread Dave Hansen
On 11/15/18 5:01 PM, Jarkko Sakkinen wrote: > +#define X86_FEATURE_SGX1 ( 8*32+ 0) /* SGX1 leaf functions */ > +#define X86_FEATURE_SGX2 ( 8*32+ 1) /* SGX2 leaf functions */ Is there a reason these are not (all) tied to CONFIG_INTEL_SGX via:
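
The `( 8*32+ N)` form in the quoted defines packs a capability word index and a bit position into one number. A minimal sketch of that encoding follows; the two constants are copied from the quote, while `split_feature` is a hypothetical helper written for this illustration and is not a kernel function.

```python
BITS_PER_WORD = 32  # each capability word holds 32 feature bits

# Values as quoted in the patch: word 8, bits 0 and 1.
X86_FEATURE_SGX1 = 8 * 32 + 0
X86_FEATURE_SGX2 = 8 * 32 + 1


def split_feature(feature):
    """Split an encoded feature number into (word, bit).

    Hypothetical helper for illustration; it simply inverts the
    word * 32 + bit encoding used by the defines above.
    """
    return divmod(feature, BITS_PER_WORD)


print(split_feature(X86_FEATURE_SGX1))  # (8, 0)
print(split_feature(X86_FEATURE_SGX2))  # (8, 1)
```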

Re: [PATCH v3 1/2] x86/fpu: track AVX-512 usage of tasks

2018-11-16 Thread Dave Hansen
On 11/15/18 4:21 PM, Li, Aubrey wrote: > "Core cycles where the core was running with power delivery for license > level 2 (introduced in Skylake Server microarchitecture). This includes > high current AVX 512-bit instructions." > > I translated license level 2 to frequency drop. BTW, the "high"

Re: [RFC PATCH 3/4] mm, memory_hotplug: allocate memmap from the added memory range for sparse-vmemmap

2018-11-16 Thread Dave Hansen
On 11/16/18 2:12 AM, Oscar Salvador wrote: > Physical memory hotadd has to allocate a memmap (struct page array) for > the newly added memory section. Currently, kmalloc is used for those > allocations. Did you literally mean kmalloc? I thought we had a bunch of ways of allocating memmaps, but I

Re: [RFC PATCH 2/4] mm, memory_hotplug: provide a more generic restrictions for memory hotplug

2018-11-16 Thread Dave Hansen
On 11/16/18 2:12 AM, Oscar Salvador wrote: > +/* > + * Do we want sysfs memblock files created. This will allow userspace to > online > + * and offline memory explicitly. Lack of this bit means that the caller has > to > + * call move_pfn_range_to_zone to finish the initialization. > + */ > + >

Re: [PATCH 0/7] ACPI HMAT memory sysfs representation

2018-11-16 Thread Dave Hansen
On 11/15/18 10:27 PM, Anshuman Khandual wrote: > Not able to see the patches from this series either on the list or on the > archive (https://lkml.org/lkml/2018/11/15/331). IIRC last time we discussed > about this and the concern which I raised was in absence of a broader NUMA > rework for multi

Re: [PATCH v3 1/2] x86/fpu: track AVX-512 usage of tasks

2018-11-15 Thread Dave Hansen
On 11/15/18 4:21 PM, Li, Aubrey wrote: > On 2018/11/15 23:40, Dave Hansen wrote: >> On 11/14/18 3:00 PM, Aubrey Li wrote: >>> AVX-512 component has 3 states, only Hi16_ZMM state causes notable >>> frequency drop. Add per task Hi16_ZMM state tracking to context

Re: [PATCH v3 1/2] x86/fpu: track AVX-512 usage of tasks

2018-11-15 Thread Dave Hansen
that, add a decay. > Signed-off-by: Aubrey Li > Cc: Peter Zijlstra > Cc: Andi Kleen > Cc: Tim Chen > Cc: Dave Hansen > Cc: Arjan van de Ven > --- > arch/x86/include/asm/fpu/internal.h | 26 ++ > arch/x86/include/asm/fpu/types.h| 9 +

Re: [PATCH v3 2/2] proc: add /proc/<pid>/arch_state

2018-11-15 Thread Dave Hansen
On 11/14/18 3:00 PM, Aubrey Li wrote: > +void arch_thread_state(struct seq_file *m, struct task_struct *task) > +{ > + /* > + * Report AVX-512 Hi16_ZMM registers usage > + */ > + if (task->thread.fpu.hi16zmm_usage) > + seq_putc(m, '1'); > + else > +

Re: [PATCH 5/7] doc/vm: New documentation for memory cache

2018-11-14 Thread Dave Hansen
On 11/14/18 2:49 PM, Keith Busch wrote: > + # tree sys/devices/system/node/node0/cache/ > + /sys/devices/system/node/node0/cache/ > + |-- index1 > + | |-- associativity > + | |-- level > + | |-- line_size > + | |-- size > + | `-- write_policy Whoops, and

Re: [PATCH 4/7] node: Add memory caching attributes

2018-11-14 Thread Dave Hansen
On 11/14/18 2:49 PM, Keith Busch wrote: > System memory may have side caches to help improve access speed. While > the system provided cache is transparent to the software accessing > these memory ranges, applications can optimize their own access based > on cache attributes. > > In preparation

Re: [PATCH RFC] selftests/x86: Add a selftest for SGX

2018-11-13 Thread Dave Hansen
On 11/13/18 1:40 PM, Jarkko Sakkinen wrote: > +int main(int argc, char **argv) > +{ > + unsigned long bin_size = (unsigned long)_bin_end - > + (unsigned long)_bin; > + struct sgx_secs secs; > + uint64_t result = 0; > + > + if (!encl_load(, bin_size)) >

Re: [PATCH] x86/mm/pat: Fix missing preemption disable for __native_flush_tlb()

2018-11-12 Thread Dave Hansen
On 11/10/18 4:31 PM, Dan Williams wrote: >> If it indeed can run late in boot or after boot, then it sure looks >> buggy. Either the __flush_tlb_all() should be removed or it should >> be replaced with flush_tlb_kernel_range(). It’s unclear to me why a >> flush is needed at all, but if it’s

Re: [PATCH] x86/fpu: Disable BH while while loading FPU registers in __fpu__restore_sig()

2018-11-12 Thread Dave Hansen
On 11/12/18 7:56 AM, Sebastian Andrzej Siewior wrote: > Use local_bh_disable() around the restore sequence to avoid the race. BH > needs to be disabled because BH is allowed to run (even with preemption > disabled) and might invoke kernel_fpu_begin(). FWIW, that would make nice comment fodder for

[RFC PATCH v2 1/2] x86/fpu: detect AVX task

2018-11-12 Thread Dave Hansen
On 11/11/18 9:38 PM, Li, Aubrey wrote: > If there is a valid state in the AVX registers, we can say the tasks contains > AVX instructions, can't we? XRSTOR, for instance, can take XSAVE state out of the init state, but it is not necessarily an AVX instruction. In fact, we had a kernel bug along

Re: [RFC PATCH v2 1/2] x86/fpu: detect AVX task

2018-11-11 Thread Dave Hansen
On 11/7/18 9:16 AM, Aubrey Li wrote: > XSAVES and its variants use init optimization to reduce the amount of > data that they save to memory during context switch. Init optimization > uses the state component bitmap to denote if a component is in its init > configuration. We use this information

Re: RFC: userspace exception fixups

2018-11-08 Thread Dave Hansen
On 11/8/18 1:16 PM, Sean Christopherson wrote: > On Thu, Nov 08, 2018 at 12:10:30PM -0800, Dave Hansen wrote: >> On 11/8/18 12:05 PM, Andy Lutomirski wrote: >>> Hmm. The idea being that the SDK preserves RBP but not RSP. That's >>> not the most terrible thing in th

Re: RFC: userspace exception fixups

2018-11-08 Thread Dave Hansen
On 11/8/18 12:05 PM, Andy Lutomirski wrote: > Hmm. The idea being that the SDK preserves RBP but not RSP. That's > not the most terrible thing in the world. But could the SDK live with > something more like my suggestion where the vDSO supplies a normal > function that takes a struct containing

Re: RFC: userspace exception fixups

2018-11-07 Thread Dave Hansen
On 11/7/18 11:01 AM, Sean Christopherson wrote: > Going off comments in similar code related to UMIP, we'd need to figure > out how to handle protection keys. There are two options: 1. Don't depend on the userspace mapping. Do get_user_pages() to find the instruction in the kernel direct map,

Re: RFC: userspace exception fixups

2018-11-06 Thread Dave Hansen
On 11/6/18 12:12 PM, Andy Lutomirski wrote: > True, but what if we have a nasty enclave that writes to memory just > below SP *before* decrementing SP? Yeah, that would be unfortunate. If an enclave did this (roughly): 1. EENTER 2. Hardware sets eenter_hwframe->sp = %sp

Re: RFC: userspace exception fixups

2018-11-06 Thread Dave Hansen
On 11/6/18 11:02 AM, Andy Lutomirski wrote: > On Tue, Nov 6, 2018 at 10:41 AM Dave Hansen wrote: >> >> On 11/6/18 10:20 AM, Andy Lutomirski wrote: >>> I almost feel like the right solution is to call into SGX on its own >>> private stack or maybe even its own pri

Re: RFC: userspace exception fixups

2018-11-06 Thread Dave Hansen
On 11/6/18 10:20 AM, Andy Lutomirski wrote: > I almost feel like the right solution is to call into SGX on its own > private stack or maybe even its own private address space. Yeah, I had the same gut feeling. Couldn't the debugger even treat the enclave like its own "thread" with its own stack

Re: RFC: userspace exception fixups

2018-11-06 Thread Dave Hansen
On 11/6/18 8:57 AM, Andy Lutomirski wrote: > I’m assuming it’s way too late for the SGX SDK to be changed to use a > normal RPC mechanism? I’m a bit disappointed that enclaves can even > manipulate outside state like this. I assume Intel had some reason > for making it possible, but still. Just

Re: RFC: userspace exception fixups

2018-11-06 Thread Dave Hansen
On 11/6/18 7:37 AM, Sean Christopherson wrote: > > void *sgx_alloc_untrusted_stack(size_t size) > { > struct sgx_encl_tls *tls = get_encl_tls(); > struct sgx_out_call_context *context; > void *tmp; > > /* create a frame on the trusted stack to hold the out-call context */

Re: [PATCH v16 18/22] platform/x86: Intel SGX driver

2018-11-06 Thread Dave Hansen
On 11/6/18 8:40 AM, Sean Christopherson wrote: >> +struct sgx_encl { >> +unsigned int flags; >> +uint64_t attributes; >> +uint64_t xfrm; >> +unsigned int page_cnt; >> +unsigned int secs_child_cnt; >> +struct mutex lock; >> +struct mm_struct *mm; >> +struct file

Re: [PATCH v15 23/23] x86/sgx: Driver documentation

2018-11-06 Thread Dave Hansen
On 11/5/18 9:49 PM, Jarkko Sakkinen wrote: > On Mon, Nov 05, 2018 at 12:27:11PM -0800, Dave Hansen wrote: >> The ABI seems entirely undocumented and rather lightly designed, which >> seems like something we should fix before this is merged. > > ABI is documented in arch/x86/i

Re: [RFC PATCH] x86/mm/fault: Allow stack access below %rsp

2018-11-05 Thread Dave Hansen
On 11/5/18 8:27 AM, Waiman Long wrote: > So gcc had changed to avoid doing that, but my main concern are old > binaries that were compiled with old gcc. Yeah, fair enough. FWIW, I don't have any strong feelings about this patch either way, but supporting old binaries/compilers without crashing

Re: [RFC PATCH] x86/mm/fault: Allow stack access below %rsp

2018-11-05 Thread Dave Hansen
On 11/4/18 9:14 PM, Andy Lutomirski wrote: > I should add: if this patch is *not* applied, then I think we'll > need to replace the sw_error_code check with user_mode(regs) to avoid > an info leak if CET is enabled. Because, with CET, WRUSS will allow > a *kernel* mode access (where regs->sp is

Re: [RFC PATCH] x86/mm/fault: Allow stack access below %rsp

2018-11-02 Thread Dave Hansen
On 11/2/18 12:50 PM, Waiman Long wrote: > On 11/02/2018 03:44 PM, Dave Hansen wrote: >> On 11/2/18 12:40 PM, Waiman Long wrote: >>> The 64k+ limit check is kind of arbitrary. So the check is now removed >>> to just let expand_stack() decide if a segmentation fault sho

Re: [RFC PATCH] x86/mm/fault: Allow stack access below %rsp

2018-11-02 Thread Dave Hansen
On 11/2/18 12:40 PM, Waiman Long wrote: > The 64k+ limit check is kind of arbitrary. So the check is now removed > to just let expand_stack() decide if a segmentation fault should happen. With the 64k check removed, what's the next limit that we bump into? Is it just the stack_guard_gap space

Re: RFC: userspace exception fixups

2018-11-02 Thread Dave Hansen
On 11/2/18 10:06 AM, Sean Christopherson wrote: > On Fri, Nov 02, 2018 at 09:56:44AM -0700, Dave Hansen wrote: >> On 11/2/18 9:30 AM, Sean Christopherson wrote: >>> What if rather than having userspace register an address for fixup, the >>> kernel instead unconditiona

Re: RFC: userspace exception fixups

2018-11-02 Thread Dave Hansen
On 11/2/18 9:30 AM, Sean Christopherson wrote: > What if rather than having userspace register an address for fixup, the > kernel instead unconditionally does fixup on the ENCLU opcode? The problem is knowing what to do for the fixup. If we have a simple action to take that's universal, like

Re: [PATCH v14 09/19] x86/mm: x86/sgx: Signal SEGV_SGXERR for #PFs w/ PF_SGX

2018-10-31 Thread Dave Hansen
On 10/31/18 2:30 PM, Sean Christopherson wrote: > On Mon, Oct 01, 2018 at 03:03:30PM -0700, Dave Hansen wrote: >> On 10/01/2018 02:42 PM, Jethro Beekman wrote: >>> >>> 1) Even though the vDSO function exists, userspace may still call >>> `ENCLU[EENT

Re: [PATCH 1/2] x86/pkeys: copy pkey state at fork()

2018-10-26 Thread Dave Hansen
On 10/26/18 12:51 PM, Dave Hansen wrote: ... > The result is that, after a fork(), the child's pkey state ends up > looking like it does after an execve(), which is totally wrong. pkeys > that are already allocated can be allocated again, for instance. One thing I omitted. This was ve

[PATCH 1/2] x86/pkeys: copy pkey state at fork()

2018-10-26 Thread Dave Hansen
From: Dave Hansen Our creation of new mm's is a bit convoluted. At fork(), the code does: 1. memcpy() the parent mm to initialize child 2. mm_init() to initialize some select stuff 3. dup_mmap() to create true copies that memcpy() did not do right
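
The effect of this sequence on pkey state can be sketched with a toy model. Everything below is hypothetical (`ToyMm`, `toy_mm_init`, `toy_fork`, `toy_pkey_alloc`, the 16-key limit and the initial map value are illustration-only names and assumptions, not kernel code), and it assumes the reset happens in the initialization step, which the truncated preview does not spell out.

```python
from dataclasses import dataclass, replace

INITIAL_PKEY_MAP = 0b1  # toy assumption: only pkey 0 is allocated at execve()
TOY_MAX_PKEYS = 16      # toy assumption for the size of the allocation map


@dataclass
class ToyMm:
    pkey_allocation_map: int = INITIAL_PKEY_MAP  # bit n set => pkey n in use


def toy_mm_init(mm):
    """Step 2 in the toy model: reset state as a fresh execve() would."""
    mm.pkey_allocation_map = INITIAL_PKEY_MAP


def toy_fork(parent, copy_pkey_state):
    """Model fork(): copy the parent, re-init, optionally restore pkey state."""
    child = replace(parent)   # step 1: memcpy()-like copy of the parent
    toy_mm_init(child)        # step 2: wipes the copied pkey state
    if copy_pkey_state:       # what a fix would have to do in this model
        child.pkey_allocation_map = parent.pkey_allocation_map
    return child


def toy_pkey_alloc(mm):
    """Allocate the lowest free pkey, or return -1 if none is free."""
    for pkey in range(TOY_MAX_PKEYS):
        if not mm.pkey_allocation_map & (1 << pkey):
            mm.pkey_allocation_map |= 1 << pkey
            return pkey
    return -1


parent = ToyMm()
print(toy_pkey_alloc(parent))                                   # parent gets pkey 1
print(toy_pkey_alloc(toy_fork(parent, copy_pkey_state=False)))  # child gets pkey 1 again
print(toy_pkey_alloc(toy_fork(parent, copy_pkey_state=True)))   # child gets pkey 2
```

In the model without the copy, the child hands out a key the parent already allocated, which is the kind of double allocation the follow-up message in this thread describes.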

[PATCH 2/2] x86/selftests/pkeys: fork() to check for state being preserved

2018-10-26 Thread Dave Hansen
From: Dave Hansen There was a bug where the per-mm pkey state was not being preserved across fork() in the child. fork() is performed in the pkey selftests, but all of our pkey activity is performed in the parent. The child does not perform any actions sensitive to pkey state. To make

Re: [PATCH v3] x86/numa_emulation: Fix uniform-split numa emulation

2018-10-25 Thread Dave Hansen
this *ever* worked on a multi-socket configuration? Or has it just never been run on a multi-socket configuration? Either way, nice changelog, and nice comments. I'd have some minor nits if you have to respin it, but otherwise: Reviewed-by: Dave Hansen

Re: [PATCH 06/17] prmem: test cases for memory protection

2018-10-25 Thread Dave Hansen
> +static bool is_address_protected(void *p) > +{ > + struct page *page; > + struct vmap_area *area; > + > + if (unlikely(!is_vmalloc_addr(p))) > + return false; > + page = vmalloc_to_page(p); > + if (unlikely(!page)) > + return false; > + wmb(); /*

Re: [PATCH 05/17] prmem: shorthands for write rare on common types

2018-10-24 Thread Dave Hansen
On 10/23/18 2:34 PM, Igor Stoppa wrote: > Wrappers around the basic write rare functionality, addressing several > common data types found in the kernel, allowing to specify the new > values through immediates, like constants and defines. I have to wonder whether this is the right way, or whether

Re: [PATCH 03/17] prmem: vmalloc support for dynamic allocation

2018-10-24 Thread Dave Hansen
On 10/23/18 2:34 PM, Igor Stoppa wrote: > +#define VM_PMALLOC 0x0100 /* pmalloc area - see docs */ > +#define VM_PMALLOC_WR 0x0200 /* pmalloc write rare > area */ > +#define VM_PMALLOC_PROTECTED 0x0400 /* pmalloc protected area */ Please

Re: [PATCH 02/17] prmem: write rare for static allocation

2018-10-24 Thread Dave Hansen
> +static __always_inline bool __is_wr_after_init(const void *ptr, size_t size) > +{ > + size_t start = (size_t)&__start_wr_after_init; > + size_t end = (size_t)&__end_wr_after_init; > + size_t low = (size_t)ptr; > + size_t high = (size_t)ptr + size; > + > + return likely(start

Re: [RFC PATCH 0/5] x86: dynamic indirect call promotion

2018-10-23 Thread Dave Hansen
On 10/23/18 1:32 PM, Nadav Amit wrote: >> On 10/17/18 5:54 PM, Nadav Amit wrote: >>> base relpoline >>> - >>> nginx 22898 25178 (+10%) >>> redis-ycsb 24523 25486 (+4%) >>> dbench 2144 2103 (+2%) >> Just

Re: [RFC PATCH 0/5] x86: dynamic indirect call promotion

2018-10-23 Thread Dave Hansen
On 10/17/18 5:54 PM, Nadav Amit wrote: > base relpoline > - > nginx 22898 25178 (+10%) > redis-ycsb 24523 25486 (+4%) > dbench 2144 2103 (+2%) Just out of curiosity, which indirect

Re: [PATCH 0/9] Allow persistent memory to be used like normal RAM

2018-10-23 Thread Dave Hansen
>> This series adds a new "driver" to which pmem devices can be >> attached. Once attached, the memory "owned" by the device is >> hot-added to the kernel and managed like any other memory. On > > Would this memory be considered volatile (with the driver initializing > it to zeros), or

[tip:x86/mm] x86/mm: Kill stray kernel fault handling comment

2018-10-21 Thread tip-bot for Dave Hansen
Commit-ID: 162041425193602b15774c61740ad8e7dc157df3 Gitweb: https://git.kernel.org/tip/162041425193602b15774c61740ad8e7dc157df3 Author: Dave Hansen AuthorDate: Fri, 19 Oct 2018 07:08:42 -0700 Committer: Ingo Molnar CommitDate: Sun, 21 Oct 2018 10:58:10 +0200 x86/mm: Kill stray kernel

Re: [PATCH 05/11] x86/fpu: set PKRU state for kernel threads

2018-10-19 Thread Dave Hansen
On 10/19/2018 10:37 AM, Andy Lutomirski wrote: >> I think it's much more straightforward to just not enforce pkeys. >> Having this "phantom" value could cause a very odd, nearly >> undebuggable I/O failure. > But now we have the reverse. The IO can work if it’s truly async but, > if the kernel

Re: [PATCH 05/11] x86/fpu: set PKRU state for kernel threads

2018-10-19 Thread Dave Hansen
On 10/19/2018 09:59 AM, Andy Lutomirski wrote: >> That looks like a good API in general. The ffs_user_copy_worker that >> Sebastian mentioned seems to be used by AIO, in which case of course it >> has to happen in a kernel thread. >> >> But while the API is good, deciding on the desired semantics

[PATCH] x86/mm: kill stray kernel fault handling comment

2018-10-19 Thread Dave Hansen
I originally tried to send this a couple days ago, but it does not appear to have made it to LKML. Sorry if it's a duplicate. -- From: Dave Hansen I originally had matching user and kernel comments, but the kernel one got improved. Some errant conflict resolution kicked the comment somewhere

Re: [PATCH 05/11] x86/fpu: set PKRU state for kernel threads

2018-10-18 Thread Dave Hansen
On 10/18/2018 01:46 PM, Andy Lutomirski wrote: > Setting it to allow-all/none would let the operation always fail or > succeed which might be an improvement in terms of debugging. However it > is hard to judge what the correct behaviour should be. Should fail or > succeed. Succeed. :) > But this

Re: [PATCH 04/11] x86/fpu: eager switch PKRU state

2018-10-18 Thread Dave Hansen
>>> diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h >>> index 19b137f1b3beb..b184f916319e5 100644 >>> --- a/arch/x86/include/asm/pkeys.h >>> +++ b/arch/x86/include/asm/pkeys.h >>> @@ -119,7 +119,7 @@ extern int arch_set_user_pkey_access(struct task_struct >>> *tsk, int

Re: [PATCH 05/11] x86/fpu: set PKRU state for kernel threads

2018-10-18 Thread Dave Hansen
On 10/18/2018 09:48 AM, Andy Lutomirski wrote: We might want to do this for cleanliness reasons... Maybe. But this *should* have no practical effects. Kernel threads have no real 'mm' and no user pages. They should not have access to user mappings. Protection keys

Re: l1tf: Kernel suggests I throw away third of my memory. I'd rather not

2018-10-17 Thread Dave Hansen
On 10/17/2018 04:32 AM, Pavel Machek wrote: >> Well, that depends. Do you care about PROT_NONE attacks as well? If not >> then no-swap would help you. But even then no-swap is rather theoretical >> attack on a physical host unless you allow an arbitrary swapout to a >> malicious user (e.g. allow a

[tip:x86/urgent] x86/entry: Add some paranoid entry/exit CR3 handling comments

2018-10-14 Thread tip-bot for Dave Hansen
Commit-ID: 16561f27f94e6193ee8f5b9b74801e1668c86efc Gitweb: https://git.kernel.org/tip/16561f27f94e6193ee8f5b9b74801e1668c86efc Author: Dave Hansen AuthorDate: Fri, 12 Oct 2018 16:21:18 -0700 Committer: Thomas Gleixner CommitDate: Sun, 14 Oct 2018 11:11:22 +0200 x86/entry: Add some

Re: [PATCH 04/11] x86/fpu: eager switch PKRU state

2018-10-12 Thread Dave Hansen
On 10/12/2018 11:09 AM, Andy Lutomirski wrote: > But maybe WRPKRU is more expensive than RDPKRU and a branch? Yeah, it is more expensive. It has a higher cycle cost and it's also practically a (light) speculation barrier.

Re: [PATCH 10/11] x86/fpu: prepare copy_fpstate_to_sigframe for TIF_LOAD_FPU

2018-10-12 Thread Dave Hansen
On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote: > From: Rik van Riel > > If TIF_LOAD_FPU is set, then the registers are saved (not loaded). In that > case > we skip the saving part. This sentence hurts my brain. "If TIF_LOAD_FPU is set the registers are ... not loaded" I

Re: [PATCH 08/11] x86/fpu: Always store the registers in copy_fpstate_to_sigframe()

2018-10-12 Thread Dave Hansen
On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote: > From: Rik van Riel > > copy_fpstate_to_sigframe() has two callers and both invoke the function only > if > fpu->initialized is set. So the check in the function for ->initialized makes > no sense. It might be a relict from the lazy-FPU

Re: [PATCH 07/11] x86/pkeys: Drop the preempt-disable section

2018-10-12 Thread Dave Hansen
On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote: > The fpu->initialized flag should not be changed underneath us. This might be a > fallout during the removal of the LazyFPU support. The FPU is marked > initialized as soon as the state has been set to an initial value. It does not > signal

Re: [PATCH 06/11] x86/pkeys: make init_pkru_value static

2018-10-12 Thread Dave Hansen
On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote: > The variable init_pkru_value isn't used outside of this file. > Make init_pkru_value static. > > Signed-off-by: Sebastian Andrzej Siewior Looks good. Acked-by: Dave Hansen
