Re: [PATCH 0/4 POC] Allow executing code and syscalls in another address space

2021-04-14 Thread Florian Weimer
* Jann Horn: > On Wed, Apr 14, 2021 at 12:27 PM Florian Weimer wrote: >> >> * Andrei Vagin: >> >> > We already have process_vm_readv and process_vm_writev to read and write >> > to a process memory faster than we can do this with ptrace. And now it >&

Re: [PATCH v7 5/6] x86/signal: Detect and prevent an alternate signal stack overflow

2021-04-14 Thread Florian Weimer
* Borislav Petkov: > On Mon, Apr 12, 2021 at 10:30:23PM +, Bae, Chang Seok wrote: >> On Mar 26, 2021, at 03:30, Borislav Petkov wrote: >> > On Thu, Mar 25, 2021 at 09:56:53PM -0700, Andy Lutomirski wrote: >> >> We really ought to have a SIGSIGFAIL signal that's sent, double-fault >> >>

Re: [PATCH 0/4 POC] Allow executing code and syscalls in another address space

2021-04-14 Thread Florian Weimer
* Andrei Vagin: > We already have process_vm_readv and process_vm_writev to read and write > to a process memory faster than we can do this with ptrace. And now it > is time for process_vm_exec that allows executing code in an address > space of another process. We can do this with ptrace but it

Re: Candidate Linux ABI for Intel AMX and hypothetical new related features

2021-04-12 Thread Florian Weimer
* Borislav Petkov: > On Mon, Apr 12, 2021 at 04:19:29PM +0200, Florian Weimer wrote: >> Maybe we could have done this in 2016 when I reported this for the first >> time. Now it is too late, as more and more software is using >> CPUID-based detection for AVX-512. > > S

Re: static_branch/jump_label vs branch merging

2021-04-09 Thread Florian Weimer
* Ard Biesheuvel: > Wouldn't that require the compiler to interpret the contents of the > asm() block? Yes and no. It would require proper toolchain support, so in this case a new ELF relocation type, with compiler, assembler, and linker support to generate those relocations and process them.

Re: Candidate Linux ABI for Intel AMX and hypothetical new related features

2021-03-29 Thread Florian Weimer
* Len Brown via Libc-alpha: >> In particular, the library may use instructions that main() doesn't know >> exist. > > And so I'll ask my question another way. > > How is it okay to change the value of XCR0 during the run time of a > program? > > I submit that it is not, and that is a deal-killer

Re: Why does glibc use AVX-512?

2021-03-26 Thread Florian Weimer
* Andy Lutomirski: > On Fri, Mar 26, 2021 at 1:35 PM Florian Weimer wrote: >> >> * Andy Lutomirski: >> >> > On Fri, Mar 26, 2021 at 12:34 PM Florian Weimer wrote: >> >> x86: Sporadic failures in tst-cpu-features-cpuinfo >> >> <

Re: Why does glibc use AVX-512?

2021-03-26 Thread Florian Weimer
* Andy Lutomirski: > On Fri, Mar 26, 2021 at 12:34 PM Florian Weimer wrote: >> x86: Sporadic failures in tst-cpu-features-cpuinfo >> <https://sourceware.org/bugzilla/show_bug.cgi?id=27398#c3> > > It's worth noting that recent microcode updates have made RTM

Re: Why does glibc use AVX-512?

2021-03-26 Thread Florian Weimer
* Andy Lutomirski: >> > AVX-512 cleared, and programs need to explicitly request enablement. >> > This would allow programs to opt into not saving/restoring across >> > signals or to save/restore in buffers supplied when the feature is >> > enabled. >> >> Isn't XSAVEOPT already able to handle

Re: Why does glibc use AVX-512?

2021-03-26 Thread Florian Weimer
* Andy Lutomirski via Libc-alpha: > glibc appears to use AVX512F for memcpy by default. (Unless > Prefer_ERMS is default-on, but I genuinely can't tell if this is the > case. I did some searching.) The commit adding it refers to a 2016 > email saying that it's 30% on KNL. As far as I know, glibc only

Re: [PATCH v7 5/6] x86/signal: Detect and prevent an alternate signal stack overflow

2021-03-25 Thread Florian Weimer
* Bae, Chang Seok via Libc-alpha: > On Mar 25, 2021, at 09:20, Borislav Petkov wrote: >> >> $ gcc tst-minsigstksz-2.c -DMY_MINSIGSTKSZ=3453 -o tst-minsigstksz-2 >> $ ./tst-minsigstksz-2 >> tst-minsigstksz-2: changed byte 50 bytes below configured stack >> >> Whoops. >> >> And the debug print

Re: [PATCH] Document that PF_KTHREAD _is_ ABI

2021-03-20 Thread Florian Weimer
* Alexey Dobriyan: > Some aren't -- PF_FORKNOEXEC. However it is silly for userspace to query it > because a program knows if it forked but didn't exec without external help. Libraries typically lack that knowledge, and may have reasons to detect forks. But there are probably better ways than

Re: [PATCH v2] ptrace: add PTRACE_GET_RSEQ_CONFIGURATION request

2021-03-03 Thread Florian Weimer
* Mathieu Desnoyers: > This way, the configuration structure can be expanded in the future. The > rseq ABI structure is by definition fixed-size, so there is no point in > having its size here. > > Florian, did I understand your request correctly, or am I missing your > point ? No, the idea was

Re: [PATCH] ptrace: add PTRACE_GET_RSEQ_CONFIGURATION request

2021-02-23 Thread Florian Weimer
* Piotr Figiel: > diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h > index 83ee45fa634b..d54cf6b6ce7c 100644 > --- a/include/uapi/linux/ptrace.h > +++ b/include/uapi/linux/ptrace.h > @@ -102,6 +102,14 @@ struct ptrace_syscall_info { > }; > }; > > +#define

LINUX_VERSION_CODE overflow (was: Re: Linux 4.9.256)

2021-02-11 Thread Florian Weimer
* Greg Kroah-Hartman: > I'm announcing the release of the 4.9.256 kernel. > > This, and the 4.4.256 release are a little bit "different" than normal. > > This contains only 1 patch, just the version bump from .255 to .256 > which ends up causing the userspace-visible LINUX_VERSION_CODE to >
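The overflow discussed in this thread can be reproduced with plain integer arithmetic. The following is a minimal sketch, assuming the traditional packing of (a << 16) + (b << 8) + c used by KERNEL_VERSION; the DEMO_ macro and the demo_ helpers are local stand-ins written for this illustration and are not taken from the kernel headers.

```c
#include <assert.h>

/* Local copy of the traditional packing (assumption: mirrors the
   pre-clamp KERNEL_VERSION macro, 8 bits for minor and sublevel). */
#define DEMO_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Decode the packed value the way a user-space consumer might. */
static int demo_major(int code)    { return code >> 16; }
static int demo_minor(int code)    { return (code >> 8) & 0xff; }
static int demo_sublevel(int code) { return code & 0xff; }
```

With this packing, a sublevel of 256 no longer fits into its 8-bit field and carries into the minor number, so 4.9.256 and 4.10.0 produce the same packed value.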

Re: Linux 4.9.256

2021-02-08 Thread Florian Weimer
* Greg Kroah-Hartman: > I'm announcing the release of the 4.9.256 kernel. > > This, and the 4.4.256 release are a little bit "different" than normal. > > This contains only 1 patch, just the version bump from .255 to .256 which ends > up causing the userspace-visible LINUX_VERSION_CODE to behave

Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-12 Thread Florian Weimer
* Lukas Wunner: > On Fri, Jan 08, 2021 at 12:02:53PM -0800, Linus Torvalds wrote: >> I appreciate Arnd pointing out "--std=gnu11", though. What are the >> actual relevant language improvements? >> >> Variable declarations in for-loops is the only one I can think of. I >> think that would clean

Re: [PATCH 1/1] mm/madvise: replace ptrace attach requirement for process_madvise

2021-01-11 Thread Florian Weimer
* Suren Baghdasaryan: > diff --git a/mm/madvise.c b/mm/madvise.c > index 6a660858784b..c2d600386902 100644 > --- a/mm/madvise.c > +++ b/mm/madvise.c > @@ -1197,12 +1197,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const > struct iovec __user *, vec, > goto release_task; >

Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-07 Thread Florian Weimer
* Theodore Ts'o: > On Thu, Jan 07, 2021 at 01:37:47PM +, Russell King - ARM Linux admin > wrote: >> > The gcc bugzilla mentions backports into gcc-linaro, but I do not see >> > them in my git history. >> >> So, do we raise the minimum gcc version for the kernel as a whole to 5.1 >> or just

Re: [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack

2020-12-03 Thread Florian Weimer
* Andy Lutomirski: > If you want a 4GB allocation to succeed, you can only divide the > address space into 32k fragments. Or, a little more precisely, if you > want a randomly selected 4GB region to be empty, any other allocation > has a 1/32k chance of being in the way. (Rough numbers — I’m
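The "32k" figure in the quoted message can be checked with a short calculation. This is a rough sketch, assuming a 47-bit user address space (typical for x86-64 with 4-level paging; the quoted text does not state the width) and a 4 GiB (2^32 byte) region; demo_region_count is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Number of non-overlapping regions of 2^region_bits bytes that fit
   into an address space of 2^addr_bits bytes.
   Precondition: region_bits <= addr_bits and addr_bits - region_bits < 64. */
static uint64_t demo_region_count(unsigned addr_bits, unsigned region_bits)
{
    return UINT64_C(1) << (addr_bits - region_bits);
}
```

Under those assumptions, demo_region_count(47, 32) is 2^15 = 32768, which matches the "32k fragments" estimate.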

Re: [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack

2020-12-03 Thread Florian Weimer
* Topi Miettinen: > +3 Additionally enable full randomization of memory mappings created > +with mmap(NULL, ...). With 2, the base of the VMA used for such > +mappings is random, but the mappings are created in predictable > +places within the VMA and in sequential order. With 3,

Re: [PATCH] syscalls: Document OCI seccomp filter interactions & workaround

2020-11-24 Thread Florian Weimer
* Jann Horn: > But if you can't tell whether the more modern syscall failed because > of a seccomp filter, you may be forced to retry with an older syscall > even on systems where the new syscall works fine, and such a fallback > may reduce security or reliability if you're trying to use some

Re: [PATCH] syscalls: Document OCI seccomp filter interactions & workaround

2020-11-24 Thread Florian Weimer
* Mark Wielaard: > For valgrind the issue is statx which we try to use before falling back > to stat64, fstatat or stat (depending on architecture, not all define > all of these). The problem with these fallbacks is that under some > containers (libseccomp versions) they might return EPERM

Re: [PATCH] syscalls: Document OCI seccomp filter interactions & workaround

2020-11-24 Thread Florian Weimer
* Jann Horn: > +seccomp maintainers/reviewers > [thread context is at > https://lore.kernel.org/linux-api/87lfer2c0b@oldenburg2.str.redhat.com/ > ] > > On Tue, Nov 24, 2020 at 5:49 PM Christoph Hellwig wrote: >> On Tue, Nov 24, 2020 at 03:08:05PM +0100, Mark Wielaard wrote: >> > For valgrind

Re: [PATCH] syscalls: Document OCI seccomp filter interactions & workaround

2020-11-24 Thread Florian Weimer
* Christoph Hellwig: > On Tue, Nov 24, 2020 at 03:08:09PM +0100, Florian Weimer wrote: >> Do you categorically reject the general advice, or specific instances as >> well? > > All of the above. Really, if people decided to use seccomp to return > nonsensical error

Re: [PATCH] syscalls: Document OCI seccomp filter interactions & workaround

2020-11-24 Thread Florian Weimer
* Christoph Hellwig: > On Tue, Nov 24, 2020 at 01:08:20PM +0100, Florian Weimer wrote: >> This documents a way to safely use new security-related system calls >> while preserving compatibility with container runtimes that require >> insecure emulation (because they f

Re: [PATCH] syscalls: Document OCI seccomp filter interactions & workaround

2020-11-24 Thread Florian Weimer
* Aleksa Sarai: > As I mentioned in the runc thread[1], this is really down to Docker's > default policy configuration. The EPERM-everything behaviour in OCI was > inherited from Docker, and it boils down to not having an additional > seccomp rule which does ENOSYS for unknown syscall numbers

Re: [PATCH] syscalls: Document OCI seccomp filter interactions & workaround

2020-11-24 Thread Florian Weimer
* Christian Brauner: > I'm sorry but I have some doubts about this new "rule". The idea of > being able to reliably trigger an error for a system call other than > EPERM might have merit in some scenarios but justifying it via a bug in > a userspace standard is not enough in my opinion. > > The

[PATCH] syscalls: Document OCI seccomp filter interactions & workaround

2020-11-24 Thread Florian Weimer
, for existing system calls such as faccessat2, without kernel or container runtime changes. Signed-off-by: Florian Weimer --- Documentation/process/adding-syscalls.rst | 37 +++ 1 file changed, 37 insertions(+) diff --git a/Documentation/process/adding-syscalls.rst b

Re: [PATCH] lseek.2: SYNOPSIS: Use correct types

2020-11-22 Thread Florian Weimer
* Alejandro Colomar: > The Linux kernel uses 'unsigned int' instead of 'int' for 'fd' and > 'whence'. As glibc provides no wrapper, use the same types the > kernel uses. lseek is a POSIX interface, and glibc provides it. POSIX uses int for file descriptors (and the whence parameter in case of

Re: violating function pointer signature

2020-11-18 Thread Florian Weimer
* Segher Boessenkool: > On Wed, Nov 18, 2020 at 12:17:30PM -0500, Steven Rostedt wrote: >> I could change the stub from (void) to () if that would be better. > > Don't? In a function definition they mean exactly the same thing (and > the kernel uses (void) everywhere else, which many people find

Re: [PATCH v7 0/7] Syscall User Dispatch

2020-11-18 Thread Florian Weimer
* Gabriel Krisman Bertazi: > The main use case is to intercept Windows system calls of an application > running over Wine. While Wine is using an unmodified glibc to execute > its own native Linux syscalls, the Windows libraries might be directly > issuing syscalls that we need to capture. So

Re: violating function pointer signature

2020-11-18 Thread Florian Weimer
* Peter Zijlstra: >> The default Linux calling conventions are all of the cdecl family, >> where the caller pops the argument off the stack. You didn't quote >> enough context to tell whether other calling conventions matter in >> your case. > > This is strictly in-kernel, and I think we're

Re: violating function pointer signature

2020-11-18 Thread Florian Weimer
* Peter Zijlstra: > I think that as long as the function is completely empty (it never > touches any of the arguments) this should work in practise. > > That is: > > void tp_nop_func(void) { } > > can be used as an argument to any function pointer that has a void > return. In fact, I already do

Re: [PATCH v7 0/7] Syscall User Dispatch

2020-11-18 Thread Florian Weimer
* Gabriel Krisman Bertazi: > This is the v7 of syscall user dispatch. This version is a bit > different from v6 on the following points, after the modifications > requested on that submission. Is this supposed to work with existing (Linux) libcs, or do you bring your own low-level run-time

Re: [PATCH v7 7/7] docs: Document Syscall User Dispatch

2020-11-18 Thread Florian Weimer
* Gabriel Krisman Bertazi: > +Interface > +- > + > +A process can setup this mechanism on supported kernels > +(CONFIG_SYSCALL_USER_DISPATCH) by executing the following prctl: > + > + prctl(PR_SET_SYSCALL_USER_DISPATCH, <op>, <offset>, <length>, [selector]) > + > +<op> is either PR_SYS_DISPATCH_ON or

Re: [PATCH 0/4] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

2020-11-04 Thread Florian Weimer
* Catalin Marinas: > Can the dynamic loader mmap() the main exe again while munmap'ing the > original one? (sorry if it was already discussed) No, we don't have a descriptor for that. /proc may not be mounted, and using the path stored there has a race condition anyway. Thanks, Florian -- Red

Re: [PATCH 0/4] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

2020-11-04 Thread Florian Weimer
* Will Deacon: > Is there real value in this seccomp filter if it only looks at mprotect(), > or was it just implemented because it's easy to do and sounds like a good > idea? It seems bogus to me. Everyone will just create alias mappings instead, just like they did for the similar SELinux

Re: [PATCH 2/4] elf: Move note processing after l_phdr is updated [BZ #26831]

2020-11-03 Thread Florian Weimer
* Szabolcs Nagy: > Program headers are processed in two passes: after the first pass > load segments are mmapped so in the second pass target specific > note processing logic can access the notes. > > The second pass is moved later so various link_map fields are > set up that may be useful for note

Re: [PATCH 3/4] aarch64: Use mmap to add PROT_BTI instead of mprotect [BZ #26831]

2020-11-03 Thread Florian Weimer
* Szabolcs Nagy: > Re-mmap executable segments if possible instead of using mprotect > to add PROT_BTI. This allows using BTI protection with security > policies that prevent mprotect with PROT_EXEC. > > If the fd of the ELF module is not available because it was kernel > mapped then mprotect is

Re: [RFC PATCH 1/2] rseq: Implement KTLS prototype for x86-64

2020-10-29 Thread Florian Weimer
* Mathieu Desnoyers: > - On Sep 29, 2020, at 4:13 AM, Florian Weimer fwei...@redhat.com wrote: > >> * Mathieu Desnoyers: >> >>>> So we have a bootstrap issue here that needs to be solved, I think. >>> >>> The one thing I'm not sure about is w

Re: Possible bug in getdents64()?

2020-10-29 Thread Florian Weimer
* Alejandro Colomar via Libc-alpha: > [[ CC += linux-man, linux-kernel, libc-alpha, mtk ]] > > On 2020-10-28 20:26, Alejandro Colomar wrote: >> The manual page for getdents64() says the prototype should be the >> following: >>    int getdents64(unsigned int fd, struct linux_dirent64 *dirp,

Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-27 Thread Florian Weimer
* Dave Martin via Libc-alpha: > On Mon, Oct 26, 2020 at 05:45:42PM +0100, Florian Weimer via Libc-alpha wrote: >> * Dave Martin via Libc-alpha: >> >> > Would it now help to add something like: >> > >> > int mchangeprot(void *addr, size_t len, int old_fl

Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-26 Thread Florian Weimer
* Dave Martin via Libc-alpha: > Would it now help to add something like: > > int mchangeprot(void *addr, size_t len, int old_flags, int new_flags) > { > int ret = -EINVAL; > mmap_write_lock(current->mm); > if (all vmas in [addr .. addr + len) have > their

Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-22 Thread Florian Weimer
* Topi Miettinen: > Allowing mprotect(PROT_EXEC|PROT_BTI) would mean that all you need to > circumvent MDWX is to add PROT_BTI flag. I'd suggest getting the flags > right at mmap() time or failing that, reverting the PROT_BTI for > legacy programs later. > > Could the kernel tell the loader of

Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-22 Thread Florian Weimer
* Topi Miettinen: >> The dynamic loader has to process the LOAD segments to get to the ELF >> note that says to enable BTI. Maybe we could do a first pass and >> load only the segments that cover notes. But that requires lots of >> changes to generic code in the loader. > > What if the loader

Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-22 Thread Florian Weimer
* Lennart Poettering: > On Mi, 21.10.20 22:44, Jeremy Linton (jeremy.lin...@arm.com) wrote: > >> Hi, >> >> There is a problem with glibc+systemd on BTI enabled systems. Systemd >> has a service flag "MemoryDenyWriteExecute" which uses seccomp to deny >> PROT_EXEC changes. Glibc enables BTI only

Re: Additional debug info to aid cacheline analysis

2020-10-11 Thread Florian Weimer
* Mark Wielaard: > On Sun, Oct 11, 2020 at 02:15:18PM +0200, Florian Weimer wrote: >> * Mark Wielaard: >> >> > Yes, that would work. I don't know what the lowest supported GCC >> > version is, but technically it was definitely fixed in 4.10.0, 4.8.4 >>

Re: Additional debug info to aid cacheline analysis

2020-10-11 Thread Florian Weimer
* Mark Wielaard: > Yes, that would work. I don't know what the lowest supported GCC > version is, but technically it was definitely fixed in 4.10.0, 4.8.4 > and 4.9.2. And various distros would probably have backported the > fix. But checking for 5.0+ would certainly give you a good version. > >

Re: Control Dependencies vs C Compilers

2020-10-07 Thread Florian Weimer
* Peter Zijlstra: > On Tue, Oct 06, 2020 at 11:20:01PM +0200, Florian Weimer wrote: >> * Peter Zijlstra: >> >> > Our Documentation/memory-barriers.txt has a Control Dependencies section >> > (which I shall not replicate here for brevity) which lists a numbe

Re: Control Dependencies vs C Compilers

2020-10-06 Thread Florian Weimer
* Peter Zijlstra: > Our Documentation/memory-barriers.txt has a Control Dependencies section > (which I shall not replicate here for brevity) which lists a number of > caveats. But in general the work-around we use is: > > x = READ_ONCE(*foo); > if (x > 42) >
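The work-around pattern quoted above is cut off after the condition. A minimal user-space sketch of the usual shape follows, assuming the store in the branch is a WRITE_ONCE as in the examples in Documentation/memory-barriers.txt. The DEMO_ macros are simplified approximations of the kernel's READ_ONCE/WRITE_ONCE (volatile accesses only), and the function and variable names are hypothetical.

```c
#include <assert.h>

/* Simplified user-space approximations of the kernel macros
   (assumption: plain volatile accesses, GNU __typeof__ extension). */
#define DEMO_READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define DEMO_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Load *foo once, then store to *bar only if the loaded value passes
   the test. The store is meant to be ordered after the load by the
   conditional branch (a control dependency). */
static void demo_control_dep(int *foo, int *bar)
{
    int x = DEMO_READ_ONCE(*foo);

    if (x > 42)
        DEMO_WRITE_ONCE(*bar, 1);
}
```

This only shows the source-level pattern. Whether a given compiler preserves the branch, and with it the ordering, is exactly the question the thread raises, so the sketch should not be read as a guarantee.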

Re: [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size

2020-10-06 Thread Florian Weimer
* Dave Martin via Libc-alpha: > On Tue, Oct 06, 2020 at 08:33:47AM -0700, Dave Hansen wrote: >> On 10/6/20 8:25 AM, Dave Martin wrote: >> > Or are people reporting real stack overruns on x86 today? >> >> We have real overruns. We have ~2800 bytes of XSAVE (register) state >> mostly from

Re: [RFC PATCH 1/2] rseq: Implement KTLS prototype for x86-64

2020-09-29 Thread Florian Weimer
* Mathieu Desnoyers: >> So we have a bootstrap issue here that needs to be solved, I think. > > The one thing I'm not sure about is whether the vDSO interface is indeed > superior to KTLS, or if it is just the model we are used to. > > AFAIU, the current use-cases for vDSO is that an application

Re: [RFC PATCH 1/2] rseq: Implement KTLS prototype for x86-64

2020-09-28 Thread Florian Weimer
* Mathieu Desnoyers: > Upstreaming efforts aiming to integrate rseq support into glibc led to > interesting discussions, where we identified a clear need to extend the > size of the per-thread structure shared between kernel and user-space > (struct rseq). This is something that is not possible

Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor

2020-09-24 Thread Florian Weimer
* Madhavan T. Venkataraman: > Otherwise, using an ABI quirk or a calling convention side effect to > load the PC into a GPR is, IMO, non-standard or non-compliant or > non-approved or whatever you want to call it. I would be > conservative and not use it. Who knows what incompatibility there >

Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor

2020-09-23 Thread Florian Weimer
* Solar Designer: > While I share my opinion here, I don't mean that to block Madhavan's > work. I'd rather defer to people more knowledgeable in current userland > and ABI issues/limitations and plans on dealing with those, especially > to Florian Weimer. I haven't seen Florian

Re: Expose 'array_length()' macro in

2020-09-22 Thread Florian Weimer
* Jonathan Wakely: > I don't see much point in using std::size here. If you're going to > provide the alternative implementation for when std::size isn't > defined, why not just use it always? > > template > #if __cplusplus >= 201103L > constexpr > #endif > inline std::size_t >

Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor

2020-09-17 Thread Florian Weimer
* Madhavan T. Venkataraman: > On 9/17/20 10:36 AM, Madhavan T. Venkataraman wrote: libffi == I have implemented my solution for libffi and provided the changes for X86 and ARM, 32-bit and 64-bit. Here is the reference patch:

Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor

2020-09-16 Thread Florian Weimer
* madvenka: > Examples of trampolines > === > > libffi (A Portable Foreign Function Interface Library): > > libffi allows a user to define functions with an arbitrary list of > arguments and return value through a feature called "Closures". > Closures use trampolines to jump

Re: [PATCH v9 3/3] mm/madvise: introduce process_madvise() syscall: an external memory hinting API

2020-09-03 Thread Florian Weimer
* Minchan Kim: > On Tue, Sep 01, 2020 at 08:46:02PM +0200, Florian Weimer wrote: >> * Minchan Kim: >> >> > ssize_t process_madvise(int pidfd, const struct iovec *iovec, >> > unsigned long vlen, int advice, unsigned int flags); >> >

Re: [PATCH v9 3/3] mm/madvise: introduce process_madvise() syscall: an external memory hinting API

2020-09-01 Thread Florian Weimer
* Minchan Kim: > ssize_t process_madvise(int pidfd, const struct iovec *iovec, > unsigned long vlen, int advice, unsigned int flags); size_t for vlen provides a clearer hint regarding the type of special treatment needed for ILP32 here (zero extension, not changing the type

Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack

2020-09-01 Thread Florian Weimer
* Yu-cheng Yu: > On 9/1/2020 10:50 AM, Florian Weimer wrote: >> * Yu-cheng Yu: >> >>> Like other arch_prctl()'s, this parameter was 'unsigned long' >>> earlier. The idea was, since this arch_prctl is only implemented for >>> the 64-bit kernel, we wanted

Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack

2020-09-01 Thread Florian Weimer
* Yu-cheng Yu: > Like other arch_prctl()'s, this parameter was 'unsigned long' > earlier. The idea was, since this arch_prctl is only implemented for > the 64-bit kernel, we wanted it to look as 64-bit only. I will change > it back to 'unsigned long'. What about x32? In general, long is rather

Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack

2020-08-28 Thread Florian Weimer
* H. J. Lu: > Can you think of ANY issues of passing more arguments to arch_prctl? On x32, the glibc arch_prctl system call wrapper only passes two arguments to the kernel, and applications have no way of detecting that. musl only passes two arguments on all architectures. It happens to work

Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack

2020-08-27 Thread Florian Weimer
* H. J. Lu: > On Thu, Aug 27, 2020 at 6:19 AM Florian Weimer wrote: >> >> * Dave Martin: >> >> > You're right that this has implications: for i386, libc probably pulls >> > more arguments off the stack than are really there in some situation

Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack

2020-08-27 Thread Florian Weimer
* Dave Martin: > You're right that this has implications: for i386, libc probably pulls > more arguments off the stack than are really there in some situations. > This isn't a new problem though. There are already generic prctls with > fewer than 4 args that are used on x86. As originally

Re: [RFC PATCH] mm: extend memfd with ability to create "secret" memory areas

2020-08-26 Thread Florian Weimer
* Andy Lutomirski: >> I _believe_ there are also things like AES-NI that can get strong >> protection from stuff like this. They load encryption keys into (AVX) >> registers and then can do encrypt/decrypt operations without the keys >> leaving the registers. If the key was loaded from a secret

Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack

2020-08-26 Thread Florian Weimer
* Dave Martin: > On Tue, Aug 25, 2020 at 04:34:27PM -0700, Yu, Yu-cheng wrote: >> On 8/25/2020 4:20 PM, Dave Hansen wrote: >> >On 8/25/20 2:04 PM, Yu, Yu-cheng wrote: >> I think this is more arch-specific.  Even if it becomes a new syscall, >> we still need to pass the same parameters. >>

Re: [PATCH v11 9/9] x86: Disallow vsyscall emulation when CET is enabled

2020-08-25 Thread Florian Weimer
* Andy Lutomirski: > On Mon, Aug 24, 2020 at 5:30 PM Yu-cheng Yu wrote: >> >> From: "H.J. Lu" >> >> Emulation of the legacy vsyscall page is required by some programs built >> before 2013. Newer programs after 2013 don't use it. Disallow vsyscall >> emulation when Control-flow Enforcement

Re: [PATCH v1 0/4] [RFC] Implement Trampoline File Descriptor

2020-08-02 Thread Florian Weimer
* Madhavan T. Venkataraman: > Standardization > - > > Trampfd is a framework that can be used to implement multiple > things. Maybe a few of those things can also be implemented in > user land itself. But I think having just one mechanism to execute > dynamic code objects is

Re: [PATCH v1 0/4] [RFC] Implement Trampoline File Descriptor

2020-07-29 Thread Florian Weimer
* Andy Lutomirski: > This is quite clever, but now I’m wondering just how much kernel help > is really needed. In your series, the trampoline is a non-executable > page. I can think of at least two alternative approaches, and I'd > like to know the pros and cons. > > 1. Entirely userspace: a

Re: [PATCH v7 4/7] fs: Introduce O_MAYEXEC flag for openat2(2)

2020-07-26 Thread Florian Weimer
* Al Viro: > On Thu, Jul 23, 2020 at 07:12:24PM +0200, Mickaël Salaün wrote: >> When the O_MAYEXEC flag is passed, openat2(2) may be subject to >> additional restrictions depending on a security policy managed by the >> kernel through a sysctl or implemented by an LSM thanks to the >>

Re: [PATCH] copy_xstate_to_kernel: Fix typo which caused GDB regression

2020-07-21 Thread Florian Weimer
* Kevin Buettner: > This commit fixes a regression encountered while running the > gdb.base/corefile.exp test in GDB's test suite. > > In my testing, the typo prevented the sw_reserved field of struct > fxregs_state from being output to the kernel XSAVES area. Thus the > correct mask

Re: [RFC PATCH 0/4] rseq: Introduce extensible struct rseq

2020-07-15 Thread Florian Weimer
* Carlos O'Donell: > On 7/13/20 11:03 PM, Mathieu Desnoyers wrote: >> Recent discussion led to a solution for extending struct rseq. This is >> an implementation of the proposed solution. >> >> Now is a good time to agree on this scheme before the release of glibc >> 2.32, just in case there are

Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq

2020-07-15 Thread Florian Weimer
* Mathieu Desnoyers: > - On Jul 15, 2020, at 9:42 AM, Florian Weimer fwei...@redhat.com wrote: >> * Mathieu Desnoyers: >> > [...] >>> How would this allow early-rseq-adopter libraries to interact with >>> glibc ? >> >> Under all exte

Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq

2020-07-15 Thread Florian Weimer
* Mathieu Desnoyers: > So indeed it could be done today without upgrading the toolchains by > writing custom assembler for each architecture to get the thread's > struct rseq. AFAIU the ABI to access the thread pointer is fixed for > each architecture, right ? Yes, determining the thread pointer

Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq

2020-07-15 Thread Florian Weimer
* Mathieu Desnoyers: > Practically speaking, I suspect this would mean postponing availability of > rseq for widely deployed applications for a few more years ? There is no rseq support in GCC today, so you have to write assembler code anyway. Thanks, Florian

Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq

2020-07-15 Thread Florian Weimer
* Chris Kennelly: > When glibc provides registration, is the anticipated use case that a > library would unregister and reregister each thread to "upgrade" it to > the most modern version of interface it knows about provided by the > kernel? Absolutely not, that is likely to break other

Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq

2020-07-14 Thread Florian Weimer
* Mathieu Desnoyers: >> How are extensions going to affect the definition of struct rseq, >> including its alignment? > > The alignment will never decrease. If the structure becomes large enough > its alignment could theoretically increase. Would that be an issue ? Telling the compiler that

Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq

2020-07-14 Thread Florian Weimer
* Mathieu Desnoyers: > + /* > + * Very last field of the structure, to calculate size excluding padding > + * with offsetof(). > + */ > + char end[]; > } __attribute__((aligned(4 * sizeof(__u64)))); This makes the header incompatible with standard C++. How are extensions
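The technique in the quoted patch, a trailing flexible array member used with offsetof() to get the size without tail padding, can be sketched as follows. The structure below is a hypothetical stand-in with made-up fields, not the real struct rseq layout; it assumes GCC or Clang for the aligned attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical over-aligned structure; field names are illustrative. */
struct demo_area {
    uint32_t cpu_id_start;
    uint32_t cpu_id;
    uint64_t ptr;
    uint32_t flags;
    /* Flexible array member marking the end of the used fields. */
    char end[];
} __attribute__((aligned(4 * sizeof(uint64_t))));

/* Bytes actually occupied by fields, excluding tail padding. */
static size_t demo_used_size(void)
{
    return offsetof(struct demo_area, end);
}

/* Size as seen by sizeof, rounded up to the 32-byte alignment. */
static size_t demo_padded_size(void)
{
    return sizeof(struct demo_area);
}
```

Here the fields occupy 20 bytes while sizeof reports 32 because of the alignment. Flexible array members are a C99 feature and are not part of standard C++, which is the incompatibility noted in the reply.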

Re: [RFC PATCH for 5.8 3/4] rseq: Introduce RSEQ_FLAG_RELIABLE_CPU_ID

2020-07-08 Thread Florian Weimer
* Christian Brauner: > I've been following this a little bit. The kernel version itself doesn't > really mean anything and the kernel version is imho not at all > interesting to userspace applications. Especially for cross-distro > programs. We can't go around and ask Red Hat, SUSE, Ubuntu,

Re: [RFC PATCH for 5.8 3/4] rseq: Introduce RSEQ_FLAG_RELIABLE_CPU_ID

2020-07-08 Thread Florian Weimer
* Mathieu Desnoyers: > All right, thanks for the insight! I'll drop these patches and focus only > on the bugfix. Thanks, much appreciated!

Re: [RFC PATCH for 5.8 3/4] rseq: Introduce RSEQ_FLAG_RELIABLE_CPU_ID

2020-07-07 Thread Florian Weimer
* Carlos O'Donell: > It's not a great fit IMO. Just let the kernel version be the arbiter of > correctness. For manual review, sure. But checking it programmatically does not yield good results due to backports. Even those who use the stable kernel series sometimes pick up critical fixes

Re: [RFC PATCH for 5.8 0/4] rseq cpu_id ABI fix

2020-07-07 Thread Florian Weimer
I would like to point out that the subject is misleading: This is not an ABI change. It fixes the contents of the __rseq_abi TLS variable (as glibc calls it), but that's it. (Sorry, I should have mentioned this earlier.)

Re: [RFC PATCH for 5.8 3/4] rseq: Introduce RSEQ_FLAG_RELIABLE_CPU_ID

2020-07-07 Thread Florian Weimer
* Mathieu Desnoyers: > Those are very good points. One possibility we have would be to let > glibc do the rseq registration without the RSEQ_FLAG_RELIABLE_CPU_ID > flag. On kernels with the bug present, the cpu_id field is still good > enough for typical uses of sched_getcpu() which does not

Re: [RFC PATCH for 5.8 1/4] sched: Fix unreliable rseq cpu_id for new tasks

2020-07-07 Thread Florian Weimer
xpected CPU 2, expected 0 > error: Unexpected CPU 2, expected 0 > error: Unexpected CPU 138, expected 0 > error: Unexpected CPU 138, expected 0 > error: Unexpected CPU 138, expected 0 > error: Unexpected CPU 138, expected 0 As far as I can tell, the glibc reproducer no longer shows the issue with this patch applied. Tested-By: Florian Weimer

Re: [RFC PATCH for 5.8 3/4] rseq: Introduce RSEQ_FLAG_RELIABLE_CPU_ID

2020-07-07 Thread Florian Weimer
* Mathieu Desnoyers: > commit 93b585c08d16 ("Fix: sched: unreliable rseq cpu_id for new tasks") > addresses an issue with cpu_id field of newly created processes. Expose > a flag which can be used by user-space to query whether the kernel > implements this fix. > > Considering that this issue can

Re: [RFC PATCH for 5.8 0/4] rseq cpu_id ABI fix

2020-07-07 Thread Florian Weimer
* Mathieu Desnoyers: > This is an RFC aiming for quick inclusion into the Linux kernel, unless > we prefer reverting the entire rseq glibc integration and try again in 6 > months. Their upcoming release is on August 3rd, so we need to take a > decision on this matter quickly. Just to clarify

Re: [PATCH 2/3] Linux: Use rseq in sched_getcpu if available (v9)

2020-07-06 Thread Florian Weimer
* Mathieu Desnoyers: > - On Jul 6, 2020, at 1:50 PM, Florian Weimer fwei...@redhat.com wrote: > >> * Mathieu Desnoyers: >> >>> Now we need to discuss how we introduce that fix in a way that will >>> allow user-space to trust the __rseq_abi.cpu_id field's

Re: [PATCH 2/3] Linux: Use rseq in sched_getcpu if available (v9)

2020-07-06 Thread Florian Weimer
* Mathieu Desnoyers: > Now we need to discuss how we introduce that fix in a way that will > allow user-space to trust the __rseq_abi.cpu_id field's content. I don't think that's necessary. We can mention it in the glibc distribution notes on the wiki. > The usual approach to kernel bug fixing

Re: [PATCH 2/3] Linux: Use rseq in sched_getcpu if available (v9)

2020-07-06 Thread Florian Weimer
* Mathieu Desnoyers: > When available, use the cpu_id field from __rseq_abi on Linux to > implement sched_getcpu(). Fall back on the vgetcpu vDSO if > unavailable. I've pushed this to glibc master, but unfortunately it looks like this exposes a kernel bug related to affinity mask changes.

Re: [PATCH 1/3] glibc: Perform rseq registration at C startup and thread creation (v22)

2020-07-02 Thread Florian Weimer
* Mathieu Desnoyers via Libc-alpha: > Register rseq TLS for each thread (including main), and unregister for > each thread (excluding main). "rseq" stands for Restartable Sequences. > > See the rseq(2) man page proposed here: > https://lkml.org/lkml/2018/9/19/647 > > Those are based on glibc

Re: [PATCH 1/3] glibc: Perform rseq registration at C startup and thread creation (v21)

2020-06-24 Thread Florian Weimer
* Mathieu Desnoyers: >> I think we should keep things simple on the glibc side for now and make >> these changes to the kernel headers first. > > Just to be sure I understand what you mean by "keep things simple", do you > recommend removing the following lines completely for now from sys/rseq.h? >

Re: [PATCH 1/3] glibc: Perform rseq registration at C startup and thread creation (v21)

2020-06-24 Thread Florian Weimer
* Mathieu Desnoyers: >> I'm still worried that __rseq_static_assert and __rseq_alignof will show >> up in the UAPI with textually different definitions. (This does not >> apply to __rseq_tls_model_ie.) > > What makes this worry not apply to __rseq_tls_model_ie ? It's not needed by the kernel

Re: [PATCH 1/3] glibc: Perform rseq registration at C startup and thread creation (v21)

2020-06-24 Thread Florian Weimer
* Mathieu Desnoyers: > diff --git a/manual/threads.texi b/manual/threads.texi > index bb7a42c655..d5069d5581 100644 > --- a/manual/threads.texi > +++ b/manual/threads.texi > +@deftypevar {struct rseq} __rseq_abi > +@standards{Linux, sys/rseq.h} > +@Theglibc{} implements a @code{__rseq_abi} TLS

Re: Add a new fchmodat4() syscall, v2

2020-06-09 Thread Florian Weimer
* Palmer Dabbelt: > This patch set adds fchmodat4(), a new syscall. The actual > implementation is super simple: essentially it's just the same as > fchmodat(), but LOOKUP_FOLLOW is conditionally set based on the flags. > I've attempted to make this match "man 2 fchmodat" as closely as >

Re: [PATCH glibc 1/3] glibc: Perform rseq registration at C startup and thread creation (v20)

2020-06-03 Thread Florian Weimer
* Mathieu Desnoyers: > - On Jun 3, 2020, at 8:05 AM, Florian Weimer fwei...@redhat.com wrote: > >> * Mathieu Desnoyers: >> >>> +#ifdef __cplusplus >>> +# if __cplusplus >= 201103L >>> +# define __rseq_static_assert(expr, diagnostic) sta

Re: [PATCH glibc 1/3] glibc: Perform rseq registration at C startup and thread creation (v20)

2020-06-03 Thread Florian Weimer
* Mathieu Desnoyers: > +#ifdef __cplusplus > +# if __cplusplus >= 201103L > +# define __rseq_static_assert(expr, diagnostic) static_assert (expr, > diagnostic) > +# define __rseq_alignof(type) alignof (type) > +# define __rseq_alignas(x) alignas (x) >
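
The quoted hunk selects the assertion and alignment keywords by language mode, and the preview cuts off before the C branch. A rough sketch of the same pattern is shown below, assuming GCC or Clang for the `__alignof__` fallback. The `DEMO_` macros, `struct demo`, and `demo_alignment_ok` are hypothetical names and are not the macros from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical DEMO_ macros, modeled loosely on the pattern in the
   quoted patch; the real header may differ.  */
#ifdef __cplusplus
# if __cplusplus >= 201103L
#  define DEMO_static_assert(expr, diagnostic) static_assert (expr, diagnostic)
#  define DEMO_alignof(type) alignof (type)
# endif
#elif defined __STDC_VERSION__ && __STDC_VERSION__ >= 201112L
# define DEMO_static_assert(expr, diagnostic) _Static_assert (expr, diagnostic)
# define DEMO_alignof(type) _Alignof (type)
#endif

/* Fallbacks for older language modes: no compile-time check, and the
   GNU __alignof__ extension for alignment queries.  */
#ifndef DEMO_static_assert
# define DEMO_static_assert(expr, diagnostic) /* nothing */
#endif
#ifndef DEMO_alignof
# define DEMO_alignof(type) __alignof__ (type)
#endif

struct demo
{
  int a;
  long long b;
};

/* Returns nonzero if struct demo is at least as aligned as int.  */
static int
demo_alignment_ok (void)
{
  DEMO_static_assert (DEMO_alignof (struct demo) >= DEMO_alignof (int),
                      "struct demo is under-aligned");
  return DEMO_alignof (struct demo) >= DEMO_alignof (int);
}
```

Each language mode expands the macros to different text, which illustrates the concern raised elsewhere in the thread about textually different definitions appearing in more than one header.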

Re: [PATCH v5 1/3] open: add close_range()

2020-06-02 Thread Florian Weimer
* Christian Brauner: > The performance is striking. For good measure, comparing the following > simple close_all_fds() userspace implementation that is essentially just > glibc's version in [6]: > > static int close_all_fds(void) > { > int dir_fd; > DIR *dir; > struct

Re: [PATCH glibc 1/3] glibc: Perform rseq registration at C startup and thread creation (v19)

2020-05-26 Thread Florian Weimer
* Mathieu Desnoyers: >> Like the attribute, it needs to come right after the struct keyword, I >> think. (Trailing attributes can be ambiguous, but not in this case.) > > Nope. _Alignas really _is_ special :-( > > struct _Alignas (16) blah { > int a; > }; > > p.c:1:8: error: expected ‘{’
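
As the quoted compiler error shows, C11 does not accept `_Alignas` right after the `struct` keyword. One placement that C11 does accept is on a member, which in turn raises the alignment of the whole structure. The following is a small sketch of that alternative, not the approach chosen in the thread; `demo_obj` and `blah_is_aligned` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* _Alignas on the first member: valid C11, and the structure is then
   at least as strictly aligned as that member.  */
struct blah
{
  _Alignas (16) int a;
};

/* Hypothetical object used only to check the resulting alignment.  */
static struct blah demo_obj;

/* Returns nonzero if P sits on a 16-byte boundary.  */
static int
blah_is_aligned (const struct blah *p)
{
  return ((uintptr_t) p % 16) == 0;
}
```

`_Alignas` may also be applied to an object declaration, for example `_Alignas (16) static char buf[64];`, when only a specific variable needs the stricter alignment.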
