Re: [PATCH 1/3] glibc: Perform rseq registration at C startup and thread creation (v22)

2020-07-03 Thread Carlos O'Donell
On 7/2/20 10:46 AM, Florian Weimer wrote:
> * Mathieu Desnoyers via Libc-alpha:
> 
>> Register rseq TLS for each thread (including main), and unregister for
>> each thread (excluding main).  "rseq" stands for Restartable Sequences.
>>
>> See the rseq(2) man page proposed here:
>>   https://lkml.org/lkml/2018/9/19/647
>>
>> Those are based on glibc master branch commit 3ee1e0ec5c.
>> The rseq system call was merged into Linux 4.18.
>>
>> The TLS_STATIC_SURPLUS define is increased to leave additional room for
>> dlopen'd initial-exec TLS, which keeps elf/tst-auditmany working.
>>
>> The increase (76 bytes) is larger than 32 bytes because it has not been
>> increased in quite a while.  The cost in terms of additional TLS storage
>> is quite significant, but it will also obscure some initial-exec-related
>> dlopen failures.
> 
> We need another change to get this working on most non-x86
> architectures:
> 
> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
> index 817bcbbf59..ca13778ca9 100644
> --- a/elf/dl-tls.c
> +++ b/elf/dl-tls.c
> @@ -134,6 +134,12 @@ void
>  _dl_determine_tlsoffset (void)
>  {
>size_t max_align = TLS_TCB_ALIGN;
> +  /* libc.so with rseq has TLS with 32-byte alignment.  Since TLS is
> + initialized before audit modules are loaded and slotinfo
> + information is available, this is not taken into account below in
> + the audit case.  */
> +  max_align = MAX (max_align, 32U);
> +
>size_t freetop = 0;
>size_t freebottom = 0;
> 
> This isn't visible on x86-64 because TLS_TCB_ALIGN is already 64 there.
> 
> I plan to re-test with this fix and push the series.
> 
> Carlos, is it okay if I fold in the dl-tls.c change if testing looks
> good?

I have reviewed the above and I think it is the correct *pragmatic* fix.

The reality is that to fix this fully you must use a two stage loading
process to pre-examine all audit modules *before* setting the fundamental
alignment of the TCB.  This isn't easy with the current loader framework.
Therefore the above is a good pragmatic solution.

There is always going to be a bit of a chicken and an egg situation.
We want to provide a fundamental alignment requirement but we haven't
yet seen all the requirements on alignment. So the best we could do is
look over DT_NEEDED, DT_AUDIT, LD_PRELOAD, etc. get the best answer
and then fail any subsequent dlopen's that load objects with higher
fundamental requirements for alignment of the TCB.

The audit modules are problematic becuase they are loaded *before*
anything else is loaded, *before* we've examined any of the actual
objects we're about to load because they can influence the search
paths. Again, this means the above solution is a perfect pragmatic
choice. The real solution is to rearchitect the early audit module
loading into two stages and that's work we can do later.

OK with the above change.

-- 
Cheers,
Carlos.



Re: [PATCH 1/3] glibc: Perform rseq registration at C startup and thread creation (v22)

2020-07-02 Thread Florian Weimer
* Mathieu Desnoyers via Libc-alpha:

> Register rseq TLS for each thread (including main), and unregister for
> each thread (excluding main).  "rseq" stands for Restartable Sequences.
>
> See the rseq(2) man page proposed here:
>   https://lkml.org/lkml/2018/9/19/647
>
> Those are based on glibc master branch commit 3ee1e0ec5c.
> The rseq system call was merged into Linux 4.18.
>
> The TLS_STATIC_SURPLUS define is increased to leave additional room for
> dlopen'd initial-exec TLS, which keeps elf/tst-auditmany working.
>
> The increase (76 bytes) is larger than 32 bytes because it has not been
> increased in quite a while.  The cost in terms of additional TLS storage
> is quite significant, but it will also obscure some initial-exec-related
> dlopen failures.

We need another change to get this working on most non-x86
architectures:

diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 817bcbbf59..ca13778ca9 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -134,6 +134,12 @@ void
 _dl_determine_tlsoffset (void)
 {
   size_t max_align = TLS_TCB_ALIGN;
+  /* libc.so with rseq has TLS with 32-byte alignment.  Since TLS is
+ initialized before audit modules are loaded and slotinfo
+ information is available, this is not taken into account below in
+ the audit case.  */
+  max_align = MAX (max_align, 32U);
+
   size_t freetop = 0;
   size_t freebottom = 0;

This isn't visible on x86-64 because TLS_TCB_ALIGN is already 64 there.

I plan to re-test with this fix and push the series.

Carlos, is it okay if I fold in the dl-tls.c change if testing looks
good?

Thanks,
Florian



[PATCH 1/3] glibc: Perform rseq registration at C startup and thread creation (v22)

2020-06-29 Thread Mathieu Desnoyers
Register rseq TLS for each thread (including main), and unregister for
each thread (excluding main).  "rseq" stands for Restartable Sequences.

See the rseq(2) man page proposed here:
  https://lkml.org/lkml/2018/9/19/647

Those are based on glibc master branch commit 3ee1e0ec5c.
The rseq system call was merged into Linux 4.18.

The TLS_STATIC_SURPLUS define is increased to leave additional room for
dlopen'd initial-exec TLS, which keeps elf/tst-auditmany working.

The increase (76 bytes) is larger than 32 bytes because it has not been
increased in quite a while.  The cost in terms of additional TLS storage
is quite significant, but it will also obscure some initial-exec-related
dlopen failures.

CC: Carlos O'Donell 
CC: Florian Weimer 
CC: Joseph Myers 
CC: Szabolcs Nagy 
CC: Thomas Gleixner 
CC: Ben Maurer 
CC: Peter Zijlstra 
CC: "Paul E. McKenney" 
CC: Boqun Feng 
CC: Will Deacon 
CC: Dave Watson 
CC: Paul Turner 
CC: Rich Felker 
CC: libc-al...@sourceware.org
CC: linux-kernel@vger.kernel.org
CC: linux-...@vger.kernel.org
---
Changes since v1:
- Move __rseq_refcount to an extra field at the end of __rseq_abi to
  eliminate one symbol.

  All libraries/programs which try to register rseq (glibc,
  early-adopter applications, early-adopter libraries) should use the
  rseq refcount.  It becomes part of the ABI within a user-space
  process, but it's not part of the ABI shared with the kernel per se.

- Restructure how this code is organized so glibc keeps building on
  non-Linux targets.

- Use non-weak symbol for __rseq_abi.

- Move rseq registration/unregistration implementation into its own
  nptl/rseq.c compile unit.

- Move __rseq_abi symbol under GLIBC_2.29.

Changes since v2:
- Move __rseq_refcount to its own symbol, which is less ugly than
  trying to play tricks with the rseq uapi.
- Move __rseq_abi from nptl to csu (C start up), so it can be used
  across glibc, including memory allocator and sched_getcpu().  The
  __rseq_refcount symbol is kept in nptl, because there is no reason
  to use it elsewhere in glibc.

Changes since v3:
- Set __rseq_refcount TLS to 1 on register/set to 0 on unregister
  because glibc is the first/last user.
- Unconditionally register/unregister rseq at thread start/exit, because
  glibc is the first/last user.
- Add missing abilist items.
- Rebase on glibc master commit a502c5294.
- Add NEWS entry.

Changes since v4:
- Do not use "weak" symbols for __rseq_abi and __rseq_refcount.  Based on
  "System V Application Binary Interface", weak only affects the link
  editor, not the dynamic linker.
- Install a new sys/rseq.h system header on Linux, which contains the
  RSEQ_SIG definition, __rseq_abi declaration and __rseq_refcount
  declaration.  Move those definition/declarations from rseq-internal.h
  to the installed sys/rseq.h header.
- Considering that rseq is only available on Linux, move csu/rseq.c to
  sysdeps/unix/sysv/linux/rseq-sym.c.
- Move __rseq_refcount from nptl/rseq.c to
  sysdeps/unix/sysv/linux/rseq-sym.c, so it is only defined on Linux.
- Move both ABI definitions for __rseq_abi and __rseq_refcount to
  sysdeps/unix/sysv/linux/Versions, so they only appear on Linux.
- Document __rseq_abi and __rseq_refcount volatile.
- Document the RSEQ_SIG signature define.
- Move registration functions from rseq.c to rseq-internal.h static
  inline functions.  Introduce empty stubs in misc/rseq-internal.h,
  which can be overridden by architecture code in
  sysdeps/unix/sysv/linux/rseq-internal.h.
- Rename __rseq_register_current_thread and __rseq_unregister_current_thread
  to rseq_register_current_thread and rseq_unregister_current_thread,
  now that those are only visible as internal static inline functions.
- Invoke rseq_register_current_thread() from libc-start.c LIBC_START_MAIN
  rather than nptl init, so applications not linked against
  libpthread.so have rseq registered for their main() thread.  Note that
  it is invoked separately for SHARED and !SHARED builds.

Changes since v5:
- Replace __rseq_refcount by __rseq_lib_abi, which contains two
  uint32_t: register_state and refcount.  The "register_state" field
  allows inhibiting rseq registration from signal handlers nested on top
  of glibc registration and occuring after rseq unregistration by glibc.
- Introduce enum rseq_register_state, which contains the states allowed
  for the struct rseq_lib_abi register_state field.

Changes since v6:
- Introduce bits/rseq.h to define RSEQ_SIG for each architecture.
  The generic bits/rseq.h does not define RSEQ_SIG, meaning that each
  architecture implementing rseq needs to implement bits/rseq.h.
- Rename enum item RSEQ_REGISTER_NESTED to RSEQ_REGISTER_ONGOING.
- Port to glibc-2.29.

Changes since v7:
- Remove __rseq_lib_abi symbol, including refcount and register_state
  fields.
- Remove reference counting and nested signals handling from
  registration/unregistration functions.
- Introduce new __rseq_handled exported symbol, which is set to 1
  by glibc on C