Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
On Wed, Jul 15, 2020 at 11:10:47AM -0400, Mathieu Desnoyers wrote: > - On Jul 15, 2020, at 8:33 AM, Christian Brauner > christian.brau...@ubuntu.com wrote: > [...] > > > > So here's a very free-wheeling draft of roughly what I had in mind. Not > > even compile-tested just to illustrate what I'd change and sorry if that > > code will make you sob in your hands: > > > [...] > > + /* > > +* With __rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, user_size should be > > +* statically initialized to offsetof(struct rseq, end). > > +*/ > > + __u32 user_size; > > + /* > > +* With __rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, if the kernel supports > > +* extensible struct rseq ABI, the kernel_size field is populated by > > +* the kernel to the minimum between user_size and the offset of the > > +* "end" field within the struct rseq supported by the kernel on > > +* successful registration. Should be initialized to 0. > > +*/ > > + __u32 kernel_size; > > Moving from __u16 to __u32 for both fields don't achieve much, and increase > the size of struct rseq (excluding padding) from 24 bytes to 28 bytes. > > Note that the struct rseq alignment is 32 bytes. At 24 bytes, it leaves room > for exactly one 8 bytes pointer, which can be useful for future extensions. > If the size is increased to 28 bytes or more, then we're done and cannot > add a pointer. > > > + __u32 active_size; > > This additional field takes the very last bytes of padding we have in the > current layout. > > > } __attribute__((aligned(4 * sizeof(__u64; > > > > +#define RSEQ_SIZE_VER0 24 /* sizeof first published struct */ > > This is incorrect. The sizeof(struct_rseq) with its 32 bytes alignment is 32, > not 24. The padding at the end of the structure is considered as part of its > size, but we cannot rely on its content being zero-initialized based on the > C standard. > > > +#define RSEQ_SIZE_VER1 32 /* sizeof second published struct */ > > +#define RSEQ_SIZE_LATEST RSEQ_SIZE_VER1 /* sizeof last published struct */ > > + > > [...] > > > @@ -349,10 +363,58 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, > > rseq_len, > > * ensure the provided rseq is properly aligned and valid. > > */ > > if (!IS_ALIGNED((unsigned long)rseq, __alignof__(*rseq)) || > > - rseq_len != sizeof(*rseq)) > > + rseq_len < RSEQ_SIZE_VER0) > > This could perhaps be changed for future kernels, but will break for existing > kernels as soon as rseq_len is increased. This is something we should have > planned for in the initial implementation of the system call, but here we are. > > How do you envision that userspace would handle this failure from older > kernels ? > Try again with a second system call passing RSEQ_SIZE_VER0 as argument ? > > > return -EINVAL; > > if (!access_ok(rseq, rseq_len)) > > return -EFAULT; > > + > > + /* Handle extensible struct rseq ABI. */ > > + ret = get_user(tls_flags, >flags); > > + if (ret) > > + return ret; > > + if (tls_flags & RSEQ_TLS_FLAG_SIZE) { > > + u32 user_size, kernel_size, active_size; > > + > > + /* Can probably be made nicer by using check_zeroed_user(). */ > > + ret = get_user(user_size, >user_size); > > + if (ret) > > + return ret; > > + if (user_size != 0) > > + return -EINVAL; > > + > > + ret = get_user(active_size, >active_size); > > + if (ret) > > + return ret; > > + if (active_size != 0) > > + return -EINVAL; > > + > > + ret = get_user(active_size, >kernel_size); > > I guess you mean kernel_size here. > > > + if (ret) > > + return ret; > > + if (kernel_size != 0) > > + return -EINVAL; > > + > > + /* Calculate the useable size. */ > > + active_size = min_t(u32, rseq_len, RSEQ_SIZE_LATEST); > > Where is the rseq_len supposed to come from in userspace ? Should it be > that the code doing the registration uses sizeof(struct rseq), or > offsetof(struct rseq, end), > or should it read the content of __rseq_abi.user_size ? > > > + ret = put_user(active_size, >active_size); > > + if (ret) > > + return ret; > > + > > + /* Let other users know what userspace used to register. */ > > + ret = put_user(rseq_len, >user_size); > > + if (ret) > > + return -EFAULT; > > + > > + /* Let other users know what size the kernel supports. */ > > I am not sure what those 3 __u32 fields (user_size, kernel_size, and > active_size), > plus use of the rseq_len syscall parameter, accomplish which was not > accomplished > by my __u16 user_size + kernel_size approach ? If anything, it seems to make > support > of older kernels which do not support an extended rseq_len parameter more >
Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
- On Jul 15, 2020, at 10:58 AM, Florian Weimer fwei...@redhat.com wrote: > * Mathieu Desnoyers: > >> - On Jul 15, 2020, at 9:42 AM, Florian Weimer fwei...@redhat.com wrote: >>> * Mathieu Desnoyers: >>> >> [...] How would this allow early-rseq-adopter libraries to interact with glibc ? >>> >>> Under all extension proposals I've seen so far, early adopters are >>> essentially incompatible with glibc rseq registration. I don't think >>> you can have it both ways. >> >> The basic question I'm not sure about is whether we are allowed to increase >> the size and alignement of __rseq_abi from e.g. glibc 2.32 to glibc 2.33. > > With the current mechanism (global TLS data symbol), we can do that > using symbol versioning. That means that we can only do this on a > release boundary, That should not be a problem. > and that it's incompatible with other libraries which > use an interposing unversioned symbol. We have the freedom to define the ABI of this shared __rseq_abi symbol right now. Maybe it's not such a good thing to let early adopters use unversioned __rseq_abi symbols. Let me wrap my head around this scenario then, please let me know if I'm misunderstanding something: 1) glibc 2.32 exposes: __rseq_abi (GLIBC_2.32) with size == 32. __rseq_abi with size == 32 is available as a private symbol within glibc - both symbols alias the same contents. 2) glibc 2.33 exposes: __rseq_abi (GLIBC_2.32) with size == 32. __rseq_abi (GLIBC_2.33) with size == 64. __rseq_abi with size == 64 is available as a private symbol within glibc - the three symbols alias the same contents. Then what happens if we have a program or preloaded library defining __rseq_abi (without version) with size == 32 loaded with a glibc 2.33 ? Or what happens if we have a program or preloaded libary defining __rseq_abi (GLIBC_2.32) with size == 32 loaded with a glibc 2.33 ? I wonder if "GLIBC_*" is the right version namespace for this. Considering that the layout of this structure is defined by the Linux kernel UAPI, maybe we'd want version named as "RSEQ_1.0", "RSEQ_2.0" or something similar. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
- On Jul 15, 2020, at 8:33 AM, Christian Brauner christian.brau...@ubuntu.com wrote: [...] > > So here's a very free-wheeling draft of roughly what I had in mind. Not > even compile-tested just to illustrate what I'd change and sorry if that > code will make you sob in your hands: > [...] > + /* > + * With __rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, user_size should be > + * statically initialized to offsetof(struct rseq, end). > + */ > + __u32 user_size; > + /* > + * With __rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, if the kernel supports > + * extensible struct rseq ABI, the kernel_size field is populated by > + * the kernel to the minimum between user_size and the offset of the > + * "end" field within the struct rseq supported by the kernel on > + * successful registration. Should be initialized to 0. > + */ > + __u32 kernel_size; Moving from __u16 to __u32 for both fields don't achieve much, and increase the size of struct rseq (excluding padding) from 24 bytes to 28 bytes. Note that the struct rseq alignment is 32 bytes. At 24 bytes, it leaves room for exactly one 8 bytes pointer, which can be useful for future extensions. If the size is increased to 28 bytes or more, then we're done and cannot add a pointer. > + __u32 active_size; This additional field takes the very last bytes of padding we have in the current layout. > } __attribute__((aligned(4 * sizeof(__u64; > > +#define RSEQ_SIZE_VER0 24 /* sizeof first published struct */ This is incorrect. The sizeof(struct_rseq) with its 32 bytes alignment is 32, not 24. The padding at the end of the structure is considered as part of its size, but we cannot rely on its content being zero-initialized based on the C standard. > +#define RSEQ_SIZE_VER1 32 /* sizeof second published struct */ > +#define RSEQ_SIZE_LATEST RSEQ_SIZE_VER1 /* sizeof last published struct */ > + [...] > @@ -349,10 +363,58 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, > rseq_len, >* ensure the provided rseq is properly aligned and valid. >*/ > if (!IS_ALIGNED((unsigned long)rseq, __alignof__(*rseq)) || > - rseq_len != sizeof(*rseq)) > + rseq_len < RSEQ_SIZE_VER0) This could perhaps be changed for future kernels, but will break for existing kernels as soon as rseq_len is increased. This is something we should have planned for in the initial implementation of the system call, but here we are. How do you envision that userspace would handle this failure from older kernels ? Try again with a second system call passing RSEQ_SIZE_VER0 as argument ? > return -EINVAL; > if (!access_ok(rseq, rseq_len)) > return -EFAULT; > + > + /* Handle extensible struct rseq ABI. */ > + ret = get_user(tls_flags, >flags); > + if (ret) > + return ret; > + if (tls_flags & RSEQ_TLS_FLAG_SIZE) { > + u32 user_size, kernel_size, active_size; > + > + /* Can probably be made nicer by using check_zeroed_user(). */ > + ret = get_user(user_size, >user_size); > + if (ret) > + return ret; > + if (user_size != 0) > + return -EINVAL; > + > + ret = get_user(active_size, >active_size); > + if (ret) > + return ret; > + if (active_size != 0) > + return -EINVAL; > + > + ret = get_user(active_size, >kernel_size); I guess you mean kernel_size here. > + if (ret) > + return ret; > + if (kernel_size != 0) > + return -EINVAL; > + > + /* Calculate the useable size. */ > + active_size = min_t(u32, rseq_len, RSEQ_SIZE_LATEST); Where is the rseq_len supposed to come from in userspace ? Should it be that the code doing the registration uses sizeof(struct rseq), or offsetof(struct rseq, end), or should it read the content of __rseq_abi.user_size ? > + ret = put_user(active_size, >active_size); > + if (ret) > + return ret; > + > + /* Let other users know what userspace used to register. */ > + ret = put_user(rseq_len, >user_size); > + if (ret) > + return -EFAULT; > + > + /* Let other users know what size the kernel supports. */ I am not sure what those 3 __u32 fields (user_size, kernel_size, and active_size), plus use of the rseq_len syscall parameter, accomplish which was not accomplished by my __u16 user_size + kernel_size approach ? If anything, it seems to make support of older kernels which do not support an extended rseq_len parameter more complex. Thanks, Mathieu > + ret = put_user(RSEQ_SIZE_LATEST, >kernel_size); > + if (ret) > + return -EFAULT; > + > + current->rseq_size = active_size; > +
Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
* Mathieu Desnoyers: > - On Jul 15, 2020, at 9:42 AM, Florian Weimer fwei...@redhat.com wrote: >> * Mathieu Desnoyers: >> > [...] >>> How would this allow early-rseq-adopter libraries to interact with >>> glibc ? >> >> Under all extension proposals I've seen so far, early adopters are >> essentially incompatible with glibc rseq registration. I don't think >> you can have it both ways. > > The basic question I'm not sure about is whether we are allowed to increase > the size and alignement of __rseq_abi from e.g. glibc 2.32 to glibc 2.33. With the current mechanism (global TLS data symbol), we can do that using symbol versioning. That means that we can only do this on a release boundary, and that it's incompatible with other libraries which use an interposing unversioned symbol. Thanks, Florian
Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
- On Jul 15, 2020, at 9:42 AM, Florian Weimer fwei...@redhat.com wrote: > * Mathieu Desnoyers: > [...] >> How would this allow early-rseq-adopter libraries to interact with >> glibc ? > > Under all extension proposals I've seen so far, early adopters are > essentially incompatible with glibc rseq registration. I don't think > you can have it both ways. The basic question I'm not sure about is whether we are allowed to increase the size and alignement of __rseq_abi from e.g. glibc 2.32 to glibc 2.33. If not, then we just need to find another way to extend struct rseq, e.g. by adding a pointer to another extended structure in the padding space we have at the end of struct rseq. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
- On Jul 14, 2020, at 10:34 PM, Chris Kennelly ckenne...@google.com wrote: > On Tue, Jul 14, 2020 at 2:33 PM Peter Oskolkov wrote: >> >> On Tue, Jul 14, 2020 at 10:43 AM Mathieu Desnoyers >> wrote: >> > >> > - On Jul 14, 2020, at 1:24 PM, Peter Oskolkov p...@posk.io wrote: >> > >> > > At Google, we actually extended struct rseq (I will post the patches >> > > here once they are fully deployed and we have specific >> > > benefits/improvements to report). We did this by adding several fields >> > > below __u32 flags (the last field currently), and correspondingly >> > > increasing rseq_len in rseq() syscall. If the kernel does not know of >> > > this extension, it will return -EINVAL due to an unexpected rseq_len; >> > > then the application can either fall-back to the standard/upstream >> > > rseq, or bail. If the kernel does know of this extension, it accepts >> > > it. If the application passes the old rseq_len (32), the kernel knows >> > > that this is an old application and treats it as such. >> > > >> > > I looked through the archives, but I did not find specifically why the >> > > pretty standard approach described above is considered inferior to the >> > > one taken in this patch (freeze rseq_len at 32, add additional length >> > > fields to struct rseq). Can these be summarized? >> > >> > I think you don't face the issues I'm facing with libc rseq integration >> > because you control the entire user-space software ecosystem at Google. >> > >> > The main issue we face is that the library responsible for registering >> > rseq (either glibc 2.32+, an early-adopter librseq library, or the >> > application) may very well not be the same library defining the __rseq_abi >> > symbol used in the global symbol table. Interposition with ld preload or >> > by defining the __rseq_abi in the program's executable are good examples >> > of this kind of scenario, and those use-cases are supported. > > Does this work if/when we run out of bytes in the current sizeof(__rseq_abi)? Only if all libraries/programs involved (including glibc) expect that the size of the __rseq_abi can be the smallest possible subset, and only consider it to be "extended" if specific information in the ABI tells them it is the case. > > Which library provides the TLS symbol (and N bytes of storage) seems > sensitive to the choices the linker makes for us, once the symbol > sizes diverge. AFAIU, a symbol defined in the main executable will have precedence over a preloaded library, which has precedence over shared library dependencies, e.g. glibc. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
- On Jul 15, 2020, at 2:31 AM, Florian Weimer f...@deneb.enyo.de wrote: > * Chris Kennelly: > >> When glibc provides registration, is the anticipated use case that a >> library would unregister and reregister each thread to "upgrade" it to >> the most modern version of interface it knows about provided by the >> kernel? > > Absolutely not, that is likely to break other consumers because an > expected rseq area becomes dormant instead. Indeed. > >> There, I could assume an all-or-nothing registration of the new >> feature--limited only by kernel availability for thread >> homogeneity--but inconsistencies across early adopter libraries would >> mean each thread would have to examine its own TLS to determine if a >> feature were available. > > Exactly. Certain uses of seccomp can also have this effect, > presenting a non-homogeneous view. The nice thing about having a consistent feature-set for a given thread group is that it allows specializing the code at thread group startup, rather than requiring to dynamically check for feature availability at runtime in fast-paths. I wonder whether this kind of non-homogeneous view scenario caused by seccomp is something we should support, or something that should be documented as incompatible with rseq ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
- On Jul 15, 2020, at 9:55 AM, Christian Brauner christian.brau...@ubuntu.com wrote: > On Wed, Jul 15, 2020 at 03:42:11PM +0200, Florian Weimer wrote: >> * Mathieu Desnoyers: >> >> > So indeed it could be done today without upgrading the toolchains by >> > writing custom assembler for each architecture to get the thread's >> > struct rseq. AFAIU the ABI to access the thread pointer is fixed for >> > each architecture, right ? >> >> Yes, determining the thread pointer and access initial-exec TLS >> variables is baked into the ABI. >> >> > How would this allow early-rseq-adopter libraries to interact with >> > glibc ? >> >> Under all extension proposals I've seen so far, early adopters are >> essentially incompatible with glibc rseq registration. I don't think >> you can have it both ways. > > Who are the early adopters? And if we aren't being compatible with them > under the extensible schemes proposed we should be able to achieve > compatibility with non-early adopters, right? Which I guess is more > important. (I still struggle to make sense what qualifies as an early > adopter/what the difference to a non-early adopter is.) Early adopter libraries and applications are meant to be able to use rseq without requiring upgrade of the entire environment to a newer glibc. I maintain early adopter projects (liburcu, lttng-ust) which postpone using rseq outside of prototype branches until we agree on an ABI to share __rseq_abi between glibc and early adopter libraries. The last thing I want is for those projects to break when an end-user upgrades their glibc. tcmalloc is another early adopter which have less strict compatibility requirements: they are OK with breaking changes requiring upgrading and rebuilding tcmalloc. Indeed, until we cast in stone the layout of struct rseq as exposed by glibc, I think we have some freedom in our definition of "early adopter", because pretty much every relevant open source project which want to use rseq is waiting on glibc to define that ABI, to use rseq either as an early-adopter or through a dependency on newer glibc. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
On Wed, Jul 15, 2020 at 03:42:11PM +0200, Florian Weimer wrote: > * Mathieu Desnoyers: > > > So indeed it could be done today without upgrading the toolchains by > > writing custom assembler for each architecture to get the thread's > > struct rseq. AFAIU the ABI to access the thread pointer is fixed for > > each architecture, right ? > > Yes, determining the thread pointer and access initial-exec TLS > variables is baked into the ABI. > > > How would this allow early-rseq-adopter libraries to interact with > > glibc ? > > Under all extension proposals I've seen so far, early adopters are > essentially incompatible with glibc rseq registration. I don't think > you can have it both ways. Who are the early adopters? And if we aren't being compatible with them under the extensible schemes proposed we should be able to achieve compatibility with non-early adopters, right? Which I guess is more important. (I still struggle to make sense what qualifies as an early adopter/what the difference to a non-early adopter is.) Christian
Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
* Mathieu Desnoyers: > So indeed it could be done today without upgrading the toolchains by > writing custom assembler for each architecture to get the thread's > struct rseq. AFAIU the ABI to access the thread pointer is fixed for > each architecture, right ? Yes, determining the thread pointer and access initial-exec TLS variables is baked into the ABI. > How would this allow early-rseq-adopter libraries to interact with > glibc ? Under all extension proposals I've seen so far, early adopters are essentially incompatible with glibc rseq registration. I don't think you can have it both ways. Thanks, Florian
Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
- On Jul 15, 2020, at 9:22 AM, Florian Weimer fwei...@redhat.com wrote: > * Mathieu Desnoyers: > >> Practically speaking, I suspect this would mean postponing availability of >> rseq for widely deployed applications for a few more years ? > > There is no rseq support in GCC today, so you have to write assembler > code anyway. Assembler code is only needed for the rseq critical sections, not for accessing the __rseq_abi. Use-cases like getting the current cpu number do not currently require any assembler for instance. So indeed it could be done today without upgrading the toolchains by writing custom assembler for each architecture to get the thread's struct rseq. AFAIU the ABI to access the thread pointer is fixed for each architecture, right ? How would this allow early-rseq-adopter libraries to interact with glibc ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
* Mathieu Desnoyers: > Practically speaking, I suspect this would mean postponing availability of > rseq for widely deployed applications for a few more years ? There is no rseq support in GCC today, so you have to write assembler code anyway. Thanks, Florian
Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
- On Jul 14, 2020, at 5:30 PM, carlos car...@redhat.com wrote: > On 7/14/20 9:19 AM, Mathieu Desnoyers wrote: >> Is there an arch-agnostic way to get the thread pointer from user-space code >> ? >> That >> would be needed by all rseq critical section implementations. > > Yes, and no. We have void *__builtin_thread_pointer (void), but > few architectures implement the builtin so we'd have to go through > a round of compiler updates and backports. All targets know how to > access the thread pointer because the compiler has to generate > IE-mode accesses to the TLS variables. Practically speaking, I suspect this would mean postponing availability of rseq for widely deployed applications for a few more years ? I can very well see end users upgrading their kernel and using an early-adoption library to use rseq today, but requiring to upgrade the entire toolchain will likely postpone adoption to many years from now. It would be good to start getting feedback from rseq users so we can progress on the system call feature development. Unfortunately everything has been in a stand-still for the past years due to lack of rseq registration coordination in user-space. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
On Wed, Jul 15, 2020 at 01:38:51PM +0200, Christian Brauner wrote: > On Mon, Jul 13, 2020 at 11:03:46PM -0400, Mathieu Desnoyers wrote: > > Add a __rseq_abi.flags "RSEQ_TLS_FLAG_SIZE", which indicates support for > > extending struct rseq. This adds two new fields to struct rseq: > > user_size and kernel_size. > > > > The user_size field allows the size of the __rseq_abi definition (which > > can be overridden by symbol interposition either by a preloaded library > > or by the application) to be handed over to the kernel at registration. > > This registration can be performed by a library, e.g. glibc, which does > > not know there is interposition taking place. > > > > The kernel_size is populated by the kernel when the "RSEQ_TLS_FLAG_SIZE" > > flag is set in __rseq_abi.flags to the minimum between user_size and > > the offset of the "end" field of struct rseq as known by the kernel. > > This allows user-space to query which fields are effectively populated > > by the kernel. > > > > A rseq_size field is added to the task struct to keep track of the > > "kernel_size" effective for each thread. > > > > Signed-off-by: Mathieu Desnoyers > > --- > > include/linux/sched.h | 4 > > include/uapi/linux/rseq.h | 37 -- > > kernel/rseq.c | 42 +-- > > 3 files changed, 75 insertions(+), 8 deletions(-) > > > > diff --git a/include/linux/sched.h b/include/linux/sched.h > > index 692e327d7455..5d61a3197987 100644 > > --- a/include/linux/sched.h > > +++ b/include/linux/sched.h > > @@ -1147,6 +1147,7 @@ struct task_struct { > > #ifdef CONFIG_RSEQ > > struct rseq __user *rseq; > > u32 rseq_sig; > > + u32 rseq_size; > > /* > > * RmW on rseq_event_mask must be performed atomically > > * with respect to preemption. > > @@ -1976,10 +1977,12 @@ static inline void rseq_fork(struct task_struct *t, > > unsigned long clone_flags) > > if (clone_flags & CLONE_VM) { > > t->rseq = NULL; > > t->rseq_sig = 0; > > + t->rseq_size = 0; > > t->rseq_event_mask = 0; > > } else { > > t->rseq = current->rseq; > > t->rseq_sig = current->rseq_sig; > > + t->rseq_size = current->rseq_size; > > t->rseq_event_mask = current->rseq_event_mask; > > } > > } > > @@ -1988,6 +1991,7 @@ static inline void rseq_execve(struct task_struct *t) > > { > > t->rseq = NULL; > > t->rseq_sig = 0; > > + t->rseq_size = 0; > > t->rseq_event_mask = 0; > > } > > > > diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h > > index e11d9df5e564..03c0b5e9a859 100644 > > --- a/include/uapi/linux/rseq.h > > +++ b/include/uapi/linux/rseq.h > > @@ -37,6 +37,15 @@ enum rseq_cs_flags { > > (1U << RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT), > > }; > > > > +enum rseq_tls_flags_bit { > > + /* enum rseq_cs_flags reserves bits 0-2. */ > > + RSEQ_TLS_FLAG_SIZE_BIT = 3, > > +}; > > + > > +enum rseq_tls_flags { > > + RSEQ_TLS_FLAG_SIZE = (1U << RSEQ_TLS_FLAG_SIZE_BIT), > > +}; > > + > > /* The rseq_len expected by rseq registration is always 32 bytes. */ > > enum rseq_len_expected { > > RSEQ_LEN_EXPECTED = 32, > > @@ -133,8 +142,9 @@ struct rseq { > > * > > * This field should only be updated by the thread which > > * registered this data structure. Read by the kernel. > > -* Mainly used for single-stepping through rseq critical sections > > -* with debuggers. > > +* > > +* The RSEQ_CS flags are mainly used for single-stepping through rseq > > +* critical sections with debuggers. > > * > > * - RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT > > * Inhibit instruction sequence block restart on preemption > > @@ -145,8 +155,31 @@ struct rseq { > > * - RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE > > * Inhibit instruction sequence block restart on migration for > > * this thread. > > +* > > +* - RSEQ_TLS_FLAG_SIZE > > +* Extensible struct rseq ABI. This flag should be statically > > +* initialized. > > */ > > __u32 flags; > > + /* > > +* With __rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, user_size should be > > +* statically initialized to offsetof(struct rseq, end). > > +*/ > > + __u16 user_size; > > + /* > > +* With __rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, if the kernel supports > > +* extensible struct rseq ABI, the kernel_size field is populated by > > +* the kernel to the minimum between user_size and the offset of the > > +* "end" field within the struct rseq supported by the kernel on > > +* successful registration. Should be initialized to 0. > > +*/ > > + __u16 kernel_size; > > (Btw, this what I suggested - minus the user_size part - when I said > "expose the size of struct rseq" the kernel knows about. The approach > here is of course more general.) > > It's pretty uncommon to use
Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
On Mon, Jul 13, 2020 at 11:03:46PM -0400, Mathieu Desnoyers wrote: > Add a __rseq_abi.flags "RSEQ_TLS_FLAG_SIZE", which indicates support for > extending struct rseq. This adds two new fields to struct rseq: > user_size and kernel_size. > > The user_size field allows the size of the __rseq_abi definition (which > can be overridden by symbol interposition either by a preloaded library > or by the application) to be handed over to the kernel at registration. > This registration can be performed by a library, e.g. glibc, which does > not know there is interposition taking place. > > The kernel_size is populated by the kernel when the "RSEQ_TLS_FLAG_SIZE" > flag is set in __rseq_abi.flags to the minimum between user_size and > the offset of the "end" field of struct rseq as known by the kernel. > This allows user-space to query which fields are effectively populated > by the kernel. > > A rseq_size field is added to the task struct to keep track of the > "kernel_size" effective for each thread. > > Signed-off-by: Mathieu Desnoyers > --- > include/linux/sched.h | 4 > include/uapi/linux/rseq.h | 37 -- > kernel/rseq.c | 42 +-- > 3 files changed, 75 insertions(+), 8 deletions(-) > > diff --git a/include/linux/sched.h b/include/linux/sched.h > index 692e327d7455..5d61a3197987 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -1147,6 +1147,7 @@ struct task_struct { > #ifdef CONFIG_RSEQ > struct rseq __user *rseq; > u32 rseq_sig; > + u32 rseq_size; > /* >* RmW on rseq_event_mask must be performed atomically >* with respect to preemption. > @@ -1976,10 +1977,12 @@ static inline void rseq_fork(struct task_struct *t, > unsigned long clone_flags) > if (clone_flags & CLONE_VM) { > t->rseq = NULL; > t->rseq_sig = 0; > + t->rseq_size = 0; > t->rseq_event_mask = 0; > } else { > t->rseq = current->rseq; > t->rseq_sig = current->rseq_sig; > + t->rseq_size = current->rseq_size; > t->rseq_event_mask = current->rseq_event_mask; > } > } > @@ -1988,6 +1991,7 @@ static inline void rseq_execve(struct task_struct *t) > { > t->rseq = NULL; > t->rseq_sig = 0; > + t->rseq_size = 0; > t->rseq_event_mask = 0; > } > > diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h > index e11d9df5e564..03c0b5e9a859 100644 > --- a/include/uapi/linux/rseq.h > +++ b/include/uapi/linux/rseq.h > @@ -37,6 +37,15 @@ enum rseq_cs_flags { > (1U << RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT), > }; > > +enum rseq_tls_flags_bit { > + /* enum rseq_cs_flags reserves bits 0-2. */ > + RSEQ_TLS_FLAG_SIZE_BIT = 3, > +}; > + > +enum rseq_tls_flags { > + RSEQ_TLS_FLAG_SIZE = (1U << RSEQ_TLS_FLAG_SIZE_BIT), > +}; > + > /* The rseq_len expected by rseq registration is always 32 bytes. */ > enum rseq_len_expected { > RSEQ_LEN_EXPECTED = 32, > @@ -133,8 +142,9 @@ struct rseq { >* >* This field should only be updated by the thread which >* registered this data structure. Read by the kernel. > - * Mainly used for single-stepping through rseq critical sections > - * with debuggers. > + * > + * The RSEQ_CS flags are mainly used for single-stepping through rseq > + * critical sections with debuggers. >* >* - RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT >* Inhibit instruction sequence block restart on preemption > @@ -145,8 +155,31 @@ struct rseq { >* - RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE >* Inhibit instruction sequence block restart on migration for >* this thread. > + * > + * - RSEQ_TLS_FLAG_SIZE > + * Extensible struct rseq ABI. This flag should be statically > + * initialized. >*/ > __u32 flags; > + /* > + * With __rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, user_size should be > + * statically initialized to offsetof(struct rseq, end). > + */ > + __u16 user_size; > + /* > + * With __rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, if the kernel supports > + * extensible struct rseq ABI, the kernel_size field is populated by > + * the kernel to the minimum between user_size and the offset of the > + * "end" field within the struct rseq supported by the kernel on > + * successful registration. Should be initialized to 0. > + */ > + __u16 kernel_size; (Btw, this what I suggested - minus the user_size part - when I said "expose the size of struct rseq" the kernel knows about. The approach here is of course more general.) It's pretty uncommon to use __u16 for sizes at least in public facing structs. I'd suggest to use __u32 user_size and __u32 kernel_size and if needed, insert padding. Seems you have done this in your union above already.
Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
On Wed, Jul 15, 2020 at 08:31:05AM +0200, Florian Weimer wrote: > * Chris Kennelly: > > > When glibc provides registration, is the anticipated use case that a > > library would unregister and reregister each thread to "upgrade" it to > > the most modern version of interface it knows about provided by the > > kernel? > > Absolutely not, that is likely to break other consumers because an > expected rseq area becomes dormant instead. > > > There, I could assume an all-or-nothing registration of the new > > feature--limited only by kernel availability for thread > > homogeneity--but inconsistencies across early adopter libraries would > > mean each thread would have to examine its own TLS to determine if a > > feature were available. Fwiw, I pointed this out in the discussions that led up to this patchset. I don't see how this can work if threads don't check for their feature set. > > Exactly. Certain uses of seccomp can also have this effect, > presenting a non-homogeneous view. Good point. There might be threads with a seccomp filter that would block rseq features is what you mean, I assume. Christian
Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
* Chris Kennelly: > When glibc provides registration, is the anticipated use case that a > library would unregister and reregister each thread to "upgrade" it to > the most modern version of interface it knows about provided by the > kernel? Absolutely not, that is likely to break other consumers because an expected rseq area becomes dormant instead. > There, I could assume an all-or-nothing registration of the new > feature--limited only by kernel availability for thread > homogeneity--but inconsistencies across early adopter libraries would > mean each thread would have to examine its own TLS to determine if a > feature were available. Exactly. Certain uses of seccomp can also have this effect, presenting a non-homogeneous view.
Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
On Tue, Jul 14, 2020 at 2:33 PM Peter Oskolkov wrote: > > On Tue, Jul 14, 2020 at 10:43 AM Mathieu Desnoyers > wrote: > > > > - On Jul 14, 2020, at 1:24 PM, Peter Oskolkov p...@posk.io wrote: > > > > > At Google, we actually extended struct rseq (I will post the patches > > > here once they are fully deployed and we have specific > > > benefits/improvements to report). We did this by adding several fields > > > below __u32 flags (the last field currently), and correspondingly > > > increasing rseq_len in rseq() syscall. If the kernel does not know of > > > this extension, it will return -EINVAL due to an unexpected rseq_len; > > > then the application can either fall-back to the standard/upstream > > > rseq, or bail. If the kernel does know of this extension, it accepts > > > it. If the application passes the old rseq_len (32), the kernel knows > > > that this is an old application and treats it as such. > > > > > > I looked through the archives, but I did not find specifically why the > > > pretty standard approach described above is considered inferior to the > > > one taken in this patch (freeze rseq_len at 32, add additional length > > > fields to struct rseq). Can these be summarized? > > > > I think you don't face the issues I'm facing with libc rseq integration > > because you control the entire user-space software ecosystem at Google. > > > > The main issue we face is that the library responsible for registering > > rseq (either glibc 2.32+, an early-adopter librseq library, or the > > application) may very well not be the same library defining the __rseq_abi > > symbol used in the global symbol table. Interposition with ld preload or > > by defining the __rseq_abi in the program's executable are good examples > > of this kind of scenario, and those use-cases are supported. Does this work if/when we run out of bytes in the current sizeof(__rseq_abi)? Which library provides the TLS symbol (and N bytes of storage) seems sensitive to the choices the linker makes for us, once the symbol sizes diverge. > > So the size of the __rseq_abi structure may be larger than the struct > > rseq known by glibc (and eventually smaller, if future glibc versions > > extend their __rseq_abi size but is loaded with an older program/library > > doing __rseq_abi interposition). When glibc provides registration, is the anticipated use case that a library would unregister and reregister each thread to "upgrade" it to the most modern version of interface it knows about provided by the kernel? > > So we need some way to allow code defining the __rseq_abi to let the kernel > > know how much room is available, without necessarily requiring the code > > responsible for rseq registration to be aware of that extended layout. > > This is the purpose of the __rseq_abi.flags RSEQ_FLAG_TLS_SIZE and field > > __rseq_abi.user_size. > > > > And we need some way to allow the kernel to let user-space rseq critical > > sections (user code) know how much of those fields are actually populated > > by the kernel. This is the purpose of __rseq_abi.flags RSEQ_FLAG_TLS_SIZE > > with __rseq_abi.kernel_size. I authored the userspace component (https://github.com/google/tcmalloc/commit/ad136d45f75a273b934446699cef8b278c34ec6e) that consumes the extensions Peter mentions and found that minimizing the performance impact of their potential absence was a bit of a challenge. There, I could assume an all-or-nothing registration of the new feature--limited only by kernel availability for thread homogeneity--but inconsistencies across early adopter libraries would mean each thread would have to examine its own TLS to determine if a feature were available. Chris
Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
On 7/14/20 9:19 AM, Mathieu Desnoyers wrote: > Is there an arch-agnostic way to get the thread pointer from user-space code > ? That > would be needed by all rseq critical section implementations. Yes, and no. We have void *__builtin_thread_pointer (void), but few architectures implement the builtin so we'd have to go through a round of compiler updates and backports. All targets know how to access the thread pointer because the compiler has to generate IE-mode accesses to the TLS variables. I have filed an enhancement request: Bug 96200 - Implement __builtin_thread_pointer() and __builtin_set_thread_pointer() if TLS is supported https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96200 We have glibc internal macro APIs to access the thread pointer, but I would rather the compiler handle the access since it can schedule the resulting sequence better. On some arches setting the therad pointer needs a syscall or equivalent operation (hppa), and for some arches there is no fixed register (arm) hence the need for __builtin_thread_pointer() to force the compiler to place the pointer into a register for function return. -- Cheers, Carlos.
Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
On Tue, Jul 14, 2020 at 10:43 AM Mathieu Desnoyers wrote: > > - On Jul 14, 2020, at 1:24 PM, Peter Oskolkov p...@posk.io wrote: > > > At Google, we actually extended struct rseq (I will post the patches > > here once they are fully deployed and we have specific > > benefits/improvements to report). We did this by adding several fields > > below __u32 flags (the last field currently), and correspondingly > > increasing rseq_len in rseq() syscall. If the kernel does not know of > > this extension, it will return -EINVAL due to an unexpected rseq_len; > > then the application can either fall-back to the standard/upstream > > rseq, or bail. If the kernel does know of this extension, it accepts > > it. If the application passes the old rseq_len (32), the kernel knows > > that this is an old application and treats it as such. > > > > I looked through the archives, but I did not find specifically why the > > pretty standard approach described above is considered inferior to the > > one taken in this patch (freeze rseq_len at 32, add additional length > > fields to struct rseq). Can these be summarized? > > I think you don't face the issues I'm facing with libc rseq integration > because you control the entire user-space software ecosystem at Google. > > The main issue we face is that the library responsible for registering > rseq (either glibc 2.32+, an early-adopter librseq library, or the > application) may very well not be the same library defining the __rseq_abi > symbol used in the global symbol table. Interposition with ld preload or > by defining the __rseq_abi in the program's executable are good examples > of this kind of scenario, and those use-cases are supported. > > So the size of the __rseq_abi structure may be larger than the struct > rseq known by glibc (and eventually smaller, if future glibc versions > extend their __rseq_abi size but is loaded with an older program/library > doing __rseq_abi interposition). > > So we need some way to allow code defining the __rseq_abi to let the kernel > know how much room is available, without necessarily requiring the code > responsible for rseq registration to be aware of that extended layout. > This is the purpose of the __rseq_abi.flags RSEQ_FLAG_TLS_SIZE and field > __rseq_abi.user_size. > > And we need some way to allow the kernel to let user-space rseq critical > sections (user code) know how much of those fields are actually populated > by the kernel. This is the purpose of __rseq_abi.flags RSEQ_FLAG_TLS_SIZE > with __rseq_abi.kernel_size. Thanks, Mathieu, for the explanation. Yes, multiple unrelated libraries having to share struct rseq complicates matters. Your approach appears to be a way to reconcile the issues you outlined above. Thanks, Peter > > Thanks, > > Mathieu > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com
Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
- On Jul 14, 2020, at 1:24 PM, Peter Oskolkov p...@posk.io wrote: > At Google, we actually extended struct rseq (I will post the patches > here once they are fully deployed and we have specific > benefits/improvements to report). We did this by adding several fields > below __u32 flags (the last field currently), and correspondingly > increasing rseq_len in rseq() syscall. If the kernel does not know of > this extension, it will return -EINVAL due to an unexpected rseq_len; > then the application can either fall-back to the standard/upstream > rseq, or bail. If the kernel does know of this extension, it accepts > it. If the application passes the old rseq_len (32), the kernel knows > that this is an old application and treats it as such. > > I looked through the archives, but I did not find specifically why the > pretty standard approach described above is considered inferior to the > one taken in this patch (freeze rseq_len at 32, add additional length > fields to struct rseq). Can these be summarized? I think you don't face the issues I'm facing with libc rseq integration because you control the entire user-space software ecosystem at Google. The main issue we face is that the library responsible for registering rseq (either glibc 2.32+, an early-adopter librseq library, or the application) may very well not be the same library defining the __rseq_abi symbol used in the global symbol table. Interposition with ld preload or by defining the __rseq_abi in the program's executable are good examples of this kind of scenario, and those use-cases are supported. So the size of the __rseq_abi structure may be larger than the struct rseq known by glibc (and eventually smaller, if future glibc versions extend their __rseq_abi size but is loaded with an older program/library doing __rseq_abi interposition). So we need some way to allow code defining the __rseq_abi to let the kernel know how much room is available, without necessarily requiring the code responsible for rseq registration to be aware of that extended layout. This is the purpose of the __rseq_abi.flags RSEQ_FLAG_TLS_SIZE and field __rseq_abi.user_size. And we need some way to allow the kernel to let user-space rseq critical sections (user code) know how much of those fields are actually populated by the kernel. This is the purpose of __rseq_abi.flags RSEQ_FLAG_TLS_SIZE with __rseq_abi.kernel_size. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
At Google, we actually extended struct rseq (I will post the patches here once they are fully deployed and we have specific benefits/improvements to report). We did this by adding several fields below __u32 flags (the last field currently), and correspondingly increasing rseq_len in rseq() syscall. If the kernel does not know of this extension, it will return -EINVAL due to an unexpected rseq_len; then the application can either fall-back to the standard/upstream rseq, or bail. If the kernel does know of this extension, it accepts it. If the application passes the old rseq_len (32), the kernel knows that this is an old application and treats it as such. I looked through the archives, but I did not find specifically why the pretty standard approach described above is considered inferior to the one taken in this patch (freeze rseq_len at 32, add additional length fields to struct rseq). Can these be summarized? Thanks, Peter On Mon, Jul 13, 2020 at 8:04 PM Mathieu Desnoyers wrote: > > Add a __rseq_abi.flags "RSEQ_TLS_FLAG_SIZE", which indicates support for > extending struct rseq. This adds two new fields to struct rseq: > user_size and kernel_size. > > The user_size field allows the size of the __rseq_abi definition (which > can be overridden by symbol interposition either by a preloaded library > or by the application) to be handed over to the kernel at registration. > This registration can be performed by a library, e.g. glibc, which does > not know there is interposition taking place. > > The kernel_size is populated by the kernel when the "RSEQ_TLS_FLAG_SIZE" > flag is set in __rseq_abi.flags to the minimum between user_size and > the offset of the "end" field of struct rseq as known by the kernel. > This allows user-space to query which fields are effectively populated > by the kernel. > > A rseq_size field is added to the task struct to keep track of the > "kernel_size" effective for each thread. > > Signed-off-by: Mathieu Desnoyers > --- > include/linux/sched.h | 4 > include/uapi/linux/rseq.h | 37 -- > kernel/rseq.c | 42 +-- > 3 files changed, 75 insertions(+), 8 deletions(-) > > diff --git a/include/linux/sched.h b/include/linux/sched.h > index 692e327d7455..5d61a3197987 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -1147,6 +1147,7 @@ struct task_struct { > #ifdef CONFIG_RSEQ > struct rseq __user *rseq; > u32 rseq_sig; > + u32 rseq_size; > /* > * RmW on rseq_event_mask must be performed atomically > * with respect to preemption. > @@ -1976,10 +1977,12 @@ static inline void rseq_fork(struct task_struct *t, > unsigned long clone_flags) > if (clone_flags & CLONE_VM) { > t->rseq = NULL; > t->rseq_sig = 0; > + t->rseq_size = 0; > t->rseq_event_mask = 0; > } else { > t->rseq = current->rseq; > t->rseq_sig = current->rseq_sig; > + t->rseq_size = current->rseq_size; > t->rseq_event_mask = current->rseq_event_mask; > } > } > @@ -1988,6 +1991,7 @@ static inline void rseq_execve(struct task_struct *t) > { > t->rseq = NULL; > t->rseq_sig = 0; > + t->rseq_size = 0; > t->rseq_event_mask = 0; > } > > diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h > index e11d9df5e564..03c0b5e9a859 100644 > --- a/include/uapi/linux/rseq.h > +++ b/include/uapi/linux/rseq.h > @@ -37,6 +37,15 @@ enum rseq_cs_flags { > (1U << RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT), > }; > > +enum rseq_tls_flags_bit { > + /* enum rseq_cs_flags reserves bits 0-2. */ > + RSEQ_TLS_FLAG_SIZE_BIT = 3, > +}; > + > +enum rseq_tls_flags { > + RSEQ_TLS_FLAG_SIZE = (1U << RSEQ_TLS_FLAG_SIZE_BIT), > +}; > + > /* The rseq_len expected by rseq registration is always 32 bytes. */ > enum rseq_len_expected { > RSEQ_LEN_EXPECTED = 32, > @@ -133,8 +142,9 @@ struct rseq { > * > * This field should only be updated by the thread which > * registered this data structure. Read by the kernel. > -* Mainly used for single-stepping through rseq critical sections > -* with debuggers. > +* > +* The RSEQ_CS flags are mainly used for single-stepping through rseq > +* critical sections with debuggers. > * > * - RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT > * Inhibit instruction sequence block restart on preemption > @@ -145,8 +155,31 @@ struct rseq { > * - RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE > * Inhibit instruction sequence block restart on migration for > * this thread. > +* > +* - RSEQ_TLS_FLAG_SIZE > +* Extensible struct rseq ABI. This flag should be statically > +* initialized. > */ >
Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
- On Jul 14, 2020, at 9:00 AM, Florian Weimer fwei...@redhat.com wrote: > * Mathieu Desnoyers: > >>> How are extensions going to affect the definition of struct rseq, >>> including its alignment? >> >> The alignment will never decrease. If the structure becomes large enough >> its alignment could theoretically increase. Would that be an issue ? > > Telling the compiler that struct is larger than it actually is, or that > it has more alignment than in memory, results in undefined behavior, > even if only fields are accessed in the smaller struct region. > > An increase in alignment from 32 to 64 is perhaps not likely to have > this effect. But the undefined behavior is still there, and has been > observed for mismatches like 8 vs 16. Good points. > >>> As things stand now, glibc 2.32 will make the size and alignment of >>> struct rseq part of its ABI, so it can't really change after that. >> >> Can the size and alignment of a structure be defined as minimum alignment >> and size values ? For instance, those would be invariant for a given glibc >> version (if we always use the internal struct rseq declaration), but could >> be increased in future versions. > > Not if we are talking about a global (TLS) data symbol. No such changes > are possible there. We have some workarounds for symbols that live > exclusively within glibc, but they don't work if there are libraries out > there which interpose the symbol. OK > >>> With a different approach, we can avoid making the symbol size part of >>> the ABI, but then we cannot use the __rseq_abi TLS symbol. As a result, >>> interoperability with early adopters would be lost. >> >> Do you mean with a function "getter", and then keeping that pointer around >> in a per-user TLS ? I would prefer to avoid that because it adds an extra >> pointer dereference on a fast path. > > My choice would have been a function that returns the offset from the > thread pointer (which has to be unchanged regarding all threads). So AFAIU we would have glibc expose a symbol, e.g.: off_t rseq_tls_offset(void); Which would be typically called by user libraries and applications at initialization to get the offset of the struct rseq. They should store it in a static variable so rseq critical sections can use that offset. Is there an arch-agnostic way to get the thread pointer from user-space code ? That would be needed by all rseq critical section implementations. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
* Mathieu Desnoyers: >> How are extensions going to affect the definition of struct rseq, >> including its alignment? > > The alignment will never decrease. If the structure becomes large enough > its alignment could theoretically increase. Would that be an issue ? Telling the compiler that struct is larger than it actually is, or that it has more alignment than in memory, results in undefined behavior, even if only fields are accessed in the smaller struct region. An increase in alignment from 32 to 64 is perhaps not likely to have this effect. But the undefined behavior is still there, and has been observed for mismatches like 8 vs 16. >> As things stand now, glibc 2.32 will make the size and alignment of >> struct rseq part of its ABI, so it can't really change after that. > > Can the size and alignment of a structure be defined as minimum alignment > and size values ? For instance, those would be invariant for a given glibc > version (if we always use the internal struct rseq declaration), but could > be increased in future versions. Not if we are talking about a global (TLS) data symbol. No such changes are possible there. We have some workarounds for symbols that live exclusively within glibc, but they don't work if there are libraries out there which interpose the symbol. >> With a different approach, we can avoid making the symbol size part of >> the ABI, but then we cannot use the __rseq_abi TLS symbol. As a result, >> interoperability with early adopters would be lost. > > Do you mean with a function "getter", and then keeping that pointer around > in a per-user TLS ? I would prefer to avoid that because it adds an extra > pointer dereference on a fast path. My choice would have been a function that returns the offset from the thread pointer (which has to be unchanged regarding all threads). Thanks, Florian
Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
- On Jul 14, 2020, at 5:58 AM, Florian Weimer fwei...@redhat.com wrote: > * Mathieu Desnoyers: > >> +/* >> + * Very last field of the structure, to calculate size excluding padding >> + * with offsetof(). >> + */ >> +char end[]; >> } __attribute__((aligned(4 * sizeof(__u64; > > This makes the header incompatible with standard C++. One alternative would be to add a helper to compute the effective size on c++, e.g.: /* Always updated with struct rseq_cs declaration. */ #define rseq_last_field kernel_size static inline size_t rseq_effective_size(void) { return offsetof(struct rseq, rseq_last_field) + sizeof(((struct rseq *)NULL)->rseq_last_field); } > > How are extensions going to affect the definition of struct rseq, > including its alignment? The alignment will never decrease. If the structure becomes large enough its alignment could theoretically increase. Would that be an issue ? > As things stand now, glibc 2.32 will make the size and alignment of > struct rseq part of its ABI, so it can't really change after that. Can the size and alignment of a structure be defined as minimum alignment and size values ? For instance, those would be invariant for a given glibc version (if we always use the internal struct rseq declaration), but could be increased in future versions. > With a different approach, we can avoid making the symbol size part of > the ABI, but then we cannot use the __rseq_abi TLS symbol. As a result, > interoperability with early adopters would be lost. Do you mean with a function "getter", and then keeping that pointer around in a per-user TLS ? I would prefer to avoid that because it adds an extra pointer dereference on a fast path. > One way to avoid this problem would be for every library to register its > own rseq area, of the appropriate size. Then process-wide coordination > in userspace would not be needed. I did propose the code to do just that in my initial rseq implementations, but the idea was shutdown by kernel maintainers because it required the kernel to handle a linked-list of rseq areas per thread, which was more complex within the kernel. Thanks, Mathieu > > Thanks, > Florian -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
* Mathieu Desnoyers: > + /* > + * Very last field of the structure, to calculate size excluding padding > + * with offsetof(). > + */ > + char end[]; > } __attribute__((aligned(4 * sizeof(__u64; This makes the header incompatible with standard C++. How are extensions going to affect the definition of struct rseq, including its alignment? As things stand now, glibc 2.32 will make the size and alignment of struct rseq part of its ABI, so it can't really change after that. With a different approach, we can avoid making the symbol size part of the ABI, but then we cannot use the __rseq_abi TLS symbol. As a result, interoperability with early adopters would be lost. One way to avoid this problem would be for every library to register its own rseq area, of the appropriate size. Then process-wide coordination in userspace would not be needed. Thanks, Florian
[RFC PATCH 2/4] rseq: Allow extending struct rseq
Add a __rseq_abi.flags "RSEQ_TLS_FLAG_SIZE", which indicates support for extending struct rseq. This adds two new fields to struct rseq: user_size and kernel_size. The user_size field allows the size of the __rseq_abi definition (which can be overridden by symbol interposition either by a preloaded library or by the application) to be handed over to the kernel at registration. This registration can be performed by a library, e.g. glibc, which does not know there is interposition taking place. The kernel_size is populated by the kernel when the "RSEQ_TLS_FLAG_SIZE" flag is set in __rseq_abi.flags to the minimum between user_size and the offset of the "end" field of struct rseq as known by the kernel. This allows user-space to query which fields are effectively populated by the kernel. A rseq_size field is added to the task struct to keep track of the "kernel_size" effective for each thread. Signed-off-by: Mathieu Desnoyers --- include/linux/sched.h | 4 include/uapi/linux/rseq.h | 37 -- kernel/rseq.c | 42 +-- 3 files changed, 75 insertions(+), 8 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 692e327d7455..5d61a3197987 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1147,6 +1147,7 @@ struct task_struct { #ifdef CONFIG_RSEQ struct rseq __user *rseq; u32 rseq_sig; + u32 rseq_size; /* * RmW on rseq_event_mask must be performed atomically * with respect to preemption. @@ -1976,10 +1977,12 @@ static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags) if (clone_flags & CLONE_VM) { t->rseq = NULL; t->rseq_sig = 0; + t->rseq_size = 0; t->rseq_event_mask = 0; } else { t->rseq = current->rseq; t->rseq_sig = current->rseq_sig; + t->rseq_size = current->rseq_size; t->rseq_event_mask = current->rseq_event_mask; } } @@ -1988,6 +1991,7 @@ static inline void rseq_execve(struct task_struct *t) { t->rseq = NULL; t->rseq_sig = 0; + t->rseq_size = 0; t->rseq_event_mask = 0; } diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h index e11d9df5e564..03c0b5e9a859 100644 --- a/include/uapi/linux/rseq.h +++ b/include/uapi/linux/rseq.h @@ -37,6 +37,15 @@ enum rseq_cs_flags { (1U << RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT), }; +enum rseq_tls_flags_bit { + /* enum rseq_cs_flags reserves bits 0-2. */ + RSEQ_TLS_FLAG_SIZE_BIT = 3, +}; + +enum rseq_tls_flags { + RSEQ_TLS_FLAG_SIZE = (1U << RSEQ_TLS_FLAG_SIZE_BIT), +}; + /* The rseq_len expected by rseq registration is always 32 bytes. */ enum rseq_len_expected { RSEQ_LEN_EXPECTED = 32, @@ -133,8 +142,9 @@ struct rseq { * * This field should only be updated by the thread which * registered this data structure. Read by the kernel. -* Mainly used for single-stepping through rseq critical sections -* with debuggers. +* +* The RSEQ_CS flags are mainly used for single-stepping through rseq +* critical sections with debuggers. * * - RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT * Inhibit instruction sequence block restart on preemption @@ -145,8 +155,31 @@ struct rseq { * - RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE * Inhibit instruction sequence block restart on migration for * this thread. +* +* - RSEQ_TLS_FLAG_SIZE +* Extensible struct rseq ABI. This flag should be statically +* initialized. */ __u32 flags; + /* +* With __rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, user_size should be +* statically initialized to offsetof(struct rseq, end). +*/ + __u16 user_size; + /* +* With __rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, if the kernel supports +* extensible struct rseq ABI, the kernel_size field is populated by +* the kernel to the minimum between user_size and the offset of the +* "end" field within the struct rseq supported by the kernel on +* successful registration. Should be initialized to 0. +*/ + __u16 kernel_size; + + /* +* Very last field of the structure, to calculate size excluding padding +* with offsetof(). +*/ + char end[]; } __attribute__((aligned(4 * sizeof(__u64; #endif /* _UAPI_LINUX_RSEQ_H */ diff --git a/kernel/rseq.c b/kernel/rseq.c index a4f86a9d6937..bbc57fc18573 100644 --- a/kernel/rseq.c +++ b/kernel/rseq.c @@ -96,6 +96,7 @@ static int rseq_update_cpu_id(struct task_struct *t) static int rseq_reset_rseq_cpu_id(struct task_struct *t) { u32 cpu_id_start = 0, cpu_id = RSEQ_CPU_ID_UNINITIALIZED; + u16