Re: [RFC PATCH 0/4] rseq: Introduce extensible struct rseq

2020-07-16 Thread Mathieu Desnoyers
On Jul 16, 2020, at 9:39 AM, carlos car...@redhat.com wrote:

> On 7/15/20 9:02 AM, Mathieu Desnoyers wrote:
>> At this point, the main question I would like answered is whether
>> it would be acceptable to increase the size and alignment of
>> the __rseq_abi symbol (which will be exposed by glibc) between
>> e.g. glibc 2.32 and 2.33. If it's not possible, then we can
>> find other solutions, for instance using an indirection with
>> a pointer to an extended structure, but this appears to be
>> slightly less efficient.
> 
> The answer is always a soft "maybe" because it depends exactly
> on how we do it and what consequences we are willing to accept
> in the design.
> 
> For example, static applications that call dlopen will fail if
> we increase the alignment beyond 32 because we had to special
> case this scenario. Why did we have to special case it? Because
> the "static" part of the runtime needs to create the initial
> thread's static TLS space, and since it doesn't know a priori
> what will be loaded in the shared library, it needs to make a
> "best guess" at the alignment requirement at startup.
> We need to discuss this and agree that it's OK. We already want
> to deprecate dynamic loading from static applications, so this
> may not be a problem in general, but I hope you see my point:
> there are corner cases to be considered and ironed out.

Note that I don't foresee that we will explicitly need to increase
the alignment value for __rseq_abi beyond 32; I was merely
asking this for the sake of completeness, in case extending struct rseq
beyond a certain limit ever happens to increase the minimum
alignment.

> 
> I want to see a detailed design document explaining the various
> compatibility issues and how we solve them along with the way
> the extension mechanism would work and how it would be compliant
> with C/C++ language rules in userspace without adding undue burden
> of potentially having to use atomic instructions all the time.
> This includes discussing how the headers change. We should also
> talk out the options for symbol versioning and their consequences.
>  
> I haven't seen enough details, and there isn't really enough
> time to discuss this. I think it is *great* that we are discussing
> it, but it's safest if we revert rseq, finish the discussion,
> and then finalize the inclusion for 2.33 with these details
> ironed out.

Yes, absolutely.

> 
> I feel like we've made all the technical progress we need to actually
> include rseq in glibc, but this discussion, and the Google example
> (even if it doesn't match our use case) shows that spending another
> month hammering out the extension details could yield something we
> can use for years to come while we work out other details e.g. cpu_opv.

Indeed. Note that the current approach proposed to replace cpu_opv
is "sched_pair_cpu", ref. 
https://lore.kernel.org/lkml/20200619202516.7109-1-mathieu.desnoy...@efficios.com/

> I can set aside time in the next month to write up such a document
> and discuss these issues with you and Florian. The text would form
> even more of the language we'd have to include in the man page for
> the feature.

I'll do my best to secure some time to work with you on this in the
next month, but I will really have to focus on other projects which
I had to delay to make sure the rseq integration was ready for glibc
2.32.

> In the meantime I think we should revert rseq in glibc and take
> our time to hash this out without the looming deadline of August 1st
> for the ABI going out the door.
> 
> I know this is disappointing, but I think in a month you'll look
> back at this, we'll have Fedora Rawhide using the new extensible
> version (and you'll be able to point people at that), and we'll
> only be 5 months away from an official release with extensible
> rseq.

If this delay gives us a future-proof extensible rseq ABI, I'm absolutely
for it!

> Could you please respond to Florian's request to revert here?
> https://sourceware.org/pipermail/libc-alpha/2020-July/116368.html
> 
> I'm looking for a Signed-off-by from you that you're OK with
> reverting.

Will do, thanks!

Mathieu


> 
> --
> Cheers,
> Carlos.

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [RFC PATCH 0/4] rseq: Introduce extensible struct rseq

2020-07-16 Thread Carlos O'Donell
On 7/15/20 9:02 AM, Mathieu Desnoyers wrote:
> At this point, the main question I would like answered is whether
> it would be acceptable to increase the size and alignment of
> the __rseq_abi symbol (which will be exposed by glibc) between
> e.g. glibc 2.32 and 2.33. If it's not possible, then we can
> find other solutions, for instance using an indirection with
> a pointer to an extended structure, but this appears to be
> slightly less efficient.

The answer is always a soft "maybe" because it depends exactly
on how we do it and what consequences we are willing to accept
in the design.

For example, static applications that call dlopen will fail if
we increase the alignment beyond 32 because we had to special
case this scenario. Why did we have to special case it? Because
the "static" part of the runtime needs to create the initial
thread's static TLS space, and since it doesn't know a priori
what will be loaded in the shared library, it needs to make a
"best guess" at the alignment requirement at startup.
We need to discuss this and agree that it's OK. We already want
to deprecate dynamic loading from static applications, so this
may not be a problem in general, but I hope you see my point:
there are corner cases to be considered and ironed out.
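
To make the alignment corner case concrete, here is a minimal sketch of
the "best guess" problem. It is not glibc's actual TLS code: the
struct tls_block type, the tls_place() helper and the field names are
all hypothetical, and the limit of 32 is simply the figure mentioned
above. The startup code fixes a maximum alignment for the static TLS
block, and any object that later asks for more cannot be placed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a static TLS block reserved at startup. */
struct tls_block {
	size_t size;      /* bytes reserved at startup */
	size_t max_align; /* alignment guessed at startup for the block */
	size_t used;      /* bytes already handed out */
};

static bool is_power_of_two(size_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Try to place an object of the given size and alignment in the block.
 * Returns true and stores the offset on success. Returns false when the
 * request exceeds the alignment guessed at startup, or when the block
 * has no room left.
 */
static bool tls_place(struct tls_block *b, size_t size, size_t align,
		      size_t *offset)
{
	size_t start;

	if (!is_power_of_two(align) || align > b->max_align)
		return false; /* the startup guess was too small */
	start = (b->used + align - 1) & ~(align - 1);
	if (start > b->size || size > b->size - start)
		return false; /* not enough room left in the block */
	b->used = start + size;
	*offset = start;
	return true;
}
```

With a block created with max_align = 32, a 32-byte object aligned on
32 bytes is placed fine, but the same block rejects an object that asks
for 64-byte alignment, which is the failure mode a later dlopen would
hit.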

I want to see a detailed design document explaining the various
compatibility issues and how we solve them along with the way
the extension mechanism would work and how it would be compliant
with C/C++ language rules in userspace without adding undue burden
of potentially having to use atomic instructions all the time.
This includes discussing how the headers change. We should also
talk out the options for symbol versioning and their consequences.
  
I haven't seen enough details, and there isn't really enough
time to discuss this. I think it is *great* that we are discussing
it, but it's safest if we revert rseq, finish the discussion,
and then finalize the inclusion for 2.33 with these details
ironed out.

I feel like we've made all the technical progress we need to actually
include rseq in glibc, but this discussion, and the Google example
(even if it doesn't match our use case) shows that spending another
month hammering out the extension details could yield something we
can use for years to come while we work out other details e.g. cpu_opv.

I can set aside time in the next month to write up such a document
and discuss these issues with you and Florian. The text would form
even more of the language we'd have to include in the man page for
the feature.

In the meantime I think we should revert rseq in glibc and take
our time to hash this out without the looming deadline of August 1st
for the ABI going out the door.

I know this is disappointing, but I think in a month you'll look
back at this, we'll have Fedora Rawhide using the new extensible
version (and you'll be able to point people at that), and we'll
only be 5 months away from an official release with extensible
rseq.

Could you please respond to Florian's request to revert here?
https://sourceware.org/pipermail/libc-alpha/2020-July/116368.html

I'm looking for a Signed-off-by from you that you're OK with
reverting.

-- 
Cheers,
Carlos.



Re: [RFC PATCH 0/4] rseq: Introduce extensible struct rseq

2020-07-15 Thread Mathieu Desnoyers
On Jul 15, 2020, at 11:12 AM, Florian Weimer fwei...@redhat.com wrote:

> * Carlos O'Donell:
> 
>> On 7/13/20 11:03 PM, Mathieu Desnoyers wrote:
>>> Recent discussion led to a solution for extending struct rseq. This is
>>> an implementation of the proposed solution.
>>> 
>>> Now is a good time to agree on this scheme before the release of glibc
>>> 2.32, just in case there are small details to fix on the user-space
>>> side in order to allow extending struct rseq.
>>
>> Adding extensibility to the rseq registration process would be great,
>> but we are out of time for the glibc 2.32 release.
>>
>> Should we revert rseq for glibc 2.32 and spend quality time discussing
>> the implications of an extensible design, something that Google already
>> says they are doing?
>>
>> We can, with a clear head, and an agreed-upon extension mechanism
>> include rseq in glibc 2.33 (release scheduled for February 1st 2021).
>> We release time boxed every 6 months, no deviation, so you know when
>> your next merge window will be.
>>
>> We have already done the hard work of fixing the nesting signal
>> handler issues, and glibc integration. If we revert today that will
>> also give time for Firefox and Chrome to adjust their sandboxes.
>>
>> Do you wish to go forward with rseq as we have it in glibc 2.32,
>> or do you wish to revert rseq from glibc 2.32, discuss the extension
>> mechanism, and put it back into glibc 2.33 with adjustments?
> 
> I posted the glibc revert:
> 
>  
> 
> I do not think we have any other choice at this point.

This is indeed the safe course of action.

Let's hope the overall interest in rseq will be sufficient to justify
integrating extensibility support in the rseq system call ABI, otherwise we
have a catch-22 situation where everything is stalled again, due to further
progress on rseq kernel features awaiting user feedback on the existing
implementation, which will never come because the integration of coordinated
use across libraries is awaiting further development at the kernel level.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [RFC PATCH 0/4] rseq: Introduce extensible struct rseq

2020-07-15 Thread Florian Weimer
* Carlos O'Donell:

> On 7/13/20 11:03 PM, Mathieu Desnoyers wrote:
>> Recent discussion led to a solution for extending struct rseq. This is
>> an implementation of the proposed solution.
>> 
>> Now is a good time to agree on this scheme before the release of glibc
>> 2.32, just in case there are small details to fix on the user-space
>> side in order to allow extending struct rseq.
>
> Adding extensibility to the rseq registration process would be great,
> but we are out of time for the glibc 2.32 release.
>
> Should we revert rseq for glibc 2.32 and spend quality time discussing
> the implications of an extensible design, something that Google already
> says they are doing?
>
> We can, with a clear head, and an agreed-upon extension mechanism
> include rseq in glibc 2.33 (release scheduled for February 1st 2021).
> We release time boxed every 6 months, no deviation, so you know when
> your next merge window will be.
>
> We have already done the hard work of fixing the nesting signal
> handler issues, and glibc integration. If we revert today that will 
> also give time for Firefox and Chrome to adjust their sandboxes.
>
> Do you wish to go forward with rseq as we have it in glibc 2.32,
> or do you wish to revert rseq from glibc 2.32, discuss the extension
> mechanism, and put it back into glibc 2.33 with adjustments?

I posted the glibc revert:

  

I do not think we have any other choice at this point.

Thanks,
Florian



Re: [RFC PATCH 0/4] rseq: Introduce extensible struct rseq

2020-07-15 Thread Mathieu Desnoyers
On Jul 14, 2020, at 4:55 PM, carlos car...@redhat.com wrote:

> On 7/13/20 11:03 PM, Mathieu Desnoyers wrote:
>> Recent discussion led to a solution for extending struct rseq. This is
>> an implementation of the proposed solution.
>> 
>> Now is a good time to agree on this scheme before the release of glibc
>> 2.32, just in case there are small details to fix on the user-space
>> side in order to allow extending struct rseq.
> 
> Adding extensibility to the rseq registration process would be great,
> but we are out of time for the glibc 2.32 release.

Of course, and my goal is not to add this support for extensibility
before glibc 2.32, but merely to see if we need to change anything in
the way it uses rseq today (before the release) in order to facilitate
extensibility in the future.

> Should we revert rseq for glibc 2.32 and spend quality time discussing
> the implications of an extensible design, something that Google already
> says they are doing?

Google's approach is limited to contexts simpler than scenarios involving
multiple unrelated libraries. Peter Oskolkov stated as a follow-up that my
extension approach would be one way to deal with problems associated
with sharing __rseq_abi between unrelated libraries:

https://lore.kernel.org/lkml/capnvh5ficcjpyelj_ciwzfro4fasvxznhpfkxjhjwjirxdj...@mail.gmail.com/

The fact that Google already have their own rseq extensions internally
confirms that planning for extensibility is needed.

> We can, with a clear head, and an agreed-upon extension mechanism
> include rseq in glibc 2.33 (release scheduled for February 1st 2021).
> We release time boxed every 6 months, no deviation, so you know when
> your next merge window will be.
> 
> We have already done the hard work of fixing the nesting signal
> handler issues, and glibc integration. If we revert today that will
> also give time for Firefox and Chrome to adjust their sandboxes.
> 
> Do you wish to go forward with rseq as we have it in glibc 2.32,
> or do you wish to revert rseq from glibc 2.32, discuss the extension
> mechanism, and put it back into glibc 2.33 with adjustments?

So here we have a catch-22 situation. Linus wants to see how rseq
is being used before accepting additional features (ref.
https://lore.kernel.org/lkml/CAHk-=wjk-2c4xvwjdzc-bs9hbgvy-p7assnkkphggr5qdox...@mail.gmail.com/).
The inability of user-space to make any large-scale use of the rseq
system call in a coordinated fashion blocks wide use of rseq.
This coordination is supposed to be done by glibc, and I told
every user-space project maintainer who contacted me to hold off
using rseq until it is integrated into glibc. "tcmalloc" from Google
is the exception because they do not care about ABI compatibility with
other libraries (they are OK with a breakage and requiring upgrade).

The process I'm going through right now is checking what our
options are for extending rseq starting from the current ABI, just
to see if we are painting ourselves into a corner with the current
glibc integration. However, if we postpone integration of rseq
into glibc because of possible future extensibility features, those
may never happen because of the lack of usage feedback, due to lack
of users, due to lack of coordinated ABI registration.

At this point, the main question I would like answered is whether
it would be acceptable to increase the size and alignment of
the __rseq_abi symbol (which will be exposed by glibc) between
e.g. glibc 2.32 and 2.33. If it's not possible, then we can
find other solutions, for instance using an indirection with
a pointer to an extended structure, but this appears to be
slightly less efficient.
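
To illustrate the two options, here is a rough sketch; it is not the
layout from the patch series or from the uapi header. The type names
and the "hypothetical_" fields are made up for the comparison, and the
aligned attribute assumes GCC or Clang. Growing the structure in place
changes the size of the symbol, whereas the indirection keeps it at 32
bytes but costs one extra pointer load on each access to an extended
field.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Option A (illustrative): grow the structure in place. */
struct rseq_grown {
	uint32_t cpu_id_start;
	uint32_t cpu_id;
	uint64_t rseq_cs;
	uint32_t flags;
	uint32_t hypothetical_node_id;  /* made-up extension field */
	uint64_t hypothetical_extra[4]; /* made-up extension fields */
} __attribute__((aligned(32)));

/* Option B (illustrative): fixed-size structure plus indirection. */
struct rseq_ext {
	uint32_t hypothetical_node_id;
	uint64_t hypothetical_extra[4];
};

struct rseq_indirect {
	uint32_t cpu_id_start;
	uint32_t cpu_id;
	uint64_t rseq_cs;
	uint32_t flags;
	uint32_t padding;
	uint64_t ext_ptr; /* address of a struct rseq_ext, or 0 */
} __attribute__((aligned(32)));

/* The symbol size changes with option A and stays at 32 with option B. */
_Static_assert(sizeof(struct rseq_grown) == 64, "grown size");
_Static_assert(sizeof(struct rseq_indirect) == 32, "indirect size");
_Static_assert(alignof(struct rseq_grown) == 32, "grown alignment");
_Static_assert(alignof(struct rseq_indirect) == 32, "indirect alignment");

/* Option A: the extended field is one load from the base address. */
static inline uint32_t read_grown(const struct rseq_grown *r)
{
	return r->hypothetical_node_id;
}

/* Option B: load the pointer first, then the field behind it. */
static inline uint32_t read_indirect(const struct rseq_indirect *r)
{
	const struct rseq_ext *ext =
		(const struct rseq_ext *)(uintptr_t)r->ext_ptr;

	return ext ? ext->hypothetical_node_id : 0;
}
```

The extra dependent load in read_indirect() is what I mean by
"slightly less efficient".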

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [RFC PATCH 0/4] rseq: Introduce extensible struct rseq

2020-07-14 Thread Carlos O'Donell
On 7/13/20 11:03 PM, Mathieu Desnoyers wrote:
> Recent discussion led to a solution for extending struct rseq. This is
> an implementation of the proposed solution.
> 
> Now is a good time to agree on this scheme before the release of glibc
> 2.32, just in case there are small details to fix on the user-space
> side in order to allow extending struct rseq.

Adding extensibility to the rseq registration process would be great,
but we are out of time for the glibc 2.32 release.

Should we revert rseq for glibc 2.32 and spend quality time discussing
the implications of an extensible design, something that Google already
says they are doing?

We can, with a clear head, and an agreed-upon extension mechanism
include rseq in glibc 2.33 (release scheduled for February 1st 2021).
We release time boxed every 6 months, no deviation, so you know when
your next merge window will be.

We have already done the hard work of fixing the nesting signal
handler issues, and glibc integration. If we revert today that will 
also give time for Firefox and Chrome to adjust their sandboxes.

Do you wish to go forward with rseq as we have it in glibc 2.32,
or do you wish to revert rseq from glibc 2.32, discuss the extension
mechanism, and put it back into glibc 2.33 with adjustments?

-- 
Cheers,
Carlos.



[RFC PATCH 0/4] rseq: Introduce extensible struct rseq

2020-07-13 Thread Mathieu Desnoyers
Recent discussion led to a solution for extending struct rseq. This is
an implementation of the proposed solution.

Now is a good time to agree on this scheme before the release of glibc
2.32, just in case there are small details to fix on the user-space
side in order to allow extending struct rseq.

Thanks,

Mathieu

Mathieu Desnoyers (4):
  selftests: rseq: Use fixed value as rseq_len parameter
  rseq: Allow extending struct rseq
  selftests: rseq: define __rseq_abi with extensible size
  selftests: rseq: print rseq extensible size in basic test

 include/linux/sched.h                     |  4 +++
 include/uapi/linux/rseq.h                 | 42 --
 kernel/rseq.c                             | 44 +++
 tools/testing/selftests/rseq/basic_test.c | 15
 tools/testing/selftests/rseq/rseq.c       |  8 +++--
 5 files changed, 101 insertions(+), 12 deletions(-)

-- 
2.17.1