Re: [PATCH][RFA/RFC] Stack clash mitigation 0/9

2017-07-18 Thread Jeff Law
On 07/13/2017 03:26 AM, Christophe Lyon wrote:
> I have executed a validation of your patch series on aarch64 and arm
> targets, and I have minor comments.
> 
> On arm, all new tests are unsupported, as expected.
Good.

> On aarch64-linux, the new tests pass, but they fail on aarch64-elf:
>   - FAIL appears  [ => FAIL]:
That's really strange.  I just tried that here and the only two failures
I got were stack-check-7 and stack-check-8 which failed because I didn't
have a cross assembler installed.

> 
> 
> As I noticed that you used dg-require-effective-target
> stack_clash_protected instead of
> dg-require-stack-check "clash" that I recently committed, I also tried
> with the latter.
Yea.  Ultimately I decided that unless the target had explicitly added
support for stack clash protection that the tests should be considered
UNSUPPORTED, even if the port had partial protection (as is the case with
ARM).  Thus I ended up with a new effective target test.  I should have
mentioned that in the cover letter.

Thanks,


Jeff


Re: [PATCH][RFA/RFC] Stack clash mitigation 0/9

2017-07-13 Thread Segher Boessenkool
On Thu, Jul 13, 2017 at 05:10:33PM -0600, Jeff Law wrote:
> >>>>   2. ABI mandates that *sp always contain a backchain pointer (ppc)
> >>>
> >>> In the ELFv2 ABI a backchain is not required.  GCC still always has
> >>> one afaik.  I'll find out more.
> >> Please do.  I was under the impression it was mandated by the earlier
> >> ABIs as well.  If it isn't, then I don't think we can depend on it for
> >> the older ABIs.
> > 
> > I checked most ABIs, and all but ELFv2 require it.  You can assume we
> > require it everywhere (we do assume it currently, and there is no
> > intention to change this).  The statement in the ABI surprised me
> > yesterday, sorry for panicking.
> Y'all are the experts here.  It would be advisable to get the ABI
> documents tweaked if indeed we are going to rely on the existence of the
> backchain as an implicit probe.

Yes, we'll deal with whatever is needed here, don't worry :-)


Segher


Re: [PATCH][RFA/RFC] Stack clash mitigation 0/9

2017-07-13 Thread Jeff Law
On 07/13/2017 04:48 PM, Segher Boessenkool wrote:
> On Thu, Jul 13, 2017 at 11:28:17AM -0600, Jeff Law wrote:
>> On 07/12/2017 04:44 PM, Segher Boessenkool wrote:
>>> On Tue, Jul 11, 2017 at 03:19:36PM -0600, Jeff Law wrote:
>>>> Examples of implicit probes include
>>>
>>>>   2. ABI mandates that *sp always contain a backchain pointer (ppc)
>>>
>>> In the ELFv2 ABI a backchain is not required.  GCC still always has
>>> one afaik.  I'll find out more.
>> Please do.  I was under the impression it was mandated by the earlier
>> ABIs as well.  If it isn't, then I don't think we can depend on it for
>> the older ABIs.
> 
> I checked most ABIs, and all but ELFv2 require it.  You can assume we
> require it everywhere (we do assume it currently, and there is no
> intention to change this).  The statement in the ABI surprised me
> yesterday, sorry for panicking.
Y'all are the experts here.  It would be advisable to get the ABI
documents tweaked if indeed we are going to rely on the existence of the
backchain as an implicit probe.  Otherwise we end up in the same
scenario as aarch64 where we have to make some unpleasant assumptions.


> 
>> The code we generate for alloca was so awful it's hard to see how
>> hitting each page once would matter either.  *However* I was looking at
>> x86 in this case and due to potential stack realignments x86's alloca
>> code might be notably worse than others for constant sizes.
> 
> There is generic code that aligns too often, too.  You might be seeing
> that same thing.
Exactly.  It's the generic code that's driven by various macros in the
x86 backend.


> 
>> There's further improvements that could be made as well.   It ought to
>> be possible to write an optimizer pass that uses some of the ideas from
>> DSE and SLSR to identify explicit probes that are made redundant by
>> nearby implicit probes -- this would seem most useful for the dynamic space.
>>
>> The problem is we'd want to do that in gimple, but probing of the
>> dynamic space happens at the gimple/rtl border.  So we'd probably want
>> to make probing happen earlier to expose stuff at the gimple level.
> 
> This would just get rid of one probe per dynamic allocation, correct?
> Doesn't seem worth complicating anything for.
There's enough implicit probes lying around in the IL that I suspect we
could likely prove the first and last are unnecessary on a reasonably
consistent basis.  It didn't seem critical to address at this stage, but
something we could look at later if we feel the need.

The other thing I've pondered lightly would be to attach frame & probe
info to decl nodes, perhaps doing some IPA propagation.

The idea here is if we have a function that is static to the CU, but it's
not a good inline candidate, we can use information about the callers to
build a less pessimistic state at function entry.  This would likely
only help aarch64 and s390.  It would also fall into something we could
explore in the future if the need arises.


Thanks for all the feedback,
Jeff



Re: [PATCH][RFA/RFC] Stack clash mitigation 0/9

2017-07-13 Thread Segher Boessenkool
On Thu, Jul 13, 2017 at 11:28:17AM -0600, Jeff Law wrote:
> On 07/12/2017 04:44 PM, Segher Boessenkool wrote:
> > On Tue, Jul 11, 2017 at 03:19:36PM -0600, Jeff Law wrote:
> >> Examples of implicit probes include
> > 
> >>   2. ABI mandates that *sp always contain a backchain pointer (ppc)
> > 
> > In the ELFv2 ABI a backchain is not required.  GCC still always has
> > one afaik.  I'll find out more.
> Please do.  I was under the impression it was mandated by the earlier
> ABIs as well.  If it isn't, then I don't think we can depend on it for
> the older ABIs.

I checked most ABIs, and all but ELFv2 require it.  You can assume we
require it everywhere (we do assume it currently, and there is no
intention to change this).  The statement in the ABI surprised me
yesterday, sorry for panicking.

> The code we generate for alloca was so awful it's hard to see how
> hitting each page once would matter either.  *However* I was looking at
> x86 in this case and due to potential stack realignments x86's alloca
> code might be notably worse than others for constant sizes.

There is generic code that aligns too often, too.  You might be seeing
that same thing.

> There's further improvements that could be made as well.   It ought to
> be possible to write an optimizer pass that uses some of the ideas from
> DSE and SLSR to identify explicit probes that are made redundant by
> nearby implicit probes -- this would seem most useful for the dynamic space.
> 
> The problem is we'd want to do that in gimple, but probing of the
> dynamic space happens at the gimple/rtl border.  So we'd probably want
> to make probing happen earlier to expose stuff at the gimple level.

This would just get rid of one probe per dynamic allocation, correct?
Doesn't seem worth complicating anything for.


Segher


Re: [PATCH][RFA/RFC] Stack clash mitigation 0/9

2017-07-13 Thread Jakub Jelinek
On Thu, Jul 13, 2017 at 11:28:17AM -0600, Jeff Law wrote:
> On 07/12/2017 04:44 PM, Segher Boessenkool wrote:
> > On Tue, Jul 11, 2017 at 03:19:36PM -0600, Jeff Law wrote:
> >> Examples of implicit probes include
> > 
> >>   2. ABI mandates that *sp always contain a backchain pointer (ppc)
> > 
> > In the ELFv2 ABI a backchain is not required.  GCC still always has
> > one afaik.  I'll find out more.
> Please do.  I was under the impression it was mandated by the earlier
> ABIs as well.  If it isn't, then I don't think we can depend on it for
> the older ABIs.
> 
> That wouldn't be the end of the world -- it's pretty clear that ppc64le
> is the future and we'd get good code there.  I wouldn't lose much sleep
> if ppc32 and ppc64 big endian had a less efficient probing scheme.

??  Segher said in ELFv2 ABI it is not required, so that would mean
it does affect ppc64le and does not affect ppc32 or ppc64.
So, we wouldn't get good code for ppc64le and would get one for ppc32 and
ppc64.

Jakub


Re: [PATCH][RFA/RFC] Stack clash mitigation 0/9

2017-07-13 Thread Jeff Law
On 07/13/2017 11:32 AM, Jakub Jelinek wrote:
> On Thu, Jul 13, 2017 at 11:28:17AM -0600, Jeff Law wrote:
>> On 07/12/2017 04:44 PM, Segher Boessenkool wrote:
>>> On Tue, Jul 11, 2017 at 03:19:36PM -0600, Jeff Law wrote:
>>>> Examples of implicit probes include
>>>
>>>>   2. ABI mandates that *sp always contain a backchain pointer (ppc)
>>>
>>> In the ELFv2 ABI a backchain is not required.  GCC still always has
>>> one afaik.  I'll find out more.
>> Please do.  I was under the impression it was mandated by the earlier
>> ABIs as well.  If it isn't, then I don't think we can depend on it for
>> the older ABIs.
>>
>> That wouldn't be the end of the world -- it's pretty clear that ppc64le
>> is the future and we'd get good code there.  I wouldn't lose much sleep
>> if ppc32 and ppc64 big endian had a less efficient probing scheme.
> 
> ??  Segher said in ELFv2 ABI it is not required, so that would mean
> it does affect ppc64le and does not affect ppc32 or ppc64.
> So, we wouldn't get good code for ppc64le and would get one for ppc32 and
> ppc64.
Oops.  Misread.  Got it totally backwards.

Not good.  Waiting on Segher for clarification, but will start thinking
about better options than punting :-)



jeff


Re: [PATCH][RFA/RFC] Stack clash mitigation 0/9

2017-07-13 Thread Jeff Law
On 07/12/2017 04:44 PM, Segher Boessenkool wrote:
> On Tue, Jul 11, 2017 at 03:19:36PM -0600, Jeff Law wrote:
>> Examples of implicit probes include
> 
>>   2. ABI mandates that *sp always contain a backchain pointer (ppc)
> 
> In the ELFv2 ABI a backchain is not required.  GCC still always has
> one afaik.  I'll find out more.
Please do.  I was under the impression it was mandated by the earlier
ABIs as well.  If it isn't, then I don't think we can depend on it for
the older ABIs.

That wouldn't be the end of the world -- it's pretty clear that ppc64le
is the future and we'd get good code there.  I wouldn't lose much sleep
if ppc32 and ppc64 big endian had a less efficient probing scheme.

We'd set up a last_probe_offset tracker like we do for aarch & s390.
For ppc64le its initial state would be zero.  For ppc32 and ppc64 big
endian the initial state would be PROBE_OFFSET - STACK_BOUNDARY /
UNITS_PER_WORD.  Depending on cost/benefit analysis we could try to
optimize those ports, but given overall directions it just might not be
worth the effort.
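
One possible reading of the tracker idea, as a minimal sketch rather than the code in the series: treat last_probe_offset as the number of bytes between the most recent known probe and the stack pointer at function entry. A port with a guaranteed implicit probe at *sp starts at zero, while a port without one starts at some pessimistic value. The function name, the 4096-byte interval and the sample values are assumptions made for illustration only.

```c
#include <assert.h>

#define PROBE_INTERVAL 4096L  /* assumed probe interval in bytes */

/* Hypothetical helper: number of explicit probes a prologue would need
   so that no stretch longer than PROBE_INTERVAL bytes goes unprobed.
   LAST_PROBE_OFFSET is the distance from the last known probe to the
   incoming stack pointer; FRAME_SIZE is the static frame size.  */
long explicit_probes_needed(long last_probe_offset, long frame_size)
{
    assert(last_probe_offset >= 0 && frame_size >= 0);

    /* Total distance from the last known probe to the new stack pointer.  */
    long reach = last_probe_offset + frame_size;

    /* One probe is needed each time that distance crosses a full interval.  */
    return reach > 0 ? (reach - 1) / PROBE_INTERVAL : 0;
}
```

With an initial state of zero, a 4000-byte frame needs no explicit probe in this model; with a pessimistic initial state of 4000, even a 200-byte frame needs one.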

> 
>> To get a sense of overhead, just 1.5% of routines in glibc need probing
>> in their prologues (x86) in the testing I performed.  IIRC each and
>> every one of those routines needed just 1-4 inlined probes.
>>
>> Significantly more functions need alloca space probed (IIRC ~5%), but
>> given the amazingly inefficient alloca code, I can't believe anyone will
>> ever notice the probing overhead.
> 
> That is quite a lot of functions IMO, but it's just one store per page
> (or per alloca), and supposedly you'll store to that stack anyway (or
> it is stupid slow code in the first place).  Did you measure any real
> timings?
Haven't measured any real timings.  We hit so few functions with the
prologue probes it's hard to see how they could end up being measurable.

The code we generate for alloca was so awful it's hard to see how
hitting each page once would matter either.  *However* I was looking at
x86 in this case and due to potential stack realignments x86's alloca
code might be notably worse than others for constant sizes.

There's further improvements that could be made as well.   It ought to
be possible to write an optimizer pass that uses some of the ideas from
DSE and SLSR to identify explicit probes that are made redundant by
nearby implicit probes -- this would seem most useful for the dynamic space.

The problem is we'd want to do that in gimple, but probing of the
dynamic space happens at the gimple/rtl border.  So we'd probably want
to make probing happen earlier to expose stuff at the gimple level.


Jeff


Re: [PATCH][RFA/RFC] Stack clash mitigation 0/9

2017-07-13 Thread Michael Matz
Hello,

On Tue, 11 Jul 2017, Jeff Law wrote:

> This patch series is designed to mitigate the problems exposed by the
> stack-clash exploits.  As I've noted before, the way to address this
> class of problems is via a good stack probing strategy.
> 
> This has taken much longer than expected to pull together for
> submission.  Sorry about that.  However, the delay has led to some clear
> improvements on ppc, aarch64 and s390 as well as tests which aren't
> eyeballed, but instead are part of the testsuite.
> 
> This series introduces -fstack-check=clash which is a variant of
> -fstack-check designed to prevent "jumping the stack" as seen in the
> stack-clash exploits.

FWIW, this is the patch we're going to use in our older compilers (back up 
to 4.1, meh) in one or another variant.  It only probes for dynamic 
allocations, not for static stack frames.  And it probes more often than 
strictly necessary.  But on the plus side it is completely target 
independent (except that it requires STACK_GROWS_DOWNWARD; the other case is unhandled because 
we don't have hppa) and only 70 lines, doesn't interact with any of the 
hairy existing stack checking code and it's easy to see that it does the 
right thing :)

(This particular variant is for 4.3, but the code of 
allocate_dynamic_stack_space() has been essentially stable for a very long 
time, which is another plus of this patch: it's easy to back- and 
forward-port :) )

I'm not suggesting this for inclusion, but in case others are in a similar 
position of having to deal with old compilers and are fine with the above, 
they might find this useful.


Ciao,
Michael.
--- gcc/common.opt.mm   2017-06-26 16:07:55.0 +0200
+++ gcc/common.opt  2017-06-26 16:05:27.0 +0200
@@ -966,6 +966,10 @@ fstack-check
 Common Report Var(flag_stack_check)
 Insert stack checking code into the program
 
+fstack-probe
+Common Report Var(flag_stack_probe)
+Insert stack checking code into the program
+
 fstack-limit
 Common
 
--- gcc/explow.c.mm 2008-11-05 22:19:47.0 +0100
+++ gcc/explow.c 2017-06-26 17:31:25.0 +0200
@@ -1071,6 +1071,9 @@ update_nonlocal_goto_save_area (void)
 rtx
 allocate_dynamic_stack_space (rtx size, rtx target, int known_align)
 {
+  rtx loop_lab, end_lab, last_size;
+  int probe_pass = 0;
+
   /* If we're asking for zero bytes, it doesn't matter what we point
  to since we can't dereference it.  But return a reasonable
  address anyway.  */
@@ -1203,6 +1206,24 @@ allocate_dynamic_stack_space (rtx size,
 
   mark_reg_pointer (target, known_align);
 
+  if (flag_stack_probe)
+{
+  size = copy_to_mode_reg (Pmode, convert_to_mode (Pmode, size, 1));
+  loop_lab = gen_label_rtx ();
+  end_lab = gen_label_rtx ();
+  emit_label (loop_lab);
+#ifndef STACK_GROWS_DOWNWARD
+#error stack must grow down
+#endif
+  emit_cmp_and_jump_insns (size, GEN_INT (STACK_CHECK_PROBE_INTERVAL), LTU,
+  NULL_RTX, Pmode, 1, end_lab);
+  last_size = expand_binop (Pmode, sub_optab, size,
+   GEN_INT (STACK_CHECK_PROBE_INTERVAL), size,
+   1, OPTAB_WIDEN);
+  gcc_assert (last_size == size);
+  size = GEN_INT (STACK_CHECK_PROBE_INTERVAL);
+}
+
+again:
   /* Perform the required allocation from the stack.  Some systems do
  this differently than simply incrementing/decrementing from the
  stack pointer, such as acquiring the space by calling malloc().  */
@@ -1264,6 +1285,15 @@ allocate_dynamic_stack_space (rtx size,
   emit_move_insn (target, virtual_stack_dynamic_rtx);
 #endif
 }
+  if (flag_stack_probe && probe_pass == 0)
+{
+  probe_pass = 1;
+  emit_stack_probe (target);
+  emit_jump (loop_lab);
+  emit_label (end_lab);
+  size = last_size;
+  goto again;
+}
 
   if (MUST_ALIGN)
 {
@@ -1280,6 +1310,8 @@ allocate_dynamic_stack_space (rtx size,
GEN_INT (BIGGEST_ALIGNMENT / BITS_PER_UNIT),
NULL_RTX, 1);
 }
+  if (flag_stack_probe)
+emit_stack_probe (target);
 
   /* Record the new stack level for nonlocal gotos.  */
   if (cfun->nonlocal_goto_save_area != 0)
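
For readers who want to see what the emitted RTL does at run time, the following is a rough C model of the loop the patch above generates. It is a sketch, not part of the patch: the function and struct names are invented here, PROBE_INTERVAL_BYTES is an assumed stand-in for STACK_CHECK_PROBE_INTERVAL, and alignment adjustments are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed stand-in for STACK_CHECK_PROBE_INTERVAL.  */
#define PROBE_INTERVAL_BYTES ((size_t) 4096)

/* Hypothetical record of what one dynamic allocation did.  */
struct probe_trace {
    size_t allocated;  /* total bytes taken from the stack */
    size_t probes;     /* number of probes executed */
};

struct probe_trace simulate_dynamic_alloc(size_t size)
{
    const size_t requested = size;
    struct probe_trace t = {0, 0};

    /* loop_lab: leave the loop (jump to end_lab) once less than one
       interval remains; otherwise allocate one interval and probe it.  */
    while (size >= PROBE_INTERVAL_BYTES) {
        size -= PROBE_INTERVAL_BYTES;
        t.allocated += PROBE_INTERVAL_BYTES;
        t.probes++;
    }

    /* end_lab: allocate the remainder, then probe unconditionally.
       This final probe runs even when the remainder is zero.  */
    t.allocated += size;
    t.probes++;

    assert(t.allocated == requested);
    return t;
}
```

In this model a 10000-byte request executes three probes (two in the loop, one at the end), and a zero-byte remainder still costs one probe, which matches the remark that the patch probes more often than strictly necessary.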


Re: [PATCH][RFA/RFC] Stack clash mitigation 0/9

2017-07-13 Thread Christophe Lyon
Hi Jeff,


On 11 July 2017 at 23:19, Jeff Law  wrote:
> This patch series is designed to mitigate the problems exposed by the
> stack-clash exploits.  As I've noted before, the way to address this
> class of problems is via a good stack probing strategy.
>
> This has taken much longer than expected to pull together for
> submission.  Sorry about that.  However, the delay has led to some clear
> improvements on ppc, aarch64 and s390 as well as tests which aren't
> eyeballed, but instead are part of the testsuite.
>
> This series introduces -fstack-check=clash which is a variant of
> -fstack-check designed to prevent "jumping the stack" as seen in the
> stack-clash exploits.
>
>
>
> The key ideas here:
>
> Individual stack allocations are never more than PROBE_INTERVAL in size
> (4k by default).  Larger allocations are broken up into PROBE_INTERVAL
> chunks and each chunk is probed as it is allocated.
>
> No combination of stack allocations can exceed PROBE_INTERVAL bytes
> without probing.  ie, if we have an allocation of 2k and a later
> allocation of 3k, then there must be a stack probe into the first 4k of
> allocated space that executes between the two allocations.
>
> We must consider an environment where code compiled without stack
> probing is linked statically or dynamically with code that is compiled
> with stack probing.  That is actually the most likely scenario for an
> indefinite period of time.  Thus we have to consider the possibility of
> a hostile caller in the call stack.
>
> We need not guarantee enough stack space to handle a signal if a probe
> hits the guard page.
>
> --
>
>
> Probes come in two forms.  They can be explicit or implicit.
>
> Explicit probes are emitted by prologue generation or dynamic stack
> allocation routines.  These are net new code and avoiding them when it
> is safe to do so helps reduce the overhead of stack probing.
>
> Implicit probes are "probes" that occur as a natural side effect of the
> existing code or guarantees provided by the ABI.  They are essentially
> free and may allow the compiler to avoid some explicit probes.
>
> Examples of implicit probes include
>
>   1. ISA which pushes the return address onto the stack in a call
>  instruction (x86)
>
>   2. ABI mandates that *sp always contain a backchain pointer (ppc)
>
>   3. Prologue stores a register into the stack.  We exploit this on
>  aarch64 and s390.  On s390 register saves go into the caller's
>  stack frame, on aarch64 register saves hit newly allocated
>  space in the callee's frame.  We can exploit both to avoid
>  some explicit probing.
>
> I've done implementations for x86, ppc, aarch64 and s390 and the
> included tests have been checked against those targets
> ($arch-unknown-linux).
>
> This patch does not change the probing insn itself.  We've had various
> discussions on-list on a better probe insn for x86.  I think the
> consensus is to avoid read-modify-write insns.  A testb may ultimately
> be best.  This is IMHO an independent implementation detail for each
> target and should be handled as a follow-up.  But if folks insist, it's
> a trivial change to make as it doesn't fundamentally affect how all this
> stuff works.
>
> Other targets that have an existing -fstack-check=specific, but for
> which I have not added a -fstack-check=clash implementation get partial
> protection against stack clash as well.  This is a side effect of
> keeping some of the early code we'd hoped to use to avoid writing a new
> probe implementation for each target.
>
> --
>
> To get a sense of overhead, just 1.5% of routines in glibc need probing
> in their prologues (x86) in the testing I performed.  IIRC each and
> every one of those routines needed just 1-4 inlined probes.
>
> Significantly more functions need alloca space probed (IIRC ~5%), but
> given the amazingly inefficient alloca code, I can't believe anyone will
> ever notice the probing overhead.
>
> --
>
>
> Patch #1 contains the new option -fstack-check=clash and some dejagnu
> infrastructure  (most of which is unused until later patches)
>
> Patch #2 adds the new style probing support to the alloca/vla area and
> indirects uses of STACK_CHECK_PROTECT through get_stack_check_protect.
>
> Patch #3 Add some generic dumping support for use by the target prologue
> expanders
>
> Patch #4 introduces the x86 specific bits
>
> Patch #5 addresses combine-stack-adjustments interactions with
> -fstack-check=clash
>
> Patch #6 adds PPC support
>
> Patch #7 adds aarch64 support
>
> Patch #8 adds s390 support
>
> The patch series has been bootstrapped and regression tested on
> x86_64-linux-gnu
> ppc64-linux-gnu
> ppc64le-linux-gnu
> aarch64-linux-gnu
> s390x-linux-gnu (another respin of this is still in-progress)
>
> Additionally, each target has been bootstrapped with -fstack-check=clash
> enabled by default, the testsuite run and checked for glaring errors.
>
> Earlier versions have also bootstrapped on 32bit PPC and 

Re: [PATCH][RFA/RFC] Stack clash mitigation 0/9

2017-07-12 Thread Segher Boessenkool
On Tue, Jul 11, 2017 at 03:19:36PM -0600, Jeff Law wrote:
> Examples of implicit probes include

>   2. ABI mandates that *sp always contain a backchain pointer (ppc)

In the ELFv2 ABI a backchain is not required.  GCC still always has
one afaik.  I'll find out more.

> To get a sense of overhead, just 1.5% of routines in glibc need probing
> in their prologues (x86) in the testing I performed.  IIRC each and
> every one of those routines needed just 1-4 inlined probes.
> 
> Significantly more functions need alloca space probed (IIRC ~5%), but
> given the amazingly inefficient alloca code, I can't believe anyone will
> ever notice the probing overhead.

That is quite a lot of functions IMO, but it's just one store per page
(or per alloca), and supposedly you'll store to that stack anyway (or
it is stupid slow code in the first place).  Did you measure any real
timings?


Segher


[PATCH][RFA/RFC] Stack clash mitigation 0/9

2017-07-11 Thread Jeff Law
This patch series is designed to mitigate the problems exposed by the
stack-clash exploits.  As I've noted before, the way to address this
class of problems is via a good stack probing strategy.

This has taken much longer than expected to pull together for
submission.  Sorry about that.  However, the delay has led to some clear
improvements on ppc, aarch64 and s390 as well as tests which aren't
eyeballed, but instead are part of the testsuite.

This series introduces -fstack-check=clash which is a variant of
-fstack-check designed to prevent "jumping the stack" as seen in the
stack-clash exploits.



The key ideas here:

Individual stack allocations are never more than PROBE_INTERVAL in size
(4k by default).  Larger allocations are broken up into PROBE_INTERVAL
chunks and each chunk is probed as it is allocated.

No combination of stack allocations can exceed PROBE_INTERVAL bytes
without probing.  ie, if we have an allocation of 2k and a later
allocation of 3k, then there must be a stack probe into the first 4k of
allocated space that executes between the two allocations.
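
The two rules above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: the struct, the function names and the bookkeeping are invented for illustration and do not come from the patch series, and PROBE_INTERVAL is fixed at the 4k default.

```c
#include <assert.h>
#include <stddef.h>

#define PROBE_INTERVAL 4096L  /* the 4k default mentioned above */

/* Hypothetical model of a downward-growing stack.  */
struct stack_model {
    long sp;        /* simulated stack pointer, as an offset from entry */
    long unprobed;  /* bytes allocated since the most recent probe */
    long probes;    /* explicit probes executed so far */
};

/* Model a probe at the current stack pointer.  */
static void probe(struct stack_model *m)
{
    m->probes++;
    m->unprobed = 0;
}

static void allocate(struct stack_model *m, long size)
{
    assert(size >= 0);

    /* Second rule: if this allocation plus earlier unprobed space would
       exceed PROBE_INTERVAL, probe the already allocated space first.  */
    if (m->unprobed > 0 && m->unprobed + size > PROBE_INTERVAL)
        probe(m);

    /* First rule: break a large allocation into PROBE_INTERVAL chunks
       and probe each chunk as it is allocated.  */
    while (size > PROBE_INTERVAL) {
        m->sp -= PROBE_INTERVAL;
        size -= PROBE_INTERVAL;
        probe(m);
    }

    m->sp -= size;
    m->unprobed += size;
    assert(m->unprobed <= PROBE_INTERVAL);
}

/* Run a sequence of allocations and report how many probes executed.  */
long count_probes(const long *sizes, size_t n)
{
    struct stack_model m = {0, 0, 0};
    for (size_t i = 0; i < n; i++)
        allocate(&m, sizes[i]);
    return m.probes;
}
```

Feeding this model the 2k and 3k allocations from the example yields exactly one probe, executed between the two allocations and landing in the first 2k of allocated space.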

We must consider an environment where code compiled without stack
probing is linked statically or dynamically with code that is compiled
with stack probing.  That is actually the most likely scenario for an
indefinite period of time.  Thus we have to consider the possibility of
a hostile caller in the call stack.

We need not guarantee enough stack space to handle a signal if a probe
hits the guard page.

--


Probes come in two forms.  They can be explicit or implicit.

Explicit probes are emitted by prologue generation or dynamic stack
allocation routines.  These are net new code and avoiding them when it
is safe to do so helps reduce the overhead of stack probing.

Implicit probes are "probes" that occur as a natural side effect of the
existing code or guarantees provided by the ABI.  They are essentially
free and may allow the compiler to avoid some explicit probes.

Examples of implicit probes include

  1. ISA which pushes the return address onto the stack in a call
 instruction (x86)

  2. ABI mandates that *sp always contain a backchain pointer (ppc)

  3. Prologue stores a register into the stack.  We exploit this on
 aarch64 and s390.  On s390 register saves go into the caller's
 stack frame, on aarch64 register saves hit newly allocated
 space in the callee's frame.  We can exploit both to avoid
 some explicit probing.

I've done implementations for x86, ppc, aarch64 and s390 and the
included tests have been checked against those targets
($arch-unknown-linux).

This patch does not change the probing insn itself.  We've had various
discussions on-list on a better probe insn for x86.  I think the
consensus is to avoid read-modify-write insns.  A testb may ultimately
be best.  This is IMHO an independent implementation detail for each
target and should be handled as a follow-up.  But if folks insist, it's
a trivial change to make as it doesn't fundamentally affect how all this
stuff works.

Other targets that have an existing -fstack-check=specific, but for
which I have not added a -fstack-check=clash implementation get partial
protection against stack clash as well.  This is a side effect of
keeping some of the early code we'd hoped to use to avoid writing a new
probe implementation for each target.

--

To get a sense of overhead, just 1.5% of routines in glibc need probing
in their prologues (x86) in the testing I performed.  IIRC each and
every one of those routines needed just 1-4 inlined probes.

Significantly more functions need alloca space probed (IIRC ~5%), but
given the amazingly inefficient alloca code, I can't believe anyone will
ever notice the probing overhead.

--


Patch #1 contains the new option -fstack-check=clash and some dejagnu
infrastructure  (most of which is unused until later patches)

Patch #2 adds the new style probing support to the alloca/vla area and
indirects uses of STACK_CHECK_PROTECT through get_stack_check_protect.

Patch #3 Add some generic dumping support for use by the target prologue
expanders

Patch #4 introduces the x86 specific bits

Patch #5 addresses combine-stack-adjustments interactions with
-fstack-check=clash

Patch #6 adds PPC support

Patch #7 adds aarch64 support

Patch #8 adds s390 support

The patch series has been bootstrapped and regression tested on
x86_64-linux-gnu
ppc64-linux-gnu
ppc64le-linux-gnu
aarch64-linux-gnu
s390x-linux-gnu (another respin of this is still in-progress)

Additionally, each target has been bootstrapped with -fstack-check=clash
enabled by default, the testsuite run and checked for glaring errors.

Earlier versions have also bootstrapped on 32bit PPC and 32bit s390.

Earlier versions have also been used to build and regression test
glibc-2.17 with -fstack-check=clash enabled by default.  The resulting
x86 and x86_64 libraries also were scanned to verify proper probing.
Similarly for x86_64 builds with the trunk glibc.


An