[PATCH] D154911: Enabling fstack_clash_protection for arm32 bit, thumb and thumb2 mode

2023-07-26 Thread Tamar Christina via Phabricator via cfe-commits
tnfchris added inline comments.



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:3460
+  } else if (EffectiveTriple.isArm() || EffectiveTriple.isThumb()) {
+CmdArgs.push_back("-mstack-probe-size=1024");
+  }

efriedma wrote:
> tnfchris wrote:
> > efriedma wrote:
> > > Why 1024?
> > 1024 was experimentally determined by Arm and is part of the ABI for stack 
> > clash (which has not yet been published).  It was determined by examining a 
> > large number of programs and looking at the function stack usages.  1024 
> > covers 80-90% of programs such that we can minimize the number of probes 
> > required in the average cases. 
> There are actually multiple numbers involved here, no?  One is the spacing of 
> probes, i.e. if allocating a large amount of stack, how many times you need 
> to probe; this is basically the page size of the target. the other is how 
> much unprobed space a function is allowed to allocate before calling another 
> function. Referring to the the AArch64 patch, -mstack-probe-size is the 
> former, the hardcoded "1024" is the latter.
I hadn't looked at the patch in detail yet, I thought this was the probing 
offset.  But you're right, what I thought of was `StackClashCallerGuard`,  if 
`stack-probe-size` indeed the guard size itself, then yeah this would be wrong. 
 It seems incorrect to allow it smaller than the page size.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154911/new/

https://reviews.llvm.org/D154911

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D154911: Enabling fstack_clash_protection for arm32 bit, thumb and thumb2 mode

2023-07-26 Thread Eli Friedman via Phabricator via cfe-commits
efriedma added inline comments.



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:3460
+  } else if (EffectiveTriple.isArm() || EffectiveTriple.isThumb()) {
+CmdArgs.push_back("-mstack-probe-size=1024");
+  }

tnfchris wrote:
> efriedma wrote:
> > Why 1024?
> 1024 was experimentally determined by Arm and is part of the ABI for stack 
> clash (which has not yet been published).  It was determined by examining a 
> large number of programs and looking at the function stack usages.  1024 
> covers 80-90% of programs such that we can minimize the number of probes 
> required in the average cases. 
There are actually multiple numbers involved here, no?  One is the spacing of 
probes, i.e. if allocating a large amount of stack, how many times you need to 
probe; this is basically the page size of the target. the other is how much 
unprobed space a function is allowed to allocate before calling another 
function. Referring to the the AArch64 patch, -mstack-probe-size is the former, 
the hardcoded "1024" is the latter.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154911/new/

https://reviews.llvm.org/D154911

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D154911: Enabling fstack_clash_protection for arm32 bit, thumb and thumb2 mode

2023-07-26 Thread Tamar Christina via Phabricator via cfe-commits
tnfchris added inline comments.



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:3460
+  } else if (EffectiveTriple.isArm() || EffectiveTriple.isThumb()) {
+CmdArgs.push_back("-mstack-probe-size=1024");
+  }

efriedma wrote:
> Why 1024?
1024 was experimentally determined by Arm and is part of the ABI for stack 
clash (which has not yet been published).  It was determined by examining a 
large number of programs and looking at the function stack usages.  1024 covers 
80-90% of programs such that we can minimize the number of probes required in 
the average cases. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154911/new/

https://reviews.llvm.org/D154911

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D154911: Enabling fstack_clash_protection for arm32 bit, thumb and thumb2 mode

2023-07-10 Thread Eli Friedman via Phabricator via cfe-commits
efriedma added inline comments.



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:3460
+  } else if (EffectiveTriple.isArm() || EffectiveTriple.isThumb()) {
+CmdArgs.push_back("-mstack-probe-size=1024");
+  }

Why 1024?



Comment at: llvm/lib/Target/ARM/ARMFrameLowering.cpp:780
   emitSPUpdate(isARM, MBB, MBBI, dl, TII, -NumBytes,
MachineInstr::FrameSetup);
   DefCFAOffsetCandidates.addInst(std::prev(MBBI), NumBytes, true);

Is this relevant?  Are the other unmodified uses of emitSPUpdate() relevant?



Comment at: llvm/lib/Target/ARM/ARMFrameLowering.cpp:979
 
   if (STI.isTargetWindows() && WindowsRequiresStackProbe(MF, NumBytes)) {
 uint32_t NumWords = NumBytes >> 2;

Can we try to unify the Windows/non-Windows codepaths to some extent?  Having 
two independent codepaths each trying to decide when probing is necessary seems 
likely to lead to bugs.



Comment at: llvm/test/CodeGen/ARM/stackProbing_arm.ll:19
+; CHECK-NEXT:.pad #4096
+; CHECK-NEXT:sub sp, sp, #4096
+; CHECK-NEXT:cmp sp, r0

Going straight from the function entry to here, you've decremented the stack by 
a total of 6276 bytes; this can jump over a 4kb guard page.



Comment at: llvm/test/CodeGen/ARM/stackProbing_arm.ll:21
+; CHECK-NEXT:cmp sp, r0
+; CHECK-NEXT:str r0, [sp, #1024]
+; CHECK-NEXT:bne .LBB0_1

From a security perspective, scattering pointers to the stack onto the stack is 
maybe not the best idea.



Comment at: llvm/test/CodeGen/ARM/stackProbing_arm.ll:95
+; CHECK-NEXT:@ =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:sub sp, sp, #4096
+; CHECK-NEXT:cmp sp, r3

Again, this is too many bytes.  (At least, ignoring the stores before the loop, 
which don't appear to be intentionally inserted checks.)



Comment at: llvm/test/CodeGen/ARM/stackProbing_thumb.ll:12
+; CHECK-NEXT:ldr r0, .LCPI0_1
+; CHECK-NEXT:subs r0, r0, r0
+; CHECK-NEXT:ldr r1, .LCPI0_2

This is a very fancy way of setting a register to zero.

Have you done any testing that the generated code actually works?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154911/new/

https://reviews.llvm.org/D154911

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D154911: Enabling fstack_clash_protection for arm32 bit, thumb and thumb2 mode

2023-07-10 Thread Varun Kumar E via Phabricator via cfe-commits
varunkumare99 created this revision.
varunkumare99 added reviewers: tnfchris, efriedma, kristof.beyls, 
serge-sans-paille.
Herald added a subscriber: hiraditya.
Herald added a project: All.
varunkumare99 requested review of this revision.
Herald added subscribers: llvm-commits, cfe-commits, MaskRay.
Herald added projects: clang, LLVM.

adds stack probing instruction sequence for dynamic stack allocations, VLA's 
and constant arrays to protect against stack clash attacks.

Depending on the size of the stack allocation, various probing sequences are 
generated:

- straight line sequence of subtracts and stores
- A loop allocating and probing one page size per iteration, plus a single 
probe to deal with the remainder.
- A loop which moves the SP down to the target value held in a register, used 
when the allocation size is not known at compile-time

reference: https://reviews.llvm.org/D96004


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D154911

Files:
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
  llvm/lib/Target/ARM/ARMFrameLowering.cpp
  llvm/lib/Target/ARM/ARMFrameLowering.h
  llvm/lib/Target/ARM/ARMISelLowering.cpp
  llvm/lib/Target/ARM/ARMISelLowering.h
  llvm/lib/Target/ARM/ARMInstrInfo.td
  llvm/lib/Target/ARM/Thumb1FrameLowering.cpp
  llvm/lib/Target/ARM/Thumb1FrameLowering.h
  llvm/test/CodeGen/ARM/stackProbing_arm.ll
  llvm/test/CodeGen/ARM/stackProbing_thumb.ll
  llvm/test/CodeGen/Thumb2/stackProbing_thumb2.ll

Index: llvm/test/CodeGen/Thumb2/stackProbing_thumb2.ll
===
--- /dev/null
+++ llvm/test/CodeGen/Thumb2/stackProbing_thumb2.ll
@@ -0,0 +1,188 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
+; RUN: llc -mtriple thumbv8.1m.main-none-linux-eabi < %s -verify-machineinstrs | FileCheck %s
+; Function Attrs: noinline nounwind optnone
+define dso_local void @large_stack() "probe-stack"="inline-asm" "frame-pointer"="none" {
+; CHECK-LABEL: large_stack:
+; CHECK:   @ %bb.0: @ %entry
+; CHECK-NEXT:.save {r4, r5}
+; CHECK-NEXT:push {r4, r5}
+; CHECK-NEXT:sub.w r0, sp, #79872
+; CHECK-NEXT:subs r0, #132
+; CHECK-NEXT:.pad #2180
+; CHECK-NEXT:subw sp, sp, #2180
+; CHECK-NEXT:  .LBB0_1: @ %entry
+; CHECK-NEXT:@ =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:.pad #4096
+; CHECK-NEXT:sub.w sp, sp, #4096
+; CHECK-NEXT:cmp sp, r0
+; CHECK-NEXT:str.w r0, [sp, #1024]
+; CHECK-NEXT:bne .LBB0_1
+; CHECK-NEXT:  @ %bb.2: @ %entry
+; CHECK-NEXT:movs r0, #0
+; CHECK-NEXT:add r1, sp, #4
+; CHECK-NEXT:str r0, [sp]
+; CHECK-NEXT:movw r0, #1
+; CHECK-NEXT:  .LBB0_3: @ %for.cond
+; CHECK-NEXT:@ =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:ldr r2, [sp]
+; CHECK-NEXT:cmp r2, r0
+; CHECK-NEXT:bhi .LBB0_5
+; CHECK-NEXT:  @ %bb.4: @ %for.body
+; CHECK-NEXT:@ in Loop: Header=BB0_3 Depth=1
+; CHECK-NEXT:ldr r2, [sp]
+; CHECK-NEXT:ldr r3, [sp]
+; CHECK-NEXT:str.w r2, [r1, r3, lsl #2]
+; CHECK-NEXT:ldr r2, [sp]
+; CHECK-NEXT:adds r2, #1
+; CHECK-NEXT:str r2, [sp]
+; CHECK-NEXT:b .LBB0_3
+; CHECK-NEXT:  .LBB0_5: @ %for.end
+; CHECK-NEXT:add.w sp, sp, #79872
+; CHECK-NEXT:add sp, #132
+; CHECK-NEXT:pop {r4, r5}
+; CHECK-NEXT:bx lr
+entry:
+  %stack = alloca [2 x i32], align 4
+  %i = alloca i32, align 4
+  store volatile i32 0, ptr %i, align 4
+  br label %for.cond
+
+for.cond: ; preds = %for.inc, %entry
+  %0 = load volatile i32, ptr %i, align 4
+  %cmp = icmp ult i32 %0, 2
+  br i1 %cmp, label %for.body, label %for.end
+
+for.body: ; preds = %for.cond
+  %1 = load volatile i32, ptr %i, align 4
+  %2 = load volatile i32, ptr %i, align 4
+  %arrayidx = getelementptr inbounds [2 x i32], ptr %stack, i32 0, i32 %2
+  store volatile i32 %1, ptr %arrayidx, align 4
+  br label %for.inc
+
+for.inc:  ; preds = %for.body
+  %3 = load volatile i32, ptr %i, align 4
+  %inc = add nsw i32 %3, 1
+  store volatile i32 %inc, ptr %i, align 4
+  br label %for.cond, !llvm.loop !6
+
+for.end:  ; preds = %for.cond
+  ret void
+}
+
+; Function Attrs: noinline nounwind optnone
+define dso_local void @vla(i32 noundef %n) "probe-stack"="inline-asm" "frame-pointer"="none" {
+; CHECK-LABEL: vla:
+; CHECK:   @ %bb.0: @ %entry
+; CHECK-NEXT:.save {r4, r6, r7, lr}
+; CHECK-NEXT:push {r4, r6, r7, lr}
+; CHECK-NEXT:.setfp r7, sp, #8
+; CHECK-NEXT:add r7, sp, #8
+; CHECK-NEXT:.pad #16
+; CHECK-NEXT:sub sp, #16
+; CHECK-NEXT:movs r1, #7
+; CHECK-NEXT:str r0, [r7, #-12]
+; CHECK-NEXT:add.w r1, r1, r0, lsl #2
+; CHECK-NEXT:str sp, [r7, #-16]
+; CHECK-NEXT:bic r1, r1, #7
+; CHECK-NEXT:sub.w r1,