[PATCH] D154911: Enabling fstack_clash_protection for arm32 bit, thumb and thumb2 mode
tnfchris added inline comments. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:3460 + } else if (EffectiveTriple.isArm() || EffectiveTriple.isThumb()) { +CmdArgs.push_back("-mstack-probe-size=1024"); + } efriedma wrote: > tnfchris wrote: > > efriedma wrote: > > > Why 1024? > > 1024 was experimentally determined by Arm and is part of the ABI for stack > > clash (which has not yet been published). It was determined by examining a > > large number of programs and looking at the function stack usages. 1024 > > covers 80-90% of programs such that we can minimize the number of probes > > required in the average cases. > There are actually multiple numbers involved here, no? One is the spacing of > probes, i.e. if allocating a large amount of stack, how many times you need > to probe; this is basically the page size of the target. the other is how > much unprobed space a function is allowed to allocate before calling another > function. Referring to the the AArch64 patch, -mstack-probe-size is the > former, the hardcoded "1024" is the latter. I hadn't looked at the patch in detail yet, I thought this was the probing offset. But you're right, what I thought of was `StackClashCallerGuard`, if `stack-probe-size` indeed the guard size itself, then yeah this would be wrong. It seems incorrect to allow it smaller than the page size. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D154911/new/ https://reviews.llvm.org/D154911 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D154911: Enabling fstack_clash_protection for arm32 bit, thumb and thumb2 mode
efriedma added inline comments. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:3460 + } else if (EffectiveTriple.isArm() || EffectiveTriple.isThumb()) { +CmdArgs.push_back("-mstack-probe-size=1024"); + } tnfchris wrote: > efriedma wrote: > > Why 1024? > 1024 was experimentally determined by Arm and is part of the ABI for stack > clash (which has not yet been published). It was determined by examining a > large number of programs and looking at the function stack usages. 1024 > covers 80-90% of programs such that we can minimize the number of probes > required in the average cases. There are actually multiple numbers involved here, no? One is the spacing of probes, i.e. if allocating a large amount of stack, how many times you need to probe; this is basically the page size of the target. the other is how much unprobed space a function is allowed to allocate before calling another function. Referring to the the AArch64 patch, -mstack-probe-size is the former, the hardcoded "1024" is the latter. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D154911/new/ https://reviews.llvm.org/D154911 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D154911: Enabling fstack_clash_protection for arm32 bit, thumb and thumb2 mode
tnfchris added inline comments. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:3460 + } else if (EffectiveTriple.isArm() || EffectiveTriple.isThumb()) { +CmdArgs.push_back("-mstack-probe-size=1024"); + } efriedma wrote: > Why 1024? 1024 was experimentally determined by Arm and is part of the ABI for stack clash (which has not yet been published). It was determined by examining a large number of programs and looking at the function stack usages. 1024 covers 80-90% of programs such that we can minimize the number of probes required in the average cases. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D154911/new/ https://reviews.llvm.org/D154911 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D154911: Enabling fstack_clash_protection for arm32 bit, thumb and thumb2 mode
efriedma added inline comments. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:3460 + } else if (EffectiveTriple.isArm() || EffectiveTriple.isThumb()) { +CmdArgs.push_back("-mstack-probe-size=1024"); + } Why 1024? Comment at: llvm/lib/Target/ARM/ARMFrameLowering.cpp:780 emitSPUpdate(isARM, MBB, MBBI, dl, TII, -NumBytes, MachineInstr::FrameSetup); DefCFAOffsetCandidates.addInst(std::prev(MBBI), NumBytes, true); Is this relevant? Are the other unmodified uses of emitSPUpdate() relevant? Comment at: llvm/lib/Target/ARM/ARMFrameLowering.cpp:979 if (STI.isTargetWindows() && WindowsRequiresStackProbe(MF, NumBytes)) { uint32_t NumWords = NumBytes >> 2; Can we try to unify the Windows/non-Windows codepaths to some extent? Having two independent codepaths each trying to decide when probing is necessary seems likely to lead to bugs. Comment at: llvm/test/CodeGen/ARM/stackProbing_arm.ll:19 +; CHECK-NEXT:.pad #4096 +; CHECK-NEXT:sub sp, sp, #4096 +; CHECK-NEXT:cmp sp, r0 Going straight from the function entry to here, you've decremented the stack by a total of 6276 bytes; this can jump over a 4kb guard page. Comment at: llvm/test/CodeGen/ARM/stackProbing_arm.ll:21 +; CHECK-NEXT:cmp sp, r0 +; CHECK-NEXT:str r0, [sp, #1024] +; CHECK-NEXT:bne .LBB0_1 From a security perspective, scattering pointers to the stack onto the stack is maybe not the best idea. Comment at: llvm/test/CodeGen/ARM/stackProbing_arm.ll:95 +; CHECK-NEXT:@ =>This Inner Loop Header: Depth=1 +; CHECK-NEXT:sub sp, sp, #4096 +; CHECK-NEXT:cmp sp, r3 Again, this is too many bytes. (At least, ignoring the stores before the loop, which don't appear to be intentionally inserted checks.) Comment at: llvm/test/CodeGen/ARM/stackProbing_thumb.ll:12 +; CHECK-NEXT:ldr r0, .LCPI0_1 +; CHECK-NEXT:subs r0, r0, r0 +; CHECK-NEXT:ldr r1, .LCPI0_2 This is a very fancy way of setting a register to zero. Have you done any testing that the generated code actually works? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D154911/new/ https://reviews.llvm.org/D154911 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D154911: Enabling fstack_clash_protection for arm32 bit, thumb and thumb2 mode
varunkumare99 created this revision. varunkumare99 added reviewers: tnfchris, efriedma, kristof.beyls, serge-sans-paille. Herald added a subscriber: hiraditya. Herald added a project: All. varunkumare99 requested review of this revision. Herald added subscribers: llvm-commits, cfe-commits, MaskRay. Herald added projects: clang, LLVM. adds stack probing instruction sequence for dynamic stack allocations, VLA's and constant arrays to protect against stack clash attacks. Depending on the size of the stack allocation, various probing sequences are generated: - straight line sequence of subtracts and stores - A loop allocating and probing one page size per iteration, plus a single probe to deal with the remainder. - A loop which moves the SP down to the target value held in a register, used when the allocation size is not known at compile-time reference: https://reviews.llvm.org/D96004 Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D154911 Files: clang/lib/CodeGen/CodeGenModule.cpp clang/lib/Driver/ToolChains/Clang.cpp llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp llvm/lib/Target/ARM/ARMFrameLowering.cpp llvm/lib/Target/ARM/ARMFrameLowering.h llvm/lib/Target/ARM/ARMISelLowering.cpp llvm/lib/Target/ARM/ARMISelLowering.h llvm/lib/Target/ARM/ARMInstrInfo.td llvm/lib/Target/ARM/Thumb1FrameLowering.cpp llvm/lib/Target/ARM/Thumb1FrameLowering.h llvm/test/CodeGen/ARM/stackProbing_arm.ll llvm/test/CodeGen/ARM/stackProbing_thumb.ll llvm/test/CodeGen/Thumb2/stackProbing_thumb2.ll Index: llvm/test/CodeGen/Thumb2/stackProbing_thumb2.ll === --- /dev/null +++ llvm/test/CodeGen/Thumb2/stackProbing_thumb2.ll @@ -0,0 +1,188 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2 +; RUN: llc -mtriple thumbv8.1m.main-none-linux-eabi < %s -verify-machineinstrs | FileCheck %s +; Function Attrs: noinline nounwind optnone +define dso_local void @large_stack() "probe-stack"="inline-asm" "frame-pointer"="none" { +; CHECK-LABEL: large_stack: +; CHECK: @ %bb.0: @ %entry +; CHECK-NEXT:.save {r4, r5} +; CHECK-NEXT:push {r4, r5} +; CHECK-NEXT:sub.w r0, sp, #79872 +; CHECK-NEXT:subs r0, #132 +; CHECK-NEXT:.pad #2180 +; CHECK-NEXT:subw sp, sp, #2180 +; CHECK-NEXT: .LBB0_1: @ %entry +; CHECK-NEXT:@ =>This Inner Loop Header: Depth=1 +; CHECK-NEXT:.pad #4096 +; CHECK-NEXT:sub.w sp, sp, #4096 +; CHECK-NEXT:cmp sp, r0 +; CHECK-NEXT:str.w r0, [sp, #1024] +; CHECK-NEXT:bne .LBB0_1 +; CHECK-NEXT: @ %bb.2: @ %entry +; CHECK-NEXT:movs r0, #0 +; CHECK-NEXT:add r1, sp, #4 +; CHECK-NEXT:str r0, [sp] +; CHECK-NEXT:movw r0, #1 +; CHECK-NEXT: .LBB0_3: @ %for.cond +; CHECK-NEXT:@ =>This Inner Loop Header: Depth=1 +; CHECK-NEXT:ldr r2, [sp] +; CHECK-NEXT:cmp r2, r0 +; CHECK-NEXT:bhi .LBB0_5 +; CHECK-NEXT: @ %bb.4: @ %for.body +; CHECK-NEXT:@ in Loop: Header=BB0_3 Depth=1 +; CHECK-NEXT:ldr r2, [sp] +; CHECK-NEXT:ldr r3, [sp] +; CHECK-NEXT:str.w r2, [r1, r3, lsl #2] +; CHECK-NEXT:ldr r2, [sp] +; CHECK-NEXT:adds r2, #1 +; CHECK-NEXT:str r2, [sp] +; CHECK-NEXT:b .LBB0_3 +; CHECK-NEXT: .LBB0_5: @ %for.end +; CHECK-NEXT:add.w sp, sp, #79872 +; CHECK-NEXT:add sp, #132 +; CHECK-NEXT:pop {r4, r5} +; CHECK-NEXT:bx lr +entry: + %stack = alloca [2 x i32], align 4 + %i = alloca i32, align 4 + store volatile i32 0, ptr %i, align 4 + br label %for.cond + +for.cond: ; preds = %for.inc, %entry + %0 = load volatile i32, ptr %i, align 4 + %cmp = icmp ult i32 %0, 2 + br i1 %cmp, label %for.body, label %for.end + +for.body: ; preds = %for.cond + %1 = load volatile i32, ptr %i, align 4 + %2 = load volatile i32, ptr %i, align 4 + %arrayidx = getelementptr inbounds [2 x i32], ptr %stack, i32 0, i32 %2 + store volatile i32 %1, ptr %arrayidx, align 4 + br label %for.inc + +for.inc: ; preds = %for.body + %3 = load volatile i32, ptr %i, align 4 + %inc = add nsw i32 %3, 1 + store volatile i32 %inc, ptr %i, align 4 + br label %for.cond, !llvm.loop !6 + +for.end: ; preds = %for.cond + ret void +} + +; Function Attrs: noinline nounwind optnone +define dso_local void @vla(i32 noundef %n) "probe-stack"="inline-asm" "frame-pointer"="none" { +; CHECK-LABEL: vla: +; CHECK: @ %bb.0: @ %entry +; CHECK-NEXT:.save {r4, r6, r7, lr} +; CHECK-NEXT:push {r4, r6, r7, lr} +; CHECK-NEXT:.setfp r7, sp, #8 +; CHECK-NEXT:add r7, sp, #8 +; CHECK-NEXT:.pad #16 +; CHECK-NEXT:sub sp, #16 +; CHECK-NEXT:movs r1, #7 +; CHECK-NEXT:str r0, [r7, #-12] +; CHECK-NEXT:add.w r1, r1, r0, lsl #2 +; CHECK-NEXT:str sp, [r7, #-16] +; CHECK-NEXT:bic r1, r1, #7 +; CHECK-NEXT:sub.w r1,