[clang] [llvm] [ASAN] For Asan instrumented global, emit two symbols, one with actual size and other with instrumented size. (PR #70166)

2024-02-15 Thread via cfe-commits

b-sumner wrote:

Hi @hctim, I'm really not seeing how this patch could possibly prevent ASAN or 
other sanitizers from arbitrarily changing their implementations or placing 
arbitrary data in redzones.  This patch is merely introducing additional 
symbols that overlap with the uninstrumented parts of instrumented objects.   
We would need to do this again if we implemented MSAN, so would be fine with 
not restricting it to ASAN if that is the concern.

Of course our runtime could avoid checking the shadow when copying, but that 
would be just as crippling as disabling checking by memcpy() and memmove().  
It's not an option.

https://github.com/llvm/llvm-project/pull/70166
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [ASAN] For Asan instrumented global, emit two symbols, one with actual size and other with instrumented size. (PR #70166)

2024-02-15 Thread Mitch Phillips via cfe-commits

hctim wrote:

Messing around with global variables (changing their size, padding, alignment) 
is a common theme amongst sanitizers. We'd therefore want any strategy applied 
to ASan to be generic and apply across other sanitizers.

The patch might not cause issues right now with ASan - but I think it's best to 
reserve the right to change the way that ASan's global variable instrumentation 
works. The padding bytes are currently unused. But, I've heard at least one 
idea come across my desk to change that and to use the "free" metadata space. 
That would be incompatible with your patch. In contrary - I still don't know 
why the small section of AMDGPU's symbol-copying code can't be just built 
without `__attribute__((no_sanitize("address")))`

https://github.com/llvm/llvm-project/pull/70166
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [ASAN] For Asan instrumented global, emit two symbols, one with actual size and other with instrumented size. (PR #70166)

2024-02-14 Thread via cfe-commits

skc7 wrote:

Hi @hctim,
In this patch, we are identifying globals that are instrumented using an new IR 
attribute and for them extra symbol with padded size is emitted. This is done 
only in ASAN pass. Not sure how would that affect HWAsan and MTE-globals. Could 
you please give an example for this ?
At AMD, we have complex HPC/AI applications and libraries and this patch seems 
to not create any issue as such. Could you please give a concrete example test 
case where this patch would create an issue?

https://github.com/llvm/llvm-project/pull/70166
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [ASAN] For Asan instrumented global, emit two symbols, one with actual size and other with instrumented size. (PR #70166)

2024-02-09 Thread Mitch Phillips via cfe-commits

hctim wrote:

Hi,

My apologies for the delay.

I started to hack together the same idea using `SanitizerMetadata`, which is 
definitely the preferred way of adding sanitizer globalvariable instrumentation 
(given that everything else already lives there). You can see the WIP here: 
https://pastebin.com/vXzHmnMg

Part way through patching up all the tests though, I've come to the conclusion 
that this really isn't a great idea. For just ASan, sure we can just emit a 
secondary symbol and everything is hunky dory. But I think that we shouldn't do 
something like this for just ASan, it needs to also work for HWASan and 
MTE-globals as well. And it really doesn't compose well with the MTE-globals 
model. If you were to emit the unpadded symbol as the dominant name (from the 
c/cpp definition), then the linker gets super confused if you're linking 
multiple DSOs, as it needs to ensure that *all* of the DSOs use the tagged 
definition (or they all get demoted to being untagged). So, you'd need the 
padded one to be the dominant name, which I think is not what you want.

This feels like a behaviour that we don't want the additional complexity for 
AMDGPU's drivers, because I highly suspect you can just deal with the 
sanitizers in the AMDGPU driver itself.

How does your driver work? If it's basically doing a `dlsym` -> `memcpy` (and 
the memcpy interceptor in ASan is firing because you're going OOB on the 
unpadded symbol), can't you just have an `internal_memcpy` that's built with 
`__attribute__((no_sanitize("address")))`?

We'd also potentially like to put some data in the padding. This has been 
discussed between some colleagues and I. If you end up not copying the padding 
as well, you might drop some metadata and things could explode in future.

HTH,
Mitch.

https://github.com/llvm/llvm-project/pull/70166
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [ASAN] For Asan instrumented global, emit two symbols, one with actual size and other with instrumented size. (PR #70166)

2024-02-02 Thread via cfe-commits

skc7 wrote:

@hctim @MaskRay @vitalybuka #70166 and #68865 have been pending for review and 
approvals for few months now. All the feedback has been useful in improving the 
patch. Made the changes to patches as per feedback.
Please review.

https://github.com/llvm/llvm-project/pull/70166
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [ASAN] For Asan instrumented global, emit two symbols, one with actual size and other with instrumented size. (PR #70166)

2024-01-19 Thread via cfe-commits

skc7 wrote:

#68865 and #70166 have been rebased and updated as per review comments. Please 
review.

https://github.com/llvm/llvm-project/pull/70166
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [ASAN] For Asan instrumented global, emit two symbols, one with actual size and other with instrumented size. (PR #70166)

2024-01-19 Thread via cfe-commits

https://github.com/skc7 updated https://github.com/llvm/llvm-project/pull/70166

>From 45d7b14e494ce771460ba263c5573f0fb4715246 Mon Sep 17 00:00:00 2001
From: skc7 
Date: Wed, 25 Oct 2023 10:46:10 +0530
Subject: [PATCH] [ASAN] For Asan instrumented globals, emit two symbols, with
 actual size and instrumented size.

---
 clang/test/CodeGen/asan_globals_symbols.cpp   | 15 ++
 llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp| 28 ++-
 .../Instrumentation/AddressSanitizer.cpp  |  3 ++
 .../AddressSanitizer/debug-info-global-var.ll |  2 +-
 4 files changed, 46 insertions(+), 2 deletions(-)
 create mode 100644 clang/test/CodeGen/asan_globals_symbols.cpp

diff --git a/clang/test/CodeGen/asan_globals_symbols.cpp 
b/clang/test/CodeGen/asan_globals_symbols.cpp
new file mode 100644
index 00..d53afb2433b171
--- /dev/null
+++ b/clang/test/CodeGen/asan_globals_symbols.cpp
@@ -0,0 +1,15 @@
+// RUN: %clang_cc1 -S -x c++ -std=c++11 -triple x86_64-linux \
+// RUN:   -fsanitize=address -o %t.out %s
+// RUN: FileCheck %s --input-file=%t.out --check-prefix=CHECK-A
+
+// CHECK-A: myGlobal:
+// CHECK-A: .size   myGlobal, 4
+// CHECK-A: myGlobal__sanitized_padded_global:
+// CHECK-A  .size   myGlobal__sanitized_padded_global, 32
+
+int myGlobal;
+
+int main() {
+myGlobal = 0;
+return 0;
+}
diff --git a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp 
b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
index 7df1c82bf357f6..cf1bdde6fc3425 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
@@ -801,6 +801,19 @@ void AsmPrinter::emitGlobalVariable(const GlobalVariable 
*GV) {
   // sections and expected to be contiguous (e.g. ObjC metadata).
   const Align Alignment = getGVAlignment(GV, DL);
 
+  // Identify globals with "SanitizedPaddedGlobal" attribute and extract
+  // the actual global variable size.
+  uint64_t ActualSize = 0;
+  if (GV->hasAttribute(Attribute::SanitizedPaddedGlobal)) {
+StructType *ST = dyn_cast(GV->getValueType());
+if (ST && ST->getNumElements() == 2) {
+  auto *ET0 = ST->getElementType(0);
+  if (ET0 && isa(ST->getElementType(1))) {
+ActualSize = DL.getTypeAllocSize(ET0);
+  }
+}
+  }
+
   for (const HandlerInfo  : Handlers) {
 NamedRegionTimer T(HI.TimerName, HI.TimerDescription,
HI.TimerGroupName, HI.TimerGroupDescription,
@@ -911,6 +924,18 @@ void AsmPrinter::emitGlobalVariable(const GlobalVariable 
*GV) {
 
   MCSymbol *EmittedInitSym = GVSym;
 
+  if (GV->hasAttribute(Attribute::SanitizedPaddedGlobal)) {
+OutStreamer->switchSection(TheSection);
+emitLinkage(GV, EmittedInitSym);
+OutStreamer->emitLabel(EmittedInitSym);
+if (MAI->hasDotTypeDotSizeDirective())
+  OutStreamer->emitELFSize(EmittedInitSym,
+   MCConstantExpr::create(ActualSize, OutContext));
+EmittedInitSym = OutContext.getOrCreateSymbol(
+GVSym->getName() + Twine("__sanitized_padded_global"));
+emitVisibility(EmittedInitSym, GV->getVisibility(), !GV->isDeclaration());
+  }
+
   OutStreamer->switchSection(TheSection);
 
   emitLinkage(GV, EmittedInitSym);
@@ -918,7 +943,8 @@ void AsmPrinter::emitGlobalVariable(const GlobalVariable 
*GV) {
 
   OutStreamer->emitLabel(EmittedInitSym);
   MCSymbol *LocalAlias = getSymbolPreferLocal(*GV);
-  if (LocalAlias != EmittedInitSym)
+  if ((LocalAlias != EmittedInitSym) &&
+  !GV->hasAttribute(Attribute::SanitizedPaddedGlobal))
 OutStreamer->emitLabel(LocalAlias);
 
   emitGlobalConstant(GV->getParent()->getDataLayout(), GV->getInitializer());
diff --git a/llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp 
b/llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp
index caab98c732eeeb..a0dbc5224b3ebc 100644
--- a/llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp
+++ b/llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp
@@ -2496,6 +2496,9 @@ void 
ModuleAddressSanitizer::instrumentGlobals(IRBuilder<> , Module ,
 // zero so we can copy the metadata over as is.
 NewGlobal->copyMetadata(G, 0);
 
+// Attach "SanitizedPaddedGlobal" attribute to the new global.
+NewGlobal->addAttribute(Attribute::SanitizedPaddedGlobal);
+
 Value *Indices2[2];
 Indices2[0] = IRB.getInt32(0);
 Indices2[1] = IRB.getInt32(0);
diff --git 
a/llvm/test/Instrumentation/AddressSanitizer/debug-info-global-var.ll 
b/llvm/test/Instrumentation/AddressSanitizer/debug-info-global-var.ll
index 0b516e0174d6d1..2815c1f04bff15 100644
--- a/llvm/test/Instrumentation/AddressSanitizer/debug-info-global-var.ll
+++ b/llvm/test/Instrumentation/AddressSanitizer/debug-info-global-var.ll
@@ -2,7 +2,7 @@
 source_filename = "version.c"
 target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-apple-macosx10.12.0"
-; CHECK: @version = constant { [5 x i8], [27 x i8] } {{.*}}, !dbg ![[GV:.*]]
+; CHECK: @version = constant { [5 x i8], [27 x i8] } {{.*}}, !dbg ![[GV:.*]] 

[clang] [llvm] [ASAN] For Asan instrumented global, emit two symbols, one with actual size and other with instrumented size. (PR #70166)

2023-11-01 Thread Mitch Phillips via cfe-commits

hctim wrote:

Will wait for a rebase on some changes requested in #68865, but it'd also be 
really important to write tests for a couple more scenarios:

 1. Building with `-fsanitize=address -S -emit-llvm` results in GVs with 
`sanitized_padded_global` (and results in GVs without `sanitized_padded_global` 
if they have `__attribute__((no_sanitize("address")))` for example). See 
`clang/test/CodeGen/memtag-globals.cpp`.
 2. Running #1 through bitcode back/forth to make sure you still have the same 
attributes sets (to ensure you got the parser/writer logic correct). See 
`llvm/test/Bitcode/compatibility.ll` and 
`llvm/test/Assembler/globalvariable-attributes.ll`.


https://github.com/llvm/llvm-project/pull/70166
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [ASAN] For Asan instrumented global, emit two symbols, one with actual size and other with instrumented size. (PR #70166)

2023-11-01 Thread Mitch Phillips via cfe-commits

hctim wrote:

> > It's my understanding your problem is that you are asan-trapping on the 
> > redzones when you copy data to/from the device. Is it possible instead to 
> > just make those copy-from and copy-to functions in the runtime 
> > `__attribute__((no_sanitize("address")))` and copy the padding as well?
> 
> Yes it is possible, but it would result in bugs being hidden...the very bugs 
> that ASAN is meant to catch. This is not an acceptable option for us.

Only bugs in the `copy_from_` and `copy_to_` functions though (which should be 
pretty clear)?

--

Alright, let's give this a crack. I'll leave some comments here and on #68865, 
but let's not worry about potential unknown problems and we can see what shakes 
out once we actually submit the patches.

https://github.com/llvm/llvm-project/pull/70166
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits