[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
efriedma-quic wrote: > IIUC you're suggesting movint the TBAA data origination from CodeGen into > (probably) Sema and/or TargetInfo and then deriving the LLVM info in CodeGen > from that. I.e. keep once source of truth, but move where it is? Right; move the core computation to AST, and then the CodeGen bits would just translate that data to LLVM IR. > Although you don't explicitly say it, the implication is you'd be ameanable > to such an approach? TBAA categorization is semantic information, so I think it makes sense to make it accessible outside CodeGen, yes. https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
urnathan wrote: > On the LLVM side, there's very little interesting logic; it's basically just > walking the tree of metadata nodes generated by clang. See > https://llvm.org/docs/LangRef.html#tbaa-node-semantics . The hard part of the > refactoring would just be adding an abstraction for the underlying > information. IIUC you're suggesting movint the TBAA data origination from CodeGen into (probably) Sema and/or TargetInfo and then deriving the LLVM info in CodeGen from that. I.e. keep once source of truth, but move where it is? I don't think it'd be that hard -- mostly mechanical. And I suspect only the scalar TBAA needs so moving, the structure-related stuff is sufficiently different to that strict-aliasing wants. Although you don't explicitly say it, the implication is you'd be ameanable to such an approach? https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
efriedma-quic wrote: > > Making Sema pull the TBAA info out of clang/lib/CodeGen is a layering > > violation (and probably breaks if we aren't actually generating code). If > > we need some notion of "aliasing" in Sema, we should pull the relevant code > > into clang/lib/AST. > > That's unfortunate. The code will not call the TBAA stuff if the code > generator doesn't provide that predicate -- so if not generating optimized > code the warning is inactive. This is the same as with GCC btw. To be clear > what's exposed is a new predicate allowing querying of type conversion's > TBAAness -- the representations of TBAA etc are not exposed. The CodeGen TBAA > machinery builds on llvm's aliasing MDNodes. It would seem a large task to > replicate that in AST (and I presume propagating llvm's bits further into AST > is even worse?) On the LLVM side, there's very little interesting logic; it's basically just walking the tree of metadata nodes generated by clang. See https://llvm.org/docs/LangRef.html#tbaa-node-semantics . The hard part of the refactoring would just be adding an abstraction for the underlying information. gcc has made various unfortunate decisions regarding diagnostics in the past; we've put a lot of effort into making sure our diagnostics are more consistent. https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
urnathan wrote: > > > FWIW the GCC doc is > > > https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wstrict-aliasing_003dn > > > It says for Level 3 "If optimization is enabled, it also runs in the > > > back end, where it deals with multiple statement cases using > > > flow-sensitive points-to information." > > > Do you know how it works? Any example? > > > > > > I do not now how it works -- didn't go poking there. Neither do I have > > examples. > > My understanding (which could be totally wrong) is that the logic for when to > emit the level 3 diagnostics rests mostly in the optimizer passes. This is > not something I think Clang should emulate as it creates a very poor user > experience in practice (not just with strict aliasing diagnostics -- I don't > think _any_ diagnostics other than remarks should be emitted based on LLVM > optimization decisions aside from the `error` and `warning` attributes which > are special circumstances). I don't know if it's 'mostly', the level 3 can certainly be triggered in the FE -- it requires (a) an address-of-object idiom, (b) a suspicious cast and (c) a dereference. Something like `*reinterpret_cast() = 5`, where `obj` is not compatible with `int`. That's what this patch implements. (In GCC's case when it warns about this in the FE it marks the tree as warned, to inhibit the middle end also doing so.) https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
@@ -498,3 +498,137 @@ CodeGenTBAA::mergeTBAAInfoForMemoryTransfer(TBAAAccessInfo DestInfo, // access type regardless of their base types. return TBAAAccessInfo::getMayAliasInfo(); } + +// Determine the aliasing kind bit-converting from type Src to type Dst. +CodeGenTBAA::AliasingKind CodeGenTBAA::getAliasingKind(QualType , + QualType ) { + assert(!Src->isVoidType() && !Dst->isVoidType()); + if (TypeHasMayAlias(Src) || TypeHasMayAlias(Dst)) +return AK_Ok; + + Src = QualType{Src->getBaseElementTypeUnsafe(), 0}; + Dst = QualType{Dst->getBaseElementTypeUnsafe(), 0}; + + auto *SrcDecl = Src->getAsRecordDecl(); + auto *DstDecl = Dst->getAsRecordDecl(); + + const llvm::MDNode *AnyTBAA = getChar(); + const llvm::MDNode *SrcTBAA = nullptr; + const llvm::MDNode *DstTBAA = nullptr; + + if (!SrcDecl) { +SrcTBAA = getTypeInfo(Src); +if (!SrcTBAA || SrcTBAA == AnyTBAA) urnathan wrote: Pedantically 'signed char' is not blessed with that property by the std. But many compilers (llvm included) bestow it upon signed char. Indeed this particular patch doesn't make a judgement on that, it just gets the aliasing info for 'signed char' and it happens to get the same ubiquitous char tbaa. https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
@@ -37,6 +38,27 @@ class ASTConsumer { friend class SemaConsumer; +public: + /// Allow type-based aliasing information to be interrogated by the AST + /// producer (for diagnostics). + class TypeAliasing { urnathan wrote: Oh, we also endup deriving from the TypeAliasing class to override the virtual function it provides. https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
@@ -0,0 +1,192 @@ +// RUN: %clang_cc1 %s -O0 -Wstrict-aliasing -S -o %t -verify=quiet +// RUN: %clang_cc1 %s -O2 -Wstrict-aliasing=0 -S -o %t -verify=quiet +// RUN: %clang_cc1 %s -O2 -Wno-strict-aliasing -S -o %t -verify=quiet +// RUN: %clang_cc1 %s -O2 -Wstrict-aliasing=1 -S -o %t -verify=level1,level12,level123 +// RUN: %clang_cc1 %s -O2 -Wstrict-aliasing=2 -S -o %t -verify=level2,level23,level12,level123 +// RUN: %clang_cc1 %s -O2 -Wstrict-aliasing=3 -S -o %t -verify=level23,level123 +// RUN: %clang_cc1 %s -O2 -Wstrict-aliasing -S -o %t -verify=level23,level123 +// RUN: %clang_cc1 %s -O2 -S -o %t -verify=level23,level123 + +// quiet-no-diagnostics + +#if _LP64 +// These names make more sense on an ilp32 machine +typedef long INT; +typedef long long LONG; +typedef unsigned long UINT; +#else +typedef int INT; +typedef long LONG; +typedef unsigned int UINT; +#endif +typedef short SHORT; + +INT ScalarINT; +INT Ary[2]; +struct {int m;} Struct; + +_Complex int CPLx; + +void ByVal(long long); +void ByPtr(void *); + +void VarPtr(INT *Ptr) { + // GCC: 1 + // level1-warning@+1{{type-punned pointer might break}} + ByPtr((LONG *)(Ptr)); + // level1-note@-1{{not alias compatible}} + + // GCC: + ByPtr((LONG *)((void *)(Ptr))); + + // GCC: 1 + // level1-warning@+1{{type-punned pointer might break}} + ByVal(*(LONG *)(Ptr)); + // level1-note@-1{{not alias compatible}} +} + +void Object() { + // GCC: 1, 2 + // level2-warning@+2{{type-punned pointer breaks}} + // level1-warning@+1{{type-punned pointer might break}} + ByPtr((LONG *)()); + // level12-note@-1{{not alias compatible}} + + // GCC: + ByPtr((LONG *)((void *)())); + + // GCC: 1, 2, 3 + // level23-warning@+2{{type-punned pointer breaks}} + // level1-warning@+1{{type-punned pointer might break}} + ByVal(*(LONG *)()); + // level123-note@-1{{not alias compatible}} +} + +// Level 1, 2, 3 - 1, 2, 3 +void DetectedVariants() { + // GCC: 1, 2, 3 + // level23-warning@+2{{type-punned pointer breaks}} + // level1-warning@+1{{type-punned pointer might break}} + ByVal(*(LONG *)([1])); + // level123-note@-1{{not alias compatible}} + + // GCC: 1, 2, 3 + // level23-warning@+2{{type-punned pointer breaks}} + // level1-warning@+1{{type-punned pointer might break}} + ByVal(*(LONG *)()); + // level123-note@-1{{not alias compatible}} + + // GCC: 1, 2, 3 + // level23-warning@+2{{type-punned pointer breaks}} + // level1-warning@+1{{type-punned pointer might break}} + ByVal(*(LONG *)(&()->m)); + // level123-note@-1{{not alias compatible}} + + // GCC: 1, 2, 3 + // level23-warning@+2{{type-punned pointer breaks}} + // level1-warning@+1{{type-punned pointer might break}} + ByVal(*(LONG *)(&__real__(CPLx))); + // level123-note@-1{{not alias compatible}} + + // GCC: 1, 2, 3 + // level23-warning@+2{{type-punned pointer breaks}} + // level1-warning@+1{{type-punned pointer might break}} + ByVal(*(LONG *)(&__imag__(CPLx))); + // level123-note@-1{{not alias compatible}} +} + +void Ok() { + // GCC: + ByPtr((UINT *)()); + // GCC: + ByPtr((UINT *)((void *)())); + // GCC: + ByVal(*(UINT *)()); +} + +// Level 1, 2, 3 - 1, 2, 3 +void Parens() { + // GCC: 1, 2, 3 + // level23-warning@+2{{type-punned pointer breaks}} + // level1-warning@+1{{type-punned pointer might break}} + ByVal(*((LONG *)((&(ScalarINT); + // level123-note@-1{{not alias compatible}} + + // GCC: 1, 2, 3 + // level23-warning@+2{{type-punned pointer breaks}} + // level1-warning@+1{{type-punned pointer might break}} + ByVal(*((LONG *)((&(Ary[1]); + // level123-note@-1{{not alias compatible}} +} + +// Clang models may_alias as a decl attribute, not a type attribute. + +typedef int MA __attribute__((may_alias)); + +void Frob(MA *a) { + ByPtr((short *)(a)); + ByVal(*(short *)(a)); +} + +struct Inner { int m; }; +struct Outer1 { struct Inner i; }; +struct Outer2 { struct Outer1 o; }; +struct Inner i; +struct Outer2 o; + +void ByValInner (struct Inner); +void ByValOuter2 (struct Outer2); + +void Inherit() { + // Check we see through multiple levels + int in; + + ByValOuter2(*(struct Outer2 *)); + ByValOuter2(*(struct Outer2 *)); + ByValInner(*(struct Inner *)); + ByValInner(*(struct Inner *)); + ByVal(*(int *)); +} + +// PR 50066 +typedef unsigned char uchar; + +void Double(double); + +int main() { + double d = 2.34; + int i[2]; + Double(d); + + // GCC: 1, 2, 3 + // level23-warning@+2{{type-punned pointer breaks}} + // level1-warning@+1{{type-punned pointer might break}} + *(long long *)i = + // level123-note@-1{{not alias compatible}} + + // GCC: 1, 2, 3 + // level23-warning@+2{{type-punned pointer breaks}} + // level1-warning@+1{{type-punned pointer might break}} + *(long long *) + // level123-note@-1{{not alias compatible}} + + // GCC: 1, 2, 3 + // level23-warning@+2{{type-punned pointer breaks}} + // level1-warning@+1{{type-punned pointer might break}} + ((int *))[0] = i[0]; + // level123-note@-1{{not
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
https://github.com/urnathan updated https://github.com/llvm/llvm-project/pull/74155 >From 89c05618bb75fd073343046f3b54bde5f2254719 Mon Sep 17 00:00:00 2001 From: Nathan Sidwell Date: Wed, 15 Nov 2023 15:27:16 -0500 Subject: [PATCH 1/6] [clang] Strict aliasing warning ala GCC [PR50066] This implements -Wstrict-aliasing(=[123])? along the same lines as GCC. It's not 100% the same for reasons expanded on below. The default is level 3, and I have verified that bootstrapping does not trigger any warnings (just like building with GCC). As with GCC, higher levels result in fewer warnings, reducing the number of false positives at the cost of missing (some) potential cases. Unlike GCC, this is entirely in the FE, we do not propagate any checking into the IR (so there are cases GCC will detect we do not, I have not encountered any). GCC's documentation is not very specific about which cases are detected. I examined GCC's source code to reverse engineer the algorithm[*], and as can be seen from the testcases, noted GCC's behaviour. The strict aliasing check relies on TBAA. LLVM's representation is semantically different to GCCs for structured types. I have tried to keep with the spirit of the warning. The warning checks reinterpret_casts that are CK_BitCast or CK_LValueBitCast. It also checks C-style casts that are equivalent (to the extent available, as a C-style bitcast could be a well-formed static cast). level=1 looks for reinterpret casts from T * to U * (or an lvalue of type T to U &), where T and U are not TBAA compatible. level=2 requires the src expression be an lvalue something of known(ish) static type. I.e a variable, array dereference or member access. level=3 requires level 2 and that the resultant pointer is actually referenced (lvalue->rvalue or lvalue written). Here we can do better than GCC, which doesn't represent this in the IR -- we merely get a dereference (and reference types are modeled as pointers with auto-dereference semantics). There is one exception to this, which is by-value aggregate arguments. These end up invoking a copy constructor (passing a reference of course), but are IMHO morally an rvalue -- so should trigger. The warning hooks into clang's code-generator's TBAA machinery. For scalar types the TBAA is straight forwards, comparing LLVM's MDNode representaion. For record & union types we check if one of the types is (recursively) the same (or TBAA compatible) as the first direct base or a field(s) of the record at offset zero (i.e. the first member of a record, or any members of a union). This covers both C++ and C-Style inheritance. Also. the member maybe alias_any, or in ubiquitous-char's alias set, which is also permissible. The warning is similar to the alignment check that CheckCompatibleReinterpretCast does, perhaps some merging could occur if this is accepted? [*] I implemented what turned into GCC's level=1 way back when. WIP: adjust tests --- clang/include/clang/AST/ASTConsumer.h | 25 + clang/include/clang/Basic/DiagnosticGroups.td |6 - clang/include/clang/Basic/DiagnosticOptions.h |4 + .../clang/Basic/DiagnosticSemaKinds.td|8 + clang/include/clang/Driver/Options.td |6 + clang/include/clang/Sema/Sema.h | 11 + clang/lib/CodeGen/BackendConsumer.h |1 + clang/lib/CodeGen/CodeGenAction.cpp |4 + clang/lib/CodeGen/CodeGenModule.h |1 + clang/lib/CodeGen/CodeGenTBAA.cpp | 134 ++ clang/lib/CodeGen/CodeGenTBAA.h |5 +- clang/lib/CodeGen/ModuleBuilder.cpp |2 + clang/lib/Frontend/CompilerInvocation.cpp |3 + clang/lib/Frontend/FrontendAction.cpp |8 + clang/lib/Sema/SemaCast.cpp | 139 ++ clang/lib/Sema/SemaExpr.cpp | 17 +- clang/lib/Sema/SemaExprMember.cpp |2 + clang/lib/Sema/SemaInit.cpp |5 + clang/test/Misc/warning-flags-tree.c |4 - clang/test/Sema/strict-aliasing-warn.c| 192 +++ clang/test/SemaCXX/strict-aliasing-warn.cpp | 1375 + 21 files changed, 1939 insertions(+), 13 deletions(-) create mode 100644 clang/test/Sema/strict-aliasing-warn.c create mode 100644 clang/test/SemaCXX/strict-aliasing-warn.cpp diff --git a/clang/include/clang/AST/ASTConsumer.h b/clang/include/clang/AST/ASTConsumer.h index ebcd8059284d8..d5731ed6adf63 100644 --- a/clang/include/clang/AST/ASTConsumer.h +++ b/clang/include/clang/AST/ASTConsumer.h @@ -21,6 +21,7 @@ namespace clang { class DeclGroupRef; class ASTMutationListener; class ASTDeserializationListener; // layering violation because void* is ugly + class QualType; class SemaConsumer; // layering violation required for safe SemaConsumer class TagDecl; class VarDecl; @@ -37,6 +38,27 @@ class ASTConsumer { friend class SemaConsumer; +public: + /// Allow type-based aliasing information to be interrogated by
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
@@ -128,6 +128,10 @@ class DiagnosticOptions : public RefCountedBase{ /// whether -Wsystem-headers is enabled on a per-module basis. std::vector SystemHeaderWarningsModules; + /// Level of scrutiny reinterpret_casts get for type-unsafe aliasing + /// checks. Requires an ASTConsumer that provides TBAA information. + unsigned StrictAliasing; urnathan wrote: I was misled by the -Wundef-prefix pattern that I followed. Should have looked closer and found the DiagnosticOptions.def file. https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
@@ -498,3 +498,137 @@ CodeGenTBAA::mergeTBAAInfoForMemoryTransfer(TBAAAccessInfo DestInfo, // access type regardless of their base types. return TBAAAccessInfo::getMayAliasInfo(); } + +// Determine the aliasing kind bit-converting from type Src to type Dst. +CodeGenTBAA::AliasingKind CodeGenTBAA::getAliasingKind(QualType , + QualType ) { + assert(!Src->isVoidType() && !Dst->isVoidType()); + if (TypeHasMayAlias(Src) || TypeHasMayAlias(Dst)) +return AK_Ok; + + Src = QualType{Src->getBaseElementTypeUnsafe(), 0}; + Dst = QualType{Dst->getBaseElementTypeUnsafe(), 0}; + + auto *SrcDecl = Src->getAsRecordDecl(); + auto *DstDecl = Dst->getAsRecordDecl(); + + const llvm::MDNode *AnyTBAA = getChar(); + const llvm::MDNode *SrcTBAA = nullptr; + const llvm::MDNode *DstTBAA = nullptr; + + if (!SrcDecl) { +SrcTBAA = getTypeInfo(Src); +if (!SrcTBAA || SrcTBAA == AnyTBAA) Endilll wrote: Are you sure `signed char` is allowed to alias everything, like `char` and `unsigned char`? It's not there in the strict aliasing rule. https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
@@ -37,6 +38,27 @@ class ASTConsumer { friend class SemaConsumer; +public: + /// Allow type-based aliasing information to be interrogated by the AST + /// producer (for diagnostics). + class TypeAliasing { Endilll wrote: Sorry I didn't make myself clear enough. I thought that `TypeAliasing` exists for the sole purpose of providing a scope for unscoped enumeration, so my idea was that this whole class can be replaces with a scoped enum. https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
@@ -498,3 +498,137 @@ CodeGenTBAA::mergeTBAAInfoForMemoryTransfer(TBAAAccessInfo DestInfo, // access type regardless of their base types. return TBAAAccessInfo::getMayAliasInfo(); } + +// Determine the aliasing kind bit-converting from type Src to type Dst. +CodeGenTBAA::AliasingKind CodeGenTBAA::getAliasingKind(QualType , + QualType ) { + assert(!Src->isVoidType() && !Dst->isVoidType()); + if (TypeHasMayAlias(Src) || TypeHasMayAlias(Dst)) +return AK_Ok; + + Src = QualType{Src->getBaseElementTypeUnsafe(), 0}; + Dst = QualType{Dst->getBaseElementTypeUnsafe(), 0}; + + auto *SrcDecl = Src->getAsRecordDecl(); + auto *DstDecl = Dst->getAsRecordDecl(); + + const llvm::MDNode *AnyTBAA = getChar(); + const llvm::MDNode *SrcTBAA = nullptr; + const llvm::MDNode *DstTBAA = nullptr; + + if (!SrcDecl) { +SrcTBAA = getTypeInfo(Src); +if (!SrcTBAA || SrcTBAA == AnyTBAA) + return AK_Ok; + } + if (!DstDecl) { +DstTBAA = getTypeInfo(Dst); +if (!DstTBAA || DstTBAA == AnyTBAA) + return AK_Ok; + } + + auto IsAncestor = [](const llvm::MDNode *Ancestor, + const llvm::MDNode *Descendant) { +assert(Ancestor != Descendant && "Identical TBAA"); +while (Descendant->getNumOperands() != 1) { + Descendant = cast(Descendant->getOperand(1)); + if (Descendant == Ancestor) +return true; +} +return false; + }; + + assert(SrcTBAA != AnyTBAA && DstTBAA != AnyTBAA && Endilll wrote: Sure, I'm not insisting. https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
AaronBallman wrote: > > FWIW the GCC doc is > > https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wstrict-aliasing_003dn > > It says for Level 3 "If optimization is enabled, it also runs in the back > > end, where it deals with multiple statement cases using flow-sensitive > > points-to information." > > Do you know how it works? Any example? > > I do not now how it works -- didn't go poking there. Neither do I have > examples. My understanding (which could be totally wrong) is that the logic for when to emit the level 3 diagnostics rests mostly in the optimizer passes. This is not something I think Clang should emulate as it creates a very poor user experience in practice (not just with strict aliasing diagnostics -- I don't think *any* diagnostics other than remarks should be emitted based on LLVM optimization decisions aside from the `error` and `warning` attributes which are special circumstances). https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
https://github.com/urnathan edited https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
urnathan wrote: > Thank you for your efforts! I scratched the surface a bit; not qualified to > do an in-depth review. Can you also add a release note? Thanks for your comments. A release note is added. https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
urnathan wrote: > FWIW the GCC doc is > https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wstrict-aliasing_003dn > It says for Level 3 "If optimization is enabled, it also runs in the back > end, where it deals with multiple statement cases using flow-sensitive > points-to information." > > Do you know how it works? Any example? I do not now how it works -- didn't go poking there. Neither do I have examples. https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
urnathan wrote: > Making Sema pull the TBAA info out of clang/lib/CodeGen is a layering > violation (and probably breaks if we aren't actually generating code). If we > need some notion of "aliasing" in Sema, we should pull the relevant code into > clang/lib/AST. That's unfortunate. The code will not call the TBAA stuff if the code generator doesn't provide that predicate -- so if not generating optimized code the warning is inactive. This is the same as with GCC btw. To be clear what's exposed is a new predicate allowing querying of type conversion's TBAAness -- the representations of TBAA etc are not exposed. The CodeGen TBAA machinery builds on llvm's aliasing MDNodes. It would seem a large task to replicate that in AST (and I presume propagating llvm's bits further into AST is even worse?) > I don't have any experience with this warning, so I'm not sure how effective > it is at detecting issues; do a significant fraction of issues actually show > up at a pure syntactic level? Do you have any idea what false positive rate > we should expect? I think at level 3 the false positive rate is very low. Given GCC's extended this over the years from the simple check I had (IIRC any reinterpret_cast between pointers/references to TBAA incompatible types) to the semantics it has now suggests the warning is useful. and the level 3 default is satisfactory. But I don't have hard data. https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
@@ -2026,6 +2027,137 @@ static TryCastResult TryConstCast(Sema , ExprResult , return TC_Success; } +// We're dereferencing E, either by turning into an RValue, or by dereferencing +// it. Check whether it's a deref of a reinterpret cast that has aliasing +// issues. +void Sema::CheckStrictAliasingDeref(Expr const *E, bool IsLValue) { + if (Diags.getDiagnosticOptions().StrictAliasing < 3) +return; + + assert(IsLValue || E->getType()->isAnyPointerType()); + CastExpr const *CE = nullptr; + for (;;) { +CE = dyn_cast(E->IgnoreParens()); +if (!CE) + return; + +if (IsLValue || CE->getCastKind() != CK_ArrayToPointerDecay) + break; + +E = CE->getSubExpr(); +IsLValue = true; + } + + if (CE->getCastKind() != (IsLValue ? CK_LValueBitCast : CK_BitCast)) +return; + + if (CE->getSubExpr()->getType()->isVoidPointerType()) +return; + + QualType DestTy = CE->getType(); + if (!IsLValue) +DestTy = DestTy->getPointeeType(); + + CheckStrictAliasing(CE->getSubExpr(), DestTy, IsLValue, CE->getSourceRange()); +} + +/// We're building a cast from E to pointer type DestType. If ISLValueCast is +/// true, DestType is the pointer equivalent of the reference type we're casting +/// to. +void Sema::CheckStrictAliasingCast(Expr const *E, QualType DestType, + bool IsLValueCast, SourceRange Range) { + if (Diags.getDiagnosticOptions().StrictAliasing < 1 || urnathan wrote: Heh, that's amusing :) this started as a switch, and then changed when I found the switch to be line noise.. https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
@@ -498,3 +498,137 @@ CodeGenTBAA::mergeTBAAInfoForMemoryTransfer(TBAAAccessInfo DestInfo, // access type regardless of their base types. return TBAAAccessInfo::getMayAliasInfo(); } + +// Determine the aliasing kind bit-converting from type Src to type Dst. +CodeGenTBAA::AliasingKind CodeGenTBAA::getAliasingKind(QualType , + QualType ) { + assert(!Src->isVoidType() && !Dst->isVoidType()); + if (TypeHasMayAlias(Src) || TypeHasMayAlias(Dst)) +return AK_Ok; + + Src = QualType{Src->getBaseElementTypeUnsafe(), 0}; + Dst = QualType{Dst->getBaseElementTypeUnsafe(), 0}; + + auto *SrcDecl = Src->getAsRecordDecl(); + auto *DstDecl = Dst->getAsRecordDecl(); + + const llvm::MDNode *AnyTBAA = getChar(); + const llvm::MDNode *SrcTBAA = nullptr; + const llvm::MDNode *DstTBAA = nullptr; + + if (!SrcDecl) { +SrcTBAA = getTypeInfo(Src); +if (!SrcTBAA || SrcTBAA == AnyTBAA) + return AK_Ok; + } + if (!DstDecl) { +DstTBAA = getTypeInfo(Dst); +if (!DstTBAA || DstTBAA == AnyTBAA) + return AK_Ok; + } + + auto IsAncestor = [](const llvm::MDNode *Ancestor, + const llvm::MDNode *Descendant) { +assert(Ancestor != Descendant && "Identical TBAA"); +while (Descendant->getNumOperands() != 1) { + Descendant = cast(Descendant->getOperand(1)); + if (Descendant == Ancestor) +return true; +} +return false; + }; + + assert(SrcTBAA != AnyTBAA && DstTBAA != AnyTBAA && urnathan wrote: Ah, I found the current placement better -- as a reminder to the code below that that particular condition did not need to be handled. The assert got added explicitly when I needed that reminder! https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
@@ -498,3 +498,137 @@ CodeGenTBAA::mergeTBAAInfoForMemoryTransfer(TBAAAccessInfo DestInfo, // access type regardless of their base types. return TBAAAccessInfo::getMayAliasInfo(); } + +// Determine the aliasing kind bit-converting from type Src to type Dst. +CodeGenTBAA::AliasingKind CodeGenTBAA::getAliasingKind(QualType , + QualType ) { + assert(!Src->isVoidType() && !Dst->isVoidType()); + if (TypeHasMayAlias(Src) || TypeHasMayAlias(Dst)) +return AK_Ok; + + Src = QualType{Src->getBaseElementTypeUnsafe(), 0}; + Dst = QualType{Dst->getBaseElementTypeUnsafe(), 0}; + + auto *SrcDecl = Src->getAsRecordDecl(); + auto *DstDecl = Dst->getAsRecordDecl(); + + const llvm::MDNode *AnyTBAA = getChar(); + const llvm::MDNode *SrcTBAA = nullptr; + const llvm::MDNode *DstTBAA = nullptr; + + if (!SrcDecl) { +SrcTBAA = getTypeInfo(Src); +if (!SrcTBAA || SrcTBAA == AnyTBAA) urnathan wrote: Yeah, I didn't because it was a class member, but scoped is better. done. https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
https://github.com/urnathan updated https://github.com/llvm/llvm-project/pull/74155 >From 89c05618bb75fd073343046f3b54bde5f2254719 Mon Sep 17 00:00:00 2001 From: Nathan Sidwell Date: Wed, 15 Nov 2023 15:27:16 -0500 Subject: [PATCH 1/5] [clang] Strict aliasing warning ala GCC [PR50066] This implements -Wstrict-aliasing(=[123])? along the same lines as GCC. It's not 100% the same for reasons expanded on below. The default is level 3, and I have verified that bootstrapping does not trigger any warnings (just like building with GCC). As with GCC, higher levels result in fewer warnings, reducing the number of false positives at the cost of missing (some) potential cases. Unlike GCC, this is entirely in the FE, we do not propagate any checking into the IR (so there are cases GCC will detect we do not, I have not encountered any). GCC's documentation is not very specific about which cases are detected. I examined GCC's source code to reverse engineer the algorithm[*], and as can be seen from the testcases, noted GCC's behaviour. The strict aliasing check relies on TBAA. LLVM's representation is semantically different to GCCs for structured types. I have tried to keep with the spirit of the warning. The warning checks reinterpret_casts that are CK_BitCast or CK_LValueBitCast. It also checks C-style casts that are equivalent (to the extent available, as a C-style bitcast could be a well-formed static cast). level=1 looks for reinterpret casts from T * to U * (or an lvalue of type T to U &), where T and U are not TBAA compatible. level=2 requires the src expression be an lvalue something of known(ish) static type. I.e a variable, array dereference or member access. level=3 requires level 2 and that the resultant pointer is actually referenced (lvalue->rvalue or lvalue written). Here we can do better than GCC, which doesn't represent this in the IR -- we merely get a dereference (and reference types are modeled as pointers with auto-dereference semantics). There is one exception to this, which is by-value aggregate arguments. These end up invoking a copy constructor (passing a reference of course), but are IMHO morally an rvalue -- so should trigger. The warning hooks into clang's code-generator's TBAA machinery. For scalar types the TBAA is straight forwards, comparing LLVM's MDNode representaion. For record & union types we check if one of the types is (recursively) the same (or TBAA compatible) as the first direct base or a field(s) of the record at offset zero (i.e. the first member of a record, or any members of a union). This covers both C++ and C-Style inheritance. Also. the member maybe alias_any, or in ubiquitous-char's alias set, which is also permissible. The warning is similar to the alignment check that CheckCompatibleReinterpretCast does, perhaps some merging could occur if this is accepted? [*] I implemented what turned into GCC's level=1 way back when. WIP: adjust tests --- clang/include/clang/AST/ASTConsumer.h | 25 + clang/include/clang/Basic/DiagnosticGroups.td |6 - clang/include/clang/Basic/DiagnosticOptions.h |4 + .../clang/Basic/DiagnosticSemaKinds.td|8 + clang/include/clang/Driver/Options.td |6 + clang/include/clang/Sema/Sema.h | 11 + clang/lib/CodeGen/BackendConsumer.h |1 + clang/lib/CodeGen/CodeGenAction.cpp |4 + clang/lib/CodeGen/CodeGenModule.h |1 + clang/lib/CodeGen/CodeGenTBAA.cpp | 134 ++ clang/lib/CodeGen/CodeGenTBAA.h |5 +- clang/lib/CodeGen/ModuleBuilder.cpp |2 + clang/lib/Frontend/CompilerInvocation.cpp |3 + clang/lib/Frontend/FrontendAction.cpp |8 + clang/lib/Sema/SemaCast.cpp | 139 ++ clang/lib/Sema/SemaExpr.cpp | 17 +- clang/lib/Sema/SemaExprMember.cpp |2 + clang/lib/Sema/SemaInit.cpp |5 + clang/test/Misc/warning-flags-tree.c |4 - clang/test/Sema/strict-aliasing-warn.c| 192 +++ clang/test/SemaCXX/strict-aliasing-warn.cpp | 1375 + 21 files changed, 1939 insertions(+), 13 deletions(-) create mode 100644 clang/test/Sema/strict-aliasing-warn.c create mode 100644 clang/test/SemaCXX/strict-aliasing-warn.cpp diff --git a/clang/include/clang/AST/ASTConsumer.h b/clang/include/clang/AST/ASTConsumer.h index ebcd8059284d8..d5731ed6adf63 100644 --- a/clang/include/clang/AST/ASTConsumer.h +++ b/clang/include/clang/AST/ASTConsumer.h @@ -21,6 +21,7 @@ namespace clang { class DeclGroupRef; class ASTMutationListener; class ASTDeserializationListener; // layering violation because void* is ugly + class QualType; class SemaConsumer; // layering violation required for safe SemaConsumer class TagDecl; class VarDecl; @@ -37,6 +38,27 @@ class ASTConsumer { friend class SemaConsumer; +public: + /// Allow type-based aliasing information to be interrogated by
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
MaskRay wrote: Add @shafik who authors https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8 ("Although clang allows these flags it apparently does not actually implement the warnings") > [*] I implemented what turned into GCC's level=1 way back when. Cool! FWIW the GCC doc is https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wstrict-aliasing_003dn It says for Level 3 "If optimization is enabled, it also runs in the back end, where it deals with multiple statement cases using flow-sensitive points-to information." Do you know how it works? Any example? https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
@@ -0,0 +1,192 @@ +// RUN: %clang_cc1 %s -O0 -Wstrict-aliasing -S -o %t -verify=quiet +// RUN: %clang_cc1 %s -O2 -Wstrict-aliasing=0 -S -o %t -verify=quiet +// RUN: %clang_cc1 %s -O2 -Wno-strict-aliasing -S -o %t -verify=quiet +// RUN: %clang_cc1 %s -O2 -Wstrict-aliasing=1 -S -o %t -verify=level1,level12,level123 +// RUN: %clang_cc1 %s -O2 -Wstrict-aliasing=2 -S -o %t -verify=level2,level23,level12,level123 +// RUN: %clang_cc1 %s -O2 -Wstrict-aliasing=3 -S -o %t -verify=level23,level123 +// RUN: %clang_cc1 %s -O2 -Wstrict-aliasing -S -o %t -verify=level23,level123 +// RUN: %clang_cc1 %s -O2 -S -o %t -verify=level23,level123 + +// quiet-no-diagnostics + +#if _LP64 +// These names make more sense on an ilp32 machine +typedef long INT; +typedef long long LONG; +typedef unsigned long UINT; +#else +typedef int INT; +typedef long LONG; +typedef unsigned int UINT; +#endif +typedef short SHORT; + +INT ScalarINT; +INT Ary[2]; +struct {int m;} Struct; + +_Complex int CPLx; + +void ByVal(long long); +void ByPtr(void *); + +void VarPtr(INT *Ptr) { + // GCC: 1 + // level1-warning@+1{{type-punned pointer might break}} + ByPtr((LONG *)(Ptr)); + // level1-note@-1{{not alias compatible}} + + // GCC: + ByPtr((LONG *)((void *)(Ptr))); + + // GCC: 1 + // level1-warning@+1{{type-punned pointer might break}} + ByVal(*(LONG *)(Ptr)); + // level1-note@-1{{not alias compatible}} +} + +void Object() { + // GCC: 1, 2 + // level2-warning@+2{{type-punned pointer breaks}} + // level1-warning@+1{{type-punned pointer might break}} + ByPtr((LONG *)()); + // level12-note@-1{{not alias compatible}} + + // GCC: + ByPtr((LONG *)((void *)())); + + // GCC: 1, 2, 3 + // level23-warning@+2{{type-punned pointer breaks}} + // level1-warning@+1{{type-punned pointer might break}} + ByVal(*(LONG *)()); + // level123-note@-1{{not alias compatible}} +} + +// Level 1, 2, 3 - 1, 2, 3 +void DetectedVariants() { + // GCC: 1, 2, 3 + // level23-warning@+2{{type-punned pointer breaks}} + // level1-warning@+1{{type-punned pointer might break}} + ByVal(*(LONG *)([1])); + // level123-note@-1{{not alias compatible}} + + // GCC: 1, 2, 3 + // level23-warning@+2{{type-punned pointer breaks}} + // level1-warning@+1{{type-punned pointer might break}} + ByVal(*(LONG *)()); + // level123-note@-1{{not alias compatible}} + + // GCC: 1, 2, 3 + // level23-warning@+2{{type-punned pointer breaks}} + // level1-warning@+1{{type-punned pointer might break}} + ByVal(*(LONG *)(&()->m)); + // level123-note@-1{{not alias compatible}} + + // GCC: 1, 2, 3 + // level23-warning@+2{{type-punned pointer breaks}} + // level1-warning@+1{{type-punned pointer might break}} + ByVal(*(LONG *)(&__real__(CPLx))); + // level123-note@-1{{not alias compatible}} + + // GCC: 1, 2, 3 + // level23-warning@+2{{type-punned pointer breaks}} + // level1-warning@+1{{type-punned pointer might break}} + ByVal(*(LONG *)(&__imag__(CPLx))); + // level123-note@-1{{not alias compatible}} +} + +void Ok() { + // GCC: + ByPtr((UINT *)()); + // GCC: + ByPtr((UINT *)((void *)())); + // GCC: + ByVal(*(UINT *)()); +} + +// Level 1, 2, 3 - 1, 2, 3 +void Parens() { + // GCC: 1, 2, 3 + // level23-warning@+2{{type-punned pointer breaks}} + // level1-warning@+1{{type-punned pointer might break}} + ByVal(*((LONG *)((&(ScalarINT); + // level123-note@-1{{not alias compatible}} + + // GCC: 1, 2, 3 + // level23-warning@+2{{type-punned pointer breaks}} + // level1-warning@+1{{type-punned pointer might break}} + ByVal(*((LONG *)((&(Ary[1]); + // level123-note@-1{{not alias compatible}} +} + +// Clang models may_alias as a decl attribute, not a type attribute. + +typedef int MA __attribute__((may_alias)); + +void Frob(MA *a) { + ByPtr((short *)(a)); + ByVal(*(short *)(a)); +} + +struct Inner { int m; }; +struct Outer1 { struct Inner i; }; +struct Outer2 { struct Outer1 o; }; +struct Inner i; +struct Outer2 o; + +void ByValInner (struct Inner); +void ByValOuter2 (struct Outer2); + +void Inherit() { + // Check we see through multiple levels + int in; + + ByValOuter2(*(struct Outer2 *)); + ByValOuter2(*(struct Outer2 *)); + ByValInner(*(struct Inner *)); + ByValInner(*(struct Inner *)); + ByVal(*(int *)); +} + +// PR 50066 +typedef unsigned char uchar; + +void Double(double); + +int main() { + double d = 2.34; + int i[2]; + Double(d); + + // GCC: 1, 2, 3 + // level23-warning@+2{{type-punned pointer breaks}} + // level1-warning@+1{{type-punned pointer might break}} + *(long long *)i = + // level123-note@-1{{not alias compatible}} + + // GCC: 1, 2, 3 + // level23-warning@+2{{type-punned pointer breaks}} + // level1-warning@+1{{type-punned pointer might break}} + *(long long *) + // level123-note@-1{{not alias compatible}} + + // GCC: 1, 2, 3 + // level23-warning@+2{{type-punned pointer breaks}} + // level1-warning@+1{{type-punned pointer might break}} + ((int *))[0] = i[0]; + // level123-note@-1{{not
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
efriedma-quic wrote: Making Sema pull the TBAA info out of clang/lib/CodeGen is a layering violation (and probably breaks if we aren't actually generating code). If we need some notion of "aliasing" in Sema, we should pull the relevant code into clang/lib/AST. I don't have any experience with this warning, so I'm not sure how effective it is at detecting issues; do a significant fraction of issues actually show up at a pure syntactic level? Do you have any idea what false positive rate we should expect? https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
@@ -128,6 +128,10 @@ class DiagnosticOptions : public RefCountedBase{ /// whether -Wsystem-headers is enabled on a per-module basis. std::vector SystemHeaderWarningsModules; + /// Level of scrutiny reinterpret_casts get for type-unsafe aliasing + /// checks. Requires an ASTConsumer that provides TBAA information. + unsigned StrictAliasing; Endilll wrote: Wouldn't a small bit-field suffice here? https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
@@ -498,3 +498,137 @@ CodeGenTBAA::mergeTBAAInfoForMemoryTransfer(TBAAAccessInfo DestInfo, // access type regardless of their base types. return TBAAAccessInfo::getMayAliasInfo(); } + +// Determine the aliasing kind bit-converting from type Src to type Dst. +CodeGenTBAA::AliasingKind CodeGenTBAA::getAliasingKind(QualType , + QualType ) { + assert(!Src->isVoidType() && !Dst->isVoidType()); + if (TypeHasMayAlias(Src) || TypeHasMayAlias(Dst)) +return AK_Ok; + + Src = QualType{Src->getBaseElementTypeUnsafe(), 0}; + Dst = QualType{Dst->getBaseElementTypeUnsafe(), 0}; + + auto *SrcDecl = Src->getAsRecordDecl(); + auto *DstDecl = Dst->getAsRecordDecl(); + + const llvm::MDNode *AnyTBAA = getChar(); + const llvm::MDNode *SrcTBAA = nullptr; + const llvm::MDNode *DstTBAA = nullptr; + + if (!SrcDecl) { +SrcTBAA = getTypeInfo(Src); +if (!SrcTBAA || SrcTBAA == AnyTBAA) + return AK_Ok; + } + if (!DstDecl) { +DstTBAA = getTypeInfo(Dst); +if (!DstTBAA || DstTBAA == AnyTBAA) + return AK_Ok; + } + + auto IsAncestor = [](const llvm::MDNode *Ancestor, + const llvm::MDNode *Descendant) { +assert(Ancestor != Descendant && "Identical TBAA"); +while (Descendant->getNumOperands() != 1) { + Descendant = cast(Descendant->getOperand(1)); + if (Descendant == Ancestor) +return true; +} +return false; + }; + + assert(SrcTBAA != AnyTBAA && DstTBAA != AnyTBAA && Endilll wrote: I think this assert should be placed closer to where `AnyTBAA` is handled, before `IsAncestor` definition. https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
@@ -37,6 +38,27 @@ class ASTConsumer { friend class SemaConsumer; +public: + /// Allow type-based aliasing information to be interrogated by the AST + /// producer (for diagnostics). + class TypeAliasing { Endilll wrote: It seems to me that we'd be better off with `AliasingKind` being scoped enum. https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
@@ -2026,6 +2027,137 @@ static TryCastResult TryConstCast(Sema , ExprResult , return TC_Success; } +// We're dereferencing E, either by turning into an RValue, or by dereferencing +// it. Check whether it's a deref of a reinterpret cast that has aliasing +// issues. +void Sema::CheckStrictAliasingDeref(Expr const *E, bool IsLValue) { + if (Diags.getDiagnosticOptions().StrictAliasing < 3) +return; + + assert(IsLValue || E->getType()->isAnyPointerType()); + CastExpr const *CE = nullptr; + for (;;) { +CE = dyn_cast(E->IgnoreParens()); +if (!CE) + return; + +if (IsLValue || CE->getCastKind() != CK_ArrayToPointerDecay) + break; + +E = CE->getSubExpr(); +IsLValue = true; + } + + if (CE->getCastKind() != (IsLValue ? CK_LValueBitCast : CK_BitCast)) +return; + + if (CE->getSubExpr()->getType()->isVoidPointerType()) +return; + + QualType DestTy = CE->getType(); + if (!IsLValue) +DestTy = DestTy->getPointeeType(); + + CheckStrictAliasing(CE->getSubExpr(), DestTy, IsLValue, CE->getSourceRange()); +} + +/// We're building a cast from E to pointer type DestType. If ISLValueCast is +/// true, DestType is the pointer equivalent of the reference type we're casting +/// to. +void Sema::CheckStrictAliasingCast(Expr const *E, QualType DestType, + bool IsLValueCast, SourceRange Range) { + if (Diags.getDiagnosticOptions().StrictAliasing < 1 || Endilll wrote: Switch might be a better fit here. https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
@@ -498,3 +498,137 @@ CodeGenTBAA::mergeTBAAInfoForMemoryTransfer(TBAAAccessInfo DestInfo, // access type regardless of their base types. return TBAAAccessInfo::getMayAliasInfo(); } + +// Determine the aliasing kind bit-converting from type Src to type Dst. +CodeGenTBAA::AliasingKind CodeGenTBAA::getAliasingKind(QualType , + QualType ) { + assert(!Src->isVoidType() && !Dst->isVoidType()); + if (TypeHasMayAlias(Src) || TypeHasMayAlias(Dst)) +return AK_Ok; + + Src = QualType{Src->getBaseElementTypeUnsafe(), 0}; + Dst = QualType{Dst->getBaseElementTypeUnsafe(), 0}; + + auto *SrcDecl = Src->getAsRecordDecl(); + auto *DstDecl = Dst->getAsRecordDecl(); + + const llvm::MDNode *AnyTBAA = getChar(); + const llvm::MDNode *SrcTBAA = nullptr; + const llvm::MDNode *DstTBAA = nullptr; + + if (!SrcDecl) { +SrcTBAA = getTypeInfo(Src); +if (!SrcTBAA || SrcTBAA == AnyTBAA) Endilll wrote: Does this check take into account all 3 types (`char`, `unsigned char`, `std::byte`) listed in [basic.lval#11.3](http://eel.is/c++draft/basic.lval#11.3)? Same goes for the similar `DstTBAA` check below. https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
https://github.com/Endilll edited https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
https://github.com/Endilll commented: Thank you for your efforts! I scratched the surface a bit; not qualified to do an in-depth review. Can you also add a release note? https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
https://github.com/urnathan edited https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
llvmbot wrote: @llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-codegen Author: Nathan Sidwell (urnathan) Changes This implements -Wstrict-aliasing(=[123])? along the same lines as GCC. It's not 100% the same for reasons expanded on below. The default is level 3, and I have verified that bootstrapping does not trigger any warnings (just like building with GCC). As with GCC, higher levels result in fewer warnings, reducing the number of false positives at the cost of missing (some) potential cases. Unlike GCC, this is entirely in the FE, we do not propagate any checking into the IR (so there are cases GCC will detect we do not, I have not encountered any). GCC's documentation is not very specific about which cases are detected. I examined GCC's source code to reverse engineer the algorithm[*], and as can be seen from the testcases, noted GCC's behaviour. The strict aliasing check relies on TBAA. LLVM's representation is semantically different to GCCs for structured types. I have tried to keep with the spirit of the warning. The warning checks reinterpret_casts that are CK_BitCast or CK_LValueBitCast. It also checks C-style casts that are equivalent (to the extent available, as a C-style bitcast could be a well-formed static cast). level=1 looks for reinterpret casts from T * to U * (or an lvalue of type T to U ), where T and U are not TBAA compatible. level=2 requires the src expression be an lvalue something of known(ish) static type. I.e a variable, array dereference or member access. level=3 requires level 2 and that the resultant pointer is actually referenced (lvalue-rvalue or lvalue written). Here we can do better than GCC, which doesn't represent this in the IR -- we merely get a dereference (and reference types are modeled as pointers with auto-dereference semantics). There is one exception to this, which is by-value aggregate arguments. These end up invoking a copy constructor (passing a reference of course), but are IMHO morally an rvalue -- so should trigger. The warning hooks into clang's code-generator's TBAA machinery. For scalar types the TBAA is straight forwards, comparing LLVM's MDNode representaion. For record union types we check if one of the types is (recursively) the same (or TBAA compatible) as the first direct base or a field(s) of the record at offset zero (i.e. the first member of a record, or any members of a union). This covers both C++ and C-Style inheritance. Also. the member maybe alias_any, or in ubiquitous-char's alias set, which is also permissible. The warning is similar to the alignment check that CheckCompatibleReinterpretCast does, perhaps some merging could occur if this is accepted? [*] I implemented what turned into GCC's level=1 way back when. WIP: adjust tests --- Patch is 77.54 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/74155.diff 21 Files Affected: - (modified) clang/include/clang/AST/ASTConsumer.h (+25) - (modified) clang/include/clang/Basic/DiagnosticGroups.td (-6) - (modified) clang/include/clang/Basic/DiagnosticOptions.h (+4) - (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+8) - (modified) clang/include/clang/Driver/Options.td (+6) - (modified) clang/include/clang/Sema/Sema.h (+11) - (modified) clang/lib/CodeGen/BackendConsumer.h (+1) - (modified) clang/lib/CodeGen/CodeGenAction.cpp (+4) - (modified) clang/lib/CodeGen/CodeGenModule.h (+1) - (modified) clang/lib/CodeGen/CodeGenTBAA.cpp (+134) - (modified) clang/lib/CodeGen/CodeGenTBAA.h (+4-1) - (modified) clang/lib/CodeGen/ModuleBuilder.cpp (+2) - (modified) clang/lib/Frontend/CompilerInvocation.cpp (+3) - (modified) clang/lib/Frontend/FrontendAction.cpp (+8) - (modified) clang/lib/Sema/SemaCast.cpp (+139) - (modified) clang/lib/Sema/SemaExpr.cpp (+15-2) - (modified) clang/lib/Sema/SemaExprMember.cpp (+2) - (modified) clang/lib/Sema/SemaInit.cpp (+5) - (modified) clang/test/Misc/warning-flags-tree.c (-4) - (added) clang/test/Sema/strict-aliasing-warn.c (+192) - (added) clang/test/SemaCXX/strict-aliasing-warn.cpp (+1375) ``diff diff --git a/clang/include/clang/AST/ASTConsumer.h b/clang/include/clang/AST/ASTConsumer.h index ebcd8059284d8..d5731ed6adf63 100644 --- a/clang/include/clang/AST/ASTConsumer.h +++ b/clang/include/clang/AST/ASTConsumer.h @@ -21,6 +21,7 @@ namespace clang { class DeclGroupRef; class ASTMutationListener; class ASTDeserializationListener; // layering violation because void* is ugly + class QualType; class SemaConsumer; // layering violation required for safe SemaConsumer class TagDecl; class VarDecl; @@ -37,6 +38,27 @@ class ASTConsumer { friend class SemaConsumer; +public: + /// Allow type-based aliasing information to be interrogated by the AST + /// producer (for diagnostics). + class TypeAliasing { + public: +TypeAliasing() = default; +virtual ~TypeAliasing(){}; + + public: +enum
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
https://github.com/urnathan ready_for_review https://github.com/llvm/llvm-project/pull/74155 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)
https://github.com/urnathan created https://github.com/llvm/llvm-project/pull/74155 This implements -Wstrict-aliasing(=[123])? along the same lines as GCC. It's not 100% the same for reasons expanded on below. The default is level 3, and I have verified that bootstrapping does not trigger any warnings (just like building with GCC). As with GCC, higher levels result in fewer warnings, reducing the number of false positives at the cost of missing (some) potential cases. Unlike GCC, this is entirely in the FE, we do not propagate any checking into the IR (so there are cases GCC will detect we do not, I have not encountered any). GCC's documentation is not very specific about which cases are detected. I examined GCC's source code to reverse engineer the algorithm[*], and as can be seen from the testcases, noted GCC's behaviour. The strict aliasing check relies on TBAA. LLVM's representation is semantically different to GCCs for structured types. I have tried to keep with the spirit of the warning. The warning checks reinterpret_casts that are CK_BitCast or CK_LValueBitCast. It also checks C-style casts that are equivalent (to the extent available, as a C-style bitcast could be a well-formed static cast). level=1 looks for reinterpret casts from T * to U * (or an lvalue of type T to U &), where T and U are not TBAA compatible. level=2 requires the src expression be an lvalue something of known(ish) static type. I.e a variable, array dereference or member access. level=3 requires level 2 and that the resultant pointer is actually referenced (lvalue->rvalue or lvalue written). Here we can do better than GCC, which doesn't represent this in the IR -- we merely get a dereference (and reference types are modeled as pointers with auto-dereference semantics). There is one exception to this, which is by-value aggregate arguments. These end up invoking a copy constructor (passing a reference of course), but are IMHO morally an rvalue -- so should trigger. The warning hooks into clang's code-generator's TBAA machinery. For scalar types the TBAA is straight forwards, comparing LLVM's MDNode representaion. For record & union types we check if one of the types is (recursively) the same (or TBAA compatible) as the first direct base or a field(s) of the record at offset zero (i.e. the first member of a record, or any members of a union). This covers both C++ and C-Style inheritance. Also. the member maybe alias_any, or in ubiquitous-char's alias set, which is also permissible. The warning is similar to the alignment check that CheckCompatibleReinterpretCast does, perhaps some merging could occur if this is accepted? [*] I implemented what turned into GCC's level=1 way back when. WIP: adjust tests >From 89c05618bb75fd073343046f3b54bde5f2254719 Mon Sep 17 00:00:00 2001 From: Nathan Sidwell Date: Wed, 15 Nov 2023 15:27:16 -0500 Subject: [PATCH] [clang] Strict aliasing warning ala GCC [PR50066] This implements -Wstrict-aliasing(=[123])? along the same lines as GCC. It's not 100% the same for reasons expanded on below. The default is level 3, and I have verified that bootstrapping does not trigger any warnings (just like building with GCC). As with GCC, higher levels result in fewer warnings, reducing the number of false positives at the cost of missing (some) potential cases. Unlike GCC, this is entirely in the FE, we do not propagate any checking into the IR (so there are cases GCC will detect we do not, I have not encountered any). GCC's documentation is not very specific about which cases are detected. I examined GCC's source code to reverse engineer the algorithm[*], and as can be seen from the testcases, noted GCC's behaviour. The strict aliasing check relies on TBAA. LLVM's representation is semantically different to GCCs for structured types. I have tried to keep with the spirit of the warning. The warning checks reinterpret_casts that are CK_BitCast or CK_LValueBitCast. It also checks C-style casts that are equivalent (to the extent available, as a C-style bitcast could be a well-formed static cast). level=1 looks for reinterpret casts from T * to U * (or an lvalue of type T to U &), where T and U are not TBAA compatible. level=2 requires the src expression be an lvalue something of known(ish) static type. I.e a variable, array dereference or member access. level=3 requires level 2 and that the resultant pointer is actually referenced (lvalue->rvalue or lvalue written). Here we can do better than GCC, which doesn't represent this in the IR -- we merely get a dereference (and reference types are modeled as pointers with auto-dereference semantics). There is one exception to this, which is by-value aggregate arguments. These end up invoking a copy constructor (passing a reference of course), but are IMHO morally an rvalue -- so should trigger. The warning hooks into clang's code-generator's TBAA machinery. For scalar types the TBAA is straight forwards, comparing LLVM's