[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2024-02-13 Thread Eli Friedman via cfe-commits

efriedma-quic wrote:

> IIUC you're suggesting movint the TBAA data origination from CodeGen into 
> (probably) Sema and/or TargetInfo and then deriving the LLVM info in CodeGen 
> from that. I.e. keep once source of truth, but move where it is?

Right; move the core computation to AST, and then the CodeGen bits would just 
translate that data to LLVM IR.

> Although you don't explicitly say it, the implication is you'd be ameanable 
> to such an approach?

TBAA categorization is semantic information, so I think it makes sense to make 
it accessible outside CodeGen, yes.

https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2024-02-13 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

> On the LLVM side, there's very little interesting logic; it's basically just 
> walking the tree of metadata nodes generated by clang. See 
> https://llvm.org/docs/LangRef.html#tbaa-node-semantics . The hard part of the 
> refactoring would just be adding an abstraction for the underlying 
> information.

IIUC you're suggesting movint the TBAA data origination from CodeGen into 
(probably) Sema and/or TargetInfo and then deriving the LLVM info in CodeGen 
from that.  I.e. keep once source of truth, but move where it is?

I don't think it'd be that hard -- mostly mechanical. And I suspect only the 
scalar TBAA needs so moving, the structure-related stuff is sufficiently 
different to that strict-aliasing wants.

Although you don't explicitly say it, the implication is you'd be ameanable to 
such an approach?

https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-15 Thread Eli Friedman via cfe-commits

efriedma-quic wrote:

> > Making Sema pull the TBAA info out of clang/lib/CodeGen is a layering 
> > violation (and probably breaks if we aren't actually generating code). If 
> > we need some notion of "aliasing" in Sema, we should pull the relevant code 
> > into clang/lib/AST.
> 
> That's unfortunate. The code will not call the TBAA stuff if the code 
> generator doesn't provide that predicate -- so if not generating optimized 
> code the warning is inactive. This is the same as with GCC btw. To be clear 
> what's exposed is a new predicate allowing querying of type conversion's 
> TBAAness -- the representations of TBAA etc are not exposed. The CodeGen TBAA 
> machinery builds on llvm's aliasing MDNodes. It would seem a large task to 
> replicate that in AST (and I presume propagating llvm's bits further into AST 
> is even worse?)

On the LLVM side, there's very little interesting logic; it's basically just 
walking the tree of metadata nodes generated by clang.  See 
https://llvm.org/docs/LangRef.html#tbaa-node-semantics .  The hard part of the 
refactoring would just be adding an abstraction for the underlying information.

gcc has made various unfortunate decisions regarding diagnostics in the past; 
we've put a lot of effort into making sure our diagnostics are more consistent.

https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-15 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

> > > FWIW the GCC doc is 
> > > https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wstrict-aliasing_003dn
> > >  It says for Level 3 "If optimization is enabled, it also runs in the 
> > > back end, where it deals with multiple statement cases using 
> > > flow-sensitive points-to information."
> > > Do you know how it works? Any example?
> > 
> > 
> > I do not now how it works -- didn't go poking there. Neither do I have 
> > examples.
> 
> My understanding (which could be totally wrong) is that the logic for when to 
> emit the level 3 diagnostics rests mostly in the optimizer passes. This is 
> not something I think Clang should emulate as it creates a very poor user 
> experience in practice (not just with strict aliasing diagnostics -- I don't 
> think _any_ diagnostics other than remarks should be emitted based on LLVM 
> optimization decisions aside from the `error` and `warning` attributes which 
> are special circumstances).

I don't know if it's 'mostly', the level 3 can certainly be triggered in the FE 
-- it requires (a) an address-of-object idiom, (b) a suspicious cast and (c) a 
dereference.  Something like `*reinterpret_cast() = 5`, where `obj` 
is  not compatible with `int`. That's what this patch implements. (In GCC's 
case when it warns about this in the FE it marks the tree as warned, to inhibit 
the middle end also doing so.)



https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-15 Thread Nathan Sidwell via cfe-commits


@@ -498,3 +498,137 @@ 
CodeGenTBAA::mergeTBAAInfoForMemoryTransfer(TBAAAccessInfo DestInfo,
   // access type regardless of their base types.
   return TBAAAccessInfo::getMayAliasInfo();
 }
+
+// Determine the aliasing kind bit-converting from type Src to type Dst.
+CodeGenTBAA::AliasingKind CodeGenTBAA::getAliasingKind(QualType ,
+   QualType ) {
+  assert(!Src->isVoidType() && !Dst->isVoidType());
+  if (TypeHasMayAlias(Src) || TypeHasMayAlias(Dst))
+return AK_Ok;
+
+  Src = QualType{Src->getBaseElementTypeUnsafe(), 0};
+  Dst = QualType{Dst->getBaseElementTypeUnsafe(), 0};
+
+  auto *SrcDecl = Src->getAsRecordDecl();
+  auto *DstDecl = Dst->getAsRecordDecl();
+
+  const llvm::MDNode *AnyTBAA = getChar();
+  const llvm::MDNode *SrcTBAA = nullptr;
+  const llvm::MDNode *DstTBAA = nullptr;
+
+  if (!SrcDecl) {
+SrcTBAA = getTypeInfo(Src);
+if (!SrcTBAA || SrcTBAA == AnyTBAA)

urnathan wrote:

Pedantically 'signed char' is not blessed with that property by the std.  But 
many compilers (llvm included) bestow it upon signed char.  Indeed this 
particular patch doesn't make a judgement on that, it just gets the aliasing 
info for 'signed char' and it happens to get the same ubiquitous char tbaa.

https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-15 Thread Nathan Sidwell via cfe-commits


@@ -37,6 +38,27 @@ class ASTConsumer {
 
   friend class SemaConsumer;
 
+public:
+  /// Allow type-based aliasing information to be interrogated by the AST
+  /// producer (for diagnostics).
+  class TypeAliasing {

urnathan wrote:

Oh, we also endup deriving from the TypeAliasing class to override the virtual 
function it provides.

https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-06 Thread Nathan Sidwell via cfe-commits


@@ -0,0 +1,192 @@
+// RUN: %clang_cc1 %s -O0 -Wstrict-aliasing -S -o %t -verify=quiet
+// RUN: %clang_cc1 %s -O2 -Wstrict-aliasing=0 -S -o %t -verify=quiet
+// RUN: %clang_cc1 %s -O2 -Wno-strict-aliasing -S -o %t -verify=quiet
+// RUN: %clang_cc1 %s -O2 -Wstrict-aliasing=1 -S -o %t 
-verify=level1,level12,level123
+// RUN: %clang_cc1 %s -O2 -Wstrict-aliasing=2 -S -o %t 
-verify=level2,level23,level12,level123
+// RUN: %clang_cc1 %s -O2 -Wstrict-aliasing=3 -S -o %t -verify=level23,level123
+// RUN: %clang_cc1 %s -O2 -Wstrict-aliasing -S -o %t -verify=level23,level123
+// RUN: %clang_cc1 %s -O2 -S -o %t -verify=level23,level123
+
+// quiet-no-diagnostics
+
+#if _LP64
+// These names make more sense on an ilp32 machine
+typedef long INT;
+typedef long long LONG;
+typedef unsigned long UINT;
+#else
+typedef int INT;
+typedef long LONG;
+typedef unsigned int UINT;
+#endif
+typedef short SHORT;
+
+INT ScalarINT;
+INT Ary[2];
+struct {int m;} Struct;
+
+_Complex int CPLx;
+
+void ByVal(long long);
+void ByPtr(void *);
+
+void VarPtr(INT *Ptr) {
+  // GCC: 1
+  // level1-warning@+1{{type-punned pointer might break}}
+  ByPtr((LONG *)(Ptr));
+  // level1-note@-1{{not alias compatible}}
+
+  // GCC:
+  ByPtr((LONG *)((void *)(Ptr)));
+
+  // GCC: 1
+  // level1-warning@+1{{type-punned pointer might break}}
+  ByVal(*(LONG *)(Ptr));
+  // level1-note@-1{{not alias compatible}}
+}
+
+void Object() {
+  // GCC: 1, 2
+  // level2-warning@+2{{type-punned pointer breaks}}
+  // level1-warning@+1{{type-punned pointer might break}}
+  ByPtr((LONG *)());
+  // level12-note@-1{{not alias compatible}}
+
+  // GCC:
+  ByPtr((LONG *)((void *)()));
+
+  // GCC: 1, 2, 3
+  // level23-warning@+2{{type-punned pointer breaks}}
+  // level1-warning@+1{{type-punned pointer might break}}
+  ByVal(*(LONG *)());
+  // level123-note@-1{{not alias compatible}}
+}
+
+// Level 1, 2, 3 - 1, 2, 3
+void DetectedVariants() {
+  // GCC: 1, 2, 3
+  // level23-warning@+2{{type-punned pointer breaks}}
+  // level1-warning@+1{{type-punned pointer might break}}
+  ByVal(*(LONG *)([1]));
+  // level123-note@-1{{not alias compatible}}
+
+  // GCC: 1, 2, 3
+  // level23-warning@+2{{type-punned pointer breaks}}
+  // level1-warning@+1{{type-punned pointer might break}}
+  ByVal(*(LONG *)());
+  // level123-note@-1{{not alias compatible}}
+
+  // GCC: 1, 2, 3
+  // level23-warning@+2{{type-punned pointer breaks}}
+  // level1-warning@+1{{type-punned pointer might break}}
+  ByVal(*(LONG *)(&()->m));
+  // level123-note@-1{{not alias compatible}}
+
+  // GCC: 1, 2, 3
+  // level23-warning@+2{{type-punned pointer breaks}}
+  // level1-warning@+1{{type-punned pointer might break}}
+  ByVal(*(LONG *)(&__real__(CPLx)));
+  // level123-note@-1{{not alias compatible}}
+
+  // GCC: 1, 2, 3
+  // level23-warning@+2{{type-punned pointer breaks}}
+  // level1-warning@+1{{type-punned pointer might break}}
+  ByVal(*(LONG *)(&__imag__(CPLx)));
+  // level123-note@-1{{not alias compatible}}
+}
+
+void Ok() {
+  // GCC:
+  ByPtr((UINT *)());
+  // GCC:
+  ByPtr((UINT *)((void *)()));
+  // GCC:
+  ByVal(*(UINT *)());
+}
+
+// Level 1, 2, 3 - 1, 2, 3
+void Parens() {
+  // GCC: 1, 2, 3
+  // level23-warning@+2{{type-punned pointer breaks}}
+  // level1-warning@+1{{type-punned pointer might break}}
+  ByVal(*((LONG *)((&(ScalarINT);
+  // level123-note@-1{{not alias compatible}}
+
+  // GCC: 1, 2, 3
+  // level23-warning@+2{{type-punned pointer breaks}}
+  // level1-warning@+1{{type-punned pointer might break}}
+  ByVal(*((LONG *)((&(Ary[1]);
+  // level123-note@-1{{not alias compatible}}
+}
+
+// Clang models may_alias as a decl attribute, not a type attribute.
+
+typedef int MA __attribute__((may_alias));
+
+void Frob(MA *a) {
+  ByPtr((short *)(a));
+  ByVal(*(short *)(a));
+}
+
+struct Inner { int m; };
+struct Outer1 { struct Inner i; };
+struct Outer2 { struct Outer1 o; };
+struct Inner i;
+struct Outer2 o;
+
+void ByValInner (struct Inner);
+void ByValOuter2 (struct Outer2);
+
+void Inherit() {
+  // Check we see through multiple levels
+  int in;
+
+  ByValOuter2(*(struct Outer2 *));
+  ByValOuter2(*(struct Outer2 *));
+  ByValInner(*(struct Inner *));
+  ByValInner(*(struct Inner *));
+  ByVal(*(int *));
+}
+
+// PR 50066
+typedef unsigned char uchar;
+
+void Double(double);
+
+int main() {
+  double d = 2.34;
+  int i[2];
+  Double(d);
+
+  // GCC: 1, 2, 3
+  // level23-warning@+2{{type-punned pointer breaks}}
+  // level1-warning@+1{{type-punned pointer might break}}
+  *(long long *)i =
+  // level123-note@-1{{not alias compatible}}
+
+  // GCC: 1, 2, 3
+  // level23-warning@+2{{type-punned pointer breaks}}
+  // level1-warning@+1{{type-punned pointer might break}}
+  *(long long *)
+  // level123-note@-1{{not alias compatible}}
+
+  // GCC: 1, 2, 3
+  // level23-warning@+2{{type-punned pointer breaks}}
+  // level1-warning@+1{{type-punned pointer might break}}
+  ((int *))[0] = i[0];
+  // level123-note@-1{{not 

[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-06 Thread Nathan Sidwell via cfe-commits

https://github.com/urnathan updated 
https://github.com/llvm/llvm-project/pull/74155

>From 89c05618bb75fd073343046f3b54bde5f2254719 Mon Sep 17 00:00:00 2001
From: Nathan Sidwell 
Date: Wed, 15 Nov 2023 15:27:16 -0500
Subject: [PATCH 1/6] [clang] Strict aliasing warning ala GCC [PR50066]

This implements -Wstrict-aliasing(=[123])? along the same lines as
GCC. It's not 100% the same for reasons expanded on below. The default
is level 3, and I have verified that bootstrapping does not trigger
any warnings (just like building with GCC).

As with GCC, higher levels result in fewer warnings, reducing the
number of false positives at the cost of missing (some) potential
cases. Unlike GCC, this is entirely in the FE, we do not propagate any
checking into the IR (so there are cases GCC will detect we do not, I
have not encountered any). GCC's documentation is not very specific
about which cases are detected. I examined GCC's source code to reverse
engineer the algorithm[*], and as can be seen from the testcases, noted
GCC's behaviour.

The strict aliasing check relies on TBAA. LLVM's representation is
semantically different to GCCs for structured types. I have tried to
keep with the spirit of the warning.

The warning checks reinterpret_casts that are CK_BitCast or
CK_LValueBitCast. It also checks C-style casts that are equivalent (to
the extent available, as a C-style bitcast could be a well-formed
static cast).

level=1 looks for reinterpret casts from T * to U * (or an lvalue of
type T to U &), where T and U are not TBAA compatible.

level=2 requires the src expression be an lvalue something of
known(ish) static type. I.e a variable, array dereference or member
access.

level=3 requires level 2 and that the resultant pointer is actually
referenced (lvalue->rvalue or lvalue written). Here we can do better
than GCC, which doesn't represent this in the IR -- we merely get a
dereference (and reference types are modeled as pointers with
auto-dereference semantics). There is one exception to this, which is
by-value aggregate arguments. These end up invoking a copy constructor
(passing a reference of course), but are IMHO morally an rvalue -- so
should trigger.

The warning hooks into clang's code-generator's TBAA machinery. For
scalar types the TBAA is straight forwards, comparing LLVM's MDNode
representaion. For record & union types we check if one of the types
is (recursively) the same (or TBAA compatible) as the first direct
base or a field(s) of the record at offset zero (i.e. the first member
of a record, or any members of a union). This covers both C++ and
C-Style inheritance. Also. the member maybe alias_any, or in
ubiquitous-char's alias set, which is also permissible.

The warning is similar to the alignment check that
CheckCompatibleReinterpretCast does, perhaps some merging could occur
if this is accepted?

[*] I implemented what turned into GCC's level=1 way back when.

WIP: adjust tests
---
 clang/include/clang/AST/ASTConsumer.h |   25 +
 clang/include/clang/Basic/DiagnosticGroups.td |6 -
 clang/include/clang/Basic/DiagnosticOptions.h |4 +
 .../clang/Basic/DiagnosticSemaKinds.td|8 +
 clang/include/clang/Driver/Options.td |6 +
 clang/include/clang/Sema/Sema.h   |   11 +
 clang/lib/CodeGen/BackendConsumer.h   |1 +
 clang/lib/CodeGen/CodeGenAction.cpp   |4 +
 clang/lib/CodeGen/CodeGenModule.h |1 +
 clang/lib/CodeGen/CodeGenTBAA.cpp |  134 ++
 clang/lib/CodeGen/CodeGenTBAA.h   |5 +-
 clang/lib/CodeGen/ModuleBuilder.cpp   |2 +
 clang/lib/Frontend/CompilerInvocation.cpp |3 +
 clang/lib/Frontend/FrontendAction.cpp |8 +
 clang/lib/Sema/SemaCast.cpp   |  139 ++
 clang/lib/Sema/SemaExpr.cpp   |   17 +-
 clang/lib/Sema/SemaExprMember.cpp |2 +
 clang/lib/Sema/SemaInit.cpp   |5 +
 clang/test/Misc/warning-flags-tree.c  |4 -
 clang/test/Sema/strict-aliasing-warn.c|  192 +++
 clang/test/SemaCXX/strict-aliasing-warn.cpp   | 1375 +
 21 files changed, 1939 insertions(+), 13 deletions(-)
 create mode 100644 clang/test/Sema/strict-aliasing-warn.c
 create mode 100644 clang/test/SemaCXX/strict-aliasing-warn.cpp

diff --git a/clang/include/clang/AST/ASTConsumer.h 
b/clang/include/clang/AST/ASTConsumer.h
index ebcd8059284d8..d5731ed6adf63 100644
--- a/clang/include/clang/AST/ASTConsumer.h
+++ b/clang/include/clang/AST/ASTConsumer.h
@@ -21,6 +21,7 @@ namespace clang {
   class DeclGroupRef;
   class ASTMutationListener;
   class ASTDeserializationListener; // layering violation because void* is ugly
+  class QualType;
   class SemaConsumer; // layering violation required for safe SemaConsumer
   class TagDecl;
   class VarDecl;
@@ -37,6 +38,27 @@ class ASTConsumer {
 
   friend class SemaConsumer;
 
+public:
+  /// Allow type-based aliasing information to be interrogated by 

[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-06 Thread Nathan Sidwell via cfe-commits


@@ -128,6 +128,10 @@ class DiagnosticOptions : public 
RefCountedBase{
   /// whether -Wsystem-headers is enabled on a per-module basis.
   std::vector SystemHeaderWarningsModules;
 
+  /// Level of scrutiny reinterpret_casts get for type-unsafe aliasing
+  /// checks. Requires an ASTConsumer that provides TBAA information.
+  unsigned StrictAliasing;

urnathan wrote:

I was misled by the -Wundef-prefix pattern that I followed.  Should have looked 
closer and found the DiagnosticOptions.def file.

https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-06 Thread Vlad Serebrennikov via cfe-commits


@@ -498,3 +498,137 @@ 
CodeGenTBAA::mergeTBAAInfoForMemoryTransfer(TBAAAccessInfo DestInfo,
   // access type regardless of their base types.
   return TBAAAccessInfo::getMayAliasInfo();
 }
+
+// Determine the aliasing kind bit-converting from type Src to type Dst.
+CodeGenTBAA::AliasingKind CodeGenTBAA::getAliasingKind(QualType ,
+   QualType ) {
+  assert(!Src->isVoidType() && !Dst->isVoidType());
+  if (TypeHasMayAlias(Src) || TypeHasMayAlias(Dst))
+return AK_Ok;
+
+  Src = QualType{Src->getBaseElementTypeUnsafe(), 0};
+  Dst = QualType{Dst->getBaseElementTypeUnsafe(), 0};
+
+  auto *SrcDecl = Src->getAsRecordDecl();
+  auto *DstDecl = Dst->getAsRecordDecl();
+
+  const llvm::MDNode *AnyTBAA = getChar();
+  const llvm::MDNode *SrcTBAA = nullptr;
+  const llvm::MDNode *DstTBAA = nullptr;
+
+  if (!SrcDecl) {
+SrcTBAA = getTypeInfo(Src);
+if (!SrcTBAA || SrcTBAA == AnyTBAA)

Endilll wrote:

Are you sure `signed char` is allowed to alias everything, like `char` and 
`unsigned char`? It's not there in the strict aliasing rule.

https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-06 Thread Vlad Serebrennikov via cfe-commits


@@ -37,6 +38,27 @@ class ASTConsumer {
 
   friend class SemaConsumer;
 
+public:
+  /// Allow type-based aliasing information to be interrogated by the AST
+  /// producer (for diagnostics).
+  class TypeAliasing {

Endilll wrote:

Sorry I didn't make myself clear enough. I thought that `TypeAliasing` exists 
for the sole purpose of providing a scope for unscoped enumeration, so my idea 
was that this whole class can be replaces with a scoped enum.

https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-06 Thread Vlad Serebrennikov via cfe-commits


@@ -498,3 +498,137 @@ 
CodeGenTBAA::mergeTBAAInfoForMemoryTransfer(TBAAAccessInfo DestInfo,
   // access type regardless of their base types.
   return TBAAAccessInfo::getMayAliasInfo();
 }
+
+// Determine the aliasing kind bit-converting from type Src to type Dst.
+CodeGenTBAA::AliasingKind CodeGenTBAA::getAliasingKind(QualType ,
+   QualType ) {
+  assert(!Src->isVoidType() && !Dst->isVoidType());
+  if (TypeHasMayAlias(Src) || TypeHasMayAlias(Dst))
+return AK_Ok;
+
+  Src = QualType{Src->getBaseElementTypeUnsafe(), 0};
+  Dst = QualType{Dst->getBaseElementTypeUnsafe(), 0};
+
+  auto *SrcDecl = Src->getAsRecordDecl();
+  auto *DstDecl = Dst->getAsRecordDecl();
+
+  const llvm::MDNode *AnyTBAA = getChar();
+  const llvm::MDNode *SrcTBAA = nullptr;
+  const llvm::MDNode *DstTBAA = nullptr;
+
+  if (!SrcDecl) {
+SrcTBAA = getTypeInfo(Src);
+if (!SrcTBAA || SrcTBAA == AnyTBAA)
+  return AK_Ok;
+  }
+  if (!DstDecl) {
+DstTBAA = getTypeInfo(Dst);
+if (!DstTBAA || DstTBAA == AnyTBAA)
+  return AK_Ok;
+  }
+
+  auto IsAncestor = [](const llvm::MDNode *Ancestor,
+   const llvm::MDNode *Descendant) {
+assert(Ancestor != Descendant && "Identical TBAA");
+while (Descendant->getNumOperands() != 1) {
+  Descendant = cast(Descendant->getOperand(1));
+  if (Descendant == Ancestor)
+return true;
+}
+return false;
+  };
+
+  assert(SrcTBAA != AnyTBAA && DstTBAA != AnyTBAA &&

Endilll wrote:

Sure, I'm not insisting.

https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-06 Thread Aaron Ballman via cfe-commits

AaronBallman wrote:

> > FWIW the GCC doc is 
> > https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wstrict-aliasing_003dn
> >  It says for Level 3 "If optimization is enabled, it also runs in the back 
> > end, where it deals with multiple statement cases using flow-sensitive 
> > points-to information."
> > Do you know how it works? Any example?
> 
> I do not now how it works -- didn't go poking there. Neither do I have 
> examples.

My understanding (which could be totally wrong) is that the logic for when to 
emit the level 3 diagnostics rests mostly in the optimizer passes. This is not 
something I think Clang should emulate as it creates a very poor user 
experience in practice (not just with strict aliasing diagnostics -- I don't 
think *any* diagnostics other than remarks should be emitted based on LLVM 
optimization decisions aside from the `error` and `warning` attributes which 
are special circumstances).

https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-06 Thread Nathan Sidwell via cfe-commits

https://github.com/urnathan edited 
https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-06 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

> Thank you for your efforts! I scratched the surface a bit; not qualified to 
> do an in-depth review. Can you also add a release note?

Thanks for your comments.  A release note is added.

https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-06 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

> FWIW the GCC doc is 
> https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wstrict-aliasing_003dn
>  It says for Level 3 "If optimization is enabled, it also runs in the back 
> end, where it deals with multiple statement cases using flow-sensitive 
> points-to information."
> 
> Do you know how it works? Any example?

I do not now how it works -- didn't go poking there.  Neither do I have 
examples.

https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-06 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

> Making Sema pull the TBAA info out of clang/lib/CodeGen is a layering 
> violation (and probably breaks if we aren't actually generating code). If we 
> need some notion of "aliasing" in Sema, we should pull the relevant code into 
> clang/lib/AST.

That's unfortunate.  The code will not call the TBAA stuff if the code 
generator doesn't provide that predicate -- so if not generating optimized code 
the warning is inactive.  This is the same as with GCC btw. To be clear what's 
exposed is a new predicate allowing querying of type conversion's TBAAness -- 
the representations of TBAA etc are not exposed.  The CodeGen TBAA machinery 
builds on llvm's aliasing MDNodes. It would seem a large task to replicate that 
in AST (and I presume propagating llvm's bits further into AST is even worse?)

> I don't have any experience with this warning, so I'm not sure how effective 
> it is at detecting issues; do a significant fraction of issues actually show 
> up at a pure syntactic level? Do you have any idea what false positive rate 
> we should expect?

I think at level 3 the false positive rate is very low. Given GCC's extended 
this over the years from the simple check I had (IIRC any reinterpret_cast 
between pointers/references to TBAA incompatible types) to the semantics it has 
now suggests the warning is useful. and the level 3 default is satisfactory. 
But I don't have hard data.

https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-06 Thread Nathan Sidwell via cfe-commits


@@ -2026,6 +2027,137 @@ static TryCastResult TryConstCast(Sema , 
ExprResult ,
   return TC_Success;
 }
 
+// We're dereferencing E, either by turning into an RValue, or by dereferencing
+// it. Check whether it's a deref of a reinterpret cast that has aliasing
+// issues.
+void Sema::CheckStrictAliasingDeref(Expr const *E, bool IsLValue) {
+  if (Diags.getDiagnosticOptions().StrictAliasing < 3)
+return;
+
+  assert(IsLValue || E->getType()->isAnyPointerType());
+  CastExpr const *CE = nullptr;
+  for (;;) {
+CE = dyn_cast(E->IgnoreParens());
+if (!CE)
+  return;
+
+if (IsLValue || CE->getCastKind() != CK_ArrayToPointerDecay)
+  break;
+
+E = CE->getSubExpr();
+IsLValue = true;
+  }
+
+  if (CE->getCastKind() != (IsLValue ? CK_LValueBitCast : CK_BitCast))
+return;
+
+  if (CE->getSubExpr()->getType()->isVoidPointerType())
+return;
+
+  QualType DestTy = CE->getType();
+  if (!IsLValue)
+DestTy = DestTy->getPointeeType();
+
+  CheckStrictAliasing(CE->getSubExpr(), DestTy, IsLValue, 
CE->getSourceRange());
+}
+
+/// We're building a cast from E to pointer type DestType. If ISLValueCast is
+/// true, DestType is the pointer equivalent of the reference type we're 
casting
+/// to.
+void Sema::CheckStrictAliasingCast(Expr const *E, QualType DestType,
+   bool IsLValueCast, SourceRange Range) {
+  if (Diags.getDiagnosticOptions().StrictAliasing < 1 ||

urnathan wrote:

Heh, that's amusing :)  this started as a switch, and then changed when I found 
the switch to be line noise..

https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-06 Thread Nathan Sidwell via cfe-commits


@@ -498,3 +498,137 @@ 
CodeGenTBAA::mergeTBAAInfoForMemoryTransfer(TBAAAccessInfo DestInfo,
   // access type regardless of their base types.
   return TBAAAccessInfo::getMayAliasInfo();
 }
+
+// Determine the aliasing kind bit-converting from type Src to type Dst.
+CodeGenTBAA::AliasingKind CodeGenTBAA::getAliasingKind(QualType ,
+   QualType ) {
+  assert(!Src->isVoidType() && !Dst->isVoidType());
+  if (TypeHasMayAlias(Src) || TypeHasMayAlias(Dst))
+return AK_Ok;
+
+  Src = QualType{Src->getBaseElementTypeUnsafe(), 0};
+  Dst = QualType{Dst->getBaseElementTypeUnsafe(), 0};
+
+  auto *SrcDecl = Src->getAsRecordDecl();
+  auto *DstDecl = Dst->getAsRecordDecl();
+
+  const llvm::MDNode *AnyTBAA = getChar();
+  const llvm::MDNode *SrcTBAA = nullptr;
+  const llvm::MDNode *DstTBAA = nullptr;
+
+  if (!SrcDecl) {
+SrcTBAA = getTypeInfo(Src);
+if (!SrcTBAA || SrcTBAA == AnyTBAA)
+  return AK_Ok;
+  }
+  if (!DstDecl) {
+DstTBAA = getTypeInfo(Dst);
+if (!DstTBAA || DstTBAA == AnyTBAA)
+  return AK_Ok;
+  }
+
+  auto IsAncestor = [](const llvm::MDNode *Ancestor,
+   const llvm::MDNode *Descendant) {
+assert(Ancestor != Descendant && "Identical TBAA");
+while (Descendant->getNumOperands() != 1) {
+  Descendant = cast(Descendant->getOperand(1));
+  if (Descendant == Ancestor)
+return true;
+}
+return false;
+  };
+
+  assert(SrcTBAA != AnyTBAA && DstTBAA != AnyTBAA &&

urnathan wrote:

Ah, I found the current placement better -- as a reminder to the code below 
that that particular condition did not need to be handled.  The assert got 
added explicitly when I needed that reminder!

https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-06 Thread Nathan Sidwell via cfe-commits


@@ -498,3 +498,137 @@ 
CodeGenTBAA::mergeTBAAInfoForMemoryTransfer(TBAAAccessInfo DestInfo,
   // access type regardless of their base types.
   return TBAAAccessInfo::getMayAliasInfo();
 }
+
+// Determine the aliasing kind bit-converting from type Src to type Dst.
+CodeGenTBAA::AliasingKind CodeGenTBAA::getAliasingKind(QualType ,
+   QualType ) {
+  assert(!Src->isVoidType() && !Dst->isVoidType());
+  if (TypeHasMayAlias(Src) || TypeHasMayAlias(Dst))
+return AK_Ok;
+
+  Src = QualType{Src->getBaseElementTypeUnsafe(), 0};
+  Dst = QualType{Dst->getBaseElementTypeUnsafe(), 0};
+
+  auto *SrcDecl = Src->getAsRecordDecl();
+  auto *DstDecl = Dst->getAsRecordDecl();
+
+  const llvm::MDNode *AnyTBAA = getChar();
+  const llvm::MDNode *SrcTBAA = nullptr;
+  const llvm::MDNode *DstTBAA = nullptr;
+
+  if (!SrcDecl) {
+SrcTBAA = getTypeInfo(Src);
+if (!SrcTBAA || SrcTBAA == AnyTBAA)

urnathan wrote:

Yeah, I didn't because it was a class member, but scoped is better. done.

https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-06 Thread Nathan Sidwell via cfe-commits

https://github.com/urnathan updated 
https://github.com/llvm/llvm-project/pull/74155

>From 89c05618bb75fd073343046f3b54bde5f2254719 Mon Sep 17 00:00:00 2001
From: Nathan Sidwell 
Date: Wed, 15 Nov 2023 15:27:16 -0500
Subject: [PATCH 1/5] [clang] Strict aliasing warning ala GCC [PR50066]

This implements -Wstrict-aliasing(=[123])? along the same lines as
GCC. It's not 100% the same for reasons expanded on below. The default
is level 3, and I have verified that bootstrapping does not trigger
any warnings (just like building with GCC).

As with GCC, higher levels result in fewer warnings, reducing the
number of false positives at the cost of missing (some) potential
cases. Unlike GCC, this is entirely in the FE, we do not propagate any
checking into the IR (so there are cases GCC will detect we do not, I
have not encountered any). GCC's documentation is not very specific
about which cases are detected. I examined GCC's source code to reverse
engineer the algorithm[*], and as can be seen from the testcases, noted
GCC's behaviour.

The strict aliasing check relies on TBAA. LLVM's representation is
semantically different to GCCs for structured types. I have tried to
keep with the spirit of the warning.

The warning checks reinterpret_casts that are CK_BitCast or
CK_LValueBitCast. It also checks C-style casts that are equivalent (to
the extent available, as a C-style bitcast could be a well-formed
static cast).

level=1 looks for reinterpret casts from T * to U * (or an lvalue of
type T to U &), where T and U are not TBAA compatible.

level=2 requires the src expression be an lvalue something of
known(ish) static type. I.e a variable, array dereference or member
access.

level=3 requires level 2 and that the resultant pointer is actually
referenced (lvalue->rvalue or lvalue written). Here we can do better
than GCC, which doesn't represent this in the IR -- we merely get a
dereference (and reference types are modeled as pointers with
auto-dereference semantics). There is one exception to this, which is
by-value aggregate arguments. These end up invoking a copy constructor
(passing a reference of course), but are IMHO morally an rvalue -- so
should trigger.

The warning hooks into clang's code-generator's TBAA machinery. For
scalar types the TBAA is straight forwards, comparing LLVM's MDNode
representaion. For record & union types we check if one of the types
is (recursively) the same (or TBAA compatible) as the first direct
base or a field(s) of the record at offset zero (i.e. the first member
of a record, or any members of a union). This covers both C++ and
C-Style inheritance. Also. the member maybe alias_any, or in
ubiquitous-char's alias set, which is also permissible.

The warning is similar to the alignment check that
CheckCompatibleReinterpretCast does, perhaps some merging could occur
if this is accepted?

[*] I implemented what turned into GCC's level=1 way back when.

WIP: adjust tests
---
 clang/include/clang/AST/ASTConsumer.h |   25 +
 clang/include/clang/Basic/DiagnosticGroups.td |6 -
 clang/include/clang/Basic/DiagnosticOptions.h |4 +
 .../clang/Basic/DiagnosticSemaKinds.td|8 +
 clang/include/clang/Driver/Options.td |6 +
 clang/include/clang/Sema/Sema.h   |   11 +
 clang/lib/CodeGen/BackendConsumer.h   |1 +
 clang/lib/CodeGen/CodeGenAction.cpp   |4 +
 clang/lib/CodeGen/CodeGenModule.h |1 +
 clang/lib/CodeGen/CodeGenTBAA.cpp |  134 ++
 clang/lib/CodeGen/CodeGenTBAA.h   |5 +-
 clang/lib/CodeGen/ModuleBuilder.cpp   |2 +
 clang/lib/Frontend/CompilerInvocation.cpp |3 +
 clang/lib/Frontend/FrontendAction.cpp |8 +
 clang/lib/Sema/SemaCast.cpp   |  139 ++
 clang/lib/Sema/SemaExpr.cpp   |   17 +-
 clang/lib/Sema/SemaExprMember.cpp |2 +
 clang/lib/Sema/SemaInit.cpp   |5 +
 clang/test/Misc/warning-flags-tree.c  |4 -
 clang/test/Sema/strict-aliasing-warn.c|  192 +++
 clang/test/SemaCXX/strict-aliasing-warn.cpp   | 1375 +
 21 files changed, 1939 insertions(+), 13 deletions(-)
 create mode 100644 clang/test/Sema/strict-aliasing-warn.c
 create mode 100644 clang/test/SemaCXX/strict-aliasing-warn.cpp

diff --git a/clang/include/clang/AST/ASTConsumer.h 
b/clang/include/clang/AST/ASTConsumer.h
index ebcd8059284d8..d5731ed6adf63 100644
--- a/clang/include/clang/AST/ASTConsumer.h
+++ b/clang/include/clang/AST/ASTConsumer.h
@@ -21,6 +21,7 @@ namespace clang {
   class DeclGroupRef;
   class ASTMutationListener;
   class ASTDeserializationListener; // layering violation because void* is ugly
+  class QualType;
   class SemaConsumer; // layering violation required for safe SemaConsumer
   class TagDecl;
   class VarDecl;
@@ -37,6 +38,27 @@ class ASTConsumer {
 
   friend class SemaConsumer;
 
+public:
+  /// Allow type-based aliasing information to be interrogated by 

[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-05 Thread Fangrui Song via cfe-commits

MaskRay wrote:

Add @shafik who authors 
https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8 ("Although 
clang allows these flags it apparently does not actually implement the 
warnings")

> [*] I implemented what turned into GCC's level=1 way back when.

Cool!

FWIW the GCC doc is 
https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wstrict-aliasing_003dn
It says for Level 3 "If optimization is enabled, it also runs in the back end, 
where it deals with multiple statement cases using flow-sensitive points-to 
information."

Do you know how it works? Any example?


https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-05 Thread Fangrui Song via cfe-commits


@@ -0,0 +1,192 @@
+// RUN: %clang_cc1 %s -O0 -Wstrict-aliasing -S -o %t -verify=quiet
+// RUN: %clang_cc1 %s -O2 -Wstrict-aliasing=0 -S -o %t -verify=quiet
+// RUN: %clang_cc1 %s -O2 -Wno-strict-aliasing -S -o %t -verify=quiet
+// RUN: %clang_cc1 %s -O2 -Wstrict-aliasing=1 -S -o %t 
-verify=level1,level12,level123
+// RUN: %clang_cc1 %s -O2 -Wstrict-aliasing=2 -S -o %t 
-verify=level2,level23,level12,level123
+// RUN: %clang_cc1 %s -O2 -Wstrict-aliasing=3 -S -o %t -verify=level23,level123
+// RUN: %clang_cc1 %s -O2 -Wstrict-aliasing -S -o %t -verify=level23,level123
+// RUN: %clang_cc1 %s -O2 -S -o %t -verify=level23,level123
+
+// quiet-no-diagnostics
+
+#if _LP64
+// These names make more sense on an ilp32 machine
+typedef long INT;
+typedef long long LONG;
+typedef unsigned long UINT;
+#else
+typedef int INT;
+typedef long LONG;
+typedef unsigned int UINT;
+#endif
+typedef short SHORT;
+
+INT ScalarINT;
+INT Ary[2];
+struct {int m;} Struct;
+
+_Complex int CPLx;
+
+void ByVal(long long);
+void ByPtr(void *);
+
+void VarPtr(INT *Ptr) {
+  // GCC: 1
+  // level1-warning@+1{{type-punned pointer might break}}
+  ByPtr((LONG *)(Ptr));
+  // level1-note@-1{{not alias compatible}}
+
+  // GCC:
+  ByPtr((LONG *)((void *)(Ptr)));
+
+  // GCC: 1
+  // level1-warning@+1{{type-punned pointer might break}}
+  ByVal(*(LONG *)(Ptr));
+  // level1-note@-1{{not alias compatible}}
+}
+
+void Object() {
+  // GCC: 1, 2
+  // level2-warning@+2{{type-punned pointer breaks}}
+  // level1-warning@+1{{type-punned pointer might break}}
+  ByPtr((LONG *)());
+  // level12-note@-1{{not alias compatible}}
+
+  // GCC:
+  ByPtr((LONG *)((void *)()));
+
+  // GCC: 1, 2, 3
+  // level23-warning@+2{{type-punned pointer breaks}}
+  // level1-warning@+1{{type-punned pointer might break}}
+  ByVal(*(LONG *)());
+  // level123-note@-1{{not alias compatible}}
+}
+
+// Level 1, 2, 3 - 1, 2, 3
+void DetectedVariants() {
+  // GCC: 1, 2, 3
+  // level23-warning@+2{{type-punned pointer breaks}}
+  // level1-warning@+1{{type-punned pointer might break}}
+  ByVal(*(LONG *)([1]));
+  // level123-note@-1{{not alias compatible}}
+
+  // GCC: 1, 2, 3
+  // level23-warning@+2{{type-punned pointer breaks}}
+  // level1-warning@+1{{type-punned pointer might break}}
+  ByVal(*(LONG *)());
+  // level123-note@-1{{not alias compatible}}
+
+  // GCC: 1, 2, 3
+  // level23-warning@+2{{type-punned pointer breaks}}
+  // level1-warning@+1{{type-punned pointer might break}}
+  ByVal(*(LONG *)(&()->m));
+  // level123-note@-1{{not alias compatible}}
+
+  // GCC: 1, 2, 3
+  // level23-warning@+2{{type-punned pointer breaks}}
+  // level1-warning@+1{{type-punned pointer might break}}
+  ByVal(*(LONG *)(&__real__(CPLx)));
+  // level123-note@-1{{not alias compatible}}
+
+  // GCC: 1, 2, 3
+  // level23-warning@+2{{type-punned pointer breaks}}
+  // level1-warning@+1{{type-punned pointer might break}}
+  ByVal(*(LONG *)(&__imag__(CPLx)));
+  // level123-note@-1{{not alias compatible}}
+}
+
+void Ok() {
+  // GCC:
+  ByPtr((UINT *)());
+  // GCC:
+  ByPtr((UINT *)((void *)()));
+  // GCC:
+  ByVal(*(UINT *)());
+}
+
+// Level 1, 2, 3 - 1, 2, 3
+void Parens() {
+  // GCC: 1, 2, 3
+  // level23-warning@+2{{type-punned pointer breaks}}
+  // level1-warning@+1{{type-punned pointer might break}}
+  ByVal(*((LONG *)((&(ScalarINT);
+  // level123-note@-1{{not alias compatible}}
+
+  // GCC: 1, 2, 3
+  // level23-warning@+2{{type-punned pointer breaks}}
+  // level1-warning@+1{{type-punned pointer might break}}
+  ByVal(*((LONG *)((&(Ary[1]);
+  // level123-note@-1{{not alias compatible}}
+}
+
+// Clang models may_alias as a decl attribute, not a type attribute.
+
+typedef int MA __attribute__((may_alias));
+
+void Frob(MA *a) {
+  ByPtr((short *)(a));
+  ByVal(*(short *)(a));
+}
+
+struct Inner { int m; };
+struct Outer1 { struct Inner i; };
+struct Outer2 { struct Outer1 o; };
+struct Inner i;
+struct Outer2 o;
+
+void ByValInner (struct Inner);
+void ByValOuter2 (struct Outer2);
+
+void Inherit() {
+  // Check we see through multiple levels
+  int in;
+
+  ByValOuter2(*(struct Outer2 *));
+  ByValOuter2(*(struct Outer2 *));
+  ByValInner(*(struct Inner *));
+  ByValInner(*(struct Inner *));
+  ByVal(*(int *));
+}
+
+// PR 50066
+typedef unsigned char uchar;
+
+void Double(double);
+
+int main() {
+  double d = 2.34;
+  int i[2];
+  Double(d);
+
+  // GCC: 1, 2, 3
+  // level23-warning@+2{{type-punned pointer breaks}}
+  // level1-warning@+1{{type-punned pointer might break}}
+  *(long long *)i =
+  // level123-note@-1{{not alias compatible}}
+
+  // GCC: 1, 2, 3
+  // level23-warning@+2{{type-punned pointer breaks}}
+  // level1-warning@+1{{type-punned pointer might break}}
+  *(long long *)
+  // level123-note@-1{{not alias compatible}}
+
+  // GCC: 1, 2, 3
+  // level23-warning@+2{{type-punned pointer breaks}}
+  // level1-warning@+1{{type-punned pointer might break}}
+  ((int *))[0] = i[0];
+  // level123-note@-1{{not 

[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-05 Thread Eli Friedman via cfe-commits

efriedma-quic wrote:

Making Sema pull the TBAA info out of clang/lib/CodeGen is a layering violation 
(and probably breaks if we aren't actually generating code).  If we need some 
notion of "aliasing" in Sema, we should pull the relevant code into 
clang/lib/AST.

I don't have any experience with this warning, so I'm not sure how effective it 
is at detecting issues; do a significant fraction of issues actually show up at 
a pure syntactic level?  Do you have any idea what false positive rate we 
should expect?

https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-03 Thread Vlad Serebrennikov via cfe-commits


@@ -128,6 +128,10 @@ class DiagnosticOptions : public 
RefCountedBase{
   /// whether -Wsystem-headers is enabled on a per-module basis.
   std::vector SystemHeaderWarningsModules;
 
+  /// Level of scrutiny reinterpret_casts get for type-unsafe aliasing
+  /// checks. Requires an ASTConsumer that provides TBAA information.
+  unsigned StrictAliasing;

Endilll wrote:

Wouldn't a small bit-field suffice here?

https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-03 Thread Vlad Serebrennikov via cfe-commits


@@ -498,3 +498,137 @@ 
CodeGenTBAA::mergeTBAAInfoForMemoryTransfer(TBAAAccessInfo DestInfo,
   // access type regardless of their base types.
   return TBAAAccessInfo::getMayAliasInfo();
 }
+
+// Determine the aliasing kind bit-converting from type Src to type Dst.
+CodeGenTBAA::AliasingKind CodeGenTBAA::getAliasingKind(QualType ,
+   QualType ) {
+  assert(!Src->isVoidType() && !Dst->isVoidType());
+  if (TypeHasMayAlias(Src) || TypeHasMayAlias(Dst))
+return AK_Ok;
+
+  Src = QualType{Src->getBaseElementTypeUnsafe(), 0};
+  Dst = QualType{Dst->getBaseElementTypeUnsafe(), 0};
+
+  auto *SrcDecl = Src->getAsRecordDecl();
+  auto *DstDecl = Dst->getAsRecordDecl();
+
+  const llvm::MDNode *AnyTBAA = getChar();
+  const llvm::MDNode *SrcTBAA = nullptr;
+  const llvm::MDNode *DstTBAA = nullptr;
+
+  if (!SrcDecl) {
+SrcTBAA = getTypeInfo(Src);
+if (!SrcTBAA || SrcTBAA == AnyTBAA)
+  return AK_Ok;
+  }
+  if (!DstDecl) {
+DstTBAA = getTypeInfo(Dst);
+if (!DstTBAA || DstTBAA == AnyTBAA)
+  return AK_Ok;
+  }
+
+  auto IsAncestor = [](const llvm::MDNode *Ancestor,
+   const llvm::MDNode *Descendant) {
+assert(Ancestor != Descendant && "Identical TBAA");
+while (Descendant->getNumOperands() != 1) {
+  Descendant = cast(Descendant->getOperand(1));
+  if (Descendant == Ancestor)
+return true;
+}
+return false;
+  };
+
+  assert(SrcTBAA != AnyTBAA && DstTBAA != AnyTBAA &&

Endilll wrote:

I think this assert should be placed closer to where `AnyTBAA` is handled, 
before `IsAncestor` definition.

https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-03 Thread Vlad Serebrennikov via cfe-commits


@@ -37,6 +38,27 @@ class ASTConsumer {
 
   friend class SemaConsumer;
 
+public:
+  /// Allow type-based aliasing information to be interrogated by the AST
+  /// producer (for diagnostics).
+  class TypeAliasing {

Endilll wrote:

It seems to me that we'd be better off with `AliasingKind` being scoped enum.

https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-03 Thread Vlad Serebrennikov via cfe-commits


@@ -2026,6 +2027,137 @@ static TryCastResult TryConstCast(Sema , 
ExprResult ,
   return TC_Success;
 }
 
+// We're dereferencing E, either by turning into an RValue, or by dereferencing
+// it. Check whether it's a deref of a reinterpret cast that has aliasing
+// issues.
+void Sema::CheckStrictAliasingDeref(Expr const *E, bool IsLValue) {
+  if (Diags.getDiagnosticOptions().StrictAliasing < 3)
+return;
+
+  assert(IsLValue || E->getType()->isAnyPointerType());
+  CastExpr const *CE = nullptr;
+  for (;;) {
+CE = dyn_cast(E->IgnoreParens());
+if (!CE)
+  return;
+
+if (IsLValue || CE->getCastKind() != CK_ArrayToPointerDecay)
+  break;
+
+E = CE->getSubExpr();
+IsLValue = true;
+  }
+
+  if (CE->getCastKind() != (IsLValue ? CK_LValueBitCast : CK_BitCast))
+return;
+
+  if (CE->getSubExpr()->getType()->isVoidPointerType())
+return;
+
+  QualType DestTy = CE->getType();
+  if (!IsLValue)
+DestTy = DestTy->getPointeeType();
+
+  CheckStrictAliasing(CE->getSubExpr(), DestTy, IsLValue, 
CE->getSourceRange());
+}
+
+/// We're building a cast from E to pointer type DestType. If ISLValueCast is
+/// true, DestType is the pointer equivalent of the reference type we're 
casting
+/// to.
+void Sema::CheckStrictAliasingCast(Expr const *E, QualType DestType,
+   bool IsLValueCast, SourceRange Range) {
+  if (Diags.getDiagnosticOptions().StrictAliasing < 1 ||

Endilll wrote:

Switch might be a better fit here.

https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-03 Thread Vlad Serebrennikov via cfe-commits


@@ -498,3 +498,137 @@ 
CodeGenTBAA::mergeTBAAInfoForMemoryTransfer(TBAAAccessInfo DestInfo,
   // access type regardless of their base types.
   return TBAAAccessInfo::getMayAliasInfo();
 }
+
+// Determine the aliasing kind bit-converting from type Src to type Dst.
+CodeGenTBAA::AliasingKind CodeGenTBAA::getAliasingKind(QualType ,
+   QualType ) {
+  assert(!Src->isVoidType() && !Dst->isVoidType());
+  if (TypeHasMayAlias(Src) || TypeHasMayAlias(Dst))
+return AK_Ok;
+
+  Src = QualType{Src->getBaseElementTypeUnsafe(), 0};
+  Dst = QualType{Dst->getBaseElementTypeUnsafe(), 0};
+
+  auto *SrcDecl = Src->getAsRecordDecl();
+  auto *DstDecl = Dst->getAsRecordDecl();
+
+  const llvm::MDNode *AnyTBAA = getChar();
+  const llvm::MDNode *SrcTBAA = nullptr;
+  const llvm::MDNode *DstTBAA = nullptr;
+
+  if (!SrcDecl) {
+SrcTBAA = getTypeInfo(Src);
+if (!SrcTBAA || SrcTBAA == AnyTBAA)

Endilll wrote:

Does this check take into account all 3 types (`char`, `unsigned char`, 
`std::byte`) listed in 
[basic.lval#11.3](http://eel.is/c++draft/basic.lval#11.3)? Same goes for the 
similar `DstTBAA` check below.

https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-03 Thread Vlad Serebrennikov via cfe-commits

https://github.com/Endilll edited 
https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-03 Thread Vlad Serebrennikov via cfe-commits

https://github.com/Endilll commented:

Thank you for your efforts!
I scratched the surface a bit; not qualified to do an in-depth review.
Can you also add a release note?

https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-02 Thread Nathan Sidwell via cfe-commits

https://github.com/urnathan edited 
https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-02 Thread via cfe-commits

llvmbot wrote:



@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: Nathan Sidwell (urnathan)


Changes

This implements -Wstrict-aliasing(=[123])? along the same lines as GCC. It's 
not 100% the same for reasons expanded on below. The default is level 3, and I 
have verified that bootstrapping does not trigger any warnings (just like 
building with GCC).

As with GCC, higher levels result in fewer warnings, reducing the number of 
false positives at the cost of missing (some) potential cases. Unlike GCC, this 
is entirely in the FE, we do not propagate any checking into the IR (so there 
are cases GCC will detect we do not, I have not encountered any). GCC's 
documentation is not very specific about which cases are detected. I examined 
GCC's source code to reverse engineer the algorithm[*], and as can be seen from 
the testcases, noted GCC's behaviour.

The strict aliasing check relies on TBAA. LLVM's representation is semantically 
different to GCCs for structured types. I have tried to keep with the spirit of 
the warning.

The warning checks reinterpret_casts that are CK_BitCast or CK_LValueBitCast. 
It also checks C-style casts that are equivalent (to the extent available, as a 
C-style bitcast could be a well-formed static cast).

level=1 looks for reinterpret casts from T * to U * (or an lvalue of type T to 
U ), where T and U are not TBAA compatible.

level=2 requires the src expression be an lvalue something of known(ish) static 
type. I.e a variable, array dereference or member access.

level=3 requires level 2 and that the resultant pointer is actually referenced 
(lvalue-rvalue or lvalue written). Here we can do better than GCC, which 
doesn't represent this in the IR -- we merely get a dereference (and reference 
types are modeled as pointers with auto-dereference semantics). There is one 
exception to this, which is by-value aggregate arguments. These end up invoking 
a copy constructor (passing a reference of course), but are IMHO morally an 
rvalue -- so should trigger.

The warning hooks into clang's code-generator's TBAA machinery. For scalar 
types the TBAA is straight forwards, comparing LLVM's MDNode representaion. For 
record  union types we check if one of the types is (recursively) the same 
(or TBAA compatible) as the first direct base or a field(s) of the record at 
offset zero (i.e. the first member of a record, or any members of a union). 
This covers both C++ and C-Style inheritance. Also. the member maybe alias_any, 
or in ubiquitous-char's alias set, which is also permissible.

The warning is similar to the alignment check that 
CheckCompatibleReinterpretCast does, perhaps some merging could occur if this 
is accepted?

[*] I implemented what turned into GCC's level=1 way back when.

WIP: adjust tests

---

Patch is 77.54 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/74155.diff


21 Files Affected:

- (modified) clang/include/clang/AST/ASTConsumer.h (+25) 
- (modified) clang/include/clang/Basic/DiagnosticGroups.td (-6) 
- (modified) clang/include/clang/Basic/DiagnosticOptions.h (+4) 
- (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+8) 
- (modified) clang/include/clang/Driver/Options.td (+6) 
- (modified) clang/include/clang/Sema/Sema.h (+11) 
- (modified) clang/lib/CodeGen/BackendConsumer.h (+1) 
- (modified) clang/lib/CodeGen/CodeGenAction.cpp (+4) 
- (modified) clang/lib/CodeGen/CodeGenModule.h (+1) 
- (modified) clang/lib/CodeGen/CodeGenTBAA.cpp (+134) 
- (modified) clang/lib/CodeGen/CodeGenTBAA.h (+4-1) 
- (modified) clang/lib/CodeGen/ModuleBuilder.cpp (+2) 
- (modified) clang/lib/Frontend/CompilerInvocation.cpp (+3) 
- (modified) clang/lib/Frontend/FrontendAction.cpp (+8) 
- (modified) clang/lib/Sema/SemaCast.cpp (+139) 
- (modified) clang/lib/Sema/SemaExpr.cpp (+15-2) 
- (modified) clang/lib/Sema/SemaExprMember.cpp (+2) 
- (modified) clang/lib/Sema/SemaInit.cpp (+5) 
- (modified) clang/test/Misc/warning-flags-tree.c (-4) 
- (added) clang/test/Sema/strict-aliasing-warn.c (+192) 
- (added) clang/test/SemaCXX/strict-aliasing-warn.cpp (+1375) 


``diff
diff --git a/clang/include/clang/AST/ASTConsumer.h 
b/clang/include/clang/AST/ASTConsumer.h
index ebcd8059284d8..d5731ed6adf63 100644
--- a/clang/include/clang/AST/ASTConsumer.h
+++ b/clang/include/clang/AST/ASTConsumer.h
@@ -21,6 +21,7 @@ namespace clang {
   class DeclGroupRef;
   class ASTMutationListener;
   class ASTDeserializationListener; // layering violation because void* is ugly
+  class QualType;
   class SemaConsumer; // layering violation required for safe SemaConsumer
   class TagDecl;
   class VarDecl;
@@ -37,6 +38,27 @@ class ASTConsumer {
 
   friend class SemaConsumer;
 
+public:
+  /// Allow type-based aliasing information to be interrogated by the AST
+  /// producer (for diagnostics).
+  class TypeAliasing {
+  public:
+TypeAliasing() = default;
+virtual ~TypeAliasing(){};
+
+  public:
+enum 

[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-02 Thread Nathan Sidwell via cfe-commits

https://github.com/urnathan ready_for_review 
https://github.com/llvm/llvm-project/pull/74155
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Strict aliasing warning ala GCC [PR50066] (PR #74155)

2023-12-01 Thread Nathan Sidwell via cfe-commits

https://github.com/urnathan created 
https://github.com/llvm/llvm-project/pull/74155

This implements -Wstrict-aliasing(=[123])? along the same lines as GCC. It's 
not 100% the same for reasons expanded on below. The default is level 3, and I 
have verified that bootstrapping does not trigger any warnings (just like 
building with GCC).

As with GCC, higher levels result in fewer warnings, reducing the number of 
false positives at the cost of missing (some) potential cases. Unlike GCC, this 
is entirely in the FE, we do not propagate any checking into the IR (so there 
are cases GCC will detect we do not, I have not encountered any). GCC's 
documentation is not very specific about which cases are detected. I examined 
GCC's source code to reverse engineer the algorithm[*], and as can be seen from 
the testcases, noted GCC's behaviour.

The strict aliasing check relies on TBAA. LLVM's representation is semantically 
different to GCCs for structured types. I have tried to keep with the spirit of 
the warning.

The warning checks reinterpret_casts that are CK_BitCast or CK_LValueBitCast. 
It also checks C-style casts that are equivalent (to the extent available, as a 
C-style bitcast could be a well-formed static cast).

level=1 looks for reinterpret casts from T * to U * (or an lvalue of type T to 
U &), where T and U are not TBAA compatible.

level=2 requires the src expression be an lvalue something of known(ish) static 
type. I.e a variable, array dereference or member access.

level=3 requires level 2 and that the resultant pointer is actually referenced 
(lvalue->rvalue or lvalue written). Here we can do better than GCC, which 
doesn't represent this in the IR -- we merely get a dereference (and reference 
types are modeled as pointers with auto-dereference semantics). There is one 
exception to this, which is by-value aggregate arguments. These end up invoking 
a copy constructor (passing a reference of course), but are IMHO morally an 
rvalue -- so should trigger.

The warning hooks into clang's code-generator's TBAA machinery. For scalar 
types the TBAA is straight forwards, comparing LLVM's MDNode representaion. For 
record & union types we check if one of the types is (recursively) the same (or 
TBAA compatible) as the first direct base or a field(s) of the record at offset 
zero (i.e. the first member of a record, or any members of a union). This 
covers both C++ and C-Style inheritance. Also. the member maybe alias_any, or 
in ubiquitous-char's alias set, which is also permissible.

The warning is similar to the alignment check that 
CheckCompatibleReinterpretCast does, perhaps some merging could occur if this 
is accepted?

[*] I implemented what turned into GCC's level=1 way back when.

WIP: adjust tests

>From 89c05618bb75fd073343046f3b54bde5f2254719 Mon Sep 17 00:00:00 2001
From: Nathan Sidwell 
Date: Wed, 15 Nov 2023 15:27:16 -0500
Subject: [PATCH] [clang] Strict aliasing warning ala GCC [PR50066]

This implements -Wstrict-aliasing(=[123])? along the same lines as
GCC. It's not 100% the same for reasons expanded on below. The default
is level 3, and I have verified that bootstrapping does not trigger
any warnings (just like building with GCC).

As with GCC, higher levels result in fewer warnings, reducing the
number of false positives at the cost of missing (some) potential
cases. Unlike GCC, this is entirely in the FE, we do not propagate any
checking into the IR (so there are cases GCC will detect we do not, I
have not encountered any). GCC's documentation is not very specific
about which cases are detected. I examined GCC's source code to reverse
engineer the algorithm[*], and as can be seen from the testcases, noted
GCC's behaviour.

The strict aliasing check relies on TBAA. LLVM's representation is
semantically different to GCCs for structured types. I have tried to
keep with the spirit of the warning.

The warning checks reinterpret_casts that are CK_BitCast or
CK_LValueBitCast. It also checks C-style casts that are equivalent (to
the extent available, as a C-style bitcast could be a well-formed
static cast).

level=1 looks for reinterpret casts from T * to U * (or an lvalue of
type T to U &), where T and U are not TBAA compatible.

level=2 requires the src expression be an lvalue something of
known(ish) static type. I.e a variable, array dereference or member
access.

level=3 requires level 2 and that the resultant pointer is actually
referenced (lvalue->rvalue or lvalue written). Here we can do better
than GCC, which doesn't represent this in the IR -- we merely get a
dereference (and reference types are modeled as pointers with
auto-dereference semantics). There is one exception to this, which is
by-value aggregate arguments. These end up invoking a copy constructor
(passing a reference of course), but are IMHO morally an rvalue -- so
should trigger.

The warning hooks into clang's code-generator's TBAA machinery. For
scalar types the TBAA is straight forwards, comparing LLVM's