[clang] [Clang] Fix parsing of reversible type traits in template arguments (PR #95969)
@@ -141,3 +141,15 @@ namespace r360308_regression { return a == b; } } + +namespace GH95598 { +template +struct __is_pointer {}; +// expected-warning@-1 {{keyword '__is_pointer' will be made available as an identifier for the remainder of the translation unit}} zygoloid wrote: There are now patches posted for libstdc++ to avoid using our type trait keywords as identifiers. https://github.com/llvm/llvm-project/pull/95969 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Fix parsing of reversible type traits in template arguments (PR #95969)
@@ -141,3 +141,15 @@ namespace r360308_regression { return a == b; } } + +namespace GH95598 { +template +struct __is_pointer {}; +// expected-warning@-1 {{keyword '__is_pointer' will be made available as an identifier for the remainder of the translation unit}} zygoloid wrote: One thing we could consider doing would be removing the entire "revertible keyword" mechanism, and instead, in the lexer, classifying these tokens as keywords only if the next pp-token is an `lparen`. That would probably simplify the handling of these keywords everywhere, and would happen to give the outcome that libstdc++ is expecting in this case too. (It's a breaking change for some weird corner cases, such as producing the keyword via macro expansion and things like `bool __is_pointer(true);`, but hopefully those things don't happen in practice!) https://github.com/llvm/llvm-project/pull/95969 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Fix parsing of reversible type traits in template arguments (PR #95969)
@@ -141,3 +141,15 @@ namespace r360308_regression { return a == b; } } + +namespace GH95598 { +template +struct __is_pointer {}; +// expected-warning@-1 {{keyword '__is_pointer' will be made available as an identifier for the remainder of the translation unit}} zygoloid wrote: I think we've incrementally ended up in a bad place. Here's my memory of what happened: Originally, the idea was: we provide builtins such as `__is_pointer` for use by the standard library. But for a bunch of those, libstdc++ provides a type with the same name that works it out for itself. OK, we can work around that: if we see a declaration with that name, then the standard library isn't using our builtin, so we treat it as a non-keyword so that libstdc++ doesn't break. But then it turns out that boost uses these things as keywords directly, rather than using the standard library functionality. To keep both that and libstdc++ working at the same time, we treat the keyword followed by an open paren as having the builtin meaning even when the keyword has been reverted to being a non-keyword. And now we're noticing that we also need special behavior in other parts of the parser to treat the keyword followed by an open paren as a special case. I don't think we should be extending this hack further. I've suggested on the libstdc++ bug report that this be fixed in libstdc++ instead, by stopping using our keywords as their type names. Maybe one day we can then remove it entirely. https://github.com/llvm/llvm-project/pull/95969 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Fix-it hint for `++this` -> `++*this` when deref is modifiable (PR #94159)
zygoloid wrote: Perhaps we can start by supplying `CheckForModifiableLvalue` with extra information about *why* we're doing the check. The current diagnostic we get for incrementing a non-modifiable lvalue: ```console :2:3: error: cannot assign to variable 'n' with const-qualified type 'const int' 2 | ++n; | ^ ~ ``` ... is really bad -- this isn't an assignment, it's an increment. An enum of possible reasons why we're doing the check would let us significantly improve the diagnostic here. https://github.com/llvm/llvm-project/pull/94159 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Fix-it hint for `++this` -> `++*this` when deref is modifiable (PR #94159)
@@ -13273,6 +13273,23 @@ enum { ConstUnknown, // Keep as last element }; +static void MaybeSuggestDerefFixIt(Sema , const Expr *E, SourceLocation Loc) { + ExprResult Deref; + Expr *TE = const_cast(E); + { +Sema::TentativeAnalysisScope Trap(S); +Deref = S.ActOnUnaryOp(S.getCurScope(), Loc, tok::star, TE); + } zygoloid wrote: It's not really feasible to use a `TentativeAnalysisScope` for this. Here are three reasons: 1) If `ActOnUnaryOp` produces an "immediate context" diagnostic (the common case), such as a warning, then the note you produce below will be attached to that diagnostic instead of to the original diagnostic, so your note will be suppressed. 2) If `ActOnUnaryOp` triggers template instantiation, then `TentativeAnalysisScope` will not suppress diagnostics outside of the immediate context, potentially leading to additional spurious diagnostics being produced. 3) If this function is used to attach a note to a warning / extension diagnostic rather than an error (which doesn't appear to be the case right now, but it seems plausible that someone would make such a change to `CheckForModifiableLValue` in the future), then the diagnostics produced by (2) would lead to rejecting a valid program. Checking for an overloaded `operator*` seems overly complicated here, so I think the right thing to test here is just whether the expression has pointer type, and the pointee type is an object type (not void or a function type). https://github.com/llvm/llvm-project/pull/94159 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Fix-it hint for `++this` -> `++*this` when deref is modifiable (PR #94159)
@@ -65,8 +65,10 @@ void test4(void) { void test5(int *X, float *P) { (float*)X = P; // expected-error {{assignment to cast is illegal, lvalue casts are not supported}} + // expected-note@-1 {{add '*' to dereference it}} zygoloid wrote: This is a diagnostic regression. https://github.com/llvm/llvm-project/pull/94159 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Fix-it hint for `++this` -> `++*this` when deref is modifiable (PR #94159)
@@ -0,0 +1,49 @@ +// RUN: %clang_cc1 -std=c++23 -fsyntax-only -verify %s + +struct S { + void f() { +++this; // expected-error {{expression is not assignable}} +// expected-note@-1 {{add '*' to dereference it}} + } + + void g() const { +++this; // expected-error {{expression is not assignable}} zygoloid wrote: Likewise. https://github.com/llvm/llvm-project/pull/94159 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Fix-it hint for `++this` -> `++*this` when deref is modifiable (PR #94159)
@@ -13273,6 +13273,23 @@ enum { ConstUnknown, // Keep as last element }; +static void MaybeSuggestDerefFixIt(Sema , const Expr *E, SourceLocation Loc) { + ExprResult Deref; + Expr *TE = const_cast(E); + { +Sema::TentativeAnalysisScope Trap(S); +Deref = S.ActOnUnaryOp(S.getCurScope(), Loc, tok::star, TE); + } + if (Deref.isUsable() && + Deref.get()->isModifiableLvalue(S.Context, ) == Expr::MLV_Valid && + !E->getType()->isObjCObjectPointerType()) { +S.Diag(E->getBeginLoc(), + diag::note_typecheck_add_deref_star_not_modifiable_lvalue) +<< E->getSourceRange() +<< FixItHint::CreateInsertion(E->getBeginLoc(), "*"); zygoloid wrote: We shouldn't be suggesting adding a `*` unless that would actually make sense in context. Eg, given: ```c++ const int *const p; const int *const q; p = q; ``` ... we should not suggest a dereference -- the problem is clearly that `p` is `const`, and adding a `*` doesn't help. I suspect we simply don't have enough information to do this from within `CheckForModifiableLvalue`. https://github.com/llvm/llvm-project/pull/94159 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Fix-it hint for `++this` -> `++*this` when deref is modifiable (PR #94159)
@@ -65,8 +65,10 @@ void test4(void) { void test5(int *X, float *P) { (float*)X = P; // expected-error {{assignment to cast is illegal, lvalue casts are not supported}} + // expected-note@-1 {{add '*' to dereference it}} #define FOO ((float*) X) FOO = P; // expected-error {{assignment to cast is illegal, lvalue casts are not supported}} + // expected-note@-1 {{add '*' to dereference it}} zygoloid wrote: This is a diagnostic regression. https://github.com/llvm/llvm-project/pull/94159 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Fix-it hint for `++this` -> `++*this` when deref is modifiable (PR #94159)
@@ -0,0 +1,49 @@ +// RUN: %clang_cc1 -std=c++23 -fsyntax-only -verify %s + +struct S { + void f() { +++this; // expected-error {{expression is not assignable}} +// expected-note@-1 {{add '*' to dereference it}} zygoloid wrote: This is a bad diagnostic change: adding the `*` doesn't help. https://github.com/llvm/llvm-project/pull/94159 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Fix-it hint for `++this` -> `++*this` when deref is modifiable (PR #94159)
https://github.com/zygoloid commented: I have some significant concerns about the approach of this PR. It seems to be producing notes that in a lot of cases suggest doing something that doesn't make sense or won't compile, and tentatively building a `*p` expression is unsafe. Can we revert and consider other approaches? https://github.com/llvm/llvm-project/pull/94159 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Fix-it hint for `++this` -> `++*this` when deref is modifiable (PR #94159)
https://github.com/zygoloid edited https://github.com/llvm/llvm-project/pull/94159 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] allow `` `@$ `` in raw string delimiters in C++26 (PR #93216)
@@ -2261,8 +2261,17 @@ bool Lexer::LexRawStringLiteral(Token , const char *CurPtr, unsigned PrefixLen = 0; - while (PrefixLen != 16 && isRawStringDelimBody(CurPtr[PrefixLen])) + while (PrefixLen != 16 && isRawStringDelimBody(CurPtr[PrefixLen])) { ++PrefixLen; +if (!isLexingRawMode() && +llvm::is_contained({'$', '@', '`'}, CurPtr[PrefixLen])) { zygoloid wrote: There's an off-by-one error here: we're incrementing `PrefixLen` before checking the character, so this is checking the character *after* the one we just processed. Hence we don't diagnose `"@(foo)@"`, because the characters we look at are the `(` and `"` after the `@`s, not the `@`s themselves. https://github.com/llvm/llvm-project/pull/93216 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Pass LangOpts from CompilerInstance to DependencyScanningWorker (PR #93753)
zygoloid wrote: > > I guess the general question is - is it acceptable to have the Scanner > > operating in a language standard different than the passed in language mode > > and different than the compiler language standard? > > I think that is acceptable. It is kinda hacky, but the lexer and preprocessor > are largely independent of the language and the standard. When they do depend > on those settings, taking the union of the features and letting the compiler > trim it down is still a perfectly sound thing to do. You can certainly construct cases where the different lexing rules in different language modes allow you to detect which language you're in from within the preprocessor ([1](https://eel.is/c++draft/diff.cpp11.lex) [2](https://eel.is/c++draft/diff.cpp14.lex#2) [3](https://eel.is/c++draft/diff.cpp03.lex#1)) or where enabling more language mode flags may reject valid code. It may be good enough for what the scanner is trying to do, but I think it's a stretch to say that it's sound. https://github.com/llvm/llvm-project/pull/93753 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Sema] Fix handling of fields with initializers in nested anonymous unions. (PR #91692)
@@ -77,3 +77,38 @@ namespace use_self { int fib(int n) { return FibTree{n}.v; } } + +namespace nested_union { + union Test1 { +union { + int inner { 42 }; +}; +int outer; + }; + static_assert(Test1{}.inner == 42, ""); + struct Test2 { +union { + struct { +int inner : 32 { 42 }; // expected-warning {{C++20 extension}} + }; + int outer; +}; + }; + static_assert(Test2{}.inner == 42, ""); + struct Int { int x; }; + struct Test3 { +int x; +union { + struct { +const int& y; +int inner : 32 { 42 }; // expected-warning {{C++20 extension}} + }; + int outer; +}; + }; + constexpr char f(Test3) { return 1; } // expected-note {{candidate function}} + constexpr char f(Int) { return 2; } // expected-note {{candidate function}} + // FIXME: This shouldn't be ambiguous; either we should reject the declaration + // of Test3, or we should exclude f(Test3) as a candidate. + static_assert(f({1}) == 2, ""); // expected-error {{call to 'f' is ambiguous}} zygoloid wrote: Maybe an `expected-error` for `Test3 test3 = {1};` would be nice here too. I think that part at least should work -- if I'm understanding correctly, we're only missing the check for the uninitialized reference member during overload resolution / in `VerifyOnly` mode? https://github.com/llvm/llvm-project/pull/91692 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Sema] Fix handling of fields with initializers in nested anonymous unions. (PR #91692)
https://github.com/zygoloid approved this pull request. Looks good to me too. https://github.com/llvm/llvm-project/pull/91692 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Sema] Fix handling of fields with initializers in nested anonymous unions. (PR #91692)
@@ -77,3 +77,38 @@ namespace use_self { int fib(int n) { return FibTree{n}.v; } } + +namespace nested_union { + union Test1 { +union { + int inner { 42 }; +}; +int outer; + }; + static_assert(Test1{}.inner == 42, ""); + struct Test2 { +union { + struct { +int inner : 32 { 42 }; // expected-warning {{C++20 extension}} + }; + int outer; +}; + }; + static_assert(Test2{}.inner == 42, ""); zygoloid wrote: Might be useful to also test that the rest of the union member gets initialized. ```suggestion struct Test2 { union { struct { int inner : 32 { 42 }; // expected-warning {{C++20 extension}} int inner_no_init; }; int outer; }; }; static_assert(Test2{}.inner == 42, ""); static_assert(Test2{}.inner_no_init == 0, ""); ``` https://github.com/llvm/llvm-project/pull/91692 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Sema] Fix handling of fields with initializers in nested anonymous unions. (PR #91692)
https://github.com/zygoloid edited https://github.com/llvm/llvm-project/pull/91692 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] allow `` `@$ `` in raw string delimiters in C++26 (PR #93216)
@@ -152,16 +151,17 @@ LLVM_READONLY inline bool isHexDigit(unsigned char c) { /// Note that '_' is both a punctuation character and an identifier character! LLVM_READONLY inline bool isPunctuation(unsigned char c) { using namespace charinfo; - return (InfoTable[c] & (CHAR_UNDER|CHAR_PERIOD|CHAR_RAWDEL|CHAR_PUNCT)) != 0; + return (InfoTable[c] & + (CHAR_UNDER | CHAR_PERIOD | CHAR_PUNCT | CHAR_PUNCT)) != 0; zygoloid wrote: `CHAR_PUNCT` is listed here twice now. https://github.com/llvm/llvm-project/pull/93216 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang codegen] Delete unnecessary GEP cleanup code. (PR #90303)
https://github.com/zygoloid approved this pull request. https://github.com/llvm/llvm-project/pull/90303 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Fix codegen of consteval functions returning an empty class. (PR #93115)
https://github.com/zygoloid approved this pull request. Looks good. Do we also need to worry about overwriting tail padding here? https://github.com/llvm/llvm-project/pull/93115 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Implement CWG2351 `void{}` (PR #78060)
https://github.com/zygoloid approved this pull request. LG https://github.com/llvm/llvm-project/pull/78060 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Implement provisional wording for CWG2398 regarding packs (PR #90820)
https://github.com/zygoloid commented: I think we should go ahead with this. The behavior here is subtle but I think it does make sense, and we're in the process of proposing this change to WG21. https://github.com/llvm/llvm-project/pull/90820 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Enable C++17 relaxed template template argument matching by default (PR #89807)
zygoloid wrote: > The immediate deprecation causes a few issues On the one hand: waiting to deprecate something that we know we're going to deprecate usually doesn't help anyone. We delay getting the message out to our users, we sometimes forget to do it for the next release, and at best it means it takes an extra release cycle to deliver whatever kind of benefit we were aiming for. So I think that we should generally start issuing deprecation warnings as soon as the new thing is ready and we know that we plan to remove the old thing. On the other hand: a warning without a dedicated warning flag is, for some of our users, the same thing as an error. And, for some of our users, a deprecation error is the same thing as removing the feature immediately. So that doesn't work in this case :) Would it be reasonable to add a `-Wno-deprecated-relaxed-template-template-args` flag (or something like that) for this specific deprecation? https://github.com/llvm/llvm-project/pull/89807 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Add attribute for consteval builtins; Declare constexpr builtins as constexpr in C++ (PR #91894)
@@ -14,13 +14,18 @@ void __builtin_va_copy(double d); // expected-error@+2 {{cannot redeclare builtin function '__builtin_va_end'}} // expected-note@+1 {{'__builtin_va_end' is a builtin with type}} void __builtin_va_end(__builtin_va_list); -// RUN: %clang_cc1 %s -fsyntax-only -verify -// RUN: %clang_cc1 %s -fsyntax-only -verify -x c void __va_start(__builtin_va_list*, ...); + void *__builtin_assume_aligned(const void *, size_t, ...); #ifdef __cplusplus -void *__builtin_assume_aligned(const void *, size_t, ...) noexcept; -#else -void *__builtin_assume_aligned(const void *, size_t, ...); +constexpr void *__builtin_assume_aligned(const void *, size_t, ...); + void *__builtin_assume_aligned(const void *, size_t, ...) noexcept; +constexpr void *__builtin_assume_aligned(const void *, size_t, ...) noexcept; + void *__builtin_assume_aligned(const void *, size_t, ...) throw(); +constexpr void *__builtin_assume_aligned(const void *, size_t, ...) throw(); + zygoloid wrote: I don't think there are particularly good non-historical reasons why we allow non-`LIBBUILTIN` builtins to be redeclared -- I think it's allowed primarily for compatibility with GCC and existing code, and it's not clear to me why GCC allows it. There were some examples of code redeclaring builtins in [this prior review thread](https://reviews.llvm.org/D45383). https://github.com/llvm/llvm-project/pull/91894 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][Sema] Fix malformed AST for anonymous class access in template. (PR #90842)
https://github.com/zygoloid approved this pull request. Nothing further on my side, LGTM https://github.com/llvm/llvm-project/pull/90842 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][CWG1815] Support lifetime extension of temporary created by aggregate initialization using a default member initializer (PR #87933)
zygoloid wrote: > I'd like to proposal a separate PR for static analyzer. #91879 WDYT? That sounds good to me. https://github.com/llvm/llvm-project/pull/87933 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Analyzer][CFG] Correctly handle rebuilt default arg and default init expression (PR #91879)
@@ -2433,6 +2429,30 @@ CFGBlock *CFGBuilder::VisitChildren(Stmt *S) { return B; } +CFGBlock *CFGBuilder::VisitCXXDefaultArgExpr(CXXDefaultArgExpr *Arg, + AddStmtChoice asc) { + if (Arg->hasRewrittenInit()) { +if (asc.alwaysAdd(*this, Arg)) { + autoCreateBlock(); + appendStmt(Block, Arg); +} +return VisitStmt(Arg->getExpr(), asc); + } + return VisitStmt(Arg, asc); zygoloid wrote: Actually, are we safe from that even if the default argument is rewritten? Do we guarantee to recreate all subexpressions in that case? https://github.com/llvm/llvm-project/pull/91879 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Analyzer][CFG] Correctly handle rebuilt default arg and default init expression (PR #91879)
@@ -2433,6 +2429,30 @@ CFGBlock *CFGBuilder::VisitChildren(Stmt *S) { return B; } +CFGBlock *CFGBuilder::VisitCXXDefaultArgExpr(CXXDefaultArgExpr *Arg, + AddStmtChoice asc) { + if (Arg->hasRewrittenInit()) { +if (asc.alwaysAdd(*this, Arg)) { + autoCreateBlock(); + appendStmt(Block, Arg); +} +return VisitStmt(Arg->getExpr(), asc); + } + return VisitStmt(Arg, asc); zygoloid wrote: I think it'd be useful to keep some of the old comment here: we can't add the default argument if it's not rewritten because we could end up with the same expression appearing multiple times. https://github.com/llvm/llvm-project/pull/91879 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][CWG1815] Support lifetime extension of temporary created by aggregate initialization using a default member initializer (PR #87933)
https://github.com/zygoloid approved this pull request. Looks great! I think it'd be good to understand what's happening with the static analyzer test, and update the comment to explain. But given that this is fixing a conformance / wrong code bug, I don't think we should block this PR on fixing any issue that turns up in the analyzer, unless it turns out that there's still a conformance issue here. https://github.com/llvm/llvm-project/pull/87933 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][CWG1815] Support lifetime extension of temporary created by aggregate initialization using a default member initializer (PR #87933)
@@ -711,6 +711,26 @@ void InitListChecker::FillInEmptyInitForField(unsigned Init, FieldDecl *Field, if (VerifyOnly) return; + // Enter a lifetime extension context, then we can support lifetime + // extension of temporary created by aggregate initialization using a + // default member initializer (DR1815 https://wg21.link/CWG1815). + // + // In a lifetime extension context, BuildCXXDefaultInitExpr will clone the + // initializer expression on each use that would lifetime extend its + // temporaries. + EnterExpressionEvaluationContext LifetimeExtensionContext( + SemaRef, Sema::ExpressionEvaluationContext::PotentiallyEvaluated, + /*LambdaContextDecl=*/nullptr, + Sema::ExpressionEvaluationContextRecord::EK_Other, true); + + // Lifetime extension in default-member-init. + auto = SemaRef.ExprEvalContexts.back(); + + // Just copy previous record, make sure we haven't forget anything. + LastRecord = + SemaRef.ExprEvalContexts[SemaRef.ExprEvalContexts.size() - 2]; + LastRecord.InLifetimeExtendingContext = true; zygoloid wrote: This doesn't look right -- this would lifetime-extend *all* temporaries in the initializer. Only those that are actually bound to a reference member should be extended here. Perhaps instead of these changes involving `InLifetimeExtendedContext`, you could change `BuildCXXDefaultInitExpr` to always rebuild the initializer if it contains any temporaries (if the initializer expression is an `ExprWithCleanups`). Then make sure the normal lifetime extension code recurses into the default initializer and does lifetime extension when warranted. https://github.com/llvm/llvm-project/pull/87933 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][CWG1815] Support lifetime extension of temporary created by aggregate initialization using a default member initializer (PR #87933)
@@ -206,13 +206,10 @@ namespace cwg1814 { // cwg1814: yes #endif } -namespace cwg1815 { // cwg1815: no +namespace cwg1815 { // cwg1815: yes zygoloid wrote: A test for constant evaluation would be nice here too. Maybe: ```c++ struct C { const int = 0; }; constexpr C c = {}; static_assert(c.r == 0); ``` or ```c++ constexpr int f() { A a = {}; return a.r; } static_assert(f() == 0); ``` https://github.com/llvm/llvm-project/pull/87933 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][CWG1815] Support lifetime extension of temporary created by aggregate initialization using a default member initializer (PR #87933)
@@ -10698,7 +10698,7 @@ C++ defect report implementation status https://cplusplus.github.io/CWG/issues/1815.html;>1815 CD4 Lifetime extension in aggregate initialization -No +Clang 19 zygoloid wrote: ```suggestion Clang 19 ``` We use a different CSS class here for features that haven't been in an official release yet. https://github.com/llvm/llvm-project/pull/87933 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][CWG1815] Support lifetime extension of temporary created by aggregate initialization using a default member initializer (PR #87933)
@@ -122,7 +122,7 @@ void aggregateWithReferences() { clang_analyzer_dump(viaReference.ry); // expected-warning-re {{_extended_object{Composite, viaReference, S{{[0-9]+}}} }} // clang does not currently implement extending lifetime of object bound to reference members of aggregates, - // that are created from default member initializer (see `warn_unsupported_lifetime_extension` from `-Wdangling`) + // that are created from default member initializer (see `warn_unsupported_lifetime_extension` from `-Wdangling`) (Supported now). RefAggregate defaultInitExtended{i}; // clang-bug does not extend `Composite` clang_analyzer_dump(defaultInitExtended.ry); // expected-warning {{Unknown }} zygoloid wrote: If the bug is fixed, does the analyzer output not change here? In any case, the comment should explain whatever the new behavior is, and shouldn't be mentioning a warning that no longer exists. https://github.com/llvm/llvm-project/pull/87933 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][CWG1815] Support lifetime extension of temporary created by aggregate initialization using a default member initializer (PR #87933)
@@ -269,6 +269,26 @@ void init_capture_init_list() { // CHECK: } } +void check_dr1815() { // dr1815: yes +#if __cplusplus >= 201402L + + struct A { +int & = 0; +~A() {} + }; + + struct B { +A & = A{}; +~B() {} + }; + + // CHECK: void @_Z12check_dr1815v() + // CHECK: call void @_ZZ12check_dr1815vEN1BD1Ev( + // CHECK: call void @_ZZ12check_dr1815vEN1AD1Ev( + B a = {}; zygoloid wrote: This check of destructor ordering is a bit too subtle for my tastes. How about: ```suggestion // CHECK: void @_Z12check_dr1815v() B a = {}; // CHECK: call {{.*}}some_other_function extern void some_other_function(); some_other_function(); // CHECK: call void @_ZZ12check_dr1815vEN1BD1Ev( // CHECK: call void @_ZZ12check_dr1815vEN1AD1Ev( ``` ... to really drive home that the `A` temporary isn't destroyed until the end of the function? https://github.com/llvm/llvm-project/pull/87933 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][CWG1815] Support lifetime extension of temporary created by aggregate initialization using a default member initializer (PR #87933)
@@ -8194,25 +8216,18 @@ void Sema::checkInitializerLifetime(const InitializedEntity , } switch (shouldLifetimeExtendThroughPath(Path)) { + case PathLifetimeKind::ShouldExtend: zygoloid wrote: Do we need separate `ShouldExtend` and `Extend` kinds still, or can we combine them? https://github.com/llvm/llvm-project/pull/87933 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][CWG1815] Support lifetime extension of temporary created by aggregate initialization using a default member initializer (PR #87933)
https://github.com/zygoloid edited https://github.com/llvm/llvm-project/pull/87933 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][CWG1815] Support lifetime extension of temporary created by aggregate initialization using a default member initializer (PR #87933)
https://github.com/zygoloid commented: Thanks for working on this. https://github.com/llvm/llvm-project/pull/87933 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][Sema] Fix malformed AST for anonymous class access in template. (PR #90842)
https://github.com/zygoloid commented: Another possibility to consider: when [transforming a member access](https://github.com/llvm/llvm-project/blob/ff210b94d449de8ebe1f32cf0d7763ba63b27b39/clang/lib/Sema/TreeTransform.h#L11950), strip off any implicit member accesses from the base expression before transforming the base. That way, we'll rebuild the implicit member access from scratch, which might be a little more work, but should get details like this one right. (We can then probably also remove the logic to deal with an absent `DeclName` in `RebuildMemberExpr` since that shouldn't happen any more.) https://github.com/llvm/llvm-project/pull/90842 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [SemaCXX] Implement CWG2351 `void{}` (PR #78060)
zygoloid wrote: > Note that the AST for the expression `T{}` looks like: > > ``` > // using T = int; > CXXFunctionalCastExpr 'T':'int' functional cast to T > `-InitListExpr 'T':'int' > // using T = const int; > CXXFunctionalCastExpr 'int' functional cast to T > `-InitListExpr 'int' > // using T = void; > CXXFunctionalCastExpr 'void' functional cast to T > `-InitListExpr 'void' > // using T = const void; > CXXFunctionalCastExpr 'void' functional cast to T > `-InitListExpr 'void' > ``` > > (Since the `InitListExpr` already has `void` before anything is done, the > type doesn't need to be adjusted) Nonetheless, I think it'd be more reasonable to use `CK_ToVoid` here rather than `CK_NoOp`. The `void` type for the `InitListExpr` is just a placeholder and not meant to mean that it's a real expression of type `void`. (We really ought to use a better placeholder there, so we can print the type out as `` instead of as `void` in diagnostics.) https://github.com/llvm/llvm-project/pull/78060 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] deprecate frelaxed-template-template-args, make it on by default (PR #89807)
@@ -8343,58 +8343,52 @@ bool Sema::CheckTemplateTemplateArgument(TemplateTemplateParmDecl *Param, // C++1z [temp.arg.template]p3: (DR 150) // A template-argument matches a template template-parameter P when P // is at least as specialized as the template-argument A. - // FIXME: We should enable RelaxedTemplateTemplateArgs by default as it is a - // defect report resolution from C++17 and shouldn't be introduced by - // concepts. - if (getLangOpts().RelaxedTemplateTemplateArgs) { zygoloid wrote: I think the goal should be to remove the flags eventually, but given that it's easy to keep the flags functioning, I think it'd be a good idea to do so for at least long enough for people to tell us if this change breaks something. Maybe one release cycle with the default flipped and the flags working but warning, then we remove them if people aren't complaining? https://github.com/llvm/llvm-project/pull/89807 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][modules] HeaderSearch::MarkFileModuleHeader creates extra HeaderFileInfo, breaks PCM reuse (PR #89005)
zygoloid wrote: > Unfortunately with this patch I'm still seeing the same > source-location-exhausted error. Can you try applying #89428 as well? I think we would need both in order to prune the module map from the serialized SLocEntries. https://github.com/llvm/llvm-project/pull/89005 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][modules] Only avoid pruning module maps when asked to (PR #89428)
https://github.com/zygoloid approved this pull request. Thank you! https://github.com/llvm/llvm-project/pull/89428 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][modules] HeaderSearch::MarkFileModuleHeader creates extra HeaderFileInfo, breaks PCM reuse (PR #89005)
@@ -1313,6 +1313,14 @@ OptionalFileEntryRef HeaderSearch::LookupSubframeworkHeader( // File Info Management. //===--===// +static bool +headerFileInfoModuleBitsMatchRole(const HeaderFileInfo *HFI, + ModuleMap::ModuleHeaderRole Role) { + return (HFI->isModuleHeader == ModuleMap::isModular(Role)) && + (HFI->isTextualModuleHeader == + ((Role & ModuleMap::TextualHeader) != 0)); zygoloid wrote: If a file is in multiple modules, we use the logic in [`findModuleForHeader` and `isBetterKnownHeader`](https://github.com/llvm/llvm-project/blob/main/clang/lib/Lex/ModuleMap.cpp#L578) to pick between them when deciding how to handle a `#include`: prefer the declaration from the current module over anything else, prefer available over non-available, public over private, non-textual over textual, and (if we're allowing excluded headers at all) non-excluded over excluded. If there's still a tie, then yeah, we make an arbitrary decision. (Not non-deterministic -- we prefer the first-loaded module map -- but also not likely to be predictable or useful. But it should generally not matter which one we pick -- the header should have been parsed and interpreted in the same way regardless -- unless different `.pcm`s have different compiler flags, in which case we have an ODR violation.) The point of `textual header` -- at least in our configuration -- is to mark a header as being intentionally part of a module, so that `[no_undeclared_includes]` / `-fstrict-modules-decluse` can diagnose a `#include` of a file that wasn't a declared dependency. This means that it can be reasonable for a header to be a textual header in one module and a non-textual header in another module -- in particular, that would mean that code with a direct dependency on the first module is permitted to include the header, but that the header is not built as part of building that library. If that compilation is passed a `.pcm` containing the same header as a non-textual header, the desired behavior is that we perform an import for that header (treat it as non-textual for `#include` handling) but use the `textual header` declaration when determining whether the inclusion is permitted. https://github.com/llvm/llvm-project/pull/89005 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][modules] HeaderSearch::MarkFileModuleHeader creates extra HeaderFileInfo, breaks PCM reuse (PR #89005)
@@ -1313,6 +1313,14 @@ OptionalFileEntryRef HeaderSearch::LookupSubframeworkHeader( // File Info Management. //===--===// +static bool +headerFileInfoModuleBitsMatchRole(const HeaderFileInfo *HFI, + ModuleMap::ModuleHeaderRole Role) { + return (HFI->isModuleHeader == ModuleMap::isModular(Role)) && + (HFI->isTextualModuleHeader == + ((Role & ModuleMap::TextualHeader) != 0)); zygoloid wrote: Should this be: ```suggestion return (HFI->isModuleHeader == ModuleMap::isModular(Role)) && (HFI->isModuleHeader || HFI->isTextualModuleHeader == ((Role & ModuleMap::TextualHeader) != 0)); ``` It looks like you're considering "modular header" and "textual header" to be mutually exclusive in `HeaderFileInfo` when merging, so presumably we should do the same here. I don't think this would make a difference for our use case, though. Incidentally, this choice of meaning for `isTextualModuleHeader` seems a little confusing to me from a modeling perspective, given that a header can absolutely be modular in one module and textual in another, but I think it's fine from a correctness standpoint. I think it'd be clearer to either use a three-value enum here, or to independently track whether the header is textual or modular and change the check below to `isTextualModuleHeader && !isModuleHeader`. But that's not relevant for this PR. https://github.com/llvm/llvm-project/pull/89005 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][modules] Only compute affecting module maps with implicit search (PR #87849)
@@ -161,8 +161,13 @@ static TypeCode getTypeCodeForTypeClass(Type::TypeClass id) { namespace { -std::set GetAffectingModuleMaps(const Preprocessor , - Module *RootModule) { +std::optional> +GetAffectingModuleMaps(const Preprocessor , Module *RootModule) { + // Without implicit module map search, there's no good reason to know about + // any module maps that are not affecting. + if (!PP.getHeaderSearchInfo().getHeaderSearchOpts().ImplicitModuleMaps) +return std::nullopt; zygoloid wrote: Yes, that sounds great, thank you. In our case, it's not that the module map files are unused, it's that we specify the same `-fmodule-map-file=` flag to all of our compilations, and so we *usually* get the information from a `.pcm` file rather than from the module map file, and it turns out that pruning out the repeated module maps make a rather large difference to our source location address space usage :) All that said, we have some other ideas for how to address the problems we're seeing with source location address space usage. Maybe we can get to a point where we're not relying on this at all -- I think this patch is probably the right behavior, even though I think it's contributing to a regression for us, so I think this new behavior should be the default, and we can clean up the flag once we don't need it any more. https://github.com/llvm/llvm-project/pull/87849 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][modules] Only compute affecting module maps with implicit search (PR #87849)
@@ -161,8 +161,13 @@ static TypeCode getTypeCodeForTypeClass(Type::TypeClass id) { namespace { -std::set GetAffectingModuleMaps(const Preprocessor , - Module *RootModule) { +std::optional> +GetAffectingModuleMaps(const Preprocessor , Module *RootModule) { + // Without implicit module map search, there's no good reason to know about + // any module maps that are not affecting. + if (!PP.getHeaderSearchInfo().getHeaderSearchOpts().ImplicitModuleMaps) +return std::nullopt; zygoloid wrote: Hm, I see that adding this check was the whole point of the patch. I don't think this works -- it's not feasible for the build system to know whether a module map file would affect a compilation or not. https://github.com/llvm/llvm-project/pull/87849 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][modules] Only compute affecting module maps with implicit search (PR #87849)
@@ -161,8 +161,13 @@ static TypeCode getTypeCodeForTypeClass(Type::TypeClass id) { namespace { -std::set GetAffectingModuleMaps(const Preprocessor , - Module *RootModule) { +std::optional> +GetAffectingModuleMaps(const Preprocessor , Module *RootModule) { + // Without implicit module map search, there's no good reason to know about + // any module maps that are not affecting. + if (!PP.getHeaderSearchInfo().getHeaderSearchOpts().ImplicitModuleMaps) +return std::nullopt; zygoloid wrote: I don't think this is correct -- we can know about non-affecting module map files because they were specified on the command line using `-fmodule-map-file=` and either weren't actually used or (as happens in our case) we got the same information from a PCM file and so didn't need the module map file. I think this check is a regression; can it be removed? https://github.com/llvm/llvm-project/pull/87849 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][modules] Headers meant to be included multiple times can be completely invisible in clang module builds (PR #83660)
@@ -1403,94 +1421,146 @@ bool HeaderSearch::isFileMultipleIncludeGuarded(FileEntryRef File) const { void HeaderSearch::MarkFileModuleHeader(FileEntryRef FE, ModuleMap::ModuleHeaderRole Role, bool isCompilingModuleHeader) { - bool isModularHeader = ModuleMap::isModular(Role); - // Don't mark the file info as non-external if there's nothing to change. if (!isCompilingModuleHeader) { -if (!isModularHeader) +if ((Role & ModuleMap::ExcludedHeader)) return; auto *HFI = getExistingFileInfo(FE); if (HFI && HFI->isModuleHeader) return; } auto = getFileInfo(FE); - HFI.isModuleHeader |= isModularHeader; + HFI.mergeModuleMembership(Role); HFI.isCompilingModuleHeader |= isCompilingModuleHeader; zygoloid wrote: It looks to me like we're now calling `getFileInfo(FE)` even in cases where the info doesn't change, for a textual header. I think that's what's causing the regression we're seeing -- we're now considering a repeatedly-used module map file to be affecting, even though it isn't, because we end up with a local `HeaderFileInfo` instead of an imported one. https://github.com/llvm/llvm-project/pull/83660 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Modules] Add -cc1 -flate-module-map-file to load module maps after PCMs (PR #88893)
zygoloid wrote: Have you tried changing the behavior of the existing `-fmodule-map-file=` to load the file after we load module files? If so, what kinds of things do you see breaking or changing? If we can avoid it, I think it would be better to only have a single mode here, and loading the module files first seems to make a lot of sense given that they can make it unnecessary to load the module map files at all. https://github.com/llvm/llvm-project/pull/88893 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Extend lifetime of temporaries in mem-default-init for P2718R0 (PR #86960)
https://github.com/zygoloid edited https://github.com/llvm/llvm-project/pull/86960 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Extend lifetime of temporaries in mem-default-init for P2718R0 (PR #86960)
@@ -1230,11 +1230,26 @@ CodeGenFunction::EmitCXXForRangeStmt(const CXXForRangeStmt , JumpDest LoopExit = getJumpDestInCurrentScope("for.end"); LexicalScope ForScope(*this, S.getSourceRange()); + const DeclStmt *RangeDS = cast(S.getRangeStmt()); + const VarDecl *RangeVar = cast(RangeDS->getSingleDecl()); + if (getLangOpts().CPlusPlus23) zygoloid wrote: Thr quoted rule is [class.base.init]/11, and in that context is describing the behavior of the initialization of class members in a constructor. Aggregate initialization is handled by the general rule in [class.temporary] but see also the [note in [dcl.init.general]](https://eel.is/c++draft/dcl.init#general-16.6.2.2.sentence-8) that explicitly mentions this case. https://github.com/llvm/llvm-project/pull/86960 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][Sema] Skip checking anonymous enum in using enum declaration (PR #87144)
@@ -1537,6 +1537,10 @@ void Sema::PushOnScopeChains(NamedDecl *D, Scope *S, bool AddToContext) { cast(D)->isFunctionTemplateSpecialization()) return; + if (isa(D) && D->getDeclName().isEmpty()) { zygoloid wrote: The reserved identifier check appears to be incorrect. For example, for: ```c++ enum __A { x, y }; void f() { using enum __A; enum __A e; } ``` Clang produces two warnings: one on the declaration of `enum __A` and one on the `using enum __A`. The second warning is a bug -- the `-Wreserved-identifier` warning is only supposed to fire on the declaration of a reserved name, and `using enum __A` does not declare the name `__A`, it only references it. (Note that there's no warning on the use of `enum __A` on the next line, which again is only a reference, not a declaration.) https://github.com/llvm/llvm-project/pull/87144 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][Sema] Skip checking anonymous enum in using enum declaration (PR #87144)
@@ -1537,6 +1537,10 @@ void Sema::PushOnScopeChains(NamedDecl *D, Scope *S, bool AddToContext) { cast(D)->isFunctionTemplateSpecialization()) return; + if (isa(D) && D->getDeclName().isEmpty()) { zygoloid wrote: For what name? The `using enum` declaration doesn't introduce the name of the enum. https://github.com/llvm/llvm-project/pull/87144 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][Sema] Skip checking anonymous enum in using enum declaration (PR #87144)
@@ -1537,6 +1537,10 @@ void Sema::PushOnScopeChains(NamedDecl *D, Scope *S, bool AddToContext) { cast(D)->isFunctionTemplateSpecialization()) return; + if (isa(D) && D->getDeclName().isEmpty()) { zygoloid wrote: Are those useful checks? It seems surprising to me that we'd push a `using enum` declaration into scope given that it never introduces any name of its own (the enumerators are handled separately). Are we really performing checks here that we need for a named enum but that are wrong pr unnecessary for an anonymous enum? https://github.com/llvm/llvm-project/pull/87144 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][Sema] Skip checking anonymous enum in using enum declaration (PR #87144)
@@ -1537,6 +1537,10 @@ void Sema::PushOnScopeChains(NamedDecl *D, Scope *S, bool AddToContext) { cast(D)->isFunctionTemplateSpecialization()) return; + if (isa(D) && D->getDeclName().isEmpty()) { zygoloid wrote: Why do we push a `UsingEnumDecl` into the scope at all? Would it make sense for the caller to just add the declaration to the `DeclContext` instead of calling `PushOnScopeChains`? https://github.com/llvm/llvm-project/pull/87144 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [BitInt] Expose a _BitInt literal suffix in C++ (PR #86586)
@@ -1117,19 +1118,37 @@ NumericLiteralParser::NumericLiteralParser(StringRef TokSpelling, if (isImaginary) break; // Cannot be repeated. isImaginary = true; continue; // Success. +case '_': + if (isFPConstant) +break; // Invalid for floats + if (HasSize) +break; + if (possibleBitInt) +break; // Cannot be repeated. + if (LangOpts.CPlusPlus && s[1] == '_') { +// Scan ahead to find possible rest of BitInt suffix +for (const char *c = s; c != ThisTokEnd; ++c) { + if (*c == 'w' || *c == 'W') { +possibleBitInt = true; +++s; zygoloid wrote: FYI, per CWG1810 / [over.literal]/1, it's now IFNDR rather than UB: > Some literal suffix identifiers are reserved for future standardization; see > [usrlit.suffix]. A declaration whose literal-operator-id uses such a literal > suffix identifier is ill-formed, no diagnostic required. ... but that doesn't make a difference here. (The "NDR" part is just to allow implementations to permit the standard library to declare these literal operators.) https://github.com/llvm/llvm-project/pull/86586 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Resolve FIXME: Look at E, not M (PR #85541)
zygoloid wrote: IIRC the issue here is that in C++98, the MTE is in a "weird" place in the AST, because of the different rules governing how and when temporaries are formed. For example, for `const int = C().a[0];`, we'll form an MTE wrapping just the `C()` expression, converting it to an xvalue, in C++11 onwards. But in C++98, xvalues mostly don't exist, the MTE wraps the whole `C().a[0]`, and you need to call `skipRValueSubobjectAdjustments` to get the actual expression that initializes the temporary, which I guess is what `E` is here. So I think the idea is that `E` is always the expression that initializes the temporary, and the type of `M` in C++98 might be some subobject of that, with a type that doesn't reflect properties of the temporary. So we should generally look at properties of `E`, not `M`, to understand properties of the temporary object. But in C++11 onwards I think it doesn't matter, because we always materialize immediately. (That's mostly from memory, I might have got some details wrong.) https://github.com/llvm/llvm-project/pull/85541 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)
zygoloid wrote: I don't think we've established an explicit policy, and we've made ABI-breaking changes previously. I think we should avoid breaks within a major release version (don't backport this) to avoid giving packagers headaches: the ubsan runtime is installed in a versioned directory, but I don't think that includes the patch version. Having an ABI break here seems OK and in line with historic precedent. (But we ought to document our ABI policy for our runtime libraries.) https://github.com/llvm/llvm-project/pull/75481 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][ExprConst] Fix second arg of __builtin_{clzg,ctzg} not always being evaluated (PR #86742)
@@ -0,0 +1,39 @@ +// RUN: %clang_cc1 -fsyntax-only -verify %s +// expected-no-diagnostics + +constexpr int increment(int& x) { + x++; + return x; +} + +constexpr int test_clzg_0() { + int x = 0; + [[maybe_unused]] int unused = __builtin_clzg(0U, increment(x)); + return x; +} + +static_assert(test_clzg_0() == 1); + +constexpr int test_clzg_1() { + int x = 0; + [[maybe_unused]] int unused = __builtin_clzg(1U, increment(x)); + return x; +} + +static_assert(test_clzg_1() == 1); + +constexpr int test_ctzg_0() { + int x = 0; + [[maybe_unused]] int unused = __builtin_ctzg(0U, increment(x)); + return x; +} + +static_assert(test_ctzg_0() == 1); + +constexpr int test_ctzg_1() { + int x = 0; + [[maybe_unused]] int unused = __builtin_ctzg(1U, increment(x)); zygoloid wrote: Can you cast to `void` instead? ```suggestion (void)__builtin_clzg(0U, increment(x)); return x; } static_assert(test_clzg_0() == 1); constexpr int test_clzg_1() { int x = 0; (void)__builtin_clzg(1U, increment(x)); return x; } static_assert(test_clzg_1() == 1); constexpr int test_ctzg_0() { int x = 0; (void)__builtin_ctzg(0U, increment(x)); return x; } static_assert(test_ctzg_0() == 1); constexpr int test_ctzg_1() { int x = 0; (void)__builtin_ctzg(1U, increment(x)); ``` https://github.com/llvm/llvm-project/pull/86742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [codegen] Emit missing cleanups for stmt-expr and coro suspensions [take-2] (PR #85398)
@@ -2503,6 +2506,29 @@ Stmt *BlockExpr::getBody() { // Generic Expression Routines //===--===// +bool Expr::mayBranchOut() const { zygoloid wrote: Computing this seems a little expensive in general. I wonder if we could track a bit on `FunctionDecl` that indicates whether it contains any branches out of statement expressions, and skip calling this if the enclosing function is not a coroutine and does not contain branches out of statement expressions. https://github.com/llvm/llvm-project/pull/85398 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [HLSL] Implement array temporary support (PR #79382)
@@ -3173,41 +3174,46 @@ class ArrayType : public Type, public llvm::FoldingSetNode { return T->getTypeClass() == ConstantArray || T->getTypeClass() == VariableArray || T->getTypeClass() == IncompleteArray || - T->getTypeClass() == DependentSizedArray; + T->getTypeClass() == DependentSizedArray || + T->getTypeClass() == ArrayParameter; } }; /// Represents the canonical version of C arrays with a specified constant size. /// For example, the canonical type for 'int A[4 + 4*100]' is a /// ConstantArrayType where the element type is 'int' and the size is 404. -class ConstantArrayType final -: public ArrayType, - private llvm::TrailingObjects { zygoloid wrote: As a general principle we try to keep our AST nodes small. This change isn't going to make a big difference by itself, but abandoning that principle and making a bunch of changes like this would. I think it's worth putting in a small amount of effort here, but if it doesn't work out nicely then it's fine to remove the optimization. We can find other places to save space. https://github.com/llvm/llvm-project/pull/79382 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [HLSL] Implement array temporary support (PR #79382)
@@ -3173,41 +3174,46 @@ class ArrayType : public Type, public llvm::FoldingSetNode { return T->getTypeClass() == ConstantArray || T->getTypeClass() == VariableArray || T->getTypeClass() == IncompleteArray || - T->getTypeClass() == DependentSizedArray; + T->getTypeClass() == DependentSizedArray || + T->getTypeClass() == ArrayParameter; } }; /// Represents the canonical version of C arrays with a specified constant size. /// For example, the canonical type for 'int A[4 + 4*100]' is a /// ConstantArrayType where the element type is 'int' and the size is 404. -class ConstantArrayType final -: public ArrayType, - private llvm::TrailingObjects { zygoloid wrote: The size expression is usually not part of a constant array type. Normally, all we want as part of the type node is the bound, and the size expression if needed can be found on the `ArrayTypeLoc`. The rare special case being handled here is that if the array bound is instantiation-dependent but not dependent, then we need to preserve the expression in the type (not just in the `TypeLoc`) because it affects (for example) whether two template declarations involving such a type are considered to declare the same template. Because this is a very rare case, and `ConstantArrayType`s can be relatively common (eg, as the types of string literals), it seemed preferable to not store the `Expr*` if it's not needed, but in absolute terms it's probably a minor difference in memory usage, so it's probably fine to remove this optimization. If we want to keep the optimization, perhaps we could replace the `Size` and `SizeExpr` fields with a pointer union that can hold either a <= 63-bit array bound (the common case) or a pointer to a slab-allocated pair of `APInt` and `const Expr*` (in the case where the bound doesn't fit in 63 bits or the size expression is instantiation-dependent). https://github.com/llvm/llvm-project/pull/79382 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)
@@ -26996,18 +26996,38 @@ class, structure, array, or other object. Arguments: "" -The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a -pointer to or into the ``object``. The second argument determines whether -``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size is -unknown. The third argument controls how ``llvm.objectsize`` acts when ``null`` -in address space 0 is used as its pointer argument. If it's ``false``, -``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, if -the ``null`` is in a non-zero address space or if ``true`` is given for the -third argument of ``llvm.objectsize``, we assume its size is unknown. The fourth -argument to ``llvm.objectsize`` determines if the value should be evaluated at -runtime. +The ``llvm.objectsize`` intrinsic takes six arguments: + +- The first argument is a pointer to or into the ``object``. +- The second argument controls which value to return when the size is unknown: + + - If it's ``false``, ``llvm.objectsize`` returns ``-1``. + - If it's ``true``, ``llvm.objectsize`` returns ``0``. + +- The third argument controls how ``llvm.objectsize`` acts when ``null`` in + address space 0 is used as its pointer argument: + + - If it's ``false``, ``llvm.objectsize`` reports 0 bytes available when given +``null``. + - If it's ``true``, or the ``null`` pointer is in a non-zero address space, +the size is assumed to be unknown. + +- The fourth argument to ``llvm.objectsize`` determines if the value should be + evaluated at runtime. +- The fifth argument controls which size ``llvm.objectsize`` returns: + + - If it's ``false``, ``llvm.objectsize`` returns the size of the closest +surrounding subobject. + - If it's ``true``, ``llvm.objectsize`` returns the size of the whole object. + +- If non-zero, the sixth and seventh arguments encode the size and offset + information, respectively, of the original subobject's layout and is used + when the fifth argument is ``false``. +- The seventh argument encodes the offset information of the original + subobject's layout and is used when the fifth argument is ``false``. zygoloid wrote: > > in some simple test cases it does appear to work. > > Please don't be insulting. I didn't just run a couple of "simple" tests and > called it a day. I had many rather convoluted examples and they all worked. I apologize, I didn't mean to imply you hadn't been diligent here. https://github.com/llvm/llvm-project/pull/83204 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)
https://github.com/zygoloid edited https://github.com/llvm/llvm-project/pull/83204 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)
@@ -26996,18 +26996,38 @@ class, structure, array, or other object. Arguments: "" -The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a -pointer to or into the ``object``. The second argument determines whether -``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size is -unknown. The third argument controls how ``llvm.objectsize`` acts when ``null`` -in address space 0 is used as its pointer argument. If it's ``false``, -``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, if -the ``null`` is in a non-zero address space or if ``true`` is given for the -third argument of ``llvm.objectsize``, we assume its size is unknown. The fourth -argument to ``llvm.objectsize`` determines if the value should be evaluated at -runtime. +The ``llvm.objectsize`` intrinsic takes six arguments: + +- The first argument is a pointer to or into the ``object``. +- The second argument controls which value to return when the size is unknown: + + - If it's ``false``, ``llvm.objectsize`` returns ``-1``. + - If it's ``true``, ``llvm.objectsize`` returns ``0``. + +- The third argument controls how ``llvm.objectsize`` acts when ``null`` in + address space 0 is used as its pointer argument: + + - If it's ``false``, ``llvm.objectsize`` reports 0 bytes available when given +``null``. + - If it's ``true``, or the ``null`` pointer is in a non-zero address space, +the size is assumed to be unknown. + +- The fourth argument to ``llvm.objectsize`` determines if the value should be + evaluated at runtime. +- The fifth argument controls which size ``llvm.objectsize`` returns: + + - If it's ``false``, ``llvm.objectsize`` returns the size of the closest +surrounding subobject. + - If it's ``true``, ``llvm.objectsize`` returns the size of the whole object. + +- If non-zero, the sixth and seventh arguments encode the size and offset + information, respectively, of the original subobject's layout and is used + when the fifth argument is ``false``. +- The seventh argument encodes the offset information of the original + subobject's layout and is used when the fifth argument is ``false``. zygoloid wrote: > First you tell me that I can't use LLVM's IR to determine the subobject, even > though I did and it worked just fine It definitely doesn't work fine, but sure, in some simple test cases it does appear to work. > now you're saying that I can't use the front-end's knowledge about the > structure. I'm not. I'm saying that you can't assume that the LLVM IR idea of the complete object lines up with some enclosing subobject you've found in the frontend. You're still trying to use the IR notion of complete object and subobjects, and that still doesn't work for the same reasons as before. You can absolutely compute where the subobject is in the frontend, and pass that information onto LLVM. But you'll need to pass it on in a way that is meaningful to LLVM. For example, if you pass a pointer and size to the intrinsic describing the complete range of addresses covering the subobject, that should be fine. But an offset is not useful if the frontend and middle-end can't agree on what it's an offset relative to. > In your example, you've explicitly lied to the compiler about the types being > passed in. Sorry, that was a typo in my example: the call in `h()` should be `f()`. > I have no clue what you mean by pass a "pointer to `p->n` to the intrinsic" > as that's already the first argument in the intrinsic. I mean, pass a pointer to the start of the subobject. For example, for `p->arr[i]`, you'd pass in `>arr`. https://github.com/llvm/llvm-project/pull/83204 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)
@@ -26996,18 +26996,38 @@ class, structure, array, or other object. Arguments: "" -The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a -pointer to or into the ``object``. The second argument determines whether -``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size is -unknown. The third argument controls how ``llvm.objectsize`` acts when ``null`` -in address space 0 is used as its pointer argument. If it's ``false``, -``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, if -the ``null`` is in a non-zero address space or if ``true`` is given for the -third argument of ``llvm.objectsize``, we assume its size is unknown. The fourth -argument to ``llvm.objectsize`` determines if the value should be evaluated at -runtime. +The ``llvm.objectsize`` intrinsic takes six arguments: + +- The first argument is a pointer to or into the ``object``. +- The second argument controls which value to return when the size is unknown: + + - If it's ``false``, ``llvm.objectsize`` returns ``-1``. + - If it's ``true``, ``llvm.objectsize`` returns ``0``. + +- The third argument controls how ``llvm.objectsize`` acts when ``null`` in + address space 0 is used as its pointer argument: + + - If it's ``false``, ``llvm.objectsize`` reports 0 bytes available when given +``null``. + - If it's ``true``, or the ``null`` pointer is in a non-zero address space, +the size is assumed to be unknown. + +- The fourth argument to ``llvm.objectsize`` determines if the value should be + evaluated at runtime. +- The fifth argument controls which size ``llvm.objectsize`` returns: + + - If it's ``false``, ``llvm.objectsize`` returns the size of the closest +surrounding subobject. + - If it's ``true``, ``llvm.objectsize`` returns the size of the whole object. + +- If non-zero, the sixth and seventh arguments encode the size and offset + information, respectively, of the original subobject's layout and is used + when the fifth argument is ``false``. +- The seventh argument encodes the offset information of the original + subobject's layout and is used when the fifth argument is ``false``. zygoloid wrote: You're passing the difference between the address of some arbitrary class object and a field. The address of that class object is not the base address that the LLVM intrinsic will calculate. So the offset you're passing in is not meaningful. And there's no way it even *can* be meaningful. Consider something like this: ```c++ struct A { int n; }; struct B { int data[100]; A a; }; int f(A *p) { return __builtin_dynamic_object_size(>n, 1); } int g() { A a; return f(); } int h() { B b; return f(); } ``` After inlining, the BDOS intrinsic in `g` will see a base pointer of ``, and the intrinsic in `h` will see a base pointer of ``. The offset from the base pointer to the `n` subobject is different in each case. So there's no way you can compute a constant offset in Clang's lowering and pass it to the intrinsic. What you *can* do is pass a pointer to `p->n` to the intrinsic as the start of the subobject. That'll do the right thing regardless of what LLVM computes to be the complete enclosing object. https://github.com/llvm/llvm-project/pull/83204 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)
@@ -26996,18 +26996,38 @@ class, structure, array, or other object. Arguments: "" -The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a -pointer to or into the ``object``. The second argument determines whether -``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size is -unknown. The third argument controls how ``llvm.objectsize`` acts when ``null`` -in address space 0 is used as its pointer argument. If it's ``false``, -``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, if -the ``null`` is in a non-zero address space or if ``true`` is given for the -third argument of ``llvm.objectsize``, we assume its size is unknown. The fourth -argument to ``llvm.objectsize`` determines if the value should be evaluated at -runtime. +The ``llvm.objectsize`` intrinsic takes six arguments: + +- The first argument is a pointer to or into the ``object``. +- The second argument controls which value to return when the size is unknown: + + - If it's ``false``, ``llvm.objectsize`` returns ``-1``. + - If it's ``true``, ``llvm.objectsize`` returns ``0``. + +- The third argument controls how ``llvm.objectsize`` acts when ``null`` in + address space 0 is used as its pointer argument: + + - If it's ``false``, ``llvm.objectsize`` reports 0 bytes available when given +``null``. + - If it's ``true``, or the ``null`` pointer is in a non-zero address space, +the size is assumed to be unknown. + +- The fourth argument to ``llvm.objectsize`` determines if the value should be + evaluated at runtime. +- The fifth argument controls which size ``llvm.objectsize`` returns: + + - If it's ``false``, ``llvm.objectsize`` returns the size of the closest +surrounding subobject. + - If it's ``true``, ``llvm.objectsize`` returns the size of the whole object. + +- If non-zero, the sixth and seventh arguments encode the size and offset + information, respectively, of the original subobject's layout and is used + when the fifth argument is ``false``. +- The seventh argument encodes the offset information of the original + subobject's layout and is used when the fifth argument is ``false``. zygoloid wrote: As far as I can see, the "offset" value you're passing in is a difference between two pointers both of which are unknown to the middle-end -- it's the difference from the start of the two-levels-out enclosing class subobject to the one-level-out enclosing field subobject, I think. I could be misunderstanding something, but I don't see how you can use that information to get correct results from the builtin. Can you explain how the LLVM intrinsic uses this information? https://github.com/llvm/llvm-project/pull/83204 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)
@@ -1052,11 +1053,143 @@ CodeGenFunction::emitFlexibleArrayMemberSize(const Expr *E, unsigned Type, return Builder.CreateSelect(Cmp, Res, ConstantInt::get(ResType, 0, IsSigned)); } +namespace { + +/// SubobjectFinder - A simple visitor to find the "sub-object" pointed to by a +/// __builtin_dynamic_object_size call. Information gathered from the +/// sub-object is used by the back-end to determine the correct size when the +/// 'TYPE' of the __bdos call has the least significant bit set (i.e. asking +/// for the sub-object size). +/// +/// The expectation is that we'll eventually hit one of three expression types: +/// +/// 1. DeclRefExpr - This is the expression for the base of the structure. +/// 2. MemberExpr - This is the field in the structure. +/// 3. CompoundLiteralExpr - This is for people who create something +/// heretical like (struct foo has a flexible array member): +/// +///(struct foo){ 1, 2 }.blah[idx]; +/// +/// All other expressions can be correctly handled with the current code. +struct SubobjectFinder +: public ConstStmtVisitor { + SubobjectFinder() = default; + + //======// + //Visitor Methods + //======// + + const Expr *VisitStmt(const Stmt *S) { return nullptr; } + + const Expr *VisitDeclRefExpr(const DeclRefExpr *E) { return E; } + const Expr *VisitMemberExpr(const MemberExpr *E) { return E; } + const Expr *VisitCompoundLiteralExpr(const CompoundLiteralExpr *E) { +return E; + } + + const Expr *VisitArraySubscriptExpr(const ArraySubscriptExpr *E) { +return Visit(E->getBase()); + } + const Expr *VisitCastExpr(const CastExpr *E) { zygoloid wrote: Does GCC look through explicit casts? I wonder if this should be restricted to `ImplicitCastExpr`s. https://github.com/llvm/llvm-project/pull/83204 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)
@@ -1052,11 +1053,143 @@ CodeGenFunction::emitFlexibleArrayMemberSize(const Expr *E, unsigned Type, return Builder.CreateSelect(Cmp, Res, ConstantInt::get(ResType, 0, IsSigned)); } +namespace { + +/// SubobjectFinder - A simple visitor to find the "sub-object" pointed to by a +/// __builtin_dynamic_object_size call. Information gathered from the +/// sub-object is used by the back-end to determine the correct size when the +/// 'TYPE' of the __bdos call has the least significant bit set (i.e. asking +/// for the sub-object size). +/// +/// The expectation is that we'll eventually hit one of three expression types: +/// +/// 1. DeclRefExpr - This is the expression for the base of the structure. +/// 2. MemberExpr - This is the field in the structure. +/// 3. CompoundLiteralExpr - This is for people who create something +/// heretical like (struct foo has a flexible array member): +/// +///(struct foo){ 1, 2 }.blah[idx]; +/// +/// All other expressions can be correctly handled with the current code. +struct SubobjectFinder +: public ConstStmtVisitor { + SubobjectFinder() = default; + + //======// + //Visitor Methods + //======// + + const Expr *VisitStmt(const Stmt *S) { return nullptr; } + + const Expr *VisitDeclRefExpr(const DeclRefExpr *E) { return E; } + const Expr *VisitMemberExpr(const MemberExpr *E) { return E; } + const Expr *VisitCompoundLiteralExpr(const CompoundLiteralExpr *E) { +return E; + } + + const Expr *VisitArraySubscriptExpr(const ArraySubscriptExpr *E) { +return Visit(E->getBase()); + } + const Expr *VisitCastExpr(const CastExpr *E) { +return Visit(E->getSubExpr()); + } + const Expr *VisitParenExpr(const ParenExpr *E) { +return Visit(E->getSubExpr()); + } + const Expr *VisitUnaryAddrOf(const clang::UnaryOperator *E) { +return Visit(E->getSubExpr()); + } + const Expr *VisitUnaryDeref(const clang::UnaryOperator *E) { +return Visit(E->getSubExpr()); + } zygoloid wrote: I think you'll need to be more careful when walking through address-of / dereferences -- the set of things you should step over when traversing a pointer to an object is different from the set of things you should step over when traversing an object lvalue. For example, the bounds to use for `*p->member` will be computed as the bounds of `member`, which isn't correct. I think you could address this by either having separate traversals for pointers versus lvalues, or by avoiding (for example) stepping through lvalue-to-rvalue conversions when stepping over `CastExpr`s -- and in fact, the latter seems like a good idea in general, given that a `CastExpr` could do pretty much anything to the pointer / lvalue. In general, I think it's only really safe to step over casts that are a no-op for address purposes. Bitcasts seem OK, address space conversions seem OK, etc. but a lot of cast kinds are not going to be reasonable to step over. https://github.com/llvm/llvm-project/pull/83204 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)
@@ -1052,11 +1053,143 @@ CodeGenFunction::emitFlexibleArrayMemberSize(const Expr *E, unsigned Type, return Builder.CreateSelect(Cmp, Res, ConstantInt::get(ResType, 0, IsSigned)); } +namespace { + +/// SubobjectFinder - A simple visitor to find the "sub-object" pointed to by a +/// __builtin_dynamic_object_size call. Information gathered from the +/// sub-object is used by the back-end to determine the correct size when the +/// 'TYPE' of the __bdos call has the least significant bit set (i.e. asking +/// for the sub-object size). +/// +/// The expectation is that we'll eventually hit one of three expression types: +/// +/// 1. DeclRefExpr - This is the expression for the base of the structure. +/// 2. MemberExpr - This is the field in the structure. +/// 3. CompoundLiteralExpr - This is for people who create something +/// heretical like (struct foo has a flexible array member): +/// +///(struct foo){ 1, 2 }.blah[idx]; +/// +/// All other expressions can be correctly handled with the current code. +struct SubobjectFinder +: public ConstStmtVisitor { + SubobjectFinder() = default; + + //======// + //Visitor Methods + //======// + + const Expr *VisitStmt(const Stmt *S) { return nullptr; } + + const Expr *VisitDeclRefExpr(const DeclRefExpr *E) { return E; } + const Expr *VisitMemberExpr(const MemberExpr *E) { return E; } + const Expr *VisitCompoundLiteralExpr(const CompoundLiteralExpr *E) { +return E; + } + + const Expr *VisitArraySubscriptExpr(const ArraySubscriptExpr *E) { +return Visit(E->getBase()); + } + const Expr *VisitCastExpr(const CastExpr *E) { +return Visit(E->getSubExpr()); + } + const Expr *VisitParenExpr(const ParenExpr *E) { +return Visit(E->getSubExpr()); + } + const Expr *VisitUnaryAddrOf(const clang::UnaryOperator *E) { +return Visit(E->getSubExpr()); + } + const Expr *VisitUnaryDeref(const clang::UnaryOperator *E) { +return Visit(E->getSubExpr()); + } +}; + +} // end anonymous namespace + +/// getFieldInfo - Gather the size and offset of the field \p VD in \p RD. +static std::pair getFieldInfo(CodeGenFunction , + const RecordDecl *RD, + const ValueDecl *VD, + uint64_t Offset = 0) { + if (!RD) +return std::make_pair(0, 0); + + ASTContext = CGF.getContext(); + const ASTRecordLayout = Ctx.getASTRecordLayout(RD); + unsigned FieldNo = 0; + + for (const Decl *D : RD->decls()) { +if (const auto *Record = dyn_cast(D)) { + std::pair Res = + getFieldInfo(CGF, Record->getDefinition(), VD, + Offset + Layout.getFieldOffset(FieldNo)); + if (Res.first != 0) +return Res; + continue; +} + +if (const auto *FD = dyn_cast(D); FD == VD) { + Offset += Layout.getFieldOffset(FieldNo); + return std::make_pair(Ctx.getTypeSizeInChars(FD->getType()).getQuantity(), +Ctx.toCharUnitsFromBits(Offset).getQuantity()); +} + +if (isa(D)) + ++FieldNo; + } zygoloid wrote: This work recursively looping through fields is not necessary: this function only succeeds when `VD` is a `FieldDecl`, so you can `dyn_cast` it to that type, then get the enclosing `DeclContext` to find the record, and use `FieldDecl::getFieldIndex` to find the field number. https://github.com/llvm/llvm-project/pull/83204 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)
@@ -1052,11 +1053,143 @@ CodeGenFunction::emitFlexibleArrayMemberSize(const Expr *E, unsigned Type, return Builder.CreateSelect(Cmp, Res, ConstantInt::get(ResType, 0, IsSigned)); } +namespace { + +/// SubobjectFinder - A simple visitor to find the "sub-object" pointed to by a +/// __builtin_dynamic_object_size call. Information gathered from the +/// sub-object is used by the back-end to determine the correct size when the +/// 'TYPE' of the __bdos call has the least significant bit set (i.e. asking +/// for the sub-object size). +/// +/// The expectation is that we'll eventually hit one of three expression types: +/// +/// 1. DeclRefExpr - This is the expression for the base of the structure. +/// 2. MemberExpr - This is the field in the structure. +/// 3. CompoundLiteralExpr - This is for people who create something +/// heretical like (struct foo has a flexible array member): +/// +///(struct foo){ 1, 2 }.blah[idx]; +/// +/// All other expressions can be correctly handled with the current code. +struct SubobjectFinder +: public ConstStmtVisitor { + SubobjectFinder() = default; + + //======// + //Visitor Methods + //======// + + const Expr *VisitStmt(const Stmt *S) { return nullptr; } + + const Expr *VisitDeclRefExpr(const DeclRefExpr *E) { return E; } + const Expr *VisitMemberExpr(const MemberExpr *E) { return E; } + const Expr *VisitCompoundLiteralExpr(const CompoundLiteralExpr *E) { +return E; + } + + const Expr *VisitArraySubscriptExpr(const ArraySubscriptExpr *E) { +return Visit(E->getBase()); + } + const Expr *VisitCastExpr(const CastExpr *E) { +return Visit(E->getSubExpr()); + } + const Expr *VisitParenExpr(const ParenExpr *E) { +return Visit(E->getSubExpr()); + } + const Expr *VisitUnaryAddrOf(const clang::UnaryOperator *E) { +return Visit(E->getSubExpr()); + } + const Expr *VisitUnaryDeref(const clang::UnaryOperator *E) { +return Visit(E->getSubExpr()); + } +}; + +} // end anonymous namespace + +/// getFieldInfo - Gather the size and offset of the field \p VD in \p RD. +static std::pair getFieldInfo(CodeGenFunction , + const RecordDecl *RD, + const ValueDecl *VD, + uint64_t Offset = 0) { + if (!RD) +return std::make_pair(0, 0); + + ASTContext = CGF.getContext(); + const ASTRecordLayout = Ctx.getASTRecordLayout(RD); + unsigned FieldNo = 0; + + for (const Decl *D : RD->decls()) { +if (const auto *Record = dyn_cast(D)) { + std::pair Res = + getFieldInfo(CGF, Record->getDefinition(), VD, + Offset + Layout.getFieldOffset(FieldNo)); + if (Res.first != 0) +return Res; + continue; +} + +if (const auto *FD = dyn_cast(D); FD == VD) { + Offset += Layout.getFieldOffset(FieldNo); + return std::make_pair(Ctx.getTypeSizeInChars(FD->getType()).getQuantity(), +Ctx.toCharUnitsFromBits(Offset).getQuantity()); +} + +if (isa(D)) + ++FieldNo; + } + + return std::make_pair(0, 0); +} + +/// getSubobjectInfo - Find the sub-object that \p E points to. If it lives +/// inside a struct, return the "size" and "offset" of that sub-object. +static std::pair getSubobjectInfo(CodeGenFunction , + const Expr *E) { + const Expr *Subobject = SubobjectFinder().Visit(E); + if (!Subobject) +return std::make_pair(0, 0); + + const RecordDecl *OuterRD = nullptr; + const ValueDecl *VD = nullptr; + + if (const auto *DRE = dyn_cast(Subobject)) { +// We're pointing to the beginning of the struct. +VD = DRE->getDecl(); +QualType Ty = VD->getType(); +if (Ty->isPointerType()) + Ty = Ty->getPointeeType(); +OuterRD = Ty->getAsRecordDecl(); zygoloid wrote: If I'm reading this correctly, I think this case is redundant: `getFieldInfo` only succeeds when `VD` is a field, but we're not going to have an evaluated `DeclRefExpr` that names a field. Can we return `0, 0` in this case, like we do for compound literals? I think the only case when we have non-zero values to return is when we've found a `FieldDecl`. https://github.com/llvm/llvm-project/pull/83204 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)
@@ -1052,11 +1053,143 @@ CodeGenFunction::emitFlexibleArrayMemberSize(const Expr *E, unsigned Type, return Builder.CreateSelect(Cmp, Res, ConstantInt::get(ResType, 0, IsSigned)); } +namespace { + +/// SubobjectFinder - A simple visitor to find the "sub-object" pointed to by a +/// __builtin_dynamic_object_size call. Information gathered from the +/// sub-object is used by the back-end to determine the correct size when the +/// 'TYPE' of the __bdos call has the least significant bit set (i.e. asking +/// for the sub-object size). +/// +/// The expectation is that we'll eventually hit one of three expression types: +/// +/// 1. DeclRefExpr - This is the expression for the base of the structure. +/// 2. MemberExpr - This is the field in the structure. +/// 3. CompoundLiteralExpr - This is for people who create something +/// heretical like (struct foo has a flexible array member): +/// +///(struct foo){ 1, 2 }.blah[idx]; +/// +/// All other expressions can be correctly handled with the current code. +struct SubobjectFinder +: public ConstStmtVisitor { + SubobjectFinder() = default; + + //======// + //Visitor Methods + //======// + + const Expr *VisitStmt(const Stmt *S) { return nullptr; } + + const Expr *VisitDeclRefExpr(const DeclRefExpr *E) { return E; } + const Expr *VisitMemberExpr(const MemberExpr *E) { return E; } + const Expr *VisitCompoundLiteralExpr(const CompoundLiteralExpr *E) { +return E; + } + + const Expr *VisitArraySubscriptExpr(const ArraySubscriptExpr *E) { +return Visit(E->getBase()); + } + const Expr *VisitCastExpr(const CastExpr *E) { +return Visit(E->getSubExpr()); + } zygoloid wrote: A derived-to-base cast is a traversal to a subobject in C++, so we should presumably terminate the traversal when we reach one and use the offset and size of the base class as the subobject. That'd be a pretty big delta from what you have here, but it'd be a good idea to add a FIXME here. https://github.com/llvm/llvm-project/pull/83204 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)
@@ -26996,18 +26996,38 @@ class, structure, array, or other object. Arguments: "" -The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a -pointer to or into the ``object``. The second argument determines whether -``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size is -unknown. The third argument controls how ``llvm.objectsize`` acts when ``null`` -in address space 0 is used as its pointer argument. If it's ``false``, -``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, if -the ``null`` is in a non-zero address space or if ``true`` is given for the -third argument of ``llvm.objectsize``, we assume its size is unknown. The fourth -argument to ``llvm.objectsize`` determines if the value should be evaluated at -runtime. +The ``llvm.objectsize`` intrinsic takes six arguments: + +- The first argument is a pointer to or into the ``object``. +- The second argument controls which value to return when the size is unknown: + + - If it's ``false``, ``llvm.objectsize`` returns ``-1``. + - If it's ``true``, ``llvm.objectsize`` returns ``0``. + +- The third argument controls how ``llvm.objectsize`` acts when ``null`` in + address space 0 is used as its pointer argument: + + - If it's ``false``, ``llvm.objectsize`` reports 0 bytes available when given +``null``. + - If it's ``true``, or the ``null`` pointer is in a non-zero address space, +the size is assumed to be unknown. + +- The fourth argument to ``llvm.objectsize`` determines if the value should be + evaluated at runtime. +- The fifth argument controls which size ``llvm.objectsize`` returns: + + - If it's ``false``, ``llvm.objectsize`` returns the size of the closest +surrounding subobject. + - If it's ``true``, ``llvm.objectsize`` returns the size of the whole object. + +- If non-zero, the sixth and seventh arguments encode the size and offset + information, respectively, of the original subobject's layout and is used + when the fifth argument is ``false``. +- The seventh argument encodes the offset information of the original + subobject's layout and is used when the fifth argument is ``false``. zygoloid wrote: I think the information you're passing in here isn't quite what we'd want. If I'm reading the code correctly, the offset you're passing in is the field offset relative to the immediately-enclosing record type, which doesn't give us any information about either where the pointer is within the subobject, or where the subobject is within the complete object, so this doesn't seem like it can be enough information to produce a correct result. Rather than passing in the offset of the subobject (relative to an unknown anchor point), I think it would be more useful to pass in a pointer to the start of the subobject. Or to pass in the offset from the start of the subobject to the pointer argument, but that would likely be harder for the frontend to calculate (eg, you'd need to work out the offset produced by array indexing). https://github.com/llvm/llvm-project/pull/83204 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)
@@ -5571,11 +5571,52 @@ LValue CodeGenFunction::EmitBinaryOperatorLValue(const BinaryOperator *E) { break; } -RValue RV = EmitAnyExpr(E->getRHS()); +llvm::Value *Previous = nullptr; +RValue RV; +QualType SrcType = E->getRHS()->getType(); +// Check if LHS is a bitfield, if RHS contains an implicit cast expression +// we want to extract that value and potentially (if the bitfield sanitizer +// is enabled) use it to check for an implicit conversion. +if (E->getLHS()->refersToBitField()) { + // Get the RHS before scalar conversion. + if (auto *ICE = GetOriginalRHSForBitfieldSanitizer(E)) { zygoloid wrote: What do you think about renaming this to `GetOriginalRHSForBitfieldAssignment`, given that we now use it unconditionally for all lowering of bitfield assignments? https://github.com/llvm/llvm-project/pull/75481 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)
https://github.com/zygoloid edited https://github.com/llvm/llvm-project/pull/75481 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)
https://github.com/zygoloid commented: Looks good to me. @AaronBallman Did you have remaining concerns here? https://github.com/llvm/llvm-project/pull/75481 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)
@@ -555,13 +555,11 @@ static void handleImplicitConversion(ImplicitConversionData *Data, ReportOptions Opts, ValueHandle Src, ValueHandle Dst) { SourceLocation Loc = Data->Loc.acquire(); - ErrorType ET = ErrorType::GenericUB; - const TypeDescriptor = Data->FromType; const TypeDescriptor = Data->ToType; - bool SrcSigned = SrcTy.isSignedIntegerTy(); bool DstSigned = DstTy.isSignedIntegerTy(); + ErrorType ET = ErrorType::GenericUB; zygoloid wrote: Ah, OK. That makes sense. https://github.com/llvm/llvm-project/pull/75481 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)
zygoloid wrote: > I'm not opposed to the changes, though I do find it somewhat strange to add > them to UBSan given that they're not undefined behavior (even though there's > precedent). This is adding checks to `-fsanitize=implicit-conversion`, which is not part of `-fsanitize=undefined`. It's sort of ambiguous whether UBSan means `-fsanitize=undefined` (that is, checks for UB that can be done without a big runtime), or whether it means any of the checks that the UBSan infrastructure and runtime power, but the established precedent is we also use the UBSan infrastructure to do non-UB data-loss checks like this, and this patch seems like a logical extension of the existing behavior. I'd be happier if we drew a brighter distinction between the UB checks and the non-UB checks that our sanitizers perform. For example, perhaps the non-UB sanitizers shouldn't be listed in https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html but in a different page, or perhaps at least in a different section from the other checks. But I think that's independent of this PR, and in the context of this PR, I don't think we should be re-litigating whether this kind of data-loss-through-conversion check is in scope. https://github.com/llvm/llvm-project/pull/75481 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)
@@ -555,13 +555,11 @@ static void handleImplicitConversion(ImplicitConversionData *Data, ReportOptions Opts, ValueHandle Src, ValueHandle Dst) { SourceLocation Loc = Data->Loc.acquire(); - ErrorType ET = ErrorType::GenericUB; - const TypeDescriptor = Data->FromType; const TypeDescriptor = Data->ToType; - bool SrcSigned = SrcTy.isSignedIntegerTy(); bool DstSigned = DstTy.isSignedIntegerTy(); + ErrorType ET = ErrorType::GenericUB; zygoloid wrote: This seems suspect, given that this handler is detecting things that aren't UB. https://github.com/llvm/llvm-project/pull/75481 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)
@@ -5571,11 +5571,50 @@ LValue CodeGenFunction::EmitBinaryOperatorLValue(const BinaryOperator *E) { break; } -RValue RV = EmitAnyExpr(E->getRHS()); +llvm::Value *Previous = nullptr; +RValue RV; +QualType SrcType = E->getRHS()->getType(); +// Check if LHS is a bitfield and sanitizer checks are enabled zygoloid wrote: (nit, throughout) Coding style is to end comments with a period. https://github.com/llvm/llvm-project/pull/75481 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)
@@ -5571,11 +5571,50 @@ LValue CodeGenFunction::EmitBinaryOperatorLValue(const BinaryOperator *E) { break; } -RValue RV = EmitAnyExpr(E->getRHS()); +llvm::Value *Previous = nullptr; +RValue RV; +QualType SrcType = E->getRHS()->getType(); +// Check if LHS is a bitfield and sanitizer checks are enabled +if (E->getLHS()->refersToBitField() && +SanOpts.hasOneOf(SanitizerKind::ImplicitConversion | + SanitizerKind::ImplicitBitfieldConversion)) { zygoloid wrote: ```suggestion SanOpts.hasOneOf(SanitizerKind::ImplicitBitfieldConversion)) { ``` No need to check for the containing group; that should be handled automatically. https://github.com/llvm/llvm-project/pull/75481 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)
@@ -5571,11 +5571,50 @@ LValue CodeGenFunction::EmitBinaryOperatorLValue(const BinaryOperator *E) { break; } -RValue RV = EmitAnyExpr(E->getRHS()); +llvm::Value *Previous = nullptr; +RValue RV; +QualType SrcType = E->getRHS()->getType(); +// Check if LHS is a bitfield and sanitizer checks are enabled +if (E->getLHS()->refersToBitField() && +SanOpts.hasOneOf(SanitizerKind::ImplicitConversion | + SanitizerKind::ImplicitBitfieldConversion)) { + // Get the RHS before scalar conversion + if (auto *ICE = GetOriginalRHSForBitfieldSanitizer(E)) { +SrcType = ICE->getSubExpr()->getType(); +Previous = EmitScalarExpr(ICE->getSubExpr()); +// Pass default ScalarConversionOpts to avoid emitting +// integer sanitizer checks as E refers to bitfield +llvm::Value *RHS = EmitScalarConversion( +Previous, SrcType, ICE->getType(), ICE->getExprLoc()); +RV = RValue::get(RHS); + } +} + +// Otherwise, visit RHS as usual +if (!Previous) + RV = EmitAnyExpr(E->getRHS()); zygoloid wrote: Would it be correct to do the above, more-elaborate lowering instead of this regardless of whether a sanitizer is enabled? I'd have more confidence that we're performing a correct emission if we used the same code regardless of sanitizer mode. https://github.com/llvm/llvm-project/pull/75481 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)
@@ -5571,11 +5571,50 @@ LValue CodeGenFunction::EmitBinaryOperatorLValue(const BinaryOperator *E) { break; } -RValue RV = EmitAnyExpr(E->getRHS()); +llvm::Value *Previous = nullptr; +RValue RV; +QualType SrcType = E->getRHS()->getType(); +// Check if LHS is a bitfield and sanitizer checks are enabled +if (E->getLHS()->refersToBitField() && +SanOpts.hasOneOf(SanitizerKind::ImplicitConversion | + SanitizerKind::ImplicitBitfieldConversion)) { + // Get the RHS before scalar conversion + if (auto *ICE = GetOriginalRHSForBitfieldSanitizer(E)) { +SrcType = ICE->getSubExpr()->getType(); +Previous = EmitScalarExpr(ICE->getSubExpr()); +// Pass default ScalarConversionOpts to avoid emitting +// integer sanitizer checks as E refers to bitfield +llvm::Value *RHS = EmitScalarConversion( +Previous, SrcType, ICE->getType(), ICE->getExprLoc()); +RV = RValue::get(RHS); + } +} + +// Otherwise, visit RHS as usual +if (!Previous) + RV = EmitAnyExpr(E->getRHS()); + LValue LV = EmitCheckedLValue(E->getLHS(), TCK_Store); + if (RV.isScalar()) EmitNullabilityCheck(LV, RV.getScalarVal(), E->getExprLoc()); -EmitStoreThroughLValue(RV, LV); + +// Passing a pointer EmitStoreThroughBitfieldLValue will emit the result +// If sanitizer checks are not used, we do not use the result +// Hence, use EmitStoreThroughLValue instead +if (LV.isBitField() && RV.isScalar() && +SanOpts.hasOneOf(SanitizerKind::ImplicitBitfieldConversion)) { + // If the expression contained an implicit conversion, make sure + // to use the value before the scalar conversion. + llvm::Value *Src = Previous ? Previous : RV.getScalarVal(); + QualType DstType = E->getLHS()->getType(); + llvm::Value *Result; + EmitStoreThroughBitfieldLValue(RV, LV, ); + EmitBitfieldConversionCheck(Src, SrcType, Result, DstType, + LV.getBitFieldInfo(), E->getExprLoc()); +} else + EmitStoreThroughLValue(RV, LV); zygoloid wrote: Same here: can we use the same code regardless, and just make a choice between passing in `` and passing in `nullptr` to `EmitStoreThroughBitfieldLValue`, and a choice as to whether we `EmitBitfieldConversionCheck`, based on the sanitizer mode? https://github.com/llvm/llvm-project/pull/75481 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)
@@ -4564,15 +4737,41 @@ Value *ScalarExprEmitter::VisitBinAssign(const BinaryOperator *E) { case Qualifiers::OCL_None: // __block variables need to have the rhs evaluated first, plus // this should improve codegen just a little. -RHS = Visit(E->getRHS()); +Value *Previous = nullptr; +QualType SrcType = E->getRHS()->getType(); +// Check if LHS is a bitfield and sanitizer checks are enabled +if (E->getLHS()->refersToBitField() && +CGF.SanOpts.hasOneOf(SanitizerKind::ImplicitConversion | + SanitizerKind::ImplicitBitfieldConversion)) { zygoloid wrote: ```suggestion CGF.SanOpts.hasOneOf(SanitizerKind::ImplicitBitfieldConversion)) { ``` https://github.com/llvm/llvm-project/pull/75481 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)
https://github.com/zygoloid commented: Generally this looks good to me, thanks. https://github.com/llvm/llvm-project/pull/75481 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)
https://github.com/zygoloid edited https://github.com/llvm/llvm-project/pull/75481 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][test] Add test for incompatible cv-qualified reference types in conversion function template (PR #81950)
https://github.com/zygoloid closed https://github.com/llvm/llvm-project/pull/81950 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][test] Add test for incompatible cv-qualified reference types in conversion function template (PR #81950)
https://github.com/zygoloid approved this pull request. https://github.com/llvm/llvm-project/pull/81950 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-format] Handle common C++ non-keyword types as such (PR #83709)
zygoloid wrote: > +1 for this, it will be the 95%/5% rule, you might think all cases of > (identifier) is a cast, but I'm pretty sure it won't be Specifically, what I suggested was: when disambiguating whether `(identifier)&...` or `(identifier)*...` is a cast vs a multiplication or bitwise and (that is, the case that the previous patch changed, resulting in the regression we're trying to fix), treat it as a cast unless `identifier` is a macro parameter name. I think that will be right >99% of the time. And even when clang-format gets it wrong, the programmer can fix that by removing the redundant and non-idiomatic parentheses around the identifier. > I'm with @owenca on this, this patch will move us closer, meaning users don't > need to add common types to TypeNames As I said before, I think special-casing these typedefs is a good thing - it's reasonable for clang-format to assume they're type names. But I don't think this moves us much closer to fixing the bug, because most casts won't be to one of these typedefs. I do wonder, though -- there's a lot of churn in this patch adding and wiring through a new parameter. Can the `TypeNames` mechanism be prepopulated with this list of types instead? https://github.com/llvm/llvm-project/pull/83709 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-format] Handle common C++ non-keyword types as such (PR #83709)
zygoloid wrote: > It does fix the example given. #83400 has 6 real-world examples. This patch fixes none of them. It also has a reduced testcase, which this patch does fix. But fixing the reduced testcase without fixing the real-world examples is not fixing the bug. https://github.com/llvm/llvm-project/pull/83709 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-format] Handle common C++ non-keyword types as such (PR #83709)
https://github.com/zygoloid edited https://github.com/llvm/llvm-project/pull/83709 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-format] Handle common C++ non-keyword types as such (PR #83709)
@@ -34,9 +34,15 @@ const char *getTokenTypeName(TokenType Type) { return nullptr; } +// Sorted common C++ non-keyword types. +static SmallVector CppNonKeywordTypes = { +"byte", "int16_t", "int32_t", "int64_t", "int8_t", +"size_t", "uint16_t", "uint32_t", "uint64_t", "uint8_t", zygoloid wrote: On codesearch.isocpp.org, `(intptr_t)` is more common than `(int8_t)` and `(int16_t)`, and similar to `(int32_t)`. But why should this general function that's used in various places in clang-format care about how often the type is cast with a C-style cast? That doesn't make sense to me. I find it extremely unlikely that anyone would use `intN_t` as anything other than a type; I think your concerns about regressions are unfounded. In contrast, there are 518 hits for `(intptr_t)&` in isocpp codesearch, which seems pretty logical to me because casts to `(intptr_t)` will be casts from pointers. https://github.com/llvm/llvm-project/pull/83709 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-format] Handle common C++ non-keyword types as such (PR #83709)
zygoloid wrote: > This patch does not only fix formatting of C-casting to a C++ standard type. > It correctly identifies (most of) such types and might have fixed other kinds > of bugs. Sure, this patch seems like a good change. But it does not fix #83400. https://github.com/llvm/llvm-project/pull/83709 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-format] Handle common C++ non-keyword types as such (PR #83709)
@@ -34,9 +34,15 @@ const char *getTokenTypeName(TokenType Type) { return nullptr; } +// Sorted common C++ non-keyword types. +static SmallVector CppNonKeywordTypes = { +"byte", "int16_t", "int32_t", "int64_t", "int8_t", +"size_t", "uint16_t", "uint32_t", "uint64_t", "uint8_t", +}; + // FIXME: This is copy from Sema. Put it in a common place and remove // duplication. zygoloid wrote: Before this patch, the two functions were intended to do the same thing, but have apparently diverged. After this patch they are not intended to do the same thing. The FIXME is wrong, can you fix it please? https://github.com/llvm/llvm-project/pull/83709 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-format] Handle common C++ non-keyword types as such (PR #83709)
zygoloid wrote: > > Another possibility to consider for the original bug: `(single_identifier)` > > is almost certainly a cast, not redundant parentheses, unless > > `single_identifier` names a macro argument. So I wonder if that would be a > > better heuristic to use to fix the regression. > > I don’t think that would work as`(identifier)` can be a number of constructs > depending on the context. It can be part of a function/macro call, function > declaration, macro definition, C++ cast, C cast, `sizeof` operand, etc. OK, but the context is deciding whether `(identifier)` or `(identifier)*x` is a binary operator or a cast. Given no other information, the fact that it's a single identifier seems like a very strong signal that it's a cast. I don't think this PR fixes #83400 -- handling a small handful of common types won't address the same issue for the much larger class of cases where the type name is not one of these types. Eg, in #83400 itself the examples included: ```diff - static _FORCE_INLINE_ m_type get(const Variant *v) { return (m_type)*VariantInternal::get_int(v); }\ + static _FORCE_INLINE_ m_type get(const Variant *v) { return (m_type) * VariantInternal::get_int(v); } \ ``` But looking for a parenthesized single identifier addresses all of the examples in the issue. https://github.com/llvm/llvm-project/pull/83709 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-format] Handle common C++ non-keyword types as such (PR #83709)
@@ -34,9 +34,15 @@ const char *getTokenTypeName(TokenType Type) { return nullptr; } +// Sorted common C++ non-keyword types. +static SmallVector CppNonKeywordTypes = { +"byte", "int16_t", "int32_t", "int64_t", "int8_t", +"size_t", "uint16_t", "uint32_t", "uint64_t", "uint8_t", +}; + // FIXME: This is copy from Sema. Put it in a common place and remove // duplication. zygoloid wrote: This function is no longer simply copy/pasted from `Sema`. It doesn't even do the same thing as the Sema function any more. https://github.com/llvm/llvm-project/pull/83709 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-format] Handle common C++ non-keyword types as such (PR #83709)
@@ -34,9 +34,15 @@ const char *getTokenTypeName(TokenType Type) { return nullptr; } +// Sorted common C++ non-keyword types. +static SmallVector CppNonKeywordTypes = { +"byte", "int16_t", "int32_t", "int64_t", "int8_t", +"size_t", "uint16_t", "uint32_t", "uint64_t", "uint8_t", zygoloid wrote: According to codesearch.isocpp.org, `intptr_t`, `uintptr_t`, and `ptrdiff_t` are all more common than `int8_t`. Also I'd note that POSIX reserves all identifiers ending in `_t` for use as types, and C reserves all identifiers starting with `int` or `uint` and ending in `_t` for use as types. https://github.com/llvm/llvm-project/pull/83709 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-format] Handle common C++ non-keyword types as such (PR #83709)
zygoloid wrote: Another possibility to consider for the original bug: `(single_identifier)` is almost certainly a cast, not redundant parentheses, unless `single_identifier` names a macro argument. So I wonder if that would be a better heuristic to use to fix the regression. https://github.com/llvm/llvm-project/pull/83709 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-format] Handle common C++ non-keyword types as such (PR #83709)
@@ -34,9 +34,15 @@ const char *getTokenTypeName(TokenType Type) { return nullptr; } +// Sorted common C++ non-keyword types. +static SmallVector CppNonKeywordTypes = { +"byte", "int16_t", "int32_t", "int64_t", "int8_t", +"size_t", "uint16_t", "uint32_t", "uint64_t", "uint8_t", +}; + // FIXME: This is copy from Sema. Put it in a common place and remove // duplication. zygoloid wrote: Looks like this comment is no longer correct. https://github.com/llvm/llvm-project/pull/83709 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits