[clang] [clang] Fix `remove{CVR|Fast}Qualifiers` with 64-bit `Qualifiers::Mask` (PR #90329)

2024-04-28 Thread John McCall via cfe-commits

https://github.com/rjmccall approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/90329


[clang] [clang codegen][NFC] Delete dead code in constant emission. (PR #90106)

2024-04-25 Thread John McCall via cfe-commits

https://github.com/rjmccall commented:

The Apple fork does actually use this downstream (for address-sensitive 
`__ptrauth`), and we've recently (finally) been making progress in upstreaming 
that work.  Obviously, we could delete it and then reinstate it when we catch 
up to this point in upstreaming, but I don't think it's actively harmful to 
keep around in the short term.

https://github.com/llvm/llvm-project/pull/90106


[clang] [Clang] [CodeGen] Perform derived-to-base conversion on explicit object parameter in lambda (PR #89828)

2024-04-24 Thread John McCall via cfe-commits


@@ -4684,6 +4684,29 @@ LValue CodeGenFunction::EmitLValueForLambdaField(const FieldDecl *Field,
     else
       LambdaLV = MakeAddrLValue(AddrOfExplicitObject,
                                 D->getType().getNonReferenceType());
+
+    // Make sure we have an lvalue to the lambda itself and not a derived class.
+    auto *ThisTy = D->getType().getNonReferenceType()->getAsCXXRecordDecl();
+    auto *LambdaTy = cast<CXXRecordDecl>(Field->getParent());
+    if (ThisTy != LambdaTy) {
+      CXXBasePaths Paths(/*FindAmbiguities=*/false, /*RecordPaths=*/true,
+                         /*DetectVirtual=*/false);
+
+      [[maybe_unused]] bool Derived = ThisTy->isDerivedFrom(LambdaTy, Paths);
+      assert(Derived && "Type not derived from lambda type?");

rjmccall wrote:

We could do either.  Doing it once in the prologue makes some sense to me; in 
the (very unlikely) case that it's not just a static offset, that would 
probably lead to better code.
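
For reference, a minimal standalone illustration of the situation under discussion
(an assumed example, not taken from the PR's tests): the explicit object parameter
is deduced as a type *derived* from the closure, so capture accesses inside the
call operator need a derived-to-base adjustment of `self` before the lambda's
fields can be addressed.  Compile with -std=c++23.

#include <cstdio>

int main() {
  int x = 42;
  auto l = [x](this auto &&self) { return x; };  // self may be deduced as Derived &
  struct Derived : decltype(l) {
    Derived(decltype(l) base) : decltype(l)(base) {}
    int extra[8] = {};                           // layout differs from the closure itself
  };
  Derived d(l);
  std::printf("%d\n", d());                      // capture access goes through the base subobject
  return 0;
}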

https://github.com/llvm/llvm-project/pull/89828


[clang] [Clang] [CodeGen] Perform derived-to-base conversion on explicit object parameter in lambda (PR #89828)

2024-04-24 Thread John McCall via cfe-commits


@@ -4684,6 +4684,29 @@ LValue CodeGenFunction::EmitLValueForLambdaField(const FieldDecl *Field,
     else
       LambdaLV = MakeAddrLValue(AddrOfExplicitObject,
                                 D->getType().getNonReferenceType());
+
+    // Make sure we have an lvalue to the lambda itself and not a derived class.
+    auto *ThisTy = D->getType().getNonReferenceType()->getAsCXXRecordDecl();
+    auto *LambdaTy = cast<CXXRecordDecl>(Field->getParent());
+    if (ThisTy != LambdaTy) {
+      CXXBasePaths Paths(/*FindAmbiguities=*/false, /*RecordPaths=*/true,
+                         /*DetectVirtual=*/false);
+
+      [[maybe_unused]] bool Derived = ThisTy->isDerivedFrom(LambdaTy, Paths);
+      assert(Derived && "Type not derived from lambda type?");

rjmccall wrote:

If we were going to put it in the AST, the right place would be on the call 
operator FD rather than in every DRE for a capture.  We'd presumably store it 
with a side-table in the ASTContext, which we'd only populate for lambdas with 
a derived object parameter.  It's probably not an actual problem to repeatedly 
recompute it as long as we have that restriction, though.
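
Purely as an illustration of the shape such a side table might take (hypothetical
names; nothing like this exists in ASTContext today), keyed by the lambda's call
operator and populated only when the explicit object parameter has a derived type:

#include "clang/AST/DeclCXX.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallVector.h"

namespace clang {
// Hypothetical: the derived-to-base path from the explicit object type to the
// lambda, recorded once per call operator instead of once per capture DRE.
using LambdaBasePath = llvm::SmallVector<const CXXBaseSpecifier *, 4>;
using LambdaBasePathTable = llvm::DenseMap<const CXXMethodDecl *, LambdaBasePath>;
} // namespace clang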

https://github.com/llvm/llvm-project/pull/89828


[clang] [Clang] [CodeGen] Perform derived-to-base conversion on explicit object parameter in lambda (PR #89828)

2024-04-23 Thread John McCall via cfe-commits


@@ -4684,6 +4684,29 @@ LValue CodeGenFunction::EmitLValueForLambdaField(const FieldDecl *Field,
     else
       LambdaLV = MakeAddrLValue(AddrOfExplicitObject,
                                 D->getType().getNonReferenceType());
+
+    // Make sure we have an lvalue to the lambda itself and not a derived class.
+    auto *ThisTy = D->getType().getNonReferenceType()->getAsCXXRecordDecl();
+    auto *LambdaTy = cast<CXXRecordDecl>(Field->getParent());
+    if (ThisTy != LambdaTy) {
+      CXXBasePaths Paths(/*FindAmbiguities=*/false, /*RecordPaths=*/true,
+                         /*DetectVirtual=*/false);
+
+      [[maybe_unused]] bool Derived = ThisTy->isDerivedFrom(LambdaTy, Paths);
+      assert(Derived && "Type not derived from lambda type?");

rjmccall wrote:

We can manually resolve between the `operator()`s at the call site with a 
qualified member name, and the `operator()` should still get called with the 
original `Overloaded` without any hint about which subobject it was called on.

We could also have `Overloaded` just pick one of its bases to `using` the 
`operator()` from, but then I suppose someone might argue that the operator 
ought to pick up on that to decide the base path.  (That would be absurd, IMO.)

Example of the first: https://godbolt.org/z/rqvKGfevY

I think you need to open a DR with the committee.  It just needs to be 
ill-formed for a lambda call operator with an explicit `this` parameter to be 
instantiated such that the `this` parameter has a type with multiple base 
subobjects of the lambda type.  I can't imagine any other resolution short of 
something absurd like adding an implicit offset parameter, which would then be 
breaking in other ways.

Please go ahead and diagnose that condition in Sema.
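
A rough standalone reconstruction of the first approach (assumed code, not the
exact godbolt listing): two lambdas with operator()s are inherited, the lookup
ambiguity is resolved manually with a qualified member name, and the call still
passes the full `Overloaded` object as the explicit object argument.

#include <cstdio>

int main() {
  int base = 10;
  auto a = [base](this auto &&self, int v) { return base + v; };
  auto b = [base](this auto &&self, int v) { return base * v; };
  using A = decltype(a);
  using B = decltype(b);
  struct Overloaded : A, B {
    Overloaded(A x, B y) : A(x), B(y) {}
  };
  Overloaded o(a, b);
  // o(3);                                  // ill-formed: operator() found in both bases
  std::printf("%d\n", o.A::operator()(3));  // manual resolution; `self` still binds to `o`
  return 0;
}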

https://github.com/llvm/llvm-project/pull/89828


[clang] [Clang] [CodeGen] Perform derived-to-base conversion on explicit object parameter in lambda (PR #89828)

2024-04-23 Thread John McCall via cfe-commits


@@ -4684,6 +4684,29 @@ LValue CodeGenFunction::EmitLValueForLambdaField(const FieldDecl *Field,
     else
       LambdaLV = MakeAddrLValue(AddrOfExplicitObject,
                                 D->getType().getNonReferenceType());
+
+    // Make sure we have an lvalue to the lambda itself and not a derived class.
+    auto *ThisTy = D->getType().getNonReferenceType()->getAsCXXRecordDecl();
+    auto *LambdaTy = cast<CXXRecordDecl>(Field->getParent());
+    if (ThisTy != LambdaTy) {
+      CXXBasePaths Paths(/*FindAmbiguities=*/false, /*RecordPaths=*/true,
+                         /*DetectVirtual=*/false);
+
+      [[maybe_unused]] bool Derived = ThisTy->isDerivedFrom(LambdaTy, Paths);
+      assert(Derived && "Type not derived from lambda type?");

rjmccall wrote:

Is there a reason there has to be a unique base subobject of the lambda type?
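
A hypothetical illustration of the case being asked about (assumed code): a type
with *two* base subobjects of the same closure type, so there is no unique
derived-to-base path for the capture access.

int main() {
  int x = 1;
  auto l = [x](this auto &&self) { return x; };
  using L = decltype(l);
  struct A : L { A(L v) : L(v) {} };
  struct B : L { B(L v) : L(v) {} };
  struct Both : A, B { Both(L v) : A(v), B(v) {} };
  Both both(l);
  // both();   // which L subobject should the capture come from?  No unique answer.
  (void)both;
  return 0;
}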

https://github.com/llvm/llvm-project/pull/89828


[clang] Fix objc_sel_{name,types} missing an underscore (PR #88713)

2024-04-23 Thread John McCall via cfe-commits

rjmccall wrote:

Great, thanks!  @Midar, if you can fix this code up so that these strings are 
created in one place (and test the new output, since this is not NFC), I'd be 
happy to sign off.

https://github.com/llvm/llvm-project/pull/88713


[clang] [clang] Set correct FPOptions if attribute 'optnone' presents (PR #85605)

2024-04-22 Thread John McCall via cfe-commits

https://github.com/rjmccall approved this pull request.

This seems fine.  Is there any existing bookkeeping we no longer need to do if 
we're going to have this RAII object in scope during function parsing?

https://github.com/llvm/llvm-project/pull/85605


[clang] Fix objc_sel_{name,types} missing an underscore (PR #88713)

2024-04-22 Thread John McCall via cfe-commits

https://github.com/rjmccall edited 
https://github.com/llvm/llvm-project/pull/88713


[clang] Fix objc_sel_{name,types} missing an underscore (PR #88713)

2024-04-22 Thread John McCall via cfe-commits

https://github.com/rjmccall commented:

At first, I was worried that this was an ABI break.  Then I noticed the 
internal inconsistency within this single file, and so I became worried that it 
was an ABI *fix*.  But then I noticed that the normal selector-emission code 
actually makes these strings hidden by default, and so now my guess is that the 
uniqueness of these strings is at best an optimization.  Still, it's an 
optimization we should do.

There are actually three differences between the code emission here and the 
code emission in the "primary" paths in `GetConstantSelector` and 
`GetTypeString`:
- the primary path uses an underscore in the symbol name prefix
- the primary path adjusts the symbol name to avoid restricted characters in 
the target object file format
- the primary path generates hidden symbols
My assumption is that, on all three of these points, we should just do what the 
primary path would do.  Please extract common functions as necessary in order 
to reuse the logic.

@davidchisnall, could you verify my reasoning here?
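
For what it's worth, the behavior of the primary path being described would look
roughly like the following sketch (illustrative only, with an assumed helper name,
stand-in sanitization, and assumed linkage; not the actual CGObjCGNU code):

#include "llvm/ADT/StringRef.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Module.h"
#include <algorithm>
#include <string>

static llvm::GlobalVariable *emitHiddenSelectorString(llvm::Module &M,
                                                      llvm::StringRef Prefix,
                                                      llvm::StringRef Str) {
  std::string Name = (Prefix + Str).str();
  // Stand-in for the real "avoid characters the object format restricts" logic.
  std::replace(Name.begin(), Name.end(), '@', '.');
  auto *Init = llvm::ConstantDataArray::getString(M.getContext(), Str);
  auto *GV = new llvm::GlobalVariable(M, Init->getType(), /*isConstant=*/true,
                                      llvm::GlobalValue::LinkOnceODRLinkage,
                                      Init, Name);
  GV->setVisibility(llvm::GlobalValue::HiddenVisibility);
  return GV;
}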

https://github.com/llvm/llvm-project/pull/88713


[clang] [clang-tools-extra] [PAC][clang] Define `PointerAuthQualifier` and `PointerAuthenticationMode` (PR #84384)

2024-04-22 Thread John McCall via cfe-commits

rjmccall wrote:

> Ah, okay, thank you! I had two concerns, but I think neither are valid now 
> that I better understand how `ExtQuals` works: 1) I thought we were 
> increasing the in-memory size of `Qualifiers` in a way that matter (like 
> `SplitQualType`, `ExtProtoInfo` primarily),

AFAIK, `SplitQualType` is never stored long-term.  I'd forgotten that 
`Qualifiers` was used directly in `ExtProtoInfo` (address-space-qualified 
methods!), but as it happens, `ExtProtoInfo` is *also* not stored long-term: 
`FunctionProtoType` reassembles it from its internal storage, and it has the 
same fast-qualifiers vs. extended-qualifiers optimization as `QualType` where 
it uses a trailing `Qualifiers` as the slow case.  So this should only increase 
memory overhead for C++ methods with address-space qualifiers, and it's not 
much overhead even there.  (IIRC, none of the other extended qualifiers apply 
to methods — `__ptrauth` could, actually, but we currently restrict it from 
appearing in parameter positions.)

> 2) I thought we had 32-bit builds of Clang so all the places where we 
> pass/return a `Qualifiers` would require passing in multiple registers now.

I'm sure we still have 32-bit builds of Clang, and yeah, they're going to be 
incrementally slower as `Qualifiers` grows.  I think we just have to eat that 
micro cost, though; most of the operations on `Qualifiers` are doing 
set-algebra-ish operations that forcing a heap allocation would really punish.  
If we're passing and returning `Qualifiers` by value a lot in tight code, we 
should consider whether we can avoid that, but the micro cost should typically 
vanish in the noise.
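
To make the set-algebra point concrete, here is a deliberately simplified, 
self-contained sketch (not the real clang::Qualifiers, just the shape of the 
operations); keeping this an inline 64-bit value keeps each of them a couple of 
bit operations.

#include <cstdint>

class QualifiersLike {
  uint64_t Mask = 0;   // was 32 bits; grows to 64 with the pointer-auth bits folded in
public:
  void addCVR(uint64_t CVR) { Mask |= CVR; }
  void removeCVR(uint64_t CVR) { Mask &= ~CVR; }
  bool compatiblyIncludes(QualifiersLike Other) const {
    return (Mask | Other.Mask) == Mask;   // superset test: one OR, one compare
  }
};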

We're somewhat "saved" here by the fact that many 32-bit ABIs are quite bad.  
For example, the SysV x86 calling convention always returns structs indirectly, 
no matter how small they are.

> Do you think the performance concerns are sufficient to warrant doing this 
> optimization work up front? .25% is big enough to warrant concern, but not 
> big enough to need to ask for onerous efforts.

I personally don't think so, no.  But this patch upstreams an overhead we've 
been eating at Apple for about 7 years, so one could argue that it's easy for 
me to say that.  It's not unreasonable for you to ask for an investigation to 
see if we can identify any localized source of the slowdown.  If it's 
across-the-board (which at .25%, I'd say it probably is, but you never know), 
maybe we just have to accept it.

https://github.com/llvm/llvm-project/pull/84384


[clang] [clang-tools-extra] [PAC][clang] Define `PointerAuthQualifier` and `PointerAuthenticationMode` (PR #84384)

2024-04-18 Thread John McCall via cfe-commits

rjmccall wrote:

> > @AaronBallman See test results from compile-time-tracker here: 
> > https://llvm-compile-time-tracker.com/compare.php?from=693a458287d019c5c6a66fe3019d099df2978cdb&to=dbb5e29d136a18060ba6759b328ad80fa9cea649.
> > It looks like that there is a statistically meaningful difference, but it's 
> > only about 0.05..0.25% depending on the test. Is it considered OK?
> 
> Yeah, this seems to have noticeable impact on compile times for every 
> compilation; out of curiosity, have you tried an approach where this 
> information is stored in `ExtQuals` instead? That's heap allocated, but would 
> mean that the only folks paying the cost are the ones using the functionality.

`Qualifiers` is an inline value type representing all possible qualifiers, 
separated from its application to any specific type.  `ExtQuals` represents an 
application of qualifiers that don't fit into the inline fast-qualifiers bits 
to a specific type.  `ExtQuals` stores a `Qualifiers` inline, with a 
precondition that the fast qualifier bits are clear.  Outside of that, we never 
store `Qualifiers` long-term AFAIK.

`PointerAuthQualifier` is 32 bits.  Adding it to `Qualifiers` increases 
`Qualifiers` from 32 bits (mostly occupied) to 64 bits.  `__ptrauth` qualifiers 
are not fast qualifiers, so when applied to a type, they require the use of an 
`ExtQuals` node.

Given all that, I'm not sure what you're asking for.  Storing uncommon 
qualifiers out of line is what we already do with `QualType` and is why 
`ExtQuals` exists; doing it again with `Qualifiers` doesn't seem to serve any 
purpose.  It's certainly not going to make `Qualifiers` smaller or more 
efficient to work with, since `PointerAuthQualifier` is smaller than a pointer. 
 `ExtQuals` is 128-bit-aligned and starts with two pointers, so there's space 
for 64 bits of qualifiers on 32-bit hosts and 128 bits of qualifiers on 64-bit 
hosts before `ExtQuals` grows.

The overhead is probably from additional checks rather than any cost associated 
with working with a 64-bit `Qualifiers` value.  We could look into ways to 
optimize those checks (e.g. qualifier compatibility) in the common cases where 
there are no extended qualifiers.  It's also possible that merging 
`PointerAuthQualifier` into `Mask` inside `Qualifiers` would make some of the 
low-level handling more efficient.
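
As a very rough, self-contained model of the layout being described (not the 
actual clang classes): QualType packs the three "fast" qualifiers into the low 
pointer bits, and anything else forces an out-of-line ExtQuals node that stores 
a full Qualifiers value with the fast bits cleared.

#include <cstdint>

struct TypeLike;              // stands in for clang::Type

struct ExtQualsLike {         // out-of-line, allocated only when extended quals are present
  const TypeLike *BaseType;   // the underlying type
  const void *Canonical;      // second pointer, mirroring the real node's layout
  uint64_t Quals;             // full qualifier set, fast bits guaranteed clear
};

class QualTypeLike {
  uintptr_t Value = 0;        // Type* or ExtQualsLike*, low bits hold const/volatile/restrict
  static constexpr uintptr_t FastMask = 0x7;
public:
  uint64_t fastQuals() const { return Value & FastMask; }
};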

https://github.com/llvm/llvm-project/pull/84384


[clang] [clang] Move tailclipping to bitfield allocation (PR #87090)

2024-04-15 Thread John McCall via cfe-commits

https://github.com/rjmccall approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/87090


[clang] [PAC][clang] Define `PointerAuthQualifier` and `PointerAuthenticationMode` (PR #84384)

2024-04-12 Thread John McCall via cfe-commits

rjmccall wrote:

It's my patch, so me signing off isn't very meaningful.

https://github.com/llvm/llvm-project/pull/84384


[clang] [Clang][CodeGen] Start migrating away from assuming the Default AS is 0 (PR #88182)

2024-04-09 Thread John McCall via cfe-commits


@@ -2216,7 +2216,7 @@ static llvm::Value *EmitTypeidFromVTable(CodeGenFunction &CGF, const Expr *E,
 }
 
 llvm::Value *CodeGenFunction::EmitCXXTypeidExpr(const CXXTypeidExpr *E) {
-  llvm::Type *PtrTy = llvm::PointerType::getUnqual(getLLVMContext());
+  llvm::Type *PtrTy = Int8PtrTy;

rjmccall wrote:

Should this be `GlobalsInt8PtrTy`?

https://github.com/llvm/llvm-project/pull/88182


[clang] [Clang][CodeGen] Start migrating away from assuming the Default AS is 0 (PR #88182)

2024-04-09 Thread John McCall via cfe-commits

rjmccall wrote:

> Libcall emission needs to know what AS to use for pointer arguments to things 
> like __sync/__atomic implementations and various string-y mem*/str* 
> functions. That's the main one that comes to mind from our experience in 
> CHERI LLVM, and current LLVM just assumes that's AS0.

That's a fair point, but it's clearly specific to libcall emission and not a 
generic concept for all IR.  And it needs real intelligence, not just a 
hardcoded AS in the `DataLayout`, e.g. to know that it simply cannot do libcall 
emission for certain operations because there's no way to convert a pointer 
into AS 10 to the parameter type of the relevant function.

https://github.com/llvm/llvm-project/pull/88182


[clang] [Clang][CodeGen] Start migrating away from assuming the Default AS is 0 (PR #88182)

2024-04-09 Thread John McCall via cfe-commits

rjmccall wrote:

> I'm not quite sure how to parse this comment, could you explain what you have 
> in mind here? The problem is precisely that the FE assumes 0 is fine / picks 
> it by default, which ends up into dangerzones when e.g. a target happened to 
> use 0 to point to private (stack). I feel as if I'm missing the core of your 
> comment though, so apologies in advance.

I'm just saying that I don't think it makes any sense to add a concept of a 
default AS to LLVM.  The "default" AS is a frontend concept, not a middle-end / 
back-end concept.  LLVM would only need a "default" AS if it were inventing a 
memory allocation/operation from whole cloth, which is generally not something 
LLVM should ever be doing; the only legitimate counter-example I can think of 
would be something like materializing a constant into constant global memory.

https://github.com/llvm/llvm-project/pull/88182


[clang] [Clang][CodeGen] Start migrating away from assuming the Default AS is 0 (PR #88182)

2024-04-09 Thread John McCall via cfe-commits

rjmccall wrote:

> Why can't we just declare that the "generic" address-space must always be 0? 
> The specific numbers we use for address-spaces are completely arbitrary 
> anyway.

I don't think this works; there's nothing that LLVM could possibly do with a 
generic AS that would justify it being called out in the representation like 
this, because the existence of a generic AS is a deeply non-portable concept.  
The only AS that I think would be worthwhile to call out like this would be the 
`alloca` AS, and in fact I tried to argue that as a reason not to introduce the 
`alloca` AS to `DataLayout` in the first place.  Unfortunately, I lost that 
argument, IIRC because some targets were adamant that they wanted to vary the 
`alloca` AS between functions.

https://github.com/llvm/llvm-project/pull/88182


[clang] [Clang][CodeGen] Start migrating away from assuming the Default AS is 0 (PR #88182)

2024-04-09 Thread John McCall via cfe-commits

rjmccall wrote:

It's very uncommon for LLVM to need to come up with an address space on its 
own, as opposed to just propagating the original address space of some memory / 
operation as chosen by the frontend.  LLVM occasionally creates new storage 
allocations, but usually they're (non-escaping) `alloca`s and therefore have to 
be in the `alloca` AS.  The only situation I can think of where LLVM might 
legitimately have to come up with an address space is when LLVM decides to 
introduce new global constants.  (I can't think of any transformation that 
would introduce a *non-constant* global.)  So if you add a default AS to 
`DataLayout`, please focus on that specifically and call it something like a 
"preferred constant address space" rather than some sort of default AS.  The 
default AS for pointer types and so on is really a frontend issue that's none 
of LLVM's business to know about.
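
If LLVM did grow such a "preferred constant address space", the only consumer 
would be code along these lines.  This is a hedged sketch that uses the existing 
default-globals hook as a stand-in; the name and linkage are illustrative, not an 
actual LLVM API proposal.

#include "llvm/IR/Constants.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Module.h"

static llvm::GlobalVariable *materializeConstantGlobal(llvm::Module &M,
                                                       llvm::Constant *Init) {
  // A transform inventing a new constant must pick *some* address space; today
  // the closest existing knob is the DataLayout's default globals address space.
  unsigned AS = M.getDataLayout().getDefaultGlobalsAddressSpace();
  auto *GV = new llvm::GlobalVariable(M, Init->getType(), /*isConstant=*/true,
                                      llvm::GlobalValue::PrivateLinkage, Init,
                                      ".materialized.const", /*InsertBefore=*/nullptr,
                                      llvm::GlobalValue::NotThreadLocal, AS);
  GV->setUnnamedAddr(llvm::GlobalValue::UnnamedAddr::Global);
  return GV;
}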

https://github.com/llvm/llvm-project/pull/88182


[clang] [PAC][clang] Define `PointerAuthQualifier` and `PointerAuthenticationMode` (PR #84384)

2024-04-08 Thread John McCall via cfe-commits

rjmccall wrote:

It'll be tested in a follow-up commit which actually adds parsing and Sema 
support for these as qualifiers.  That should be sufficient rather than adding 
a whole unit-test apparatus.  Daniil is just trying to avoid landing this in a 
single massive commit.

https://github.com/llvm/llvm-project/pull/84384


[clang] [clang] Fix bitfield access unit for vbase corner case (PR #87238)

2024-04-01 Thread John McCall via cfe-commits

https://github.com/rjmccall edited 
https://github.com/llvm/llvm-project/pull/87238


[clang] [clang] Fix bitfield access unit for vbase corner case (PR #87238)

2024-04-01 Thread John McCall via cfe-commits

https://github.com/rjmccall approved this pull request.

Doing it in one commit this time seems fine.

https://github.com/llvm/llvm-project/pull/87238


[clang] Resolve FIXME: Look at E, not M (PR #85541)

2024-03-29 Thread John McCall via cfe-commits

rjmccall wrote:

Hmm.  The best solution there is probably to use a consistent representation 
but introduce some sort of `GLValueThatTheStandardAbsurdlyPretendsIsAnRValue` 
(definitely the only and best word for this, ship it) that we can use as the 
value category.  IIRC, something similar happens in C where the standard 
pretends that function l-values don't exist.

https://github.com/llvm/llvm-project/pull/85541


[clang] Resolve FIXME: Look at E, not M (PR #85541)

2024-03-28 Thread John McCall via cfe-commits

rjmccall wrote:

Hmm.  Well, you'd have to ask @zygoloid what he was thinking in 
r183721/r183859, but I don't think it's correct in general to be looking at E 
(the MTE sub-expression) for most of the logic here.  IIRC, the MTE 
sub-expression is essentially the initializer for the materialized temporary, 
so it should be a pr-value, which means that for ARC (and any similar features 
in the future that we might add with qualifier-specific ownership) it won't 
have the right type information to set up the destructor for the temporary.  
The MTE is going to tell us the type of the I assume that's why the proposed 
patch in this PR ends up ignoring the half of the function that handles the ARC 
qualifiers.  Maybe Richard had some better representation in mind, though.

https://github.com/llvm/llvm-project/pull/85541


[clang] Resolve FIXME: Look at E, not M (PR #85541)

2024-03-28 Thread John McCall via cfe-commits

rjmccall wrote:

Well, sometimes we do leave FIXMEs that end up being surprisingly easy to fix.  
But what FIXME are we talking about here?  The patch doesn't remove any FIXMEs, 
or for that matter include any tests.

https://github.com/llvm/llvm-project/pull/85541


[clang] [clang] Factor out OpenACC part of `Sema` (PR #84184)

2024-03-25 Thread John McCall via cfe-commits


@@ -0,0 +1,74 @@
+//===- SemaOpenACC.h - Semantic Analysis for OpenACC constructs ----------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+/// \file
+/// This file declares semantic analysis for OpenACC constructs and
+/// clauses.
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_SEMA_SEMAOPENACC_H
+#define LLVM_CLANG_SEMA_SEMAOPENACC_H
+
+#include "clang/AST/DeclGroup.h"
+#include "clang/Basic/OpenACCKinds.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Sema/Ownership.h"
+
+namespace clang {
+
+class ASTContext;
+class DiagnosticEngine;
+class LangOptions;
+class Sema;
+
+class SemaOpenACC {
+public:
+  SemaOpenACC(Sema &S);
+
+  ASTContext &getASTContext() const;
+  DiagnosticsEngine &getDiagnostics() const;
+  const LangOptions &getLangOpts() const;
+
+  Sema &SemaRef;

rjmccall wrote:

I don't think it's over-generalization to go ahead and extract these basic 
accessors into a common base class.  Just make sure you keep it to these common 
accessors.

https://github.com/llvm/llvm-project/pull/84184


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-22 Thread John McCall via cfe-commits

https://github.com/rjmccall approved this pull request.

Okay.  Thanks for putting up with such a protracted review!  I really like 
where we've ended up.

LGTM.  Please wait a few days before merging to give other reviewers a chance 
to give final feedback.

https://github.com/llvm/llvm-project/pull/65742


[clang] [clang][AST][NFC] Add '[[fallthrough]]' to cases fall through (PR #85921)

2024-03-21 Thread John McCall via cfe-commits

rjmccall wrote:

Yeah, I don't think we need this change.  The comment is a little unnecessary, 
but it's not a big deal.  Really, the code ought to be using a visitor instead 
of a `switch`.

https://github.com/llvm/llvm-project/pull/85921


[clang] Rebase swiftasynccall's musttail support onto the [[clang::musttail]] logic (PR #86011)

2024-03-20 Thread John McCall via cfe-commits

https://github.com/rjmccall closed 
https://github.com/llvm/llvm-project/pull/86011


[clang] Rebase swiftasynccall's musttail support onto the [[clang::musttail]] logic (PR #86011)

2024-03-20 Thread John McCall via cfe-commits

https://github.com/rjmccall created 
https://github.com/llvm/llvm-project/pull/86011

The old logic expects the call to be the last thing we emitted, and since it 
kicks in before we emit cleanups, and since `swiftasynccall` functions always 
return void, that's likely to be true.  "Likely" isn't very reassuring when 
we're talking about slapping attributes on random calls, though.  And indeed, 
while I can't find any way to break the logic directly in current main, our 
previous (ongoing?) experiments with shortening argument temporary lifetimes 
definitely broke it wide open.  So while this commit is prophylactic for now, 
it's clearly the right thing to do, and it can be cherry-picked to other branches 
to fix problems.
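
As a hedged example of the pattern the new path handles (assumed code, not the 
patch's actual test file): a swiftasynccall function returning a call to another 
swiftasynccall function should now go through the [[clang::musttail]] logic and 
lower to `musttail call ...; ret void`.

#define SWIFTASYNCCALL __attribute__((swiftasynccall))
#define ASYNC_CONTEXT __attribute__((swift_async_context))

SWIFTASYNCCALL void async_leaf(char *ASYNC_CONTEXT ctx);

SWIFTASYNCCALL void async_caller(char *ASYNC_CONTEXT ctx) {
  return async_leaf(ctx);   // expected to become a musttail call under swifttailcc
}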

>From 22ee234b13d5adec71627e9df8912b551bbb0b8e Mon Sep 17 00:00:00 2001
From: John McCall 
Date: Wed, 20 Mar 2024 17:04:28 -0400
Subject: [PATCH] Rebase swiftasynccall's musttail support onto the
 [[clang::musttail]] logic

The old logic expects the call to be the last thing we emitted, and since
it kicks in before we emit cleanups, and since swiftasynccall functions
always return void, that's likely to be true.  "Likely" isn't very
reassuring when we're talking about slapping attributes on random calls,
though.  And indeed, while I can't find any way to break the logic directly
in current main, our previous (ongoing?) experiments with shortening
argument temporary lifetimes definitely broke it wide open.  So while
this commit is prophylactic for now, it's clearly the right thing to do,
and it can be cherry-picked to other branches to fix problems.
---
 clang/lib/CodeGen/CGStmt.cpp   | 33 --
 clang/test/CodeGen/swift-async-call-conv.c | 16 +++
 2 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/clang/lib/CodeGen/CGStmt.cpp b/clang/lib/CodeGen/CGStmt.cpp
index 8898e3f22a7df6..cb5a004e4f4a6c 100644
--- a/clang/lib/CodeGen/CGStmt.cpp
+++ b/clang/lib/CodeGen/CGStmt.cpp
@@ -1341,10 +1341,8 @@ struct SaveRetExprRAII {
 };
 } // namespace
 
-/// If we have 'return f(...);', where both caller and callee are SwiftAsync,
-/// codegen it as 'tail call ...; ret void;'.
-static void makeTailCallIfSwiftAsync(const CallExpr *CE, CGBuilderTy &Builder,
-                                     const CGFunctionInfo *CurFnInfo) {
+/// Determine if the given call uses the swiftasync calling convention.
+static bool isSwiftAsyncCallee(const CallExpr *CE) {
   auto calleeQualType = CE->getCallee()->getType();
   const FunctionType *calleeType = nullptr;
   if (calleeQualType->isFunctionPointerType() ||
@@ -1359,18 +1357,12 @@ static void makeTailCallIfSwiftAsync(const CallExpr *CE, CGBuilderTy &Builder,
   // getMethodDecl() doesn't handle member pointers at the moment.
      calleeType = methodDecl->getType()->castAs<FunctionType>();
 } else {
-  return;
+  return false;
 }
   } else {
-return;
-  }
-  if (calleeType->getCallConv() == CallingConv::CC_SwiftAsync &&
-      (CurFnInfo->getASTCallingConvention() == CallingConv::CC_SwiftAsync)) {
-    auto CI = cast<llvm::CallInst>(&Builder.GetInsertBlock()->back());
-    CI->setTailCallKind(llvm::CallInst::TCK_MustTail);
-    Builder.CreateRetVoid();
-    Builder.ClearInsertionPoint();
+return false;
   }
+  return calleeType->getCallConv() == CallingConv::CC_SwiftAsync;
 }
 
 /// EmitReturnStmt - Note that due to GCC extensions, this can have an operand
@@ -1410,6 +1402,19 @@ void CodeGenFunction::EmitReturnStmt(const ReturnStmt &S) {
   RunCleanupsScope cleanupScope(*this);
   if (const auto *EWC = dyn_cast_or_null<ExprWithCleanups>(RV))
     RV = EWC->getSubExpr();
+
+  // If we're in a swiftasynccall function, and the return expression is a
+  // call to a swiftasynccall function, mark the call as the musttail call.
+  std::optional<llvm::SaveAndRestore<const CallExpr *>> SaveMustTail;
+  if (RV && CurFnInfo &&
+  CurFnInfo->getASTCallingConvention() == CallingConv::CC_SwiftAsync) {
+    if (auto CE = dyn_cast<CallExpr>(RV)) {
+  if (isSwiftAsyncCallee(CE)) {
+SaveMustTail.emplace(MustTailCall, CE);
+  }
+}
+  }
+
   // FIXME: Clean this up by using an LValue for ReturnTemp,
   // EmitStoreThroughLValue, and EmitAnyExpr.
   // Check if the NRVO candidate was not globalized in OpenMP mode.
@@ -1432,8 +1437,6 @@ void CodeGenFunction::EmitReturnStmt(const ReturnStmt &S) {
 // for side effects.
 if (RV) {
   EmitAnyExpr(RV);
-      if (auto *CE = dyn_cast<CallExpr>(RV))
-        makeTailCallIfSwiftAsync(CE, Builder, CurFnInfo);
 }
   } else if (!RV) {
 // Do nothing (return value is left uninitialized)
diff --git a/clang/test/CodeGen/swift-async-call-conv.c b/clang/test/CodeGen/swift-async-call-conv.c
index ce32c22fe80984..39511698bbae9d 100644
--- a/clang/test/CodeGen/swift-async-call-conv.c
+++ b/clang/test/CodeGen/swift-async-call-conv.c
@@ -182,3 +182,19 @@ SWIFTASYNCCALL void async_struct_field_and_methods(int i, S , S *sptr) {
 // CPPONLY-LABEL: define{{.*}} swifttailcc void @{{.*}}async_nonleaf_method2
 // CPPONLY: musttail call swifttailcc void @{{.*}}async_leaf_method
 #endif
+

[clang] [clang] Set correct FPOptions if attribute 'optnone' presents (PR #85605)

2024-03-20 Thread John McCall via cfe-commits


@@ -1495,6 +1495,10 @@ Decl *Parser::ParseFunctionDefinition(ParsingDeclarator &D,
 return Actions.ActOnFinishFunctionBody(Res, nullptr, false);
   }
 
+  // Some function attributes (like OptimizeNoneAttr) affect FP options.
+  Sema::FPFeaturesStateRAII SaveFPFeatures(Actions);
+  Actions.applyFunctionAttributesBeforeParsingBody(Res);

rjmccall wrote:

Hmm.  There are actually a lot of paths that call into 
`ParseFunctionStatementBody` like this.  Rather than changing all of them, can 
we just do this in `ActOnStartOfFunctionDef` / `ActOnFinishFunctionBody`?  We 
do a lot of similar context push/pop operations there already.

Please test an inline C++ method (which is delayed and uses a different path) 
and ObjC method parsing (which also uses a different path).
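
For context, the save/restore shape being moved around is roughly the following 
(a generic sketch with assumed names; the real class is Sema::FPFeaturesStateRAII):

#include <utility>

template <typename State> class SaveStateRAII {
  State &Live;    // the in-scope state, e.g. the current FPOptions
  State Saved;    // snapshot taken on entry
public:
  explicit SaveStateRAII(State &S) : Live(S), Saved(S) {}
  ~SaveStateRAII() { Live = std::move(Saved); }   // restored however parsing exits
};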

https://github.com/llvm/llvm-project/pull/85605


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-20 Thread John McCall via cfe-commits


@@ -439,82 +444,247 @@ CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
   Members.push_back(MemberInfo(bitsToCharUnits(StartBitOffset),
MemberInfo::Field, nullptr, *Field));
 }
-return;
+return Field;
   }
 
-  // Check if OffsetInRecord (the size in bits of the current run) is better
-  // as a single field run. When OffsetInRecord has legal integer width, and
-  // its bitfield offset is naturally aligned, it is better to make the
-  // bitfield a separate storage component so as it can be accessed directly
-  // with lower cost.
-  auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord,
-  uint64_t StartBitOffset) {
-if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses)
-  return false;
-if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) ||
-!DataLayout.fitsInLegalInteger(OffsetInRecord))
-  return false;
-// Make sure StartBitOffset is naturally aligned if it is treated as an
-// IType integer.
-if (StartBitOffset %
-Context.toBits(getAlignment(getIntNType(OffsetInRecord))) !=
-0)
-  return false;
-return true;
-  };
+  // The SysV ABI can overlap bitfield storage units with both other bitfield
+  // storage units /and/ other non-bitfield data members. Accessing a sequence
+  // of bitfields mustn't interfere with adjacent non-bitfields -- they're
+  // permitted to be accessed in separate threads for instance.
+
+  // We split runs of bit-fields into a sequence of "access units". When we emit
+  // a load or store of a bit-field, we'll load/store the entire containing
+  // access unit. As mentioned, the standard requires that these loads and
+  // stores must not interfere with accesses to other memory locations, and it
+  // defines the bit-field's memory location as the current run of
+  // non-zero-width bit-fields. So an access unit must never overlap with
+  // non-bit-field storage or cross a zero-width bit-field. Otherwise, we're
+  // free to draw the lines as we see fit.
+
+  // Drawing these lines well can be complicated. LLVM generally can't modify a
+  // program to access memory that it didn't before, so using very narrow access
+  // units can prevent the compiler from using optimal access patterns. For
+  // example, suppose a run of bit-fields occupies four bytes in a struct. If we
+  // split that into four 1-byte access units, then a sequence of assignments
+  // that doesn't touch all four bytes may have to be emitted with multiple
+  // 8-bit stores instead of a single 32-bit store. On the other hand, if we use
+  // very wide access units, we may find ourselves emitting accesses to
+  // bit-fields we didn't really need to touch, just because LLVM was unable to
+  // clean up after us.
+
+  // It is desirable to have access units be aligned powers of 2 no larger than
+  // a register. (On non-strict alignment ISAs, the alignment requirement can be
+  // dropped.) A three byte access unit will be accessed using 2-byte and 1-byte
+  // accesses and bit manipulation. If no bitfield straddles across the two
+  // separate accesses, it is better to have separate 2-byte and 1-byte access
+  // units, as then LLVM will not generate unnecessary memory accesses, or bit
+  // manipulation. Similarly, on a strict-alignment architecture, it is better
+  // to keep access-units naturally aligned, to avoid similar bit
+  // manipulation synthesizing larger unaligned accesses.
+
+  // Bitfields that share parts of a single byte are, of necessity, placed in
+  // the same access unit. That unit will encompass a consecutive run where
+  // adjacent bitfields share parts of a byte. (The first bitfield of such an
+  // access unit will start at the beginning of a byte.)
+
+  // We then try and accumulate adjacent access units when the combined unit is
+  // naturally sized, no larger than a register, and (on a strict alignment
+  // ISA), naturally aligned. Note that this requires lookahead to one or more
+  // subsequent access units. For instance, consider a 2-byte access-unit
+  // followed by 2 1-byte units. We can merge that into a 4-byte access-unit,
+  // but we would not want to merge a 2-byte followed by a single 1-byte (and no
+  // available tail padding). We keep track of the best access unit seen so far,
+  // and use that when we determine we cannot accumulate any more. Then we start
+  // again at the bitfield following that best one.
+
+  // The accumulation is also prevented when:
+  // *) it would cross a character-aligned zero-width bitfield, or
+  // *) fine-grained bitfield access option is in effect.
+
+  CharUnits RegSize =
+  bitsToCharUnits(Context.getTargetInfo().getRegisterWidth());
+  unsigned CharBits = Context.getCharWidth();
+
+  // Data about the start of the span we're accumulating to create an access
+  // unit from. Begin is the first 

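To make the policy in that comment concrete, here is a greatly simplified, 
self-contained sketch (an assumed model, not the real CGRecordLowering code; it 
omits the "best so far" backtracking and the barrier cases): bit-fields that 
share a byte are fused, then the unit is grown greedily while it stays a 
naturally aligned power of two no larger than a register.

#include <cstdint>
#include <utility>
#include <vector>

struct Span { uint64_t StartBit, WidthBits; };   // one non-zero-width bit-field

static bool isGoodUnit(uint64_t StartByte, uint64_t SizeBytes, uint64_t RegBytes) {
  return SizeBytes != 0 && SizeBytes <= RegBytes &&
         (SizeBytes & (SizeBytes - 1)) == 0 &&   // power of two
         StartByte % SizeBytes == 0;             // naturally aligned
}

// Returns (byte offset, byte size) access units for a run of adjacent bit-fields,
// assuming Run is sorted by StartBit.
std::vector<std::pair<uint64_t, uint64_t>>
accumulateAccessUnits(const std::vector<Span> &Run, uint64_t RegBytes) {
  std::vector<std::pair<uint64_t, uint64_t>> Units;
  size_t I = 0;
  while (I < Run.size()) {
    uint64_t BeginByte = Run[I].StartBit / 8;
    uint64_t EndByte = (Run[I].StartBit + Run[I].WidthBits + 7) / 8;
    size_t J = I + 1;
    while (J < Run.size()) {
      uint64_t NextEnd = (Run[J].StartBit + Run[J].WidthBits + 7) / 8;
      bool SharesByte = Run[J].StartBit / 8 < EndByte;      // must be fused
      bool StillGood = isGoodUnit(BeginByte, NextEnd - BeginByte, RegBytes);
      if (!SharesByte && !StillGood)
        break;                                              // stop growing this unit
      EndByte = NextEnd;
      ++J;
    }
    Units.emplace_back(BeginByte, EndByte - BeginByte);
    I = J;
  }
  return Units;
}
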
[clang] [clang] Better bitfield access units (PR #65742)

2024-03-20 Thread John McCall via cfe-commits


@@ -439,82 +444,247 @@ 
CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
   Members.push_back(MemberInfo(bitsToCharUnits(StartBitOffset),
MemberInfo::Field, nullptr, *Field));
 }
-return;
+return Field;
   }
 
-  // Check if OffsetInRecord (the size in bits of the current run) is better
-  // as a single field run. When OffsetInRecord has legal integer width, and
-  // its bitfield offset is naturally aligned, it is better to make the
-  // bitfield a separate storage component so as it can be accessed directly
-  // with lower cost.
-  auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord,
-  uint64_t StartBitOffset) {
-if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses)
-  return false;
-if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) ||
-!DataLayout.fitsInLegalInteger(OffsetInRecord))
-  return false;
-// Make sure StartBitOffset is naturally aligned if it is treated as an
-// IType integer.
-if (StartBitOffset %
-Context.toBits(getAlignment(getIntNType(OffsetInRecord))) !=
-0)
-  return false;
-return true;
-  };
+  // The SysV ABI can overlap bitfield storage units with both other bitfield
+  // storage units /and/ other non-bitfield data members. Accessing a sequence
+  // of bitfields mustn't interfere with adjacent non-bitfields -- they're
+  // permitted to be accessed in separate threads for instance.
+
+  // We split runs of bit-fields into a sequence of "access units". When we 
emit
+  // a load or store of a bit-field, we'll load/store the entire containing
+  // access unit. As mentioned, the standard requires that these loads and
+  // stores must not interfere with accesses to other memory locations, and it
+  // defines the bit-field's memory location as the current run of
+  // non-zero-width bit-fields. So an access unit must never overlap with
+  // non-bit-field storage or cross a zero-width bit-field. Otherwise, we're
+  // free to draw the lines as we see fit.
+
+  // Drawing these lines well can be complicated. LLVM generally can't modify a
+  // program to access memory that it didn't before, so using very narrow 
access
+  // units can prevent the compiler from using optimal access patterns. For
+  // example, suppose a run of bit-fields occupies four bytes in a struct. If 
we
+  // split that into four 1-byte access units, then a sequence of assignments
+  // that doesn't touch all four bytes may have to be emitted with multiple
+  // 8-bit stores instead of a single 32-bit store. On the other hand, if we 
use
+  // very wide access units, we may find ourselves emitting accesses to
+  // bit-fields we didn't really need to touch, just because LLVM was unable to
+  // clean up after us.
+
+  // It is desirable to have access units be aligned powers of 2 no larger than
+  // a register. (On non-strict alignment ISAs, the alignment requirement can 
be
+  // dropped.) A three byte access unit will be accessed using 2-byte and 
1-byte
+  // accesses and bit manipulation. If no bitfield straddles across the two
+  // separate accesses, it is better to have separate 2-byte and 1-byte access
+  // units, as then LLVM will not generate unnecessary memory accesses, or bit
+  // manipulation. Similarly, on a strict-alignment architecture, it is better
+  // to keep access-units naturally aligned, to avoid similar bit
+  // manipulation synthesizing larger unaligned accesses.
+
+  // Bitfields that share parts of a single byte are, of necessity, placed in
+  // the same access unit. That unit will encompass a consecutive run where
+  // adjacent bitfields share parts of a byte. (The first bitfield of such an
+  // access unit will start at the beginning of a byte.)
+
+  // We then try and accumulate adjacent access units when the combined unit is
+  // naturally sized, no larger than a register, and (on a strict alignment
+  // ISA), naturally aligned. Note that this requires lookahead to one or more
+  // subsequent access units. For instance, consider a 2-byte access-unit
+  // followed by 2 1-byte units. We can merge that into a 4-byte access-unit,
+  // but we would not want to merge a 2-byte followed by a single 1-byte (and 
no
+  // available tail padding). We keep track of the best access unit seen so 
far,
+  // and use that when we determine we cannot accumulate any more. Then we 
start
+  // again at the bitfield following that best one.
+
+  // The accumulation is also prevented when:
+  // *) it would cross a character-aigned zero-width bitfield, or
+  // *) fine-grained bitfield access option is in effect.
+
+  CharUnits RegSize =
+  bitsToCharUnits(Context.getTargetInfo().getRegisterWidth());
+  unsigned CharBits = Context.getCharWidth();
+
+  // Data about the start of the span we're accumulating to create an access
+  // unit from. Begin is the first 

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-20 Thread John McCall via cfe-commits


@@ -439,82 +444,247 @@ 
CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
   Members.push_back(MemberInfo(bitsToCharUnits(StartBitOffset),
MemberInfo::Field, nullptr, *Field));
 }
-return;
+return Field;
   }
 
-  // Check if OffsetInRecord (the size in bits of the current run) is better
-  // as a single field run. When OffsetInRecord has legal integer width, and
-  // its bitfield offset is naturally aligned, it is better to make the
-  // bitfield a separate storage component so as it can be accessed directly
-  // with lower cost.
-  auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord,
-  uint64_t StartBitOffset) {
-if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses)
-  return false;
-if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) ||
-!DataLayout.fitsInLegalInteger(OffsetInRecord))
-  return false;
-// Make sure StartBitOffset is naturally aligned if it is treated as an
-// IType integer.
-if (StartBitOffset %
-Context.toBits(getAlignment(getIntNType(OffsetInRecord))) !=
-0)
-  return false;
-return true;
-  };
+  // The SysV ABI can overlap bitfield storage units with both other bitfield
+  // storage units /and/ other non-bitfield data members. Accessing a sequence
+  // of bitfields mustn't interfere with adjacent non-bitfields -- they're
+  // permitted to be accessed in separate threads for instance.
+
+  // We split runs of bit-fields into a sequence of "access units". When we 
emit
+  // a load or store of a bit-field, we'll load/store the entire containing
+  // access unit. As mentioned, the standard requires that these loads and
+  // stores must not interfere with accesses to other memory locations, and it
+  // defines the bit-field's memory location as the current run of
+  // non-zero-width bit-fields. So an access unit must never overlap with
+  // non-bit-field storage or cross a zero-width bit-field. Otherwise, we're
+  // free to draw the lines as we see fit.
+
+  // Drawing these lines well can be complicated. LLVM generally can't modify a
+  // program to access memory that it didn't before, so using very narrow 
access
+  // units can prevent the compiler from using optimal access patterns. For
+  // example, suppose a run of bit-fields occupies four bytes in a struct. If 
we
+  // split that into four 1-byte access units, then a sequence of assignments
+  // that doesn't touch all four bytes may have to be emitted with multiple
+  // 8-bit stores instead of a single 32-bit store. On the other hand, if we 
use
+  // very wide access units, we may find ourselves emitting accesses to
+  // bit-fields we didn't really need to touch, just because LLVM was unable to
+  // clean up after us.
+
+  // It is desirable to have access units be aligned powers of 2 no larger than
+  // a register. (On non-strict alignment ISAs, the alignment requirement can 
be
+  // dropped.) A three byte access unit will be accessed using 2-byte and 
1-byte
+  // accesses and bit manipulation. If no bitfield straddles across the two
+  // separate accesses, it is better to have separate 2-byte and 1-byte access
+  // units, as then LLVM will not generate unnecessary memory accesses, or bit
+  // manipulation. Similarly, on a strict-alignment architecture, it is better
+  // to keep access-units naturally aligned, to avoid similar bit
+  // manipulation synthesizing larger unaligned accesses.
+
+  // Bitfields that share parts of a single byte are, of necessity, placed in
+  // the same access unit. That unit will encompass a consecutive run where
+  // adjacent bitfields share parts of a byte. (The first bitfield of such an
+  // access unit will start at the beginning of a byte.)
+
+  // We then try and accumulate adjacent access units when the combined unit is
+  // naturally sized, no larger than a register, and (on a strict alignment
+  // ISA), naturally aligned. Note that this requires lookahead to one or more
+  // subsequent access units. For instance, consider a 2-byte access-unit
+  // followed by 2 1-byte units. We can merge that into a 4-byte access-unit,
+  // but we would not want to merge a 2-byte followed by a single 1-byte (and 
no
+  // available tail padding). We keep track of the best access unit seen so 
far,
+  // and use that when we determine we cannot accumulate any more. Then we 
start
+  // again at the bitfield following that best one.
+
+  // The accumulation is also prevented when:
+  // *) it would cross a character-aigned zero-width bitfield, or
+  // *) fine-grained bitfield access option is in effect.
+
+  CharUnits RegSize =
+  bitsToCharUnits(Context.getTargetInfo().getRegisterWidth());
+  unsigned CharBits = Context.getCharWidth();
+
+  // Data about the start of the span we're accumulating to create an access
+  // unit from. Begin is the first 

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-20 Thread John McCall via cfe-commits


@@ -439,82 +444,247 @@ 
CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
   Members.push_back(MemberInfo(bitsToCharUnits(StartBitOffset),
MemberInfo::Field, nullptr, *Field));
 }
-return;
+return Field;
   }
 
-  // Check if OffsetInRecord (the size in bits of the current run) is better
-  // as a single field run. When OffsetInRecord has legal integer width, and
-  // its bitfield offset is naturally aligned, it is better to make the
-  // bitfield a separate storage component so as it can be accessed directly
-  // with lower cost.
-  auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord,
-  uint64_t StartBitOffset) {
-if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses)
-  return false;
-if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) ||
-!DataLayout.fitsInLegalInteger(OffsetInRecord))
-  return false;
-// Make sure StartBitOffset is naturally aligned if it is treated as an
-// IType integer.
-if (StartBitOffset %
-Context.toBits(getAlignment(getIntNType(OffsetInRecord))) !=
-0)
-  return false;
-return true;
-  };
+  // The SysV ABI can overlap bitfield storage units with both other bitfield
+  // storage units /and/ other non-bitfield data members. Accessing a sequence
+  // of bitfields mustn't interfere with adjacent non-bitfields -- they're
+  // permitted to be accessed in separate threads for instance.
+
+  // We split runs of bit-fields into a sequence of "access units". When we 
emit
+  // a load or store of a bit-field, we'll load/store the entire containing
+  // access unit. As mentioned, the standard requires that these loads and
+  // stores must not interfere with accesses to other memory locations, and it
+  // defines the bit-field's memory location as the current run of
+  // non-zero-width bit-fields. So an access unit must never overlap with
+  // non-bit-field storage or cross a zero-width bit-field. Otherwise, we're
+  // free to draw the lines as we see fit.
+
+  // Drawing these lines well can be complicated. LLVM generally can't modify a
+  // program to access memory that it didn't before, so using very narrow 
access
+  // units can prevent the compiler from using optimal access patterns. For
+  // example, suppose a run of bit-fields occupies four bytes in a struct. If 
we
+  // split that into four 1-byte access units, then a sequence of assignments
+  // that doesn't touch all four bytes may have to be emitted with multiple
+  // 8-bit stores instead of a single 32-bit store. On the other hand, if we 
use
+  // very wide access units, we may find ourselves emitting accesses to
+  // bit-fields we didn't really need to touch, just because LLVM was unable to
+  // clean up after us.
+
+  // It is desirable to have access units be aligned powers of 2 no larger than
+  // a register. (On non-strict alignment ISAs, the alignment requirement can 
be
+  // dropped.) A three byte access unit will be accessed using 2-byte and 
1-byte
+  // accesses and bit manipulation. If no bitfield straddles across the two
+  // separate accesses, it is better to have separate 2-byte and 1-byte access
+  // units, as then LLVM will not generate unnecessary memory accesses, or bit
+  // manipulation. Similarly, on a strict-alignment architecture, it is better
+  // to keep access-units naturally aligned, to avoid similar bit
+  // manipulation synthesizing larger unaligned accesses.
+
+  // Bitfields that share parts of a single byte are, of necessity, placed in
+  // the same access unit. That unit will encompass a consecutive run where
+  // adjacent bitfields share parts of a byte. (The first bitfield of such an
+  // access unit will start at the beginning of a byte.)
+
+  // We then try and accumulate adjacent access units when the combined unit is
+  // naturally sized, no larger than a register, and (on a strict alignment
+  // ISA), naturally aligned. Note that this requires lookahead to one or more
+  // subsequent access units. For instance, consider a 2-byte access-unit
+  // followed by 2 1-byte units. We can merge that into a 4-byte access-unit,
+  // but we would not want to merge a 2-byte followed by a single 1-byte (and 
no
+  // available tail padding). We keep track of the best access unit seen so 
far,
+  // and use that when we determine we cannot accumulate any more. Then we 
start
+  // again at the bitfield following that best one.
+
+  // The accumulation is also prevented when:
+  // *) it would cross a character-aigned zero-width bitfield, or
+  // *) fine-grained bitfield access option is in effect.
+
+  CharUnits RegSize =
+  bitsToCharUnits(Context.getTargetInfo().getRegisterWidth());
+  unsigned CharBits = Context.getCharWidth();
+
+  // Data about the start of the span we're accumulating to create an access
+  // unit from. Begin is the first 

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-20 Thread John McCall via cfe-commits


@@ -439,82 +444,247 @@ 
CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
   Members.push_back(MemberInfo(bitsToCharUnits(StartBitOffset),
MemberInfo::Field, nullptr, *Field));
 }
-return;
+return Field;
   }
 
-  // Check if OffsetInRecord (the size in bits of the current run) is better
-  // as a single field run. When OffsetInRecord has legal integer width, and
-  // its bitfield offset is naturally aligned, it is better to make the
-  // bitfield a separate storage component so as it can be accessed directly
-  // with lower cost.
-  auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord,
-  uint64_t StartBitOffset) {
-if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses)
-  return false;
-if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) ||
-!DataLayout.fitsInLegalInteger(OffsetInRecord))
-  return false;
-// Make sure StartBitOffset is naturally aligned if it is treated as an
-// IType integer.
-if (StartBitOffset %
-Context.toBits(getAlignment(getIntNType(OffsetInRecord))) !=
-0)
-  return false;
-return true;
-  };
+  // The SysV ABI can overlap bitfield storage units with both other bitfield
+  // storage units /and/ other non-bitfield data members. Accessing a sequence
+  // of bitfields mustn't interfere with adjacent non-bitfields -- they're
+  // permitted to be accessed in separate threads for instance.
+
+  // We split runs of bit-fields into a sequence of "access units". When we 
emit
+  // a load or store of a bit-field, we'll load/store the entire containing
+  // access unit. As mentioned, the standard requires that these loads and
+  // stores must not interfere with accesses to other memory locations, and it
+  // defines the bit-field's memory location as the current run of
+  // non-zero-width bit-fields. So an access unit must never overlap with
+  // non-bit-field storage or cross a zero-width bit-field. Otherwise, we're
+  // free to draw the lines as we see fit.
+
+  // Drawing these lines well can be complicated. LLVM generally can't modify a
+  // program to access memory that it didn't before, so using very narrow 
access
+  // units can prevent the compiler from using optimal access patterns. For
+  // example, suppose a run of bit-fields occupies four bytes in a struct. If 
we
+  // split that into four 1-byte access units, then a sequence of assignments
+  // that doesn't touch all four bytes may have to be emitted with multiple
+  // 8-bit stores instead of a single 32-bit store. On the other hand, if we 
use
+  // very wide access units, we may find ourselves emitting accesses to
+  // bit-fields we didn't really need to touch, just because LLVM was unable to
+  // clean up after us.
+
+  // It is desirable to have access units be aligned powers of 2 no larger than
+  // a register. (On non-strict alignment ISAs, the alignment requirement can 
be
+  // dropped.) A three byte access unit will be accessed using 2-byte and 
1-byte
+  // accesses and bit manipulation. If no bitfield straddles across the two
+  // separate accesses, it is better to have separate 2-byte and 1-byte access
+  // units, as then LLVM will not generate unnecessary memory accesses, or bit
+  // manipulation. Similarly, on a strict-alignment architecture, it is better
+  // to keep access-units naturally aligned, to avoid similar bit
+  // manipulation synthesizing larger unaligned accesses.
+
+  // Bitfields that share parts of a single byte are, of necessity, placed in
+  // the same access unit. That unit will encompass a consecutive run where
+  // adjacent bitfields share parts of a byte. (The first bitfield of such an
+  // access unit will start at the beginning of a byte.)
+
+  // We then try and accumulate adjacent access units when the combined unit is
+  // naturally sized, no larger than a register, and (on a strict alignment
+  // ISA), naturally aligned. Note that this requires lookahead to one or more
+  // subsequent access units. For instance, consider a 2-byte access-unit
+  // followed by 2 1-byte units. We can merge that into a 4-byte access-unit,
+  // but we would not want to merge a 2-byte followed by a single 1-byte (and 
no
+  // available tail padding). We keep track of the best access unit seen so 
far,
+  // and use that when we determine we cannot accumulate any more. Then we 
start
+  // again at the bitfield following that best one.
+
+  // The accumulation is also prevented when:
+  // *) it would cross a character-aligned zero-width bitfield, or
+  // *) fine-grained bitfield access option is in effect.
+
+  CharUnits RegSize =
+  bitsToCharUnits(Context.getTargetInfo().getRegisterWidth());
+  unsigned CharBits = Context.getCharWidth();
+
+  // Data about the start of the span we're accumulating to create an access
+  // unit from. Begin is the first 
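
To make the access-unit grouping described above concrete, here is a small
illustrative C++ sketch (not from the patch; the field names and the assumption
of a 32-bit register width are made up):

  struct Example {
    unsigned a : 3; // bits 0-2 of byte 0
    unsigned b : 6; // bits 3-8: shares byte 0 with 'a' and byte 1 with 'c'
    unsigned c : 7; // bits 9-15: so a, b and c form one 2-byte access unit
    char sep;       // non-bit-field: the access unit must not overlap it
    unsigned d : 8; // byte 3: starts a fresh access unit after 'sep'
  };

Loading or storing any of 'a', 'b' or 'c' then touches the whole 2-byte unit,
while 'sep' and 'd' are never touched by those accesses.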

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-20 Thread John McCall via cfe-commits

https://github.com/rjmccall edited 
https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-20 Thread John McCall via cfe-commits


@@ -439,82 +444,247 @@ 
CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
   Members.push_back(MemberInfo(bitsToCharUnits(StartBitOffset),
MemberInfo::Field, nullptr, *Field));
 }
-return;
+return Field;
   }
 
-  // Check if OffsetInRecord (the size in bits of the current run) is better
-  // as a single field run. When OffsetInRecord has legal integer width, and
-  // its bitfield offset is naturally aligned, it is better to make the
-  // bitfield a separate storage component so as it can be accessed directly
-  // with lower cost.
-  auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord,
-  uint64_t StartBitOffset) {
-if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses)
-  return false;
-if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) ||
-!DataLayout.fitsInLegalInteger(OffsetInRecord))
-  return false;
-// Make sure StartBitOffset is naturally aligned if it is treated as an
-// IType integer.
-if (StartBitOffset %
-Context.toBits(getAlignment(getIntNType(OffsetInRecord))) !=
-0)
-  return false;
-return true;
-  };
+  // The SysV ABI can overlap bitfield storage units with both other bitfield
+  // storage units /and/ other non-bitfield data members. Accessing a sequence
+  // of bitfields mustn't interfere with adjacent non-bitfields -- they're
+  // permitted to be accessed in separate threads for instance.
+
+  // We split runs of bit-fields into a sequence of "access units". When we 
emit
+  // a load or store of a bit-field, we'll load/store the entire containing
+  // access unit. As mentioned, the standard requires that these loads and
+  // stores must not interfere with accesses to other memory locations, and it
+  // defines the bit-field's memory location as the current run of
+  // non-zero-width bit-fields. So an access unit must never overlap with
+  // non-bit-field storage or cross a zero-width bit-field. Otherwise, we're
+  // free to draw the lines as we see fit.
+
+  // Drawing these lines well can be complicated. LLVM generally can't modify a
+  // program to access memory that it didn't before, so using very narrow 
access
+  // units can prevent the compiler from using optimal access patterns. For
+  // example, suppose a run of bit-fields occupies four bytes in a struct. If 
we
+  // split that into four 1-byte access units, then a sequence of assignments
+  // that doesn't touch all four bytes may have to be emitted with multiple
+  // 8-bit stores instead of a single 32-bit store. On the other hand, if we 
use
+  // very wide access units, we may find ourselves emitting accesses to
+  // bit-fields we didn't really need to touch, just because LLVM was unable to
+  // clean up after us.
+
+  // It is desirable to have access units be aligned powers of 2 no larger than
+  // a register. (On non-strict alignment ISAs, the alignment requirement can 
be
+  // dropped.) A three byte access unit will be accessed using 2-byte and 
1-byte
+  // accesses and bit manipulation. If no bitfield straddles across the two
+  // separate accesses, it is better to have separate 2-byte and 1-byte access
+  // units, as then LLVM will not generate unnecessary memory accesses, or bit
+  // manipulation. Similarly, on a strict-alignment architecture, it is better
+  // to keep access-units naturally aligned, to avoid similar bit
+  // manipulation synthesizing larger unaligned accesses.
+
+  // Bitfields that share parts of a single byte are, of necessity, placed in
+  // the same access unit. That unit will encompass a consecutive run where
+  // adjacent bitfields share parts of a byte. (The first bitfield of such an
+  // access unit will start at the beginning of a byte.)
+
+  // We then try and accumulate adjacent access units when the combined unit is
+  // naturally sized, no larger than a register, and (on a strict alignment
+  // ISA), naturally aligned. Note that this requires lookahead to one or more
+  // subsequent access units. For instance, consider a 2-byte access-unit
+  // followed by 2 1-byte units. We can merge that into a 4-byte access-unit,
+  // but we would not want to merge a 2-byte followed by a single 1-byte (and 
no
+  // available tail padding). We keep track of the best access unit seen so 
far,
+  // and use that when we determine we cannot accumulate any more. Then we 
start
+  // again at the bitfield following that best one.
+
+  // The accumulation is also prevented when:
+  // *) it would cross a character-aligned zero-width bitfield, or
+  // *) fine-grained bitfield access option is in effect.
+
+  CharUnits RegSize =
+  bitsToCharUnits(Context.getTargetInfo().getRegisterWidth());
+  unsigned CharBits = Context.getCharWidth();
+
+  // Data about the start of the span we're accumulating to create an access
+  // unit from. Begin is the first 

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-20 Thread John McCall via cfe-commits


@@ -439,82 +444,247 @@ 
CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
   Members.push_back(MemberInfo(bitsToCharUnits(StartBitOffset),
MemberInfo::Field, nullptr, *Field));
 }
-return;
+return Field;
   }
 
-  // Check if OffsetInRecord (the size in bits of the current run) is better
-  // as a single field run. When OffsetInRecord has legal integer width, and
-  // its bitfield offset is naturally aligned, it is better to make the
-  // bitfield a separate storage component so as it can be accessed directly
-  // with lower cost.
-  auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord,
-  uint64_t StartBitOffset) {
-if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses)
-  return false;
-if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) ||
-!DataLayout.fitsInLegalInteger(OffsetInRecord))
-  return false;
-// Make sure StartBitOffset is naturally aligned if it is treated as an
-// IType integer.
-if (StartBitOffset %
-Context.toBits(getAlignment(getIntNType(OffsetInRecord))) !=
-0)
-  return false;
-return true;
-  };
+  // The SysV ABI can overlap bitfield storage units with both other bitfield
+  // storage units /and/ other non-bitfield data members. Accessing a sequence
+  // of bitfields mustn't interfere with adjacent non-bitfields -- they're
+  // permitted to be accessed in separate threads for instance.
+
+  // We split runs of bit-fields into a sequence of "access units". When we 
emit
+  // a load or store of a bit-field, we'll load/store the entire containing
+  // access unit. As mentioned, the standard requires that these loads and
+  // stores must not interfere with accesses to other memory locations, and it
+  // defines the bit-field's memory location as the current run of
+  // non-zero-width bit-fields. So an access unit must never overlap with
+  // non-bit-field storage or cross a zero-width bit-field. Otherwise, we're
+  // free to draw the lines as we see fit.
+
+  // Drawing these lines well can be complicated. LLVM generally can't modify a
+  // program to access memory that it didn't before, so using very narrow 
access
+  // units can prevent the compiler from using optimal access patterns. For
+  // example, suppose a run of bit-fields occupies four bytes in a struct. If 
we
+  // split that into four 1-byte access units, then a sequence of assignments
+  // that doesn't touch all four bytes may have to be emitted with multiple
+  // 8-bit stores instead of a single 32-bit store. On the other hand, if we 
use
+  // very wide access units, we may find ourselves emitting accesses to
+  // bit-fields we didn't really need to touch, just because LLVM was unable to
+  // clean up after us.
+
+  // It is desirable to have access units be aligned powers of 2 no larger than
+  // a register. (On non-strict alignment ISAs, the alignment requirement can 
be
+  // dropped.) A three byte access unit will be accessed using 2-byte and 
1-byte
+  // accesses and bit manipulation. If no bitfield straddles across the two
+  // separate accesses, it is better to have separate 2-byte and 1-byte access
+  // units, as then LLVM will not generate unnecessary memory accesses, or bit
+  // manipulation. Similarly, on a strict-alignment architecture, it is better
+  // to keep access-units naturally aligned, to avoid similar bit
+  // manipulation synthesizing larger unaligned accesses.
+
+  // Bitfields that share parts of a single byte are, of necessity, placed in
+  // the same access unit. That unit will encompass a consecutive run where
+  // adjacent bitfields share parts of a byte. (The first bitfield of such an
+  // access unit will start at the beginning of a byte.)
+
+  // We then try and accumulate adjacent access units when the combined unit is
+  // naturally sized, no larger than a register, and (on a strict alignment
+  // ISA), naturally aligned. Note that this requires lookahead to one or more
+  // subsequent access units. For instance, consider a 2-byte access-unit
+  // followed by 2 1-byte units. We can merge that into a 4-byte access-unit,
+  // but we would not want to merge a 2-byte followed by a single 1-byte (and 
no
+  // available tail padding). We keep track of the best access unit seen so 
far,
+  // and use that when we determine we cannot accumulate any more. Then we 
start
+  // again at the bitfield following that best one.
+
+  // The accumulation is also prevented when:
+  // *) it would cross a character-aligned zero-width bitfield, or
+  // *) fine-grained bitfield access option is in effect.
+
+  CharUnits RegSize =
+  bitsToCharUnits(Context.getTargetInfo().getRegisterWidth());
+  unsigned CharBits = Context.getCharWidth();
+
+  // Data about the start of the span we're accumulating to create an access
+  // unit from. Begin is the first 

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-20 Thread John McCall via cfe-commits

https://github.com/rjmccall commented:

This looks great, thanks!  I have a few editorial nits, and there's an 
assertion that seems off, but otherwise this looks ready to go.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-20 Thread John McCall via cfe-commits


@@ -439,82 +444,247 @@ 
CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
   Members.push_back(MemberInfo(bitsToCharUnits(StartBitOffset),
MemberInfo::Field, nullptr, *Field));
 }
-return;
+return Field;
   }
 
-  // Check if OffsetInRecord (the size in bits of the current run) is better
-  // as a single field run. When OffsetInRecord has legal integer width, and
-  // its bitfield offset is naturally aligned, it is better to make the
-  // bitfield a separate storage component so as it can be accessed directly
-  // with lower cost.
-  auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord,
-  uint64_t StartBitOffset) {
-if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses)
-  return false;
-if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) ||
-!DataLayout.fitsInLegalInteger(OffsetInRecord))
-  return false;
-// Make sure StartBitOffset is naturally aligned if it is treated as an
-// IType integer.
-if (StartBitOffset %
-Context.toBits(getAlignment(getIntNType(OffsetInRecord))) !=
-0)
-  return false;
-return true;
-  };
+  // The SysV ABI can overlap bitfield storage units with both other bitfield
+  // storage units /and/ other non-bitfield data members. Accessing a sequence
+  // of bitfields mustn't interfere with adjacent non-bitfields -- they're
+  // permitted to be accessed in separate threads for instance.
+
+  // We split runs of bit-fields into a sequence of "access units". When we 
emit
+  // a load or store of a bit-field, we'll load/store the entire containing
+  // access unit. As mentioned, the standard requires that these loads and
+  // stores must not interfere with accesses to other memory locations, and it
+  // defines the bit-field's memory location as the current run of
+  // non-zero-width bit-fields. So an access unit must never overlap with
+  // non-bit-field storage or cross a zero-width bit-field. Otherwise, we're
+  // free to draw the lines as we see fit.
+
+  // Drawing these lines well can be complicated. LLVM generally can't modify a
+  // program to access memory that it didn't before, so using very narrow 
access
+  // units can prevent the compiler from using optimal access patterns. For
+  // example, suppose a run of bit-fields occupies four bytes in a struct. If 
we
+  // split that into four 1-byte access units, then a sequence of assignments
+  // that doesn't touch all four bytes may have to be emitted with multiple
+  // 8-bit stores instead of a single 32-bit store. On the other hand, if we 
use
+  // very wide access units, we may find ourselves emitting accesses to
+  // bit-fields we didn't really need to touch, just because LLVM was unable to
+  // clean up after us.
+
+  // It is desirable to have access units be aligned powers of 2 no larger than
+  // a register. (On non-strict alignment ISAs, the alignment requirement can 
be
+  // dropped.) A three byte access unit will be accessed using 2-byte and 
1-byte
+  // accesses and bit manipulation. If no bitfield straddles across the two
+  // separate accesses, it is better to have separate 2-byte and 1-byte access
+  // units, as then LLVM will not generate unnecessary memory accesses, or bit
+  // manipulation. Similarly, on a strict-alignment architecture, it is better
+  // to keep access-units naturally aligned, to avoid similar bit
+  // manipulation synthesizing larger unaligned accesses.
+
+  // Bitfields that share parts of a single byte are, of necessity, placed in
+  // the same access unit. That unit will encompass a consecutive run where
+  // adjacent bitfields share parts of a byte. (The first bitfield of such an
+  // access unit will start at the beginning of a byte.)
+
+  // We then try and accumulate adjacent access units when the combined unit is
+  // naturally sized, no larger than a register, and (on a strict alignment
+  // ISA), naturally aligned. Note that this requires lookahead to one or more
+  // subsequent access units. For instance, consider a 2-byte access-unit
+  // followed by 2 1-byte units. We can merge that into a 4-byte access-unit,
+  // but we would not want to merge a 2-byte followed by a single 1-byte (and 
no
+  // available tail padding). We keep track of the best access unit seen so 
far,
+  // and use that when we determine we cannot accumulate any more. Then we 
start
+  // again at the bitfield following that best one.
+
+  // The accumulation is also prevented when:
+  // *) it would cross a character-aligned zero-width bitfield, or
+  // *) fine-grained bitfield access option is in effect.
+
+  CharUnits RegSize =
+  bitsToCharUnits(Context.getTargetInfo().getRegisterWidth());
+  unsigned CharBits = Context.getCharWidth();
+
+  // Data about the start of the span we're accumulating to create an access
+  // unit from. Begin is the first 
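
As a concrete illustration of the lookahead described above, consider this
sketch (assuming a strict-alignment target with 32-bit registers; the names are
made up):

  struct Merges {
    unsigned a : 16; // a 2-byte access unit...
    unsigned b : 8;  // ...followed by two 1-byte units; all three can be
    unsigned c : 8;  // merged into one naturally aligned 4-byte access unit
  };

  struct DoesNotMerge {
    unsigned a : 16; // 2-byte unit
    unsigned b : 8;  // lone 1-byte unit: merging would give a 3-byte unit, and
    char tail;       // widening that to 4 bytes would overlap 'tail', so the
  };                 // 2-byte and 1-byte units are kept separate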

[clang] [clang][NFC] Add documentation for `CastExpr::path()`. (PR #85623)

2024-03-19 Thread John McCall via cfe-commits

https://github.com/rjmccall approved this pull request.


https://github.com/llvm/llvm-project/pull/85623
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Set correct FPOptions if attribute 'optnone' presents (PR #85605)

2024-03-19 Thread John McCall via cfe-commits


@@ -2508,6 +2508,10 @@ Decl *Parser::ParseFunctionStatementBody(Decl *Decl, 
ParseScope &BodyScope) {
   Sema::PragmaStackSentinelRAII
 PragmaStackSentinel(Actions, "InternalPragmaState", IsCXXMethod);
 
+  // Some function attributes (like OptimizeNoneAttr) affect FP options.
+  Sema::FPFeaturesStateRAII SaveFPFeatures(Actions);
+  Actions.applyFunctionAttributesBeforeParsingBody(Decl);

rjmccall wrote:

The attributes should cover the entire function body, so you're also going to 
need to do this before parsing function-try-blocks and constructor 
initializers.  That probably just means pushing it a level up across the board.
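
For instance (an illustrative sketch, not taken from the patch), both the
mem-initializer and the function-try-block below should be parsed under the
attribute-adjusted FP options:

  struct T {
    double d;
    __attribute__((optnone)) T(double x) try : d(x * 2.0) {
      // constructor body
    } catch (...) {
      // for a constructor try-block the exception is implicitly rethrown
    }
  };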

https://github.com/llvm/llvm-project/pull/85605
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Set correct FPOptions if attribute 'optnone' presents (PR #85605)

2024-03-19 Thread John McCall via cfe-commits

https://github.com/rjmccall commented:

Alright.  I don't really have a problem with doing this, and this seems like 
basically the right approach.  You're applying the attributes in the wrong 
place, though.

https://github.com/llvm/llvm-project/pull/85605
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Set correct FPOptions if attribute 'optnone' presents (PR #85605)

2024-03-19 Thread John McCall via cfe-commits

https://github.com/rjmccall edited 
https://github.com/llvm/llvm-project/pull/85605
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][NFC] Add documentation for `CastExpr::path()`. (PR #85623)

2024-03-18 Thread John McCall via cfe-commits


@@ -3552,6 +3552,15 @@ class CastExpr : public Expr {
   /// function that it invokes.
   NamedDecl *getConversionFunction() const;
 
+  /// Path through the class hierarchy taken by a `DerivedToBase` or
+  /// `UncheckedDerivedToBase` cast. For each derived-to-base edge in the path,
+  /// the path contains a `CXXBaseSpecifier` for the base class of that edge;
+  /// the entries are ordered from derived class to base class.

rjmccall wrote:

You can see in `CastConsistency` the set of cast kinds that require a base 
path; it's more than just these two.
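
For readers unfamiliar with base paths, a minimal sketch of what path()
describes for one of those kinds (the class names are hypothetical):

  struct A {};
  struct B : A {};
  struct C : B {};

  A *upcast(C *p) { return p; } // DerivedToBase cast from C* to A*: its path()
                                // holds two CXXBaseSpecifiers, ordered
                                // derived-to-base: the B base of C, then the
                                // A base of B.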

https://github.com/llvm/llvm-project/pull/85623
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Set correct FPOptions if attribute 'optnone' presents (PR #85605)

2024-03-18 Thread John McCall via cfe-commits

rjmccall wrote:

> > Hmm. Is there some sort of optimization in IRGen that we need to suppress 
> > here, or is it something in LLVM code gen? Presumably normal LLVM 
> > optimization passes all just skip `optnone` functions.
> 
> The issue #62098 demonstrates such case.

Okay.  So if I understand correctly, this substitution of libcalls is a 
frontend optimization, and the bug is arguing that it ought to be suppressed by 
`optnone`.

> Anyway, It is not good to have bogus attributes in AST.

I think it's a fair question whether this is a bogus attribute.  `optnone` is a 
very strange attribute; it's a directive to the compiler with no clear 
semantics. There are "optimizations" that I don't think we expect `optnone` to 
suppress — we wouldn't want `optnone` to turn off NRVO in C++, for example.  
`optnone` seems to be about getting a straightforward, literal translation of 
the code.  If the frontend can do a transformation during IR emission, though, 
it's generally a highly reliable transformation and can basically be thought of 
as part of the semantics.

From the bug, it sounds more like the user wants some way to disable fast-math 
for a specific function, and apparently we don't have one.  That seems like a 
reasonable feature request.  Because we don't have that feature, they're 
flailing around trying essentially every pragma and attribute that we *do* 
have and arguing that at least one of them ought to have that impact, but it's 
not at all clear to me that they should.  And presumably they don't actually 
want to disable all optimization of their function anyway; they just want to 
turn off fast math.

> > Mostly I'm wondering how far we're expected to go with `optnone`.
> 
> It impedes implementation of pragma FENV_ROUND. The pragma requires setting 
> FP options inside CompoundStmt and it interacts with the fictitious attributes. 
> This is the reason of this patch.

That doesn't make sense as a reason for this patch; `#pragma FENV_ROUND` can 
obviously occur in non-`optnone` functions.

https://github.com/llvm/llvm-project/pull/85605
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Add `__has_extension(swiftcc)` support (PR #85347)

2024-03-18 Thread John McCall via cfe-commits

https://github.com/rjmccall approved this pull request.

Yeah, it's fine with me.

https://github.com/llvm/llvm-project/pull/85347
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-18 Thread John McCall via cfe-commits

rjmccall wrote:

Yeah, we don't need to care about the actual bit offset of the zero-width 
bit-field as long as we honor the non-interference properties across it.  I'll 
take a look at the patch, thanks.
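
A tiny sketch of the property being preserved (illustrative only):

  struct S {
    int a : 4;
    int   : 0; // zero-width bit-field: no access unit may span it,
    int b : 4; // so 'a' and 'b' land in separate access units
  };

Whatever bit offset the ABI assigns to the zero-width member, stores to 'a'
must not touch the storage used for 'b', and vice versa.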

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Set correct FPOptions if attribute 'optnone' presents (PR #85605)

2024-03-18 Thread John McCall via cfe-commits

rjmccall wrote:

Hmm.  Is there some sort of optimization in IRGen that we need to suppress 
here, or is it something in LLVM code gen?  Presumably normal LLVM optimization 
passes all just skip `optnone` functions.

Mostly I'm wondering how far we're expected to go with `optnone`.

https://github.com/llvm/llvm-project/pull/85605
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AArch64][PAC] Support ptrauth builtins and -fptrauth-intrinsics. (PR #65996)

2024-03-15 Thread John McCall via cfe-commits


@@ -0,0 +1,265 @@
+Pointer Authentication
+======================
+
+.. contents::
+   :local:
+
+Introduction
+
+
+Pointer authentication is a technology which offers strong probabilistic 
protection against exploiting a broad class of memory bugs to take control of 
program execution.  When adopted consistently in a language ABI, it provides a 
form of relatively fine-grained control flow integrity (CFI) check that resists 
both return-oriented programming (ROP) and jump-oriented programming (JOP) 
attacks.
+
+While pointer authentication can be implemented purely in software, direct 
hardware support (e.g. as provided by ARMv8.3 PAuth) can dramatically lower the 
execution speed and code size costs.  Similarly, while pointer authentication 
can be implemented on any architecture, taking advantage of the (typically) 
excess addressing range of a target with 64-bit pointers minimizes the impact 
on memory performance and can allow interoperation with existing code (by 
disabling pointer authentication dynamically).  This document will generally 
attempt to present the pointer authentication feature independent of any 
hardware implementation or ABI.  Considerations that are 
implementation-specific are clearly identified throughout.
+
+Note that there are several different terms in use:
+
+- **Pointer authentication** is a target-independent language technology.
+
+- **PAuth** (sometimes referred to as **PAC**, for Pointer Authentication 
Codes) is an AArch64 architecture extension that provides hardware support for 
pointer authentication.
+
+- **ARMv8.3** is an AArch64 architecture revision that makes PAuth mandatory.  
It is implemented on several shipping processors, including the Apple A12 and 
later.
+
+* **arm64e** is a specific ABI (not yet fully stable) for implementing pointer 
authentication using PAuth on certain Apple operating systems.
+
+This document serves four purposes:
+
+- It describes the basic ideas of pointer authentication.
+
+- It documents several language extensions that are useful on targets using 
pointer authentication.
+
+- It will eventually present a theory of operation for the security 
mitigation, describing the basic requirements for correctness, various 
weaknesses in the mechanism, and ways in which programmers can strengthen its 
protections (including recommendations for language implementors).
+
+- It will eventually document the language ABIs currently used for C, C++, 
Objective-C, and Swift on arm64e, although these are not yet stable on any 
target.
+
+Basic Concepts
+--------------
+
+The simple address of an object or function is a **raw pointer**.  A raw 
pointer can be **signed** to produce a **signed pointer**.  A signed pointer 
can be then **authenticated** in order to verify that it was **validly signed** 
and extract the original raw pointer.  These terms reflect the most likely 
implementation technique: computing and storing a cryptographic signature along 
with the pointer.  The security of pointer authentication does not rely on 
attackers not being able to separately overwrite the signature.
+
+An **abstract signing key** is a name which refers to a secret key which can be 
used to sign and authenticate pointers.  The key value for a particular name is 
consistent throughout a process.
+
+A **discriminator** is an arbitrary value used to **diversify** signed 
pointers so that one validly-signed pointer cannot simply be copied over 
another.  A discriminator is simply opaque data of some implementation-defined 
size that is included in the signature as a salt.
+
+Nearly all aspects of pointer authentication use just these two primary 
operations:
+
+- ``sign(raw_pointer, key, discriminator)`` produces a signed pointer given a 
raw pointer, an abstract signing key, and a discriminator.
+
+- ``auth(signed_pointer, key, discriminator)`` produces a raw pointer given a 
signed pointer, an abstract signing key, and a discriminator.
+
+``auth(sign(raw_pointer, key, discriminator), key, discriminator)`` must 
succeed and produce ``raw_pointer``.  ``auth`` applied to a value that was 
ultimately produced in any other way is expected to immediately halt the 
program.  However, it is permitted for ``auth`` to fail to detect that a signed 
pointer was not produced in this way, in which case it may return anything; 
this is what makes pointer authentication a probabilistic mitigation rather 
than a perfect one.
+
+There are two secondary operations which are required only to implement 
certain intrinsics in ``<ptrauth.h>``:
+
+- ``strip(signed_pointer, key)`` produces a raw pointer given a signed pointer 
and a key it was presumptively signed with.  This is useful for certain kinds 
of tooling, such as crash backtraces; it should generally not be used in the 
basic language ABI except in very careful ways.
+
+- ``sign_generic(value)`` produces a cryptographic signature for arbitrary 
data, not necessarily a pointer.  This is useful for 
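
As a concrete sketch of the primary sign/auth round trip described above, using
the kind of builtins this PR adds (the key and discriminator are arbitrary
illustrative choices, and the code assumes a target with ptrauth intrinsics
enabled, e.g. arm64e):

  #include <ptrauth.h>

  void *sign_then_auth(void *raw) {
    // sign(raw_pointer, key, discriminator)
    void *signed_p =
        __builtin_ptrauth_sign_unauthenticated(raw, ptrauth_key_asda, 42);
    // auth(signed_pointer, key, discriminator) recovers the raw pointer; a
    // mismatched key or discriminator is expected to trap (or, with low
    // probability, go undetected).
    return __builtin_ptrauth_auth(signed_p, ptrauth_key_asda, 42);
  }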

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-15 Thread John McCall via cfe-commits

https://github.com/rjmccall edited 
https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-15 Thread John McCall via cfe-commits

https://github.com/rjmccall edited 
https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-15 Thread John McCall via cfe-commits


@@ -439,82 +444,194 @@ 
CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
   Members.push_back(MemberInfo(bitsToCharUnits(StartBitOffset),
MemberInfo::Field, nullptr, *Field));
 }
-return;
+return Field;
   }
 
-  // Check if OffsetInRecord (the size in bits of the current run) is better
-  // as a single field run. When OffsetInRecord has legal integer width, and
-  // its bitfield offset is naturally aligned, it is better to make the
-  // bitfield a separate storage component so as it can be accessed directly
-  // with lower cost.
-  auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord,
-  uint64_t StartBitOffset) {
-if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses)
-  return false;
-if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) ||
-!DataLayout.fitsInLegalInteger(OffsetInRecord))
-  return false;
-// Make sure StartBitOffset is naturally aligned if it is treated as an
-// IType integer.
-if (StartBitOffset %
-Context.toBits(getAlignment(getIntNType(OffsetInRecord))) !=
-0)
-  return false;
-return true;
-  };
+  // The SysV ABI can overlap bitfield storage units with both other bitfield
+  // storage units /and/ other non-bitfield data members. Accessing a sequence
+  // of bitfields mustn't interfere with adjacent non-bitfields -- they're
+  // permitted to be accessed in separate threads for instance.
+
+  // We split runs of bit-fields into a sequence of "access units". When we 
emit
+  // a load or store of a bit-field, we'll load/store the entire containing
+  // access unit. As mentioned, the standard requires that these loads and
+  // stores must not interfere with accesses to other memory locations, and it
+  // defines the bit-field's memory location as the current run of
+  // non-zero-width bit-fields. So an access unit must never overlap with
+  // non-bit-field storage or cross a zero-width bit-field. Otherwise, we're
+  // free to draw the lines as we see fit.
+
+  // Drawing these lines well can be complicated. LLVM generally can't modify a
+  // program to access memory that it didn't before, so using very narrow 
access
+  // units can prevent the compiler from using optimal access patterns. For
+  // example, suppose a run of bit-fields occupies four bytes in a struct. If 
we
+  // split that into four 1-byte access units, then a sequence of assignments
+  // that doesn't touch all four bytes may have to be emitted with multiple
+  // 8-bit stores instead of a single 32-bit store. On the other hand, if we 
use
+  // very wide access units, we may find ourselves emitting accesses to
+  // bit-fields we didn't really need to touch, just because LLVM was unable to
+  // clean up after us.
+
+  // It is desirable to have access units be aligned powers of 2 no larger than
+  // a register. (On non-strict alignment ISAs, the alignment requirement can 
be
+  // dropped.) A three byte access unit will be accessed using 2-byte and 
1-byte
+  // accesses and bit manipulation. If no bitfield straddles across the two
+  // separate accesses, it is better to have separate 2-byte and 1-byte access
+  // units, as then LLVM will not generate unnecessary memory accesses, or bit
+  // manipulation. Similarly, on a strict-alignment architecture, it is better
+  // to keep access-units naturally aligned, to avoid similar bit
+  // manipulation synthesizing larger unaligned accesses.
+
+  // We do this in two phases, processing a sequential run of bitfield
+  // declarations.
+
+  // a) Bitfields that share parts of a single byte are, of necessity, placed 
in
+  // the same access unit. That unit will encompass a consecutive
+  // run where adjacent bitfields share parts of a byte. (The first bitfield of
+  // such an access unit will start at the beginning of a byte.)
+
+  // b) Accumulate adjacent access units when the combined unit is naturally
+  // sized, no larger than a register, and on a strict alignment ISA,
+  // aligned. Note that this requires lookahead to one or more subsequent 
access
+  // units. For instance, consider a 2-byte access-unit followed by 2 1-byte
+  // units. We can merge that into a 4-byte access-unit, but we would not want
+  // to merge a 2-byte followed by a single 1-byte (and no available tail
+  // padding).
+
+  // This accumulation is prevented when:
+  // *) it would cross a zero-width bitfield (ABI-dependent), or
+  // *) one of the candidate access units contains a volatile bitfield, or
+  // *) fine-grained bitfield access option is in effect.
+
+  CharUnits RegSize =
+  bitsToCharUnits(Context.getTargetInfo().getRegisterWidth());
+  unsigned CharBits = Context.getCharWidth();
+
+  RecordDecl::field_iterator Begin = FieldEnd;
+  CharUnits StartOffset;
+  uint64_t BitSize;
+  CharUnits BestEndOffset;
+  RecordDecl::field_iterator BestEnd = 

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-15 Thread John McCall via cfe-commits

https://github.com/rjmccall edited 
https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-15 Thread John McCall via cfe-commits

https://github.com/rjmccall edited 
https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-15 Thread John McCall via cfe-commits


@@ -439,82 +444,194 @@ 
CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
   Members.push_back(MemberInfo(bitsToCharUnits(StartBitOffset),
MemberInfo::Field, nullptr, *Field));
 }
-return;
+return Field;
   }
 
-  // Check if OffsetInRecord (the size in bits of the current run) is better
-  // as a single field run. When OffsetInRecord has legal integer width, and
-  // its bitfield offset is naturally aligned, it is better to make the
-  // bitfield a separate storage component so as it can be accessed directly
-  // with lower cost.
-  auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord,
-  uint64_t StartBitOffset) {
-if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses)
-  return false;
-if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) ||
-!DataLayout.fitsInLegalInteger(OffsetInRecord))
-  return false;
-// Make sure StartBitOffset is naturally aligned if it is treated as an
-// IType integer.
-if (StartBitOffset %
-Context.toBits(getAlignment(getIntNType(OffsetInRecord))) !=
-0)
-  return false;
-return true;
-  };
+  // The SysV ABI can overlap bitfield storage units with both other bitfield
+  // storage units /and/ other non-bitfield data members. Accessing a sequence
+  // of bitfields mustn't interfere with adjacent non-bitfields -- they're
+  // permitted to be accessed in separate threads for instance.
+
+  // We split runs of bit-fields into a sequence of "access units". When we 
emit
+  // a load or store of a bit-field, we'll load/store the entire containing
+  // access unit. As mentioned, the standard requires that these loads and
+  // stores must not interfere with accesses to other memory locations, and it
+  // defines the bit-field's memory location as the current run of
+  // non-zero-width bit-fields. So an access unit must never overlap with
+  // non-bit-field storage or cross a zero-width bit-field. Otherwise, we're
+  // free to draw the lines as we see fit.
+
+  // Drawing these lines well can be complicated. LLVM generally can't modify a
+  // program to access memory that it didn't before, so using very narrow 
access
+  // units can prevent the compiler from using optimal access patterns. For
+  // example, suppose a run of bit-fields occupies four bytes in a struct. If 
we
+  // split that into four 1-byte access units, then a sequence of assignments
+  // that doesn't touch all four bytes may have to be emitted with multiple
+  // 8-bit stores instead of a single 32-bit store. On the other hand, if we 
use
+  // very wide access units, we may find ourselves emitting accesses to
+  // bit-fields we didn't really need to touch, just because LLVM was unable to
+  // clean up after us.
+
+  // It is desirable to have access units be aligned powers of 2 no larger than
+  // a register. (On non-strict alignment ISAs, the alignment requirement can 
be
+  // dropped.) A three byte access unit will be accessed using 2-byte and 
1-byte
+  // accesses and bit manipulation. If no bitfield straddles across the two
+  // separate accesses, it is better to have separate 2-byte and 1-byte access
+  // units, as then LLVM will not generate unnecessary memory accesses, or bit
+  // manipulation. Similarly, on a strict-alignment architecture, it is better
+  // to keep access-units naturally aligned, to avoid similar bit
+  // manipulation synthesizing larger unaligned accesses.
+
+  // We do this in two phases, processing a sequential run of bitfield
+  // declarations.
+
+  // a) Bitfields that share parts of a single byte are, of necessity, placed 
in
+  // the same access unit. That unit will encompass a consecutive
+  // run where adjacent bitfields share parts of a byte. (The first bitfield of
+  // such an access unit will start at the beginning of a byte.)
+
+  // b) Accumulate adjacent access units when the combined unit is naturally
+  // sized, no larger than a register, and on a strict alignment ISA,
+  // aligned. Note that this requires lookahead to one or more subsequent 
access
+  // units. For instance, consider a 2-byte access-unit followed by 2 1-byte
+  // units. We can merge that into a 4-byte access-unit, but we would not want
+  // to merge a 2-byte followed by a single 1-byte (and no available tail
+  // padding).
+
+  // This accumulation is prevented when:
+  // *) it would cross a zero-width bitfield (ABI-dependent), or
+  // *) one of the candidate access units contains a volatile bitfield, or
+  // *) fine-grained bitfield access option is in effect.
+
+  CharUnits RegSize =
+  bitsToCharUnits(Context.getTargetInfo().getRegisterWidth());
+  unsigned CharBits = Context.getCharWidth();
+
+  RecordDecl::field_iterator Begin = FieldEnd;
+  CharUnits StartOffset;
+  uint64_t BitSize;
+  CharUnits BestEndOffset;
+  RecordDecl::field_iterator BestEnd = 

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-14 Thread John McCall via cfe-commits


@@ -439,82 +444,194 @@ 
CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
   Members.push_back(MemberInfo(bitsToCharUnits(StartBitOffset),
MemberInfo::Field, nullptr, *Field));
 }
-return;
+return Field;
   }
 
-  // Check if OffsetInRecord (the size in bits of the current run) is better
-  // as a single field run. When OffsetInRecord has legal integer width, and
-  // its bitfield offset is naturally aligned, it is better to make the
-  // bitfield a separate storage component so as it can be accessed directly
-  // with lower cost.
-  auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord,
-  uint64_t StartBitOffset) {
-if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses)
-  return false;
-if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) ||
-!DataLayout.fitsInLegalInteger(OffsetInRecord))
-  return false;
-// Make sure StartBitOffset is naturally aligned if it is treated as an
-// IType integer.
-if (StartBitOffset %
-Context.toBits(getAlignment(getIntNType(OffsetInRecord))) !=
-0)
-  return false;
-return true;
-  };
+  // The SysV ABI can overlap bitfield storage units with both other bitfield
+  // storage units /and/ other non-bitfield data members. Accessing a sequence
+  // of bitfields mustn't interfere with adjacent non-bitfields -- they're
+  // permitted to be accessed in separate threads for instance.
+
+  // We split runs of bit-fields into a sequence of "access units". When we 
emit
+  // a load or store of a bit-field, we'll load/store the entire containing
+  // access unit. As mentioned, the standard requires that these loads and
+  // stores must not interfere with accesses to other memory locations, and it
+  // defines the bit-field's memory location as the current run of
+  // non-zero-width bit-fields. So an access unit must never overlap with
+  // non-bit-field storage or cross a zero-width bit-field. Otherwise, we're
+  // free to draw the lines as we see fit.
+
+  // Drawing these lines well can be complicated. LLVM generally can't modify a
+  // program to access memory that it didn't before, so using very narrow 
access
+  // units can prevent the compiler from using optimal access patterns. For
+  // example, suppose a run of bit-fields occupies four bytes in a struct. If 
we
+  // split that into four 1-byte access units, then a sequence of assignments
+  // that doesn't touch all four bytes may have to be emitted with multiple
+  // 8-bit stores instead of a single 32-bit store. On the other hand, if we 
use
+  // very wide access units, we may find ourselves emitting accesses to
+  // bit-fields we didn't really need to touch, just because LLVM was unable to
+  // clean up after us.
+
+  // It is desirable to have access units be aligned powers of 2 no larger than
+  // a register. (On non-strict alignment ISAs, the alignment requirement can 
be
+  // dropped.) A three byte access unit will be accessed using 2-byte and 
1-byte
+  // accesses and bit manipulation. If no bitfield straddles across the two
+  // separate accesses, it is better to have separate 2-byte and 1-byte access
+  // units, as then LLVM will not generate unnecessary memory accesses, or bit
+  // manipulation. Similarly, on a strict-alignment architecture, it is better
+  // to keep access-units naturally aligned, to avoid similar bit
+  // manipulation synthesizing larger unaligned accesses.
+
+  // We do this in two phases, processing a sequential run of bitfield
+  // declarations.
+
+  // a) Bitfields that share parts of a single byte are, of necessity, placed 
in
+  // the same access unit. That unit will encompass a consecutive
+  // run where adjacent bitfields share parts of a byte. (The first bitfield of
+  // such an access unit will start at the beginning of a byte.)
+
+  // b) Accumulate adjacent access units when the combined unit is naturally
+  // sized, no larger than a register, and on a strict alignment ISA,
+  // aligned. Note that this requires lookahead to one or more subsequent 
access
+  // units. For instance, consider a 2-byte access-unit followed by 2 1-byte
+  // units. We can merge that into a 4-byte access-unit, but we would not want
+  // to merge a 2-byte followed by a single 1-byte (and no available tail
+  // padding).
+
+  // This accumulation is prevented when:
+  // *) it would cross a zero-width bitfield (ABI-dependent), or
+  // *) one of the candidate access units contains a volatile bitfield, or
+  // *) fine-grained bitfield access option is in effect.
+
+  CharUnits RegSize =
+  bitsToCharUnits(Context.getTargetInfo().getRegisterWidth());
+  unsigned CharBits = Context.getCharWidth();
+
+  RecordDecl::field_iterator Begin = FieldEnd;
+  CharUnits StartOffset;
+  uint64_t BitSize;
+  CharUnits BestEndOffset;
+  RecordDecl::field_iterator BestEnd = 

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-14 Thread John McCall via cfe-commits


@@ -439,82 +444,194 @@ 
CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
   Members.push_back(MemberInfo(bitsToCharUnits(StartBitOffset),
MemberInfo::Field, nullptr, *Field));
 }
-return;
+return Field;
   }
 
-  // Check if OffsetInRecord (the size in bits of the current run) is better
-  // as a single field run. When OffsetInRecord has legal integer width, and
-  // its bitfield offset is naturally aligned, it is better to make the
-  // bitfield a separate storage component so as it can be accessed directly
-  // with lower cost.
-  auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord,
-  uint64_t StartBitOffset) {
-if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses)
-  return false;
-if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) ||
-!DataLayout.fitsInLegalInteger(OffsetInRecord))
-  return false;
-// Make sure StartBitOffset is naturally aligned if it is treated as an
-// IType integer.
-if (StartBitOffset %
-Context.toBits(getAlignment(getIntNType(OffsetInRecord))) !=
-0)
-  return false;
-return true;
-  };
+  // The SysV ABI can overlap bitfield storage units with both other bitfield
+  // storage units /and/ other non-bitfield data members. Accessing a sequence
+  // of bitfields mustn't interfere with adjacent non-bitfields -- they're
+  // permitted to be accessed in separate threads for instance.
+
+  // We split runs of bit-fields into a sequence of "access units". When we 
emit
+  // a load or store of a bit-field, we'll load/store the entire containing
+  // access unit. As mentioned, the standard requires that these loads and
+  // stores must not interfere with accesses to other memory locations, and it
+  // defines the bit-field's memory location as the current run of
+  // non-zero-width bit-fields. So an access unit must never overlap with
+  // non-bit-field storage or cross a zero-width bit-field. Otherwise, we're
+  // free to draw the lines as we see fit.
+
+  // Drawing these lines well can be complicated. LLVM generally can't modify a
+  // program to access memory that it didn't before, so using very narrow 
access
+  // units can prevent the compiler from using optimal access patterns. For
+  // example, suppose a run of bit-fields occupies four bytes in a struct. If 
we
+  // split that into four 1-byte access units, then a sequence of assignments
+  // that doesn't touch all four bytes may have to be emitted with multiple
+  // 8-bit stores instead of a single 32-bit store. On the other hand, if we 
use
+  // very wide access units, we may find ourselves emitting accesses to
+  // bit-fields we didn't really need to touch, just because LLVM was unable to
+  // clean up after us.
+
+  // It is desirable to have access units be aligned powers of 2 no larger than
+  // a register. (On non-strict alignment ISAs, the alignment requirement can 
be
+  // dropped.) A three byte access unit will be accessed using 2-byte and 
1-byte
+  // accesses and bit manipulation. If no bitfield straddles across the two
+  // separate accesses, it is better to have separate 2-byte and 1-byte access
+  // units, as then LLVM will not generate unnecessary memory accesses, or bit
+  // manipulation. Similarly, on a strict-alignment architecture, it is better
+  // to keep access-units naturally aligned, to avoid similar bit
+  // manipulation synthesizing larger unaligned accesses.
+
+  // We do this in two phases, processing a sequential run of bitfield
+  // declarations.
+
+  // a) Bitfields that share parts of a single byte are, of necessity, placed 
in
+  // the same access unit. That unit will encompass a consecutive
+  // run where adjacent bitfields share parts of a byte. (The first bitfield of
+  // such an access unit will start at the beginning of a byte.)
+
+  // b) Accumulate adjacent access units when the combined unit is naturally
+  // sized, no larger than a register, and on a strict alignment ISA,
+  // aligned. Note that this requires lookahead to one or more subsequent 
access
+  // units. For instance, consider a 2-byte access-unit followed by 2 1-byte
+  // units. We can merge that into a 4-byte access-unit, but we would not want
+  // to merge a 2-byte followed by a single 1-byte (and no available tail
+  // padding).
+
+  // This accumulation is prevented when:
+  // *) it would cross a zero-width bitfield (ABI-dependent), or
+  // *) one of the candidate access units contains a volatile bitfield, or
+  // *) fine-grained bitfield access option is in effect.
+
+  CharUnits RegSize =
+  bitsToCharUnits(Context.getTargetInfo().getRegisterWidth());
+  unsigned CharBits = Context.getCharWidth();
+
+  RecordDecl::field_iterator Begin = FieldEnd;
+  CharUnits StartOffset;
+  uint64_t BitSize;
+  CharUnits BestEndOffset;
+  RecordDecl::field_iterator BestEnd = 

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-14 Thread John McCall via cfe-commits

https://github.com/rjmccall edited 
https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-14 Thread John McCall via cfe-commits


@@ -439,82 +444,194 @@ 
CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
   Members.push_back(MemberInfo(bitsToCharUnits(StartBitOffset),
MemberInfo::Field, nullptr, *Field));
 }
-return;
+return Field;
   }
 
-  // Check if OffsetInRecord (the size in bits of the current run) is better
-  // as a single field run. When OffsetInRecord has legal integer width, and
-  // its bitfield offset is naturally aligned, it is better to make the
-  // bitfield a separate storage component so as it can be accessed directly
-  // with lower cost.
-  auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord,
-  uint64_t StartBitOffset) {
-if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses)
-  return false;
-if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) ||
-!DataLayout.fitsInLegalInteger(OffsetInRecord))
-  return false;
-// Make sure StartBitOffset is naturally aligned if it is treated as an
-// IType integer.
-if (StartBitOffset %
-Context.toBits(getAlignment(getIntNType(OffsetInRecord))) !=
-0)
-  return false;
-return true;
-  };
+  // The SysV ABI can overlap bitfield storage units with both other bitfield
+  // storage units /and/ other non-bitfield data members. Accessing a sequence
+  // of bitfields mustn't interfere with adjacent non-bitfields -- they're
+  // permitted to be accessed in separate threads for instance.
+
+  // We split runs of bit-fields into a sequence of "access units". When we 
emit
+  // a load or store of a bit-field, we'll load/store the entire containing
+  // access unit. As mentioned, the standard requires that these loads and
+  // stores must not interfere with accesses to other memory locations, and it
+  // defines the bit-field's memory location as the current run of
+  // non-zero-width bit-fields. So an access unit must never overlap with
+  // non-bit-field storage or cross a zero-width bit-field. Otherwise, we're
+  // free to draw the lines as we see fit.
+
+  // Drawing these lines well can be complicated. LLVM generally can't modify a
+  // program to access memory that it didn't before, so using very narrow 
access
+  // units can prevent the compiler from using optimal access patterns. For
+  // example, suppose a run of bit-fields occupies four bytes in a struct. If 
we
+  // split that into four 1-byte access units, then a sequence of assignments
+  // that doesn't touch all four bytes may have to be emitted with multiple
+  // 8-bit stores instead of a single 32-bit store. On the other hand, if we 
use
+  // very wide access units, we may find ourselves emitting accesses to
+  // bit-fields we didn't really need to touch, just because LLVM was unable to
+  // clean up after us.
+
+  // It is desirable to have access units be aligned powers of 2 no larger than
+  // a register. (On non-strict alignment ISAs, the alignment requirement can 
be
+  // dropped.) A three byte access unit will be accessed using 2-byte and 
1-byte
+  // accesses and bit manipulation. If no bitfield straddles across the two
+  // separate accesses, it is better to have separate 2-byte and 1-byte access
+  // units, as then LLVM will not generate unnecessary memory accesses, or bit
+  // manipulation. Similarly, on a strict-alignment architecture, it is better
+  // to keep access-units naturally aligned, to avoid similar bit
+  // manipulation synthesizing larger unaligned accesses.
+
+  // We do this in two phases, processing a sequential run of bitfield
+  // declarations.
+
+  // a) Bitfields that share parts of a single byte are, of necessity, placed 
in
+  // the same access unit. That unit will encompass a consecutive
+  // run where adjacent bitfields share parts of a byte. (The first bitfield of
+  // such an access unit will start at the beginning of a byte.)
+
+  // b) Accumulate adjacent access units when the combined unit is naturally
+  // sized, no larger than a register, and on a strict alignment ISA,
+  // aligned. Note that this requires lookahead to one or more subsequent 
access
+  // units. For instance, consider a 2-byte access-unit followed by 2 1-byte
+  // units. We can merge that into a 4-byte access-unit, but we would not want
+  // to merge a 2-byte followed by a single 1-byte (and no available tail
+  // padding).
+
+  // This accumulation is prevented when:
+  // *) it would cross a zero-width bitfield (ABI-dependent), or
+  // *) one of the candidate access units contains a volatile bitfield, or
+  // *) fine-grained bitfield access option is in effect.
+
+  CharUnits RegSize =
+  bitsToCharUnits(Context.getTargetInfo().getRegisterWidth());
+  unsigned CharBits = Context.getCharWidth();
+
+  RecordDecl::field_iterator Begin = FieldEnd;
+  CharUnits StartOffset;
+  uint64_t BitSize;
+  CharUnits BestEndOffset;
+  RecordDecl::field_iterator BestEnd = 

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-14 Thread John McCall via cfe-commits

https://github.com/rjmccall commented:

Thank you, the structure looks great!  Mostly style and clarity comments from 
here.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Stub out gcc_struct attribute (PR #71148)

2024-03-12 Thread John McCall via cfe-commits

rjmccall wrote:

> @rjmccall Re asking GCC developers about gcc_struct on non-x86: 
> https://gcc.gnu.org/pipermail/gcc/2024-January/243154.html. Either GCC devs 
> aren't really worried about this or I can't properly write emails (what's 
> totally possible).

Alright, well, we tried.  I think rolling forward with your proposal on 
non-x86 platforms is reasonable.

https://github.com/llvm/llvm-project/pull/71148
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][Sema] Add noinline check for __builtin_frame_address and __builtin_return_address (PR #82966)

2024-03-04 Thread John McCall via cfe-commits

rjmccall wrote:

> I'm seeing evidence that this might be a chatty diagnostic in practice:
> 
> https://sourcegraph.com/github.com/torvalds/linux@90d35da658da8cff0d4ecbb5113f5fac9d00eb72/-/blob/kernel/fork.c?L311
>  
> https://sourcegraph.com/github.com/torvalds/linux@90d35da658da8cff0d4ecbb5113f5fac9d00eb72/-/blob/mm/util.c?L644
>  
> https://sourcegraph.com/github.com/torvalds/linux@90d35da658da8cff0d4ecbb5113f5fac9d00eb72/-/blob/arch/arm/mm/nommu.c?L224
>  
> https://sourcegraph.com/github.com/torvalds/linux@90d35da658da8cff0d4ecbb5113f5fac9d00eb72/-/blob/kernel/scs.c?L48
>  (and quite a few others).
> 
> CC @nathanchance @nickdesaulniers @rjmccall for opinions

Does anyone understand why Linux uses `__builtin_return_address` there?  At 
first glance, I'd think this is exactly the sort of thing that the warning 
ought to be warning about — a `static` function that could easily be inlined 
into its callers. 

Have we considered the alternative of just disabling inlining when a function 
uses `__builtin_return_address`?  The middle-end already knows that there are 
things it can't inline.

Although I seem to remember having seen code that uses `always_inline` in order 
to force `__builtin_return_address` to actually apply to its caller.

https://github.com/llvm/llvm-project/pull/82966
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
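
To make the inlining hazard concrete, here is a hypothetical example of the pattern under discussion (not taken from the kernel sources); `noinline` is the existing way to pin the builtin to the intended frame, which is also the alternative suggested above.

```cpp
// If this helper is inlined, __builtin_return_address(0) no longer reports
// the return address into the helper's caller but into the caller's caller,
// so pin it with noinline.
__attribute__((noinline))
static void *current_return_address(void) {
  return __builtin_return_address(0);
}

void log_caller(void) {
  void *ra = current_return_address(); // the return address into log_caller
  (void)ra;                            // e.g. hand off to a tracing facility
}
```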


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-01 Thread John McCall via cfe-commits


@@ -394,33 +412,155 @@ void CGRecordLowering::accumulateFields() {
   : getStorageType(*Field),
   *Field));
   ++Field;
-} else {
-  ++Field;
 }
   }
 }
 
-void
-CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
-  RecordDecl::field_iterator FieldEnd) {
-  // Run stores the first element of the current run of bitfields.  FieldEnd is
-  // used as a special value to note that we don't have a current run.  A
-  // bitfield run is a contiguous collection of bitfields that can be stored in
-  // the same storage block.  Zero-sized bitfields and bitfields that would
-  // cross an alignment boundary break a run and start a new one.
-  RecordDecl::field_iterator Run = FieldEnd;
-  // Tail is the offset of the first bit off the end of the current run.  It's
-  // used to determine if the ASTRecordLayout is treating these two bitfields 
as
-  // contiguous.  StartBitOffset is offset of the beginning of the Run.
-  uint64_t StartBitOffset, Tail = 0;
+namespace {
+
+// A run of bitfields assigned to the same access unit -- the size of memory
+// loads & stores.
+class BitFieldAccessUnit {
+  RecordDecl::field_iterator Begin; // Field at start of this access unit.
+  RecordDecl::field_iterator End;   // Field just after this access unit.
+
+  CharUnits StartOffset; // Starting offset in the containing record.
+  CharUnits EndOffset;   // Finish offset (exclusive) in the containing record.
+
+  bool ContainsVolatile; // This access unit contains a volatile bitfield.
+
+public:
+  // End barrier constructor.
+  BitFieldAccessUnit(RecordDecl::field_iterator F, CharUnits Offset,
+ bool Volatile = false)
+  : Begin(F), End(F), StartOffset(Offset), EndOffset(Offset),
+ContainsVolatile(Volatile) {}
+
+  // Collect contiguous bitfields into an access unit.
+  BitFieldAccessUnit(RecordDecl::field_iterator FieldBegin,
+ RecordDecl::field_iterator FieldEnd,
+ const CGRecordLowering &CGRL);
+
+  // Compute the access unit following this one -- which might be a barrier at
+  // Limit.
+  BitFieldAccessUnit accumulateNextUnit(RecordDecl::field_iterator FieldEnd,
+CharUnits Limit,
+const CGRecordLowering &CGRL) const {
+return end() != FieldEnd ? BitFieldAccessUnit(end(), FieldEnd, CGRL)
+ : BitFieldAccessUnit(FieldEnd, Limit);
+  }
+  // Re-set the end of this unit if there is space before Probe starts.
+  void enlargeIfSpace(const BitFieldAccessUnit &Probe, CharUnits Offset) {
+if (Probe.getStartOffset() >= Offset) {
+  End = Probe.begin();
+  EndOffset = Offset;
+}
+  }
+
+public:
+  RecordDecl::field_iterator begin() const { return Begin; }
+  RecordDecl::field_iterator end() const { return End; }
+
+public:
+  // Accessors
+  CharUnits getSize() const { return EndOffset - StartOffset; }
+  CharUnits getStartOffset() const { return StartOffset; }
+  CharUnits getEndOffset() const { return EndOffset; }
+
+  // Predicates
+  bool isBarrier() const { return getSize().isZero(); }
+  bool hasVolatile() const { return ContainsVolatile; }
+
+  // Create the containing access unit and install the bitfields.
+  void installUnit(CGRecordLowering &CGRL) const {
+if (!isBarrier()) {
+  // Add the storage member for the access unit to the record. The
+  // bitfields get the offset of their storage but come afterward and 
remain
+  // there after a stable sort.
+  llvm::Type *Type = CGRL.getIntNType(CGRL.Context.toBits(getSize()));
+  CGRL.Members.push_back(CGRL.StorageInfo(getStartOffset(), Type));
+  for (auto F : *this)
+if (!F->isZeroLengthBitField(CGRL.Context))
+  CGRL.Members.push_back(CGRecordLowering::MemberInfo(
+  getStartOffset(), CGRecordLowering::MemberInfo::Field, nullptr,
+  F));
+}
+  }
+};
+
+// Create an access unit of contiguous bitfields.
+BitFieldAccessUnit::BitFieldAccessUnit(RecordDecl::field_iterator FieldBegin,
+   RecordDecl::field_iterator FieldEnd,
+   const CGRecordLowering &CGRL)
+: BitFieldAccessUnit(FieldBegin, CharUnits::Zero(),
+ FieldBegin->getType().isVolatileQualified()) {
+  assert(End != FieldEnd);
+
+  uint64_t StartBit = CGRL.getFieldBitOffset(*FieldBegin);
+  uint64_t BitSize = End->getBitWidthValue(CGRL.Context);
+  unsigned CharBits = CGRL.Context.getCharWidth();
+
+  assert(!(StartBit % CharBits) && "Not at start of char");
+
+  ++End;
+  if (BitSize ||
+  !(CGRL.Context.getTargetInfo().useZeroLengthBitfieldAlignment() ||
+CGRL.Context.getTargetInfo().useBitFieldTypeAlignment()))
+// The first field is not a (zero-width) barrier. Collect contiguous 
fields.
+for (; End != FieldEnd; ++End) {
+  uint64_t BitOffset = 

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-01 Thread John McCall via cfe-commits


@@ -858,6 +861,10 @@ class TargetInfo : public TransferrableTargetInfo,
 return PointerWidth;
   }
 
+  /// Return true, iff unaligned accesses are a single instruction (rather than

rjmccall wrote:

```suggestion
  /// Return true iff unaligned accesses are a single instruction (rather than
```

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-01 Thread John McCall via cfe-commits


@@ -394,33 +412,155 @@ void CGRecordLowering::accumulateFields() {
   : getStorageType(*Field),
   *Field));
   ++Field;
-} else {
-  ++Field;
 }
   }
 }
 
-void
-CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
-  RecordDecl::field_iterator FieldEnd) {
-  // Run stores the first element of the current run of bitfields.  FieldEnd is
-  // used as a special value to note that we don't have a current run.  A
-  // bitfield run is a contiguous collection of bitfields that can be stored in
-  // the same storage block.  Zero-sized bitfields and bitfields that would
-  // cross an alignment boundary break a run and start a new one.
-  RecordDecl::field_iterator Run = FieldEnd;
-  // Tail is the offset of the first bit off the end of the current run.  It's
-  // used to determine if the ASTRecordLayout is treating these two bitfields 
as
-  // contiguous.  StartBitOffset is offset of the beginning of the Run.
-  uint64_t StartBitOffset, Tail = 0;
+namespace {
+
+// A run of bitfields assigned to the same access unit -- the size of memory
+// loads & stores.
+class BitFieldAccessUnit {
+  RecordDecl::field_iterator Begin; // Field at start of this access unit.
+  RecordDecl::field_iterator End;   // Field just after this access unit.
+
+  CharUnits StartOffset; // Starting offset in the containing record.
+  CharUnits EndOffset;   // Finish offset (exclusive) in the containing record.
+
+  bool ContainsVolatile; // This access unit contains a volatile bitfield.
+
+public:
+  // End barrier constructor.
+  BitFieldAccessUnit(RecordDecl::field_iterator F, CharUnits Offset,
+ bool Volatile = false)
+  : Begin(F), End(F), StartOffset(Offset), EndOffset(Offset),
+ContainsVolatile(Volatile) {}
+
+  // Collect contiguous bitfields into an access unit.
+  BitFieldAccessUnit(RecordDecl::field_iterator FieldBegin,
+ RecordDecl::field_iterator FieldEnd,
+ const CGRecordLowering &CGRL);

rjmccall wrote:

The structure here is pretty weird to me.  This class seems to be mostly a 
helper class for breaking down the bit-field run into access units, but it's 
also a short-term representation of those access units?  If having a helper 
class for breaking the run is useful, let's separate that and have it generate 
these access units.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-01 Thread John McCall via cfe-commits

https://github.com/rjmccall edited 
https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-01 Thread John McCall via cfe-commits

https://github.com/rjmccall commented:

I think using the existing information as a default is fine, but please 
continue to call the hook something like `hasCheapUnalignedAccess()` to 
preserve the intent, even if it always just calls the other hook.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
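
As a rough sketch of the shape being asked for here (the class and hook names are illustrative assumptions, not the actual patch):

```cpp
// Keep the intent in the name, even if the default just forwards to a more
// general "are unaligned accesses cheap?" query on the target.
class TargetInfoSketch {
public:
  // Assumed pre-existing query; real targets would override it.
  virtual bool hasUnalignedAccess() const { return false; }

  // What CGRecordLowering would consult when deciding whether an access
  // unit may straddle a natural alignment boundary.
  virtual bool hasCheapUnalignedBitFieldAccess() const {
    return hasUnalignedAccess();
  }

  virtual ~TargetInfoSketch() = default;
};
```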


[clang] [cfi][CodeGen] Call SetLLVMFunctionAttributes{, ForDefinition} on __cf… (PR #78253)

2024-03-01 Thread John McCall via cfe-commits

https://github.com/rjmccall approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/78253
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [ObjC] Check entire chain of superclasses to see if class layout is statically known (PR #81335)

2024-02-26 Thread John McCall via cfe-commits

https://github.com/rjmccall approved this pull request.

Thanks, that looks great.

https://github.com/llvm/llvm-project/pull/81335
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-02-23 Thread John McCall via cfe-commits

rjmccall wrote:

> > On the target hook, it's a shame we can't easily get this information from 
> > LLVM. I believe it's already there — `TargetLowering` has an 
> > `allowsMisalignedMemoryAccesses` method that includes some approximation of 
> > how fast a particular access would be. In practice, it seems to be quite 
> > complex and often specific to the type and subtarget. Maybe it'd be worth 
> > starting a conversation with LLVM folks to see if this could reasonably be 
> > lifted somewhere we could use it?
> 
> Agreed, I did look at that, and AFAICT the allowsMisalignedMemoryAccess is a 
> property of the llvm subtarget, which can, IIUC, change on a per-function 
> basis? One at least needs a function codegen context around? And, clang needs 
> to generate the llvm-structure in non-function contexts (eg namespace-scope 
> variables). I'd really like to not tangle this particular change with a 
> complex reorg.

Yeah, LLVM supports changing subtarget options on a per-function basis.  We 
would presumably make the query based on the global setting.

Anyway, getting this information from LLVM doesn't seem tractable in the 
short-to-medium term; it's just unfortunate.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [ObjC] Check entire chain of superclasses to see if class layout can be statically known (PR #81335)

2024-02-22 Thread John McCall via cfe-commits


@@ -1,6 +1,12 @@
 // RUN: %clang_cc1 -triple x86_64-apple-macosx10.14.0 -emit-llvm %s -o - | 
FileCheck %s
 
 // CHECK: @"OBJC_IVAR_$_StaticLayout.static_layout_ivar" = hidden constant i64 
20
+// CHECK: @"OBJC_IVAR_$_SuperClass.superClassIvar" = hidden constant i64 20

rjmccall wrote:

Please add a test for the access to this.  It'll be in a synthesized method.

https://github.com/llvm/llvm-project/pull/81335
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [ObjC] Check entire chain of superclasses to see if class layout can be statically known (PR #81335)

2024-02-22 Thread John McCall via cfe-commits


@@ -17,9 +23,71 @@ @implementation StaticLayout {
 -(void)meth {
   static_layout_ivar = 0;
   // CHECK-NOT: load i64, ptr @"OBJC_IVAR_$_StaticLayout
+  // CHECK: getelementptr inbounds i8, ptr %0, i64 20
 }
 @end
 
+// Ivars declared in the @interface
+@interface SuperClass : NSObject
+@property (nonatomic, assign) int superClassProperty;
+@end
+
+@implementation SuperClass {
+  int superClassIvar; // Declare an ivar
+}
+- (void)superClassMethod {
+_superClassProperty = 42;
+superClassIvar = 10;
+// CHECK-NOT: load i64, ptr @"OBJC_IVAR_$_SuperClass

rjmccall wrote:

Please add CHECK-LABEL lines for the `define` lines for each of the method 
definitions containing these ivar accesses.  You can look at 
test/CodeGenObjC/direct-methods.m for the pattern I'm looking for here.

https://github.com/llvm/llvm-project/pull/81335
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [ObjC] Check entire chain of superclasses to see if class layout can be statically known (PR #81335)

2024-02-22 Thread John McCall via cfe-commits


@@ -1,6 +1,12 @@
 // RUN: %clang_cc1 -triple x86_64-apple-macosx10.14.0 -emit-llvm %s -o - | 
FileCheck %s
 
 // CHECK: @"OBJC_IVAR_$_StaticLayout.static_layout_ivar" = hidden constant i64 
20
+// CHECK: @"OBJC_IVAR_$_SuperClass.superClassIvar" = hidden constant i64 20
+// CHECK: @"OBJC_IVAR_$_SuperClass._superClassProperty" = hidden constant i64 
24
+// CHECK: @"OBJC_IVAR_$_IntermediateClass.intermediateClassIvar" = constant 
i64 32
+// CHECK: @"OBJC_IVAR_$_IntermediateClass.intermediateClassIvar2" = constant 
i64 40

rjmccall wrote:

Please add a test for an access to this.

https://github.com/llvm/llvm-project/pull/81335
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [ObjC] Check entire chain of superclasses to see if class layout can be statically known (PR #81335)

2024-02-19 Thread John McCall via cfe-commits

https://github.com/rjmccall commented:

Thanks, the breadth of tests looks good.  Please improve the actual testing, 
though — CHECK lines are testing the IR output file as a whole, so now that 
there are multiple tests in this file, we really need to make these tests more 
specific.

- First, please add CHECK-LABEL lines for the functions you're trying to test 
so that we know that the checks are actually applying to the code in that 
function.
- Second, I know the existing test is doing this CHECK-NOT thing, but it should 
really always have been testing that the ivar access actually uses the right 
offset; please do that, both for your new tests and the existing one.
- Finally, please make sure you test at least one access for each of the ivars. 
 Putting accesses in separate methods will help with ensuring that you're 
testing each specifically.

As a procedural note, I was going to review this last week after you responded 
to the review, but then I saw that you kept uploading new versions of the 
patch, so I held off.  I certainly understand needing to update the patch a few 
times; I do that myself all the time.  Just make sure you give me a heads up 
that the patch has been updated and is actually ready for review: either hold 
your response until your patch is ready, or make a short comment like "Sorry 
about that, patch is ready for review now." when the updates are done.

https://github.com/llvm/llvm-project/pull/81335
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)

2024-02-15 Thread John McCall via cfe-commits

rjmccall wrote:

> @rjmccall fixed feedback
> 
> > You should handle compound assignments.
> 
> I believe I already support compound assignments (and PrePostIncDec), if I am 
> not missing something? 

Ah, maybe you are, sorry.  I was expecting it in `CGExpr.cpp` and forgot that 
that code was actually in `CGExprScalar.cpp`.

https://github.com/llvm/llvm-project/pull/75481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Revert "[X86][clang] Lift _BitInt() supported max width." (PR #81175)

2024-02-15 Thread John McCall via cfe-commits

rjmccall wrote:

There's no such thing as "this is impossible to do".  Clang, as the frontend, 
is responsible for emitting IR that gets the effect we want.  If that means 
contorting the IR we generate to do ugly things like memcpying into a 
temporary, that's our life.

I am not surprised that we do not get any sort of reliable behavior from 
backends for `iNNN` when that's not architecturally legal.  I'm pretty sure I 
raised that concern during the review.

https://github.com/llvm/llvm-project/pull/81175
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [Sanitizer] add signed-integer-wrap sanitizer (PR #80089)

2024-02-14 Thread John McCall via cfe-commits

rjmccall wrote:

> > > > Why not just enforce -fsanitize=signed-integer-overflow with -fwrapv? I 
> > > > suspect it's just overlook, and not intentional behavior.
> > > 
> > > 
> > > +1
> > > We should consider this direction
> > 
> > 
> > The UB-vs-non-UB seemed to be a really specific goal in the existing code. 
> > i.e. that the sanitizer was disabled didn't look like an accident. For 
> > people using this to find _only_ UB, this would be a behavioral change, so 
> > to me it seems like a separate name makes the most sense. Anyone wanting 
> > wrap-around checking can use -wrap, and anyone wanting UB checking can use 
> > -overflow.
> 
> Isn't this still UB even with -fwrapv? UB is a language feature, not compiler.

`-fwrapv` is essentially a language dialect that defines the behavior of 
integer wraparound.  It is no longer UB in compilations using that mode.

https://github.com/llvm/llvm-project/pull/80089
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
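
A small example of the dialect point being made; the flags in the comments are the ones discussed in this thread.

```cpp
#include <limits.h>

int next(int x) {
  // In the default dialect, signed overflow here is undefined behavior and
  // -fsanitize=signed-integer-overflow reports next(INT_MAX) at runtime.
  // Under -fwrapv the addition is defined to wrap, so next(INT_MAX) simply
  // yields INT_MIN and there is no UB to report.
  return x + 1;
}
```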


[clang] [compiler-rt] [Sanitizer] add signed-integer-wrap sanitizer (PR #80089)

2024-02-14 Thread John McCall via cfe-commits

https://github.com/rjmccall commented:

Yeah, LGTM.

https://github.com/llvm/llvm-project/pull/80089
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)

2024-02-13 Thread John McCall via cfe-commits


@@ -122,6 +122,26 @@ Non-comprehensive list of changes in this release
 
 New Compiler Flags
 --
+- ``-fsanitize=implicit-unsigned-bitfield-truncation`` catches implicit
+  unsigned conversions involving bitfields.
+- ``-fsanitize=implicit-signed-bitfield-truncation`` catches implicit
+  signed conversions involving bitfields.
+- ``-fsanitize=implicit-bitfield-sign-change`` catches implicit
+  conversions involving bitfields that result in a sign change.
+- ``-fsanitize=implicit-bitfield-truncation`` a group to include both
+  ``-fsanitize=implicit-unsigned-bitfield-truncation`` and
+  ``-fsanitize=implicit-signed-bitfield-truncation``.
+- ``-fsanitize=implicit-bitfield-arithmetic-value-change`` a group to
+  include both ``implicit-signed-bitfield-truncation`` and
+  ``implicit-bitfield-sign-change``.
+- ``-fsanitize=implicit-bitfield-conversion`` a group to include
+  ``-fsanitize=implicit-unsigned-bitfield-truncation``,
+  ``-fsanitize=implicit-signed-bitfield-truncation`` and
+  ``implicit-bitfield-sign-change``.
+- ``-fsanitize=implicit-integer-conversion`` a group to include
+  ``-fsanitize=implicit-unsigned-integer-truncation``,
+  ``-fsanitize=implicit-signed-integer-truncation`` and
+  ``implicit-integer-sign-change``.

rjmccall wrote:

Do we really need a million different flags here?  The reason we have different 
checks for different signednesses with the normal conversion checks, I think, 
is because the standard actually treats them differently: some of those checks 
are actually checking for UB.  Since there's no UB here in any case, I think 
you may be overthinking this.  Consider just having 
`-fsanitize=implicit-bitfield-conversion`, which covers every case in which the 
value we'd read out of the bitfield is not the value that we assigned into it.

https://github.com/llvm/llvm-project/pull/75481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
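
A minimal illustration of the stores such a check would flag; the field widths and values are mine, chosen so the value read back differs from the value assigned.

```cpp
struct S {
  int sbf : 3;      // representable range: -4..3
  unsigned ubf : 3; // representable range: 0..7
};

void demo(S &s) {
  s.sbf = 10; // reads back as 2: truncation
  s.sbf = 7;  // reads back as -1: sign change after truncation
  s.ubf = 9;  // reads back as 1: truncation
}
```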


[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)

2024-02-13 Thread John McCall via cfe-commits

https://github.com/rjmccall commented:

You should handle compound assignments.

https://github.com/llvm/llvm-project/pull/75481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)

2024-02-13 Thread John McCall via cfe-commits


@@ -5570,11 +5570,44 @@ LValue CodeGenFunction::EmitBinaryOperatorLValue(const 
BinaryOperator *E) {
   break;
 }
 
-RValue RV = EmitAnyExpr(E->getRHS());
+llvm::Value *Previous = nullptr;
+RValue RV;
+QualType SrcType = E->getRHS()->getType();
+// If LHS refers to a bitfield we want to retrieve the value before
+// implicit conversion between the bitfield type and the RHS type
+// and evaluate RHS without integer sanitizer checks (if passed)
+if (auto *ICE = RetrieveImplicitCastExprForBitfieldSanitizer(E)) {

rjmccall wrote:

When we say "conversion between A and B", it typically means A is being 
converted to B, but I think you're actually talking about the reverse here.

Please pull the basic conditions for whether to emit this check into this 
function (LHS is a bit-field + the sanitizer option); it will make the data 
flow much clearer.  You can just evaluate them once and then check them again 
below.

Please test for a bit-field LHS before checking for the sanitizer option.

The function call here should be something like 
`getOriginalRHSForBitfieldSanitizer`.

https://github.com/llvm/llvm-project/pull/75481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)

2024-02-13 Thread John McCall via cfe-commits


@@ -1097,6 +1112,27 @@ void ScalarExprEmitter::EmitIntegerTruncationCheck(Value 
*Src, QualType SrcType,
 {Src, Dst});
 }
 
+static llvm::Value *EmitIsNegativeTestHelper(Value *V, QualType VType,
+ const char *Name,
+ CGBuilderTy &Builder) {
+  // NOTE: zero value is considered to be non-negative.

rjmccall wrote:

This is self-evident and does not need to be in a comment.  Actually, most of 
the comments in this function are pretty unnecessary; you can just say 
something like "If the type is unsigned, the value is never negative." and 
leave everything else self-documenting.

https://github.com/llvm/llvm-project/pull/75481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)

2024-02-13 Thread John McCall via cfe-commits


@@ -324,6 +326,19 @@ class ScalarExprEmitter
   void EmitIntegerSignChangeCheck(Value *Src, QualType SrcType, Value *Dst,
   QualType DstType, SourceLocation Loc);
 
+  /// Emit a check that an [implicit] truncation of a bitfield does not
+  /// discard any bits. It is not UB, so we use the value after truncation.

rjmccall wrote:

The last sentence of the comment seems out of place here (and on the next 
function).

https://github.com/llvm/llvm-project/pull/75481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)

2024-02-13 Thread John McCall via cfe-commits


@@ -5570,11 +5570,44 @@ LValue CodeGenFunction::EmitBinaryOperatorLValue(const 
BinaryOperator *E) {
   break;
 }
 
-RValue RV = EmitAnyExpr(E->getRHS());
+llvm::Value *Previous = nullptr;
+RValue RV;
+QualType SrcType = E->getRHS()->getType();
+// If LHS refers to a bitfield we want to retrieve the value before
+// implicit conversion between the bitfield type and the RHS type
+// and evaluate RHS without integer sanitizer checks (if passed)
+if (auto *ICE = RetrieveImplicitCastExprForBitfieldSanitizer(E)) {
+  SrcType = ICE->getSubExpr()->getType();
+  Previous = EmitScalarExpr(ICE->getSubExpr());
+  // Pass default ScalarConversionOpts so that sanitizer check is
+  // not emitted here
+  llvm::Value *RHS = EmitScalarConversion(Previous, SrcType, 
ICE->getType(),
+  ICE->getExprLoc());
+  RV = RValue::get(RHS);
+}
+
+if (!Previous)
+  RV = EmitAnyExpr(E->getRHS());
+
 LValue LV = EmitCheckedLValue(E->getLHS(), TCK_Store);
+
 if (RV.isScalar())
   EmitNullabilityCheck(LV, RV.getScalarVal(), E->getExprLoc());
-EmitStoreThroughLValue(RV, LV);
+
+// Passing a pointer EmitStoreThroughBitfieldLValue will emit the result
+// If sanitizer checks are not used, we do not use the result
+// Hence, use EmitStoreThroughLValue instead
+if (SanOpts.hasOneOf(SanitizerKind::ImplicitBitfieldConversion) &&
+LV.isBitField() && RV.isScalar()) {

rjmccall wrote:

Integer types always use scalar emission.

https://github.com/llvm/llvm-project/pull/75481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)

2024-02-13 Thread John McCall via cfe-commits


@@ -1097,6 +1112,27 @@ void ScalarExprEmitter::EmitIntegerTruncationCheck(Value 
*Src, QualType SrcType,
 {Src, Dst});
 }
 
+static llvm::Value *EmitIsNegativeTestHelper(Value *V, QualType VType,
+ const char *Name,
+ CGBuilderTy &Builder) {
+  // NOTE: zero value is considered to be non-negative.
+  // Is this value a signed type?
+  bool VSigned = VType->isSignedIntegerOrEnumerationType();
+  llvm::Type *VTy = V->getType();
+  if (!VSigned) {
+// If the value is unsigned, then it is never negative.
+// FIXME: can we encounter non-scalar VTy here?

rjmccall wrote:

No, you cannot.

https://github.com/llvm/llvm-project/pull/75481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)

2024-02-13 Thread John McCall via cfe-commits


@@ -1239,6 +1257,228 @@ void 
ScalarExprEmitter::EmitIntegerSignChangeCheck(Value *Src, QualType SrcType,
 {Src, Dst});
 }
 
+// Should be called within CodeGenFunction::SanitizerScope RAII scope.
+// Returns 'i1 false' when the truncation Src -> Dst was lossy.
+static std::pair>
+EmitBitfieldTruncationCheckHelper(Value *Src, QualType SrcType, Value *Dst,
+  QualType DstType, CGBuilderTy &Builder) {
+
+  llvm::Type *SrcTy = Src->getType();
+  llvm::Type *DstTy = Dst->getType();
+  (void)SrcTy; // Only used in assert()
+  (void)DstTy; // Only used in assert()

rjmccall wrote:

Then you should surround these variables with `#ifndef NDEBUG`.  But mostly you 
don't need to do this; `CreateIntCast` will assert this for you.

https://github.com/llvm/llvm-project/pull/75481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)

2024-02-13 Thread John McCall via cfe-commits


@@ -1035,7 +1050,7 @@ EmitIntegerTruncationCheckHelper(Value *Src, QualType 
SrcType, Value *Dst,
   }
 
   llvm::Value *Check = nullptr;
-  // 1. Extend the truncated value back to the same width as the Src.
+  // 1. Convert the Dst back to the same type as Src.

rjmccall wrote:

This is not an improvement in the comment.

https://github.com/llvm/llvm-project/pull/75481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)

2024-02-13 Thread John McCall via cfe-commits

https://github.com/rjmccall edited 
https://github.com/llvm/llvm-project/pull/75481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HLSL] Vector standard conversions (PR #71098)

2024-02-13 Thread John McCall via cfe-commits

https://github.com/rjmccall approved this pull request.

Thanks, LGTM

https://github.com/llvm/llvm-project/pull/71098
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-02-12 Thread John McCall via cfe-commits

rjmccall wrote:

> @rjmccall thanks for your comments. Here's a reimplementation using a single 
> loop -- the downside is that we may end up repeating the initial grouping 
> scan, if we search more than 1 Access Unit ahead and fail to find a 'good' 
> access unit. This update changes the algorithm slightly, as I realized that 
> `struct A { short a: 16; char b:8; int c;};` and `struct B { char b: 8; short 
> a:16; int c;};` should have the same (32-bit) access unit. Originally only 
> `A` got that.

Thanks.  Sorry about the delay; I'll try to take a look within the next day.

> 1. I noticed that several `CGRecordLowering` member fns could either be 
> `const` or `static` -- would you like that as a separate NFC PR?

Yes, please.

> 2. I have corrected the error about merging across zero-sized members.
> 3. It may not be obvious from GH, but this PR breaks down to 3 (or 4) 
> separate commits:
>a) A set of test cases, marked up for the current scheme
>b) A new Target hook indicating whether unaligned accesses are cheap
>c) [optional] the CGRecordLowering change just mentioned
>d) the algorithm change, which updates those tests and makes it easier to 
> see how the behaviour changes.
>Do you want that commit structure maintained?

That sounds like good commit structure.

On the target hook, it's a shame we can't easily get this information from 
LLVM.  I believe it's already there — `TargetLowering` has an 
`allowsMisalignedMemoryAccesses` method that includes some approximation of how 
fast a particular access would be.  In practice, it seems to be quite complex 
and often specific to the type and subtarget.  Maybe it'd be worth starting a 
conversation with LLVM folks to see if this could reasonably be lifted 
somewhere we could use it?
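
For concreteness, the clang-side hook described in (b) might look roughly like the sketch below; the name, placement, and default are placeholders, not the PR's actual interface:

// Hypothetical sketch of a clang TargetInfo-style query for bit-field layout;
// the identifier hasCheapUnalignedBitFieldAccess is assumed for illustration.
class TargetInfoSketch {
public:
  virtual ~TargetInfoSketch() = default;
  // True if unaligned scalar loads/stores are cheap on this target, so
  // bit-field access units may straddle natural alignment boundaries.
  virtual bool hasCheapUnalignedBitFieldAccess() const { return false; }
};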

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HLSL] Vector standard conversions (PR #71098)

2024-02-12 Thread John McCall via cfe-commits


@@ -361,6 +361,9 @@ CAST_OPERATION(AddressSpaceConversion)
 // Convert an integer initializer to an OpenCL sampler.
 CAST_OPERATION(IntToOCLSampler)
 
+// Truncate a vector type (HLSL only).
+CAST_OPERATION(HLSLVectorTruncation)

rjmccall wrote:

Okay.  I think this patch overall looks fine to land, except please do expand 
on the comment here to explain what truncating a vector type means (dropping 
elements from the end).
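
As a quick illustration of what that comment should convey, using clang's ext_vector_type extension rather than HLSL syntax:

// Sketch only: HLSLVectorTruncation keeps the leading elements and drops the
// trailing ones, e.g. a 4-element vector converting to a 2-element vector.
typedef float float4 __attribute__((ext_vector_type(4)));
typedef float float2 __attribute__((ext_vector_type(2)));

float2 truncateVector(float4 v) {
  // {v.x, v.y, v.z, v.w} -> {v.x, v.y}
  return v.xy;
}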

https://github.com/llvm/llvm-project/pull/71098
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [ObjC] Check entire chain of superclasses to see if class layout can be statically known (PR #81335)

2024-02-12 Thread John McCall via cfe-commits

rjmccall wrote:

> Alright. I think the LTO should be done as a follow-up PR since it will 
> likely take some time and multiple reviews to get right.

Yes, that's a fine plan.  I expect that that'll be a significant amount of work.

https://github.com/llvm/llvm-project/pull/81335
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

