[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-13 Thread via cfe-commits

github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. 
:warning:



You can test this locally with the following command:


``bash
git-clang-format --diff 55d4816393f897054a4721920502d45c645edf1d 
cb86c3c12959513309ccf97592d602ba4da36d1d -- 
clang/test/CodeGen/object-size-sub-object.c clang/lib/CodeGen/CGBuiltin.cpp 
clang/lib/CodeGen/CGExpr.cpp clang/test/CodeGen/attr-counted-by.c 
clang/test/CodeGen/catch-undef-behavior.c clang/test/CodeGen/object-size.c 
clang/test/CodeGen/object-size.cpp clang/test/CodeGen/pass-object-size.c 
clang/test/CodeGen/sanitize-recover.c 
llvm/include/llvm/Analysis/MemoryBuiltins.h 
llvm/lib/Analysis/MemoryBuiltins.cpp llvm/lib/IR/AutoUpgrade.cpp 
llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
``





View the diff from clang-format here.


``diff
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index d8d4aa5288..7461d0ee00 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -1244,8 +1244,8 @@ CodeGenFunction::emitBuiltinObjectSize(const Expr *E, 
unsigned Type,
   // it's set, a closest surrounding subobject is considered the object a
   // pointer points to.
   Value *WholeObj = Builder.getInt1((Type & 1) == 0);
-  return Builder.CreateCall(F, {Ptr, Min, NullIsUnknown, Dynamic, WholeObj,
-SubobjectSize});
+  return Builder.CreateCall(
+  F, {Ptr, Min, NullIsUnknown, Dynamic, WholeObj, SubobjectSize});
 }
 
 namespace {
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index b0aa3579bd..d94a775cab 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -750,8 +750,8 @@ void CodeGenFunction::EmitTypeCheck(TypeCheckKind TCK, 
SourceLocation Loc,
   llvm::Value *WholeObj = Builder.getTrue();
   llvm::Value *SubobjectSize = Builder.getInt64(0);
   llvm::Value *LargeEnough = Builder.CreateICmpUGE(
-  Builder.CreateCall(F, {Ptr, Min, NullIsUnknown, Dynamic, WholeObj,
- SubobjectSize}),
+  Builder.CreateCall(
+  F, {Ptr, Min, NullIsUnknown, Dynamic, WholeObj, SubobjectSize}),
   Size);
   Checks.push_back(std::make_pair(LargeEnough, SanitizerKind::ObjectSize));
 }

``




https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-12 Thread Richard Smith via cfe-commits


@@ -26996,18 +26996,38 @@ class, structure, array, or other object.
 Arguments:
 ""
 
-The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a
-pointer to or into the ``object``. The second argument determines whether
-``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size 
is
-unknown. The third argument controls how ``llvm.objectsize`` acts when ``null``
-in address space 0 is used as its pointer argument. If it's ``false``,
-``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, 
if
-the ``null`` is in a non-zero address space or if ``true`` is given for the
-third argument of ``llvm.objectsize``, we assume its size is unknown. The 
fourth
-argument to ``llvm.objectsize`` determines if the value should be evaluated at
-runtime.
+The ``llvm.objectsize`` intrinsic takes six arguments:
+
+- The first argument is a pointer to or into the ``object``.
+- The second argument controls which value to return when the size is unknown:
+
+  - If it's ``false``, ``llvm.objectsize`` returns ``-1``.
+  - If it's ``true``, ``llvm.objectsize`` returns ``0``.
+
+- The third argument controls how ``llvm.objectsize`` acts when ``null`` in
+  address space 0 is used as its pointer argument:
+
+  - If it's ``false``, ``llvm.objectsize`` reports 0 bytes available when given
+``null``.
+  - If it's ``true``, or the ``null`` pointer is in a non-zero address space,
+the size is assumed to be unknown.
+
+- The fourth argument to ``llvm.objectsize`` determines if the value should be
+  evaluated at runtime.
+- The fifth argument controls which size ``llvm.objectsize`` returns:
+
+  - If it's ``false``, ``llvm.objectsize`` returns the size of the closest
+surrounding subobject.
+  - If it's ``true``, ``llvm.objectsize`` returns the size of the whole object.
+
+- If non-zero, the sixth and seventh arguments encode the size and offset
+  information, respectively, of the original subobject's layout and is used
+  when the fifth argument is ``false``.
+- The seventh argument encodes the offset information of the original
+  subobject's layout and is used when the fifth argument is ``false``.

zygoloid wrote:

> > in some simple test cases it does appear to work.
> 
> Please don't be insulting. I didn't just run a couple of "simple" tests and 
> called it a day. I had many rather convoluted examples and they all worked.

I apologize, I didn't mean to imply you hadn't been diligent here.

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-12 Thread Bill Wendling via cfe-commits


@@ -26996,18 +26996,38 @@ class, structure, array, or other object.
 Arguments:
 ""
 
-The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a
-pointer to or into the ``object``. The second argument determines whether
-``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size 
is
-unknown. The third argument controls how ``llvm.objectsize`` acts when ``null``
-in address space 0 is used as its pointer argument. If it's ``false``,
-``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, 
if
-the ``null`` is in a non-zero address space or if ``true`` is given for the
-third argument of ``llvm.objectsize``, we assume its size is unknown. The 
fourth
-argument to ``llvm.objectsize`` determines if the value should be evaluated at
-runtime.
+The ``llvm.objectsize`` intrinsic takes six arguments:
+
+- The first argument is a pointer to or into the ``object``.
+- The second argument controls which value to return when the size is unknown:
+
+  - If it's ``false``, ``llvm.objectsize`` returns ``-1``.
+  - If it's ``true``, ``llvm.objectsize`` returns ``0``.
+
+- The third argument controls how ``llvm.objectsize`` acts when ``null`` in
+  address space 0 is used as its pointer argument:
+
+  - If it's ``false``, ``llvm.objectsize`` reports 0 bytes available when given
+``null``.
+  - If it's ``true``, or the ``null`` pointer is in a non-zero address space,
+the size is assumed to be unknown.
+
+- The fourth argument to ``llvm.objectsize`` determines if the value should be
+  evaluated at runtime.
+- The fifth argument controls which size ``llvm.objectsize`` returns:
+
+  - If it's ``false``, ``llvm.objectsize`` returns the size of the closest
+surrounding subobject.
+  - If it's ``true``, ``llvm.objectsize`` returns the size of the whole object.
+
+- If non-zero, the sixth and seventh arguments encode the size and offset
+  information, respectively, of the original subobject's layout and is used
+  when the fifth argument is ``false``.
+- The seventh argument encodes the offset information of the original
+  subobject's layout and is used when the fifth argument is ``false``.

bwendling wrote:

> in some simple test cases it does appear to work.

Please don't be insulting. I didn't just run a couple of "simple" tests and 
called it a day. I had many rather convoluted examples and they all worked.

> Sorry, that was a typo in my example: the call in h() should be f(&b.a).

Okay. This patch generates the correct answer (modified so it's calculated by 
the middle end):

```
$  clang -O2 ~/llvm/a.c && ./a.out
&((char *)&p->n)[idx]:
__builtin_dynamic_object_size(&((char *)&p->n)[idx], 1) = 3

&((char *)&p->n)[idx]:
__builtin_dynamic_object_size(&((char *)&p->n)[idx], 1) = 3
```

> I mean, pass a pointer to the start of the subobject. For example, for 
> p->arr[i], you'd pass in &p->arr.

I don't think that's necessary. If we're given the original size and the middle 
end is calculating the final size, it may be sufficient to ignore the offset 
all together for sub-objects. I'll see if it works.

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-12 Thread Richard Smith via cfe-commits

https://github.com/zygoloid edited 
https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-12 Thread Richard Smith via cfe-commits


@@ -26996,18 +26996,38 @@ class, structure, array, or other object.
 Arguments:
 ""
 
-The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a
-pointer to or into the ``object``. The second argument determines whether
-``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size 
is
-unknown. The third argument controls how ``llvm.objectsize`` acts when ``null``
-in address space 0 is used as its pointer argument. If it's ``false``,
-``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, 
if
-the ``null`` is in a non-zero address space or if ``true`` is given for the
-third argument of ``llvm.objectsize``, we assume its size is unknown. The 
fourth
-argument to ``llvm.objectsize`` determines if the value should be evaluated at
-runtime.
+The ``llvm.objectsize`` intrinsic takes six arguments:
+
+- The first argument is a pointer to or into the ``object``.
+- The second argument controls which value to return when the size is unknown:
+
+  - If it's ``false``, ``llvm.objectsize`` returns ``-1``.
+  - If it's ``true``, ``llvm.objectsize`` returns ``0``.
+
+- The third argument controls how ``llvm.objectsize`` acts when ``null`` in
+  address space 0 is used as its pointer argument:
+
+  - If it's ``false``, ``llvm.objectsize`` reports 0 bytes available when given
+``null``.
+  - If it's ``true``, or the ``null`` pointer is in a non-zero address space,
+the size is assumed to be unknown.
+
+- The fourth argument to ``llvm.objectsize`` determines if the value should be
+  evaluated at runtime.
+- The fifth argument controls which size ``llvm.objectsize`` returns:
+
+  - If it's ``false``, ``llvm.objectsize`` returns the size of the closest
+surrounding subobject.
+  - If it's ``true``, ``llvm.objectsize`` returns the size of the whole object.
+
+- If non-zero, the sixth and seventh arguments encode the size and offset
+  information, respectively, of the original subobject's layout and is used
+  when the fifth argument is ``false``.
+- The seventh argument encodes the offset information of the original
+  subobject's layout and is used when the fifth argument is ``false``.

zygoloid wrote:

> First you tell me that I can't use LLVM's IR to determine the subobject, even 
> though I did and it worked just fine

It definitely doesn't work fine, but sure, in some simple test cases it does 
appear to work.

> now you're saying that I can't use the front-end's knowledge about the 
> structure.

I'm not. I'm saying that you can't assume that the LLVM IR idea of the complete 
object lines up with some enclosing subobject you've found in the frontend. 
You're still trying to use the IR notion of complete object and subobjects, and 
that still doesn't work for the same reasons as before.

You can absolutely compute where the subobject is in the frontend, and pass 
that information onto LLVM. But you'll need to pass it on in a way that is 
meaningful to LLVM. For example, if you pass a pointer and size to the 
intrinsic describing the complete range of addresses covering the subobject, 
that should be fine. But an offset is not useful if the frontend and middle-end 
can't agree on what it's an offset relative to.

> In your example, you've explicitly lied to the compiler about the types being 
> passed in.

Sorry, that was a typo in my example: the call in `h()` should be `f(&b.a)`.

> I have no clue what you mean by pass a "pointer to `p->n` to the intrinsic" 
> as that's already the first argument in the intrinsic.

I mean, pass a pointer to the start of the subobject. For example, for 
`p->arr[i]`, you'd pass in `&p->arr`.

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-12 Thread Bill Wendling via cfe-commits


@@ -26996,18 +26996,38 @@ class, structure, array, or other object.
 Arguments:
 ""
 
-The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a
-pointer to or into the ``object``. The second argument determines whether
-``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size 
is
-unknown. The third argument controls how ``llvm.objectsize`` acts when ``null``
-in address space 0 is used as its pointer argument. If it's ``false``,
-``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, 
if
-the ``null`` is in a non-zero address space or if ``true`` is given for the
-third argument of ``llvm.objectsize``, we assume its size is unknown. The 
fourth
-argument to ``llvm.objectsize`` determines if the value should be evaluated at
-runtime.
+The ``llvm.objectsize`` intrinsic takes six arguments:
+
+- The first argument is a pointer to or into the ``object``.
+- The second argument controls which value to return when the size is unknown:
+
+  - If it's ``false``, ``llvm.objectsize`` returns ``-1``.
+  - If it's ``true``, ``llvm.objectsize`` returns ``0``.
+
+- The third argument controls how ``llvm.objectsize`` acts when ``null`` in
+  address space 0 is used as its pointer argument:
+
+  - If it's ``false``, ``llvm.objectsize`` reports 0 bytes available when given
+``null``.
+  - If it's ``true``, or the ``null`` pointer is in a non-zero address space,
+the size is assumed to be unknown.
+
+- The fourth argument to ``llvm.objectsize`` determines if the value should be
+  evaluated at runtime.
+- The fifth argument controls which size ``llvm.objectsize`` returns:
+
+  - If it's ``false``, ``llvm.objectsize`` returns the size of the closest
+surrounding subobject.
+  - If it's ``true``, ``llvm.objectsize`` returns the size of the whole object.
+
+- If non-zero, the sixth and seventh arguments encode the size and offset
+  information, respectively, of the original subobject's layout and is used
+  when the fifth argument is ``false``.
+- The seventh argument encodes the offset information of the original
+  subobject's layout and is used when the fifth argument is ``false``.

bwendling wrote:

You really need to make up your mind. First you tell me that I can't use LLVM's 
IR to determine the subobject, even though I did and it worked just fine, and 
now you're saying that I can't use the front-end's knowledge about the 
structure. In your example, you've explicitly lied to the compiler about the 
types being passed in. Unlike in GCC, we don't do any inlining in the 
front-end, so we can't properly handle this. I have no clue what you mean by 
pass a "pointer to `p->n` to the intrinsic" as that's already the first 
argument in the intrinsic.

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-12 Thread Richard Smith via cfe-commits


@@ -26996,18 +26996,38 @@ class, structure, array, or other object.
 Arguments:
 ""
 
-The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a
-pointer to or into the ``object``. The second argument determines whether
-``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size 
is
-unknown. The third argument controls how ``llvm.objectsize`` acts when ``null``
-in address space 0 is used as its pointer argument. If it's ``false``,
-``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, 
if
-the ``null`` is in a non-zero address space or if ``true`` is given for the
-third argument of ``llvm.objectsize``, we assume its size is unknown. The 
fourth
-argument to ``llvm.objectsize`` determines if the value should be evaluated at
-runtime.
+The ``llvm.objectsize`` intrinsic takes six arguments:
+
+- The first argument is a pointer to or into the ``object``.
+- The second argument controls which value to return when the size is unknown:
+
+  - If it's ``false``, ``llvm.objectsize`` returns ``-1``.
+  - If it's ``true``, ``llvm.objectsize`` returns ``0``.
+
+- The third argument controls how ``llvm.objectsize`` acts when ``null`` in
+  address space 0 is used as its pointer argument:
+
+  - If it's ``false``, ``llvm.objectsize`` reports 0 bytes available when given
+``null``.
+  - If it's ``true``, or the ``null`` pointer is in a non-zero address space,
+the size is assumed to be unknown.
+
+- The fourth argument to ``llvm.objectsize`` determines if the value should be
+  evaluated at runtime.
+- The fifth argument controls which size ``llvm.objectsize`` returns:
+
+  - If it's ``false``, ``llvm.objectsize`` returns the size of the closest
+surrounding subobject.
+  - If it's ``true``, ``llvm.objectsize`` returns the size of the whole object.
+
+- If non-zero, the sixth and seventh arguments encode the size and offset
+  information, respectively, of the original subobject's layout and is used
+  when the fifth argument is ``false``.
+- The seventh argument encodes the offset information of the original
+  subobject's layout and is used when the fifth argument is ``false``.

zygoloid wrote:

You're passing the difference between the address of some arbitrary class 
object and a field. The address of that class object is not the base address 
that the LLVM intrinsic will calculate. So the offset you're passing in is not 
meaningful. And there's no way it even *can* be meaningful. Consider something 
like this:

```c++
struct A {
  int n;
};
struct B {
  int data[100];
  A a;
};
int f(A *p) { return __builtin_dynamic_object_size(&p->n, 1); }
int g() {
  A a;
  return f(&a);
}
int h() {
  B b;
  return f(&b);
}
```

After inlining, the BDOS intrinsic in `g` will see a base pointer of `&a`, and 
the intrinsic in `h` will see a base pointer of `&b`. The offset from the base 
pointer to the `n` subobject is different in each case. So there's no way you 
can compute a constant offset in Clang's lowering and pass it to the intrinsic.

What you *can* do is pass a pointer to `p->n` to the intrinsic as the start of 
the subobject. That'll do the right thing regardless of what LLVM computes to 
be the complete enclosing object.

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-12 Thread Bill Wendling via cfe-commits


@@ -26996,18 +26996,38 @@ class, structure, array, or other object.
 Arguments:
 ""
 
-The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a
-pointer to or into the ``object``. The second argument determines whether
-``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size 
is
-unknown. The third argument controls how ``llvm.objectsize`` acts when ``null``
-in address space 0 is used as its pointer argument. If it's ``false``,
-``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, 
if
-the ``null`` is in a non-zero address space or if ``true`` is given for the
-third argument of ``llvm.objectsize``, we assume its size is unknown. The 
fourth
-argument to ``llvm.objectsize`` determines if the value should be evaluated at
-runtime.
+The ``llvm.objectsize`` intrinsic takes six arguments:
+
+- The first argument is a pointer to or into the ``object``.
+- The second argument controls which value to return when the size is unknown:
+
+  - If it's ``false``, ``llvm.objectsize`` returns ``-1``.
+  - If it's ``true``, ``llvm.objectsize`` returns ``0``.
+
+- The third argument controls how ``llvm.objectsize`` acts when ``null`` in
+  address space 0 is used as its pointer argument:
+
+  - If it's ``false``, ``llvm.objectsize`` reports 0 bytes available when given
+``null``.
+  - If it's ``true``, or the ``null`` pointer is in a non-zero address space,
+the size is assumed to be unknown.
+
+- The fourth argument to ``llvm.objectsize`` determines if the value should be
+  evaluated at runtime.
+- The fifth argument controls which size ``llvm.objectsize`` returns:
+
+  - If it's ``false``, ``llvm.objectsize`` returns the size of the closest
+surrounding subobject.
+  - If it's ``true``, ``llvm.objectsize`` returns the size of the whole object.
+
+- If non-zero, the sixth and seventh arguments encode the size and offset
+  information, respectively, of the original subobject's layout and is used
+  when the fifth argument is ``false``.
+- The seventh argument encodes the offset information of the original
+  subobject's layout and is used when the fifth argument is ``false``.

bwendling wrote:

I'm not passing a difference between two pointers; it's the offset from the 
start of the outermost `RecordDecl` to the object. The LLVM intrinsic 
recursively walks back through the instructions (that define the pointer) to 
try to calculate the value. It's rather convoluted and hard to read, because it 
involves two visitor classes called one from another. Eventually, it returns a 
Size / Offset pair (Offset from the start of the structure). At that point, I 
use the extra information I added to determine if those values are within the 
range of the sub-object. If it's outside of that range, I return 0. Otherwise, 
I calculate the remaining size after adjusting the offset (i.e. the offset is 
adjusted to be from the beginning of the sub-object rather than the start of 
the structure).

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-12 Thread Richard Smith via cfe-commits


@@ -26996,18 +26996,38 @@ class, structure, array, or other object.
 Arguments:
 ""
 
-The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a
-pointer to or into the ``object``. The second argument determines whether
-``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size 
is
-unknown. The third argument controls how ``llvm.objectsize`` acts when ``null``
-in address space 0 is used as its pointer argument. If it's ``false``,
-``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, 
if
-the ``null`` is in a non-zero address space or if ``true`` is given for the
-third argument of ``llvm.objectsize``, we assume its size is unknown. The 
fourth
-argument to ``llvm.objectsize`` determines if the value should be evaluated at
-runtime.
+The ``llvm.objectsize`` intrinsic takes six arguments:
+
+- The first argument is a pointer to or into the ``object``.
+- The second argument controls which value to return when the size is unknown:
+
+  - If it's ``false``, ``llvm.objectsize`` returns ``-1``.
+  - If it's ``true``, ``llvm.objectsize`` returns ``0``.
+
+- The third argument controls how ``llvm.objectsize`` acts when ``null`` in
+  address space 0 is used as its pointer argument:
+
+  - If it's ``false``, ``llvm.objectsize`` reports 0 bytes available when given
+``null``.
+  - If it's ``true``, or the ``null`` pointer is in a non-zero address space,
+the size is assumed to be unknown.
+
+- The fourth argument to ``llvm.objectsize`` determines if the value should be
+  evaluated at runtime.
+- The fifth argument controls which size ``llvm.objectsize`` returns:
+
+  - If it's ``false``, ``llvm.objectsize`` returns the size of the closest
+surrounding subobject.
+  - If it's ``true``, ``llvm.objectsize`` returns the size of the whole object.
+
+- If non-zero, the sixth and seventh arguments encode the size and offset
+  information, respectively, of the original subobject's layout and is used
+  when the fifth argument is ``false``.
+- The seventh argument encodes the offset information of the original
+  subobject's layout and is used when the fifth argument is ``false``.

zygoloid wrote:

As far as I can see, the "offset" value you're passing in is a difference 
between two pointers both of which are unknown to the middle-end -- it's the 
difference from the start of the two-levels-out enclosing class subobject to 
the one-level-out enclosing field subobject, I think. I could be 
misunderstanding something, but I don't see how you can use that information to 
get correct results from the builtin. Can you explain how the LLVM intrinsic 
uses this information?

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-11 Thread Bill Wendling via cfe-commits


@@ -1052,11 +1053,143 @@ CodeGenFunction::emitFlexibleArrayMemberSize(const 
Expr *E, unsigned Type,
   return Builder.CreateSelect(Cmp, Res, ConstantInt::get(ResType, 0, 
IsSigned));
 }
 
+namespace {
+
+/// SubobjectFinder - A simple visitor to find the "sub-object" pointed to by a
+/// __builtin_dynamic_object_size call. Information gathered from the
+/// sub-object is used by the back-end to determine the correct size when the
+/// 'TYPE' of the __bdos call has the least significant bit set (i.e. asking
+/// for the sub-object size).
+///
+/// The expectation is that we'll eventually hit one of three expression types:
+///
+///   1. DeclRefExpr - This is the expression for the base of the structure.
+///   2. MemberExpr - This is the field in the structure.
+///   3. CompoundLiteralExpr - This is for people who create something
+///  heretical like (struct foo has a flexible array member):
+///
+///(struct foo){ 1, 2 }.blah[idx];
+///
+/// All other expressions can be correctly handled with the current code.
+struct SubobjectFinder
+: public ConstStmtVisitor {
+  SubobjectFinder() = default;
+
+  
//======//
+  //Visitor Methods
+  
//======//
+
+  const Expr *VisitStmt(const Stmt *S) { return nullptr; }
+
+  const Expr *VisitDeclRefExpr(const DeclRefExpr *E) { return E; }
+  const Expr *VisitMemberExpr(const MemberExpr *E) { return E; }
+  const Expr *VisitCompoundLiteralExpr(const CompoundLiteralExpr *E) {
+return E;
+  }
+
+  const Expr *VisitArraySubscriptExpr(const ArraySubscriptExpr *E) {
+return Visit(E->getBase());
+  }
+  const Expr *VisitCastExpr(const CastExpr *E) {

bwendling wrote:

It seems to. See https://godbolt.org/z/4xaY4191o for an example (the `&((char 
*)&var.z.a)[argc]` example looks through them.

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-11 Thread Bill Wendling via cfe-commits


@@ -26996,18 +26996,38 @@ class, structure, array, or other object.
 Arguments:
 ""
 
-The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a
-pointer to or into the ``object``. The second argument determines whether
-``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size 
is
-unknown. The third argument controls how ``llvm.objectsize`` acts when ``null``
-in address space 0 is used as its pointer argument. If it's ``false``,
-``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, 
if
-the ``null`` is in a non-zero address space or if ``true`` is given for the
-third argument of ``llvm.objectsize``, we assume its size is unknown. The 
fourth
-argument to ``llvm.objectsize`` determines if the value should be evaluated at
-runtime.
+The ``llvm.objectsize`` intrinsic takes six arguments:
+
+- The first argument is a pointer to or into the ``object``.
+- The second argument controls which value to return when the size is unknown:
+
+  - If it's ``false``, ``llvm.objectsize`` returns ``-1``.
+  - If it's ``true``, ``llvm.objectsize`` returns ``0``.
+
+- The third argument controls how ``llvm.objectsize`` acts when ``null`` in
+  address space 0 is used as its pointer argument:
+
+  - If it's ``false``, ``llvm.objectsize`` reports 0 bytes available when given
+``null``.
+  - If it's ``true``, or the ``null`` pointer is in a non-zero address space,
+the size is assumed to be unknown.
+
+- The fourth argument to ``llvm.objectsize`` determines if the value should be
+  evaluated at runtime.
+- The fifth argument controls which size ``llvm.objectsize`` returns:
+
+  - If it's ``false``, ``llvm.objectsize`` returns the size of the closest
+surrounding subobject.
+  - If it's ``true``, ``llvm.objectsize`` returns the size of the whole object.
+
+- If non-zero, the sixth and seventh arguments encode the size and offset
+  information, respectively, of the original subobject's layout and is used
+  when the fifth argument is ``false``.
+- The seventh argument encodes the offset information of the original
+  subobject's layout and is used when the fifth argument is ``false``.

bwendling wrote:

A lot of your review seems to be based on this differing of opinion of what to 
do when indexing outside of the object currently being pointed to. Let's get 
this sorted out before I make changes...

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-11 Thread Bill Wendling via cfe-commits


@@ -26996,18 +26996,38 @@ class, structure, array, or other object.
 Arguments:
 ""
 
-The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a
-pointer to or into the ``object``. The second argument determines whether
-``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size 
is
-unknown. The third argument controls how ``llvm.objectsize`` acts when ``null``
-in address space 0 is used as its pointer argument. If it's ``false``,
-``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, 
if
-the ``null`` is in a non-zero address space or if ``true`` is given for the
-third argument of ``llvm.objectsize``, we assume its size is unknown. The 
fourth
-argument to ``llvm.objectsize`` determines if the value should be evaluated at
-runtime.
+The ``llvm.objectsize`` intrinsic takes six arguments:
+
+- The first argument is a pointer to or into the ``object``.
+- The second argument controls which value to return when the size is unknown:
+
+  - If it's ``false``, ``llvm.objectsize`` returns ``-1``.
+  - If it's ``true``, ``llvm.objectsize`` returns ``0``.
+
+- The third argument controls how ``llvm.objectsize`` acts when ``null`` in
+  address space 0 is used as its pointer argument:
+
+  - If it's ``false``, ``llvm.objectsize`` reports 0 bytes available when given
+``null``.
+  - If it's ``true``, or the ``null`` pointer is in a non-zero address space,
+the size is assumed to be unknown.
+
+- The fourth argument to ``llvm.objectsize`` determines if the value should be
+  evaluated at runtime.
+- The fifth argument controls which size ``llvm.objectsize`` returns:
+
+  - If it's ``false``, ``llvm.objectsize`` returns the size of the closest
+surrounding subobject.
+  - If it's ``true``, ``llvm.objectsize`` returns the size of the whole object.
+
+- If non-zero, the sixth and seventh arguments encode the size and offset
+  information, respectively, of the original subobject's layout and is used
+  when the fifth argument is ``false``.
+- The seventh argument encodes the offset information of the original
+  subobject's layout and is used when the fifth argument is ``false``.

bwendling wrote:

> I think the information you're passing in here isn't quite what we'd want. If 
> I'm reading the code correctly, the offset you're passing in is the field 
> offset relative to the immediately-enclosing record type, which doesn't give 
> us any information about either where the pointer is within the subobject, or 
> where the subobject is within the complete object, so this doesn't seem like 
> it can be enough information to produce a correct result.

That's the information which leads to the correct calculation. If you have a 
pointer like this:

```c
struct S {
  int a;
  char c[234];
  int b;
};

void foo(struct S *ptr) {
  size_t x = __builtin_dynamic_object_size(ptr->a[22], 1);
/* ... */
}
```

the value of `x` should be `0`. See https://godbolt.org/z/4xaY4191o for a list 
of examples that show this behavior (at least in GCC). Notice that this applies 
for the sub-object type only. If the __bdos value is `0`, then your behavior is 
the correct behavior.
   

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-11 Thread Richard Smith via cfe-commits


@@ -1052,11 +1053,143 @@ CodeGenFunction::emitFlexibleArrayMemberSize(const 
Expr *E, unsigned Type,
   return Builder.CreateSelect(Cmp, Res, ConstantInt::get(ResType, 0, 
IsSigned));
 }
 
+namespace {
+
+/// SubobjectFinder - A simple visitor to find the "sub-object" pointed to by a
+/// __builtin_dynamic_object_size call. Information gathered from the
+/// sub-object is used by the back-end to determine the correct size when the
+/// 'TYPE' of the __bdos call has the least significant bit set (i.e. asking
+/// for the sub-object size).
+///
+/// The expectation is that we'll eventually hit one of three expression types:
+///
+///   1. DeclRefExpr - This is the expression for the base of the structure.
+///   2. MemberExpr - This is the field in the structure.
+///   3. CompoundLiteralExpr - This is for people who create something
+///  heretical like (struct foo has a flexible array member):
+///
+///(struct foo){ 1, 2 }.blah[idx];
+///
+/// All other expressions can be correctly handled with the current code.
+struct SubobjectFinder
+: public ConstStmtVisitor {
+  SubobjectFinder() = default;
+
+  
//======//
+  //Visitor Methods
+  
//======//
+
+  const Expr *VisitStmt(const Stmt *S) { return nullptr; }
+
+  const Expr *VisitDeclRefExpr(const DeclRefExpr *E) { return E; }
+  const Expr *VisitMemberExpr(const MemberExpr *E) { return E; }
+  const Expr *VisitCompoundLiteralExpr(const CompoundLiteralExpr *E) {
+return E;
+  }
+
+  const Expr *VisitArraySubscriptExpr(const ArraySubscriptExpr *E) {
+return Visit(E->getBase());
+  }
+  const Expr *VisitCastExpr(const CastExpr *E) {

zygoloid wrote:

Does GCC look through explicit casts? I wonder if this should be restricted to 
`ImplicitCastExpr`s.

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-11 Thread Richard Smith via cfe-commits


@@ -1052,11 +1053,143 @@ CodeGenFunction::emitFlexibleArrayMemberSize(const 
Expr *E, unsigned Type,
   return Builder.CreateSelect(Cmp, Res, ConstantInt::get(ResType, 0, 
IsSigned));
 }
 
+namespace {
+
+/// SubobjectFinder - A simple visitor to find the "sub-object" pointed to by a
+/// __builtin_dynamic_object_size call. Information gathered from the
+/// sub-object is used by the back-end to determine the correct size when the
+/// 'TYPE' of the __bdos call has the least significant bit set (i.e. asking
+/// for the sub-object size).
+///
+/// The expectation is that we'll eventually hit one of three expression types:
+///
+///   1. DeclRefExpr - This is the expression for the base of the structure.
+///   2. MemberExpr - This is the field in the structure.
+///   3. CompoundLiteralExpr - This is for people who create something
+///  heretical like (struct foo has a flexible array member):
+///
+///(struct foo){ 1, 2 }.blah[idx];
+///
+/// All other expressions can be correctly handled with the current code.
+struct SubobjectFinder
+: public ConstStmtVisitor {
+  SubobjectFinder() = default;
+
+  
//======//
+  //Visitor Methods
+  
//======//
+
+  const Expr *VisitStmt(const Stmt *S) { return nullptr; }
+
+  const Expr *VisitDeclRefExpr(const DeclRefExpr *E) { return E; }
+  const Expr *VisitMemberExpr(const MemberExpr *E) { return E; }
+  const Expr *VisitCompoundLiteralExpr(const CompoundLiteralExpr *E) {
+return E;
+  }
+
+  const Expr *VisitArraySubscriptExpr(const ArraySubscriptExpr *E) {
+return Visit(E->getBase());
+  }
+  const Expr *VisitCastExpr(const CastExpr *E) {
+return Visit(E->getSubExpr());
+  }
+  const Expr *VisitParenExpr(const ParenExpr *E) {
+return Visit(E->getSubExpr());
+  }
+  const Expr *VisitUnaryAddrOf(const clang::UnaryOperator *E) {
+return Visit(E->getSubExpr());
+  }
+  const Expr *VisitUnaryDeref(const clang::UnaryOperator *E) {
+return Visit(E->getSubExpr());
+  }

zygoloid wrote:

I think you'll need to be more careful when walking through address-of / 
dereferences -- the set of things you should step over when traversing a 
pointer to an object is different from the set of things you should step over 
when traversing an object lvalue. For example, the bounds to use for 
`*p->member` will be computed as the bounds of `member`, which isn't correct. I 
think you could address this by either having separate traversals for pointers 
versus lvalues, or by avoiding (for example) stepping through lvalue-to-rvalue 
conversions when stepping over `CastExpr`s -- and in fact, the latter seems 
like a good idea in general, given that a `CastExpr` could do pretty much 
anything to the pointer / lvalue. In general, I think it's only really safe to 
step over casts that are a no-op for address purposes. Bitcasts seem OK, 
address space conversions seem OK, etc. but a lot of cast kinds are not going 
to be reasonable to step over.

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-11 Thread Richard Smith via cfe-commits


@@ -1052,11 +1053,143 @@ CodeGenFunction::emitFlexibleArrayMemberSize(const 
Expr *E, unsigned Type,
   return Builder.CreateSelect(Cmp, Res, ConstantInt::get(ResType, 0, 
IsSigned));
 }
 
+namespace {
+
+/// SubobjectFinder - A simple visitor to find the "sub-object" pointed to by a
+/// __builtin_dynamic_object_size call. Information gathered from the
+/// sub-object is used by the back-end to determine the correct size when the
+/// 'TYPE' of the __bdos call has the least significant bit set (i.e. asking
+/// for the sub-object size).
+///
+/// The expectation is that we'll eventually hit one of three expression types:
+///
+///   1. DeclRefExpr - This is the expression for the base of the structure.
+///   2. MemberExpr - This is the field in the structure.
+///   3. CompoundLiteralExpr - This is for people who create something
+///  heretical like (struct foo has a flexible array member):
+///
+///(struct foo){ 1, 2 }.blah[idx];
+///
+/// All other expressions can be correctly handled with the current code.
+struct SubobjectFinder
+: public ConstStmtVisitor {
+  SubobjectFinder() = default;
+
+  
//======//
+  //Visitor Methods
+  
//======//
+
+  const Expr *VisitStmt(const Stmt *S) { return nullptr; }
+
+  const Expr *VisitDeclRefExpr(const DeclRefExpr *E) { return E; }
+  const Expr *VisitMemberExpr(const MemberExpr *E) { return E; }
+  const Expr *VisitCompoundLiteralExpr(const CompoundLiteralExpr *E) {
+return E;
+  }
+
+  const Expr *VisitArraySubscriptExpr(const ArraySubscriptExpr *E) {
+return Visit(E->getBase());
+  }
+  const Expr *VisitCastExpr(const CastExpr *E) {
+return Visit(E->getSubExpr());
+  }
+  const Expr *VisitParenExpr(const ParenExpr *E) {
+return Visit(E->getSubExpr());
+  }
+  const Expr *VisitUnaryAddrOf(const clang::UnaryOperator *E) {
+return Visit(E->getSubExpr());
+  }
+  const Expr *VisitUnaryDeref(const clang::UnaryOperator *E) {
+return Visit(E->getSubExpr());
+  }
+};
+
+} // end anonymous namespace
+
+/// getFieldInfo - Gather the size and offset of the field \p VD in \p RD.
+static std::pair getFieldInfo(CodeGenFunction &CGF,
+  const RecordDecl *RD,
+  const ValueDecl *VD,
+  uint64_t Offset = 0) {
+  if (!RD)
+return std::make_pair(0, 0);
+
+  ASTContext &Ctx = CGF.getContext();
+  const ASTRecordLayout &Layout = Ctx.getASTRecordLayout(RD);
+  unsigned FieldNo = 0;
+
+  for (const Decl *D : RD->decls()) {
+if (const auto *Record = dyn_cast(D)) {
+  std::pair Res =
+  getFieldInfo(CGF, Record->getDefinition(), VD,
+   Offset + Layout.getFieldOffset(FieldNo));
+  if (Res.first != 0)
+return Res;
+  continue;
+}
+
+if (const auto *FD = dyn_cast(D); FD == VD) {
+  Offset += Layout.getFieldOffset(FieldNo);
+  return 
std::make_pair(Ctx.getTypeSizeInChars(FD->getType()).getQuantity(),
+Ctx.toCharUnitsFromBits(Offset).getQuantity());
+}
+
+if (isa(D))
+  ++FieldNo;
+  }

zygoloid wrote:

This work recursively looping through fields is not necessary: this function 
only succeeds when `VD` is a `FieldDecl`, so you can `dyn_cast` it to that 
type, then get the enclosing `DeclContext` to find the record, and use 
`FieldDecl::getFieldIndex` to find the field number.

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-11 Thread Richard Smith via cfe-commits


@@ -1052,11 +1053,143 @@ CodeGenFunction::emitFlexibleArrayMemberSize(const 
Expr *E, unsigned Type,
   return Builder.CreateSelect(Cmp, Res, ConstantInt::get(ResType, 0, 
IsSigned));
 }
 
+namespace {
+
+/// SubobjectFinder - A simple visitor to find the "sub-object" pointed to by a
+/// __builtin_dynamic_object_size call. Information gathered from the
+/// sub-object is used by the back-end to determine the correct size when the
+/// 'TYPE' of the __bdos call has the least significant bit set (i.e. asking
+/// for the sub-object size).
+///
+/// The expectation is that we'll eventually hit one of three expression types:
+///
+///   1. DeclRefExpr - This is the expression for the base of the structure.
+///   2. MemberExpr - This is the field in the structure.
+///   3. CompoundLiteralExpr - This is for people who create something
+///  heretical like (struct foo has a flexible array member):
+///
+///(struct foo){ 1, 2 }.blah[idx];
+///
+/// All other expressions can be correctly handled with the current code.
+struct SubobjectFinder
+: public ConstStmtVisitor {
+  SubobjectFinder() = default;
+
+  
//======//
+  //Visitor Methods
+  
//======//
+
+  const Expr *VisitStmt(const Stmt *S) { return nullptr; }
+
+  const Expr *VisitDeclRefExpr(const DeclRefExpr *E) { return E; }
+  const Expr *VisitMemberExpr(const MemberExpr *E) { return E; }
+  const Expr *VisitCompoundLiteralExpr(const CompoundLiteralExpr *E) {
+return E;
+  }
+
+  const Expr *VisitArraySubscriptExpr(const ArraySubscriptExpr *E) {
+return Visit(E->getBase());
+  }
+  const Expr *VisitCastExpr(const CastExpr *E) {
+return Visit(E->getSubExpr());
+  }
+  const Expr *VisitParenExpr(const ParenExpr *E) {
+return Visit(E->getSubExpr());
+  }
+  const Expr *VisitUnaryAddrOf(const clang::UnaryOperator *E) {
+return Visit(E->getSubExpr());
+  }
+  const Expr *VisitUnaryDeref(const clang::UnaryOperator *E) {
+return Visit(E->getSubExpr());
+  }
+};
+
+} // end anonymous namespace
+
+/// getFieldInfo - Gather the size and offset of the field \p VD in \p RD.
+static std::pair getFieldInfo(CodeGenFunction &CGF,
+  const RecordDecl *RD,
+  const ValueDecl *VD,
+  uint64_t Offset = 0) {
+  if (!RD)
+return std::make_pair(0, 0);
+
+  ASTContext &Ctx = CGF.getContext();
+  const ASTRecordLayout &Layout = Ctx.getASTRecordLayout(RD);
+  unsigned FieldNo = 0;
+
+  for (const Decl *D : RD->decls()) {
+if (const auto *Record = dyn_cast(D)) {
+  std::pair Res =
+  getFieldInfo(CGF, Record->getDefinition(), VD,
+   Offset + Layout.getFieldOffset(FieldNo));
+  if (Res.first != 0)
+return Res;
+  continue;
+}
+
+if (const auto *FD = dyn_cast(D); FD == VD) {
+  Offset += Layout.getFieldOffset(FieldNo);
+  return 
std::make_pair(Ctx.getTypeSizeInChars(FD->getType()).getQuantity(),
+Ctx.toCharUnitsFromBits(Offset).getQuantity());
+}
+
+if (isa(D))
+  ++FieldNo;
+  }
+
+  return std::make_pair(0, 0);
+}
+
+/// getSubobjectInfo - Find the sub-object that \p E points to. If it lives
+/// inside a struct, return the "size" and "offset" of that sub-object.
+static std::pair getSubobjectInfo(CodeGenFunction &CGF,
+  const Expr *E) {
+  const Expr *Subobject = SubobjectFinder().Visit(E);
+  if (!Subobject)
+return std::make_pair(0, 0);
+
+  const RecordDecl *OuterRD = nullptr;
+  const ValueDecl *VD = nullptr;
+
+  if (const auto *DRE = dyn_cast(Subobject)) {
+// We're pointing to the beginning of the struct.
+VD = DRE->getDecl();
+QualType Ty = VD->getType();
+if (Ty->isPointerType())
+  Ty = Ty->getPointeeType();
+OuterRD = Ty->getAsRecordDecl();

zygoloid wrote:

If I'm reading this correctly, I think this case is redundant: `getFieldInfo` 
only succeeds when `VD` is a field, but we're not going to have an evaluated 
`DeclRefExpr` that names a field. Can we return `0, 0` in this case, like we do 
for compound literals? I think the only case when we have non-zero values to 
return is when we've found a `FieldDecl`.

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-11 Thread Richard Smith via cfe-commits


@@ -1052,11 +1053,143 @@ CodeGenFunction::emitFlexibleArrayMemberSize(const 
Expr *E, unsigned Type,
   return Builder.CreateSelect(Cmp, Res, ConstantInt::get(ResType, 0, 
IsSigned));
 }
 
+namespace {
+
+/// SubobjectFinder - A simple visitor to find the "sub-object" pointed to by a
+/// __builtin_dynamic_object_size call. Information gathered from the
+/// sub-object is used by the back-end to determine the correct size when the
+/// 'TYPE' of the __bdos call has the least significant bit set (i.e. asking
+/// for the sub-object size).
+///
+/// The expectation is that we'll eventually hit one of three expression types:
+///
+///   1. DeclRefExpr - This is the expression for the base of the structure.
+///   2. MemberExpr - This is the field in the structure.
+///   3. CompoundLiteralExpr - This is for people who create something
+///  heretical like (struct foo has a flexible array member):
+///
+///(struct foo){ 1, 2 }.blah[idx];
+///
+/// All other expressions can be correctly handled with the current code.
+struct SubobjectFinder
+: public ConstStmtVisitor {
+  SubobjectFinder() = default;
+
+  
//======//
+  //Visitor Methods
+  
//======//
+
+  const Expr *VisitStmt(const Stmt *S) { return nullptr; }
+
+  const Expr *VisitDeclRefExpr(const DeclRefExpr *E) { return E; }
+  const Expr *VisitMemberExpr(const MemberExpr *E) { return E; }
+  const Expr *VisitCompoundLiteralExpr(const CompoundLiteralExpr *E) {
+return E;
+  }
+
+  const Expr *VisitArraySubscriptExpr(const ArraySubscriptExpr *E) {
+return Visit(E->getBase());
+  }
+  const Expr *VisitCastExpr(const CastExpr *E) {
+return Visit(E->getSubExpr());
+  }

zygoloid wrote:

A derived-to-base cast is a traversal to a subobject in C++, so we should 
presumably terminate the traversal when we reach one and use the offset and 
size of the base class as the subobject.

That'd be a pretty big delta from what you have here, but it'd be a good idea 
to add a FIXME here.

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-11 Thread Richard Smith via cfe-commits


@@ -26996,18 +26996,38 @@ class, structure, array, or other object.
 Arguments:
 ""
 
-The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a
-pointer to or into the ``object``. The second argument determines whether
-``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size 
is
-unknown. The third argument controls how ``llvm.objectsize`` acts when ``null``
-in address space 0 is used as its pointer argument. If it's ``false``,
-``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, 
if
-the ``null`` is in a non-zero address space or if ``true`` is given for the
-third argument of ``llvm.objectsize``, we assume its size is unknown. The 
fourth
-argument to ``llvm.objectsize`` determines if the value should be evaluated at
-runtime.
+The ``llvm.objectsize`` intrinsic takes six arguments:
+
+- The first argument is a pointer to or into the ``object``.
+- The second argument controls which value to return when the size is unknown:
+
+  - If it's ``false``, ``llvm.objectsize`` returns ``-1``.
+  - If it's ``true``, ``llvm.objectsize`` returns ``0``.
+
+- The third argument controls how ``llvm.objectsize`` acts when ``null`` in
+  address space 0 is used as its pointer argument:
+
+  - If it's ``false``, ``llvm.objectsize`` reports 0 bytes available when given
+``null``.
+  - If it's ``true``, or the ``null`` pointer is in a non-zero address space,
+the size is assumed to be unknown.
+
+- The fourth argument to ``llvm.objectsize`` determines if the value should be
+  evaluated at runtime.
+- The fifth argument controls which size ``llvm.objectsize`` returns:
+
+  - If it's ``false``, ``llvm.objectsize`` returns the size of the closest
+surrounding subobject.
+  - If it's ``true``, ``llvm.objectsize`` returns the size of the whole object.
+
+- If non-zero, the sixth and seventh arguments encode the size and offset
+  information, respectively, of the original subobject's layout and is used
+  when the fifth argument is ``false``.
+- The seventh argument encodes the offset information of the original
+  subobject's layout and is used when the fifth argument is ``false``.

zygoloid wrote:

I think the information you're passing in here isn't quite what we'd want. If 
I'm reading the code correctly, the offset you're passing in is the field 
offset relative to the immediately-enclosing record type, which doesn't give us 
any information about either where the pointer is within the subobject, or 
where the subobject is within the complete object, so this doesn't seem like it 
can be enough information to produce a correct result.

Rather than passing in the offset of the subobject (relative to an unknown 
anchor point), I think it would be more useful to pass in a pointer to the 
start of the subobject. Or to pass in the offset from the start of the 
subobject to the pointer argument, but that would likely be harder for the 
frontend to calculate (eg, you'd need to work out the offset produced by array 
indexing).

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-11 Thread Bill Wendling via cfe-commits

bwendling wrote:

Another ping.

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-05 Thread Bill Wendling via cfe-commits

bwendling wrote:

Friendly ping.

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-02-27 Thread Bill Wendling via cfe-commits

https://github.com/bwendling edited 
https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-02-27 Thread Bill Wendling via cfe-commits

bwendling wrote:

The first PR attempt was here: https://github.com/llvm/llvm-project/pull/78526

It was NACK'ed because it used the LLVM IR representation of the structure, 
which wasn't appropriate. To solve that issue, I chose to expand the 
`llvm.objectsize()` builtin to contain the size and offset of the sub-object, 
which is determined in the front-end.

Note that there are many other things wrong with our 
`__builtin_{dynamic_}object_size` implementations. For one, they perform loads 
of the pointer which isn't necessary and contrary to the idea that the builtins 
don't allow for side-effects. I'll be addressing those issues in future patches.

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-02-27 Thread via cfe-commits

llvmbot wrote:



@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-clang

Author: Bill Wendling (bwendling)


Changes

The second argument of __builtin_dynamic_object_size controls whether it
returns the size of the whole object or the closest surrounding object.
For this struct:

  struct s { 
int  foo;
char bar[2][40];
int  baz;
int  qux;
  };  

  int main(int argc, char **argv) {
struct s f;

  #define report(x) printf(#x ": %zu\n", x)

argc = 1;
report(__builtin_dynamic_object_size(f.bar[argc], 0));
report(__builtin_dynamic_object_size(f.bar[argc], 1));
return 0;
  }

should return:

  __builtin_dynamic_object_size(f.bar[argc], 0): 48
  __builtin_dynamic_object_size(f.bar[argc], 1): 40

determined by the least significant bit of the TYPE.

The LLVM IR isn't sufficient to determine what could be considered a
"sub-object". Instead determine the size / offset info in the front-end
and pass that information along with the intrinsic.

This expands the llvm.objectsize intrinsic to add these three new 
fields:

  - The fifth argument controls which object:
- If false, return the size of the closest surrounding object.
- If true, return the size of the whole object from the pointer.
  - If non-zero and the fifth argument is 'false', the size of the 
sub-object.
  - If non-zero and the fifth argument is 'false', the offset of the 
sub-object.


---

Patch is 172.77 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/83204.diff


45 Files Affected:

- (modified) clang/lib/CodeGen/CGBuiltin.cpp (+163-10) 
- (modified) clang/lib/CodeGen/CGExpr.cpp (+8-3) 
- (modified) clang/test/CodeGen/catch-undef-behavior.c (+1-1) 
- (added) clang/test/CodeGen/object-size-sub-object.c (+311) 
- (modified) clang/test/CodeGen/object-size.c (+549-221) 
- (modified) clang/test/CodeGen/object-size.cpp (+7-7) 
- (modified) clang/test/CodeGen/pass-object-size.c (+3-3) 
- (modified) clang/test/CodeGen/sanitize-recover.c (+1-1) 
- (modified) llvm/docs/LangRef.rst (+34-14) 
- (modified) llvm/include/llvm/Analysis/MemoryBuiltins.h (+9) 
- (modified) llvm/include/llvm/IR/Intrinsics.td (+4-3) 
- (modified) llvm/lib/Analysis/MemoryBuiltins.cpp (+20-4) 
- (modified) llvm/lib/IR/AutoUpgrade.cpp (+63-6) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp (+2-1) 
- (modified) llvm/test/Analysis/CostModel/X86/free-intrinsics.ll (+4-4) 
- (modified) llvm/test/Analysis/CostModel/free-intrinsics-datalayout.ll (+4-4) 
- (modified) llvm/test/Analysis/CostModel/free-intrinsics-no_info.ll (+4-4) 
- (modified) llvm/test/Assembler/auto_upgrade_intrinsics.ll (+4-4) 
- (modified) llvm/test/Bitcode/objectsize-upgrade-7.0.ll (+2-2) 
- (modified) llvm/test/CodeGen/AArch64/GlobalISel/memcpy_chk_no_tail.ll (+2-2) 
- (modified) llvm/test/CodeGen/AArch64/memsize-remarks.ll (+16-16) 
- (modified) llvm/test/CodeGen/AMDGPU/promote-alloca-mem-intrinsics.ll (+4-4) 
- (modified) llvm/test/Other/cgscc-libcall-update.ll (+2-2) 
- (modified) llvm/test/Transforms/InferAddressSpaces/AMDGPU/debug-info.ll 
(+4-4) 
- (modified) llvm/test/Transforms/InferAddressSpaces/AMDGPU/intrinsics.ll 
(+6-6) 
- (modified) llvm/test/Transforms/InferAlignment/propagate-assume.ll (+2-2) 
- (modified) llvm/test/Transforms/Inline/call-intrinsic-objectsize.ll (+3-3) 
- (modified) llvm/test/Transforms/InstCombine/allocsize.ll (+5-5) 
- (modified) llvm/test/Transforms/InstCombine/assume_inevitable.ll (+1-1) 
- (modified) llvm/test/Transforms/InstCombine/builtin-dynamic-object-size.ll 
(+13-13) 
- (modified) llvm/test/Transforms/InstCombine/builtin-object-size-custom-dl.ll 
(+3-3) 
- (modified) 
llvm/test/Transforms/InstCombine/builtin-object-size-strdup-family.ll (+14-10) 
- (modified) llvm/test/Transforms/InstCombine/invoke.ll (+1-1) 
- (modified) llvm/test/Transforms/InstCombine/memset_chk-1.ll (+5-5) 
- (modified) llvm/test/Transforms/InstCombine/objsize.ll (+38-38) 
- (modified) llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll (+4-4) 
- (modified) llvm/test/Transforms/InstCombine/strcpy_chk-1.ll (+5-5) 
- (modified) 
llvm/test/Transforms/LowerConstantIntrinsics/builtin-object-size-load.ll (+2-2) 
- (modified) 
llvm/test/Transforms/LowerConstantIntrinsics/builtin-object-size-phi.ll (+5-5) 
- (modified) 
llvm/test/Transforms/LowerConstantIntrinsics/builtin-object-size-posix-memalign.ll
 (+8-8) 
- (modified) 
llvm/test/Transforms/LowerConstantIntrinsics/constant-intrinsics.ll (+2-2) 
- (modified) 
llvm/test/Transforms/LowerConstantIntrinsics/crash-on-large-allocas.ll (+2-2) 
- (modified) llvm/test/Transforms/LowerConstantIntrinsics/objectsize_basic.ll 
(+18-18) 
- (modified) llvm/test/Transforms/LowerConstantIntrinsics/stale-worklist-phi.ll 
(+2-2) 
- (modified) llvm/test/Transforms/SCCP/issue59602-assume-like-call-users.ll 
(+5-5) 


``diff
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 2d16e7cdc06053..9d8e5671d9d12d 100644
--- a/clang/lib/C