Re: [PATCH v2] bpf: add preserve_field_info builtin

2022-10-25 Thread Jose E. Marchesi via Gcc-patches


> Hi Jose,
>
> Thanks for your comments. I think I've addressed them all in the updated
> patch below.
>
>>>+  get_inner_reference (src, , , _off, , ,
>>>+   , );
>>
>>Since the information returned by the builtin is always constant
>>(positions, sizes) I think you will want to adjust the code for the
>>eventuality of variable positioned fields and also variable sized
>>fields.
>>
>>get_inner_reference sets var_off to a tree if the position of the field
>>is variable.  In these cases `bitpos' is relative to that position.
>>
>>Likewise, get_inner_reference sets `mode' is set to BLKmode and
>>`bitsize' will be set to -1.
>>
>>I'm not sure what the built-in is supposed to do/return in these cases.
>>I guess it makes sense to error out, but what does LLVM do?
>
> I would have thought erroring out the only option, but it seems that
> LLVM will return a value from the builtin and record a CO-RE relocation
> as normal.
>
> What value will be returned depends of course on KIND, but from what
> I can tell it seems that such fields are treated as having an offset of
> 0 bits and/or a size of 0 bits. For example FIELD_BYTE_SIZE for a
> flexible-length array will return 0. FIELD_RSHIFT_U64 will be
> calculated as 64 - 0 = 64.
>
> This sort of makes sense if you expect that any BPF loader will honor
> the CO-RE relocations and patch the return value before the program is
> run, i.e. the actual values at compile time are irrelevant.
>
> But, I'm not sure that BPF loaders in practice actually _can_ patch the
> return value correctly. The source of information for resolving the
> relocations is the BTF. But the BTF won't have more information about
> variable position/size members. A flexible-length array for example in
> BTF is represented as an array type with 0 elements. So the size
> calculation when patching the relocation (looking at the impl in
> libbpf) will be elem_size * nelems = 0, and the 'patched' values will
> be the same as the unpatched.
>
> I'm not sure whether this behavior is a known limitation or an
> oversight. In my opinion it makes more sense to error at compile time,
> becuase even after the loader patches the return value it still will
> not be correct for these cases.
>
> So for now I've set these cases to error out, but it would be just as
> simple to mimic the LLVM behavior. WDYT?

I would say it makes more sense to error out than to return invalid
data.

However, the divergence wrt LLVM is a concern.  What about keeping this
behavior in the GCC backend and simultaneously raise the issue in
bpf@vger?  If it was a design oversight and the change doesn't impact
kernel sources, they may be willing to change it.

>>If I read this properly, for something like:
>>
>>__builtin_preserve_field_info (a = foo.bar + bar.baz, KIND)
>>
>>On one side CO-RE relocations are computed for both foo.bar and bar.baz
>>(I see bpf_core_compute does that) as expected.
>>
>>But then the builtin returns information that can only apply to one
>>access.  Which one?
>
> Expressions like this should not be accepted by the builtin. I didn't
> consider this case in v1 so it led to an ICE. Clang rejects this
> outright and errors with "argument 1 is not a field access". It is
> actually very strict about the expressions that are accepted, unlike
> __builtin_preserve_access_index.
>
> I have updated this implementation to behave more like clang in that
> it will reject any expression that isn't directly a field access. That
> even includes rejecting things like:
>
>   __builtin_preserve_field_info (, KIND)
>
> Since unlike preserve_access_index this builtin does not actually
> perform the operation in EXPR, it makes sense to enforce that EXPR must
> be exactly a single field access.

Ok, thanks.

> [...]
> +@deftypefn {Built-in Function} unsigned int __builtin_preserve_field_info 
> (@var{expr}, unsigned int @var{kind})
> +BPF Compile Once-Run Everywhere (CO-RE) support. This builtin is used to
> +extract information to aid in struct/union relocations.  @var{expr} is
> +an access to a field of a struct or union. Depending on @var{kind}, different
> +information is returned to the program. A CO-RE relocation for the access in
> +@var{expr} with kind @var{kind} is recorded if @code{-mco-re} is in effect.
> +
> +The following values are supported for @var{kind}:
> +@table @var
> +@item FIELD_BYTE_OFFSET = 0
> +The returned value is the offset, in bytes, of the field from the
> +beginning of the containing structure.

What about bit fields?  Is this the byte offset of the containing word?

> +@item FIELD_BYTE_SIZE = 1
> +The returned value is the size, in bytes, of the field.

For bit fields,  is this the size of the containing word?

> +@item FIELD_EXISTENCE = 2
> +The returned value is 1 if the field exists, 0 otherwise. Always 1 at
> +compile time.
> +
> +@item FIELD_SIGNEDNESS = 3
> +The returned value is 1 if the field is signed, 0 otherwise.
> +
> +@item FIELD_LSHIFT_U64 = 4
> +@itemx FIELD_RSHIFT_U64 = 5
> 

[PATCH v2] bpf: add preserve_field_info builtin

2022-10-25 Thread David Faust via Gcc-patches
Hi Jose,

Thanks for your comments. I think I've addressed them all in the updated
patch below.

>>+  get_inner_reference (src, , , _off, , ,
>>+, );
>
>Since the information returned by the builtin is always constant
>(positions, sizes) I think you will want to adjust the code for the
>eventuality of variable positioned fields and also variable sized
>fields.
>
>get_inner_reference sets var_off to a tree if the position of the field
>is variable.  In these cases `bitpos' is relative to that position.
>
>Likewise, get_inner_reference sets `mode' is set to BLKmode and
>`bitsize' will be set to -1.
>
>I'm not sure what the built-in is supposed to do/return in these cases.
>I guess it makes sense to error out, but what does LLVM do?

I would have thought erroring out the only option, but it seems that
LLVM will return a value from the builtin and record a CO-RE relocation
as normal.

What value will be returned depends of course on KIND, but from what
I can tell it seems that such fields are treated as having an offset of
0 bits and/or a size of 0 bits. For example FIELD_BYTE_SIZE for a
flexible-length array will return 0. FIELD_RSHIFT_U64 will be
calculated as 64 - 0 = 64.

This sort of makes sense if you expect that any BPF loader will honor
the CO-RE relocations and patch the return value before the program is
run, i.e. the actual values at compile time are irrelevant.

But, I'm not sure that BPF loaders in practice actually _can_ patch the
return value correctly. The source of information for resolving the
relocations is the BTF. But the BTF won't have more information about
variable position/size members. A flexible-length array for example in
BTF is represented as an array type with 0 elements. So the size
calculation when patching the relocation (looking at the impl in
libbpf) will be elem_size * nelems = 0, and the 'patched' values will
be the same as the unpatched.

I'm not sure whether this behavior is a known limitation or an
oversight. In my opinion it makes more sense to error at compile time,
becuase even after the loader patches the return value it still will
not be correct for these cases.

So for now I've set these cases to error out, but it would be just as
simple to mimic the LLVM behavior. WDYT?


>If I read this properly, for something like:
>
>__builtin_preserve_field_info (a = foo.bar + bar.baz, KIND)
>
>On one side CO-RE relocations are computed for both foo.bar and bar.baz
>(I see bpf_core_compute does that) as expected.
>
>But then the builtin returns information that can only apply to one
>access.  Which one?

Expressions like this should not be accepted by the builtin. I didn't
consider this case in v1 so it led to an ICE. Clang rejects this
outright and errors with "argument 1 is not a field access". It is
actually very strict about the expressions that are accepted, unlike
__builtin_preserve_access_index.

I have updated this implementation to behave more like clang in that
it will reject any expression that isn't directly a field access. That
even includes rejecting things like:

  __builtin_preserve_field_info (, KIND)

Since unlike preserve_access_index this builtin does not actually
perform the operation in EXPR, it makes sense to enforce that EXPR must
be exactly a single field access.

---

[Changes from v1:
  - Error gracefully on variable-size or variable-offset fields if the
requested information according to KIND cannot be calculated for
an access to such a field, instead of maybe computing garbage.
  - Check for invalid expressions in EXPR and reject them rather than
ICEing later. This includes rejecting some expressions which were
accepted by previous patch but not by clang/LLVM.
  - Improve precision of location information for errors.
  - Add function-level comment describing maybe_make_core_relo ().
  - Document return values for all supported values of KIND.
  - Add tests for error conditions. ]

Add BPF __builtin_preserve_field_info. This builtin is used to extract
information to facilitate struct and union relocations performed by the
BPF loader, especially for bitfields.

The builtin has the following signature:

  unsigned int __builtin_preserve_field_info (EXPR, unsigned int KIND);

Where EXPR is an expression accessing a field of a struct or union.
Depending on KIND, different information is returned to the program. The
supported values for KIND are as follows:

  enum {
FIELD_BYTE_OFFSET = 0,
FIELD_BYTE_SIZE,
FIELD_EXISTENCE,
FIELD_SIGNEDNESS,
FIELD_LSHIFT_U64,
FIELD_RSHIFT_U64
  };

If -mco-re is in effect (explicitly or implicitly specified), a CO-RE
relocation is added for the access in EXPR recording the relevant
information according to KIND.

gcc/

* config/bpf/bpf.cc: Support __builtin_preserve_field_info.
(enum bpf_builtins): Add new builtin.
(bpf_init_builtins): Likewise.
(bpf_core_field_info): New function.
(bpf_expand_builtin): Accomodate new