Re: [PATCH 0/9] Add debug_annotate attributes

2022-11-01 Thread Yonghong Song via Gcc-patches

Hi, Jose and David,

Any progress on implementing the debug_annotate attributes in GCC?

Thanks,

Yonghong


On 6/15/22 3:56 PM, Yonghong Song wrote:



On 6/15/22 1:57 PM, David Faust wrote:



On 6/14/22 22:53, Yonghong Song wrote:



On 6/7/22 2:43 PM, David Faust wrote:

Hello,

This patch series adds support for:

- Two new C-language-level attributes that make it possible to associate (to
  "annotate" or to "tag") particular declarations and types with arbitrary
  strings. As explained below, this is intended to be used to, for example,
  characterize certain pointer types.

- The conveyance of that information in the DWARF output in the form of a new
  DIE: DW_TAG_GNU_annotation.

- The conveyance of that information in the BTF output in the form of two new
  kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.

All of these facilities are being added to the eBPF ecosystem, and support for
them exists in some form in LLVM.

Purpose
=======

1)  Addition of C-family language constructs (attributes) to specify free-text
    tags on certain language elements, such as struct fields.

    The purpose of these annotations is to provide additional information about
    types, variables, and function parameters of interest to the kernel. A
    driving use case is to tag pointer types within the linux kernel and eBPF
    programs with additional semantic information, such as '__user' or '__rcu'.

    For example, consider the linux kernel function do_execve with the
    following declaration:

      static int do_execve(struct filename *filename,
                           const char __user *const __user *__argv,
                           const char __user *const __user *__envp);

    Here, __user could be defined with these annotations to record semantic
    information about the pointer parameters (e.g., they are user-provided) in
    DWARF and BTF information. Other kernel facilities such as the eBPF
    verifier can read the tags and make use of the information.

2)  Conveying the tags in the generated DWARF debug info.

    The main motivation for emitting the tags in DWARF is that the Linux kernel
    generates its BTF information via pahole, using DWARF as a source:

        +--------+  BTF                  BTF   +----------+
        | pahole |-------> vmlinux.btf ------->| verifier |
        +--------+                             +----------+
            ^                                        ^
            |                                        |
      DWARF |                                    BTF |
            |                                        |
         vmlinux                              +-------------+
         module1.ko                           | BPF program |
         module2.ko                           +-------------+
           ...

    This is because:

    a)  Unlike GCC, LLVM will only generate BTF for BPF programs.

    b)  GCC can generate BTF for whatever target with -gbtf, but there is no
        support for linking/deduplicating BTF in the linker.

    In the scenario above, the verifier needs access to the pointer tags of
    both the kernel types/declarations (conveyed in the DWARF and translated
    to BTF by pahole) and those of the BPF program (available directly in BTF).

    Another motivation for having the tag information in DWARF, unrelated to
    BPF and BTF, is that the drgn project (another DWARF consumer) also wants
    to benefit from these tags in order to differentiate between different
    kinds of pointers in the kernel.

3)  Conveying the tags in the generated BTF debug info.

    This is easy: the main purpose of having this info in BTF is for the
    compiled eBPF programs. The kernel verifier can then access the tags
    of pointers used by the eBPF programs.


For more information about these tags and the motivation behind them, please
refer to the following linux kernel discussions:

    https://lore.kernel.org/bpf/20210914223004.244411-1-...@fb.com/
    https://lore.kernel.org/bpf/20211012164838.3345699-1-...@fb.com/
    https://lore.kernel.org/bpf/2022012604.1504583-1-...@fb.com/


Implementation Overview
=======================

To enable these annotations, two new C language attributes are added:
__attribute__((debug_annotate_decl("foo"))) and
__attribute__((debug_annotate_type("bar"))). Both attributes accept a single
arbitrary string constant argument, which will be recorded in the generated
DWARF and/or BTF debug information. They have no effect on code generation.
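
As a rough illustration of the paragraph above (not taken from the patches
themselves), the two attributes might be used as in the following sketch. The
macro names TAG_USER and TAG_DECL, the struct, the helper function, and the tag
strings are all hypothetical. The attribute names are the ones proposed in this
series, so a compiler without the series will most likely not know them; the
__has_attribute guard then makes the macros expand to nothing, which is
consistent with the attributes having no effect on code generation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* __has_attribute is available in GCC 5+ and Clang; otherwise assume "no". */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Hypothetical wrapper macros. The attribute names come from this patch
   series and are assumed to be absent from most compilers. */
#if __has_attribute(debug_annotate_type)
#define TAG_USER __attribute__((debug_annotate_type("user")))
#else
#define TAG_USER
#endif

#if __has_attribute(debug_annotate_decl)
#define TAG_DECL(s) __attribute__((debug_annotate_decl(s)))
#else
#define TAG_DECL(s)
#endif

/* Hypothetical structure: a type tag on the pointed-to type of 'buf' and a
   declaration tag on the field 'len'. */
struct request {
    int id;
    const char TAG_USER *buf;
    size_t len TAG_DECL("length_of:buf");
};

/* Copies at most 'cap' bytes of the request payload into 'dst' and returns
   the number of bytes copied. The tags do not change this code at all. */
size_t request_copy(const struct request *req, char *dst, size_t cap)
{
    assert(req != NULL && dst != NULL);
    size_t n = req->len < cap ? req->len : cap;
    memcpy(dst, req->buf, n);
    return n;
}
```

With a compiler that implements the series, the strings "user" and
"length_of:buf" would be expected to show up only in the DWARF and/or BTF
output, not in the generated machine code.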


Note that we are not using the same attribute names as LLVM (btf_decl_tag and
btf_type_tag, respectively). While these attributes are functionally very
similar, they have grown beyond purely BTF-specific uses, so inclusion of "btf"
in the attribute name seems misleading.

DWARF support is enabled via a new DW_TAG_GNU_annotation. When generating DWARF,
declarations and types

Re: kernel sparse annotations vs. compiler attributes and debug_annotate_{type, decl} WAS: Re: [PATCH 0/9] Add debug_annotate attributes

2022-07-15 Thread Yonghong Song via Gcc-patches




On 7/15/22 7:17 AM, Jose E. Marchesi wrote:



On 7/14/22 8:09 AM, Jose E. Marchesi wrote:

Hi Yonghong.


On 7/7/22 1:24 PM, Jose E. Marchesi wrote:

Hi Yonghong.


On 6/21/22 9:12 AM, Jose E. Marchesi wrote:



On 6/17/22 10:18 AM, Jose E. Marchesi wrote:

Hi Yonghong.


On 6/15/22 1:57 PM, David Faust wrote:


On 6/14/22 22:53, Yonghong Song wrote:



On 6/7/22 2:43 PM, David Faust wrote:


Re: kernel sparse annotations vs. compiler attributes and debug_annotate_{type, decl} WAS: Re: [PATCH 0/9] Add debug_annotate attributes

2022-07-14 Thread Yonghong Song via Gcc-patches




On 7/14/22 8:09 AM, Jose E. Marchesi wrote:


Hi Yonghong.


On 7/7/22 1:24 PM, Jose E. Marchesi wrote:

Hi Yonghong.


On 6/21/22 9:12 AM, Jose E. Marchesi wrote:



On 6/17/22 10:18 AM, Jose E. Marchesi wrote:

Hi Yonghong.


On 6/15/22 1:57 PM, David Faust wrote:


On 6/14/22 22:53, Yonghong Song wrote:



On 6/7/22 2:43 PM, David Faust wrote:


Re: kernel sparse annotations vs. compiler attributes and debug_annotate_{type, decl} WAS: Re: [PATCH 0/9] Add debug_annotate attributes

2022-07-12 Thread Yonghong Song via Gcc-patches




On 7/7/22 1:24 PM, Jose E. Marchesi wrote:


Hi Yonghong.


On 6/21/22 9:12 AM, Jose E. Marchesi wrote:



On 6/17/22 10:18 AM, Jose E. Marchesi wrote:

Hi Yonghong.


On 6/15/22 1:57 PM, David Faust wrote:


On 6/14/22 22:53, Yonghong Song wrote:



On 6/7/22 2:43 PM, David Faust wrote:


Re: kernel sparse annotations vs. compiler attributes and debug_annotate_{type, decl} WAS: Re: [PATCH 0/9] Add debug_annotate attributes

2022-06-24 Thread Yonghong Song via Gcc-patches




On 6/21/22 9:12 AM, Jose E. Marchesi wrote:



On 6/17/22 10:18 AM, Jose E. Marchesi wrote:

Hi Yonghong.


On 6/15/22 1:57 PM, David Faust wrote:


On 6/14/22 22:53, Yonghong Song wrote:



On 6/7/22 2:43 PM, David Faust wrote:


Re: kernel sparse annotations vs. compiler attributes and debug_annotate_{type, decl} WAS: Re: [PATCH 0/9] Add debug_annotate attributes

2022-06-20 Thread Yonghong Song via Gcc-patches




On 6/17/22 10:18 AM, Jose E. Marchesi wrote:


Hi Yonghong.


On 6/15/22 1:57 PM, David Faust wrote:


On 6/14/22 22:53, Yonghong Song wrote:



On 6/7/22 2:43 PM, David Faust wrote:


Re: [PATCH 0/9] Add debug_annotate attributes

2022-06-15 Thread Yonghong Song via Gcc-patches




On 6/15/22 1:57 PM, David Faust wrote:



On 6/14/22 22:53, Yonghong Song wrote:



On 6/7/22 2:43 PM, David Faust wrote:

DWARF support is enabled via a new DW_TAG_GNU_annotation. When generating DWARF,
declarations and types will be checked for the corresponding attributes. If
present, a DW_TAG_GNU_annotation DIE will be created as a child of the DIE for
the annotated type or declaration, one for each tag. 
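
To make the "one child DIE per tag" rule above concrete, here is a toy model
in C. It is only a sketch: the types and functions (struct die, die_new,
die_annotate, and so on) are hypothetical and have nothing to do with GCC's
internal DWARF data structures. It only mirrors the shape described above,
where an annotated declaration gets one annotation child for each tag string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: hypothetical names, not GCC's dwarf2out structures. */
enum die_tag { TOY_TAG_VARIABLE, TOY_TAG_GNU_ANNOTATION };

struct die {
    enum die_tag tag;
    const char *text;     /* declaration name, or the annotation string */
    struct die *child;    /* first child DIE */
    struct die *sibling;  /* next DIE with the same parent */
};

struct die *die_new(enum die_tag tag, const char *text)
{
    struct die *d = calloc(1, sizeof *d);
    assert(d != NULL);
    d->tag = tag;
    d->text = text;
    return d;
}

/* Append one annotation child per tag string, preserving the tag order. */
void die_annotate(struct die *parent, const char *const *tags, size_t ntags)
{
    struct die **tail = &parent->child;
    while (*tail != NULL)
        tail = &(*tail)->sibling;
    for (size_t i = 0; i < ntags; i++) {
        *tail = die_new(TOY_TAG_GNU_ANNOTATION, tags[i]);
        tail = &(*tail)->sibling;
    }
}

/* Count the annotation children of a DIE. */
size_t die_count_annotations(const struct die *parent)
{
    size_t n = 0;
    for (const struct die *c = parent->child; c != NULL; c = c->sibling)
        if (c->tag == TOY_TAG_GNU_ANNOTATION)
            n++;
    return n;
}

/* Free a DIE together with its children and following siblings. */
void die_free(struct die *d)
{
    while (d != NULL) {
        struct die *next = d->sibling;
        die_free(d->child);
        free(d);
        d = next;
    }
}
```

A consumer such as pahole or drgn would walk the children of a declaration's
DIE in a similar way to collect its tags, although the real attribute layout
of the annotation DIE is not spelled out in the text above.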

Re: [PATCH 0/9] Add debug_annotate attributes

2022-06-14 Thread Yonghong Song via Gcc-patches




On 6/7/22 2:43 PM, David Faust wrote:

Hello,

This patch series adds support for:

- Two new C-language-level attributes that allow to associate (to "annotate" or
   to "tag") particular declarations and types with arbitrary strings. As
   explained below, this is intended to be used to, for example, characterize
   certain pointer types.

- The conveyance of that information in the DWARF output in the form of a new
   DIE: DW_TAG_GNU_annotation.

- The conveyance of that information in the BTF output in the form of two new
   kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.

All of these facilities are being added to the eBPF ecosystem, and support for
them exists in some form in LLVM.

Purpose
===

1)  Addition of C-family language constructs (attributes) to specify free-text
 tags on certain language elements, such as struct fields.

 The purpose of these annotations is to provide additional information about
 types, variables, and function parameters of interest to the kernel. A
 driving use case is to tag pointer types within the Linux kernel and eBPF
 programs with additional semantic information, such as '__user' or '__rcu'.

 For example, consider the Linux kernel function do_execve with the
 following declaration:

   static int do_execve(struct filename *filename,
                        const char __user *const __user *__argv,
                        const char __user *const __user *__envp);

 Here, __user could be defined with these annotations to record semantic
 information about the pointer parameters (e.g., they are user-provided) in
 DWARF and BTF information. Other kernel facilities such as the eBPF verifier
 can read the tags and make use of the information.
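
 As a minimal sketch of the __user case, assuming a compiler that implements
 the debug_annotate_type attribute proposed in this series: the tag string
 "user", the fallback to an empty macro, and the helper count_args are
 illustrative choices, not taken from the kernel or from the patches.

```c
#include <stddef.h>

/* Older compilers may lack __has_attribute; treat every attribute as absent. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Assumption: "user" is an illustrative tag string. The attribute name is the
   one proposed in this series; compilers without it get an empty macro. */
#if __has_attribute(debug_annotate_type)
#define __user __attribute__((debug_annotate_type("user")))
#else
#define __user
#endif

/* Hypothetical helper shaped like the do_execve parameters: counts the
   entries of a NULL-terminated argument vector. */
size_t count_args(const char __user *const __user *argv)
{
    size_t n = 0;

    while (argv[n] != NULL)
        n++;
    return n;
}
```

 Since the attribute only affects debug information, the function behaves the
 same whether or not the macro expands to anything.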

2)  Conveying the tags in the generated DWARF debug info.

 The main motivation for emitting the tags in DWARF is that the Linux kernel
 generates its BTF information via pahole, using DWARF as a source:

      +--------+  BTF                  BTF   +----------+
      | pahole |---> vmlinux.btf ----------->| verifier |
      +--------+                             +----------+
          ^                                        ^
          |                                        |
    DWARF |                                    BTF |
          |                                        |
       vmlinux                              +-------------+
       module1.ko                           | BPF program |
       module2.ko                           +-------------+
         ...

 This is because:

 a)  Unlike GCC, LLVM will only generate BTF for BPF programs.

 b)  GCC can generate BTF for any target with -gbtf, but there is no
 support for linking/deduplicating BTF in the linker.

 In the scenario above, the verifier needs access to the pointer tags of
 both the kernel types/declarations (conveyed in the DWARF and translated
 to BTF by pahole) and those of the BPF program (available directly in BTF).

 Another motivation for having the tag information in DWARF, unrelated to
 BPF and BTF, is that the drgn project (another DWARF consumer) also wants
 to benefit from these tags in order to differentiate between different
 kinds of pointers in the kernel.

3)  Conveying the tags in the generated BTF debug info.

 This is easy: the main purpose of having this info in BTF is for the
 compiled eBPF programs. The kernel verifier can then access the tags
 of pointers used by the eBPF programs.
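
 To make the BTF side a little more concrete, here is a rough sketch of how a
 type tag record could be laid out. The kind numbers and the position of the
 kind within the info word mirror, to the best of my knowledge, the kernel's
 linux/btf.h; the sketch_ names are local illustrations and not part of any
 real header.

```c
#include <stdint.h>

/* Local mirrors of the kind numbers in the kernel's linux/btf.h
   (assumed values, named differently to avoid clashing with that header). */
enum {
    SKETCH_BTF_KIND_DECL_TAG = 17,
    SKETCH_BTF_KIND_TYPE_TAG = 18
};

/* Simplified view of the common BTF type record. */
struct sketch_btf_type {
    uint32_t name_off; /* offset of the tag string in the BTF string section */
    uint32_t info;     /* bits 24-28 hold the kind */
    uint32_t type;     /* id of the type that the tag wraps */
};

/* Pack a kind into the info word (all other bits left as zero). */
uint32_t sketch_btf_info(uint32_t kind)
{
    return (kind & 0x1f) << 24;
}

/* Extract the kind from an info word. */
uint32_t sketch_btf_kind(uint32_t info)
{
    return (info >> 24) & 0x1f;
}

/* Build a type tag record that wraps the type with id target_id. */
struct sketch_btf_type sketch_type_tag(uint32_t name_off, uint32_t target_id)
{
    struct sketch_btf_type t;

    t.name_off = name_off;
    t.info = sketch_btf_info(SKETCH_BTF_KIND_TYPE_TAG);
    t.type = target_id;
    return t;
}
```

 A consumer such as the verifier would walk from a pointer type to the type
 tag record and from there to the underlying type.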


For more information about these tags and the motivation behind them, please
refer to the following Linux kernel discussions:

   https://lore.kernel.org/bpf/20210914223004.244411-1-...@fb.com/
   https://lore.kernel.org/bpf/20211012164838.3345699-1-...@fb.com/
   https://lore.kernel.org/bpf/2022012604.1504583-1-...@fb.com/


Implementation Overview
=======================

To enable these annotations, two new C language attributes are added:
__attribute__((debug_annotate_decl("foo"))) and
__attribute__((debug_annotate_type("bar"))). Both attributes accept a single
arbitrary string constant argument, which will be recorded in the generated
DWARF and/or BTF debug information. They have no effect on code generation.

Note that we are not using the same attribute names as LLVM (btf_decl_tag and
btf_type_tag, respectively). While these attributes are functionally very
similar, they have grown beyond purely BTF-specific uses, so inclusion of "btf"
in the attribute name seems misleading.
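
Because the two compilers would use different attribute names, code that has
to build with both might hide the choice behind wrapper macros. The following
is only a sketch: DECL_TAG, TYPE_TAG, struct packet and the tag strings are
hypothetical, and the debug_annotate_* names are the ones proposed in this
series, so a given compiler may support neither spelling.

```c
/* Older compilers may lack __has_attribute; treat every attribute as absent. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Hypothetical wrapper macros: prefer the names proposed in this series,
   fall back to LLVM's names, and expand to nothing elsewhere. */
#if __has_attribute(debug_annotate_decl)
#define DECL_TAG(s) __attribute__((debug_annotate_decl(s)))
#elif __has_attribute(btf_decl_tag)
#define DECL_TAG(s) __attribute__((btf_decl_tag(s)))
#else
#define DECL_TAG(s)
#endif

#if __has_attribute(debug_annotate_type)
#define TYPE_TAG(s) __attribute__((debug_annotate_type(s)))
#elif __has_attribute(btf_type_tag)
#define TYPE_TAG(s) __attribute__((btf_type_tag(s)))
#else
#define TYPE_TAG(s)
#endif

struct packet {
    int len DECL_TAG("checked");   /* tag attached to a declaration (a field) */
    int TYPE_TAG("rcu") *payload;  /* tag attached to the pointed-to type */
};

/* The tags do not affect code generation, so this is an ordinary accessor. */
int packet_len(const struct packet *p)
{
    return p->len;
}
```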

DWARF support is enabled via a new DW_TAG_GNU_annotation. When generating DWARF,
declarations and types will be checked for the corresponding attributes. If
present, a DW_TAG_GNU_annotation DIE will be created as a child of the DIE for
the annotated type or declaration, one for each tag. These DIEs link the
arbitrary tag value to the item they annotate.

For example, the following variable declaration:

   #define __typetag1 

Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations

2022-06-02 Thread Yonghong Song via Gcc-patches




On 5/27/22 12:56 PM, David Faust wrote:



On 5/26/22 00:29, Yonghong Song wrote:



On 5/24/22 10:04 AM, David Faust wrote:



On 5/24/22 09:03, Yonghong Song wrote:



On 5/24/22 8:53 AM, David Faust wrote:



On 5/24/22 04:07, Jose E. Marchesi wrote:



On 5/11/22 11:44 AM, David Faust wrote:


On 5/10/22 22:05, Yonghong Song wrote:



On 5/10/22 8:43 PM, Yonghong Song wrote:



On 5/6/22 2:18 PM, David Faust wrote:



On 5/5/22 16:00, Yonghong Song wrote:



On 5/4/22 10:03 AM, David Faust wrote:



On 5/3/22 15:32, Joseph Myers wrote:

On Mon, 2 May 2022, David Faust via Gcc-patches wrote:


Consider the following example:

     #define __typetag1 __attribute__((btf_type_tag("tag1")))
     #define __typetag2 __attribute__((btf_type_tag("tag2")))
     #define __typetag3 __attribute__((btf_type_tag("tag3")))

     int __typetag1 * __typetag2 __typetag3 * g;

The expected behavior is that 'g' is "a pointer with tags 'tag2' and
'tag3', to a pointer with tag 'tag1' to an int". i.e.:


That's not a correct expectation for either GNU __attribute__ or C2x [[]]
attribute syntax.  In either syntax, __typetag2 __typetag3 should apply to
the type to which g points, not to g or its type, just as if you had a
type qualifier there.  You'd need to put the attributes (or qualifier)
after the *, not before, to make them apply to the pointer type.  See
"Attribute Syntax" in the GCC manual for how the syntax is defined for GNU
attributes and deduce in turn, for each subsequence of the tokens matching
the syntax for some kind of declarator, what the type for "T D1" would be
as defined there and in the C standard, as deduced from the type for "T D"
for a sub-declarator D.

But GCC's attribute parsing produces a variable 'g' which is "a pointer
with tag 'tag1' to a pointer with tags 'tag2' and 'tag3' to an int", i.e.


In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
syntax it applies to int.  Again, if you wanted it to apply to the pointer
type it would need to go after the * not before.

If you are concerned with the fine details of what construct an attribute
appertains to, I recommend using C2x syntax not GNU syntax.
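
The comparison with type qualifiers can be checked with standard C alone. The
sketch below uses const in place of the tags and C11 _Generic to inspect the
resulting types; the names g_inner, g_outer and the two macros are made up
for illustration, and the example says nothing about how any particular
compiler treats the tag attributes themselves.

```c
/* const written before the last '*' qualifies the type g_inner points to. */
int *const *g_inner;

/* const written after the last '*' qualifies g_outer's own pointer type. */
int **const g_outer = 0;

/* Taking the address exposes the full declared type of each object,
   including qualifiers on the object itself. */
#define POINTEE_IS_CONST(obj) _Generic(&(obj), int *const **: 1, default: 0)
#define SELF_IS_CONST(obj)    _Generic(&(obj), int **const *: 1, default: 0)

_Static_assert(POINTEE_IS_CONST(g_inner), "const applies to what g_inner points to");
_Static_assert(!SELF_IS_CONST(g_inner), "g_inner itself is not const");
_Static_assert(SELF_IS_CONST(g_outer), "const applies to g_outer itself");
_Static_assert(!POINTEE_IS_CONST(g_outer), "what g_outer points to is not const");
```

In the tagged declaration under discussion, __typetag2 __typetag3 sit in the
same position as const does in g_inner, which is why they apply to the
pointed-to type.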



Joseph, thank you! This is very helpful. My understanding of the syntax
was not correct.

(Actually, I made a bad mistake in paraphrasing this example from the
discussion of it in the series cover letter. But, the reason why it is
incorrect is the same.)


Yonghong, is the specific ordering an expectation in BPF programs or
other users of the tags?


This is probably a wording issue. We have been saying that tags apply only
to pointers; we should probably say that they apply only to the pointee.

$ cat t.c
int const *ptr;

the llvm ir debuginfo:

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
!6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
!7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

We could replace 'const' with a tag like below:

int __attribute__((btf_type_tag("tag"))) *ptr;

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64,
annotations: !7)
!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!7 = !{!8}
!8 = !{!"btf_type_tag", !"tag"}

In the above IR, we attach the annotations to the pointer_type because
we didn't invent a new DI type to encode btf_type_tag. But it would be
totally okay to have IR that looks like

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
!11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)


OK, thanks.

There is still the question of why the DWARF generated for this case
that I have been concerned about:

       int __typetag1 * __typetag2 __typetag3 * g;

differs between GCC (with this series) and clang. After studying it,
GCC is doing with the attributes exactly as is described in the
Attribute Syntax portion of the GCC manual where the GNU syntax is
described. I do not think there is any problem here.

So the difference in DWARF suggests to me that clang is not handling
the GNU attribute syntax in this particular case correctly, since it
seems to be associating __typetag2 and __typetag3 to g's type rather
than the type to which it points.

I am not sure whether for the use purposes of the tags this difference
is very important, but it is worth noting.


As Joseph suggested, it may be better to encourage users of these tags
to use the C2x attribute syntax if they are concerned with precisely
which construct the tag applies to.

This would also be a way around any issues in handling the attributes
due to the GNU syntax.

I tried a few test cases using C2x syntax BTF type tags with a
clang-15 build, but ran into some issues (in particular, some of the
tag attributes being ignored altogether). I couldn't find confirmation
whether C2x attribute syntax is fully supported in clang yet, so maybe
this isn't expected to work. Do you know whether the C2x syntax is
fully supported in 

Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations

2022-05-26 Thread Yonghong Song via Gcc-patches




On 5/24/22 10:04 AM, David Faust wrote:



On 5/24/22 09:03, Yonghong Song wrote:



On 5/24/22 8:53 AM, David Faust wrote:



On 5/24/22 04:07, Jose E. Marchesi wrote:



On 5/11/22 11:44 AM, David Faust wrote:


On 5/10/22 22:05, Yonghong Song wrote:



On 5/10/22 8:43 PM, Yonghong Song wrote:



On 5/6/22 2:18 PM, David Faust wrote:



On 5/5/22 16:00, Yonghong Song wrote:



On 5/4/22 10:03 AM, David Faust wrote:



On 5/3/22 15:32, Joseph Myers wrote:

On Mon, 2 May 2022, David Faust via Gcc-patches wrote:


Consider the following example:

    #define __typetag1 __attribute__((btf_type_tag("tag1")))
    #define __typetag2 __attribute__((btf_type_tag("tag2")))
    #define __typetag3 __attribute__((btf_type_tag("tag3")))

    int __typetag1 * __typetag2 __typetag3 * g;

The expected behavior is that 'g' is "a pointer with tags 'tag2' and
'tag3', to a pointer with tag 'tag1' to an int". i.e.:


That's not a correct expectation for either GNU __attribute__ or C2x [[]]
attribute syntax.  In either syntax, __typetag2 __typetag3 should apply to
the type to which g points, not to g or its type, just as if you had a
type qualifier there.  You'd need to put the attributes (or qualifier)
after the *, not before, to make them apply to the pointer type.  See
"Attribute Syntax" in the GCC manual for how the syntax is defined for GNU
attributes and deduce in turn, for each subsequence of the tokens matching
the syntax for some kind of declarator, what the type for "T D1" would be
as defined there and in the C standard, as deduced from the type for "T D"
for a sub-declarator D.

But GCC's attribute parsing produces a variable 'g' which is "a pointer
with tag 'tag1' to a pointer with tags 'tag2' and 'tag3' to an int", i.e.


In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
syntax it applies to int.  Again, if you wanted it to apply to the pointer
type it would need to go after the * not before.

If you are concerned with the fine details of what construct an attribute
appertains to, I recommend using C2x syntax not GNU syntax.



Joseph, thank you! This is very helpful. My understanding of the syntax
was not correct.

(Actually, I made a bad mistake in paraphrasing this example from the
discussion of it in the series cover letter. But, the reason why it is
incorrect is the same.)


Yonghong, is the specific ordering an expectation in BPF programs or
other users of the tags?


This is probably a wording issue. We have been saying that tags apply only
to pointers; we should probably say that they apply only to the pointee.

$ cat t.c
int const *ptr;

the llvm ir debuginfo:

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
!6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
!7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

We could replace 'const' with a tag like below:

int __attribute__((btf_type_tag("tag"))) *ptr;

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64,
annotations: !7)
!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!7 = !{!8}
!8 = !{!"btf_type_tag", !"tag"}

In the above IR, we attach the annotations to the pointer_type because
we didn't invent a new DI type to encode btf_type_tag. But it would be
totally okay to have IR that looks like

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
!11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)


OK, thanks.

There is still the question of why the DWARF generated for this case
that I have been concerned about:

      int __typetag1 * __typetag2 __typetag3 * g;

differs between GCC (with this series) and clang. After studying it,
GCC is doing with the attributes exactly as is described in the
Attribute Syntax portion of the GCC manual where the GNU syntax is
described. I do not think there is any problem here.

So the difference in DWARF suggests to me that clang is not handling
the GNU attribute syntax in this particular case correctly, since it
seems to be associating __typetag2 and __typetag3 to g's type rather
than the type to which it points.

I am not sure whether for the use purposes of the tags this difference
is very important, but it is worth noting.


As Joseph suggested, it may be better to encourage users of these tags
to use the C2x attribute syntax if they are concerned with precisely
which construct the tag applies to.

This would also be a way around any issues in handling the attributes
due to the GNU syntax.

I tried a few test cases using C2x syntax BTF type tags with a
clang-15 build, but ran into some issues (in particular, some of the
tag attributes being ignored altogether). I couldn't find confirmation
whether C2x attribute syntax is fully supported in clang yet, so maybe
this isn't expected to work. Do you know whether the C2x syntax is
fully supported in clang yet?


Actually, I don't know either. But since the btf decl_tag and type_tag
are also 

Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations

2022-05-24 Thread Yonghong Song via Gcc-patches




On 5/24/22 8:53 AM, David Faust wrote:



On 5/24/22 04:07, Jose E. Marchesi wrote:



On 5/11/22 11:44 AM, David Faust wrote:


On 5/10/22 22:05, Yonghong Song wrote:



On 5/10/22 8:43 PM, Yonghong Song wrote:



On 5/6/22 2:18 PM, David Faust wrote:



On 5/5/22 16:00, Yonghong Song wrote:



On 5/4/22 10:03 AM, David Faust wrote:



On 5/3/22 15:32, Joseph Myers wrote:

On Mon, 2 May 2022, David Faust via Gcc-patches wrote:


Consider the following example:

   #define __typetag1 __attribute__((btf_type_tag("tag1")))
   #define __typetag2 __attribute__((btf_type_tag("tag2")))
   #define __typetag3 __attribute__((btf_type_tag("tag3")))

   int __typetag1 * __typetag2 __typetag3 * g;

The expected behavior is that 'g' is "a pointer with tags 'tag2' and
'tag3', to a pointer with tag 'tag1' to an int". i.e.:


That's not a correct expectation for either GNU __attribute__ or C2x [[]]
attribute syntax.  In either syntax, __typetag2 __typetag3 should apply to
the type to which g points, not to g or its type, just as if you had a
type qualifier there.  You'd need to put the attributes (or qualifier)
after the *, not before, to make them apply to the pointer type.  See
"Attribute Syntax" in the GCC manual for how the syntax is defined for GNU
attributes and deduce in turn, for each subsequence of the tokens matching
the syntax for some kind of declarator, what the type for "T D1" would be
as defined there and in the C standard, as deduced from the type for "T D"
for a sub-declarator D.

But GCC's attribute parsing produces a variable 'g' which is "a pointer
with tag 'tag1' to a pointer with tags 'tag2' and 'tag3' to an int", i.e.


In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
syntax it applies to int.  Again, if you wanted it to apply to the pointer
type it would need to go after the * not before.

If you are concerned with the fine details of what construct an attribute
appertains to, I recommend using C2x syntax not GNU syntax.



Joseph, thank you! This is very helpful. My understanding of the syntax
was not correct.

(Actually, I made a bad mistake in paraphrasing this example from the
discussion of it in the series cover letter. But, the reason why it is
incorrect is the same.)


Yonghong, is the specific ordering an expectation in BPF programs or
other users of the tags?


This is probably a wording issue. We have been saying that tags apply only
to pointers; we should probably say that they apply only to the pointee.

$ cat t.c
int const *ptr;

the llvm ir debuginfo:

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
!6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
!7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

We could replace 'const' with a tag like below:

int __attribute__((btf_type_tag("tag"))) *ptr;

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64,
annotations: !7)
!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!7 = !{!8}
!8 = !{!"btf_type_tag", !"tag"}

In the above IR, we attach the annotations to the pointer_type because
we didn't invent a new DI type to encode btf_type_tag. But it would be
totally okay to have IR that looks like

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
!11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)


OK, thanks.

There is still the question of why the DWARF generated for this case
that I have been concerned about:

     int __typetag1 * __typetag2 __typetag3 * g;

differs between GCC (with this series) and clang. After studying it,
GCC is doing with the attributes exactly as is described in the
Attribute Syntax portion of the GCC manual where the GNU syntax is
described. I do not think there is any problem here.

So the difference in DWARF suggests to me that clang is not handling
the GNU attribute syntax in this particular case correctly, since it
seems to be associating __typetag2 and __typetag3 to g's type rather
than the type to which it points.

I am not sure whether for the use purposes of the tags this difference
is very important, but it is worth noting.


As Joseph suggested, it may be better to encourage users of these tags
to use the C2x attribute syntax if they are concerned with precisely
which construct the tag applies to.

This would also be a way around any issues in handling the attributes
due to the GNU syntax.

I tried a few test cases using C2x syntax BTF type tags with a
clang-15 build, but ran into some issues (in particular, some of the
tag attributes being ignored altogether). I couldn't find confirmation
whether C2x attribute syntax is fully supported in clang yet, so maybe
this isn't expected to work. Do you know whether the C2x syntax is
fully supported in clang yet?


Actually, I don't know either. But since the btf decl_tag and type_tag
are also used to compile linux kernel and the minimum compiler version
to compile kernel is gcc5.1 and 

Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations

2022-05-24 Thread Yonghong Song via Gcc-patches




On 5/24/22 4:07 AM, Jose E. Marchesi wrote:



On 5/11/22 11:44 AM, David Faust wrote:


On 5/10/22 22:05, Yonghong Song wrote:



On 5/10/22 8:43 PM, Yonghong Song wrote:



On 5/6/22 2:18 PM, David Faust wrote:



On 5/5/22 16:00, Yonghong Song wrote:



On 5/4/22 10:03 AM, David Faust wrote:



On 5/3/22 15:32, Joseph Myers wrote:

On Mon, 2 May 2022, David Faust via Gcc-patches wrote:


Consider the following example:

   #define __typetag1 __attribute__((btf_type_tag("tag1")))
   #define __typetag2 __attribute__((btf_type_tag("tag2")))
   #define __typetag3 __attribute__((btf_type_tag("tag3")))

   int __typetag1 * __typetag2 __typetag3 * g;

The expected behavior is that 'g' is "a pointer with tags 'tag2' and
'tag3', to a pointer with tag 'tag1' to an int". i.e.:


That's not a correct expectation for either GNU __attribute__ or C2x [[]]
attribute syntax.  In either syntax, __typetag2 __typetag3 should apply to
the type to which g points, not to g or its type, just as if you had a
type qualifier there.  You'd need to put the attributes (or qualifier)
after the *, not before, to make them apply to the pointer type.  See
"Attribute Syntax" in the GCC manual for how the syntax is defined for GNU
attributes and deduce in turn, for each subsequence of the tokens matching
the syntax for some kind of declarator, what the type for "T D1" would be
as defined there and in the C standard, as deduced from the type for "T D"
for a sub-declarator D.

But GCC's attribute parsing produces a variable 'g' which is "a pointer
with tag 'tag1' to a pointer with tags 'tag2' and 'tag3' to an int", i.e.


In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
syntax it applies to int.  Again, if you wanted it to apply to the pointer
type it would need to go after the * not before.

If you are concerned with the fine details of what construct an attribute
appertains to, I recommend using C2x syntax not GNU syntax.



Joseph, thank you! This is very helpful. My understanding of the syntax
was not correct.

(Actually, I made a bad mistake in paraphrasing this example from the
discussion of it in the series cover letter. But, the reason why it is
incorrect is the same.)


Yonghong, is the specific ordering an expectation in BPF programs or
other users of the tags?


This is probably a wording issue. We have been saying that tags apply only
to pointers; we should probably say that they apply only to the pointee.

$ cat t.c
int const *ptr;

the llvm ir debuginfo:

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
!6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
!7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

We could replace 'const' with a tag like below:

int __attribute__((btf_type_tag("tag"))) *ptr;

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64,
annotations: !7)
!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!7 = !{!8}
!8 = !{!"btf_type_tag", !"tag"}

In the above IR, we attach the annotations to the pointer_type because
we didn't invent a new DI type to encode btf_type_tag. But it would be
totally okay to have IR that looks like

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
!11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)


OK, thanks.

There is still the question of why the DWARF generated for this case
that I have been concerned about:

     int __typetag1 * __typetag2 __typetag3 * g;

differs between GCC (with this series) and clang. After studying it,
GCC is doing with the attributes exactly as is described in the
Attribute Syntax portion of the GCC manual where the GNU syntax is
described. I do not think there is any problem here.

So the difference in DWARF suggests to me that clang is not handling
the GNU attribute syntax in this particular case correctly, since it
seems to be associating __typetag2 and __typetag3 to g's type rather
than the type to which it points.

I am not sure whether for the use purposes of the tags this difference
is very important, but it is worth noting.


As Joseph suggested, it may be better to encourage users of these tags
to use the C2x attribute syntax if they are concerned with precisely
which construct the tag applies to.

This would also be a way around any issues in handling the attributes
due to the GNU syntax.

I tried a few test cases using C2x syntax BTF type tags with a
clang-15 build, but ran into some issues (in particular, some of the
tag attributes being ignored altogether). I couldn't find confirmation
whether C2x attribute syntax is fully supported in clang yet, so maybe
this isn't expected to work. Do you know whether the C2x syntax is
fully supported in clang yet?


Actually, I don't know either. But since the btf decl_tag and type_tag
are also used to compile linux kernel and the minimum compiler version
to compile kernel is gcc5.1 and clang11. I am not sure whether gcc5.1

Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations

2022-05-24 Thread Yonghong Song via Gcc-patches




On 5/11/22 11:44 AM, David Faust wrote:



On 5/10/22 22:05, Yonghong Song wrote:



On 5/10/22 8:43 PM, Yonghong Song wrote:



On 5/6/22 2:18 PM, David Faust wrote:



On 5/5/22 16:00, Yonghong Song wrote:



On 5/4/22 10:03 AM, David Faust wrote:



On 5/3/22 15:32, Joseph Myers wrote:

On Mon, 2 May 2022, David Faust via Gcc-patches wrote:


Consider the following example:

  #define __typetag1 __attribute__((btf_type_tag("tag1")))
  #define __typetag2 __attribute__((btf_type_tag("tag2")))
  #define __typetag3 __attribute__((btf_type_tag("tag3")))

  int __typetag1 * __typetag2 __typetag3 * g;

The expected behavior is that 'g' is "a pointer with tags 'tag2' and
'tag3', to a pointer with tag 'tag1' to an int". i.e.:


That's not a correct expectation for either GNU __attribute__ or C2x [[]]
attribute syntax.  In either syntax, __typetag2 __typetag3 should apply to
the type to which g points, not to g or its type, just as if you had a
type qualifier there.  You'd need to put the attributes (or qualifier)
after the *, not before, to make them apply to the pointer type.  See
"Attribute Syntax" in the GCC manual for how the syntax is defined for GNU
attributes and deduce in turn, for each subsequence of the tokens matching
the syntax for some kind of declarator, what the type for "T D1" would be
as defined there and in the C standard, as deduced from the type for "T D"
for a sub-declarator D.

But GCC's attribute parsing produces a variable 'g' which is "a pointer
with tag 'tag1' to a pointer with tags 'tag2' and 'tag3' to an int", i.e.


In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
syntax it applies to int.  Again, if you wanted it to apply to the pointer
type it would need to go after the * not before.

If you are concerned with the fine details of what construct an attribute
appertains to, I recommend using C2x syntax not GNU syntax.



Joseph, thank you! This is very helpful. My understanding of the syntax
was not correct.

(Actually, I made a bad mistake in paraphrasing this example from the
discussion of it in the series cover letter. But, the reason why it is
incorrect is the same.)


Yonghong, is the specific ordering an expectation in BPF programs or
other users of the tags?


This is probably a wording issue. We have been saying that tags apply only
to pointers; we should probably say that they apply only to the pointee.

$ cat t.c
int const *ptr;

the llvm ir debuginfo:

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
!6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
!7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

We could replace 'const' with a tag like below:

int __attribute__((btf_type_tag("tag"))) *ptr;

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64,
annotations: !7)
!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!7 = !{!8}
!8 = !{!"btf_type_tag", !"tag"}

In the above IR, we attach the annotations to the pointer_type because
we didn't invent a new DI type to encode btf_type_tag. But it would be
totally okay to have IR that looks like

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
!11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)


OK, thanks.

There is still the question of why the DWARF generated for this case
that I have been concerned about:

    int __typetag1 * __typetag2 __typetag3 * g;

differs between GCC (with this series) and clang. After studying it,
GCC is doing with the attributes exactly as is described in the
Attribute Syntax portion of the GCC manual where the GNU syntax is
described. I do not think there is any problem here.

So the difference in DWARF suggests to me that clang is not handling
the GNU attribute syntax in this particular case correctly, since it
seems to be associating __typetag2 and __typetag3 to g's type rather
than the type to which it points.

I am not sure whether for the use purposes of the tags this difference
is very important, but it is worth noting.


As Joseph suggested, it may be better to encourage users of these tags
to use the C2x attribute syntax if they are concerned with precisely
which construct the tag applies to.

This would also be a way around any issues in handling the attributes
due to the GNU syntax.

I tried a few test cases using C2x syntax BTF type tags with a
clang-15 build, but ran into some issues (in particular, some of the
tag attributes being ignored altogether). I couldn't find confirmation
whether C2x attribute syntax is fully supported in clang yet, so maybe
this isn't expected to work. Do you know whether the C2x syntax is
fully supported in clang yet?


Actually, I don't know either. But since the btf decl_tag and type_tag
are also used to compile linux kernel and the minimum compiler version
to compile kernel is gcc5.1 and clang11. I am not sure whether gcc5.1
supports c2x or not, I guess probably 

Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations

2022-05-10 Thread Yonghong Song via Gcc-patches




On 5/10/22 8:43 PM, Yonghong Song wrote:



On 5/6/22 2:18 PM, David Faust wrote:



On 5/5/22 16:00, Yonghong Song wrote:



On 5/4/22 10:03 AM, David Faust wrote:



On 5/3/22 15:32, Joseph Myers wrote:

On Mon, 2 May 2022, David Faust via Gcc-patches wrote:


Consider the following example:

 #define __typetag1 __attribute__((btf_type_tag("tag1")))
 #define __typetag2 __attribute__((btf_type_tag("tag2")))
 #define __typetag3 __attribute__((btf_type_tag("tag3")))

 int __typetag1 * __typetag2 __typetag3 * g;

The expected behavior is that 'g' is "a pointer with tags 'tag2' and
'tag3', to a pointer with tag 'tag1' to an int". i.e.:


That's not a correct expectation for either GNU __attribute__ or C2x [[]]
attribute syntax.  In either syntax, __typetag2 __typetag3 should apply to
the type to which g points, not to g or its type, just as if you had a
type qualifier there.  You'd need to put the attributes (or qualifier)
after the *, not before, to make them apply to the pointer type.  See
"Attribute Syntax" in the GCC manual for how the syntax is defined for GNU
attributes and deduce in turn, for each subsequence of the tokens matching
the syntax for some kind of declarator, what the type for "T D1" would be
as defined there and in the C standard, as deduced from the type for "T D"
for a sub-declarator D.

But GCC's attribute parsing produces a variable 'g' which is "a pointer
with tag 'tag1' to a pointer with tags 'tag2' and 'tag3' to an int", i.e.


In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
syntax it applies to int.  Again, if you wanted it to apply to the pointer
type it would need to go after the * not before.

If you are concerned with the fine details of what construct an attribute
appertains to, I recommend using C2x syntax not GNU syntax.



Joseph, thank you! This is very helpful. My understanding of the syntax
was not correct.

(Actually, I made a bad mistake in paraphrasing this example from the
discussion of it in the series cover letter. But, the reason why it is
incorrect is the same.)


Yonghong, is the specific ordering an expectation in BPF programs or
other users of the tags?


This is probably a wording issue. We have been saying that tags apply only
to pointers; we should probably say that they apply only to the pointee.

$ cat t.c
int const *ptr;

the llvm ir debuginfo:

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
!6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
!7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

We could replace 'const' with a tag like below:

int __attribute__((btf_type_tag("tag"))) *ptr;

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64,
annotations: !7)
!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!7 = !{!8}
!8 = !{!"btf_type_tag", !"tag"}

In the above IR, we attach the annotations to the pointer_type because
we didn't invent a new DI type to encode btf_type_tag. But it would be
totally okay to have IR that looks like

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
!11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)


OK, thanks.

There is still the question of why the DWARF generated for this case 
that I have been concerned about:


   int __typetag1 * __typetag2 __typetag3 * g;

differs between GCC (with this series) and clang. After studying it, 
GCC is handling the attributes exactly as described in the 
Attribute Syntax portion of the GCC manual where the GNU syntax is 
described. I do not think there is any problem here.


So the difference in DWARF suggests to me that clang is not handling 
the GNU attribute syntax in this particular case correctly, since it 
seems to be associating __typetag2 and __typetag3 to g's type rather 
than the type to which it points.


I am not sure whether for the use purposes of the tags this difference 
is very important, but it is worth noting.



As Joseph suggested, it may be better to encourage users of these tags 
to use the C2x attribute syntax if they are concerned with precisely 
which construct the tag applies to.


This would also be a way around any issues in handling the attributes 
due to the GNU syntax.


I tried a few test cases using C2x syntax BTF type tags with a 
clang-15 build, but ran into some issues (in particular, some of the 
tag attributes being ignored altogether). I couldn't find confirmation 
whether C2x attribute syntax is fully supported in clang yet, so maybe 
this isn't expected to work. Do you know whether the C2x syntax is 
fully supported in clang yet?


Actually, I don't know either. But the btf decl_tag and type_tag are also
used when compiling the linux kernel, and the minimum compiler versions
for building the kernel are gcc5.1 and clang11. I am not sure whether gcc5.1
supports c2x or not; I guess probably not. So I think we most likely
cannot use c2x syntax.


Okay, I think we can 

Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations

2022-05-10 Thread Yonghong Song via Gcc-patches




On 5/6/22 2:18 PM, David Faust wrote:



On 5/5/22 16:00, Yonghong Song wrote:



On 5/4/22 10:03 AM, David Faust wrote:



On 5/3/22 15:32, Joseph Myers wrote:

On Mon, 2 May 2022, David Faust via Gcc-patches wrote:


Consider the following example:

 #define __typetag1 __attribute__((btf_type_tag("tag1")))
 #define __typetag2 __attribute__((btf_type_tag("tag2")))
 #define __typetag3 __attribute__((btf_type_tag("tag3")))

 int __typetag1 * __typetag2 __typetag3 * g;

The expected behavior is that 'g' is "a pointer with tags 'tag2' and 'tag3',
to a pointer with tag 'tag1' to an int". i.e.:


That's not a correct expectation for either GNU __attribute__ or C2x [[]]
attribute syntax.  In either syntax, __typetag2 __typetag3 should apply to
the type to which g points, not to g or its type, just as if you had a
type qualifier there.  You'd need to put the attributes (or qualifier)
after the *, not before, to make them apply to the pointer type.  See
"Attribute Syntax" in the GCC manual for how the syntax is defined for GNU
attributes and deduce in turn, for each subsequence of the tokens matching
the syntax for some kind of declarator, what the type for "T D1" would be
as defined there and in the C standard, as deduced from the type for "T D"
for a sub-declarator D.

But GCC's attribute parsing produces a variable 'g' which is "a pointer with
tag 'tag1' to a pointer with tags 'tag2' and 'tag3' to an int", i.e.


In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
syntax it applies to int.  Again, if you wanted it to apply to the pointer
type it would need to go after the * not before.

If you are concerned with the fine details of what construct an attribute
appertains to, I recommend using C2x syntax not GNU syntax.



Joseph, thank you! This is very helpful. My understanding of the syntax
was not correct.

(Actually, I made a bad mistake in paraphrasing this example from the
discussion of it in the series cover letter. But, the reason why it is
incorrect is the same.)


Yonghong, is the specific ordering an expectation in BPF programs or
other users of the tags?


This is probably a wording issue. We have been saying that tags only
apply to pointers. We should probably say that they only apply to the pointee.

$ cat t.c
int const *ptr;

the llvm ir debuginfo:

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
!6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
!7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

We could replace 'const' with a tag like below:

int __attribute__((btf_type_tag("tag"))) *ptr;

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64, annotations: !7)
!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!7 = !{!8}
!8 = !{!"btf_type_tag", !"tag"}

In the above IR, we attach the annotations to the pointer_type because
we didn't invent a new DI type to encode btf_type_tag. But it would be
totally okay to have IR that looks like

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
!11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)


OK, thanks.

There is still the question of why the DWARF generated for this case 
that I have been concerned about:


   int __typetag1 * __typetag2 __typetag3 * g;

differs between GCC (with this series) and clang. After studying it, GCC 
is handling the attributes exactly as described in the Attribute 
Syntax portion of the GCC manual where the GNU syntax is described. I do 
not think there is any problem here.


So the difference in DWARF suggests to me that clang is not handling the 
GNU attribute syntax in this particular case correctly, since it seems 
to be associating __typetag2 and __typetag3 to g's type rather than the 
type to which it points.


I am not sure whether for the use purposes of the tags this difference 
is very important, but it is worth noting.



As Joseph suggested, it may be better to encourage users of these tags 
to use the C2x attribute syntax if they are concerned with precisely 
which construct the tag applies to.


This would also be a way around any issues in handling the attributes 
due to the GNU syntax.


I tried a few test cases using C2x syntax BTF type tags with a clang-15 
build, but ran into some issues (in particular, some of the tag 
attributes being ignored altogether). I couldn't find confirmation 
whether C2x attribute syntax is fully supported in clang yet, so maybe 
this isn't expected to work. Do you know whether the C2x syntax is fully 
supported in clang yet?


Actually, I don't know either. But the btf decl_tag and type_tag are also
used when compiling the linux kernel, and the minimum compiler versions
for building the kernel are gcc5.1 and clang11. I am not sure whether gcc5.1
supports c2x or not; I guess probably not. So I think we most likely
cannot use c2x syntax.







This example comes from my testing against clang to check 

Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations

2022-05-05 Thread Yonghong Song via Gcc-patches




On 5/4/22 10:03 AM, David Faust wrote:



On 5/3/22 15:32, Joseph Myers wrote:

On Mon, 2 May 2022, David Faust via Gcc-patches wrote:


Consider the following example:

    #define __typetag1 __attribute__((btf_type_tag("tag1")))
    #define __typetag2 __attribute__((btf_type_tag("tag2")))
    #define __typetag3 __attribute__((btf_type_tag("tag3")))

    int __typetag1 * __typetag2 __typetag3 * g;

The expected behavior is that 'g' is "a pointer with tags 'tag2' and 'tag3',
to a pointer with tag 'tag1' to an int". i.e.:


That's not a correct expectation for either GNU __attribute__ or C2x [[]]
attribute syntax.  In either syntax, __typetag2 __typetag3 should apply to
the type to which g points, not to g or its type, just as if you had a
type qualifier there.  You'd need to put the attributes (or qualifier)
after the *, not before, to make them apply to the pointer type.  See
"Attribute Syntax" in the GCC manual for how the syntax is defined for GNU
attributes and deduce in turn, for each subsequence of the tokens matching
the syntax for some kind of declarator, what the type for "T D1" would be
as defined there and in the C standard, as deduced from the type for "T D"
for a sub-declarator D.

But GCC's attribute parsing produces a variable 'g' which is "a pointer with
tag 'tag1' to a pointer with tags 'tag2' and 'tag3' to an int", i.e.


In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
syntax it applies to int.  Again, if you wanted it to apply to the pointer
type it would need to go after the * not before.

If you are concerned with the fine details of what construct an attribute
appertains to, I recommend using C2x syntax not GNU syntax.



Joseph, thank you! This is very helpful. My understanding of the syntax 
was not correct.


(Actually, I made a bad mistake in paraphrasing this example from the 
discussion of it in the series cover letter. But, the reason why it is 
incorrect is the same.)



Yonghong, is the specific ordering an expectation in BPF programs or 
other users of the tags?


This is probably a wording issue. We have been saying that tags only
apply to pointers. We should probably say that they only apply to the pointee.

$ cat t.c
int const *ptr;

the llvm ir debuginfo:

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
!6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
!7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

We could replace 'const' with a tag like below:

int __attribute__((btf_type_tag("tag"))) *ptr;

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64, annotations: !7)
!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!7 = !{!8}
!8 = !{!"btf_type_tag", !"tag"}

In the above IR, we attach the annotations to the pointer_type because
we didn't invent a new DI type to encode btf_type_tag. But it would be
totally okay to have IR that looks like

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
!11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)




This example comes from my testing against clang to check that the BTF 
generated by both toolchains is compatible. In this case we get 
different results when using the GNU attribute syntax.



To avoid confusion, here is the full example (from the cover letter). 
The difference in the results is clear in the DWARF.



Consider the following example:

  #define __typetag1 __attribute__((btf_type_tag("type-tag-1")))
  #define __typetag2 __attribute__((btf_type_tag("type-tag-2")))
  #define __typetag3 __attribute__((btf_type_tag("type-tag-3")))

  int __typetag1 * __typetag2 __typetag3 * g;

 type 0x774495e8 int>

    asm_written unsigned DI
    size 
    unit-size 
    align:64 warn_if_not_align:0 symtab:0 alias-set -1 
canonical-type 0x77450888

    attributes 
    value     value 0x77509738>

    readonly constant static "type-tag-3\000">>
    chain 

    value     value 

    readonly constant static "type-tag-2\000"
    pointer_to_this >
    asm_written unsigned DI size  
unit-size 
    align:64 warn_if_not_align:0 symtab:0 alias-set -1 
canonical-type 0x77509930
    attributes 0x7753a1e0 btf_type_tag>

    value     value 0x77509738>

    readonly constant static "type-tag-1\000"
    public static unsigned DI defer-output 
/home/dfaust/playpen/btf/annotate.c:29:42 size 0x7743c450 64> unit-size 

    align:64 warn_if_not_align:0>




The current implementation produces the following DWARF:

 <1><1e>: Abbrev Number: 4 (DW_TAG_variable)
    <1f>   DW_AT_name    : g
    <21>   DW_AT_decl_file   : 1
    <22>   DW_AT_decl_line   : 6
    <23>   DW_AT_decl_column : 42
    <24>   DW_AT_type   

Re: [PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations

2022-04-04 Thread Yonghong Song via Gcc-patches




On 4/1/22 12:42 PM, David Faust wrote:

Hello,

This patch series is a first attempt at adding support for:

- Two new C-language-level attributes that allow to associate (to "tag")
   particular declarations and types with arbitrary strings. As explained below,
   this is intended to be used to, for example, characterize certain pointer
   types.

- The conveyance of that information in the DWARF output in the form of a new
   DIE: DW_TAG_GNU_annotation.

- The conveyance of that information in the BTF output in the form of two new
   kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.

All of these facilities are being added to the eBPF ecosystem, and support for
them exists in some form in LLVM. However, as we shall see, we have found some
problems implementing them so some discussion is in order.

Purpose
===

1)  Addition of C-family language constructs (attributes) to specify free-text
 tags on certain language elements, such as struct fields.

 The purpose of these annotations is to provide additional information about
 types, variables, and function parameters of interest to the kernel. A
 driving use case is to tag pointer types within the linux kernel and eBPF
 programs with additional semantic information, such as '__user' or '__rcu'.

 For example, consider the linux kernel function do_execve with the
 following declaration:

   static int do_execve(struct filename *filename,
  const char __user *const __user *__argv,
  const char __user *const __user *__envp);

 Here, __user could be defined with these annotations to record semantic
 information about the pointer parameters (e.g., they are user-provided) in
 DWARF and BTF information. Other kernel facilities such as the eBPF verifier
 can read the tags and make use of the information.

2)  Conveying the tags in the generated DWARF debug info.

 The main motivation for emitting the tags in DWARF is that the Linux kernel
 generates its BTF information via pahole, using DWARF as a source:

     +--------+  BTF                      BTF   +----------+
     | pahole |-------> vmlinux.btf ----------->| verifier |
     +--------+                                 +----------+
         ^                                           ^
         |                                           |
   DWARF |                                       BTF |
         |                                           |
      vmlinux                                 +-------------+
      module1.ko                              | BPF program |
      module2.ko                              +-------------+
        ...

 This is because:

 a)  Unlike GCC, LLVM will only generate BTF for BPF programs.

 b)  GCC can generate BTF for whatever target with -gbtf, but there is no
 support for linking/deduplicating BTF in the linker.

 In the scenario above, the verifier needs access to the pointer tags of
 both the kernel types/declarations (conveyed in the DWARF and translated
 to BTF by pahole) and those of the BPF program (available directly in BTF).

 Another motivation for having the tag information in DWARF, unrelated to
 BPF and BTF, is that the drgn project (another DWARF consumer) also wants
 to benefit from these tags in order to differentiate between different
 kinds of pointers in the kernel.

3)  Conveying the tags in the generated BTF debug info.

 This is easy: the main purpose of having this info in BTF is for the
 compiled eBPF programs. The kernel verifier can then access the tags
 of pointers used by the eBPF programs.
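
 As an illustration of item 1), here is a minimal sketch of how a '__user'
 style macro could be defined on top of btf_type_tag. This is an assumption
 for illustration, not the kernel's actual definition: the macro name
 my_user, the tag string "user", and the helper functions are all
 hypothetical. The __has_attribute guard makes the macro expand to nothing
 on compilers that do not know the attribute, so the code should build
 either way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro: tags a type when btf_type_tag is available,
   and expands to nothing otherwise. */
#if defined(__has_attribute)
# if __has_attribute(btf_type_tag)
#  define my_user __attribute__((btf_type_tag("user")))
# endif
#endif
#ifndef my_user
# define my_user
#endif

/* Same parameter shape as __argv in the do_execve declaration above.
   Counts the entries of a NULL-terminated vector of strings. */
static size_t count_args(const char my_user *const my_user *argv)
{
    size_t n = 0;

    while (argv && argv[n])
        n++;
    return n;
}

/* Hypothetical usage example. */
static void count_args_example(void)
{
    const char *const argv[] = { "prog", "-v", NULL };

    assert(count_args(argv) == 2);
    assert(count_args(NULL) == 0);
}
```

 The tag is only meant to show up in the debug info; the function body is
 ordinary C and does not depend on whether the attribute was recognized.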


For more information about these tags and the motivation behind them, please
refer to the following linux kernel discussions:

   https://lore.kernel.org/bpf/20210914223004.244411-1-...@fb.com/
   https://lore.kernel.org/bpf/20211012164838.3345699-1-...@fb.com/
   https://lore.kernel.org/bpf/2022012604.1504583-1-...@fb.com/


What is in this patch series
===

This patch series adds support for these annotations in GCC. The implementation
is largely complete. However, in some cases the produced debug info (both DWARF
and BTF) differs significantly from that produced by LLVM. This issue is
discussed in detail below, along with a few specific questions for both GCC and
LLVM. Any input would be much appreciated.


Hi, David, Thanks for the RFC implementation! I will answer your 
questions related to llvm and kernel.





Implementation Overview
===

To enable these annotations, two new C language attributes are added:
__attribute__((btf_decl_tag("foo"))) and __attribute__((btf_type_tag("bar"))).
Both attributes accept a single arbitrary string constant argument, which will
be recorded in the generated DWARF and/or BTF debugging information. They have
no effect on code generation.
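
A minimal sketch of how the two attributes might appear in source follows.
The wrapper macros DECL_TAG and TYPE_TAG, the struct names, and the helper
function are hypothetical and only serve the example; the guards let the
code build on compilers that lack either attribute. The layout check
reflects the statement that the attributes have no effect on code
generation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper macros; each expands to nothing when the
   compiler does not recognize the corresponding attribute. */
#if defined(__has_attribute)
# if __has_attribute(btf_decl_tag)
#  define DECL_TAG(s) __attribute__((btf_decl_tag(s)))
# endif
# if __has_attribute(btf_type_tag)
#  define TYPE_TAG(s) __attribute__((btf_type_tag(s)))
# endif
#endif
#ifndef DECL_TAG
# define DECL_TAG(s)
#endif
#ifndef TYPE_TAG
# define TYPE_TAG(s)
#endif

/* Untagged reference layout. */
struct plain {
    int counter;
    int *data;
};

struct tagged {
    int counter DECL_TAG("foo");   /* decl tag on a struct field */
    int TYPE_TAG("bar") *data;     /* type tag, placed as in the thread */
};

/* The tags should only affect debug info, so both structs are
   expected to have the same size and field offsets. */
static void check_layout(void)
{
    assert(sizeof(struct tagged) == sizeof(struct plain));
    assert(offsetof(struct tagged, data) == offsetof(struct plain, data));
}
```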

Note that we are using the same attribute names as LLVM, which include