[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-27 Thread Tony Tye via cfe-commits

t-tye wrote:

I am not clear why new functions need to be added for this, as I think there 
are existing functions that already do this.

https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-08 Thread Tony Tye via cfe-commits

https://github.com/t-tye approved this pull request.


https://github.com/llvm/llvm-project/pull/76955


[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-07 Thread Tony Tye via cfe-commits

https://github.com/t-tye commented:

Documentation LGTM. Thanks.

https://github.com/llvm/llvm-project/pull/76955


[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-06 Thread Tony Tye via cfe-commits


@@ -1642,80 +1746,118 @@ The AMDGPU backend uses the following ELF header:
  ``EF_AMDGPU_FEATURE_SRAMECC_ON_V4``  0xc00 SRAMECC enabled.
   = 
===
 
+  .. table:: AMDGPU ELF Header ``e_flags`` for Code Object V6 and After
+     :name: amdgpu-elf-header-e_flags-table-v6-onwards
+
+     ============================================ ====== =========================================
+     Name                                         Value  Description
+     ============================================ ====== =========================================
+     ``EF_AMDGPU_MACH``                           0x0ff  AMDGPU processor selection
+                                                         mask for
+                                                         ``EF_AMDGPU_MACH_xxx`` values
+                                                         defined in
+                                                         :ref:`amdgpu-ef-amdgpu-mach-table`.
+     ``EF_AMDGPU_FEATURE_XNACK_V4``               0x300  XNACK selection mask for
+                                                         ``EF_AMDGPU_FEATURE_XNACK_*_V4``
+                                                         values.
+     ``EF_AMDGPU_FEATURE_XNACK_UNSUPPORTED_V4``   0x000  XNACK unsupported.
+     ``EF_AMDGPU_FEATURE_XNACK_ANY_V4``           0x100  XNACK can have any value.
+     ``EF_AMDGPU_FEATURE_XNACK_OFF_V4``           0x200  XNACK disabled.
+     ``EF_AMDGPU_FEATURE_XNACK_ON_V4``            0x300  XNACK enabled.
+     ``EF_AMDGPU_FEATURE_SRAMECC_V4``             0xc00  SRAMECC selection mask for
+                                                         ``EF_AMDGPU_FEATURE_SRAMECC_*_V4``
+                                                         values.
+     ``EF_AMDGPU_FEATURE_SRAMECC_UNSUPPORTED_V4`` 0x000  SRAMECC unsupported.
+     ``EF_AMDGPU_FEATURE_SRAMECC_ANY_V4``         0x400  SRAMECC can have any value.
+     ``EF_AMDGPU_FEATURE_SRAMECC_OFF_V4``         0x800  SRAMECC disabled,
+     ``EF_AMDGPU_FEATURE_SRAMECC_ON_V4``          0xc00  SRAMECC enabled.
+     ``EF_AMDGPU_GENERIC_VERSION_V``              0x0100 Value between 1 and 255 for generic code

t-tye wrote:

There needs to be a selection mask like for other fields.
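
To illustrate what a selection mask gives a consumer of `e_flags`, here is a minimal sketch that extracts each field using the masks from the quoted table. The helper names, `kGenericVersionMask`, and `kGenericVersionShift` are assumptions made for illustration (the version is assumed to occupy the most significant byte); the quoted hunk does not define them.

```cpp
#include <cstdint>

// Selection masks and values copied from the quoted e_flags table.
constexpr std::uint32_t EF_AMDGPU_MACH = 0x0ff;
constexpr std::uint32_t EF_AMDGPU_FEATURE_XNACK_V4 = 0x300;
constexpr std::uint32_t EF_AMDGPU_FEATURE_XNACK_ON_V4 = 0x300;
constexpr std::uint32_t EF_AMDGPU_FEATURE_SRAMECC_V4 = 0xc00;
constexpr std::uint32_t EF_AMDGPU_FEATURE_SRAMECC_ANY_V4 = 0x400;

// ASSUMPTION: hypothetical mask and shift for the generic version field,
// placed in the most significant byte. Not taken from the quoted hunk.
constexpr std::uint32_t kGenericVersionMask = 0xff000000u;
constexpr unsigned kGenericVersionShift = 24;

// Each accessor isolates one field by applying its selection mask.
constexpr std::uint32_t machOf(std::uint32_t EFlags) {
  return EFlags & EF_AMDGPU_MACH;
}
constexpr std::uint32_t xnackOf(std::uint32_t EFlags) {
  return EFlags & EF_AMDGPU_FEATURE_XNACK_V4;
}
constexpr std::uint32_t sramEccOf(std::uint32_t EFlags) {
  return EFlags & EF_AMDGPU_FEATURE_SRAMECC_V4;
}
constexpr std::uint32_t genericVersionOf(std::uint32_t EFlags) {
  return (EFlags & kGenericVersionMask) >> kGenericVersionShift;
}

// Example word: version 2, SRAMECC "any", XNACK "on", processor value 0x2c.
constexpr std::uint32_t kExampleFlags = 0x0200072cu;
static_assert(genericVersionOf(kExampleFlags) == 2u, "version field");
static_assert(xnackOf(kExampleFlags) == EF_AMDGPU_FEATURE_XNACK_ON_V4,
              "XNACK field");
static_assert(sramEccOf(kExampleFlags) == EF_AMDGPU_FEATURE_SRAMECC_ANY_V4,
              "SRAMECC field");
```

Without a named mask, every reader of the field would have to hard-code the bit range, which is what the other fields in the table avoid.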

https://github.com/llvm/llvm-project/pull/76955


[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-06 Thread Tony Tye via cfe-commits


@@ -4135,6 +4283,33 @@ Code object V5 metadata is the same as
 
  == == = 

 
+.. _amdgpu-amdhsa-code-object-metadata-v6:
+
+Code Object V6 Metadata

+
+.. warning::
+  Code object V6 is not the default code object version emitted by this version
+  of LLVM.
+
+
+Code object V6 metadata is the same as
+:ref:`amdgpu-amdhsa-code-object-metadata-v5` with the changes defined in table
+:ref:`amdgpu-amdhsa-code-object-metadata-map-table-v6`.
+
+  .. table:: AMDHSA Code Object V6 Metadata Map Changes
+ :name: amdgpu-amdhsa-code-object-metadata-map-table-v6
+
+ = == = 
===
+ String KeyValue Type Required? Description
+ = == = 
===
+ "amdhsa.version"  sequence ofRequired  - The first integer is the 
major

t-tye wrote:

I am not sure what metadata changes would be needed to support generic code 
objects. I would not add this section.

https://github.com/llvm/llvm-project/pull/76955


[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-06 Thread Tony Tye via cfe-commits


@@ -520,6 +520,102 @@ Every processor supports every OS ABI (see 
:ref:`amdgpu-os`) with the following
 
  === ===  = = 
=== === ==
 
+Generic processors allow execution of a single code objects on any of the 
processors that

t-tye wrote:

objects -> object

https://github.com/llvm/llvm-project/pull/76955


[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-06 Thread Tony Tye via cfe-commits


@@ -520,6 +520,102 @@ Every processor supports every OS ABI (see 
:ref:`amdgpu-os`) with the following
 
  === ===  = = 
=== === ==
 
+Generic processors allow execution of a single code objects on any of the 
processors that
+it supports. Such code objects may not perform as well as those for the 
non-generic processors.
+
+Generic processors are only available on code object V6 and above (see 
:ref:`amdgpu-elf-code-object`).
+
+Generic processor code objects are versioned (see 
:ref:`amdgpu-elf-header-e_flags-table-v6-onwards`).
+The version number is used by runtimes to determine if a code object can be 
run on a specific agent.

t-tye wrote:

This does not really explain how version is used. What about something like:

The version of non-generic code objects is always set to 0.

For a generic code object, adding a new generic member may require the code 
generated for the generic target to be changed so it can continue to execute on 
the previous members as well as on the new member. When this happens the 
generic code object version number is incremented. Each member of the generic 
target has a version when it was introduced. A generic code object can execute 
on a specific member if the version of the code object being loaded is >= the 
version at which the member was introduced.
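
The rule proposed above can be sketched as a loader-side check. This is only an illustration under stated assumptions: `GenericMember`, `canExecuteOn`, the member names, and the version numbers are hypothetical and do not come from the patch.

```cpp
#include <cstdint>

// Hypothetical record for one member of a generic target: its name and the
// generic code object version at which it was added to the target.
struct GenericMember {
  const char *Name;
  std::uint32_t IntroducedInVersion;
};

// Non-generic code objects always carry version 0.
constexpr bool isGenericCodeObject(std::uint32_t Version) {
  return Version != 0;
}

// A generic code object can execute on a member if its version is >= the
// version at which that member was introduced. Version 0 (non-generic) is
// not covered by this rule, so the check returns false for it.
constexpr bool canExecuteOn(std::uint32_t CodeObjectVersion,
                            const GenericMember &Member) {
  return isGenericCodeObject(CodeObjectVersion) &&
         CodeObjectVersion >= Member.IntroducedInVersion;
}

// Illustrative members: one present from the start, one added later.
constexpr GenericMember kOriginalMember{"member-a", 1};
constexpr GenericMember kLaterMember{"member-b", 2};

// A version 1 object predates member-b, so it may rely on code sequences
// that member-b cannot run.
static_assert(canExecuteOn(1, kOriginalMember), "v1 runs on member-a");
static_assert(!canExecuteOn(1, kLaterMember), "v1 does not run on member-b");
// A version 2 object was generated with both members in mind.
static_assert(canExecuteOn(2, kOriginalMember), "v2 runs on member-a");
static_assert(canExecuteOn(2, kLaterMember), "v2 runs on member-b");
```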

https://github.com/llvm/llvm-project/pull/76955


[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-06 Thread Tony Tye via cfe-commits


@@ -605,12 +606,197 @@ class SIGfx12CacheControl : public SIGfx11CacheControl {
   bool IsNonTemporal) const override;
 };
 
+class SIPreciseMemorySupport {
+protected:
+  const GCNSubtarget &ST;
+  const SIInstrInfo *TII = nullptr;
+
+  IsaVersion IV;
+
+  SIPreciseMemorySupport(const GCNSubtarget &ST) : ST(ST) {
+    TII = ST.getInstrInfo();
+    IV = getIsaVersion(ST.getCPU());
+  }
+
+public:
+  static std::unique_ptr<SIPreciseMemorySupport> create(const GCNSubtarget &ST);
+
+  virtual bool handleNonAtomic(MachineBasicBlock::iterator &MI) = 0;
+  /// Handles atomic instruction \p MI with \p ret indicating whether \p MI
+  /// returns a result.
+  virtual bool handleAtomic(MachineBasicBlock::iterator &MI, bool ret) = 0;
+};
+
+class SIGfx9PreciseMemorySupport : public SIPreciseMemorySupport {
+public:
+  SIGfx9PreciseMemorySupport(const GCNSubtarget &ST)
+      : SIPreciseMemorySupport(ST) {}
+  bool handleNonAtomic(MachineBasicBlock::iterator &MI) override;
+  bool handleAtomic(MachineBasicBlock::iterator &MI, bool ret) override;
+};
+
+class SIGfx10And11PreciseMemorySupport : public SIPreciseMemorySupport {
+public:
+  SIGfx10And11PreciseMemorySupport(const GCNSubtarget &ST)
+      : SIPreciseMemorySupport(ST) {}
+  bool handleNonAtomic(MachineBasicBlock::iterator &MI) override;
+  bool handleAtomic(MachineBasicBlock::iterator &MI, bool ret) override;
+};
+
+std::unique_ptr<SIPreciseMemorySupport>
+SIPreciseMemorySupport::create(const GCNSubtarget &ST) {
+  GCNSubtarget::Generation Generation = ST.getGeneration();
+  if (Generation < AMDGPUSubtarget::GFX10)

t-tye wrote:

Is there a reason that this functionality should not be available for any 
target? It is true it is only particularly useful for targets that have no 
precise memory operations hardware support, but the basic idea is meaningful 
for any target.

https://github.com/llvm/llvm-project/pull/79236


[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-06 Thread Tony Tye via cfe-commits


@@ -605,12 +606,197 @@ class SIGfx12CacheControl : public SIGfx11CacheControl {
   bool IsNonTemporal) const override;
 };
 
+class SIPreciseMemorySupport {

t-tye wrote:

My initial thought had been that this would be part of the existing cache 
control functions. It seems it is the same kind of waitcnt as needs to be 
inserted after a store release. That also requires the right waitcnt to be 
generated according to the kind of memory instruction.
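
As a rough illustration of picking the wait counter from the kind of memory instruction, here is a simplified sketch. It is not LLVM's API: the `MemKind` and `Counter` enums and `counterFor` are hypothetical, and the model only reflects my understanding that GFX10 and GFX11 count vector stores and non-returning atomics separately from loads, while GFX9 uses one vector memory counter for both.

```cpp
// Simplified, illustrative counter model; names are hypothetical.
enum Counter : unsigned {
  NoCounter = 0,
  VmCnt = 1u << 0,   // vector memory loads (and stores where not split out)
  LgkmCnt = 1u << 1, // LDS and scalar memory
  VsCnt = 1u << 2    // vector memory stores, where tracked separately
};

enum class MemKind {
  VectorLoad,
  VectorStore,
  AtomicWithReturn,
  AtomicNoReturn,
  LdsAccess,
  ScalarLoad
};

// Returns the counter that must reach zero before the given kind of memory
// instruction is known to have completed.
constexpr unsigned counterFor(MemKind Kind, bool HasSeparateStoreCounter) {
  switch (Kind) {
  case MemKind::VectorLoad:
  case MemKind::AtomicWithReturn:
    return VmCnt;
  case MemKind::VectorStore:
  case MemKind::AtomicNoReturn:
    return HasSeparateStoreCounter ? VsCnt : VmCnt;
  case MemKind::LdsAccess:
  case MemKind::ScalarLoad:
    return LgkmCnt;
  }
  return NoCounter;
}

// The same store needs a different wait depending on the target.
static_assert(counterFor(MemKind::VectorStore, false) == VmCnt,
              "single vector memory counter");
static_assert(counterFor(MemKind::VectorStore, true) == VsCnt,
              "separate store counter");
```

A store release has to answer the same question, which is why sharing one helper between the two uses looks attractive.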

https://github.com/llvm/llvm-project/pull/79236


[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-01-25 Thread Tony Tye via cfe-commits


@@ -641,6 +644,9 @@ class SIMemoryLegalizer final : public MachineFunctionPass {
   bool expandAtomicCmpxchgOrRmw(const SIMemOpInfo &MOI,
                                 MachineBasicBlock::iterator &MI);
 
+  bool GFX9InsertWaitcntForPreciseMem(MachineFunction &MF);

t-tye wrote:

Should these be combined with the expand* functions? They are supposed to do 
all that is necessary to "legalize" the opcodes to meet the memory model. And 
this inserting waitcnts is just another piece of that expansion.

Combining it can also avoid inserting multiple waitcnt for the same memory 
operation.

Combining it may be able to use the existing operation to ensure a memory 
operation is completed. I believe that operation should already be determining 
what kind of waitcnts should be inserted. If not, then I would consider 
generalizing it so it can be used by both the atomics expansion and the precise 
memory expansion.

It also keeps the operations in this class architecture neutral.
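
A toy sketch of the "avoid multiple waitcnt" point: if the memory-model expansion and the precise-memory expansion each report which counters they need, the requirements can be merged and emitted once. `WaitRequirement`, `combine`, and `waitsEmitted` are hypothetical names, and the model covers only the two counters that a single `s_waitcnt` can encode together.

```cpp
// Illustrative bit flags; not LLVM's representation.
enum Counter : unsigned { VmCnt = 1u << 0, LgkmCnt = 1u << 1 };

struct WaitRequirement {
  unsigned Counters; // bitwise OR of Counter values; 0 means no wait needed
};

// Merging two requirements is a union of the counters they name.
constexpr WaitRequirement combine(WaitRequirement A, WaitRequirement B) {
  return WaitRequirement{A.Counters | B.Counters};
}

// In this model one wait instruction covers every counter in a requirement.
constexpr unsigned waitsEmitted(WaitRequirement R) {
  return R.Counters != 0 ? 1u : 0u;
}

// Hypothetical requirements for the same memory operation.
constexpr WaitRequirement kFromMemoryModel{VmCnt};
constexpr WaitRequirement kFromPreciseMemory{VmCnt | LgkmCnt};

// Handled by two independent steps, the operation gets two waits.
static_assert(waitsEmitted(kFromMemoryModel) +
                      waitsEmitted(kFromPreciseMemory) == 2u,
              "separate expansion emits two waits");
// Handled in one expansion, it gets a single wait covering both.
static_assert(waitsEmitted(combine(kFromMemoryModel, kFromPreciseMemory)) ==
                  1u,
              "combined expansion emits one wait");
```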

https://github.com/llvm/llvm-project/pull/79236


[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-01-25 Thread Tony Tye via cfe-commits

https://github.com/t-tye edited https://github.com/llvm/llvm-project/pull/79236


[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-01-25 Thread Tony Tye via cfe-commits

https://github.com/t-tye requested changes to this pull request.


https://github.com/llvm/llvm-project/pull/79236


[clang] [llvm] [flang] [lld] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-17 Thread Tony Tye via cfe-commits


@@ -520,6 +520,106 @@ Every processor supports every OS ABI (see 
:ref:`amdgpu-os`) with the following
 
  === ===  = = 
=== === ==
 
+Generic processors also exist. They group multiple processors into one,
+allowing to build code once and run it on multiple targets at the cost
+of less features being available.
+
+Generic processors are only available on Code Object V6 and up.
+
+  .. table:: AMDGPU Generic Processors
+ :name: amdgpu-generic-processor-table
+
+  == = 
=
+ Processor TargetSupported Target
+   TripleProcessorsFeatures
+   ArchitectureRestrictions
+
+
+
+
+
+
+
+
+  == = 
=
+ ``gfx9-generic`` ``amdgcn`` - ``gfx900``  - ``v_mad_mix`` 
instructions
+ - ``gfx902``are not available 
on
+ - ``gfx904````gfx900``, 
``gfx902``,
+ - ``gfx906````gfx909``, 
``gfx90c``
+ - ``gfx909``  - ``v_fma_mix`` 
instructions
+ - ``gfx90c``are not available 
on ``gfx904``
+   - sramecc is not 
available on

t-tye wrote:

So is code being generated for sramecc=any? 

https://github.com/llvm/llvm-project/pull/76955


[clang] [flang] [lld] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-17 Thread Tony Tye via cfe-commits


@@ -520,6 +520,106 @@ Every processor supports every OS ABI (see 
:ref:`amdgpu-os`) with the following
 
  === ===  = = 
=== === ==
 
+Generic processors also exist. They group multiple processors into one,

t-tye wrote:

What about:

Generic processors also exist. Generic processor code objects can be executed 
on any of the processors that are supported by the generic processor. Such code 
objects may not perform as well as those for the non-generic processors.

Generic processors are only available on code object V6 and above (see [ELF 
Code Object]).

https://github.com/llvm/llvm-project/pull/76955


[flang] [llvm] [lld] [clang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-17 Thread Tony Tye via cfe-commits

https://github.com/t-tye requested changes to this pull request.


https://github.com/llvm/llvm-project/pull/76955


[llvm] [flang] [clang] [lld] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-17 Thread Tony Tye via cfe-commits


@@ -520,6 +520,106 @@ Every processor supports every OS ABI (see 
:ref:`amdgpu-os`) with the following
 
  === ===  = = 
=== === ==
 
+Generic processors also exist. They group multiple processors into one,
+allowing to build code once and run it on multiple targets at the cost
+of less features being available.
+
+Generic processors are only available on Code Object V6 and up.
+

t-tye wrote:

Document that generic processors have a version (see 
:ref:`amdgpu-elf-header-e_flags-table-v6-onwards`) and explain how it is used 
by the runtime.

https://github.com/llvm/llvm-project/pull/76955


[lld] [flang] [llvm] [clang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-17 Thread Tony Tye via cfe-commits


@@ -520,6 +520,106 @@ Every processor supports every OS ABI (see 
:ref:`amdgpu-os`) with the following
 
  === ===  = = 
=== === ==
 
+Generic processors also exist. They group multiple processors into one,
+allowing to build code once and run it on multiple targets at the cost
+of less features being available.
+
+Generic processors are only available on Code Object V6 and up.
+
+  .. table:: AMDGPU Generic Processors
+ :name: amdgpu-generic-processor-table
+
+  == = 
=
+ Processor TargetSupported Target
+   TripleProcessorsFeatures
+   ArchitectureRestrictions
+
+
+
+
+
+
+
+
+  == = 
=
+ ``gfx9-generic`` ``amdgcn`` - ``gfx900``  - ``v_mad_mix`` 
instructions
+ - ``gfx902``are not available 
on
+ - ``gfx904````gfx900``, 
``gfx902``,
+ - ``gfx906````gfx909``, 
``gfx90c``
+ - ``gfx909``  - ``v_fma_mix`` 
instructions
+ - ``gfx90c``are not available 
on ``gfx904``
+   - sramecc is not 
available on
+ ``gfx906``
+   - The following 
instructions
+ are not available 
on ``gfx906``:
+
+ - ``v_fmac_f32``
+ - ``v_xnor_b32``
+ - 
``v_dot4_i32_i8``
+ - 
``v_dot8_i32_i4``
+ - 
``v_dot2_i32_i16``
+ - 
``v_dot2_u32_u16``
+ - 
``v_dot4_u32_u8``
+ - 
``v_dot8_u32_u4``
+ - 
``v_dot2_f32_f16``
+
+
+ ``gfx10.1-generic``  ``amdgcn`` - ``gfx1010`` - The following 
instructions are
+ - ``gfx1011``   not available on 
``gfx1011``
+ - ``gfx1012``   and ``gfx1012``
+ - ``gfx1013``
+ - 
``v_dot4_i32_i8``
+ - 
``v_dot8_i32_i4``
+ - 
``v_dot2_i32_i16``
+ - 
``v_dot2_u32_u16``
+ - 
``v_dot2c_f32_f16``
+ - 
``v_dot4c_i32_i8``
+ - 
``v_dot4_u32_u8``
+ - 
``v_dot8_u32_u4``
+ - 
``v_dot2_f32_f16``
+
+   - BVH Ray Tracing 
instructions
+ are not available 
on
+ ``gfx1013``
+
+
+ ``gfx10.3-generic``  ``amdgcn`` - ``gfx1030`` No restrictions.
+ - ``gfx1031``
+ - ``gfx1032``
+ - ``gfx1033``
+ - ``gfx1034``
+ - ``gfx1035``
+ - ``gfx1036``
+
+
+ ``gfx11-generic````amdgcn`` - ``gfx1100`` Various codegen 
pessimizations
+ - ``gfx1101`` are applied to all 
targets to
+ - ``gfx1102`` work around 
hardware bugs on one

t-tye wrote:

I do not think we should be stating hardware bugs exist in public 
documentation. We can simply say less efficient code sequences are generated in 
various cases. Not sure we should list them.

Do we use msaa-load-dst-sel-bug, valu-trans-use-hazard, user-sgpr-init16-bug 
elsewhere in the code? Not sure we 

[lld] [clang] [flang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-17 Thread Tony Tye via cfe-commits


@@ -4135,6 +4283,33 @@ Code object V5 metadata is the same as
 
  == == = 

 
+.. _amdgpu-amdhsa-code-object-metadata-v6:
+
+Code Object V6 Metadata

+
+.. warning::
+  Code object V6 is not the default code object version emitted by this version
+  of LLVM.
+
+
+Code object V6 metadata is the same as
+:ref:`amdgpu-amdhsa-code-object-metadata-v5` with the changes defined in table
+:ref:`amdgpu-amdhsa-code-object-metadata-map-table-v6`.
+
+  .. table:: AMDHSA Code Object V6 Metadata Map Changes
+ :name: amdgpu-amdhsa-code-object-metadata-map-table-v6
+
+ = == = 
===
+ String KeyValue Type Required? Description
+ = == = 
===
+ "amdhsa.version"  sequence ofRequired  - The first integer is the 
major

t-tye wrote:

Since there are no changes to the metadata do we need this? Can make the V5 one 
be V5 and V6.

https://github.com/llvm/llvm-project/pull/76955


[clang] [lld] [flang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-17 Thread Tony Tye via cfe-commits


@@ -520,6 +520,106 @@ Every processor supports every OS ABI (see 
:ref:`amdgpu-os`) with the following
 
  === ===  = = 
=== === ==
 
+Generic processors also exist. They group multiple processors into one,
+allowing to build code once and run it on multiple targets at the cost
+of less features being available.
+
+Generic processors are only available on Code Object V6 and up.
+
+  .. table:: AMDGPU Generic Processors
+ :name: amdgpu-generic-processor-table
+
+  == = 
=
+ Processor TargetSupported Target

t-tye wrote:

There needs to be a column for "Target Features Supported" and "Target 
Properties".

The "Target Features Restrictions" should probably be renamed to "Target 
Restrictions".

https://github.com/llvm/llvm-project/pull/76955


[llvm] [clang] [lld] [flang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-17 Thread Tony Tye via cfe-commits


@@ -1633,80 +1741,120 @@ The AMDGPU backend uses the following ELF header:
  ``EF_AMDGPU_FEATURE_SRAMECC_ON_V4``  0xc00 SRAMECC enabled.
   = 
===
 
+  .. table:: AMDGPU ELF Header ``e_flags`` for Code Object V6 and After
+     :name: amdgpu-elf-header-e_flags-table-v6-onwards
+
+     ============================================ ====== =========================================
+     Name                                         Value  Description
+     ============================================ ====== =========================================
+     ``EF_AMDGPU_MACH``                           0x0ff  AMDGPU processor selection
+                                                         mask for
+                                                         ``EF_AMDGPU_MACH_xxx`` values
+                                                         defined in
+                                                         :ref:`amdgpu-ef-amdgpu-mach-table`.
+     ``EF_AMDGPU_FEATURE_XNACK_V4``               0x300  XNACK selection mask for
+                                                         ``EF_AMDGPU_FEATURE_XNACK_*_V4``
+                                                         values.
+     ``EF_AMDGPU_FEATURE_XNACK_UNSUPPORTED_V4``   0x000  XNACK unsupported.
+     ``EF_AMDGPU_FEATURE_XNACK_ANY_V4``           0x100  XNACK can have any value.
+     ``EF_AMDGPU_FEATURE_XNACK_OFF_V4``           0x200  XNACK disabled.
+     ``EF_AMDGPU_FEATURE_XNACK_ON_V4``            0x300  XNACK enabled.
+     ``EF_AMDGPU_FEATURE_SRAMECC_V4``             0xc00  SRAMECC selection mask for
+                                                         ``EF_AMDGPU_FEATURE_SRAMECC_*_V4``
+                                                         values.
+     ``EF_AMDGPU_FEATURE_SRAMECC_UNSUPPORTED_V4`` 0x000  SRAMECC unsupported.
+     ``EF_AMDGPU_FEATURE_SRAMECC_ANY_V4``         0x400  SRAMECC can have any value.
+     ``EF_AMDGPU_FEATURE_SRAMECC_OFF_V4``         0x800  SRAMECC disabled,
+     ``EF_AMDGPU_FEATURE_SRAMECC_ON_V4``          0xc00  SRAMECC enabled.
+     ``EF_AMDGPU_GENERIC_VERSION_V``              0x0100 The most significant byte of EFLAGS

t-tye wrote:

Move the description of the version to the generic processors section and 
simply reference that here.

https://github.com/llvm/llvm-project/pull/76955


[lld] [flang] [clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-17 Thread Tony Tye via cfe-commits


@@ -1633,80 +1741,120 @@ The AMDGPU backend uses the following ELF header:
  ``EF_AMDGPU_FEATURE_SRAMECC_ON_V4``  0xc00 SRAMECC enabled.
   = 
===
 
+  .. table:: AMDGPU ELF Header ``e_flags`` for Code Object V6 and After
+     :name: amdgpu-elf-header-e_flags-table-v6-onwards
+
+     ============================================ ====== =========================================
+     Name                                         Value  Description
+     ============================================ ====== =========================================
+     ``EF_AMDGPU_MACH``                           0x0ff  AMDGPU processor selection
+                                                         mask for
+                                                         ``EF_AMDGPU_MACH_xxx`` values
+                                                         defined in
+                                                         :ref:`amdgpu-ef-amdgpu-mach-table`.
+     ``EF_AMDGPU_FEATURE_XNACK_V4``               0x300  XNACK selection mask for
+                                                         ``EF_AMDGPU_FEATURE_XNACK_*_V4``
+                                                         values.
+     ``EF_AMDGPU_FEATURE_XNACK_UNSUPPORTED_V4``   0x000  XNACK unsupported.
+     ``EF_AMDGPU_FEATURE_XNACK_ANY_V4``           0x100  XNACK can have any value.
+     ``EF_AMDGPU_FEATURE_XNACK_OFF_V4``           0x200  XNACK disabled.
+     ``EF_AMDGPU_FEATURE_XNACK_ON_V4``            0x300  XNACK enabled.
+     ``EF_AMDGPU_FEATURE_SRAMECC_V4``             0xc00  SRAMECC selection mask for
+                                                         ``EF_AMDGPU_FEATURE_SRAMECC_*_V4``
+                                                         values.
+     ``EF_AMDGPU_FEATURE_SRAMECC_UNSUPPORTED_V4`` 0x000  SRAMECC unsupported.
+     ``EF_AMDGPU_FEATURE_SRAMECC_ANY_V4``         0x400  SRAMECC can have any value.
+     ``EF_AMDGPU_FEATURE_SRAMECC_OFF_V4``         0x800  SRAMECC disabled,
+     ``EF_AMDGPU_FEATURE_SRAMECC_ON_V4``          0xc00  SRAMECC enabled.
+     ``EF_AMDGPU_GENERIC_VERSION_V``              0x0100 The most significant byte of EFLAGS
+                                                  to     contains a "generic code object
+                                                  0xff00 version". This is used by runtimes
+                                                         to determine if a generic code
+                                                         object can be run on a
+                                                         machine.
+                                                         NOTE: This is only set for generic
+                                                         targets. (e.g., ``gfx9-generic``).
+                                                         See :ref:`amdgpu-generic-processor-table`
+     ============================================ ====== =========================================
+
   .. table:: AMDGPU ``EF_AMDGPU_MACH`` Values
  :name: amdgpu-ef-amdgpu-mach-table
 
-  == 
=
- Name Value  Description (see
- 
:ref:`amdgpu-processor-table`)
-  == 
=
- ``EF_AMDGPU_MACH_NONE``  0x000  *not specified*
- ``EF_AMDGPU_MACH_R600_R600`` 0x001  ``r600``
- ``EF_AMDGPU_MACH_R600_R630`` 0x002  ``r630``
- ``EF_AMDGPU_MACH_R600_RS880``0x003  ``rs880``
- ``EF_AMDGPU_MACH_R600_RV670``0x004  ``rv670``
- ``EF_AMDGPU_MACH_R600_RV710``0x005  ``rv710``
- ``EF_AMDGPU_MACH_R600_RV730``0x006  ``rv730``
- ``EF_AMDGPU_MACH_R600_RV770``0x007  ``rv770``
- ``EF_AMDGPU_MACH_R600_CEDAR``0x008  ``cedar``
- ``EF_AMDGPU_MACH_R600_CYPRESS``  0x009  ``cypress``
- ``EF_AMDGPU_MACH_R600_JUNIPER``  0x00a  ``juniper``
- ``EF_AMDGPU_MACH_R600_REDWOOD``  0x00b  ``redwood``
- ``EF_AMDGPU_MACH_R600_SUMO`` 0x00c  ``sumo``
- ``EF_AMDGPU_MACH_R600_BARTS``0x00d  ``barts``
- ``EF_AMDGPU_MACH_R600_CAICOS``   0x00e  ``caicos``
- ``EF_AMDGPU_MACH_R600_CAYMAN``   0x00f  ``cayman``
- ``EF_AMDGPU_MACH_R600_TURKS``0x010  ``turks``
- *reserved*   0x011 -Reserved for ``r600``
-  0x01f  architecture processors.
- ``EF_AMDGPU_MACH_AMDGCN_GFX600`` 0x020  ``gfx600``
- ``EF_AMDGPU_MACH_AMDGCN_GFX601``  

[llvm] [flang] [lld] [clang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-17 Thread Tony Tye via cfe-commits


@@ -520,6 +520,106 @@ Every processor supports every OS ABI (see 
:ref:`amdgpu-os`) with the following
 
  === ===  = = 
=== === ==
 
+Generic processors also exist. They group multiple processors into one,
+allowing to build code once and run it on multiple targets at the cost
+of less features being available.
+
+Generic processors are only available on Code Object V6 and up.
+
+  .. table:: AMDGPU Generic Processors
+ :name: amdgpu-generic-processor-table
+
+  == = 
=
+ Processor TargetSupported Target
+   TripleProcessorsFeatures
+   ArchitectureRestrictions
+

t-tye wrote:

Seems a lot of blank lines.

https://github.com/llvm/llvm-project/pull/76955


[clang] [lld] [flang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-17 Thread Tony Tye via cfe-commits

https://github.com/t-tye edited https://github.com/llvm/llvm-project/pull/76955


r328347 - [AMDGPU] Remove use of OpenCL triple environment and replace with function attribute for AMDGPU (CLANG)

2018-03-23 Thread Tony Tye via cfe-commits
Author: t-tye
Date: Fri Mar 23 11:43:15 2018
New Revision: 328347

URL: http://llvm.org/viewvc/llvm-project?rev=328347&view=rev
Log:
[AMDGPU] Remove use of OpenCL triple environment and replace with function 
attribute for AMDGPU (CLANG)


- Remove use of the opencl and amdopencl environment member of the target 
triple for the AMDGPU target.
- Use a function attribute to communicate to the AMDGPU backend.

Differential Revision: https://reviews.llvm.org/D43735

Modified:
cfe/trunk/docs/UsersManual.rst
cfe/trunk/lib/CodeGen/TargetInfo.cpp
cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl

Modified: cfe/trunk/docs/UsersManual.rst
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/docs/UsersManual.rst?rev=328347&r1=328346&r2=328347&view=diff
==
--- cfe/trunk/docs/UsersManual.rst (original)
+++ cfe/trunk/docs/UsersManual.rst Fri Mar 23 11:43:15 2018
@@ -2180,7 +2180,7 @@ to the target, for example:
.. code-block:: console
 
  $ clang -target nvptx64-unknown-unknown test.cl
- $ clang -target amdgcn-amd-amdhsa-opencl test.cl
+ $ clang -target amdgcn-amd-amdhsa -mcpu=gfx900 test.cl
 
 Compiling to bitcode can be done as follows:
 
@@ -2288,7 +2288,7 @@ There is a set of concrete HW architectu
 
.. code-block:: console
 
- $ clang -target amdgcn-amd-amdhsa-opencl test.cl
+ $ clang -target amdgcn-amd-amdhsa -mcpu=gfx900 test.cl
 
 - For Nvidia architectures:
 

Modified: cfe/trunk/lib/CodeGen/TargetInfo.cpp
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/TargetInfo.cpp?rev=328347&r1=328346&r2=328347&view=diff
==
--- cfe/trunk/lib/CodeGen/TargetInfo.cpp (original)
+++ cfe/trunk/lib/CodeGen/TargetInfo.cpp Fri Mar 23 11:43:15 2018
@@ -7661,6 +7661,11 @@ void AMDGPUTargetCodeGenInfo::setTargetA
 
   const auto *ReqdWGS = M.getLangOpts().OpenCL ?
     FD->getAttr<ReqdWorkGroupSizeAttr>() : nullptr;
+
+  if (M.getLangOpts().OpenCL && FD->hasAttr<OpenCLKernelAttr>() &&
+      (M.getTriple().getOS() == llvm::Triple::AMDHSA))
+    F->addFnAttr("amdgpu-implicitarg-num-bytes", "32");
+
   const auto *FlatWGS = FD->getAttr<AMDGPUFlatWorkGroupSizeAttr>();
   if (ReqdWGS || FlatWGS) {
     unsigned Min = FlatWGS ? FlatWGS->getMin() : 0;

Modified: cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl?rev=328347&r1=328346&r2=328347&view=diff
==
--- cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl (original)
+++ cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl Fri Mar 23 11:43:15 2018
@@ -1,4 +1,5 @@
-// RUN: %clang_cc1 -triple amdgcn-- -target-cpu tahiti -O0 -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu tahiti -O0 -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple amdgcn-- -target-cpu tahiti -O0 -emit-llvm -o - %s | FileCheck %s -check-prefix=NONAMDHSA
 // RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -O0 -emit-llvm -verify -o - %s | FileCheck -check-prefix=X86 %s
 
 __attribute__((amdgpu_flat_work_group_size(0, 0))) // expected-no-diagnostics
@@ -138,12 +139,18 @@ kernel void reqd_work_group_size_32_2_1_
 // CHECK: define amdgpu_kernel void @reqd_work_group_size_32_2_1_flat_work_group_size_16_128() [[FLAT_WORK_GROUP_SIZE_16_128:#[0-9]+]]
 }
 
+void a_function() {
+// CHECK: define void @a_function() [[A_FUNCTION:#[0-9]+]]
+}
+
 
 // Make sure this is silently accepted on other targets.
 // X86-NOT: "amdgpu-flat-work-group-size"
 // X86-NOT: "amdgpu-waves-per-eu"
 // X86-NOT: "amdgpu-num-vgpr"
 // X86-NOT: "amdgpu-num-sgpr"
+// X86-NOT: "amdgpu-implicitarg-num-bytes"
+// NONAMDHSA-NOT: "amdgpu-implicitarg-num-bytes"
 
 // CHECK-NOT: "amdgpu-flat-work-group-size"="0,0"
 // CHECK-NOT: "amdgpu-waves-per-eu"="0"
@@ -151,28 +158,30 @@ kernel void reqd_work_group_size_32_2_1_
 // CHECK-NOT: "amdgpu-num-sgpr"="0"
 // CHECK-NOT: "amdgpu-num-vgpr"="0"
 
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64"
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_64_64]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="64,64"
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_16_128]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="16,128"
-// CHECK-DAG: attributes [[WAVES_PER_EU_2]] = { convergent noinline nounwind optnone "amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes [[WAVES_PER_EU_2_4]] = { convergent noinline nounwind optnone "amdgpu-waves-per-eu"="2,4"
-// CHECK-DAG: attributes [[NUM_SGPR_32]] = { convergent noinline nounwind optnone "amdgpu-num-sgpr"="32"
-// CHECK-DAG: attributes [[NUM_VGPR_64]] = { convergent noinline nounwind optnone "amdgpu-num-vgpr"="64"
-
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2]] = { 
convergent noinline nounwind optnone 

r328350 - [AMDGPU] Update OpenCL to use 48 bytes of implicit arguments for AMDGPU (CLANG)

2018-03-23 Thread Tony Tye via cfe-commits
Author: t-tye
Date: Fri Mar 23 11:51:45 2018
New Revision: 328350

URL: http://llvm.org/viewvc/llvm-project?rev=328350&view=rev
Log:
[AMDGPU] Update OpenCL to use 48 bytes of implicit arguments for AMDGPU (CLANG)

Add two additional implicit arguments for OpenCL for the AMDGPU target using 
the AMDHSA runtime to support device enqueue.

Differential Revision: https://reviews.llvm.org/D44696

Modified:
cfe/trunk/lib/CodeGen/TargetInfo.cpp
cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl

Modified: cfe/trunk/lib/CodeGen/TargetInfo.cpp
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/TargetInfo.cpp?rev=328350&r1=328349&r2=328350&view=diff
==
--- cfe/trunk/lib/CodeGen/TargetInfo.cpp (original)
+++ cfe/trunk/lib/CodeGen/TargetInfo.cpp Fri Mar 23 11:51:45 2018
@@ -7664,7 +7664,7 @@ void AMDGPUTargetCodeGenInfo::setTargetA
 
   if (M.getLangOpts().OpenCL && FD->hasAttr<OpenCLKernelAttr>() &&
       (M.getTriple().getOS() == llvm::Triple::AMDHSA))
-    F->addFnAttr("amdgpu-implicitarg-num-bytes", "32");
+    F->addFnAttr("amdgpu-implicitarg-num-bytes", "48");
 
   const auto *FlatWGS = FD->getAttr<AMDGPUFlatWorkGroupSizeAttr>();
   if (ReqdWGS || FlatWGS) {

Modified: cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl?rev=328350&r1=328349&r2=328350&view=diff
==
--- cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl (original)
+++ cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl Fri Mar 23 11:51:45 2018
@@ -158,30 +158,30 @@ void a_function() {
 // CHECK-NOT: "amdgpu-num-sgpr"="0"
 // CHECK-NOT: "amdgpu-num-vgpr"="0"
 
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-implicitarg-num-bytes"="32"
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_64_64]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="64,64" "amdgpu-implicitarg-num-bytes"="32"
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_16_128]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="16,128" "amdgpu-implicitarg-num-bytes"="32"
-// CHECK-DAG: attributes [[WAVES_PER_EU_2]] = { convergent noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="32" "amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes [[WAVES_PER_EU_2_4]] = { convergent noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="32" "amdgpu-waves-per-eu"="2,4"
-// CHECK-DAG: attributes [[NUM_SGPR_32]] = { convergent noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="32" "amdgpu-num-sgpr"="32"
-// CHECK-DAG: attributes [[NUM_VGPR_64]] = { convergent noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="32" "amdgpu-num-vgpr"="64"
+// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-implicitarg-num-bytes"="48"
+// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_64_64]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="64,64" "amdgpu-implicitarg-num-bytes"="48"
+// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_16_128]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="16,128" "amdgpu-implicitarg-num-bytes"="48"
+// CHECK-DAG: attributes [[WAVES_PER_EU_2]] = { convergent noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-waves-per-eu"="2"
+// CHECK-DAG: attributes [[WAVES_PER_EU_2_4]] = { convergent noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-waves-per-eu"="2,4"
+// CHECK-DAG: attributes [[NUM_SGPR_32]] = { convergent noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-sgpr"="32"
+// CHECK-DAG: attributes [[NUM_VGPR_64]] = { convergent noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-vgpr"="64"
 
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-implicitarg-num-bytes"="32" "amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_4]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-implicitarg-num-bytes"="32" "amdgpu-waves-per-eu"="2,4"
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_NUM_SGPR_32]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-implicitarg-num-bytes"="32" "amdgpu-num-sgpr"="32"
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_NUM_VGPR_64]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-implicitarg-num-bytes"="32" "amdgpu-num-vgpr"="64"
-// CHECK-DAG: attributes [[WAVES_PER_EU_2_NUM_SGPR_32]] = { convergent noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="32" "amdgpu-num-sgpr"="32" "amdgpu-waves-per-eu"="2"
-// CHECK-DAG: 

[PATCH] D26196: Add support for non-zero null pointers

2016-11-08 Thread Tony Tye via cfe-commits
tony-tye added inline comments.



Comment at: lib/CodeGen/CodeGenTypes.cpp:743
+    auto NullPtr = CGM.getNullPtr(LLPT, T);
+    return isa<llvm::ConstantPointerNull>(NullPtr);
+  }

Is this correct if the target does not represent a NULL pointer as the address 
with value 0? Or should this be asking the target if this null pointer is 
represented by an address value of 0?
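
The alternative raised here, asking the target for the null pointer's address value, could look roughly like the sketch below. Everything in it is hypothetical: `AddrSpace`, `nullPointerValue`, and the non-zero value are made up for illustration and are not taken from the patch under review.

```cpp
#include <cstdint>

// Hypothetical address spaces for an imaginary target.
enum class AddrSpace { Generic, Private };

// ASSUMPTION: stand-in for a target hook that reports the address value
// used to represent a null pointer. The all-ones value is illustrative.
constexpr std::uint64_t nullPointerValue(AddrSpace AS) {
  return AS == AddrSpace::Private ? 0xffffffffull : 0ull;
}

// A pointer can be zero-initialized only if the target represents its null
// value as address 0. The decision depends on the address value the target
// reports, not on which constant kind the frontend used to build the null.
constexpr bool isPointerZeroInitializable(AddrSpace AS) {
  return nullPointerValue(AS) == 0;
}

static_assert(isPointerZeroInitializable(AddrSpace::Generic),
              "null is address 0, so zero-fill is a valid null");
static_assert(!isPointerZeroInitializable(AddrSpace::Private),
              "null is non-zero, so zero-fill would not produce null");
```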


https://reviews.llvm.org/D26196


