[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
t-tye wrote:

I am not clear why new functions need to be added for this, as I think there are existing functions that already do this.

https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
https://github.com/t-tye approved this pull request. https://github.com/llvm/llvm-project/pull/76955
[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
https://github.com/t-tye commented: Documentation LGTM. Thanks. https://github.com/llvm/llvm-project/pull/76955
[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -1642,80 +1746,118 @@ The AMDGPU backend uses the following ELF header:
      ``EF_AMDGPU_FEATURE_SRAMECC_ON_V4``          0xc00  SRAMECC enabled.
      ============================================ ====== =========================================
 
+  .. table:: AMDGPU ELF Header ``e_flags`` for Code Object V6 and After
+     :name: amdgpu-elf-header-e_flags-table-v6-onwards
+
+     ============================================ ====== =========================================
+     Name                                         Value  Description
+     ============================================ ====== =========================================
+     ``EF_AMDGPU_MACH``                           0x0ff  AMDGPU processor selection
+                                                         mask for
+                                                         ``EF_AMDGPU_MACH_xxx`` values
+                                                         defined in
+                                                         :ref:`amdgpu-ef-amdgpu-mach-table`.
+     ``EF_AMDGPU_FEATURE_XNACK_V4``               0x300  XNACK selection mask for
+                                                         ``EF_AMDGPU_FEATURE_XNACK_*_V4``
+                                                         values.
+     ``EF_AMDGPU_FEATURE_XNACK_UNSUPPORTED_V4``   0x000  XNACK unsupported.
+     ``EF_AMDGPU_FEATURE_XNACK_ANY_V4``           0x100  XNACK can have any value.
+     ``EF_AMDGPU_FEATURE_XNACK_OFF_V4``           0x200  XNACK disabled.
+     ``EF_AMDGPU_FEATURE_XNACK_ON_V4``            0x300  XNACK enabled.
+     ``EF_AMDGPU_FEATURE_SRAMECC_V4``             0xc00  SRAMECC selection mask for
+                                                         ``EF_AMDGPU_FEATURE_SRAMECC_*_V4``
+                                                         values.
+     ``EF_AMDGPU_FEATURE_SRAMECC_UNSUPPORTED_V4`` 0x000  SRAMECC unsupported.
+     ``EF_AMDGPU_FEATURE_SRAMECC_ANY_V4``         0x400  SRAMECC can have any value.
+     ``EF_AMDGPU_FEATURE_SRAMECC_OFF_V4``         0x800  SRAMECC disabled,
+     ``EF_AMDGPU_FEATURE_SRAMECC_ON_V4``          0xc00  SRAMECC enabled.
+     ``EF_AMDGPU_GENERIC_VERSION_V``              0x0100 Value between 1 and 255 for generic code

t-tye wrote:

There needs to be a selection mask like for other fields.

https://github.com/llvm/llvm-project/pull/76955
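Note: the role of a selection mask in the quoted table can be sketched as follows. This is a minimal illustration, not LLVM's code: the constants are redefined locally with the values shown in the table, and `Setting`, `decodeField`, `machOf`, `xnackOf` and `srameccOf` are hypothetical names. The generic version field is left out because the quoted patch does not yet define a mask for it.

```cpp
#include <cassert>
#include <cstdint>

// Selection masks and field values as listed in the quoted e_flags table.
constexpr std::uint32_t EF_AMDGPU_MACH = 0x0ff;
constexpr std::uint32_t EF_AMDGPU_FEATURE_XNACK_V4 = 0x300;
constexpr std::uint32_t EF_AMDGPU_FEATURE_XNACK_ANY_V4 = 0x100;
constexpr std::uint32_t EF_AMDGPU_FEATURE_XNACK_OFF_V4 = 0x200;
constexpr std::uint32_t EF_AMDGPU_FEATURE_XNACK_ON_V4 = 0x300;
constexpr std::uint32_t EF_AMDGPU_FEATURE_SRAMECC_V4 = 0xc00;
constexpr std::uint32_t EF_AMDGPU_FEATURE_SRAMECC_ANY_V4 = 0x400;
constexpr std::uint32_t EF_AMDGPU_FEATURE_SRAMECC_OFF_V4 = 0x800;
constexpr std::uint32_t EF_AMDGPU_FEATURE_SRAMECC_ON_V4 = 0xc00;

// Hypothetical enum used only by this sketch.
enum class Setting { Unsupported, Any, Off, On };

// Hypothetical helper: classify an already-masked field given its ANY/OFF/ON
// encodings. A masked value of 0 is the UNSUPPORTED encoding for both fields.
constexpr Setting decodeField(std::uint32_t Field, std::uint32_t Any,
                              std::uint32_t Off, std::uint32_t On) {
  return Field == Any ? Setting::Any
       : Field == Off ? Setting::Off
       : Field == On  ? Setting::On
                      : Setting::Unsupported;
}

// Each accessor applies the field's selection mask before interpreting it.
constexpr std::uint32_t machOf(std::uint32_t EFlags) {
  return EFlags & EF_AMDGPU_MACH;
}

constexpr Setting xnackOf(std::uint32_t EFlags) {
  return decodeField(EFlags & EF_AMDGPU_FEATURE_XNACK_V4,
                     EF_AMDGPU_FEATURE_XNACK_ANY_V4,
                     EF_AMDGPU_FEATURE_XNACK_OFF_V4,
                     EF_AMDGPU_FEATURE_XNACK_ON_V4);
}

constexpr Setting srameccOf(std::uint32_t EFlags) {
  return decodeField(EFlags & EF_AMDGPU_FEATURE_SRAMECC_V4,
                     EF_AMDGPU_FEATURE_SRAMECC_ANY_V4,
                     EF_AMDGPU_FEATURE_SRAMECC_OFF_V4,
                     EF_AMDGPU_FEATURE_SRAMECC_ON_V4);
}
```

Without a mask, a consumer cannot tell which bits of `e_flags` belong to a field, which is why each existing field has one.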
[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -4135,6 +4283,33 @@ Code object V5 metadata is the same as
      ================= ============== ========= ================================
 
+.. _amdgpu-amdhsa-code-object-metadata-v6:
+
+Code Object V6 Metadata
+
+.. warning::
+  Code object V6 is not the default code object version emitted by this version
+  of LLVM.
+
+
+Code object V6 metadata is the same as
+:ref:`amdgpu-amdhsa-code-object-metadata-v5` with the changes defined in table
+:ref:`amdgpu-amdhsa-code-object-metadata-map-table-v6`.
+
+  .. table:: AMDHSA Code Object V6 Metadata Map Changes
+     :name: amdgpu-amdhsa-code-object-metadata-map-table-v6
+
+     ================= ============== ========= ================================
+     String Key        Value Type     Required? Description
+     ================= ============== ========= ================================
+     "amdhsa.version"  sequence of    Required  - The first integer is the major

t-tye wrote:

I am not sure what metadata changes would be needed to support generic code objects. I would not add this section.

https://github.com/llvm/llvm-project/pull/76955
[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -520,6 +520,102 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following
      =========== =============== ============ ===== ================= =============== ======================
 
+Generic processors allow execution of a single code objects on any of the processors that

t-tye wrote:

objects -> object

https://github.com/llvm/llvm-project/pull/76955
[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -520,6 +520,102 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following
      =========== =============== ============ ===== ================= =============== ======================
 
+Generic processors allow execution of a single code objects on any of the processors that
+it supports. Such code objects may not perform as well as those for the non-generic processors.
+
+Generic processors are only available on code object V6 and above (see :ref:`amdgpu-elf-code-object`).
+
+Generic processor code objects are versioned (see :ref:`amdgpu-elf-header-e_flags-table-v6-onwards`).
+The version number is used by runtimes to determine if a code object can be run on a specific agent.

t-tye wrote:

This does not really explain how version is used. What about something like:

The version of non-generic code objects is always set to 0.

For a generic code object, adding a new generic member may require the code generated for the generic target to be changed so it can continue to execute on the previous members as well as on the new member. When this happens the generic code object version number is incremented. Each member of the generic target has a version when it was introduced. A generic code object can execute on a specific member if the version of the code object being loaded is >= the version at which the member was introduced.

https://github.com/llvm/llvm-project/pull/76955
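Note: the compatibility rule proposed in the comment above can be sketched as a single comparison. This is only an illustration under stated assumptions: `canRunGenericCodeObject` and its parameter names are hypothetical and are not part of LLVM or of any runtime API, and rejecting version 0 here simply reflects that 0 is described as the non-generic value, which this helper does not handle.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an LLVM or runtime API.
//   CodeObjectVersion:       generic version recorded in the code object
//                            (0 is described as the non-generic value).
//   MemberIntroducedVersion: generic version at which the processor became a
//                            member of the generic target.
constexpr bool canRunGenericCodeObject(std::uint32_t CodeObjectVersion,
                                       std::uint32_t MemberIntroducedVersion) {
  // Version 0 marks a non-generic code object, so the generic rule does not
  // apply; this sketch reports "no" and leaves that case to the caller.
  return CodeObjectVersion != 0 &&
         CodeObjectVersion >= MemberIntroducedVersion;
}

// A code object built at generic version 2 can run on a member introduced at
// version 1, but a version 1 code object cannot run on a version 2 member.
static_assert(canRunGenericCodeObject(2, 1), "newer code object, older member");
static_assert(!canRunGenericCodeObject(1, 2), "older code object, newer member");
```

The comparison only goes one way because code generated before a member was added was never adjusted to account for that member.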
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
@@ -605,12 +606,197 @@ class SIGfx12CacheControl : public SIGfx11CacheControl {
                                       bool IsNonTemporal) const override;
 };
 
+class SIPreciseMemorySupport {
+protected:
+  const GCNSubtarget &ST;
+  const SIInstrInfo *TII = nullptr;
+
+  IsaVersion IV;
+
+  SIPreciseMemorySupport(const GCNSubtarget &ST) : ST(ST) {
+    TII = ST.getInstrInfo();
+    IV = getIsaVersion(ST.getCPU());
+  }
+
+public:
+  static std::unique_ptr<SIPreciseMemorySupport> create(const GCNSubtarget &ST);
+
+  virtual bool handleNonAtomic(MachineBasicBlock::iterator &MI) = 0;
+  /// Handles atomic instruction \p MI with \p ret indicating whether \p MI
+  /// returns a result.
+  virtual bool handleAtomic(MachineBasicBlock::iterator &MI, bool ret) = 0;
+};
+
+class SIGfx9PreciseMemorySupport : public SIPreciseMemorySupport {
+public:
+  SIGfx9PreciseMemorySupport(const GCNSubtarget &ST)
+      : SIPreciseMemorySupport(ST) {}
+  bool handleNonAtomic(MachineBasicBlock::iterator &MI) override;
+  bool handleAtomic(MachineBasicBlock::iterator &MI, bool ret) override;
+};
+
+class SIGfx10And11PreciseMemorySupport : public SIPreciseMemorySupport {
+public:
+  SIGfx10And11PreciseMemorySupport(const GCNSubtarget &ST)
+      : SIPreciseMemorySupport(ST) {}
+  bool handleNonAtomic(MachineBasicBlock::iterator &MI) override;
+  bool handleAtomic(MachineBasicBlock::iterator &MI, bool ret) override;
+};
+
+std::unique_ptr<SIPreciseMemorySupport>
+SIPreciseMemorySupport::create(const GCNSubtarget &ST) {
+  GCNSubtarget::Generation Generation = ST.getGeneration();
+  if (Generation < AMDGPUSubtarget::GFX10)

t-tye wrote:

Is there a reason that this functionality should not be available for any target? It is true it is only particularly useful for targets that have no precise memory operations hardware support, but the basic idea is meaningful for any target.

https://github.com/llvm/llvm-project/pull/79236
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
@@ -605,12 +606,197 @@ class SIGfx12CacheControl : public SIGfx11CacheControl {
                                       bool IsNonTemporal) const override;
 };
 
+class SIPreciseMemorySupport {

t-tye wrote:

My initial thought had been that this would be part of the existing cache control functions. It seems it is the same kind of waitcnt as needs to be inserted after a store release. That also requires the right waitcnt to be generated according to the kind of memory instruction.

https://github.com/llvm/llvm-project/pull/79236
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
@@ -641,6 +644,9 @@ class SIMemoryLegalizer final : public MachineFunctionPass {
   bool expandAtomicCmpxchgOrRmw(const SIMemOpInfo &MOI,
                                 MachineBasicBlock::iterator &MI);
 
+  bool GFX9InsertWaitcntForPreciseMem(MachineFunction &MF);

t-tye wrote:

Should these be combined with the expand* functions? They are supposed to do all that is necessary to "legalize" the opcodes to meet the memory model. And this inserting waitcnts is just another piece of that expansion.

Combining it can also avoid inserting multiple waitcnt for the same memory operation. Combining it may be able to use the existing operation to ensure a memory operation is completed. I believe that operation should already be determining what kind of waitcnts should be inserted. If not, then I would consider generalizing it so it can be used by both the atomics expansion and the precise memory expansion. It also keeps the operations in this class architecture neutral.

https://github.com/llvm/llvm-project/pull/79236
[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
https://github.com/t-tye edited https://github.com/llvm/llvm-project/pull/79236
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
https://github.com/t-tye requested changes to this pull request. https://github.com/llvm/llvm-project/pull/79236
[clang] [llvm] [flang] [lld] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -520,6 +520,106 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following
      =========== =============== ============ ===== ================= =============== ======================
 
+Generic processors also exist. They group multiple processors into one,
+allowing to build code once and run it on multiple targets at the cost
+of less features being available.
+
+Generic processors are only available on Code Object V6 and up.
+
+  .. table:: AMDGPU Generic Processors
+     :name: amdgpu-generic-processor-table
+
+     =================== ============ ============== ==================================
+     Processor           Target       Supported      Target
+                         Triple       Processors     Features
+                         Architecture                Restrictions
+
+
+
+
+
+
+
+
+     =================== ============ ============== ==================================
+     ``gfx9-generic``    ``amdgcn``   - ``gfx900``   - ``v_mad_mix`` instructions
+                                      - ``gfx902``     are not available on
+                                      - ``gfx904``     ``gfx900``, ``gfx902``,
+                                      - ``gfx906``     ``gfx909``, ``gfx90c``
+                                      - ``gfx909``   - ``v_fma_mix`` instructions
+                                      - ``gfx90c``     are not available on ``gfx904``
+                                                     - sramecc is not available on

t-tye wrote:

So is code being generated for sramecc=any?

https://github.com/llvm/llvm-project/pull/76955
[clang] [flang] [lld] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -520,6 +520,106 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following
      =========== =============== ============ ===== ================= =============== ======================
 
+Generic processors also exist. They group multiple processors into one,

t-tye wrote:

What about:

Generic processors also exist. Generic processor code objects can be executed on any of the processors that are supported by the generic processor. Such code objects may not perform as well as those for the non-generic processors.

Generic processors are only available on code object V6 and above (see [ELF Code Object]).

https://github.com/llvm/llvm-project/pull/76955
[flang] [llvm] [lld] [clang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
https://github.com/t-tye requested changes to this pull request. https://github.com/llvm/llvm-project/pull/76955
[llvm] [flang] [clang] [lld] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -520,6 +520,106 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following
      =========== =============== ============ ===== ================= =============== ======================
 
+Generic processors also exist. They group multiple processors into one,
+allowing to build code once and run it on multiple targets at the cost
+of less features being available.
+
+Generic processors are only available on Code Object V6 and up.
+

t-tye wrote:

Document that generic processors have a version (see :ref:`amdgpu-elf-header-e_flags-table-v6-onwards`) and explain how it is used by the runtime.

https://github.com/llvm/llvm-project/pull/76955
[lld] [flang] [llvm] [clang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -520,6 +520,106 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following
      =========== =============== ============ ===== ================= =============== ======================
 
+Generic processors also exist. They group multiple processors into one,
+allowing to build code once and run it on multiple targets at the cost
+of less features being available.
+
+Generic processors are only available on Code Object V6 and up.
+
+  .. table:: AMDGPU Generic Processors
+     :name: amdgpu-generic-processor-table
+
+     =================== ============ ============== ==================================
+     Processor           Target       Supported      Target
+                         Triple       Processors     Features
+                         Architecture                Restrictions
+
+
+
+
+
+
+
+
+     =================== ============ ============== ==================================
+     ``gfx9-generic``    ``amdgcn``   - ``gfx900``   - ``v_mad_mix`` instructions
+                                      - ``gfx902``     are not available on
+                                      - ``gfx904``     ``gfx900``, ``gfx902``,
+                                      - ``gfx906``     ``gfx909``, ``gfx90c``
+                                      - ``gfx909``   - ``v_fma_mix`` instructions
+                                      - ``gfx90c``     are not available on ``gfx904``
+                                                     - sramecc is not available on
+                                                       ``gfx906``
+                                                     - The following instructions
+                                                       are not available on ``gfx906``:
+
+                                                       - ``v_fmac_f32``
+                                                       - ``v_xnor_b32``
+                                                       - ``v_dot4_i32_i8``
+                                                       - ``v_dot8_i32_i4``
+                                                       - ``v_dot2_i32_i16``
+                                                       - ``v_dot2_u32_u16``
+                                                       - ``v_dot4_u32_u8``
+                                                       - ``v_dot8_u32_u4``
+                                                       - ``v_dot2_f32_f16``
+
+
+     ``gfx10.1-generic`` ``amdgcn``   - ``gfx1010``  - The following instructions are
+                                      - ``gfx1011``    not available on ``gfx1011``
+                                      - ``gfx1012``    and ``gfx1012``
+                                      - ``gfx1013``
+                                                       - ``v_dot4_i32_i8``
+                                                       - ``v_dot8_i32_i4``
+                                                       - ``v_dot2_i32_i16``
+                                                       - ``v_dot2_u32_u16``
+                                                       - ``v_dot2c_f32_f16``
+                                                       - ``v_dot4c_i32_i8``
+                                                       - ``v_dot4_u32_u8``
+                                                       - ``v_dot8_u32_u4``
+                                                       - ``v_dot2_f32_f16``
+
+                                                     - BVH Ray Tracing instructions
+                                                       are not available on
+                                                       ``gfx1013``
+
+
+     ``gfx10.3-generic`` ``amdgcn``   - ``gfx1030``  No restrictions.
+                                      - ``gfx1031``
+                                      - ``gfx1032``
+                                      - ``gfx1033``
+                                      - ``gfx1034``
+                                      - ``gfx1035``
+                                      - ``gfx1036``
+
+
+     ``gfx11-generic``   ``amdgcn``   - ``gfx1100``  Various codegen pessimizations
+                                      - ``gfx1101``  are applied to all targets to
+                                      - ``gfx1102``  work around hardware bugs on one

t-tye wrote:

I do not think we should be stating hardware bugs exist in public documentation. We can simply say less efficient code sequences are generated in various cases. Not sure we should list them. Do we use msaa-load-dst-sel-bug, valu-trans-use-hazard, user-sgpr-init16-bug elsewhere in the code? Not sure we
[lld] [clang] [flang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -4135,6 +4283,33 @@ Code object V5 metadata is the same as
      ================= ============== ========= ================================
 
+.. _amdgpu-amdhsa-code-object-metadata-v6:
+
+Code Object V6 Metadata
+
+.. warning::
+  Code object V6 is not the default code object version emitted by this version
+  of LLVM.
+
+
+Code object V6 metadata is the same as
+:ref:`amdgpu-amdhsa-code-object-metadata-v5` with the changes defined in table
+:ref:`amdgpu-amdhsa-code-object-metadata-map-table-v6`.
+
+  .. table:: AMDHSA Code Object V6 Metadata Map Changes
+     :name: amdgpu-amdhsa-code-object-metadata-map-table-v6
+
+     ================= ============== ========= ================================
+     String Key        Value Type     Required? Description
+     ================= ============== ========= ================================
+     "amdhsa.version"  sequence of    Required  - The first integer is the major

t-tye wrote:

Since there are no changes to the metadata do we need this? Can make the V5 one be V5 and V6.

https://github.com/llvm/llvm-project/pull/76955
[clang] [lld] [flang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -520,6 +520,106 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following
      =========== =============== ============ ===== ================= =============== ======================
 
+Generic processors also exist. They group multiple processors into one,
+allowing to build code once and run it on multiple targets at the cost
+of less features being available.
+
+Generic processors are only available on Code Object V6 and up.
+
+  .. table:: AMDGPU Generic Processors
+     :name: amdgpu-generic-processor-table
+
+     =================== ============ ============== ==================================
+     Processor           Target       Supported      Target

t-tye wrote:

There needs to be a column for "Target Features Supported" and "Target Properties". The "Target Features Restrictions" should probably be renamed to "Target Restrictions".

https://github.com/llvm/llvm-project/pull/76955
[llvm] [clang] [lld] [flang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -1633,80 +1741,120 @@ The AMDGPU backend uses the following ELF header:
      ``EF_AMDGPU_FEATURE_SRAMECC_ON_V4``          0xc00  SRAMECC enabled.
      ============================================ ====== =========================================
 
+  .. table:: AMDGPU ELF Header ``e_flags`` for Code Object V6 and After
+     :name: amdgpu-elf-header-e_flags-table-v6-onwards
+
+     ============================================ ====== =========================================
+     Name                                         Value  Description
+     ============================================ ====== =========================================
+     ``EF_AMDGPU_MACH``                           0x0ff  AMDGPU processor selection
+                                                         mask for
+                                                         ``EF_AMDGPU_MACH_xxx`` values
+                                                         defined in
+                                                         :ref:`amdgpu-ef-amdgpu-mach-table`.
+     ``EF_AMDGPU_FEATURE_XNACK_V4``               0x300  XNACK selection mask for
+                                                         ``EF_AMDGPU_FEATURE_XNACK_*_V4``
+                                                         values.
+     ``EF_AMDGPU_FEATURE_XNACK_UNSUPPORTED_V4``   0x000  XNACK unsupported.
+     ``EF_AMDGPU_FEATURE_XNACK_ANY_V4``           0x100  XNACK can have any value.
+     ``EF_AMDGPU_FEATURE_XNACK_OFF_V4``           0x200  XNACK disabled.
+     ``EF_AMDGPU_FEATURE_XNACK_ON_V4``            0x300  XNACK enabled.
+     ``EF_AMDGPU_FEATURE_SRAMECC_V4``             0xc00  SRAMECC selection mask for
+                                                         ``EF_AMDGPU_FEATURE_SRAMECC_*_V4``
+                                                         values.
+     ``EF_AMDGPU_FEATURE_SRAMECC_UNSUPPORTED_V4`` 0x000  SRAMECC unsupported.
+     ``EF_AMDGPU_FEATURE_SRAMECC_ANY_V4``         0x400  SRAMECC can have any value.
+     ``EF_AMDGPU_FEATURE_SRAMECC_OFF_V4``         0x800  SRAMECC disabled,
+     ``EF_AMDGPU_FEATURE_SRAMECC_ON_V4``          0xc00  SRAMECC enabled.
+     ``EF_AMDGPU_GENERIC_VERSION_V``              0x0100 The most significant byte of EFLAGS

t-tye wrote:

Move the description of the version to the generic processors section and simply reference that here.

https://github.com/llvm/llvm-project/pull/76955
[lld] [flang] [clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -1633,80 +1741,120 @@ The AMDGPU backend uses the following ELF header:
      ``EF_AMDGPU_FEATURE_SRAMECC_ON_V4``          0xc00  SRAMECC enabled.
      ============================================ ====== =========================================
 
+  .. table:: AMDGPU ELF Header ``e_flags`` for Code Object V6 and After
+     :name: amdgpu-elf-header-e_flags-table-v6-onwards
+
+     ============================================ ====== =========================================
+     Name                                         Value  Description
+     ============================================ ====== =========================================
+     ``EF_AMDGPU_MACH``                           0x0ff  AMDGPU processor selection
+                                                         mask for
+                                                         ``EF_AMDGPU_MACH_xxx`` values
+                                                         defined in
+                                                         :ref:`amdgpu-ef-amdgpu-mach-table`.
+     ``EF_AMDGPU_FEATURE_XNACK_V4``               0x300  XNACK selection mask for
+                                                         ``EF_AMDGPU_FEATURE_XNACK_*_V4``
+                                                         values.
+     ``EF_AMDGPU_FEATURE_XNACK_UNSUPPORTED_V4``   0x000  XNACK unsupported.
+     ``EF_AMDGPU_FEATURE_XNACK_ANY_V4``           0x100  XNACK can have any value.
+     ``EF_AMDGPU_FEATURE_XNACK_OFF_V4``           0x200  XNACK disabled.
+     ``EF_AMDGPU_FEATURE_XNACK_ON_V4``            0x300  XNACK enabled.
+     ``EF_AMDGPU_FEATURE_SRAMECC_V4``             0xc00  SRAMECC selection mask for
+                                                         ``EF_AMDGPU_FEATURE_SRAMECC_*_V4``
+                                                         values.
+     ``EF_AMDGPU_FEATURE_SRAMECC_UNSUPPORTED_V4`` 0x000  SRAMECC unsupported.
+     ``EF_AMDGPU_FEATURE_SRAMECC_ANY_V4``         0x400  SRAMECC can have any value.
+     ``EF_AMDGPU_FEATURE_SRAMECC_OFF_V4``         0x800  SRAMECC disabled,
+     ``EF_AMDGPU_FEATURE_SRAMECC_ON_V4``          0xc00  SRAMECC enabled.
+     ``EF_AMDGPU_GENERIC_VERSION_V``              0x0100 The most significant byte of EFLAGS
+                                                  to     contains a "generic code object
+                                                  0xff00 version". This is used by runtimes
+                                                         to determine if a generic code
+                                                         object can be run on a
+                                                         machine.
+                                                         NOTE: This is only set for generic
+                                                         targets. (e.g., ``gfx9-generic``).
+                                                         See :ref:`amdgpu-generic-processor-table`
+     ============================================ ====== =========================================
+
   .. table:: AMDGPU ``EF_AMDGPU_MACH`` Values
      :name: amdgpu-ef-amdgpu-mach-table
 
-     ==================================== ========== =============================
-     Name                                 Value      Description (see
-                                                     :ref:`amdgpu-processor-table`)
-     ==================================== ========== =============================
-     ``EF_AMDGPU_MACH_NONE``              0x000      *not specified*
-     ``EF_AMDGPU_MACH_R600_R600``         0x001      ``r600``
-     ``EF_AMDGPU_MACH_R600_R630``         0x002      ``r630``
-     ``EF_AMDGPU_MACH_R600_RS880``        0x003      ``rs880``
-     ``EF_AMDGPU_MACH_R600_RV670``        0x004      ``rv670``
-     ``EF_AMDGPU_MACH_R600_RV710``        0x005      ``rv710``
-     ``EF_AMDGPU_MACH_R600_RV730``        0x006      ``rv730``
-     ``EF_AMDGPU_MACH_R600_RV770``        0x007      ``rv770``
-     ``EF_AMDGPU_MACH_R600_CEDAR``        0x008      ``cedar``
-     ``EF_AMDGPU_MACH_R600_CYPRESS``      0x009      ``cypress``
-     ``EF_AMDGPU_MACH_R600_JUNIPER``      0x00a      ``juniper``
-     ``EF_AMDGPU_MACH_R600_REDWOOD``      0x00b      ``redwood``
-     ``EF_AMDGPU_MACH_R600_SUMO``         0x00c      ``sumo``
-     ``EF_AMDGPU_MACH_R600_BARTS``        0x00d      ``barts``
-     ``EF_AMDGPU_MACH_R600_CAICOS``       0x00e      ``caicos``
-     ``EF_AMDGPU_MACH_R600_CAYMAN``       0x00f      ``cayman``
-     ``EF_AMDGPU_MACH_R600_TURKS``        0x010      ``turks``
-     *reserved*                           0x011 -    Reserved for ``r600``
-                                          0x01f      architecture processors.
-     ``EF_AMDGPU_MACH_AMDGCN_GFX600``     0x020      ``gfx600``
-     ``EF_AMDGPU_MACH_AMDGCN_GFX601``
[llvm] [flang] [lld] [clang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -520,6 +520,106 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following
      =========== =============== ============ ===== ================= =============== ======================
 
+Generic processors also exist. They group multiple processors into one,
+allowing to build code once and run it on multiple targets at the cost
+of less features being available.
+
+Generic processors are only available on Code Object V6 and up.
+
+  .. table:: AMDGPU Generic Processors
+     :name: amdgpu-generic-processor-table
+
+     =================== ============ ============== ==================================
+     Processor           Target       Supported      Target
+                         Triple       Processors     Features
+                         Architecture                Restrictions
+

t-tye wrote:

Seems a lot of blank lines.

https://github.com/llvm/llvm-project/pull/76955
[clang] [lld] [flang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
https://github.com/t-tye edited https://github.com/llvm/llvm-project/pull/76955
r328347 - [AMDGPU] Remove use of OpenCL triple environment and replace with function attribute for AMDGPU (CLANG)
Author: t-tye Date: Fri Mar 23 11:43:15 2018 New Revision: 328347 URL: http://llvm.org/viewvc/llvm-project?rev=328347=rev Log: [AMDGPU] Remove use of OpenCL triple environment and replace with function attribute for AMDGPU (CLANG) - Remove use of the opencl and amdopencl environment member of the target triple for the AMDGPU target. - Use a function attribute to communicate to the AMDGPU backend. Differential Revision: https://reviews.llvm.org/D43735 Modified: cfe/trunk/docs/UsersManual.rst cfe/trunk/lib/CodeGen/TargetInfo.cpp cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl Modified: cfe/trunk/docs/UsersManual.rst URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/docs/UsersManual.rst?rev=328347=328346=328347=diff == --- cfe/trunk/docs/UsersManual.rst (original) +++ cfe/trunk/docs/UsersManual.rst Fri Mar 23 11:43:15 2018 @@ -2180,7 +2180,7 @@ to the target, for example: .. code-block:: console $ clang -target nvptx64-unknown-unknown test.cl - $ clang -target amdgcn-amd-amdhsa-opencl test.cl + $ clang -target amdgcn-amd-amdhsa -mcpu=gfx900 test.cl Compiling to bitcode can be done as follows: @@ -2288,7 +2288,7 @@ There is a set of concrete HW architectu .. code-block:: console - $ clang -target amdgcn-amd-amdhsa-opencl test.cl + $ clang -target amdgcn-amd-amdhsa -mcpu=gfx900 test.cl - For Nvidia architectures: Modified: cfe/trunk/lib/CodeGen/TargetInfo.cpp URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/TargetInfo.cpp?rev=328347=328346=328347=diff == --- cfe/trunk/lib/CodeGen/TargetInfo.cpp (original) +++ cfe/trunk/lib/CodeGen/TargetInfo.cpp Fri Mar 23 11:43:15 2018 @@ -7661,6 +7661,11 @@ void AMDGPUTargetCodeGenInfo::setTargetA const auto *ReqdWGS = M.getLangOpts().OpenCL ? FD->getAttr() : nullptr; + + if (M.getLangOpts().OpenCL && FD->hasAttr() && + (M.getTriple().getOS() == llvm::Triple::AMDHSA)) +F->addFnAttr("amdgpu-implicitarg-num-bytes", "32"); + const auto *FlatWGS = FD->getAttr(); if (ReqdWGS || FlatWGS) { unsigned Min = FlatWGS ? 
FlatWGS->getMin() : 0; Modified: cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl?rev=328347=328346=328347=diff == --- cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl (original) +++ cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl Fri Mar 23 11:43:15 2018 @@ -1,4 +1,5 @@ -// RUN: %clang_cc1 -triple amdgcn-- -target-cpu tahiti -O0 -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu tahiti -O0 -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple amdgcn-- -target-cpu tahiti -O0 -emit-llvm -o - %s | FileCheck %s -check-prefix=NONAMDHSA // RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -O0 -emit-llvm -verify -o - %s | FileCheck -check-prefix=X86 %s __attribute__((amdgpu_flat_work_group_size(0, 0))) // expected-no-diagnostics @@ -138,12 +139,18 @@ kernel void reqd_work_group_size_32_2_1_ // CHECK: define amdgpu_kernel void @reqd_work_group_size_32_2_1_flat_work_group_size_16_128() [[FLAT_WORK_GROUP_SIZE_16_128:#[0-9]+]] } +void a_function() { +// CHECK: define void @a_function() [[A_FUNCTION:#[0-9]+]] +} + // Make sure this is silently accepted on other targets. 
// X86-NOT: "amdgpu-flat-work-group-size" // X86-NOT: "amdgpu-waves-per-eu" // X86-NOT: "amdgpu-num-vgpr" // X86-NOT: "amdgpu-num-sgpr" +// X86-NOT: "amdgpu-implicitarg-num-bytes" +// NONAMDHSA-NOT: "amdgpu-implicitarg-num-bytes" // CHECK-NOT: "amdgpu-flat-work-group-size"="0,0" // CHECK-NOT: "amdgpu-waves-per-eu"="0" @@ -151,28 +158,30 @@ kernel void reqd_work_group_size_32_2_1_ // CHECK-NOT: "amdgpu-num-sgpr"="0" // CHECK-NOT: "amdgpu-num-vgpr"="0" -// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" -// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_64_64]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="64,64" -// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_16_128]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="16,128" -// CHECK-DAG: attributes [[WAVES_PER_EU_2]] = { convergent noinline nounwind optnone "amdgpu-waves-per-eu"="2" -// CHECK-DAG: attributes [[WAVES_PER_EU_2_4]] = { convergent noinline nounwind optnone "amdgpu-waves-per-eu"="2,4" -// CHECK-DAG: attributes [[NUM_SGPR_32]] = { convergent noinline nounwind optnone "amdgpu-num-sgpr"="32" -// CHECK-DAG: attributes [[NUM_VGPR_64]] = { convergent noinline nounwind optnone "amdgpu-num-vgpr"="64" - -// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2]] = { convergent noinline nounwind optnone
r328350 - [AMDGPU] Update OpenCL to use 48 bytes of implicit arguments for AMDGPU (CLANG)
Author: t-tye Date: Fri Mar 23 11:51:45 2018 New Revision: 328350 URL: http://llvm.org/viewvc/llvm-project?rev=328350=rev Log: [AMDGPU] Update OpenCL to use 48 bytes of implicit arguments for AMDGPU (CLANG) Add two additional implicit arguments for OpenCL for the AMDGPU target using the AMDHSA runtime to support device enqueue. Differential Revision: https://reviews.llvm.org/D44696 Modified: cfe/trunk/lib/CodeGen/TargetInfo.cpp cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl Modified: cfe/trunk/lib/CodeGen/TargetInfo.cpp URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/TargetInfo.cpp?rev=328350=328349=328350=diff == --- cfe/trunk/lib/CodeGen/TargetInfo.cpp (original) +++ cfe/trunk/lib/CodeGen/TargetInfo.cpp Fri Mar 23 11:51:45 2018 @@ -7664,7 +7664,7 @@ void AMDGPUTargetCodeGenInfo::setTargetA if (M.getLangOpts().OpenCL && FD->hasAttr() && (M.getTriple().getOS() == llvm::Triple::AMDHSA)) -F->addFnAttr("amdgpu-implicitarg-num-bytes", "32"); +F->addFnAttr("amdgpu-implicitarg-num-bytes", "48"); const auto *FlatWGS = FD->getAttr(); if (ReqdWGS || FlatWGS) { Modified: cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl?rev=328350=328349=328350=diff == --- cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl (original) +++ cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl Fri Mar 23 11:51:45 2018 @@ -158,30 +158,30 @@ void a_function() { // CHECK-NOT: "amdgpu-num-sgpr"="0" // CHECK-NOT: "amdgpu-num-vgpr"="0" -// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-implicitarg-num-bytes"="32" -// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_64_64]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="64,64" "amdgpu-implicitarg-num-bytes"="32" -// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_16_128]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="16,128" 
"amdgpu-implicitarg-num-bytes"="32" -// CHECK-DAG: attributes [[WAVES_PER_EU_2]] = { convergent noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="32" "amdgpu-waves-per-eu"="2" -// CHECK-DAG: attributes [[WAVES_PER_EU_2_4]] = { convergent noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="32" "amdgpu-waves-per-eu"="2,4" -// CHECK-DAG: attributes [[NUM_SGPR_32]] = { convergent noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="32" "amdgpu-num-sgpr"="32" -// CHECK-DAG: attributes [[NUM_VGPR_64]] = { convergent noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="32" "amdgpu-num-vgpr"="64" +// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-implicitarg-num-bytes"="48" +// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_64_64]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="64,64" "amdgpu-implicitarg-num-bytes"="48" +// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_16_128]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="16,128" "amdgpu-implicitarg-num-bytes"="48" +// CHECK-DAG: attributes [[WAVES_PER_EU_2]] = { convergent noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-waves-per-eu"="2" +// CHECK-DAG: attributes [[WAVES_PER_EU_2_4]] = { convergent noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-waves-per-eu"="2,4" +// CHECK-DAG: attributes [[NUM_SGPR_32]] = { convergent noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-sgpr"="32" +// CHECK-DAG: attributes [[NUM_VGPR_64]] = { convergent noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-vgpr"="64" -// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-implicitarg-num-bytes"="32" "amdgpu-waves-per-eu"="2" -// CHECK-DAG: attributes 
[[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_4]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-implicitarg-num-bytes"="32" "amdgpu-waves-per-eu"="2,4" -// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_NUM_SGPR_32]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-implicitarg-num-bytes"="32" "amdgpu-num-sgpr"="32" -// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_NUM_VGPR_64]] = { convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" "amdgpu-implicitarg-num-bytes"="32" "amdgpu-num-vgpr"="64" -// CHECK-DAG: attributes [[WAVES_PER_EU_2_NUM_SGPR_32]] = { convergent noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="32" "amdgpu-num-sgpr"="32" "amdgpu-waves-per-eu"="2" -// CHECK-DAG:
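Note: the condition guarding the attribute in the two commits above can be sketched in isolation. This is a simplified stand-in rather than Clang's code: `FunctionFacts`, `FnAttrs` and `implicitArgAttrs` are hypothetical names, and the sketch assumes the `hasAttr` call, whose template argument is missing from the archived text, tests for the OpenCL kernel attribute. Only the attribute name and the value "48" are taken from the quoted diff.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical summary of the three facts the quoted condition inspects.
struct FunctionFacts {
  bool LangIsOpenCL; // M.getLangOpts().OpenCL in the quoted code
  bool IsKernel;     // assumed meaning of the hasAttr check
  bool OSIsAMDHSA;   // M.getTriple().getOS() == llvm::Triple::AMDHSA
};

// Hypothetical stand-in for a function's string attributes.
using FnAttrs = std::map<std::string, std::string>;

// Returns the attributes this rule would add: the implicit argument size is
// recorded only for OpenCL kernels compiled for the AMDHSA OS.
inline FnAttrs implicitArgAttrs(const FunctionFacts &F) {
  FnAttrs Attrs;
  if (F.LangIsOpenCL && F.IsKernel && F.OSIsAMDHSA)
    Attrs["amdgpu-implicitarg-num-bytes"] = "48";
  return Attrs;
}
```

This matches what the updated test checks: kernels on `amdgcn-amd-amdhsa` carry the attribute, while the `NONAMDHSA` and `X86` prefixes assert that it is absent.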
[PATCH] D26196: Add support for non-zero null pointers
tony-tye added inline comments.

Comment at: lib/CodeGen/CodeGenTypes.cpp:743
+    auto NullPtr = CGM.getNullPtr(LLPT, T);
+    return isa(NullPtr);
+  }

Is this correct if the target does not represent a NULL pointer as the address with value 0? Or should this be asking the target if this null pointer is represented by an address value of 0?

https://reviews.llvm.org/D26196
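Note: the distinction raised in the question above can be sketched as follows. This is an illustration only: `AddressSpaceInfo` and `isPointerZeroInitializable` are hypothetical names, and the all-ones null encoding is an invented example value, not a statement about any particular target.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of how a target encodes the null pointer in one
// address space.
struct AddressSpaceInfo {
  std::uint64_t NullPointerBits;
};

// A pointer can be initialized by zero-filling memory only if the target's
// null pointer for that address space has the all-zero bit pattern. The
// answer comes from the target's encoding, not from the constant being "null".
constexpr bool isPointerZeroInitializable(AddressSpaceInfo AS) {
  return AS.NullPointerBits == 0;
}

// Invented example encodings: one address space uses 0 for null, another
// uses all ones.
constexpr AddressSpaceInfo ZeroNullAS{0};
constexpr AddressSpaceInfo AllOnesNullAS{~std::uint64_t{0}};

static_assert(isPointerZeroInitializable(ZeroNullAS), "zero null is zero-init");
static_assert(!isPointerZeroInitializable(AllOnesNullAS), "non-zero null is not");
```

Under this reading, a check that only recognizes a null pointer constant would give the wrong answer whenever the target's null value is non-zero.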