from:"Brian Sumner via Phabricator via cfe\-commits"

[PATCH] D156040: [AMDGPU] Add dynamic stack bit info to kernel-resource-usage Rpass output

2023-07-24 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

In D156040#4528606 , @arsenm wrote:

> In D156040#4526036 , 
> @JonChesterfield wrote:
>
>> I don't see how this conveys any information. The compiler writes the stack 
>> size to be allocated. If it doesn't know what is sufficient, it's going to 
>> request some maximum and hope for the best.
>
> That was the old broken workaround for the old bit that was never actually 
> implemented in the runtime. The runtime now does properly respect some field 
> to switch to interpreting the reported size as a minimum and then allocates 
> the max of that minimum and some API provided size value

"runtime" == "hip/opencl runtime".  The openmp runtime needs to do the same if 
not being done already.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156040/new/

https://reviews.llvm.org/D156040

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D154790: [HIP] Use native math functions for `-fcuda-approx-transcendentals`

2023-07-09 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added inline comments.



Comment at: clang/lib/Headers/__clang_hip_math.h:304
+__DEVICE__
+float __tanf(float __x) { return __ocml_tan_f32(__x); }
+// END INTRINSICS

We could consider multiplying native_sin here with the native_recip of 
native_cos.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154790/new/

https://reviews.llvm.org/D154790

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D154123: [HIP] Start document HIP support by clang

2023-06-29 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added inline comments.



Comment at: clang/docs/HIPSupport.rst:104
+ - This macro is defined when the GPU default stream kind is set to 
per-thread.
+

Should we include the __gfxNNN__ or __GFXN__ macros here?  What about wave 
size, and CU mode?  And what about unsafe FP atomics macro?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154123/new/

https://reviews.llvm.org/D154123

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D148796: [AMDGPU][GFX908] Add builtin support for global add atomic f16/f32

2023-04-20 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

In D148796#4284504 , @rampitec wrote:

> We used to support it that way and decided just not doing it. It is very hard 
> to explain why a supported atomic results in error. Someone who really needs 
> it can use intrinsic.

I tend to agree.  This oddity is probably best handled with an intrinsic.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148796/new/

https://reviews.llvm.org/D148796

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D146840: [AMDGPU] Replace target feature for global fadd32

2023-03-28 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

No objection here.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146840/new/

https://reviews.llvm.org/D146840

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D145343: [AMDGPU] Emit predefined macro `__AMDGCN_CUMODE_OPTION`

2023-03-05 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

In D145343#4170305 , @yaxunl wrote:

> In D145343#4170250 , @arsenm wrote:
>
>> I think exposing whether or not the flag was used is weird/broken, as is 
>> including _OPTION in the name. Should just define to whether it's enabled or 
>> not
>
> I agree. @b-sumner What do you think?

I think applications may need to check if CUMode is enabled at compile time and 
select code based on that.  But a concern has been raised about compiling such 
source with an older compiler which is not setting the macro regardless of 
whether -mcumode was used.   The conservative approach here would be to only 
define a macro only if -mcumode is used, and define nothing if it is not used.  
Then, when using an old compiler, the code will assume -mno-cumode which is 
always fine to do.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145343/new/

https://reviews.llvm.org/D145343

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-02-14 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.



> My current understanding is the c-p will go into already forked clang-16, but 
> not to rocm 5.4. So rocm device-libs will be accompanied by the older 
> clang-16 w/o this and stay compatible. Someone building from scratch will use 
> latest clang-16 and staging device-libs with this change. Do you think this 
> will work?

I wouldn't recommend it.  I would patch whatever device libs are being built in 
association with clang-16, not staging.  Staging device libs is only 
appropriate for the staging compiler.  A hash of device libs from around the 
time that clang-16 stable released would probably be safe.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142507/new/

https://reviews.llvm.org/D142507

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-02-14 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

In D142507#4127382 , @rampitec wrote:

> In D142507#4127374 , @aaronmondal 
> wrote:
>
>> I think unless conflicts arise creating an issue similar to this 
>> https://github.com/llvm/llvm-project/issues/60600 with the `cherry-pick` 
>> line set to this commit should be enough. (See also 
>> https://llvm.org/docs/GitHub.html).
>
> I believe it will need D142407  to be 
> cherry-picked as well to apply cleanly. Otherwise I do not expect conflicts. 
> So the c-p need to go into release/16.x, right?
> Let's wait for @b-sumner first anyway, he is maintaining device-lib.

I have no objection to backporting this, but it may need to be accompanied with 
a device-libs patch, and I don't know where that patch would be checked in.  
The ROCm-Device-Libs in github certainly doesn't have a "clang-16" branch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142507/new/

https://reviews.llvm.org/D142507

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D138507: HIP: Directly use sqrt builtins instead of calling ocml (f32 case)

2022-11-22 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

__builtin_sqrtf does not produce a correctly rounded result.  I don't recommend 
this change.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138507/new/

https://reviews.llvm.org/D138507

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D136981: [HIP] add float to fp16 convert functions

2022-10-28 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

Thank you!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136981/new/

https://reviews.llvm.org/D136981

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-10-11 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

> Different functions providing different behaviors can be handled at link time 
> like any other function, instead of the same functions providing different 
> behaviors per translation unit and requires cloning. The current scheme 
> transfers complexity from the device library build system into the driver and 
> user binaries

OK, but we are talking about trading a solved problem with a solution working 
for years for adding a large amount of new work and new maintenance and new 
bugs.  Does this need to be done now, or at all?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D130096: [Clang][AMDGPU] Emit AMDGPU library control constants in clang

2022-10-11 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

In D130096#3850473 , @arsenm wrote:

> In D130096#3850472 , @jhuber6 wrote:
>
>> I don't like the fact that we need to have two different kinds of control 
>> constants, one per-TU and others per-link job. I'm wondering how difficult 
>> it would be to make the fast versions of the math calls use different entry 
>> points. That way we could handle this in the math header wrappers.
>
> That's really how the C linkage model wants you to handle this. I also would 
> like to have FP value tracking optimizations take care of the special cases 
> in the library code

There's the "small matter" of implementing the new device library functions.  
Why is all that more likeable than two kinds of control constants?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130096/new/

https://reviews.llvm.org/D130096

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D135614: [OpenMP][CUDA][AMDGPU] Accept case insensitive subarchitecture names

2022-10-11 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

Also, we may want to use uppercase for other purposes in the future.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135614/new/

https://reviews.llvm.org/D135614

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D135614: [OpenMP][CUDA][AMDGPU] Accept case insensitive subarchitecture names

2022-10-11 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

I don't particularly see a need for this.  I am not opposed to a "did you mean" 
in the error diagnostic.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135614/new/

https://reviews.llvm.org/D135614

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D134355: [AMDGPU] Emit module flag for all code object versions

2022-09-22 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

In D134355#3809294 , @yaxunl wrote:

> In D134355#3807435 , @cfang wrote:
>
>> LGTM
>>
>> Should the module flag name be amdgpu_code_object_version or 
>> amdhsa_code_object_version?
>
> Good question.
>
> @b-sumner Does code object version affects PAL? Thanks.

It should, we use the same code objects on PAL and ROCm.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134355/new/

https://reviews.llvm.org/D134355

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D132140: [AMDGPU] Add builtin s_sendmsg_rtn

2022-08-19 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

In D132140#3732337 , @yaxunl wrote:

> revised by Brian's comments

Thank you.  Looks good to me.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132140/new/

https://reviews.llvm.org/D132140

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D132140: [AMDGPU] Add builtin s_sendmsg_rtn_b{32|64}

2022-08-18 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

Following existing naming, it might make sense to rename "rtn_b32" --> "rtn" 
and "rtn_b64" --> "rtnl".


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132140/new/

https://reviews.llvm.org/D132140

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D128022: [HIP] add -fhip-kernel-arg-name

2022-06-23 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added inline comments.



Comment at: clang/lib/CodeGen/CodeGenModule.cpp:1845-1846
+  }
+  if (getCodeGenOpts().EmitOpenCLArgMetadata ||
+  getCodeGenOpts().HIPSaveKernelArgName)
 Fn->setMetadata("kernel_arg_name",

yaxunl wrote:
> tra wrote:
> > yaxunl wrote:
> > > yaxunl wrote:
> > > > tra wrote:
> > > > > tra wrote:
> > > > > > Should we consolidate both options into `-fkernel-arg-info` and 
> > > > > > make `-cl-kernel-arg-info` an alias to it?
> > > > > Also, this check is odd. For some reason only *arg name*  metadata is 
> > > > > set conditionally, but for whatever reason OpenCL sets other arg 
> > > > > metadata unconditionally.
> > > > > 
> > > > > Now I'm really curious what's so special about "kernel_arg_name" vs 
> > > > > the other arg metadata.
> > > > > Should we consolidate both options into `-fkernel-arg-info` and make 
> > > > > `-cl-kernel-arg-info` an alias to it?
> > > > 
> > > > -cl-kernel-arg-info is an OpenCL option defined in OpenCL spec, 
> > > > therefore is made OpenCL only option. It would be confusing to allow it 
> > > > with other languages.
> > > > Also, this check is odd. For some reason only *arg name*  metadata is 
> > > > set conditionally, but for whatever reason OpenCL sets other arg 
> > > > metadata unconditionally.
> > > > 
> > > > Now I'm really curious what's so special about "kernel_arg_name" vs the 
> > > > other arg metadata.
> > > 
> > > The other metadata are mandatory because they are necessary for OpenCL 
> > > runtime to set kernel argument.
> > > 
> > > The kernel argument name is emitted only with -cl-kernel-arg-info since 
> > > it is only used to support clGetKernelArgInfo which requires 
> > > -cl-kernel-arg-info to work.
> > >  It would be confusing to allow it with other languages.
> > 
> > On the other hand having two options that control exactly the same 
> > functionality also looks odd to me.
> > 
> > The way I see it is that an entity may have more than one name. OpenCL 
> > standard requires that that particular functionality must be enabled via 
> > `-cl-kernel-arg-info` and I'm not proposing to change that. The OpenCL 
> > standard has no say whether that flag may have a different name in addition 
> > to the standard-required one.
> > 
> > This is similar to how over time we've been transitioning what used to be 
> > CUDA-only options into their generic `-gpu` and `-offload` variants, only 
> > in this case OpenCL functionality becomes useful outside of OpenCL.
> > 
> > >  It would be confusing to allow it with other languages.
> > 
> > On the other hand having two options that control exactly the same 
> > functionality also looks odd to me.
> > 
> > The way I see it is that an entity may have more than one name. OpenCL 
> > standard requires that that particular functionality must be enabled via 
> > `-cl-kernel-arg-info` and I'm not proposing to change that. The OpenCL 
> > standard has no say whether that flag may have a different name in addition 
> > to the standard-required one.
> > 
> > This is similar to how over time we've been transitioning what used to be 
> > CUDA-only options into their generic `-gpu` and `-offload` variants, only 
> > in this case OpenCL functionality becomes useful outside of OpenCL.
> > 
> 
> If we introduce a generic option -fkernel-arg-info and make 
> -cl-kernel-arg-info alias to it, do we want to make it available to all 
> languages or limit it to HIP and OpenCL only?
> > >  It would be confusing to allow it with other languages.
> > 
> > On the other hand having two options that control exactly the same 
> > functionality also looks odd to me.
> > 
> > The way I see it is that an entity may have more than one name. OpenCL 
> > standard requires that that particular functionality must be enabled via 
> > `-cl-kernel-arg-info` and I'm not proposing to change that. The OpenCL 
> > standard has no say whether that flag may have a different name in addition 
> > to the standard-required one.
> > 
> > This is similar to how over time we've been transitioning what used to be 
> > CUDA-only options into their generic `-gpu` and `-offload` variants, only 
> > in this case OpenCL functionality becomes useful outside of OpenCL.
> > 
> 
> If we introduce a generic option -fkernel-arg-info and make 
> -cl-kernel-arg-info alias to it, do we want to make it available to all 
> languages or limit it to HIP and OpenCL only?

I think that would be fine. 


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128022/new/

https://reviews.llvm.org/D128022

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D124537: [AMDGPU][clang] Definition of gfx11 subtarget

2022-04-27 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

t-tye should review this too.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124537/new/

https://reviews.llvm.org/D124537

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D114957: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args

2021-12-02 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

Can we agree to drop this in LLVM 15 and note that in a comment or elsewhere?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114957/new/

https://reviews.llvm.org/D114957

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D114957: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args

2021-12-02 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

In D114957#3166936 , @foad wrote:

> In D114957#3166858 , @yaxunl wrote:
>
>> In D114957#3166817 , @foad wrote:
>>
>>> This is a flag-day change to the signatures of the LLVM intrinsics and the 
>>> OpenCL builtins. Is that OK?
>>
>> This breaks users' code. If we have to do this, at least let clang emit a 
>> pre-defined macro e.g. `__amdgcn_bvh_use_vec3__`=1 so that users can make 
>> their code work before and after the change.
>
> I don't know anything about OpenCL macros. Is it good enough to put this in 
> `AMDGPUTargetInfo::getTargetDefines`:
>
>   if (Opts.OpenCL)
> Builder.defineMacro("__amdgcn_bvh_use_vec3__");
>
> Does it need tests, documentation, etc?

But how long would that be carried?  And then deprecated?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114957/new/

https://reviews.llvm.org/D114957

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D114957: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args

2021-12-02 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

In D114957#3166882 , @yaxunl wrote:

> In D114957#3166861 , @arsenm wrote:
>
>> In D114957#3166858 , @yaxunl wrote:
>>
>>> In D114957#3166817 , @foad wrote:
>>>
 This is a flag-day change to the signatures of the LLVM intrinsics and the 
 OpenCL builtins. Is that OK?
>>>
>>> This breaks users' code. If we have to do this, at least let clang emit a 
>>> pre-defined macro e.g. `__amdgcn_bvh_use_vec3__`=1 so that users can make 
>>> their code work before and after the change.
>>
>> I do not think it's worth introducing a macro for this. Are there actually C 
>> users of these builtins?
>
> Yes we have users who use these clang builtins. We have received quite a few 
> complaints about making breaking API changes without a way to detect them in 
> the program.

But builtins are not part of the documented API and we have advised developers 
using them that they are subject to change.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114957/new/

https://reviews.llvm.org/D114957

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D90809: [amdgpu] Add `llvm.amdgcn.endpgm` support.

2020-11-05 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

Should this also be IntrConvergent?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90809/new/

https://reviews.llvm.org/D90809

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D77910: AMDGPU: Define cl_khr_gl_sharing as a supported extension

2020-05-02 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

In D77910#2015225 , @arsenm wrote:

> In D77910#1989163 , @b-sumner wrote:
>
> > In D77910#1988807 , @arsenm wrote:
> >
> > > In D77910#1981828 , @b-sumner 
> > > wrote:
> > >
> > > > In D77910#1981429 , @arsenm 
> > > > wrote:
> > > >
> > > > > In D77910#1976171 , 
> > > > > @b-sumner wrote:
> > > > >
> > > > > > I don't think we can guarantee this is or will be supported on all 
> > > > > > devices.  The language runtime makes this decision.
> > > > >
> > > > >
> > > > > We don't need to worry about theoretical devices. We should know the 
> > > > > properties of the driver from -amdhsa, -amdpal, -mesa3d
> > > >
> > > >
> > > > It takes more than support in the ISA for some features.  The OpenCL 
> > > > driver may not want to support a given optional feature, e.g. images.  
> > > > I'm not opposed to defaults, but if the driver chooses to not support 
> > > > images, it needs to be able to prevent `__IMAGE_SUPPORT__` from being 
> > > > defined.  Conformance will fail if the runtime and compiler are not 
> > > > consistent.
> > >
> > >
> > > The driver details should be captured by the the triple. If some weird 
> > > driver decided to do something different, we would need to add a new 
> > > triple for it. We don't have such a driver, so I don't see why worry 
> > > about it. It's possible to work around with undef and redef in an 
> > > implicitly included header. We need to fix properties of the driver based 
> > > on the target to have perfectly matching offline compilation
> >
> >
> > I don't see anywhere in the triple talking about driver specific details, 
> > unless you would use the environment?  That seems like overkill to me.  But 
> > again, I'm not opposed to defaults, and as long as the driver can override 
> > them, this should be OK.
>
>
> The OS is the driver. It doesn't need to specifically encode these details; 
> the OS should encode properties of the driver environment. Anything using 
> -amdhsa should be reporting image support


But that's not how things work now or are likely to work in the future.  The 
language runtime is what decides if it is going to report the availability of 
image support.  For offline compilation, I suppose it is OK to assume images 
are supported, but for online compilation, the language runtime needs to be 
able to reflect its own decision.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77910/new/

https://reviews.llvm.org/D77910



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D75917: Expose llvm fence instruction as clang intrinsic

2020-04-22 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added inline comments.



Comment at: clang/test/CodeGenCXX/builtin-amdgcn-fence-failure.cpp:5
+
+void test_amdgcn_fence_failure() {
+

JonChesterfield wrote:
> arsenm wrote:
> > Does this really depend on C++? Can it use OpenCL like the other builtin 
> > tests?This also belongs in a Sema* test directory since it's checking an 
> > error
> Making it opencl-only would force some of the openmp runtime to be written in 
> opencl, which is not presently the case. Currently that library is written in 
> a dialect of hip, but there's a plan to implement it in openmp instead.
> 
> I'd much rather this builtin work from any language, instead of tying it to 
> opencl, as that means one can use it from openmp target regions.
I thought the question was about this test itself.  The test being in 
CodeGenOpenCL doesn't affect whether other languages can use the builtin.  Why 
not put this test in CodeGenOpenCL alongside all of the other 
builtins-amdgcn-*.cl ?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75917/new/

https://reviews.llvm.org/D75917



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D77910: AMDGPU: Define cl_khr_gl_sharing as a supported extension

2020-04-17 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

In D77910#1988807 , @arsenm wrote:

> In D77910#1981828 , @b-sumner wrote:
>
> > In D77910#1981429 , @arsenm wrote:
> >
> > > In D77910#1976171 , @b-sumner 
> > > wrote:
> > >
> > > > I don't think we can guarantee this is or will be supported on all 
> > > > devices.  The language runtime makes this decision.
> > >
> > >
> > > We don't need to worry about theoretical devices. We should know the 
> > > properties of the driver from -amdhsa, -amdpal, -mesa3d
> >
> >
> > It takes more than support in the ISA for some features.  The OpenCL driver 
> > may not want to support a given optional feature, e.g. images.  I'm not 
> > opposed to defaults, but if the driver chooses to not support images, it 
> > needs to be able to prevent `__IMAGE_SUPPORT__` from being defined.  
> > Conformance will fail if the runtime and compiler are not consistent.
>
>
> The driver details should be captured by the the triple. If some weird driver 
> decided to do something different, we would need to add a new triple for it. 
> We don't have such a driver, so I don't see why worry about it. It's possible 
> to work around with undef and redef in an implicitly included header. We need 
> to fix properties of the driver based on the target to have perfectly 
> matching offline compilation


I don't see anywhere in the triple talking about driver specific details, 
unless you would use the environment?  That seems like overkill to me.  But 
again, I'm not opposed to defaults, and as long as the driver can override 
them, this should be OK.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77910/new/

https://reviews.llvm.org/D77910



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D77910: AMDGPU: Define cl_khr_gl_sharing as a supported extension

2020-04-14 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

In D77910#1981429 , @arsenm wrote:

> In D77910#1976171 , @b-sumner wrote:
>
> > I don't think we can guarantee this is or will be supported on all devices. 
> >  The language runtime makes this decision.
>
>
> We don't need to worry about theoretical devices. We should know the 
> properties of the driver from -amdhsa, -amdpal, -mesa3d

It takes more than support in the ISA for some features.  The OpenCL driver may 
not want to support a given optional feature, e.g. images.  I'm not opposed to 
defaults, but if the driver chooses to not support images, it needs to be able 
to prevent `__IMAGE_SUPPORT__` from being defined.  Conformance will fail if 
the runtime and compiler are not consistent.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77910/new/

https://reviews.llvm.org/D77910

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D77923: OpenCL: Fix some missing predefined macros

2020-04-13 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

In my opinion, for on-line compile for OpenCL, the platform is responsible for 
setting `__OPENCL_VERSION__`.  Also, it should be the platform's choice as to 
how to respond the image support query and how `__IMAGE_SUPPORT__` is set.  For 
offline compile, it doesn't seem unreasonable to ask the developer to set these.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77923/new/

https://reviews.llvm.org/D77923



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D77910: AMDGPU: Define cl_khr_gl_sharing as a supported extension

2020-04-11 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

I don't think we can guarantee this is or will be supported on all devices.  
The language runtime makes this decision.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77910/new/

https://reviews.llvm.org/D77910



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D75917: Expose llvm fence instruction as clang intrinsic

2020-04-08 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

In addition to predefining `__ATOMIC_RELAXED`, etc., clang also predefines 
`__OPENCL_MEMORY_SCOPE_WORK_ITEM` and friends.  So it doesn't really seem 
unreasonable for clang to also predefine its known syncscopes, and to require 
the argument to be one of those integers.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75917/new/

https://reviews.llvm.org/D75917



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D77390: Fix __builtin_amdgcn_workgroup_size_x/y/z return type

2020-04-03 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

LGTM


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77390/new/

https://reviews.llvm.org/D77390



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D75917: Expose llvm fence instruction as clang intrinsic

2020-04-02 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

Please go ahead and update to a string for the scope.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75917/new/

https://reviews.llvm.org/D75917



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D76795: [HIP] Change default --gpu-max-threads-per-block value to 1024

2020-03-25 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

Thanks.  This looks fine to me.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76795/new/

https://reviews.llvm.org/D76795



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D76795: [HIP] Change default --gpu-max-threads-per-block value to 1024

2020-03-25 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:8123
+// --gpu-max-threads-per-block=n or its default value for HIP.
+const unsigned OpenCLMaxWorkGroupSize = 256;
+const unsigned MaxWorkGroupSize =

I'd like to see the word default, e.g. OpenCLDefaultMaxWorkGroupSize, used more 
since that is what this is about.  Ideally the option would have been named 
gpu-default-max-threads-per-block, but I suppose I can see why it was shortened.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76795/new/

https://reviews.llvm.org/D76795



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D74910: [OpenCL] Remove spurious atomic_fetch_min/max builtins

2020-02-20 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

I recall we agreed that conformance tests using mixed types were broken, so 
this change should be OK.  Hopefully this will not affect users.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74910/new/

https://reviews.llvm.org/D74910



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D74807: Add cl_khr_mipmap_image_writes as supported to AMDGPU

2020-02-19 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner accepted this revision.
b-sumner added a comment.
This revision is now accepted and ready to land.

LGTM


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74807/new/

https://reviews.llvm.org/D74807



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D71365: expand printf when compiling HIP to AMDGPU

2020-01-07 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

Should this be looking forward to also handling OpenCL, which does require 
vector support?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71365/new/

https://reviews.llvm.org/D71365



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D66198: AMDGPU: Add builtins for is_local/is_private

2019-08-14 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

Looks fine to me.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66198/new/

https://reviews.llvm.org/D66198



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D66198: AMDGPU: Add builtins for is_local/is_private

2019-08-14 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

Looks fine to me.  Thanks!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66198/new/

https://reviews.llvm.org/D66198



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D62739: AMDGPU: Always emit amdgpu-flat-work-group-size

2019-06-14 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

In D62739#1543438 , @arsenm wrote:

> In D62739#1536428 , @b-sumner wrote:
>
> > We need to communicate with anyone generating IR to ensure this is being 
> > generated before we change the default.  clang is only one of those 
> > generators.  This change will also need to be documented in the usage 
> > document.
>
>
> The planned change is to make the backend more conservative, so it shouldn't 
> break other frontends


It may not break other frontends, but could cause substantial performance 
regressions.  At a minimum the summary should clearly mention this possibility.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62739/new/

https://reviews.llvm.org/D62739



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D62739: AMDGPU: Always emit amdgpu-flat-work-group-size

2019-06-10 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

We need to communicate with anyone generating IR to ensure this is being 
generated before we change the default.  clang is only one of those generators. 
 This change will also need to be documented in the usage document.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62739/new/

https://reviews.llvm.org/D62739



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D62739: AMDGPU: Always emit amdgpu-flat-work-group-size

2019-05-31 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added inline comments.



Comment at: lib/CodeGen/TargetInfo.cpp:7885
+// By default, restrict the maximum size to 256.
+F->addFnAttr("amdgpu-flat-work-group-size", "128,256");
   }

Theoretically, shouldn't the minimum be 1?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62739/new/

https://reviews.llvm.org/D62739



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D61112: AMDGPU: Enable _Float16

2019-04-25 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

Looks good to me.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61112/new/

https://reviews.llvm.org/D61112



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D59494: AMDGPU: Add support for cross address space synchronization scopes (clang)

2019-03-18 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added inline comments.



Comment at: lib/CodeGen/TargetInfo.cpp:7976
+
+Name = Twine(Twine(Name) + Twine("one-as")).str();
+  }

rampitec wrote:
> b-sumner wrote:
> > kzhuravl wrote:
> > > rampitec wrote:
> > > > I think subgroup is in the single address space even if sequentially 
> > > > consistent.
> > > I have synced with @t-tye, and it seems like it might not be. @b-sumner, 
> > > do you know what opencl spec states? Thanks.
> > As I understand the spec, memory order seq_cst must be consistent with both 
> > local- and global-happens-before, so I would say even subgroup is not in 
> > the single address space for OpenCL seq_cst.
> OK. Is it always one-as if not sequentially consistent? I thought we are 
> about to change sequentially consistent case, and not everything else except 
> it.
Right.  In OpenCL, only seq_cst can tie together address spaces.  But other 
languages without explicit address spaces will want them tied for other memory 
orders


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59494/new/

https://reviews.llvm.org/D59494



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D59494: AMDGPU: Add support for cross address space synchronization scopes (clang)

2019-03-18 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.






Comment at: lib/CodeGen/TargetInfo.cpp:7976
+
+Name = Twine(Twine(Name) + Twine("one-as")).str();
+  }

kzhuravl wrote:
> rampitec wrote:
> > I think subgroup is in the single address space even if sequentially 
> > consistent.
> I have synced with @t-tye, and it seems like it might not be. @b-sumner, do 
> you know what opencl spec states? Thanks.
As I understand the spec, memory order seq_cst must be consistent with both 
local- and global-happens-before, so I would say even subgroup is not in the 
single address space for OpenCL seq_cst.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59494/new/

https://reviews.llvm.org/D59494



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D57831: AMDGPU: set wchar_t and wint_t to be unsigned short on windows

2019-02-06 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

Maybe there are already other types like this, but it saddens me that an 
offline compiled code object could potentially not work properly if the 
application is using any of these types.  Or should the runtime try to detect a 
problem using argument metadata?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57831/new/

https://reviews.llvm.org/D57831



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D52320: AMDGPU: add __builtin_amdgcn_update_dpp

2018-10-16 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

Ping.  There's quite a bit of interest in getting this exposed by clang.


https://reviews.llvm.org/D52320



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D52320: AMDGPU: add __builtin_amdgcn_update_dpp

2018-09-28 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added inline comments.



Comment at: lib/CodeGen/CGBuiltin.cpp:11313-11315
+  case AMDGPU::BI__builtin_amdgcn_update_dpp: {
+llvm::SmallVector Args;
+for (unsigned I = 0; I != 6; ++I)

arsenm wrote:
> The only difference between this and mov_dpp is the argument count and the 
> intrinsic ID, so you can combine the cases
We should really drop mov_dpp.  It will be easier to do so if we keep the cases 
separate. 


https://reviews.llvm.org/D52320



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D50376: AMDGPU: Fix enabling denormals by default on pre-VI targets

2018-08-07 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

This approach seems fine to me.


https://reviews.llvm.org/D50376



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D48667: [HIP] Fix ordering of device-lib linking

2018-06-27 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

Thanks, looks good.


Repository:
  rC Clang

https://reviews.llvm.org/D48667



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D48493: [HIP] Support flush denorms bitcode

2018-06-22 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

LGTM


Repository:
  rC Clang

https://reviews.llvm.org/D48493



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D46601: [OpenCL] Fix typos in emitted enqueue kernel function names

2018-05-08 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

Thanks!  Looks good to me.


https://reviews.llvm.org/D46601



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D39739: [HCC] Add flag to Import Weak Functions in Function Importer

2018-03-22 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

In https://reviews.llvm.org/D39739#1045611, @ashi1 wrote:

> Is first one encountered a poor design?


Strong or first weak is the standard behavior for ISA level linkers.


Repository:
  rL LLVM

https://reviews.llvm.org/D39739



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D43911: [AMDGPU] Clean up old address space mapping and fix constant address space value

2018-03-02 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

Looks fine to me.


https://reviews.llvm.org/D43911



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D43340: Clean up AMDGCN tests

2018-02-15 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner accepted this revision.
b-sumner added a comment.

Looks good to me.


https://reviews.llvm.org/D43340



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D43281: [AMDGPU] fixes for lds f32 builtins

2018-02-14 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added inline comments.



Comment at: lib/CodeGen/CGBuiltin.cpp:9866
+  case AMDGPU::BI__builtin_amdgcn_ds_fmax: {
+llvm::SmallVector Args;
+for (unsigned I = 0; I != 5; ++I)

Can the pointer argument address space be checked here?


Repository:
  rC Clang

https://reviews.llvm.org/D43281



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D42578: [AMDGPU] Add ds_fadd, ds_fmin, ds_fmax builtins functions

2018-01-29 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

Should we expect that the last 3 arguments have any effect?  Do we want to test 
to ensure they have the expected effects?


Repository:
  rC Clang

https://reviews.llvm.org/D42578



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D42578: [AMDGPU] Add ds_fadd builtin function

2018-01-26 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

Were you going to add min and max separately?


Repository:
  rC Clang

https://reviews.llvm.org/D42578



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D39739: [HCC] Add flag to Import Weak Functions in Function Importer

2017-12-05 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

The usual rule is to take the first weak definition encountered.


Repository:
  rL LLVM

https://reviews.llvm.org/D39739



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D37568: [AMDGPU] Allow flexible register names in inline asm constraints

2017-09-28 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner accepted this revision.
b-sumner added a comment.
This revision is now accepted and ready to land.

LGTM.  I think we can leave immediates to another patch.


https://reviews.llvm.org/D37568



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D37822: [OpenCL] Clean up and add missing fields for block struct

2017-09-21 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

In https://reviews.llvm.org/D37822#877903, @Anastasia wrote:

> In https://reviews.llvm.org/D37822#877572, @yaxunl wrote:
>
> > In https://reviews.llvm.org/D37822#873876, @Anastasia wrote:
> >
> > > In https://reviews.llvm.org/D37822#872446, @yaxunl wrote:
> > >
> > > > In https://reviews.llvm.org/D37822#872291, @Anastasia wrote:
> > > >
> > > > > Could you please explain a bit more why the alignment have to be put 
> > > > > explicitly in the struct? I am just not very convinced this is 
> > > > > general enough.
> > > >
> > > >
> > > > The captured variables are fields of the block literal struct. Due to 
> > > > alignment requirement of these fields, there is alignment requirement of
> > > >  the block literal struct. The ISA of the block invoke function is 
> > > > generated with the assumption of these alignments. If the block literal 
> > > > is
> > > >  allocated at a memory address not satisfying the alignment 
> > > > requirement, the kernel behavior is undefined.
> > > >
> > > > Generally, __enqueue_kernel library function needs to prepare the 
> > > > kernel argument before launching the kernel. It usually does this by 
> > > > copying
> > > >  the block literal to some buffer then pass the address of the buffer 
> > > > to the kernel. Then the address of the buffer has to satisfy the 
> > > > alignment
> > > >  requirement.
> > > >
> > > > If this block literal struct is not general enough, how about add 
> > > > another field as target reserved size, and leave the remaining space of 
> > > > header for
> > > >  target specific use. And add a target hook to allow target fill the 
> > > > reserved space, e.g.
> > > >
> > > >   struct __opencl_block_literal {
> > > > int total_size;
> > > > int align;
> > > > __generic void *invoke;
> > > > int target_reserved_size; /* round up to 4 bytes */
> > > > int target_reserved[];
> > > > /* captures */
> > > >   };
> > > >
> > >
> > >
> > > I like the idea of the target reserved part actually. But not sure how it 
> > > could be used without adding any target specific methods?
> >
> >
> > If we decide to add target reserved fields, I can add target hooks to fill 
> > these fields. However I would suggest to leave this for future since I 
> > don't see there is need for other fields for now.
>
>
> I could imagine it can be usefull for some vendor implementations.
>
> >> However, I am still not clear why the alignment of this struct has to be 
> >> different from any other struct Clang produces. Normally the alignment of 
> >> objects have to be known during IR generation to put them correctly in the 
> >> attributes of generated alloca, store and loads. But as a field inside 
> >> struct I don't know how it can be useful. I would imagine `enqueue_kernel` 
> >> would just operate on the block as if it would be an arbitrary buffer of 
> >> data. Also would size of the struct not account for any padding to make 
> >> sure the alignment can be deduced based on it correctly?
> > 
> > enqueue_kernel needs to pass the block struct to the kernel. Let's assume 
> > it does this by copying the block struct to a buffer. If enqueue_kernel 
> > does not know the alignment of the struct, it can only put it at an 
> > arbitrary address in the buffer. Then the kernel has to copy the struct to 
> > an aligned private memory and load the fields. However, if the 
> > enqueued_kernel knows the alignment of the struct, it can put it at an 
> > address satisfying the alignment. Then the kernel can load the fields 
> > directly from the buffer, skips the step of copying to an aligned private 
> > memory. Therefore, alignment of the block struct is usually a useful 
> > information for enqueue_kernel. I think that's why in the SPIRV spec 
> > OpEnqueueKernel requires an alignment operand for the block context.
>
> Ok, I just think in C if you use `malloc` to obtain a pointer to some memory 
> location it doesn't take any alignment information. Then you can use the 
> pointer to copy any data including the struct into the location its pointed 
> to. And the pointer can be used later on correctly. I think the alignment is 
> deduced in this case from the type or the size of an object. Do you know 
> where the alignment information is used for SPIRV call? Also how is the block 
> represented in SPIRV?


Actually malloc alignment is not sufficient more many uses such as CPU 
supported vectors, e.g. AVX512 or passed to create buffer with 
use-host-pointer.  In such cases you need posix_memalign or some similar API.  
Having the alignment means it is available if needed.  If an implementation 
doesn't need it, there is no harm is there?


https://reviews.llvm.org/D37822



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D37703: [AMDGPU] Change addr space of clk_event_t, queue_t and reserve_id_t to global

2017-09-13 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner accepted this revision.
b-sumner added a comment.
This revision is now accepted and ready to land.

Looks good to me.


https://reviews.llvm.org/D37703



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D37568: [AMDGPU] Allow flexible register names in inline asm constraints

2017-09-07 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

The assembler accepts v[N] in addition to vN.  I'm not sure if that is needed 
here.


https://reviews.llvm.org/D37568



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D36802: AMDGPU: Cleanup most of the macros

2017-08-28 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added inline comments.



Comment at: lib/Basic/Targets/AMDGPU.cpp:362
+Builder.defineMacro(Twine("__") + Twine(GPUName));
+Builder.defineMacro(Twine("__") + Twine(GPUName) + Twine("__"));
+  }

At the meeting we discussed defining every alias of the given GPU, rather than 
only the mcpu name.   Were we going to proceed with that?


https://reviews.llvm.org/D36802



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D32896: [OpenCL] Make CLK_NULL_RESERVE_ID invalid reserve id.

2017-08-09 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added inline comments.



Comment at: lib/Headers/opencl-c.h:16020
+// The macro CLK_NULL_RESERVE_ID refers to an invalid reservation ID.
+#define CLK_NULL_RESERVE_ID (__builtin_astype((void *)0, reserve_id_t))
 bool __ovld is_valid_reserve_id(reserve_id_t reserve_id);

yaxunl wrote:
> bader wrote:
> > yaxunl wrote:
> > > bader wrote:
> > > > yaxunl wrote:
> > > > > bader wrote:
> > > > > > yaxunl wrote:
> > > > > > > yaxunl wrote:
> > > > > > > > bader wrote:
> > > > > > > > > yaxunl wrote:
> > > > > > > > > > Anastasia wrote:
> > > > > > > > > > > echuraev wrote:
> > > > > > > > > > > > yaxunl wrote:
> > > > > > > > > > > > > Anastasia wrote:
> > > > > > > > > > > > > > yaxunl wrote:
> > > > > > > > > > > > > > > Anastasia wrote:
> > > > > > > > > > > > > > > > Looks good from my side.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > @yaxunl , since you originally committed this. 
> > > > > > > > > > > > > > > > Could you please verify that changing from 
> > > > > > > > > > > > > > > > `SIZE_MAX` to `0` would be fine.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Btw, we have a similar definition for 
> > > > > > > > > > > > > > > > `CLK_NULL_EVENT`.
> > > > > > > > > > > > > > > `__PIPE_RESERVE_ID_VALID_BIT` is implementation 
> > > > > > > > > > > > > > > detail and not part of the spec. I would suggest 
> > > > > > > > > > > > > > > to remove it from this header file.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > The spec only requires CLK_NULL_RESERVE_ID to be 
> > > > > > > > > > > > > > > defined but does not define its value. Naturally 
> > > > > > > > > > > > > > > a valid id starts from 0 and increases. I don't 
> > > > > > > > > > > > > > > see significant advantage to change 
> > > > > > > > > > > > > > > CLK_NULL_RESERVE_ID from __SIZE_MAX to 0.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Is there any reason that this change is needed?
> > > > > > > > > > > > > > I don't see issues to commit things outside of spec 
> > > > > > > > > > > > > > as soon as they prefixed properly with "__".  But I 
> > > > > > > > > > > > > > agree it would be nice to see if it's any useful 
> > > > > > > > > > > > > > and what the motivation is for having different 
> > > > > > > > > > > > > > implementation.
> > > > > > > > > > > > > For `__PIPE_RESERVE_ID_VALID_BIT`, it assumes that 
> > > > > > > > > > > > > the implementation uses one specific bit of a reserve 
> > > > > > > > > > > > > id to indicate that the reserve id is valid. Not all 
> > > > > > > > > > > > > implementations assume that. Actually I am curious 
> > > > > > > > > > > > > why that is needed too.
> > > > > > > > > > > > About `CLK_NULL_RESERVE_ID`: we check that reserve id 
> > > > > > > > > > > > is valid if significant bit equal to one. 
> > > > > > > > > > > > `CLK_NULL_RESERVE_ID refers to an invalid reservation, 
> > > > > > > > > > > > so if `CLK_NULL_RESERVE_ID equal to 0, we can be sure 
> > > > > > > > > > > > that significant bit doesn't equal to 1 and it is 
> > > > > > > > > > > > invalid reserve id. Also it is more obviously if 
> > > > > > > > > > > > CLK_**NULL**_RESERVE_ID is equal to 0.
> > > > > > > > > > > > 
> > > > > > > > > > > > What about `__PIPE_RESERVE_ID_VALID_BIT`: As I 
> > > > > > > > > > > > understand previous implementation also assumes that 
> > > > > > > > > > > > one specific bit was of a reverse id was used to 
> > > > > > > > > > > > indicate that the reserve id is valid. So, we just 
> > > > > > > > > > > > increased reserve id size by one bit on 32-bit 
> > > > > > > > > > > > platforms and by 33 bits on 64-bit platforms. 
> > > > > > > > > > > It is more logical to me that `CLK_NULL_RESERVE_ID` is 0, 
> > > > > > > > > > > but spec doesn't define it of course.
> > > > > > > > > > In our implementation, valid reserve id starts at 0 and 
> > > > > > > > > > increasing linearly until `__SIZE_MAX-1`. This change will 
> > > > > > > > > > break our implementation.
> > > > > > > > > > 
> > > > > > > > > > However, we can modify our implementation to adopt this 
> > > > > > > > > > change since it brings about benefits overall.
> > > > > > > > > Ideally it would be great to have unified implementation, but 
> > > > > > > > > we can define device specific value for CLK_NULL_RESERVE_ID 
> > > > > > > > > by using ifdef directive.
> > > > > > > > How about
> > > > > > > > 
> > > > > > > > ```
> > > > > > > > __attribute__((const)) size_t __clk_null_reserve_id();
> > > > > > > > #define CLK_NULL_RESERVE_ID __clk_null_reserve_id()
> > > > > > > > 
> > > > > > > > ```
> > > > > > > > I think the spec does not require it to be compile time 
> > > > > > > > constant. Then each library can implement its own 
> > > > > > > > __clk_null_reserve_id() whereas the IR is target independent.
> > > > > > > Or we only do this for SPIR and define it as target specific 
> > > > > > > value for other targets.
> > > > >

[PATCH] D36171: AMDGPU: Use direct struct returns

2017-08-08 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added inline comments.



Comment at: lib/CodeGen/TargetInfo.cpp:7571
+
+  // XXX: Should this be i64 instead, and should the limit increase?
+  llvm::Type *I32Ty = llvm::Type::getInt32Ty(getVMContext());

arsenm wrote:
> b-sumner wrote:
> > What we do here depends on NumRegsLeft when the block is entered and 
> > NumRegs.  If NumRegsLeft >= NumRegs then we just need 2 adjacent registers. 
> >  If NumRegsLeft == 1 and NumRegs == 2, then do we pass the low half in a 
> > register and the upper half in memory, or all of it in memory?  Anyway, I 
> > think NumRegsLeft shouldn't be updated until we know it's OK, and then we 
> > don't need the min().
> It's all one or the other. Whether it's passed in memory or not is really 
> determined in codegen based on the actual register limit (which is also 
> higher than the 16 used here, at least for now). Here selects whether to use 
> byval or not. The ABI is slightly different whether it's passed as byval or 
> as too many registers. I'm not sure it ever really makes sense to use byval 
> yet, so I wasn't trying to be very precise here.
Thanks.  Just one more question.  If we use memory for an argument, are all 
following arguments required to use memory?  In that case, the min() is 
correct.  But if a following argument could use a register, then the amount to 
subtract is NumRegs <= NumRegsLeft ? NumRegs : 0.


https://reviews.llvm.org/D36171



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D36171: AMDGPU: Use direct struct returns

2017-08-08 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added inline comments.



Comment at: lib/CodeGen/TargetInfo.cpp:7571
+
+  // XXX: Should this be i64 instead, and should the limit increase?
+  llvm::Type *I32Ty = llvm::Type::getInt32Ty(getVMContext());

What we do here depends on NumRegsLeft when the block is entered and NumRegs.  
If NumRegsLeft >= NumRegs then we just need 2 adjacent registers.  If 
NumRegsLeft == 1 and NumRegs == 2, then do we pass the low half in a register 
and the upper half in memory, or all of it in memory?  Anyway, I think 
NumRegsLeft shouldn't be updated until we know it's OK, and then we don't need 
the min().


https://reviews.llvm.org/D36171



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes

2017-08-08 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

In https://reviews.llvm.org/D36327#834032, @Anastasia wrote:

> In https://reviews.llvm.org/D36327#833891, @yaxunl wrote:
>
> > In https://reviews.llvm.org/D36327#833653, @bader wrote:
> >
> > > Hi Sam,
> > >
> > > What do you think about implementing this optimization in target specific 
> > > optimization pass? Since size/alignment is saved as function parameter in 
> > > LLVM IR, the optimization can be done in target specific components w/o 
> > > adding additional conditions to generic library.
> > >
> > > Thanks,
> > > Alexey
> >
> >
> > Hi Alexey,
> >
> > The optimization of the power-of-2 type size is implemented as a library 
> > function. Our backend lacks the capability to link in library code at ISA 
> > level, so linking of the optimized library function has to be done before 
> > any target-specific passes. It seems the only place to do this is Clang 
> > codegen since Clang/llvm does not support target-specific pre-linking 
> > passes.
>
>
> My general feeling is that it doesn't look like a generic enough change for 
> the frontend. Even though it is implemented in a generic way, not every 
> target might have a special support for the power of 2 size and also if there 
> is such a support not every implementation would handle it as a library 
> function. But I can see that perhaps LLVM is missing flexibility in the flow 
> to accommodate these needs. Any change we could try to extend the compilation 
> flow such that this target specific optimization could happen before the IR 
> linking?


It is trivial to implement the small number of specialized functions this patch 
adds in terms of the general one if desired, and the general one can continue 
to be handled as it had been.

We had actually proposed a patch (sorry I don't have the reference handy) to 
add general mechanism for targets to introduce pre-link passes, but it was not 
accepted.  We can try again, but I don't really expect more progress.


https://reviews.llvm.org/D36327



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D36171: AMDGPU: Use direct struct returns

2017-08-07 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added inline comments.



Comment at: lib/CodeGen/TargetInfo.cpp:7555
+  if (NumRegsLeft > 0)
+NumRegsLeft -= (Size + 31) / 32;
+

Won't NumRegsLeft wrap if size==64 and NumRegsLeft == 1 potentially causing an 
assert later?


https://reviews.llvm.org/D36171



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D28691: Add OpenCL 2.0 atomic builtin functions as Clang builtin

2017-07-25 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

In https://reviews.llvm.org/D28691#820595, @rjmccall wrote:

> In https://reviews.llvm.org/D28691#820541, @b-sumner wrote:
>
> > There are other languages for heterogeneous compute that have scopes, 
> > although not exposed quite as explicitly as OpenCL.  For example AMD's "HC" 
> > language.  And any language making use of clang and targeting SPIR-V would 
> > likely use these builtins.  I think a more generic prefix is appropriate, 
> > and "scoped" tells us exactly when these are needed.
>
>
> But would those languages use the same language design for these scopes as 
> OpenCL if they did expose them, as opposed to some more elaborate scoping 
> specification?  My objection is not that the concept is inherently 
> OpenCL-specific, it's that the presentation in the language might be 
> inherently OpenCL-specific, which makes staying in the opencl namespace is 
> prudent.


Are you envisioning a language far enough from C/C++ that a standard library or 
header would not be able to map a scoped atomic operation into a call to one of 
these new builtins?  Would we expect more of such languages than languages that 
would do such a mapping?


https://reviews.llvm.org/D28691



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D28691: Add OpenCL 2.0 atomic builtin functions as Clang builtin

2017-07-25 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

In https://reviews.llvm.org/D28691#820526, @rjmccall wrote:

> In https://reviews.llvm.org/D28691#820489, @yaxunl wrote:
>
> > In https://reviews.llvm.org/D28691#820466, @b-sumner wrote:
> >
> > > Can we drop the "opencl" part of the name and use something like 
> > > __scoped_atomic_*?   Also, it may not make sense to support non-constant 
> > > scope here since we can't predict what other scopes may be added by other 
> > > languages in the future.
> >
> >
> > we could use the approach of LangAS, i.e. we allow targets to map all 
> > language specific scopes to target-specific scope names, since IR only 
> > cares about scope names, which are target specific. And this is what the 
> > current implementation does.
> >
> > I have no objection to use the __scoped_atomic_ name. It is more general 
> > and extensible. John/Anastasia, any comments? Thanks.
>
>
> I think I would prefer __opencl_atomic_* until we have some evidence that 
> this concept is more general than just OpenCL.


There are other languages for heterogeneous compute that have scopes, although 
not exposed quite as explicitly as OpenCL.  For example AMD's "HC" language.  
And any language making use of clang and targeting SPIR-V would likely use 
these builtins.  I think a more generic prefix is appropriate, and "scoped" 
tells us exactly when these are needed.


https://reviews.llvm.org/D28691



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D28691: Add OpenCL 2.0 atomic builtin functions as Clang builtin

2017-07-25 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added inline comments.



Comment at: include/clang/Basic/Builtins.def:713
+ATOMIC_BUILTIN(__opencl_atomic_fetch_or, "v.", "t")
+ATOMIC_BUILTIN(__opencl_atomic_fetch_xor, "v.", "t")
+

yaxunl wrote:
> Anastasia wrote:
> > What about min/max? I believe they will need to have the scope too. 
> They are not 2.0 atomic builtin functions. They can be implemented as library 
> functions through 2.0 atomic builtin functions.
Yes, they are.  Please look again at 6.13.11.7.5 in the 2.0 C spec.


https://reviews.llvm.org/D28691



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D30551: [AMDGPU] Add builtin functions readlane ds_permute mov_dpp

2017-03-02 Thread Brian Sumner via Phabricator via cfe-commits

b-sumner added a comment.

mov_dpp should be under the VI+ comment


https://reviews.llvm.org/D30551



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

73 matches

Mail list logo