[PATCH] D92617: [DWARF] Allow toolchain to adjust specified DWARF version.

2020-12-07 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 310061. tra added a comment. Addressed comments. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D92617/new/ https://reviews.llvm.org/D92617 Files: clang/include/clang/Basic/DiagnosticDriverKinds.td

[PATCH] D92617: [DWARF] Allow toolchain to adjust specified DWARF version.

2020-12-07 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 310019. tra added a comment. Updated to address the comments. PTAL. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D92617/new/ https://reviews.llvm.org/D92617 Files:

[PATCH] D92720: [HIP] unbundle bundled preprocessor output

2020-12-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. `-E` by default prints preprocessed output to stdout. CUDA will print preprocessed output from all subcompilations. What does HIP do in this case? Printing out the bundle is probably not what the user will expect. IMO preprocessed output is frequently used as a debugging

[PATCH] D92617: [DWARF] Allow toolchain to adjust specified DWARF version.

2020-12-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/Driver/cuda-unsupported-debug-options.cu:22 +// Make sure we do not see any dwarf version other than 2, regardless of what's used on the host side. +// CHECK-NOT: {{-dwarf-version=[^2]}} // CHECK: "-triple" "x86_64

[PATCH] D91807: [CUDA] Unbreak CUDA compilation with -std=c++20

2020-12-04 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG43267929423b: [CUDA] Another attempt to fix early inclusion of new from libstdc++ (authored by tra). Repository: rG LLVM Github Monorepo CHANGES

[PATCH] D91807: [CUDA] Unbreak CUDA compilation with -std=c++20

2020-12-04 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 309598. tra added a comment. Removed forgotten pop_macro(__DEVICE__) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D91807/new/ https://reviews.llvm.org/D91807 Files: clang/lib/Headers/cuda_wrappers/new Index:

[PATCH] D91807: [CUDA] Unbreak CUDA compilation with -std=c++20

2020-12-04 Thread Artem Belevich via Phabricator via cfe-commits
tra requested review of this revision. tra added a comment. @jlebar PTAL. The previous fix did not quite fix the problem. `__device__ operator new/delete` was still not defined, but the issue was not reported due to deferred diags if the functions were not used. Repository: rG LLVM Github

[PATCH] D91807: [CUDA] Unbreak CUDA compilation with -std=c++20

2020-12-04 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 309595. tra added a comment. Better fix for the early inclusion of Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D91807/new/ https://reviews.llvm.org/D91807 Files: clang/lib/Headers/cuda_wrappers/new Index:

[PATCH] D91807: [CUDA] Unbreak CUDA compilation with -std=c++20

2020-12-04 Thread Artem Belevich via Phabricator via cfe-commits
tra reopened this revision. tra added a comment. This revision is now accepted and ready to land. Reopening the tracker for a better fix. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D91807/new/ https://reviews.llvm.org/D91807

[PATCH] D92617: [DWARF] Allow toolchain to adjust specified DWARF version.

2020-12-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:3957 + RenderDebugEnablingArgs(Args, CmdArgs, DebugInfoKind, + std::min(DWARFVersion, TC.getMaxDwarfVersion()), DebuggerTuning);

[PATCH] D92617: [DWARF] Allow toolchain to adjust specified DWARF version.

2020-12-04 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 309585. tra added a comment. Simplified dwarf version clamping. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D92617/new/ https://reviews.llvm.org/D92617 Files: clang/include/clang/Driver/ToolChain.h

[PATCH] D92617: [DWARF] Allow toolchain to adjust specified DWARF version.

2020-12-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/Driver/cuda-unsupported-debug-options.cu:22 +// Make sure we do not see any dwarf version other than 2, regardless of what's used on the host side. +// CHECK-NOT: {{-dwarf-version=[^2]}} // CHECK: "-triple" "x86_64

[PATCH] D92617: [DWARF] Allow toolchain to adjust specified DWARF version.

2020-12-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:3957 + RenderDebugEnablingArgs(Args, CmdArgs, DebugInfoKind, + std::min(DWARFVersion, TC.getMaxDwarfVersion()), DebuggerTuning);

[PATCH] D92617: [DWARF] Allow toolchain to adjust specified DWARF version.

2020-12-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:3857 + auto AdjustedDwarfVersion = TC.GetAdjustedDwarfVersion(DWARFVersion); + dblaikie wrote: > I worry a bit about having both DWARFVersion and AdjustedDwarfVersion in this >

[PATCH] D92617: [DWARF] Allow toolchain to adjust specified DWARF version.

2020-12-03 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 309404. tra edited the summary of this revision. tra added a comment. Updated according to Devid's feedback. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D92617/new/ https://reviews.llvm.org/D92617 Files:

[PATCH] D92617: [DWARF] Allow toolchain to adjust specified DWARF version.

2020-12-03 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added reviewers: dblaikie, ABataev. Herald added a subscriber: bixia. tra requested review of this revision. Herald added a project: clang. This is needed for CUDA compilation where NVPTX back-end only supports DWARF2, but host compilation should be allowed to use

[PATCH] D92363: [HIP] Warn no --offload-arch option

2020-12-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D92363#2431812 , @yaxunl wrote: > > There are still valid use cases for not providing GPU arch. For example, > users would like to test syntax of a HIP program for which GPU arch does not > matter, or users would like to get

[PATCH] D80450: [CUDA][HIP] Fix HD function resolution

2020-12-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D80450#2428631 , @yaxunl wrote: >> Also, naming. `-ffix-overload-resolution` is rather non-specific. I didn't >> mean to use it literally. The problem is that I can't think of a good >> descriptive name for what we do here.

[PATCH] D80450: [CUDA][HIP] Fix HD function resolution

2020-12-01 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. LGTM. I'd suggest adding more details on the background of this change to the commit log (point to the comment in the `isBetterOverloadCandidate` ?) and outline the intention to enable the new way to do overloading after some soak time. Also,

[PATCH] D91088: [CUDA][HIP] Fix capturing reference to host variable

2020-12-01 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/lib/Sema/SemaExpr.cpp:1957 +// host variable in a device or host device lambda. +static bool isCapturingReferenceToHostVarInCUDADeviceLambda(Sema , +

[PATCH] D92363: [HIP] Warn no --offload-arch option

2020-12-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. While I agree that the default GPU choice is not likely to be correct, or usable, for everyone, but the warning seems to be a half-measure. If the default is not usable, then it should not be the default. If it's usable, then we don't need a warning. Having a warning would

[PATCH] D91088: [CUDA][HIP] Fix capturing reference to host variable

2020-11-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. LGTM in general. Comment at: clang/lib/Sema/SemaExpr.cpp:1944 + // host variable in a device or host device lambda. + auto IsCapturingReferenceToHostVarInCUDADeviceLambda = [&](VarDecl *VD) { +if (!getLangOpts().CUDA || !VD->hasInit())

[PATCH] D80450: [CUDA][HIP] Fix HD function resolution

2020-11-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/SemaCUDA/function-overload.cu:616 +// HDs have type mismatch whereas H has type match. +// In device compilation, H wins when -fgpu-defer-diag is off and two HD win +// when -fgpu-defer-diags is on. In both cases the diagnostic

[PATCH] D80450: [CUDA][HIP] Fix HD function resolution

2020-11-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/SemaCUDA/function-overload.cu:616 +// HDs have type mismatch whereas H has type match. +// In device compilation, H wins when -fgpu-defer-diag is off and two HD win +// when -fgpu-defer-diags is on. In both cases the diagnostic

[PATCH] D80450: [CUDA][HIP] Fix HD function resolution

2020-11-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D80450#2418775 , @yaxunl wrote: > For the call `ag(e)`. There are two candidates: > > 1. `ag` in namespace `b`. The function arguments can match. However it is a > host function, therefore is a wrong-sided candidate and not

[PATCH] D91807: [CUDA] Unbreak CUDA compilation with -std=c++20

2020-11-19 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG9a465057a64d: [CUDA] Unbreak CUDA compilation with -std=c++20 (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION

[PATCH] D91807: [CUDA] Unbreak CUDA compilation with -std=c++20

2020-11-19 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: jlebar. Herald added subscribers: bixia, yaxunl. Herald added a project: clang. tra requested review of this revision. Standard libc++ headers in stdc++ mode include which picks up cuda_wrappers/new before any of the CUDA macros have been defined.

[PATCH] D91590: [NVPTX] Efficently support dynamic index on CUDA kernel aggregate parameters.

2020-11-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added a reviewer: jlebar. tra added a comment. In D91590#2398842 , @hliao wrote: > This's an experimental or demo-only patch in my spare time on eliminating > private memory usage in https://godbolt.org/z/EPPn6h. The attachment > F14026286:

[PATCH] D91546: [AMDGPU] Add option -munsafe-fp-atomics

2020-11-16 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM. Patch description could use a pointer to more details about the unsafe atomics. Comment at: clang/include/clang/Basic/TargetOptions.h:78 + /// \brief If enabled, allow

[PATCH] D91121: [InferAddrSpace] Teach to handle assumed address space.

2020-11-16 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp:533 + if (!LD) +return -1; + Is there a suitable constant for `don't know` result? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION

[PATCH] D91421: Fix temporary file name on Windows

2020-11-13 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM overall, nit about the character choice. Comment at: clang/lib/Driver/Driver.cpp:4645 + // therefore replace it with '%'. + std::replace(BoundArch.begin(), BoundArch.end(),

[PATCH] D91281: [CUDA][HIP] Diagnose reference of host variable

2020-11-11 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. LGTM. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D91281/new/ https://reviews.llvm.org/D91281 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D90174: [HIP] Fix regressions due to fp contract change

2020-11-11 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D90174#2387518 , @scanon wrote: > Strictly speaking, fp-contract=fast probably should have been a separate flag > entirely (since there's no _expression_ being contracted in fast). > Unfortunately, that ship has sailed, and it

[PATCH] D91281: [CUDA][HIP] Diagnose reference of host variable

2020-11-11 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM. Comment at: clang/lib/Sema/SemaExpr.cpp:357 + if (LangOpts.CUDAIsDevice) { +auto *FD = dyn_cast_or_null(CurContext); This could use a comment why we

[PATCH] D91088: [CUDA][HIP] Fix capturing reference to host variable

2020-11-11 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D91088#2388496 , @yaxunl wrote: > added diagnosing referencing host variable in device functions Thank you for fixing this. The new changes look good, but I think they should be in a separate patch. Comment

[PATCH] D91088: [CUDA][HIP] Fix capturing reference to host variable

2020-11-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Ouch. This is definitely a bug. https://godbolt.org/z/faPWcv Granted, the compilation will eventually fail in most cases due to ptxas failing to resolve the reference to the variable we do not generate on the device, but it's something we should catch in Sema. Filed

[PATCH] D91088: [CUDA][HIP] Fix capturing reference to host variable

2020-11-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: rsmith. tra added a comment. The use case in the tests makes sense, but I'd like to have a second opinion on whether it may have unintended consequences. Lambdas tend to bring interesting corner cases. @rsmith Richard, do you see any issues with this approach?

[PATCH] D90409: [HIP] Math Headers to use type promotion

2020-11-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D90409#2372042 , @yaxunl wrote: >> Practically the behavior is the same since they all promote integer types to >> double. This matches the C++ behavior. However the HIP change will make it >> conform to C++ for a target

[PATCH] D90409: [HIP] Math Headers to use type promotion

2020-11-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D90409#2371969 , @yaxunl wrote: > nvcc does not support fma(float,float,char) It does, it just needs an explicit flag to match clang's treatment of `constexpr` functions as HD. Repository: rG LLVM Github Monorepo CHANGES

[PATCH] D90409: [HIP] Math Headers to use type promotion

2020-11-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D90409#2371679 , @jlebar wrote: >> LGTM. I think the change would make sense for CUDA, too. @jlebar - WDYT? > > I agree that the C and C++ standard libraries should behave the same in CUDA > mode and host mode! > > But if doing so

[PATCH] D90436: [Bundler] Use argv[0] as the default choice for the Executable name.

2020-11-03 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGcdbf6bfdc7d1: [HIP] Use argv[0] as the default choice for the Executable name. (authored by tra). Repository: rG LLVM Github Monorepo CHANGES

[PATCH] D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables.

2020-11-03 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGbe86b6773b6b: [CUDA] Allow local static variables with target attributes. (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE

[PATCH] D89980: [hip] Remove kernel argument coercion.

2020-11-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D89980#2371580 , @hliao wrote: > is that reported in bugs.llvm.org? It was exposed in our internal code, but the situation is almost identical to your IR example. https://godbolt.org/z/EPPn6h For NVPTX we lower byval arguments as

[PATCH] D89980: [hip] Remove kernel argument coercion.

2020-11-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: jlebar. tra added a comment. @jlebar -- FYI. This looks pretty similar to the issue you've reported recently for NVPTX. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D89980/new/ https://reviews.llvm.org/D89980

[PATCH] D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables.

2020-11-02 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 302419. tra added a comment. Remove the assert which is no longer valid. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D88345/new/ https://reviews.llvm.org/D88345 Files:

[PATCH] D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables.

2020-11-02 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGf38a9e51178a: [CUDA] Allow local static variables with target attributes. (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE

[PATCH] D88668: [CUDA] Add support for 11.1

2020-11-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Herald added a subscriber: dexonsmith. This patch is also missing corresponding LLVM changes that define features for sm_86 and ptx71. See https://github.com/llvm/llvm-project/blob/master/llvm/lib/Target/NVPTX/NVPTX.td Comment at:

[PATCH] D90174: [HIP] Fix regressions due to fp contract change

2020-11-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. LGTM, but I'll defer to @rjmccall for the approval. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D90174/new/ https://reviews.llvm.org/D90174 ___ cfe-commits mailing list cfe-commits@lists.llvm.org

[PATCH] D90311: [CUDA][HIP] Fix linkage for -fgpu-rdc

2020-11-02 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D90311/new/ https://reviews.llvm.org/D90311 ___ cfe-commits mailing list

[PATCH] D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables.

2020-11-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Basic/DiagnosticSemaKinds.td:8163 "%select{__device__|__global__|__host__|__host__ __device__}0 functions">; -def err_cuda_nonglobal_constant : Error<"__constant__ variables must be global">; +def

[PATCH] D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables.

2020-11-02 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 302394. tra added a comment. Added few test cases for allowed initializers on device-side static vars. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D88345/new/ https://reviews.llvm.org/D88345 Files:

[PATCH] D90436: [Bundler] Use argv[0] as the default choice for the Executable name.

2020-11-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. ping? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D90436/new/ https://reviews.llvm.org/D90436 ___ cfe-commits mailing list cfe-commits@lists.llvm.org

[PATCH] D90436: [HIP] Use argv[0] as the default choice for the Executable name.

2020-10-29 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added reviewers: sfantao, yaxunl, sdmitriev. Herald added a subscriber: bixia. Herald added a project: clang. tra requested review of this revision. The path produced by `getMainExecutable()` may not be the right one when the files are installed in a symlinked tree

[PATCH] D89971: [OpenMP][CUDA] Add missing overload for `remquo(float,float,int*)`

2020-10-28 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I'll take a look at what broke the original patch. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D89971/new/ https://reviews.llvm.org/D89971 ___ cfe-commits mailing list

[PATCH] D90174: [HIP] Fix regressions due to fp contract change

2020-10-28 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/BackendUtil.cpp:486 + if (LangOpts.HIP) +Options.AllowFPOpFusion = llvm::FPOpFusion::Standard; + yaxunl wrote: > rjmccall wrote: > > tra wrote: > > > yaxunl wrote: > > > > tra wrote: > > > > > yaxunl

[PATCH] D89971: [OpenMP][CUDA] Add missing overload for `remquo(float,float,int*)`

2020-10-28 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Never mind. I see that you've fixed it in 17c8251bca83072d2f3e00f936d6ce24500e6b02 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D89971/new/

[PATCH] D89971: [OpenMP][CUDA] Add missing overload for `remquo(float,float,int*)`

2020-10-28 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Looks like it breaks CUDA: http://lab.llvm.org:8011/#/builders/46/builds/304 [10/873] Building CXX object External/CUDA/CMakeFiles/printf-cuda-11.0-c++14-libstdc++-6.dir/printf.cu.o FAILED: External/CUDA/CMakeFiles/printf-cuda-11.0-c++14-libstdc++-6.dir/printf.cu.o

[PATCH] D90174: [HIP] Fix regressions due to fp contract change

2020-10-27 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: scanon. tra added inline comments. Comment at: clang/lib/CodeGen/BackendUtil.cpp:486 + if (LangOpts.HIP) +Options.AllowFPOpFusion = llvm::FPOpFusion::Standard; + yaxunl wrote: > tra wrote: > > yaxunl wrote: > > > tra wrote: > > > > I

[PATCH] D90174: [HIP] Fix regressions due to fp contract change

2020-10-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/BackendUtil.cpp:486 + if (LangOpts.HIP) +Options.AllowFPOpFusion = llvm::FPOpFusion::Standard; + yaxunl wrote: > tra wrote: > > I don't think it's a good idea to force this. > > > > Perhaps a better

[PATCH] D90174: [HIP] Fix regressions due to fp contract change

2020-10-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/BackendUtil.cpp:486 + if (LangOpts.HIP) +Options.AllowFPOpFusion = llvm::FPOpFusion::Standard; + I don't think it's a good idea to force this. Perhaps a better way to address this would be to set

[PATCH] D89832: [CUDA] Extract CUDA version from cuda.h if version.txt is not found

2020-10-26 Thread Artem Belevich via Phabricator via cfe-commits
tra closed this revision. tra added a comment. Landed in rGe7fe125b776b Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D89832/new/ https://reviews.llvm.org/D89832

[PATCH] D89752: [CUDA] Improve clang's ability to detect recent CUDA versions.

2020-10-23 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rG65d206484c54: [CUDA] Improve clangs ability to detect recent CUDA versions. (authored by tra). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D89752/new/

[PATCH] D89832: [CUDA] Extract CUDA version from cuda.h if version.txt is not found

2020-10-23 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D89832#2349915 , @emankov wrote: > I confirm that D89752 eliminates 47332 > on Windows. Tested against the > following CUDA versions: 7.0, 7,5, 9,2,

[PATCH] D89980: [hip] Remove kernel argument coercion.

2020-10-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Are there any tests to illustrate what this change does to IR or generated code? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D89980/new/ https://reviews.llvm.org/D89980 ___

[PATCH] D89974: [driver][CUDA] Use CMake's FindCUDA as default --cuda-path.

2020-10-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D89974#2348310 , @rnk wrote: > Lastly, if those things don't work, there is the `CCC_OVERRIDE_OPTIONS` > environment variable, so the user always has that escape hatch. A separate > environment variable would also be fine. It

[PATCH] D89974: [driver][CUDA] Use CMake's FindCUDA as default --cuda-path.

2020-10-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D89974#2348181 , @Hahnfeld wrote: > Don't we already have this via `PATH`? At least that was my motivation back > then and it worked without problems. We do and it should indeed work for the tests. Setting an environment variable

[PATCH] D89971: [OpenMP][CUDA] Add missing overload for `remquo(float,float,int*)`

2020-10-22 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. Comment at: clang/lib/Headers/__clang_cuda_cmath.h:330 double>::type remquo(__T1 __x, __T2 __y, int *__quo) { return std::remquo((double)__x, (double)__y, __quo); jdoerfert wrote: > The template

[PATCH] D89974: [driver][CUDA] Use CMake's FindCUDA as default --cuda-path.

2020-10-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > Shortcomings in the testrunner don't seem like a reason to introduce new > build-time configured default search paths into the driver. +1 The root of the problem is that we rely on CUDA SDK as the external dependency and we have no way to reliably predict where the user

[PATCH] D89974: [driver][CUDA] Use CMake's FindCUDA as default --cuda-path.

2020-10-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D89974#2347979 , @Hahnfeld wrote: > In D89974#2347938 , @tra wrote: > >> One concern I have is that the path we configure during clang's build is not >> necessarily the right choice for the

[PATCH] D89974: [driver][CUDA] Use CMake's FindCUDA as default --cuda-path.

2020-10-22 Thread Artem Belevich via Phabricator via cfe-commits
tra requested changes to this revision. tra added a comment. This revision now requires changes to proceed. One concern I have is that the path we configure during clang's build is not necessarily the right choice for the user of clang we build. It's likely that the clang in the end will be

[PATCH] D89971: [OpenMP][CUDA] Add missing overload for `remquo(float,float,int*)`

2020-10-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > The template overload function overload. > is still hidden behind an ifdef for OpenMP. In the > future we probably want to remove the ifdef but that requires further > testing. I don't think it's the case. I've just ran `clang++ -x cuda /dev/null --cuda-host-only -dD

[PATCH] D89832: [CUDA] Extract CUDA version from cuda.h if version.txt is not found

2020-10-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:76-77 +return CudaVersion::CUDA_102; + if (raw_version < 11010) +return CudaVersion::CUDA_110; + return CudaVersion::LATEST; emankov wrote: > Please, add

[PATCH] D89752: [CUDA] Improve clang's ability to detect recent CUDA versions.

2020-10-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:161 + if (FS.exists(LibDevicePath + "/libdevice.10.bc")) { +Version = CudaVersion::LATEST; +DetectedVersionIsNotSupported = Version > CudaVersion::LATEST_SUPPORTED;

[PATCH] D89752: [CUDA] Improve clang's ability to detect recent CUDA versions.

2020-10-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D89752#2344113 , @emankov wrote: > Thank you! > I've successfully applied your change in `Cuda.cpp` to the following clang > versions: `10.0.0`, `10.0.1`, `11.0.0`, and `12.0.0git`; and have created the > working patches (#206 >

[PATCH] D89832: [CUDA] Extract CUDA version from cuda.h if version.txt is not found

2020-10-20 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 299501. tra edited the summary of this revision. tra added a comment. Added more tests. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D89832/new/ https://reviews.llvm.org/D89832 Files:

[PATCH] D89832: [CUDA] Extract CUDA version from cuda.h if version.txt is not found

2020-10-20 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added reviewers: jlebar, yaxunl, emankov. Herald added subscribers: dexonsmith, bixia. Herald added a project: clang. tra requested review of this revision. This is a follow-up to D89752 , If CUDA version can not be determined based

[PATCH] D89752: [CUDA] Improve clang's ability to detect recent CUDA versions.

2020-10-20 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/Driver/cuda-version-check.cu:13 // RUN:FileCheck %s --check-prefix=UNKNOWN_VERSION +// CUDA-11.1 does not carry version.txt file. Make sure we still detect it as a +// new version and handle it the same as we handle other

[PATCH] D89752: [CUDA] Improve clang's ability to detect recent CUDA versions.

2020-10-20 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 299410. tra marked 2 inline comments as done. tra added a comment. addresses review comments. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D89752/new/ https://reviews.llvm.org/D89752 Files:

[PATCH] D89752: [CUDA] Improve clang's ability to detect recent CUDA versions.

2020-10-20 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:161 + if (FS.exists(LibDevicePath + "/libdevice.10.bc")) { +Version = CudaVersion::LATEST; +DetectedVersionIsNotSupported = Version > CudaVersion::LATEST_SUPPORTED;

[PATCH] D89752: [CUDA] Improve clang's ability to detect recent CUDA versions.

2020-10-19 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 299223. tra edited the summary of this revision. tra added a comment. Added a test. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D89752/new/ https://reviews.llvm.org/D89752 Files:

[PATCH] D89752: [CUDA] Improve clang's ability to detect recent CUDA versions.

2020-10-19 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added reviewers: yaxunl, emankov, hans. Herald added a subscriber: bixia. Herald added a project: clang. tra requested review of this revision. CUDA-11.1 does not carry version.txt which causes clang to assume that it's CUDA-7.0, which used to be the only CUDA

[PATCH] D84362: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic

2020-10-14 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Basic/PartialDiagnostic.h:51 + : DiagID(DiagID) { +Allocator = _; + } Is there a particular reason to move field initialization into the body here and in other constructors? CHANGES SINCE

[PATCH] D89184: Support complex target features combinations

2020-10-13 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. In D89184#2328144 , @craig.topper wrote: > This is needed for D89105 and was split > from it. D89105

[PATCH] D89184: Support complex target features combinations

2020-10-13 Thread Artem Belevich via Phabricator via cfe-commits
tra requested changes to this revision. tra added a comment. This revision now requires changes to proceed. Sorry, I didn't mean to stamp the change as LGTM overall yet. Can't find a way to un-LGTM, so marking it as `Request Changes` for now. CHANGES SINCE LAST ACTION

[PATCH] D89184: Support complex target features combinations

2020-10-13 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a subscriber: echristo. tra added a comment. This revision is now accepted and ready to land. @echristo : FYI, just in case you have an opinion on bringing required feature constraint checks one step closer to being Turing-complete. :-) LGTM as far as

[PATCH] D88786: [CUDA] Don't call __cudaRegisterVariable on C++17 inline variables

2020-10-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/CodeGenModule.cpp:4137 +// TODO: Reject __device__ constexpr and __device__ inline in Sema. +if (!D->hasExternalStorage() && !D->isInline()) getCUDARuntime().registerDeviceVar(D, *GV,

[PATCH] D88786: [CUDA] Don't call __cudaRegisterVariable on C++17 inline variables

2020-10-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D88786#2312365 , @MaskRay wrote: >> Could you provide an example where this is causing an issue? > > If the C++17 inline variable appears in two TUs. They have the same comdat > group. The first comdat group is prevailing and the

[PATCH] D88786: [CUDA] Don't call __cudaRegisterVariable on C++17 inline variables

2020-10-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > If such a variable (which has a comdat group) is discarded (a copy from > another > translation unit is prevailing and selected), accessing the variable from > outside the section group (__cuda_register_globals) is a violation of the ELF > specification and will be

[PATCH] D88730: [HIP] Fix default output file for -E

2020-10-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. LGTM. Comment at: clang/test/Driver/hip-output-file-name.hip:13-15 +// RUN: %clang -### -E -target x86_64-linux-gnu \ +// RUN: --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \ +// RUN: 2>&1 | FileCheck -check-prefixes=DASH %s yaxunl

[PATCH] D88730: [HIP] Fix default output file for -E

2020-10-02 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/test/Driver/hip-output-file-name.hip:13-15 +// RUN: %clang -### -E -target x86_64-linux-gnu \ +// RUN: --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \

[PATCH] D88734: [HIP] Align device binary

2020-10-02 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp:374 +auto BI = BundlesInfo[CurWriteBundleTarget]; +OS.seek(BI.Offset);

[PATCH] D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables.

2020-10-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/SemaCUDA/device-var-init.cu:404 __host__ __device__ void hd_sema() { static int x = 42; } yaxunl wrote: > tra wrote: > > yaxunl wrote: > > > how does this work in device compilation? Is this equivalent to

[PATCH] D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables.

2020-10-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I've verified that clang with this patch can compile Tensorflow and that it can also compile `cooperative_groups.h` from CUDA-11. Comment at: clang/test/SemaCUDA/device-var-init.cu:404 __host__ __device__ void hd_sema() { static int x = 42; }

[PATCH] D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables.

2020-10-01 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 295636. tra added a comment. Herald added a reviewer: jdoerfert. Fixed a test. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D88345/new/ https://reviews.llvm.org/D88345 Files:

[PATCH] D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables.

2020-10-01 Thread Artem Belevich via Phabricator via cfe-commits
tra updated this revision to Diff 295635. tra retitled this revision from "[CUDA] Allow local `static {__constant__, __device__}` variables." to "[CUDA] Allow local `static const {__constant__, __device__}` variables.". tra edited the summary of this revision. tra added a comment. Herald added a

[PATCH] D88524: [CUDA][HIP] Fix bound arch for offload action for fat binary

2020-10-01 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/lib/Basic/Cuda.cpp:95 const char *CudaArchToString(CudaArch A) { + if (A == CudaArch::UNUSED) +return ""; You could add a

[PATCH] D88668: [CUDA] Add support for 11.1

2020-10-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Generally speaking the default should be conservative. It does us no good if we generate PTX 99.99, but discover that ptxas does not support it. Granted, these days 7.0 is also the wrong default as it's pretty ancient. IMO bumping it to 9.0 and GPU arch to sm_30 would be

[PATCH] D88524: [CUDA][HIP] Fix bound arch for offload action for fat binary

2020-09-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D88524#2304224 , @yaxunl wrote: > This translates to "unknown" in string form, which feels arbitrary. What if a > target has a valid cpu name which "unknown"? Isn't a nullptr (empty string in > string form) a more generic format

[PATCH] D88524: [CUDA][HIP] Fix bound arch for offload action for fat binary

2020-09-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > Currently CUDA/HIP toolchain uses "unknown" as bound arch > for offload action for fat binary. This causes -mcpu or -march > with "unknown" added in HIPToolChain::TranslateArgs or > CUDAToolChain::TranslateArgs. It would appear that the problem is actually where we check

[PATCH] D88557: [HIP] Add option --gpu-instrument-lib=

2020-09-30 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. Perhaps we should start thinking of shipping some of that bitcode along with clang. Then the instrumentation library could be linked with automatically by the driver when `-finstrument` is

<    5   6   7   8   9   10   11   12   13   14   >