[PATCH] D106809: [clang-offload-bundler] Make Bundle Entry ID backward compatible

2021-09-07 Thread George Rokos via Phabricator via cfe-commits
grokos accepted this revision.
grokos added a comment.

LGTM as well.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106809/new/

https://reviews.llvm.org/D106809

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D106809: [clang-offload-bundler] Make Bundle Entry ID backward compatible

2021-09-01 Thread George Rokos via Phabricator via cfe-commits
grokos added a comment.

LG. One possible suggestion is that you leave the double dash (`--`) variant in 
some tests so that we can make sure both variants (e.g. both 
`openmp-amdgcn-amd-amdhsa--gfx906` and `openmp-amdgcn-amd-amdhsa-gfx906`) are 
correctly parsed.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106809/new/

https://reviews.llvm.org/D106809

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D106509: [OpenMP][OpenACC] Implement `ompx_hold` map type modifier extension in Clang (1/2)

2021-08-13 Thread George Rokos via Phabricator via cfe-commits
grokos added a comment.

In D106509#2943239 , @protze.joachim 
wrote:

> I was wondering about the connection to OpenACC, so I had a quick look into 
> the OpenACC spec to try and understand the background.
> OpenACC uses two separate reference counters for structured and unstructured 
> map. If one of them is >0, the data is present. If both become 0, data is 
> deleted.
>
> I think, the `hold` modifier is not sufficient to replicate OpenACC behavior. 
> Consider the following example:
>
>   #pragma acc data copy(a)  // structured ref := 1
>   {
>   #pragma acc exit data delete(a) // dynamic ref := 0
>   #pragma acc enter data copyin(a) // dynamic ref := 1
>   } // structured ref := 0 // no copyout because dynamic ref >0
>
> As I understand this will be translated to the following OpenMP:
>
>   #pragma omp target data map(ompx_hold, tofrom:a)  // ref := 1
>   {
>   #pragma omp exit data map(delete:a) // ref := 0  // no action because of 
> hold
>   #pragma omp enter data map(to:a) // ref := 1
>   } // ref := 0 // perform map from
>
> I don't think, that trying to map the two openacc reference count to a single 
> openmp reference count will work in general.

The next patch in this series (D106510 ) 
modifies libomptarget and introduces a second reference count for ompx_hold. 
There won't be a singe RefCount anymore. I will review that patch once this one 
has been finalized.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106509/new/

https://reviews.llvm.org/D106509

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105191: [Clang][OpenMP] Add support for Static Device Libraries

2021-07-09 Thread George Rokos via Phabricator via cfe-commits
grokos added inline comments.



Comment at: clang/lib/Driver/ToolChains/CommonArgs.cpp:1689
+   : "lib" + libname + "-" + archname + "-" + gpuname,
+  "a");
+

"a" --> ".a" (add a dot)



Comment at: clang/lib/Driver/ToolChains/CommonArgs.cpp:1796-1798
+if (SDL_Name != "omp" && SDL_Name != "cudart" && SDL_Name != "m" &&
+SDL_Name != "gcc" && SDL_Name != "gcc_s" && SDL_Name != "pthread" &&
+SDL_Name != "hip_hcc") {

I'm with @jdoerfert here, you can use a set of library names which are known to 
not have device-specific SDLs and check whether that set contains `SDL_Name`. 
Also, `SDL_Names` can be a set of unique entries, this way even if you try to 
add the same library twice it won't be added. This quadratic-complexity loop 
looks ugly...


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105191/new/

https://reviews.llvm.org/D105191

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D93525: [clang-offload-bundler] Add unbundling of archives containing bundled object files into device specific archives

2021-06-30 Thread George Rokos via Phabricator via cfe-commits
grokos added inline comments.



Comment at: clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp:147
+Target.split(Components, '-', 5);
+Components.resize(6);
+this->OffloadKind = Components[0];

saiislam wrote:
> grokos wrote:
> > Leftover? `Components` is already 6 elements long.
> Not necessarily. It is possible that target has less than 6 elements. For 
> example all bundling/unbundling cases which do not require GPUArch field.
> E.g. "openmp-powerpc64le-ibm-linux-gnu"
OK, thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93525/new/

https://reviews.llvm.org/D93525

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D93525: [clang-offload-bundler] Add unbundling of archives containing bundled object files into device specific archives

2021-06-30 Thread George Rokos via Phabricator via cfe-commits
grokos added inline comments.



Comment at: clang/docs/ClangOffloadBundler.rst:128
+
+  ---
+

A bit of wordplay, but it's weird that a *triple* now has 4 elements...



Comment at: clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp:147
+Target.split(Components, '-', 5);
+Components.resize(6);
+this->OffloadKind = Components[0];

Leftover? `Components` is already 6 elements long.



Comment at: clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp:1102
+/// compatible with this code object
+/// @param [in] Code Object \p CodeObject
+/// @param [out] List of all compatible targets \p CompatibleTargets among all

`CodeObject` --> `CodeObjectInfo`


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93525/new/

https://reviews.llvm.org/D93525

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D99551: [clang-offload-wrapper] Add standard notes for ELF offload images

2021-04-06 Thread George Rokos via Phabricator via cfe-commits
grokos accepted this revision.
grokos added a comment.
This revision is now accepted and ready to land.

Change looks good, so it's accepted on my end. I'll let the other reviewers 
have a look and post their comments. Please do not commit until we have reached 
an agreement for all 4 patches together (D99551 
, D99552 , 
D99553 , D99612 
).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99551/new/

https://reviews.llvm.org/D99551

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D92195: [OPENMP50]Mapping of the subcomponents with the 'default' mappers.

2021-02-25 Thread George Rokos via Phabricator via cfe-commits
grokos accepted this revision.
grokos added a comment.
This revision is now accepted and ready to land.

Libomptarget changes look good, I'll let @jdoerfert provide feedback for the 
clang part.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92195/new/

https://reviews.llvm.org/D92195

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D97003: [Clang][OpenMP] Require CUDA 9+ for OpenMP offloading on NVPTX target

2021-02-18 Thread George Rokos via Phabricator via cfe-commits
grokos added a comment.

This change makes much sense. In fact, CUDA 8 was so problematic for use with 
the nvptx runtime that (if memory serves me well) we declared it unsupported. 
So essentially this patch drops support for CUDA version 7 (and lower), which 
is already six years old. If the other reviewers agree, we can accept the patch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97003/new/

https://reviews.llvm.org/D97003

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D86119: [OPENMP50]Allow overlapping mapping in target constrcuts.

2021-02-14 Thread George Rokos via Phabricator via cfe-commits
grokos added a comment.

In D86119#2561163 , @abhinavgaba wrote:

> Thanks for the changes, Alexey! I tried the patch locally, and it looks 
> stable. It handled several tests I tried, including the following case 
> involving array section on a pointer to pointer base, and nested mappers with 
> `PTR_AND_OBJ` maps successfully:
>
>   #include 
>   
>   typedef struct { int a; double *b; } C;
>   #pragma omp declare mapper(id1: C s) map(to:s.a) map(from:s.b[0:2])
>   
>   typedef struct { int e; C f; int h; short *g; } D;
>   #pragma omp declare mapper(default: D r) map(from:r.e) map(mapper(id1), 
> tofrom:r.f) map(tofrom: r.g[0:r.h])
>   
>   int main() {
> constexpr int N = 10;
> D s;
> s.e = 111;
> s.f.a = 222;
> double x[2]; x[1] = 20;
> short y[N]; y[1] = 30;
> s.f.b = [0];
> s.g = [0];
> s.h = N;
>   
> D* sp = 
> D** spp = 
>   
> printf("%d %d %lf %p %d %p\n", spp[0][0].e, spp[0][0].f.a, 
> spp[0][0].f.b[1], spp[0][0].f.b, spp[0][0].g[1], spp[0][0].g);
> // Expected: 111 222 20.0  30 
>   
> #pragma omp target map(tofrom:spp[0][0])
> {
>   printf("%d %d %lf %p %d %p\n", spp[0][0].e, spp[0][0].f.a, 
> spp[0][0].f.b[1], spp[0][0].f.b, spp[0][0].g[1], spp[0][0].g);
>   // Expected:  222   30 
>   spp[0][0].e = 333;
>   spp[0][0].f.a = 444;
>   spp[0][0].f.b[1] = 40;
>   spp[0][0].g[1] = 50;
> }
> printf("%d %d %lf %p %d %p\n", spp[0][0].e, spp[0][0].f.a, 
> spp[0][0].f.b[1], spp[0][0].f.b, spp[0][0].g[1], spp[0][0].g);
> // Expected: 333 222 40.0  50 
>   }

@ABataev This is a nice complex example, I think it's worth including it in the 
runtime tests (under libomptarget).

@abhinavgaba Thanks for providing it!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86119/new/

https://reviews.llvm.org/D86119

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D86119: [OPENMP50]Allow overlapping mapping in target constrcuts.

2020-11-17 Thread George Rokos via Phabricator via cfe-commits
grokos added inline comments.



Comment at: openmp/libomptarget/src/omptarget.cpp:233
 MapperComponents
-.Components[target_data_function == targetDataEnd ? I : E - I - 1];
+.Components[target_data_function == targetDataEnd ? E - I - 1 : I];
 MapperArgsBase[I] = C.Base;

What is the current status of the order of the arguments clang emits? Is it 
still necessary to traverse arguments in reverse order here?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86119/new/

https://reviews.llvm.org/D86119

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D87946: [OpenMP] Add Location Fields to Libomptarget Runtime for Debugging

2020-10-12 Thread George Rokos via Phabricator via cfe-commits
grokos added a comment.

In D87946#2325756 , @jhuber6 wrote:

> Current build, fails `offloading/target_depend_nowait` for an unknown reason 
> after calling cuStreamSynchronize in __tgt_target_teams_mapper_nowait.

Is your tree up to date? We had a problem with this test, which was fixed by 
D84470 .


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87946/new/

https://reviews.llvm.org/D87946

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D88829: [OpenMP][RTL] Remove dead code

2020-10-05 Thread George Rokos via Phabricator via cfe-commits
grokos added a comment.

In D88829#2311768 , @JonChesterfield 
wrote:

> Rolling the reduction in leading whitespace in 
> nvptx_target_parallel_reduction_codegen.cpp in with the patch might be 
> contentious, added a couple more reviewers to see if other people would 
> prefer that part split out. I'll accept in a day or so if there are no 
> comments on the whitespace.

Fine by me, I don't think it's worth uploading an extra patch just for this 
minor formatting detail.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88829/new/

https://reviews.llvm.org/D88829

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D87946: [OpenMP] Add Location Fields to Libomptarget Runtime for Debugging

2020-09-29 Thread George Rokos via Phabricator via cfe-commits
grokos added inline comments.



Comment at: openmp/libomptarget/include/Ident.h:48-51
+auto removePath = [](const std::string ) {
+std::size_t pos = path.rfind('/');
+return path.substr(pos + 1);
+};

jhuber6 wrote:
> This will probably break with a Windows file path, but I don't think you can 
> even build most offloading if you're on Windows. Should I just add a 
> processor check?
> 
> ```
> #ifdef _WIN32
> #define PATH_DELIMITER '\\'
> #else
> #define PATH_DELIMITER '/'
> #endif
> ```
You are right, libomptarget does not run on Windows, but some fork which stays 
in sync with upstream libomptarget may do so and this change may break it. Can 
you add the proposed pre-processor check?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87946/new/

https://reviews.llvm.org/D87946

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D87946: [OpenMP] Add Location Fields to Libomptarget Runtime for Debugging

2020-09-23 Thread George Rokos via Phabricator via cfe-commits
grokos added a comment.

In D87946#2287744 , @jhuber6 wrote:

> Seems like a hacky solution to just keep adding suffixed whenever we want a 
> new interface though.

Yes, this used to be a point of contention within the community. We discussed 
the issue sometime ago and the majority of developers was in favor of this 
approach (as opposed to e.g. having an extra pointer to a structure which will 
contain additional information and which will be extended every time we add a 
new feature).

The libomptarget-part of this patch looks good, I'm leaving the other reviewers 
look at the clang-part.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87946/new/

https://reviews.llvm.org/D87946

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D87946: [OpenMP] Add Location Fields to Libomptarget Runtime for Debugging

2020-09-21 Thread George Rokos via Phabricator via cfe-commits
grokos added a comment.

In D87946#2286413 , @jhuber6 wrote:

> I wasn't aware they were explicitly deprecated. If we're keeping around old 
> interfaces for backwards compatibility I should also add in the old mapper 
> functions without the `ident_t` pointer and call into the new functions with 
> a nullptr.

Correct, all `__tgt_target_*` functions not ending in `_mapper` are part of the 
old interface and we are keeping them for compatibility with older versions of 
clang. These older clang versions do not emit the location pointer anyway, so 
this extra argument should be removed and each such function should call into 
its new API equivalent passing a `nullptr`.

We need the location pointer only for `__tgt_target_*_mapper` functions as well 
as `__kmpc_push_target_tripcount` - this is the current interface.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87946/new/

https://reviews.llvm.org/D87946

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D87946: [OpenMP] Add Location Fields to Libomptarget Runtime for Debugging

2020-09-21 Thread George Rokos via Phabricator via cfe-commits
grokos added a comment.

In D87946#2286024 , @jhuber6 wrote:

> Added ident_t structs to additional runtime functions.

Why are we adding the extra parameter to those additional functions? Non-mapper 
API functions have been deprecated, clang does not emit them anymore...


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87946/new/

https://reviews.llvm.org/D87946

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D84767: [OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region.

2020-07-29 Thread George Rokos via Phabricator via cfe-commits
grokos added inline comments.



Comment at: clang/test/OpenMP/target_data_codegen.cpp:659-660
   // PRESENT=0x1000 | TARGET_PARAM=0x20 = 0x1020
-  // MEMBER_OF_9=0x9 | PRESENT=0x1000 | FROM=0x2 | TO=0x1 = 
0x91003
-  // MEMBER_OF_9=0x9 | PRESENT=0x1000 | PTR_AND_OBJ=0x10 | 
FROM=0x2 | TO=0x1 = 0x91013
-  // MEMBER_OF_9=0x9 | FROM=0x2 | TO=0x1 = 0x90003
-  // MEMBER_OF_9=0x9 | PTR_AND_OBJ=0x10 = 0x90010
+  // MEMBER_OF_7=0x9 | PRESENT=0x1000 | FROM=0x2 | TO=0x1 = 
0x71003
+  // MEMBER_OF_7=0x9 | PTR_AND_OBJ=0x10 = 0x70010
   // PTR_AND_OBJ=0x10 = 0x10

MEMBER_OF_7=0x9 --> MEMBER_OF_7=0x7


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84767/new/

https://reviews.llvm.org/D84767

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D84422: [OpenMP] Fix `present` for exit from `omp target data`

2020-07-27 Thread George Rokos via Phabricator via cfe-commits
grokos added a comment.

This looks much better now. I don't have any other comments. Since this patch 
is now essentially a clang-only patch, I'll let @ABataev accept it or post 
comments.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84422/new/

https://reviews.llvm.org/D84422



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D84422: [OpenMP] Fix `present` for exit from `omp target data`

2020-07-27 Thread George Rokos via Phabricator via cfe-commits
grokos added a comment.

In D84422#2173500 , @jdenny wrote:

> I've added a comment to the runtime code that performs the check.  As you can 
> see, the check is performed regardless.  It's just a question of whether the 
> runtime treats it as an error.  I don't think performance is an issue.
>
> My concern here is that it will be hard to justify changes to the runtime if 
> I cannot formulate a use case.


Thinking about it, I don't think there can be a case where something is present 
upon entering a target region and not be present when we're exiting. Whatever 
code comprises the target region is code executed on the device - it cannot 
modify the state of host objects (i.e. libomptarget) in any possible way. E.g. 
the kernel cannot invoke libomptarget functions, allocate memory, map/unmap 
data etc.

The only case where something like this would be possible is if we have 
multiple host threads executing async offloading. In such a case, one thread 
may launch a target region at a moment when the requested mapping is `present` 
on the device and while the kernel is executing some other thread performs a 
`target data exit` on the desired mapping. Upon exiting the kernel, the mapping 
will no longer be present but this is clearly a race condition (user's fault), 
so I don't think we should pay attention to such a scenario.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84422/new/

https://reviews.llvm.org/D84422



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D84182: [OPENMP]Fix PR46012: declare target pointer cannot be accessed in target region.

2020-07-24 Thread George Rokos via Phabricator via cfe-commits
grokos added a comment.

@ABataev:

After this patch was committed, I tried to run the following example:

  #include 
  
  int *yptr;
  
  int main() {
int y[10];
y[1] = 1;
yptr = [0];
  
printf(" = %p\n", );
printf("[0] = %p\n", [0]);
  
#pragma omp target data map(to: yptr[0:5])
#pragma omp target
{
  printf("y = %d\n", yptr[1]);
  yptr[1] = 10;
  printf("y = %d\n", yptr[1]);
}
  
printf("y = %d\n", yptr[1]);
return 0;
  }

The arguments clang generates are:

  1) base = [0], begin = , size = 8, type = TARGET_PARAM | TO
  2) base = , begin = [0], size = 8, type = PTR_AND_OBJ | TO

The second argument is correct, the first argument doesn't make much sense. I 
believe it should have its base set to , not [0].
y[0] is not the base for anything, it's only the pointee object.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84182/new/

https://reviews.llvm.org/D84182



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D84422: [OpenMP] Fix `present` for exit from `omp target data`

2020-07-24 Thread George Rokos via Phabricator via cfe-commits
grokos added a comment.

So let's proceed with the patch.

Instead of introducing new API functions and making all these changes in all 
these files, wouldn't it be easier if we just unset the `PRESENT` flag from 
arg_types in clang when we generate the call to `__tgt_target_data_end_*` if we 
are exiting from a scoped environment?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84422/new/

https://reviews.llvm.org/D84422



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D84422: [OpenMP] Fix `present` for exit from `omp target data`

2020-07-23 Thread George Rokos via Phabricator via cfe-commits
grokos added a comment.

So is the test case that motivated this patch illegal OpenMP code?

  #pragma omp target enter data map(alloc:i)
  #pragma omp target data map(present, alloc: i)
  {
#pragma omp target exit data map(delete:i)
  } // fails presence check here


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84422/new/

https://reviews.llvm.org/D84422



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D84422: [OpenMP] Fix `present` for exit from `omp target data`

2020-07-23 Thread George Rokos via Phabricator via cfe-commits
grokos added a comment.

What confuses me about this interpretation of the standard is the inconsistency 
at `data exit`. So if we have an explicit `omp target exit data 
map(present...)` then we should respect the "present" semantics, whereas when 
we have a scoped data exit:

  #pragma omp target data map(present,...)
  {
...
  } // implicit "exit data" here

then "present" should be ignored.

I agree that the paragraph from the standard leaves little room for other 
interpretations, I'd just like to point out that it looks inconsistent - at 
least to me.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84422/new/

https://reviews.llvm.org/D84422



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D83959: Fix compiling warnings in OpenMP declare mapper codegen

2020-07-16 Thread George Rokos via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rGfc47c0e0a6a2: [clang] Fix compilation warnings in OpenMP 
declare mapper codegen. (authored by grokos).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83959/new/

https://reviews.llvm.org/D83959

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp


Index: clang/lib/CodeGen/CGOpenMPRuntime.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -8008,12 +8008,12 @@
 C->isImplicit(), std::get<2>(L));
   }
 for (const auto *C : CurExecDir->getClausesOfKind())
-  for (const auto  : C->component_lists()) {
+  for (const auto L : C->component_lists()) {
 InfoGen(std::get<0>(L), std::get<1>(L), OMPC_MAP_to, llvm::None,
 /*ReturnDevicePointer=*/false, C->isImplicit(), 
std::get<2>(L));
   }
 for (const auto *C : CurExecDir->getClausesOfKind())
-  for (const auto  : C->component_lists()) {
+  for (const auto L : C->component_lists()) {
 InfoGen(std::get<0>(L), std::get<1>(L), OMPC_MAP_from, llvm::None,
 /*ReturnDevicePointer=*/false, C->isImplicit(), 
std::get<2>(L));
   }
@@ -8029,7 +8029,7 @@
 
 for (const auto *C :
  CurExecDir->getClausesOfKind()) {
-  for (const auto  : C->component_lists()) {
+  for (const auto L : C->component_lists()) {
 OMPClauseMappableExprCommon::MappableExprComponentListRef Components =
 std::get<1>(L);
 assert(!Components.empty() &&


Index: clang/lib/CodeGen/CGOpenMPRuntime.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -8008,12 +8008,12 @@
 C->isImplicit(), std::get<2>(L));
   }
 for (const auto *C : CurExecDir->getClausesOfKind())
-  for (const auto  : C->component_lists()) {
+  for (const auto L : C->component_lists()) {
 InfoGen(std::get<0>(L), std::get<1>(L), OMPC_MAP_to, llvm::None,
 /*ReturnDevicePointer=*/false, C->isImplicit(), std::get<2>(L));
   }
 for (const auto *C : CurExecDir->getClausesOfKind())
-  for (const auto  : C->component_lists()) {
+  for (const auto L : C->component_lists()) {
 InfoGen(std::get<0>(L), std::get<1>(L), OMPC_MAP_from, llvm::None,
 /*ReturnDevicePointer=*/false, C->isImplicit(), std::get<2>(L));
   }
@@ -8029,7 +8029,7 @@
 
 for (const auto *C :
  CurExecDir->getClausesOfKind()) {
-  for (const auto  : C->component_lists()) {
+  for (const auto L : C->component_lists()) {
 OMPClauseMappableExprCommon::MappableExprComponentListRef Components =
 std::get<1>(L);
 assert(!Components.empty() &&
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D67833: [OpenMP 5.0] Codegen support to pass user-defined mapper functions to runtime

2020-07-15 Thread George Rokos via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG537b16e9b8da: [OpenMP 5.0] Codegen support to pass 
user-defined mapper functions to runtime (authored by grokos).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67833/new/

https://reviews.llvm.org/D67833

Files:
  clang/include/clang/AST/OpenMPClause.h
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/lib/CodeGen/CGOpenMPRuntime.h
  clang/lib/CodeGen/CGStmtOpenMP.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/OpenMP/capturing_in_templates.cpp
  clang/test/OpenMP/declare_mapper_codegen.cpp
  clang/test/OpenMP/declare_target_link_codegen.cpp
  clang/test/OpenMP/distribute_codegen.cpp
  clang/test/OpenMP/distribute_firstprivate_codegen.cpp
  clang/test/OpenMP/distribute_lastprivate_codegen.cpp
  clang/test/OpenMP/distribute_parallel_for_codegen.cpp
  clang/test/OpenMP/distribute_parallel_for_firstprivate_codegen.cpp
  clang/test/OpenMP/distribute_parallel_for_if_codegen.cpp
  clang/test/OpenMP/distribute_parallel_for_lastprivate_codegen.cpp
  clang/test/OpenMP/distribute_parallel_for_num_threads_codegen.cpp
  clang/test/OpenMP/distribute_parallel_for_private_codegen.cpp
  clang/test/OpenMP/distribute_parallel_for_proc_bind_codegen.cpp
  clang/test/OpenMP/distribute_parallel_for_reduction_codegen.cpp
  clang/test/OpenMP/distribute_parallel_for_simd_codegen.cpp
  clang/test/OpenMP/distribute_parallel_for_simd_firstprivate_codegen.cpp
  clang/test/OpenMP/distribute_parallel_for_simd_if_codegen.cpp
  clang/test/OpenMP/distribute_parallel_for_simd_lastprivate_codegen.cpp
  clang/test/OpenMP/distribute_parallel_for_simd_num_threads_codegen.cpp
  clang/test/OpenMP/distribute_parallel_for_simd_private_codegen.cpp
  clang/test/OpenMP/distribute_parallel_for_simd_proc_bind_codegen.cpp
  clang/test/OpenMP/distribute_private_codegen.cpp
  clang/test/OpenMP/distribute_simd_codegen.cpp
  clang/test/OpenMP/distribute_simd_firstprivate_codegen.cpp
  clang/test/OpenMP/distribute_simd_lastprivate_codegen.cpp
  clang/test/OpenMP/distribute_simd_private_codegen.cpp
  clang/test/OpenMP/distribute_simd_reduction_codegen.cpp
  clang/test/OpenMP/nvptx_lambda_capturing.cpp
  clang/test/OpenMP/nvptx_lambda_pointer_capturing.cpp
  clang/test/OpenMP/nvptx_target_requires_unified_shared_memory.cpp
  clang/test/OpenMP/openmp_offload_codegen.cpp
  clang/test/OpenMP/target_codegen.cpp
  clang/test/OpenMP/target_data_codegen.cpp
  clang/test/OpenMP/target_data_use_device_addr_codegen.cpp
  clang/test/OpenMP/target_defaultmap_codegen.cpp
  clang/test/OpenMP/target_depend_codegen.cpp
  clang/test/OpenMP/target_device_codegen.cpp
  clang/test/OpenMP/target_enter_data_codegen.cpp
  clang/test/OpenMP/target_enter_data_depend_codegen.cpp
  clang/test/OpenMP/target_exit_data_codegen.cpp
  clang/test/OpenMP/target_exit_data_depend_codegen.cpp
  clang/test/OpenMP/target_firstprivate_codegen.cpp
  clang/test/OpenMP/target_is_device_ptr_codegen.cpp
  clang/test/OpenMP/target_map_codegen.cpp
  clang/test/OpenMP/target_map_member_expr_array_section_codegen.cpp
  clang/test/OpenMP/target_parallel_codegen.cpp
  clang/test/OpenMP/target_parallel_depend_codegen.cpp
  clang/test/OpenMP/target_parallel_for_codegen.cpp
  clang/test/OpenMP/target_parallel_for_depend_codegen.cpp
  clang/test/OpenMP/target_parallel_for_simd_codegen.cpp
  clang/test/OpenMP/target_parallel_for_simd_depend_codegen.cpp
  clang/test/OpenMP/target_parallel_for_simd_uses_allocators_codegen.cpp
  clang/test/OpenMP/target_parallel_for_uses_allocators_codegen.cpp
  clang/test/OpenMP/target_parallel_if_codegen.cpp
  clang/test/OpenMP/target_parallel_num_threads_codegen.cpp
  clang/test/OpenMP/target_parallel_uses_allocators_codegen.cpp
  clang/test/OpenMP/target_simd_codegen.cpp
  clang/test/OpenMP/target_simd_depend_codegen.cpp
  clang/test/OpenMP/target_simd_uses_allocators_codegen.cpp
  clang/test/OpenMP/target_teams_codegen.cpp
  clang/test/OpenMP/target_teams_depend_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_collapse_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_depend_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_dist_schedule_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_firstprivate_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_lastprivate_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_parallel_for_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_parallel_for_collapse_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_parallel_for_depend_codegen.cpp
  
clang/test/OpenMP/target_teams_distribute_parallel_for_dist_schedule_codegen.cpp
  
clang/test/OpenMP/target_teams_distribute_parallel_for_firstprivate_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_parallel_for_if_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_parallel_for_lastprivate_codegen.cpp
  

[PATCH] D67833: [OpenMP 5.0] Codegen support to pass user-defined mapper functions to runtime

2020-07-15 Thread George Rokos via Phabricator via cfe-commits
grokos added a comment.

OK, now it works. Thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67833/new/

https://reviews.llvm.org/D67833



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D67833: [OpenMP 5.0] Codegen support to pass user-defined mapper functions to runtime

2020-07-15 Thread George Rokos via Phabricator via cfe-commits
grokos added a comment.

I tried to build clang with this patch and I get errors like:

  CGOpenMPRuntime.cpp:9463:38: error: ‘OMPRTL___tgt_target_teams_nowait_mapper’ 
was not declared in this scope
  ? OMPRTL___tgt_target_teams_nowait_mapper

Where are these `OMPRTL___tgt_` symbols defined?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67833/new/

https://reviews.llvm.org/D67833



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D83057: [OpenMP][NFC] Remove hard-coded line numbers from more tests

2020-07-02 Thread George Rokos via Phabricator via cfe-commits
grokos accepted this revision.
grokos added a comment.
This revision is now accepted and ready to land.

Like D82224 , looks good.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83057/new/

https://reviews.llvm.org/D83057



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D74262: [clang-offload-bundler] Enable handling of partially-linked fat objects

2020-05-06 Thread George Rokos via Phabricator via cfe-commits
grokos abandoned this revision.
grokos added a comment.

The partial linking scheme has been found to not work correctly in all cases 
(it fails when we have libraries with device code only). A new patch will be 
uploaded which will be based on archive extraction.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74262/new/

https://reviews.llvm.org/D74262



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75223: [clang-offload-wrapper] Lower priority of __tgt_register_lib in favor of __tgt_register_requires

2020-03-03 Thread George Rokos via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rGfca49fe8e34f: [clang-offload-wrapper] Lower priority of 
__tgt_register_lib in favor of… (authored by grokos).

Changed prior to commit:
  https://reviews.llvm.org/D75223?vs=246870=247991#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75223/new/

https://reviews.llvm.org/D75223

Files:
  clang/test/Driver/clang-offload-wrapper.c
  clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp


Index: clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
===
--- clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
+++ clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
@@ -262,7 +262,12 @@
 Builder.CreateRetVoid();
 
 // Add this function to constructors.
-appendToGlobalCtors(M, Func, 0);
+// Set priority to 1 so that __tgt_register_lib is executed AFTER
+// __tgt_register_requires (we want to know what requirements have been
+// asked for before we load a libomptarget plugin so that by the time the
+// plugin is loaded it can report how many devices there are which can
+// satisfy these requirements).
+appendToGlobalCtors(M, Func, /*Priority*/ 1);
   }
 
   void createUnregisterFunction(GlobalVariable *BinDesc) {
@@ -283,7 +288,8 @@
 Builder.CreateRetVoid();
 
 // Add this function to global destructors.
-appendToGlobalDtors(M, Func, 0);
+// Match priority of __tgt_register_lib
+appendToGlobalDtors(M, Func, /*Priority*/ 1);
   }
 
 public:
Index: clang/test/Driver/clang-offload-wrapper.c
===
--- clang/test/Driver/clang-offload-wrapper.c
+++ clang/test/Driver/clang-offload-wrapper.c
@@ -39,8 +39,8 @@
 
 // CHECK-IR: [[DESC:@.+]] = internal constant [[DESCTY]] { i32 1, [[IMAGETY]]* 
getelementptr inbounds ([1 x [[IMAGETY]]], [1 x [[IMAGETY]]]* [[IMAGES]], i64 
0, i64 0), [[ENTTY]]* [[ENTBEGIN]], [[ENTTY]]* [[ENTEND]] }
 
-// CHECK-IR: @llvm.global_ctors = appending global [1 x { i32, void ()*, i8* 
}] [{ i32, void ()*, i8* } { i32 0, void ()* [[REGFN:@.+]], i8* null }]
-// CHECK-IR: @llvm.global_dtors = appending global [1 x { i32, void ()*, i8* 
}] [{ i32, void ()*, i8* } { i32 0, void ()* [[UNREGFN:@.+]], i8* null }]
+// CHECK-IR: @llvm.global_ctors = appending global [1 x { i32, void ()*, i8* 
}] [{ i32, void ()*, i8* } { i32 1, void ()* [[REGFN:@.+]], i8* null }]
+// CHECK-IR: @llvm.global_dtors = appending global [1 x { i32, void ()*, i8* 
}] [{ i32, void ()*, i8* } { i32 1, void ()* [[UNREGFN:@.+]], i8* null }]
 
 // CHECK-IR: define internal void [[REGFN]]()
 // CHECK-IR:   call void @__tgt_register_lib([[DESCTY]]* [[DESC]])


Index: clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
===
--- clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
+++ clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
@@ -262,7 +262,12 @@
 Builder.CreateRetVoid();
 
 // Add this function to constructors.
-appendToGlobalCtors(M, Func, 0);
+// Set priority to 1 so that __tgt_register_lib is executed AFTER
+// __tgt_register_requires (we want to know what requirements have been
+// asked for before we load a libomptarget plugin so that by the time the
+// plugin is loaded it can report how many devices there are which can
+// satisfy these requirements).
+appendToGlobalCtors(M, Func, /*Priority*/ 1);
   }
 
   void createUnregisterFunction(GlobalVariable *BinDesc) {
@@ -283,7 +288,8 @@
 Builder.CreateRetVoid();
 
 // Add this function to global destructors.
-appendToGlobalDtors(M, Func, 0);
+// Match priority of __tgt_register_lib
+appendToGlobalDtors(M, Func, /*Priority*/ 1);
   }
 
 public:
Index: clang/test/Driver/clang-offload-wrapper.c
===
--- clang/test/Driver/clang-offload-wrapper.c
+++ clang/test/Driver/clang-offload-wrapper.c
@@ -39,8 +39,8 @@
 
 // CHECK-IR: [[DESC:@.+]] = internal constant [[DESCTY]] { i32 1, [[IMAGETY]]* getelementptr inbounds ([1 x [[IMAGETY]]], [1 x [[IMAGETY]]]* [[IMAGES]], i64 0, i64 0), [[ENTTY]]* [[ENTBEGIN]], [[ENTTY]]* [[ENTEND]] }
 
-// CHECK-IR: @llvm.global_ctors = appending global [1 x { i32, void ()*, i8* }] [{ i32, void ()*, i8* } { i32 0, void ()* [[REGFN:@.+]], i8* null }]
-// CHECK-IR: @llvm.global_dtors = appending global [1 x { i32, void ()*, i8* }] [{ i32, void ()*, i8* } { i32 0, void ()* [[UNREGFN:@.+]], i8* null }]
+// CHECK-IR: @llvm.global_ctors = appending global [1 x { i32, void ()*, i8* }] [{ i32, void ()*, i8* } { i32 1, void ()* [[REGFN:@.+]], i8* null }]
+// CHECK-IR: @llvm.global_dtors = appending global [1 x { i32, void ()*, i8* }] [{ i32, void ()*, i8* } { i32 1, void ()* [[UNREGFN:@.+]], i8* null }]
 
 // CHECK-IR: define 

[PATCH] D75223: [clang-offload-wrapper] Lower priority of __tgt_register_lib in favor of __tgt_register_requires

2020-03-03 Thread George Rokos via Phabricator via cfe-commits
grokos added a comment.

@ABataev, @jdoerfert, does the patch look good to you? Can someone accept it if 
it's ready to go?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75223/new/

https://reviews.llvm.org/D75223



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75223: [clang-offload-wrapper] Lower priority of __tgt_register_lib in favor of __tgt_register_requires

2020-02-26 Thread George Rokos via Phabricator via cfe-commits
grokos updated this revision to Diff 246870.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75223/new/

https://reviews.llvm.org/D75223

Files:
  clang/test/Driver/clang-offload-wrapper.c
  clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp


Index: clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
===
--- clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
+++ clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
@@ -262,7 +262,12 @@
 Builder.CreateRetVoid();
 
 // Add this function to constructors.
-appendToGlobalCtors(M, Func, 0);
+// Set priority to 1 so that __tgt_register_lib is executed AFTER
+// __tgt_register_requires (we want to know what requirements have been
+// asked for before we load a libomptarget plugin so that by the time the
+// plugin is loaded it can report how many devices there are which can
+// satisfy these requirements).
+appendToGlobalCtors(M, Func, /*Priority*/ 1);
   }
 
   void createUnregisterFunction(GlobalVariable *BinDesc) {
@@ -283,7 +288,8 @@
 Builder.CreateRetVoid();
 
 // Add this function to global destructors.
-appendToGlobalDtors(M, Func, 0);
+// Match priority of __tgt_register_lib
+appendToGlobalDtors(M, Func, 1);
   }
 
 public:
Index: clang/test/Driver/clang-offload-wrapper.c
===
--- clang/test/Driver/clang-offload-wrapper.c
+++ clang/test/Driver/clang-offload-wrapper.c
@@ -39,8 +39,8 @@
 
 // CHECK-IR: [[DESC:@.+]] = internal constant [[DESCTY]] { i32 1, [[IMAGETY]]* 
getelementptr inbounds ([1 x [[IMAGETY]]], [1 x [[IMAGETY]]]* [[IMAGES]], i64 
0, i64 0), [[ENTTY]]* [[ENTBEGIN]], [[ENTTY]]* [[ENTEND]] }
 
-// CHECK-IR: @llvm.global_ctors = appending global [1 x { i32, void ()*, i8* 
}] [{ i32, void ()*, i8* } { i32 0, void ()* [[REGFN:@.+]], i8* null }]
-// CHECK-IR: @llvm.global_dtors = appending global [1 x { i32, void ()*, i8* 
}] [{ i32, void ()*, i8* } { i32 0, void ()* [[UNREGFN:@.+]], i8* null }]
+// CHECK-IR: @llvm.global_ctors = appending global [1 x { i32, void ()*, i8* 
}] [{ i32, void ()*, i8* } { i32 1, void ()* [[REGFN:@.+]], i8* null }]
+// CHECK-IR: @llvm.global_dtors = appending global [1 x { i32, void ()*, i8* 
}] [{ i32, void ()*, i8* } { i32 1, void ()* [[UNREGFN:@.+]], i8* null }]
 
 // CHECK-IR: define internal void [[REGFN]]()
 // CHECK-IR:   call void @__tgt_register_lib([[DESCTY]]* [[DESC]])


Index: clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
===
--- clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
+++ clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
@@ -262,7 +262,12 @@
 Builder.CreateRetVoid();
 
 // Add this function to constructors.
-appendToGlobalCtors(M, Func, 0);
+// Set priority to 1 so that __tgt_register_lib is executed AFTER
+// __tgt_register_requires (we want to know what requirements have been
+// asked for before we load a libomptarget plugin so that by the time the
+// plugin is loaded it can report how many devices there are which can
+// satisfy these requirements).
+appendToGlobalCtors(M, Func, /*Priority*/ 1);
   }
 
   void createUnregisterFunction(GlobalVariable *BinDesc) {
@@ -283,7 +288,8 @@
 Builder.CreateRetVoid();
 
 // Add this function to global destructors.
-appendToGlobalDtors(M, Func, 0);
+// Match priority of __tgt_register_lib
+appendToGlobalDtors(M, Func, 1);
   }
 
 public:
Index: clang/test/Driver/clang-offload-wrapper.c
===
--- clang/test/Driver/clang-offload-wrapper.c
+++ clang/test/Driver/clang-offload-wrapper.c
@@ -39,8 +39,8 @@
 
 // CHECK-IR: [[DESC:@.+]] = internal constant [[DESCTY]] { i32 1, [[IMAGETY]]* getelementptr inbounds ([1 x [[IMAGETY]]], [1 x [[IMAGETY]]]* [[IMAGES]], i64 0, i64 0), [[ENTTY]]* [[ENTBEGIN]], [[ENTTY]]* [[ENTEND]] }
 
-// CHECK-IR: @llvm.global_ctors = appending global [1 x { i32, void ()*, i8* }] [{ i32, void ()*, i8* } { i32 0, void ()* [[REGFN:@.+]], i8* null }]
-// CHECK-IR: @llvm.global_dtors = appending global [1 x { i32, void ()*, i8* }] [{ i32, void ()*, i8* } { i32 0, void ()* [[UNREGFN:@.+]], i8* null }]
+// CHECK-IR: @llvm.global_ctors = appending global [1 x { i32, void ()*, i8* }] [{ i32, void ()*, i8* } { i32 1, void ()* [[REGFN:@.+]], i8* null }]
+// CHECK-IR: @llvm.global_dtors = appending global [1 x { i32, void ()*, i8* }] [{ i32, void ()*, i8* } { i32 1, void ()* [[UNREGFN:@.+]], i8* null }]
 
 // CHECK-IR: define internal void [[REGFN]]()
 // CHECK-IR:   call void @__tgt_register_lib([[DESCTY]]* [[DESC]])
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75223: [clang-offload-wrapper] Lower priority of __tgt_register_lib in favor of __tgt_register_requires

2020-02-26 Thread George Rokos via Phabricator via cfe-commits
grokos created this revision.
grokos added reviewers: ABataev, vzakhari.
grokos added projects: OpenMP, clang.
Herald added a reviewer: jdoerfert.

Currently, the offload-wrapper tool inserts `__tgt_register_lib` to the list of 
global ctors of a target module with `Priority=0`. This means that it's got the 
same priority as `__tgt_register_requires` and the order in which these two 
functions are called in not guaranteed. Ideally, we'd like to call 
`__tgt_register_requires` BEFORE loading a libomptarget plugin (which is one of 
the actions happening inside `__tgt_register_lib`). The reason is that we want 
to know which requirements the user has asked for so that upon loading the 
plugin libomptarget can report how many devices there are that can satisfy the 
requirements.

E.g. with the current implementation we can run into the following problem:

1. The user requests `unified_shared_memory` but the available devices on the 
system do not support this feature.
2. Initially, the offload policy is set to `tgt_default`.
3. `__tgt_register_lib` is called and the plugin for the specific target device 
reports there are N>0 available devices.
4. Consequently, the offload policy is set to `tgt_mandatory`.
5. `__tgt_register_requires` is called and we find out that the 
`unified_shared_memory` requirement cannot be satisfied.
6. Offload fails and because the offload policy had been set to mandatory 
libomptarget terminates the application.

With the proposed change things will proceed as follows:

1. The user requests `unified_shared_memory` but the available devices on the 
system do not support this feature.
2. Initially, the offload policy is set to `tgt_default`.
3. `__tgt_register_requires` is called and registers the 
`unified_shared_memory` requirement with libomptarget.
4. `__tgt_register_lib` is called and the plugin for the specific target device 
reports that the `unified_shared_memory` requirement cannot be satisfied, so 
there are N=0 available devices.
5. Consequently, the offload policy is set to `tgt_disabled`.
6. Execution falls back on the host instead of terminating the application.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D75223

Files:
  clang/test/Driver/clang-offload-wrapper.c
  clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp


Index: clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
===
--- clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
+++ clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
@@ -262,7 +262,12 @@
 Builder.CreateRetVoid();
 
 // Add this function to constructors.
-appendToGlobalCtors(M, Func, 0);
+// Set priority to 1 so that __tgt_register_lib is executed AFTER
+// __tgt_register_requires (we want to know what requirements have been
+// asked for before we load a libomptarget plugin so that by the time the
+// plugin is loaded it can report how many devices there are which can
+// satisfy these requirements).
+appendToGlobalCtors(M, Func, /*Priority*/ 1);
   }
 
   void createUnregisterFunction(GlobalVariable *BinDesc) {
Index: clang/test/Driver/clang-offload-wrapper.c
===
--- clang/test/Driver/clang-offload-wrapper.c
+++ clang/test/Driver/clang-offload-wrapper.c
@@ -39,7 +39,7 @@
 
 // CHECK-IR: [[DESC:@.+]] = internal constant [[DESCTY]] { i32 1, [[IMAGETY]]* 
getelementptr inbounds ([1 x [[IMAGETY]]], [1 x [[IMAGETY]]]* [[IMAGES]], i64 
0, i64 0), [[ENTTY]]* [[ENTBEGIN]], [[ENTTY]]* [[ENTEND]] }
 
-// CHECK-IR: @llvm.global_ctors = appending global [1 x { i32, void ()*, i8* 
}] [{ i32, void ()*, i8* } { i32 0, void ()* [[REGFN:@.+]], i8* null }]
+// CHECK-IR: @llvm.global_ctors = appending global [1 x { i32, void ()*, i8* 
}] [{ i32, void ()*, i8* } { i32 1, void ()* [[REGFN:@.+]], i8* null }]
 // CHECK-IR: @llvm.global_dtors = appending global [1 x { i32, void ()*, i8* 
}] [{ i32, void ()*, i8* } { i32 0, void ()* [[UNREGFN:@.+]], i8* null }]
 
 // CHECK-IR: define internal void [[REGFN]]()


Index: clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
===
--- clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
+++ clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
@@ -262,7 +262,12 @@
 Builder.CreateRetVoid();
 
 // Add this function to constructors.
-appendToGlobalCtors(M, Func, 0);
+// Set priority to 1 so that __tgt_register_lib is executed AFTER
+// __tgt_register_requires (we want to know what requirements have been
+// asked for before we load a libomptarget plugin so that by the time the
+// plugin is loaded it can report how many devices there are which can
+// satisfy these requirements).
+appendToGlobalCtors(M, Func, /*Priority*/ 1);
   }
 
   void createUnregisterFunction(GlobalVariable *BinDesc) {
Index: 

[PATCH] D74262: [clang-offload-bundler] Enable handling of partially-linked fat objects

2020-02-14 Thread George Rokos via Phabricator via cfe-commits
grokos marked 2 inline comments as done.
grokos added a comment.

In D74262#1867245 , @ABataev wrote:

> Partial linking may lead to some incorrect results with global constructors. 
> How are you going to handle this?


Can you give me an example of what can break? I remember reading a conversation 
about some linker patch some time ago but I cannot recall the details.




Comment at: clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp:84
"  o   - object\n"
+   "  oo  - object; output file is a list of unbundled 
objects\n"
"  gch - precompiled-header\n"

jdoerfert wrote:
> ABataev wrote:
> > Hmm, are you going to introduce a new kind of output? It really requires 
> > RFC.
> This is the offload-bundler tool, right? Who is using that except OpenMP (and 
> SYCL)?
> 
> Is there a reason for `oo`? `uo` (=unboundled object), or `do` (=device 
> object)?
No one else (at least for now). But I can send out an RFC regarding the new 
output anyway.

`oo` is related to the fact that under this scheme we can have multiple `.o` 
files as output (many `o`'s). But if you think some of the other abbreviations 
makes more sense, I'm happy to change it.



Comment at: clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp:160
+  }
+
   /// Write the header of the bundled file to \a OS based on the information

jdoerfert wrote:
> I don't understand the comment. If \p FileName is a list of outputs, how does 
> this work?
The scheme is described in the attached pdf. In short, when the host liner 
fetches dependencies from a static library, alongside the host bundle it also 
fetches the device bundle. Now, if we have multiple dependencies from multiple 
objects inside a static library (or multiple static libraries) the host linker 
will perform a partial linking between all fetched bundles for the targets we 
are interested in. The result is a fat object in which each target bundle is 
the result of concatenating the individual bundles for that target we fetched 
from each static library. We also keep track of the size of each fetched bundle 
(we use a new sizes section per target inside the fat object for this purpose) 
so that the unbundler can separate the partially-linked bundle into the 
original object files it was assembled from. Usually, we don't know a priori 
how many dependencies will be brought in, so we don't know how many objects 
we're going to have at outputs. Therefore, in `oo` unbundling mode, the user 
specifies a single output file per target (just like in any other unbundling 
mode) which the unbundler populates with the paths to the actual output device 
objects. Then the driver reads those paths and passes them on to the device 
linker.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74262/new/

https://reviews.llvm.org/D74262



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D74262: [clang-offload-bundler] Enable handling of partially-linked fat objects

2020-02-07 Thread George Rokos via Phabricator via cfe-commits
grokos created this revision.
grokos added reviewers: hfinkel, jdoerfert, ABataev, mdtoguchi, kbobrovs, 
sdmitriev.
grokos added a project: clang.
Herald added a subscriber: Anastasia.

This is the bundler-side patch for enabling static library support in clang. 
The scheme has been discussed extensively in the past and is described in this 
document prepared by @sdmitriev: F11310194: offload from static libs.pdf 
.

Patch was developed in collaboration with @kbobrovs and a similar version has 
been merged with Intel's SYCL compiler 
(https://github.com/intel/llvm/tree/sycl).

When a fat object is created, for each bundle the bundler also creates a 
corresponding "size" section consisting of a single 64-bit integer storing the 
size of the bundle. When linking from static objects, the host linker will 
fetch all dependencies and do a partial linking on them; this action 
concatenates all sections with the same name across fetched dependencies into a 
new aggregate section, so for each target there will be an aggregate section 
containing the concatenated bundles and another aggregate section containing 
the concatenated sizes. By visiting the aggregate sizes section the unbundler 
can then split the aggregate bundle into separate output device objects.

The patch introduces a new type "oo" which is used when unbundling 
partially-linked fat objects. When "oo" is specified, the output file is not an 
object file itself; instead it is a text file containing the paths to the 
actual outputs (because we may have multiple device objects as outputs - one 
for each dependency that was fetched).

Invocation of the host linker (to do partial-linking) and cleanup of temporary 
files will be done by the Driver. Once the bundler patch lands, the Driver 
patch will follow.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D74262

Files:
  clang/test/Driver/clang-offload-bundler-missing-size-section.cpp
  clang/test/Driver/clang-offload-bundler-oo.cpp
  clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp

Index: clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
===
--- clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
+++ clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
@@ -26,6 +26,7 @@
 #include "llvm/Object/ObjectFile.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/CommandLine.h"
+#include "llvm/Support/Endian.h"
 #include "llvm/Support/Errc.h"
 #include "llvm/Support/Error.h"
 #include "llvm/Support/ErrorOr.h"
@@ -80,6 +81,7 @@
"  bc  - llvm-bc\n"
"  s   - assembler\n"
"  o   - object\n"
+   "  oo  - object; output file is a list of unbundled objects\n"
"  gch - precompiled-header\n"
"  ast - clang AST file"),
   cl::cat(ClangOffloadBundlerCategory));
@@ -97,6 +99,9 @@
 /// Magic string that marks the existence of offloading data.
 #define OFFLOAD_BUNDLER_MAGIC_STR "__CLANG_OFFLOAD_BUNDLE__"
 
+/// Prefix of an added section name with bundle size.
+#define SIZE_SECTION_PREFIX "__CLANG_OFFLOAD_BUNDLE_SIZE__"
+
 /// The index of the host input in the list of inputs.
 static unsigned HostInputIndex = ~0u;
 
@@ -141,6 +146,18 @@
   /// Read the current bundle and write the result into the stream \a OS.
   virtual Error ReadBundle(raw_fd_ostream , MemoryBuffer ) = 0;
 
+  /// Read the current bundle and write the result into the file \a FileName.
+  /// The meaning of \a FileName depends on unbundling type - in some
+  /// cases (type="oo") it will contain a list of actual outputs.
+  virtual Error ReadBundle(StringRef FileName, MemoryBuffer ) {
+std::error_code EC;
+raw_fd_ostream OS(FileName, EC, sys::fs::OF_None);
+
+if (EC)
+  return createFileError(FileName, EC);
+return ReadBundle(OS, Input);
+  }
+
   /// Write the header of the bundled file to \a OS based on the information
   /// gathered from \a Inputs.
   virtual Error WriteHeader(raw_fd_ostream ,
@@ -157,6 +174,13 @@
 
   /// Write the bundle from \a Input into \a OS.
   virtual Error WriteBundle(raw_fd_ostream , MemoryBuffer ) = 0;
+
+  /// Sets a base name for temporary filename generation.
+  void SetTempFileNameBase(StringRef Base) { TempFileNameBase = Base.data(); }
+
+protected:
+  /// Serves as a base name for temporary filename generation.
+  std::string TempFileNameBase;
 };
 
 /// Handler for binary files. The bundled file will have the following format
@@ -308,6 +332,8 @@
 return Error::success();
   }
 
+  using FileHandler::ReadBundle; // to avoid hiding via the overload below
+
   Error ReadBundle(raw_fd_ostream , MemoryBuffer ) final {
 assert(CurBundleInfo != BundlesInfo.end() && "Invalid reader info!");
 StringRef FC = Input.getBuffer();
@@ -403,27 +429,99 @@
 /// designated name.
 ///
 /// To unbundle, 

[PATCH] D68746: [Clang][OpenMP Offload] Move offload registration code to the wrapper

2019-10-10 Thread George Rokos via Phabricator via cfe-commits
grokos added inline comments.



Comment at: clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp:74
+  IntegerType *getSizeTTy() {
+switch (M.getDataLayout().getPointerTypeSize(Type::getInt8PtrTy(C))) {
+case 4u:

sdmitriev wrote:
> ABataev wrote:
> > sdmitriev wrote:
> > > ABataev wrote:
> > > > Same question as before: maybe better to make the size of size_t type a 
> > > > parameter of a tool?
> > > As I remember you also had another suggestion - change size_t to 
> > > intptr_t. That will eliminate the need to an additional parameter for 
> > > size type. Will it be better?
> > In thi case we'll need to change the type in the libomptarget.
> Right. @grokos , do you see any potential problems in changing 
> __tgt_offload_entry::size type from size_t to intptr_t?
As long as `intptr_t` has the same size as `size_t` it should be fine. Of 
course, if this is not the case, then if libomptarget tries to load an older 
image where `sizeof(__tgt_offload_entry::size) != sizeof(intptr_t)` then 
backwards compatibility will have been broken. Fortunately, on all platforms 
supported by released versions of libomptarget so far (x86_64, ppc64, aarch64) 
if I'm not mistaken `sizeof(size_t) == sizeof(initptr_t)`, so I don't think 
we'll break anything.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68746/new/

https://reviews.llvm.org/D68746



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D38798: [OpenMP] Support for implicit "declare target" functions - Sema patch

2018-03-21 Thread George Rokos via Phabricator via cfe-commits
grokos abandoned this revision.
grokos added a comment.

@ABataev came up with a much simpler solution to the implementation of `declare 
target`: https://reviews.llvm.org/rL327636

I am abandoning this obsolete revision.


Repository:
  rL LLVM

https://reviews.llvm.org/D38798



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D43026: [OpenMP] CodeGen for the "declare target" directive - variables, functions, ctors/dtors

2018-03-21 Thread George Rokos via Phabricator via cfe-commits
grokos abandoned this revision.
grokos added a comment.

@ABataev came up with a much simpler solution to the implementation of `declare 
target`: https://reviews.llvm.org/rL327636

I am abandoning this obsolete revision.


Repository:
  rC Clang

https://reviews.llvm.org/D43026



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D43197: [OpenMP] Add flag for linking runtime bitcode library

2018-03-09 Thread George Rokos via Phabricator via cfe-commits
grokos added inline comments.



Comment at: test/Driver/openmp-offload-gpu.c:150
+/// bitcode library and add it to the LIBRARY_PATH.
+// RUN:   touch %T/libomptarget-nvptx-sm_60.bc
+// RUN:   env LIBRARY_PATH=%T %clang -### -fopenmp=libomp 
-fopenmp-targets=nvptx64-nvidia-cuda \

ABataev wrote:
> Create empty `libomptarget-nvptx-sm_60.bc` in `Driver/lib` directory and use 
> it in the test rather create|delete it dynamically.
I'm also in favour of this approach. On some systems /tmp is not accessible and 
the regression test fails.


Repository:
  rC Clang

https://reviews.llvm.org/D43197



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D43197: [OpenMP] Add flag for linking runtime bitcode library

2018-02-12 Thread George Rokos via Phabricator via cfe-commits
grokos accepted this revision.
grokos added a comment.
This revision is now accepted and ready to land.

I don't have any other remarks, looks good.


Repository:
  rC Clang

https://reviews.llvm.org/D43197



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D43197: [OpenMP] Add flag for linking runtime bitcode library

2018-02-12 Thread George Rokos via Phabricator via cfe-commits
grokos added inline comments.



Comment at: lib/Driver/ToolChains/Cuda.cpp:559
+if (!FoundBCLibrary)
+  getDriver().Diag(diag::remark_drv_omp_offload_target_missingbcruntime);
+  }

Should we be more specific when it comes to the name of the missing bc file and 
include the `sm` version? E.g. we may have `libomptarget-nvptx-sm35.bc` in 
`LIBRARY_PATH` but the driver needs `libomptarget-nvptx-sm60.bc`. If the user 
gets a general `missing libomptarget-nvptx.bc` message, it may not be clear 
what the problem is.


Repository:
  rC Clang

https://reviews.llvm.org/D43197



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D43026: [OpenMP] Support for implicit "declare target" functions - CodeGen patch

2018-02-07 Thread George Rokos via Phabricator via cfe-commits
grokos created this revision.
grokos added a reviewer: ABataev.
grokos added projects: clang, OpenMP.
Herald added a subscriber: guansong.

This patch implements CodeGen support for the "declare target" directive.

Code is generated for variables, functions and ctors/dtors.

I understand that the patch as a whole is somewhat large; if this is the case 
and it cannot land in one go then let's discuss how it can be split. Due to 
this uncertainty I haven't included any regression tests, I'll upload them once 
the scope of each patch has been determined.


Repository:
  rC Clang

https://reviews.llvm.org/D43026

Files:
  lib/CodeGen/CGCXXABI.h
  lib/CodeGen/CGDeclCXX.cpp
  lib/CodeGen/CGOpenMPRuntime.cpp
  lib/CodeGen/CGOpenMPRuntime.h
  lib/CodeGen/CodeGenFunction.h
  lib/CodeGen/CodeGenModule.cpp
  lib/CodeGen/ItaniumCXXABI.cpp
  lib/CodeGen/MicrosoftCXXABI.cpp
  lib/Parse/ParseOpenMP.cpp

Index: lib/Parse/ParseOpenMP.cpp
===
--- lib/Parse/ParseOpenMP.cpp
+++ lib/Parse/ParseOpenMP.cpp
@@ -758,6 +758,7 @@
 if (!Actions.ActOnStartOpenMPDeclareTargetDirective(DTLoc))
   return DeclGroupPtrTy();
 
+SmallVector Decls;
 DKind = ParseOpenMPDirectiveKind(*this);
 while (DKind != OMPD_end_declare_target && DKind != OMPD_declare_target &&
Tok.isNot(tok::eof) && Tok.isNot(tok::r_brace)) {
@@ -781,6 +782,12 @@
 else
   TPA.Commit();
   }
+
+  // Save the declarations so that we can create the declare target group
+  // later on.
+  if (Ptr)
+for (auto *V : Ptr.get())
+  Decls.push_back(V);
 }
 
 if (DKind == OMPD_end_declare_target) {
@@ -795,8 +802,17 @@
 } else {
   Diag(Tok, diag::err_expected_end_declare_target);
   Diag(DTLoc, diag::note_matching) << "'#pragma omp declare target'";
+  // We have an error, so we don't have to attempt to generate code for the
+  // declarations.
+  Decls.clear();
 }
 Actions.ActOnFinishOpenMPDeclareTargetDirective();
+
+// If we have decls generate the group so that code can be generated for it
+// later on.
+if (!Decls.empty())
+  return Actions.BuildDeclaratorGroup(Decls);
+
 return DeclGroupPtrTy();
   }
   case OMPD_unknown:
Index: lib/CodeGen/MicrosoftCXXABI.cpp
===
--- lib/CodeGen/MicrosoftCXXABI.cpp
+++ lib/CodeGen/MicrosoftCXXABI.cpp
@@ -387,8 +387,8 @@
   QualType LValType) override;
 
   void EmitGuardedInit(CodeGenFunction , const VarDecl ,
-   llvm::GlobalVariable *DeclPtr,
-   bool PerformInit) override;
+   llvm::GlobalVariable *DeclPtr, bool PerformInit,
+   bool EmitInitOnly, bool EmitDtorOnly) override;
   void registerGlobalDtor(CodeGenFunction , const VarDecl ,
   llvm::Constant *Dtor, llvm::Constant *Addr) override;
 
@@ -2387,15 +2387,17 @@
 
 void MicrosoftCXXABI::EmitGuardedInit(CodeGenFunction , const VarDecl ,
   llvm::GlobalVariable *GV,
-  bool PerformInit) {
+  bool PerformInit, bool EmitInitOnly,
+  bool EmitDtorOnly) {
   // MSVC only uses guards for static locals.
   if (!D.isStaticLocal()) {
 assert(GV->hasWeakLinkage() || GV->hasLinkOnceLinkage());
 // GlobalOpt is allowed to discard the initializer, so use linkonce_odr.
 llvm::Function *F = CGF.CurFn;
 F->setLinkage(llvm::GlobalValue::LinkOnceODRLinkage);
 F->setComdat(CGM.getModule().getOrInsertComdat(F->getName()));
-CGF.EmitCXXGlobalVarDeclInit(D, GV, PerformInit);
+CGF.EmitCXXGlobalVarDeclInit(D, GV, PerformInit, EmitInitOnly,
+ EmitDtorOnly);
 return;
   }
 
@@ -2496,7 +2498,8 @@
 CGF.EmitBlock(InitBlock);
 Builder.CreateStore(Builder.CreateOr(LI, Bit), GuardAddr);
 CGF.EHStack.pushCleanup(EHCleanup, GuardAddr, GuardNum);
-CGF.EmitCXXGlobalVarDeclInit(D, GV, PerformInit);
+CGF.EmitCXXGlobalVarDeclInit(D, GV, PerformInit, EmitInitOnly,
+ EmitDtorOnly);
 CGF.PopCleanupBlock();
 Builder.CreateBr(EndBlock);
 
@@ -2542,7 +2545,8 @@
 // Ok, we ended up getting selected as the initializing thread.
 CGF.EmitBlock(InitBlock);
 CGF.EHStack.pushCleanup(EHCleanup, GuardAddr);
-CGF.EmitCXXGlobalVarDeclInit(D, GV, PerformInit);
+CGF.EmitCXXGlobalVarDeclInit(D, GV, PerformInit, EmitInitOnly,
+ EmitDtorOnly);
 CGF.PopCleanupBlock();
 CGF.EmitNounwindRuntimeCall(getInitThreadFooterFn(CGM),
 GuardAddr.getPointer());
Index: lib/CodeGen/ItaniumCXXABI.cpp
===
--- lib/CodeGen/ItaniumCXXABI.cpp
+++ 

[PATCH] D40977: [OpenMP] NVPTX: Set default/minimum compute capability to sm_35

2017-12-07 Thread George Rokos via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rC320082: [OpenMP] NVPTX: Set default/minimum compute 
capability to sm_35 (authored by grokos).

Repository:
  rC Clang

https://reviews.llvm.org/D40977

Files:
  CMakeLists.txt


Index: CMakeLists.txt
===
--- CMakeLists.txt
+++ CMakeLists.txt
@@ -241,14 +241,15 @@
 set(CLANG_DEFAULT_OPENMP_RUNTIME "libomp" CACHE STRING
   "Default OpenMP runtime used by -fopenmp.")
 
-# OpenMP offloading requires at least sm_30 because we use shuffle instructions
-# to generate efficient code for reductions.
-set(CLANG_OPENMP_NVPTX_DEFAULT_ARCH "sm_30" CACHE STRING
+# OpenMP offloading requires at least sm_35 because we use shuffle instructions
+# to generate efficient code for reductions and the atomicMax instruction on
+# 64-bit integers in the implementation of conditional lastprivate.
+set(CLANG_OPENMP_NVPTX_DEFAULT_ARCH "sm_35" CACHE STRING
   "Default architecture for OpenMP offloading to Nvidia GPUs.")
 string(REGEX MATCH "^sm_([0-9]+)$" MATCHED_ARCH 
"${CLANG_OPENMP_NVPTX_DEFAULT_ARCH}")
-if (NOT DEFINED MATCHED_ARCH OR "${CMAKE_MATCH_1}" LESS 30)
-  message(WARNING "Resetting default architecture for OpenMP offloading to 
Nvidia GPUs to sm_30")
-  set(CLANG_OPENMP_NVPTX_DEFAULT_ARCH "sm_30" CACHE STRING
+if (NOT DEFINED MATCHED_ARCH OR "${CMAKE_MATCH_1}" LESS 35)
+  message(WARNING "Resetting default architecture for OpenMP offloading to 
Nvidia GPUs to sm_35")
+  set(CLANG_OPENMP_NVPTX_DEFAULT_ARCH "sm_35" CACHE STRING
 "Default architecture for OpenMP offloading to Nvidia GPUs." FORCE)
 endif()
 


Index: CMakeLists.txt
===
--- CMakeLists.txt
+++ CMakeLists.txt
@@ -241,14 +241,15 @@
 set(CLANG_DEFAULT_OPENMP_RUNTIME "libomp" CACHE STRING
   "Default OpenMP runtime used by -fopenmp.")
 
-# OpenMP offloading requires at least sm_30 because we use shuffle instructions
-# to generate efficient code for reductions.
-set(CLANG_OPENMP_NVPTX_DEFAULT_ARCH "sm_30" CACHE STRING
+# OpenMP offloading requires at least sm_35 because we use shuffle instructions
+# to generate efficient code for reductions and the atomicMax instruction on
+# 64-bit integers in the implementation of conditional lastprivate.
+set(CLANG_OPENMP_NVPTX_DEFAULT_ARCH "sm_35" CACHE STRING
   "Default architecture for OpenMP offloading to Nvidia GPUs.")
 string(REGEX MATCH "^sm_([0-9]+)$" MATCHED_ARCH "${CLANG_OPENMP_NVPTX_DEFAULT_ARCH}")
-if (NOT DEFINED MATCHED_ARCH OR "${CMAKE_MATCH_1}" LESS 30)
-  message(WARNING "Resetting default architecture for OpenMP offloading to Nvidia GPUs to sm_30")
-  set(CLANG_OPENMP_NVPTX_DEFAULT_ARCH "sm_30" CACHE STRING
+if (NOT DEFINED MATCHED_ARCH OR "${CMAKE_MATCH_1}" LESS 35)
+  message(WARNING "Resetting default architecture for OpenMP offloading to Nvidia GPUs to sm_35")
+  set(CLANG_OPENMP_NVPTX_DEFAULT_ARCH "sm_35" CACHE STRING
 "Default architecture for OpenMP offloading to Nvidia GPUs." FORCE)
 endif()
 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D40977: [OpenMP] NVPTX: Set default/minimum compute capability to sm_35

2017-12-07 Thread George Rokos via Phabricator via cfe-commits
grokos created this revision.
grokos added a project: OpenMP.
Herald added a subscriber: mgorny.

The current implementation of the nvptx runtime (to be upstreamed shortly) uses 
the `atomicMax` operation on 64-bit integers. This is only supported in compute 
capabilities 3.5 and later. I've changed the clang default to `sm_35`.


Repository:
  rC Clang

https://reviews.llvm.org/D40977

Files:
  CMakeLists.txt


Index: CMakeLists.txt
===
--- CMakeLists.txt
+++ CMakeLists.txt
@@ -241,14 +241,15 @@
 set(CLANG_DEFAULT_OPENMP_RUNTIME "libomp" CACHE STRING
   "Default OpenMP runtime used by -fopenmp.")
 
-# OpenMP offloading requires at least sm_30 because we use shuffle instructions
-# to generate efficient code for reductions.
-set(CLANG_OPENMP_NVPTX_DEFAULT_ARCH "sm_30" CACHE STRING
+# OpenMP offloading requires at least sm_35 because we use shuffle instructions
+# to generate efficient code for reductions and the atomicMax instruction on
+# 64-bit integers in the implementation of conditional lastprivate.
+set(CLANG_OPENMP_NVPTX_DEFAULT_ARCH "sm_35" CACHE STRING
   "Default architecture for OpenMP offloading to Nvidia GPUs.")
 string(REGEX MATCH "^sm_([0-9]+)$" MATCHED_ARCH 
"${CLANG_OPENMP_NVPTX_DEFAULT_ARCH}")
-if (NOT DEFINED MATCHED_ARCH OR "${CMAKE_MATCH_1}" LESS 30)
-  message(WARNING "Resetting default architecture for OpenMP offloading to 
Nvidia GPUs to sm_30")
-  set(CLANG_OPENMP_NVPTX_DEFAULT_ARCH "sm_30" CACHE STRING
+if (NOT DEFINED MATCHED_ARCH OR "${CMAKE_MATCH_1}" LESS 35)
+  message(WARNING "Resetting default architecture for OpenMP offloading to 
Nvidia GPUs to sm_35")
+  set(CLANG_OPENMP_NVPTX_DEFAULT_ARCH "sm_35" CACHE STRING
 "Default architecture for OpenMP offloading to Nvidia GPUs." FORCE)
 endif()
 


Index: CMakeLists.txt
===
--- CMakeLists.txt
+++ CMakeLists.txt
@@ -241,14 +241,15 @@
 set(CLANG_DEFAULT_OPENMP_RUNTIME "libomp" CACHE STRING
   "Default OpenMP runtime used by -fopenmp.")
 
-# OpenMP offloading requires at least sm_30 because we use shuffle instructions
-# to generate efficient code for reductions.
-set(CLANG_OPENMP_NVPTX_DEFAULT_ARCH "sm_30" CACHE STRING
+# OpenMP offloading requires at least sm_35 because we use shuffle instructions
+# to generate efficient code for reductions and the atomicMax instruction on
+# 64-bit integers in the implementation of conditional lastprivate.
+set(CLANG_OPENMP_NVPTX_DEFAULT_ARCH "sm_35" CACHE STRING
   "Default architecture for OpenMP offloading to Nvidia GPUs.")
 string(REGEX MATCH "^sm_([0-9]+)$" MATCHED_ARCH "${CLANG_OPENMP_NVPTX_DEFAULT_ARCH}")
-if (NOT DEFINED MATCHED_ARCH OR "${CMAKE_MATCH_1}" LESS 30)
-  message(WARNING "Resetting default architecture for OpenMP offloading to Nvidia GPUs to sm_30")
-  set(CLANG_OPENMP_NVPTX_DEFAULT_ARCH "sm_30" CACHE STRING
+if (NOT DEFINED MATCHED_ARCH OR "${CMAKE_MATCH_1}" LESS 35)
+  message(WARNING "Resetting default architecture for OpenMP offloading to Nvidia GPUs to sm_35")
+  set(CLANG_OPENMP_NVPTX_DEFAULT_ARCH "sm_35" CACHE STRING
 "Default architecture for OpenMP offloading to Nvidia GPUs." FORCE)
 endif()
 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D39745: Clang/libomptarget map interface flag renaming - NFC patch

2017-11-07 Thread George Rokos via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL317598: Clang/libomptarget map interface flag renaming - NFC 
patch (authored by grokos).

Changed prior to commit:
  https://reviews.llvm.org/D39745?vs=121928=121931#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D39745

Files:
  cfe/trunk/lib/CodeGen/CGOpenMPRuntime.cpp

Index: cfe/trunk/lib/CodeGen/CGOpenMPRuntime.cpp
===
--- cfe/trunk/lib/CodeGen/CGOpenMPRuntime.cpp
+++ cfe/trunk/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -5978,22 +5978,21 @@
 /// \brief Delete the element from the device environment, ignoring the
 /// current reference count associated with the element.
 OMP_MAP_DELETE = 0x08,
-/// \brief The element being mapped is a pointer, therefore the pointee
-/// should be mapped as well.
-OMP_MAP_IS_PTR = 0x10,
-/// \brief This flags signals that an argument is the first one relating to
-/// a map/private clause expression. For some cases a single
-/// map/privatization results in multiple arguments passed to the runtime
-/// library.
-OMP_MAP_FIRST_REF = 0x20,
+/// \brief The element being mapped is a pointer-pointee pair; both the
+/// pointer and the pointee should be mapped.
+OMP_MAP_PTR_AND_OBJ = 0x10,
+/// \brief This flags signals that the base address of an entry should be
+/// passed to the target kernel as an argument.
+OMP_MAP_TARGET_PARAM = 0x20,
 /// \brief Signal that the runtime library has to return the device pointer
-/// in the current position for the data being mapped.
-OMP_MAP_RETURN_PTR = 0x40,
+/// in the current position for the data being mapped. Used when we have the
+/// use_device_ptr clause.
+OMP_MAP_RETURN_PARAM = 0x40,
 /// \brief This flag signals that the reference being passed is a pointer to
 /// private data.
-OMP_MAP_PRIVATE_PTR = 0x80,
+OMP_MAP_PRIVATE = 0x80,
 /// \brief Pass the element to the device by value.
-OMP_MAP_PRIVATE_VAL = 0x100,
+OMP_MAP_LITERAL = 0x100,
 /// Implicit map
 OMP_MAP_IMPLICIT = 0x200,
   };
@@ -6084,7 +6083,7 @@
   /// expression.
   unsigned getMapTypeBits(OpenMPMapClauseKind MapType,
   OpenMPMapClauseKind MapTypeModifier, bool AddPtrFlag,
-  bool AddIsFirstFlag) const {
+  bool AddIsTargetParamFlag) const {
 unsigned Bits = 0u;
 switch (MapType) {
 case OMPC_MAP_alloc:
@@ -6111,9 +6110,9 @@
   break;
 }
 if (AddPtrFlag)
-  Bits |= OMP_MAP_IS_PTR;
-if (AddIsFirstFlag)
-  Bits |= OMP_MAP_FIRST_REF;
+  Bits |= OMP_MAP_PTR_AND_OBJ;
+if (AddIsTargetParamFlag)
+  Bits |= OMP_MAP_TARGET_PARAM;
 if (MapTypeModifier == OMPC_MAP_always)
   Bits |= OMP_MAP_ALWAYS;
 return Bits;
@@ -6220,28 +6219,28 @@
 //
 // map(s.p[:22], s.a s.b)
 // , &(s.p), sizeof(double*), noflags
-// &(s.p), &(s.p[0]), 22*sizeof(double), ptr_flag + extra_flag
+// &(s.p), &(s.p[0]), 22*sizeof(double), ptr_flag
 //
 // map(s.ps)
 // , &(s.ps), sizeof(S2*), noflags
 //
 // map(s.ps->s.i)
 // , &(s.ps), sizeof(S2*), noflags
-// &(s.ps), &(s.ps->s.i), sizeof(int), ptr_flag + extra_flag
+// &(s.ps), &(s.ps->s.i), sizeof(int), ptr_flag
 //
 // map(s.ps->ps)
 // , &(s.ps), sizeof(S2*), noflags
-// &(s.ps), &(s.ps->ps), sizeof(S2*), ptr_flag + extra_flag
+// &(s.ps), &(s.ps->ps), sizeof(S2*), ptr_flag
 //
 // map(s.ps->ps->ps)
 // , &(s.ps), sizeof(S2*), noflags
-// &(s.ps), &(s.ps->ps), sizeof(S2*), ptr_flag + extra_flag
-// &(s.ps->ps), &(s.ps->ps->ps), sizeof(S2*), ptr_flag + extra_flag
+// &(s.ps), &(s.ps->ps), sizeof(S2*), ptr_flag
+// &(s.ps->ps), &(s.ps->ps->ps), sizeof(S2*), ptr_flag
 //
 // map(s.ps->ps->s.f[:22])
 // , &(s.ps), sizeof(S2*), noflags
-// &(s.ps), &(s.ps->ps), sizeof(S2*), ptr_flag + extra_flag
-// &(s.ps->ps), &(s.ps->ps->s.f[0]), 22*sizeof(float), ptr_flag + extra_flag
+// &(s.ps), &(s.ps->ps), sizeof(S2*), ptr_flag
+// &(s.ps->ps), &(s.ps->ps->s.f[0]), 22*sizeof(float), ptr_flag
 //
 // map(ps)
 // , , sizeof(S2*), noflags
@@ -6257,29 +6256,28 @@
 //
 // map(ps->p[:22])
 // ps, &(ps->p), sizeof(double*), noflags
-// &(ps->p), &(ps->p[0]), 22*sizeof(double), ptr_flag + extra_flag
+// &(ps->p), &(ps->p[0]), 22*sizeof(double), ptr_flag
 //
 // map(ps->ps)
 // ps, &(ps->ps), sizeof(S2*), noflags
 //
 // map(ps->ps->s.i)
 // ps, &(ps->ps), sizeof(S2*), noflags
-// &(ps->ps), &(ps->ps->s.i), sizeof(int), ptr_flag + extra_flag
+// &(ps->ps), &(ps->ps->s.i), sizeof(int), ptr_flag
 //
 // map(ps->ps->ps)
 // ps, &(ps->ps), sizeof(S2*), noflags
-// &(ps->ps), &(ps->ps->ps), sizeof(S2*), ptr_flag + extra_flag
+// &(ps->ps), &(ps->ps->ps), 

[PATCH] D38968: [OpenMP] Implement omp_is_initial_device() as builtin

2017-10-16 Thread George Rokos via Phabricator via cfe-commits
grokos added a comment.

Now that this issue has been addressed and regressions tests pass, should we 
re-enable Cmake to build libomptarget by default?


https://reviews.llvm.org/D38968



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D38798: [OpenMP] Support for implicit "declare target" functions - Sema patch

2017-10-11 Thread George Rokos via Phabricator via cfe-commits
grokos created this revision.
grokos added a project: clang.

This patch completes the support for the "declare target" directive in Sema. 
With this patch Sema handles implicitly used functions (i.e. functions which 
are used inside a target region without having been "declared target") 
including lambdas, templated functions, functions called from within target 
functions and ctors/dtors.

By default, use of implicit declare target functions is enabled. An upcoming 
driver patch will change that.


Repository:
  rL LLVM

https://reviews.llvm.org/D38798

Files:
  include/clang/Basic/LangOptions.def
  include/clang/Sema/Sema.h
  include/clang/Sema/SemaInternal.h
  lib/Parse/ParseOpenMP.cpp
  lib/Sema/SemaDecl.cpp
  lib/Sema/SemaOpenMP.cpp

Index: lib/Sema/SemaOpenMP.cpp
===
--- lib/Sema/SemaOpenMP.cpp
+++ lib/Sema/SemaOpenMP.cpp
@@ -19,6 +19,7 @@
 #include "clang/AST/Decl.h"
 #include "clang/AST/DeclCXX.h"
 #include "clang/AST/DeclOpenMP.h"
+#include "clang/AST/RecursiveASTVisitor.h"
 #include "clang/AST/StmtCXX.h"
 #include "clang/AST/StmtOpenMP.h"
 #include "clang/AST/StmtVisitor.h"
@@ -1139,6 +1140,124 @@
   return false;
 }
 
+namespace {
+/// Visit actual function body and its associated nested functions bodies.
+class ImplicitDeviceFunctionChecker
+: public RecursiveASTVisitor {
+  Sema 
+
+public:
+  ImplicitDeviceFunctionChecker(Sema ) : SemaRef(SemaReference){};
+
+  /// Traverse body of lambda, and mark it the with OMPDeclareTargetDeclAttr
+  bool TraverseLambdaCapture(LambdaExpr *LE, const LambdaCapture *C,
+ Expr *Init);
+
+  /// Traverse FunctionDecl and mark it the with OMPDeclareTargetDeclAttr
+  bool VisitFunctionDecl(FunctionDecl *F);
+
+  /// Traverse Callee of Calexpr and mark it the with OMPDeclareTargetDeclAttr
+  bool VisitCallExpr(CallExpr *Call);
+
+  /// Traverse Constructs and mark it the with OMPDeclareTargetDeclAttr
+  bool VisitCXXConstructExpr(CXXConstructExpr *E);
+
+  /// Traverse Destructor and mark it the with OMPDeclareTargetDeclAttr
+  bool VisitCXXDestructorDecl(CXXDestructorDecl *D);
+};
+}
+
+/// Traverse declaration of /param D to check whether it has
+/// OMPDeclareTargetDeclAttr or not. If so, it marks definition with
+/// OMPDeclareTargetDeclAttr.
+static void ImplicitDeclareTargetCheck(Sema , Decl *D) {
+  if (SemaRef.getLangOpts().OpenMPImplicitDeclareTarget) {
+// Structured block of target region is visited to catch function call.
+// Revealed function calls are marked with OMPDeclareTargetDeclAttr
+// attribute,
+// in case -fopenmp-implicit-declare-target extension is enabled.
+ImplicitDeviceFunctionChecker FunctionCallChecker(SemaRef);
+FunctionCallChecker.TraverseDecl(D);
+  }
+}
+
+/// Traverse declaration of /param D to check whether it has
+/// OMPDeclareTargetDeclAttr or not. If so, it marks definition with
+/// OMPDeclareTargetDeclAttr.
+void Sema::checkDeclImplicitlyUsedOpenMPTargetContext(Decl *D) {
+  if (!D || D->isInvalidDecl())
+return;
+
+  if (FunctionDecl *FD = dyn_cast(D)) {
+if (FD->hasBody()) {
+  for (auto RI : FD->redecls()) {
+if (RI->hasAttr()) {
+  Attr *A = OMPDeclareTargetDeclAttr::CreateImplicit(
+  Context, OMPDeclareTargetDeclAttr::MT_To);
+  D->addAttr(A);
+
+  ImplicitDeclareTargetCheck(*this, FD);
+  return;
+}
+  }
+}
+  }
+  return;
+}
+
+bool ImplicitDeviceFunctionChecker::TraverseLambdaCapture(
+LambdaExpr *LE, const LambdaCapture *C, Expr *Init) {
+  if (CXXRecordDecl *Class = LE->getLambdaClass())
+if (!Class->hasAttr()) {
+  Attr *A = OMPDeclareTargetDeclAttr::CreateImplicit(
+  SemaRef.Context, OMPDeclareTargetDeclAttr::MT_To);
+  Class->addAttr(A);
+}
+
+  TraverseStmt(LE->getBody());
+  return true;
+}
+
+bool ImplicitDeviceFunctionChecker::VisitFunctionDecl(FunctionDecl *F) {
+  assert(F);
+  if (!F->hasAttr()) {
+Attr *A = OMPDeclareTargetDeclAttr::CreateImplicit(
+SemaRef.Context, OMPDeclareTargetDeclAttr::MT_To);
+F->addAttr(A);
+TraverseDecl(F);
+  }
+  return true;
+}
+
+bool ImplicitDeviceFunctionChecker::VisitCallExpr(CallExpr *Call) {
+  if (FunctionDecl *Callee = Call->getDirectCallee()) {
+return VisitFunctionDecl(Callee);
+  }
+  return true;
+}
+
+bool ImplicitDeviceFunctionChecker::VisitCXXConstructExpr(CXXConstructExpr *E) {
+  CXXConstructorDecl *Constructor = E->getConstructor();
+  // When constructor is invoked, it is checked whether the object has
+  // destructor or not. In case it has destructor, destructor is automatically
+  // marked with declare target attribute since it is needed to emit for device,
+  QualType Ty = E->getType();
+  const RecordType *RT =
+  SemaRef.Context.getBaseElementType(Ty)->getAs();
+  CXXRecordDecl *RD = cast(RT->getDecl());
+
+  if (auto *Destructor = RD->getDestructor())
+VisitCXXDestructorDecl(Destructor);

[PATCH] D33509: [OpenMP] Create COMDAT group for OpenMP offload registration code to avoid multiple copies

2017-05-26 Thread George Rokos via Phabricator via cfe-commits
grokos added a comment.

I committed the patch. Thanks for submitting it!


Repository:
  rL LLVM

https://reviews.llvm.org/D33509



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33509: [OpenMP] Create COMDAT group for OpenMP offload registration code to avoid multiple copies

2017-05-26 Thread George Rokos via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL304056: [OpenMP] Create COMDAT group for OpenMP offload 
registration code to avoid… (authored by grokos).

Changed prior to commit:
  https://reviews.llvm.org/D33509?vs=100134=100518#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D33509

Files:
  cfe/trunk/lib/CodeGen/CGOpenMPRuntime.cpp
  cfe/trunk/lib/CodeGen/CodeGenModule.cpp
  cfe/trunk/test/OpenMP/target_codegen.cpp
  cfe/trunk/test/OpenMP/target_codegen_registration.cpp
  cfe/trunk/test/OpenMP/target_parallel_codegen.cpp
  cfe/trunk/test/OpenMP/target_parallel_codegen_registration.cpp
  cfe/trunk/test/OpenMP/target_parallel_if_codegen.cpp
  cfe/trunk/test/OpenMP/target_parallel_num_threads_codegen.cpp
  cfe/trunk/test/OpenMP/target_teams_codegen.cpp
  cfe/trunk/test/OpenMP/target_teams_codegen_registration.cpp
  cfe/trunk/test/OpenMP/target_teams_num_teams_codegen.cpp
  cfe/trunk/test/OpenMP/target_teams_thread_limit_codegen.cpp

Index: cfe/trunk/test/OpenMP/target_codegen_registration.cpp
===
--- cfe/trunk/test/OpenMP/target_codegen_registration.cpp
+++ cfe/trunk/test/OpenMP/target_codegen_registration.cpp
@@ -36,6 +36,8 @@
 
 // TCHECK:[[ENTTY:%.+]] = type { i8*, i8*, i[[SZ:32|64]], i32, i32 }
 
+// CHECK-DAG: $[[REGFN:\.omp_offloading\..+]] = comdat
+
 // CHECK-DAG: [[A1:@.+]] = internal global [[SA]]
 // CHECK-DAG: [[A2:@.+]] = global [[SA]]
 // CHECK-DAG: [[B1:@.+]] = global [[SB]]
@@ -153,15 +155,15 @@
 // CHECK: [[ENTEND:@.+]] = external constant [[ENTTY]]
 // CHECK: [[DEVBEGIN:@.+]] = external constant i8
 // CHECK: [[DEVEND:@.+]] = external constant i8
-// CHECK: [[IMAGES:@.+]] = internal unnamed_addr constant [1 x [[DEVTY]]] [{{.+}} { i8* [[DEVBEGIN]], i8* [[DEVEND]], [[ENTTY]]* [[ENTBEGIN]], [[ENTTY]]* [[ENTEND]] }]
-// CHECK: [[DESC:@.+]] = internal constant [[DSCTY]] { i32 1, [[DEVTY]]* getelementptr inbounds ([1 x [[DEVTY]]], [1 x [[DEVTY]]]* [[IMAGES]], i32 0, i32 0), [[ENTTY]]* [[ENTBEGIN]], [[ENTTY]]* [[ENTEND]] }
+// CHECK: [[IMAGES:@.+]] = internal unnamed_addr constant [1 x [[DEVTY]]] [{{.+}} { i8* [[DEVBEGIN]], i8* [[DEVEND]], [[ENTTY]]* [[ENTBEGIN]], [[ENTTY]]* [[ENTEND]] }], comdat($[[REGFN]])
+// CHECK: [[DESC:@.+]] = internal constant [[DSCTY]] { i32 1, [[DEVTY]]* getelementptr inbounds ([1 x [[DEVTY]]], [1 x [[DEVTY]]]* [[IMAGES]], i32 0, i32 0), [[ENTTY]]* [[ENTBEGIN]], [[ENTTY]]* [[ENTEND]] }, comdat($[[REGFN]])
 
 // We have 4 initializers, one for the 500 priority, another one for 501, or more for the default priority, and the last one for the offloading registration function.
 // CHECK: @llvm.global_ctors = appending global [4 x { i32, void ()*, i8* }] [
 // CHECK-SAME: { i32, void ()*, i8* } { i32 500, void ()* [[P500:@[^,]+]], i8* null },
 // CHECK-SAME: { i32, void ()*, i8* } { i32 501, void ()* [[P501:@[^,]+]], i8* null },
 // CHECK-SAME: { i32, void ()*, i8* } { i32 65535, void ()* [[PMAX:@[^,]+]], i8* null },
-// CHECK-SAME: { i32, void ()*, i8* } { i32 0, void ()* bitcast (void (i8*)* [[REGFN:@.+]] to void ()*), i8* null }]
+// CHECK-SAME: { i32, void ()*, i8* } { i32 0, void ()* bitcast (void (i8*)* @[[REGFN]] to void ()*), i8* bitcast (void (i8*)* @[[REGFN]] to i8*) }]
 
 // CHECK-NTARGET: @llvm.global_ctors = appending global [3   x { i32, void ()*, i8* }] [
 
@@ -364,14 +366,16 @@
 
 // Check registration and unregistration
 
-//CHECK: define internal void [[UNREGFN:@.+]](i8*)
+//CHECK: define internal void @[[UNREGFN:.+]](i8*)
+//CHECK-SAME: comdat($[[REGFN]]) {
 //CHECK: call i32 @__tgt_unregister_lib([[DSCTY]]* [[DESC]])
 //CHECK: ret void
 //CHECK: declare i32 @__tgt_unregister_lib([[DSCTY]]*)
 
-//CHECK: define internal void [[REGFN]](i8*)
+//CHECK: define linkonce hidden void @[[REGFN]](i8*)
+//CHECK-SAME: comdat {
 //CHECK: call i32 @__tgt_register_lib([[DSCTY]]* [[DESC]])
-//CHECK: call i32 @__cxa_atexit(void (i8*)* [[UNREGFN]], i8* bitcast ([[DSCTY]]* [[DESC]] to i8*),
+//CHECK: call i32 @__cxa_atexit(void (i8*)* @[[UNREGFN]], i8* bitcast ([[DSCTY]]* [[DESC]] to i8*),
 //CHECK: ret void
 //CHECK: declare i32 @__tgt_register_lib([[DSCTY]]*)
 
@@ -407,31 +411,31 @@
 
 // Check metadata is properly generated:
 // CHECK: !omp_offload.info = !{!{{[0-9]+}}, !{{[0-9]+}}, !{{[0-9]+}}, !{{[0-9]+}}, !{{[0-9]+}}, !{{[0-9]+}}, !{{[0-9]+}}, !{{[0-9]+}}, !{{[0-9]+}}, !{{[0-9]+}}, !{{[0-9]+}}, !{{[0-9]+}}}
-// CHECK-DAG: = !{i32 0, i32 [[DEVID:-?[0-9]+]], i32 [[FILEID:-?[0-9]+]], !"_ZN2SB3fooEv", i32 193, i32 {{[0-9]+}}}
-// CHECK-DAG: = !{i32 0, i32 [[DEVID]], i32 [[FILEID]], !"_ZN2SDD1Ev", i32 243, i32 {{[0-9]+}}}
-// CHECK-DAG: = !{i32 0, i32 [[DEVID]], i32 [[FILEID]], !"_ZN2SEC1Ev", i32 259, i32 {{[0-9]+}}}
-// CHECK-DAG: = !{i32 0, i32 [[DEVID]], i32 [[FILEID]], !"_ZN2SED1Ev", i32 265, i32 {{[0-9]+}}}
-// CHECK-DAG: = !{i32 0, i32 [[DEVID]], i32 [[FILEID]], !"_ZN2STILi1000EE3fooEv", i32 276,