Re: OpenMP offloading vs. C++ static local variables

2023-12-23 Thread Thomas Schwinge
Hi!

On 2023-12-21T13:58:23+0100, Jakub Jelinek  wrote:
> On Thu, Dec 21, 2023 at 01:31:19PM +0100, Thomas Schwinge wrote:
>> [...] the gimplification-level code re
>> 'Static locals [...] need to be "omp declare target"' runs *after*
>> 'omp_discover_implicit_declare_target'.  Thus my "move" idea above.
>
> Can't we mark the static locals already during that discovery?

Well, that's precisely what I had tried to communicate, earlier on.  ;-)

I'll work on that, as a refactoring, after I've gotten the current
implementation idea working.

> The addition during gimplification was probably made when we didn't have
> that at all.


>> OK to push, for a start, the attached
>> "GCN, nvptx: Basic '__cxa_guard_{acquire,abort,release}' for C++ static 
>> local variables support"?
>> That's now in libgcc not libgomp, so that it's also usable for GCN, nvptx
>> target testing, where we thus see a number of FAIL -> PASS progressions.
>
>> For now, for single-threaded GCN, nvptx target use only; extension for
>> multi-threaded offloading use to follow later.
>>
>>  libgcc/
>>  * c++-minimal/README: New.
>>  * c++-minimal/guard.c: New.
>>  * config/gcn/t-amdgcn (LIB2ADD): Add it.
>>  * config/nvptx/t-nvptx (LIB2ADD): Likewise.
>
>> +/* Copy'n'paste/edit from 'libstdc++-v3/libsupc++/cxxabi.h'.  */
>> +
>> +  int
>> +  __cxa_guard_acquire(__guard*);
>> +
>> +  void
>> +  __cxa_guard_release(__guard*);
>> +
>> +  void
>> +  __cxa_guard_abort(__guard*);
>
> When all this isn't inside a namespace, shouldn't it be indented by
> 2 spaces less?
>
>> +
>> +/* Copy'n'paste/edit from 'libstdc++-v3/libsupc++/guard.cc'.  */
>> +
>> +# undef _GLIBCXX_GUARD_TEST_AND_ACQUIRE
>> +# undef _GLIBCXX_GUARD_SET_AND_RELEASE
>> +# define _GLIBCXX_GUARD_SET_AND_RELEASE(G) _GLIBCXX_GUARD_SET (G)
>
> And without a space after # here?

Well, those were just un-edited copy'n'pastes from the original files;
now indentation/space-corrected for viewing pleasure.

> Otherwise LGTM, but hope that one day we'll get rid of it again.

Yep.

Pushed to master branch commit c0bf7ea189ecf252152fe15134f70f576bcd20b2
"GCN, nvptx: Basic '__cxa_guard_{acquire,abort,release}' for C++ static local 
variables support",
see attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From c0bf7ea189ecf252152fe15134f70f576bcd20b2 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 20 Dec 2023 12:27:48 +0100
Subject: [PATCH] GCN, nvptx: Basic '__cxa_guard_{acquire,abort,release}' for
 C++ static local variables support

For now, for single-threaded GCN, nvptx target use only; extension for
multi-threaded offloading use is to follow later.  Eventually switch to
libstdc++-v3/libsupc++ proper.

	libgcc/
	* c++-minimal/README: New.
	* c++-minimal/guard.c: New.
	* config/gcn/t-amdgcn (LIB2ADD): Add it.
	* config/nvptx/t-nvptx (LIB2ADD): Likewise.
---
 libgcc/c++-minimal/README   |  2 +
 libgcc/c++-minimal/guard.c  | 97 +
 libgcc/config/gcn/t-amdgcn  |  3 ++
 libgcc/config/nvptx/t-nvptx |  3 ++
 4 files changed, 105 insertions(+)
 create mode 100644 libgcc/c++-minimal/README
 create mode 100644 libgcc/c++-minimal/guard.c

diff --git a/libgcc/c++-minimal/README b/libgcc/c++-minimal/README
new file mode 100644
index 000..832f1265f7e
--- /dev/null
+++ b/libgcc/c++-minimal/README
@@ -0,0 +1,2 @@
+Minimal hacked-up version of some C++ support for offload devices, until we
+have libstdc++-v3/libsupc++ proper.
diff --git a/libgcc/c++-minimal/guard.c b/libgcc/c++-minimal/guard.c
new file mode 100644
index 000..e9937b07a62
--- /dev/null
+++ b/libgcc/c++-minimal/guard.c
@@ -0,0 +1,97 @@
+/* 'libstdc++-v3/libsupc++/guard.cc' for offload devices, until we have
+   libstdc++-v3/libsupc++ proper.
+
+   Copyright (C) 2002-2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+.  */
+
+#if defined __AMDGCN__

Re: OpenMP offloading vs. C++ static local variables

2023-12-21 Thread Jakub Jelinek
On Thu, Dec 21, 2023 at 01:31:19PM +0100, Thomas Schwinge wrote:
> These three: implicitly, or explicit '#pragma omp declare target' etc.,
> or inside '#pragma omp begin declare target' region are the only OpenMP
> facilities to get things 'omp declare target'ed, right?

I think so.
> That doesn't generally work, as the gimplification-level code re
> 'Static locals [...] need to be "omp declare target"' runs *after*
> 'omp_discover_implicit_declare_target'.  Thus my "move" idea above.

Can't we mark the static locals already during that discovery?
The addition during gimplification was probably made when we didn't have
that at all.

> OK to push, for a start, the attached
> "GCN, nvptx: Basic '__cxa_guard_{acquire,abort,release}' for C++ static local 
> variables support"?
> That's now in libgcc not libgomp, so that it's also usable for GCN, nvptx
> target testing, where we thus see a number of FAIL -> PASS progressions.

> For now, for single-threaded GCN, nvptx target use only; extension for
> multi-threaded offloading use to follow later.
> 
>   libgcc/
>   * c++-minimal/README: New.
>   * c++-minimal/guard.c: New.
>   * config/gcn/t-amdgcn (LIB2ADD): Add it.
>   * config/nvptx/t-nvptx (LIB2ADD): Likewise.

> +/* Copy'n'paste/edit from 'libstdc++-v3/libsupc++/cxxabi.h'.  */
> +
> +  int
> +  __cxa_guard_acquire(__guard*);
> +
> +  void
> +  __cxa_guard_release(__guard*);
> +
> +  void
> +  __cxa_guard_abort(__guard*);

When all this isn't inside a namespace, shouldn't it be indented by
2 spaces less?

> +
> +/* Copy'n'paste/edit from 'libstdc++-v3/libsupc++/guard.cc'.  */
> +
> +# undef _GLIBCXX_GUARD_TEST_AND_ACQUIRE
> +# undef _GLIBCXX_GUARD_SET_AND_RELEASE
> +# define _GLIBCXX_GUARD_SET_AND_RELEASE(G) _GLIBCXX_GUARD_SET (G)

And without a space after # here?

Otherwise LGTM, but hope that one day we'll get rid of it again.

Jakub



Re: OpenMP offloading vs. C++ static local variables

2023-12-21 Thread Thomas Schwinge
Hi Jakub!

On 2023-12-07T16:33:08+0100, Jakub Jelinek  wrote:
> On Thu, Dec 07, 2023 at 04:09:04PM +0100, Thomas Schwinge wrote:
>> > Yeah, I believe we should in the omp_discover_* sub-pass handle with
>> > a help of a langhook automatically mark the guard variables (possibly
>> > iff the guarded variable is marked?),
>>
>> Looking at 'gcc/omp-offload.cc:omp_discover_implicit_declare_target' left
>> me confused how that would be the code that marks up 'static' variables
>> as implicit 'omp declare target'.  Working through a simple POD example
>> (say, 's%static S s%static int i') it turns out, indeed that's not where
>> that is happending, but instead 'gcc/gimplify.cc:gimplify_bind_expr' is
>> the place:
>
> Sure, that is for the case where those local statics should be marked
> implicitly because they appear in a target function.
> They can be also marked explicitly by the user through
> #pragma omp declare target enter (name_of_static_var)
> or
> [[omp::decl (declare target)]] attribute on it etc.

These three: implicitly, or explicit '#pragma omp declare target' etc.,
or inside '#pragma omp begin declare target' region are the only OpenMP
facilities to get things 'omp declare target'ed, right?

>> That said...  Couldn't we indeed move this gimplification-level code re
>> 'Static locals [...] need to be "omp declare target"' into
>> 'gcc/omp-offload.cc:omp_discover_implicit_declare_target'?
>
> The omp-offload.cc discovery stuff was added for stuff where the OpenMP
> standard says something is implicitly declare target because there is
> some use of it satisfying some rule.
> Like, calls to functions defined in current compilation unit referenced in
> target region or something similar, or such calls referenced in declare
> target static var initializers.
> So, that feels to me like the right spot to handle the guards as well.
> Of course, the middle-end doesn't know about C++ FE's get_guard variable,
> so it should be some new language hook which would take care of it.
> The omp_discover_declare* functions can add further VAR_DECLs to the
> worklist, so I'd probably call the new language hook in the
> omp_discover_implicit_declare_target last loop.
> Or maybe even better just handle that in the
> cxx_omp_finish_decl_inits hook.  You can just
>   FOR_EACH_VARIABLE (vnode)
> if (DECL_FUNCTION_SCOPE_P (vnode->decl)
>   && omp_declare_target_var_p (vnode->decl))
>   {
>   tree sname = mangle_guard_variable (decl);
>   tree guard = get_global_binding (sname);
>   if (guard)
> ... mark guard as declare target if not yet marked ...
>   }
> because guard var initializers don't really mention anything and so
> their addition doesn't need to trigger further worklist changes.

That doesn't generally work, as the gimplification-level code re
'Static locals [...] need to be "omp declare target"' runs *after*
'omp_discover_implicit_declare_target'.  Thus my "move" idea above.
However, let's defer the latter one; I've now got a simple setup where
the new language hook is invoked in all necessary places.  (Will post
later.)

>> > And sure, __cxa_guard_* would need to be implemented in the offloading
>> > libsupc++.a or libstdc++.a.
>>
>> Until proper libstdc++/libsupc++ support emerges (I'm working on it...),
>> my idea was to add a temporary 'libgomp/config/accel/*.c' implementation
>> (based on 'libstdc++-v3/libsupc++/guard.cc').
>
> That looks reasonable.

OK to push, for a start, the attached
"GCN, nvptx: Basic '__cxa_guard_{acquire,abort,release}' for C++ static local 
variables support"?
That's now in libgcc not libgomp, so that it's also usable for GCN, nvptx
target testing, where we thus see a number of FAIL -> PASS progressions.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From d40678768ae90c3fe1208cffd7d92e7058db5bbf Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 20 Dec 2023 12:27:48 +0100
Subject: [PATCH] GCN, nvptx: Basic '__cxa_guard_{acquire,abort,release}' for
 C++ static local variables support

For now, for single-threaded GCN, nvptx target use only; extension for
multi-threaded offloading use to follow later.

	libgcc/
	* c++-minimal/README: New.
	* c++-minimal/guard.c: New.
	* config/gcn/t-amdgcn (LIB2ADD): Add it.
	* config/nvptx/t-nvptx (LIB2ADD): Likewise.
---
 libgcc/c++-minimal/README   |  2 +
 libgcc/c++-minimal/guard.c  | 97 +
 libgcc/config/gcn/t-amdgcn  |  3 ++
 libgcc/config/nvptx/t-nvptx |  3 ++
 4 files changed, 105 insertions(+)
 create mode 100644 libgcc/c++-minimal/README
 create mode 100644 libgcc/c++-minimal/guard.c

diff --git a/libgcc/c++-minimal/README b/libgcc/c++-minimal/README
new file mode 100644
index 000..832f1265f7e
--- /dev/null
+++ 

Re: OpenMP offloading vs. C++ static local variables

2023-12-07 Thread Jakub Jelinek
On Thu, Dec 07, 2023 at 04:09:04PM +0100, Thomas Schwinge wrote:
> > Yeah, I believe we should in the omp_discover_* sub-pass handle with
> > a help of a langhook automatically mark the guard variables (possibly
> > iff the guarded variable is marked?),
> 
> Looking at 'gcc/omp-offload.cc:omp_discover_implicit_declare_target' left
> me confused how that would be the code that marks up 'static' variables
> as implicit 'omp declare target'.  Working through a simple POD example
> (say, 's%static S s%static int i') it turns out, indeed that's not where
> that is happending, but instead 'gcc/gimplify.cc:gimplify_bind_expr' is
> the place:

Sure, that is for the case where those local statics should be marked
implicitly because they appear in a target function.
They can be also marked explicitly by the user through
#pragma omp declare target enter (name_of_static_var)
or
[[omp::decl (declare target)]] attribute on it etc.

> Now, the problem why that existing code doesn't trigger for C++ guard
> variables is that those are not in 'BIND_EXPR_VARS', due to C++ front end
> use of 'pushdecl_top_level_and_finish'.  If I change the C++ front end as
> follows (WIP; not reviewed in detail):
> 
> --- gcc/cp/decl2.cc
> +++ gcc/cp/decl2.cc
> @@ -3576,5 +3576,6 @@ get_guard (tree decl)
>DECL_IGNORED_P (guard) = 1;
>TREE_USED (guard) = 1;
> -  pushdecl_top_level_and_finish (guard, NULL_TREE);
> +  pushdecl (guard);
> +  cp_finish_decl (guard, NULL_TREE, false, NULL_TREE, 0);
>  }
>return guard;

I don't think that is desirable.

> ..., then we do get the expected behavior:
> 
> --- a-r.cc.006t.gimple2023-12-07 13:27:36.254963406 +0100
> +++ a-r.cc.006t.gimple2023-12-07 14:10:39.352107107 +0100
> @@ -5,6 +5,7 @@
>bool retval.1;
>bool D.2966;
>static struct S s1;
> +  static long long int _ZGVZL2f1vE2s1;
> 
>gimple_call <__atomic_load_1, _1, &_ZGVZL2f1vE2s1, 2>
>gimple_assign 
> 
> ..., and offloading compilation works down to the expected next issue:
> 
> ld: error: undefined symbol: __cxa_guard_acquire
> >>> referenced by /tmp/ccAVyZpc.o:(f1())
> [...]
> collect2: error: ld returned 1 exit status
> gcn mkoffload: fatal error: 
> build-gcc/gcc/x86_64-pc-linux-gnu-accel-amdgcn-amdhsa-gcc returned 1 exit 
> status
> [...]
> 
> However: 'pushdecl_top_level_and_finish' has been used there "forever",
> and I currently have no clue at all whether changing that into 'pushdecl'
> would be acceptable, what effects that'd have elsewhere.

Exactly.

> That said...  Couldn't we indeed move this gimplification-level code re
> 'Static locals [...] need to be "omp declare target"' into
> 'gcc/omp-offload.cc:omp_discover_implicit_declare_target'?

The omp-offload.cc discovery stuff was added for stuff where the OpenMP
standard says something is implicitly declare target because there is
some use of it satisfying some rule.
Like, calls to functions defined in current compilation unit referenced in
target region or something similar, or such calls referenced in declare
target static var initializers.
So, that feels to me like the right spot to handle the guards as well.
Of course, the middle-end doesn't know about C++ FE's get_guard variable,
so it should be some new language hook which would take care of it.
The omp_discover_declare* functions can add further VAR_DECLs to the
worklist, so I'd probably call the new language hook in the
omp_discover_implicit_declare_target last loop.
Or maybe even better just handle that in the
cxx_omp_finish_decl_inits hook.  You can just
  FOR_EACH_VARIABLE (vnode)
if (DECL_FUNCTION_SCOPE_P (vnode->decl)
&& omp_declare_target_var_p (vnode->decl))
  {
tree sname = mangle_guard_variable (decl);
tree guard = get_global_binding (sname);
if (guard)
  ... mark guard as declare target if not yet marked ...
  }
because guard var initializers don't really mention anything and so
their addition doesn't need to trigger further worklist changes.

> > or e.g. rtti info (_ZTS*, _ZTI*)
> > and eventually figure out what we should do about virtual tables (_ZTV*).
> > The last case is most complicated, as it contains function pointers, and we
> > need to figure out if we mark all methods, or say replace some pointers in
> > the virtual table with NULLs or something that errors or terminates if it
> > isn't marked.
> 
> All those I plan to defer, for now.

Ok.

> > And sure, __cxa_guard_* would need to be implemented in the offloading
> > libsupc++.a or libstdc++.a.
> 
> Until proper libstdc++/libsupc++ support emerges (I'm working on it...),
> my idea was to add a temporary 'libgomp/config/accel/*.c' implementation
> (based on 'libstdc++-v3/libsupc++/guard.cc').

That looks reasonable.

Jakub



Re: OpenMP offloading vs. C++ static local variables

2023-12-07 Thread Thomas Schwinge
Hi!

Jakub, would you please provide guidance?


Elsewhere, I wrote:

|| I'm working on implementing (some) C++ standard library support for code
|| offloading in GCC, and ran into the following issue: per
|| 
,
|| "variables declared at block scope with the specifier 'static' [...] have
|| static [...] storage duration but are initialized the first time control
|| passes through their declaration".
||
|| To implement "initialized the first time [...]" in a multi-threaded
|| context, compilers typically use a guard variable and locking call to a
|| compiler-internal C++ support library function ('__cxa_guard_acquire').
|| (..., which in GCC, you may disable with '-fno-threadsafe-statics', for
|| that matter.)
||
|| In GCC, all this appears to work fine for multi-threaded host-side
|| (non-offladed) OpenMP 'parallel', for example.  However, I'm now curious
|| about the OpenMP 'target' offloading case; minimal example:
||
|| struct S
|| {
||   S() { }
||   ~S() { }
|| };
||
|| static void f()
|| {
||   // 

||   static S s;
|| }
||
|| int main()
|| {
|| #pragma omp target
||   {
|| f();
||   }
|| }
||
|| (Everything other than 'main' is meant to be implicitly OpenMP
|| 'declare target'ed here.)

On 2023-11-20T19:13:23+0100, Jakub Jelinek  wrote:
> On Mon, Nov 20, 2023 at 06:43:47PM +0100, Thomas Schwinge wrote:
>> Current GCC fails:
>>
>> error: variable ‘_ZGVZL1fvE1s’ has been referenced in offloaded code but 
>> hasn’t been marked to be included in the offloaded code
>>
>> ... with:
>>
>> $ c++filt _ZGVZL1fvE1s
>> guard variable for f()::s
>>
>> That may "simply" be a bug to fix in GCC.

The conclusion was: yes.

>> (Something like implicitly
>> creating respective guard variables on the device, I suppose.)
>
> Yeah, I believe we should in the omp_discover_* sub-pass handle with
> a help of a langhook automatically mark the guard variables (possibly
> iff the guarded variable is marked?),

Looking at 'gcc/omp-offload.cc:omp_discover_implicit_declare_target' left
me confused how that would be the code that marks up 'static' variables
as implicit 'omp declare target'.  Working through a simple POD example
(say, 's%static S s%static int i') it turns out, indeed that's not where
that is happending, but instead 'gcc/gimplify.cc:gimplify_bind_expr' is
the place:

[...]
  for (t = BIND_EXPR_VARS (bind_expr); t ; t = DECL_CHAIN (t))
[...]
  /* Static locals inside of target construct or offloaded
 routines need to be "omp declare target".  */
  if (TREE_STATIC (t))
for (; ctx; ctx = ctx->outer_context)
  if ((ctx->region_type & ORT_TARGET) != 0)
{
  if (!lookup_attribute ("omp declare target",
 DECL_ATTRIBUTES (t)))
{
  tree id = get_identifier ("omp declare target");
  DECL_ATTRIBUTES (t)
= tree_cons (id, NULL_TREE, DECL_ATTRIBUTES 
(t));
  varpool_node *node = varpool_node::get (t);
  if (node)
{
  node->offloadable = 1;
  if (ENABLE_OFFLOADING && !DECL_EXTERNAL (t))
{
  g->have_offload = true;
  if (!in_lto_p)
vec_safe_push (offload_vars, t);
}
}
}
  break;
[...]

You (Jakub) added that in
commit 211b7533bff68e5dd72e7d75249f470101759d6d (Subversion r272322)
"Make static vars inside of target regions or declare target routines 
implicitly declare target to (PR middle-end/90779)".

Now, the problem why that existing code doesn't trigger for C++ guard
variables is that those are not in 'BIND_EXPR_VARS', due to C++ front end
use of 'pushdecl_top_level_and_finish'.  If I change the C++ front end as
follows (WIP; not reviewed in detail):

--- gcc/cp/decl2.cc
+++ gcc/cp/decl2.cc
@@ -3576,5 +3576,6 @@ get_guard (tree decl)
   DECL_IGNORED_P (guard) = 1;
   TREE_USED (guard) = 1;
-  pushdecl_top_level_and_finish (guard, NULL_TREE);
+  pushdecl (guard);
+  cp_finish_decl (guard, NULL_TREE, false, NULL_TREE, 0);
 }
   return guard;

..., then we do get the expected behavior:

--- a-r.cc.006t.gimple2023-12-07 13:27:36.254963406 +0100
+++ a-r.cc.006t.gimple