[PATCH] D155769: [Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-08-06 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx marked an inline comment as done.
AlexVlx added inline comments.



Comment at: clang/docs/StdParSupport.rst:366-367
+
+   Note that this is a temporary, unsafe workaround for a deficiency in the C++
+   Standard.
+

keryell wrote:
> Another way could be to hide somehow a way to select the device in the policy 
> like in https://github.com/KhronosGroup/SyclParallelSTL, which might be 
> something included in your point "4." of "Open Questions / Future 
> Developments".
> Perhaps better than opening the TLS Pandora box?
In hindsight, this was needlessly confusing and relied on an implementation 
detail, therefore the reference was removed. Thank you for pointing that out.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155769: [Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-08-06 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 547572.
AlexVlx added a comment.

Remove confusing guidance around mGPU.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

Files:
  clang/docs/StdParSupport.rst
  clang/docs/index.rst

Index: clang/docs/index.rst
===
--- clang/docs/index.rst
+++ clang/docs/index.rst
@@ -47,6 +47,7 @@
OpenCLSupport
OpenMPSupport
SYCLSupport
+   StdParSupport
HIPSupport
HLSL/HLSLDocs
ThinLTO
Index: clang/docs/StdParSupport.rst
===
--- /dev/null
+++ clang/docs/StdParSupport.rst
@@ -0,0 +1,350 @@
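+==============================================================
+C++ Standard Parallelism Offload Support: Compiler And Runtime
+==============================================================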
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+This document describes the implementation of support for offloading the
+execution of standard C++ algorithms to accelerators that can be targeted via
+HIP. Furthermore, it enumerates restrictions on user defined code, as well as
+the interactions with runtimes.
+
+Algorithm Offload: What, Why, Where
+===================================
+
+C++17 introduced overloads
+`for most algorithms in the standard library `_
+which allow the user to specify a desired
+`execution policy `_.
+The `parallel_unsequenced_policy `_
+maps relatively well to the execution model of many accelerators, such as GPUs.
+This, coupled with the ubiquity of GPU accelerated algorithm libraries that
+implement most / all corresponding algorithms in the standard library
+(e.g. `rocThrust `_), makes
+it feasible to provide seamless accelerator offload for supported algorithms,
+when an accelerated version exists. Thus, it becomes possible to easily access
+the computational resources of an accelerator, via a well specified, familiar,
+algorithmic interface, without having to delve into low-level hardware specific
+details. Putting it all together:
+
+- **What**: standard library algorithms, when invoked with the
+  ``parallel_unsequenced_policy``
+- **Why**: democratise accelerator programming, without loss of user familiarity
+- **Where**: any and all accelerators that can be targeted by Clang/LLVM via HIP
+
+Small Example
+=============
+
+Given the following C++ code, which assumes the ``std`` namespace is included:
+
+.. code-block:: C++
+
+   bool has_the_answer(const vector<int>& v) {
+     return find(execution::par_unseq, cbegin(v), cend(v), 42) != cend(v);
+   }
+
+if Clang is invoked with the ``-stdpar --offload-target=foo`` flags, the call to
+``find`` will be offloaded to an accelerator that is part of the ``foo`` target
+family. If either ``foo`` or its runtime environment do not support transparent
+on-demand paging (such as that provided in Linux via
+`HMM `_), it is necessary to also include
+the ``--stdpar-interpose-alloc`` flag. If the accelerator specific algorithm
+library ``foo`` uses doesn't have an implementation of a particular algorithm,
+execution seamlessly falls back to the host CPU. It is legal to specify multiple
+``--offload-target`` flags. All the flags we introduce, as well as a thorough
+view of various restrictions and their implications, are described below.
+
+Implementation - General View
+=============================
+
+We built support for Algorithm Offload atop the pre-existing HIP
+infrastructure. More specifically, when one requests offload via ``-stdpar``,
+compilation is switched to HIP compilation, as if ``-x hip`` was specified.
+Similarly, linking is also switched to HIP linking, as if ``--hip-link`` was
+specified. Note that these are implicit, and one should not assume that any
+interop with HIP specific language constructs is available; e.g. ``__device__``
+annotations are neither necessary nor guaranteed to work.
+
+Since there are no language restriction mechanisms in place, it is necessary to
+relax HIP language specific semantic checks performed by the FE; they would
+identify otherwise valid, offloadable code, as invalid HIP code. Given that we
+know that the user intended only for certain algorithms to be offloaded, and
+encoded this by specifying the ``parallel_unsequenced_policy``, we rely on a
+pass over IR to clean up any and all code that was not "meant" for offload. If
+requested, allocation interposition is also handled via a separate pass over IR.
+
+To interface with the client HIP runtime, and to forward offloaded algorithm
+invocations to the corresponding accelerator specific library implementation, an
+implementation 

[PATCH] D155769: [Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-07-25 Thread Ronan Keryell via Phabricator via cfe-commits
keryell added a comment.

Interesting.




Comment at: clang/docs/StdParSupport.rst:349
+
+thread t0{[&]() {
+  hipSetDevice(accelerator_0);





Comment at: clang/docs/StdParSupport.rst:354
+}};
+thread t1{[&]() {
+  hipSetDevice(accelerator_1);





Comment at: clang/docs/StdParSupport.rst:360-362
+t0.join();
+t1.join();
+





Comment at: clang/docs/StdParSupport.rst:366-367
+
+   Note that this is a temporary, unsafe workaround for a deficiency in the C++
+   Standard.
+

Another way could be to hide somehow a way to select the device in the policy 
like in https://github.com/KhronosGroup/SyclParallelSTL, which might be 
something included in your point "4." of "Open Questions / Future Developments".
Perhaps better than opening the TLS Pandora box?
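
To make the suggestion above concrete, here is a minimal sketch of what carrying the device selection inside the policy object could look like, as opposed to relying on per-thread state set through ``hipSetDevice``. Everything in it is hypothetical and for illustration only: ``device_policy``, the ``sketch`` namespace, and ``last_device_used`` are not part of the patch, of HIP, or of SyclParallelSTL, and the overload simply runs the serial ``std::find`` on the host in place of a real accelerator dispatch.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical policy type (an assumption, not an existing API): it carries
// the accelerator index as a value instead of relying on thread-local state.
struct device_policy {
    int device_index;
};

namespace sketch {
// Records which device the most recent call asked for. This exists purely so
// the sketch has something observable; a real implementation would not need it.
inline int& last_device_used() {
    static int device = -1;
    return device;
}

// Hypothetical overload selected by the policy type. A real implementation
// would forward to an accelerator library bound to policy.device_index; this
// stand-in notes the index and runs the serial std::find on the host.
template <typename It, typename T>
It find(const device_policy& policy, It first, It last, const T& value) {
    last_device_used() = policy.device_index;
    return std::find(first, last, value);
}
}  // namespace sketch

// Same shape as the example in the document, with the device chosen per call.
bool has_the_answer(const std::vector<int>& v, int device_index) {
    return sketch::find(device_policy{device_index}, v.cbegin(), v.cend(), 42) !=
           v.cend();
}
```

With this shape, two threads (or a single thread) could target different accelerators by passing different policy values, and no ambient per-thread "current device" would need to be set beforehand.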


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769



[PATCH] D155769: [Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-07-19 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx created this revision.
AlexVlx added reviewers: rjmccall, yaxunl, aaron.ballman.
AlexVlx added a project: clang.
Herald added a subscriber: arphaman.
Herald added a project: All.
AlexVlx requested review of this revision.
Herald added a subscriber: cfe-commits.

This patch adds the documentation for the standard algorithm offload feature 
being proposed here: 
https://discourse.llvm.org/t/rfc-adding-c-parallel-algorithm-offload-support-to-clang-llvm/72159/1.
 It is the parent of a series of patches that make up the implementation.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D155769

Files:
  clang/docs/StdParSupport.rst
  clang/docs/index.rst

Index: clang/docs/index.rst
===
--- clang/docs/index.rst
+++ clang/docs/index.rst
@@ -47,6 +47,7 @@
OpenCLSupport
OpenMPSupport
SYCLSupport
+   StdParSupport
HLSL/HLSLDocs
ThinLTO
APINotes
Index: clang/docs/StdParSupport.rst
===
--- /dev/null
+++ clang/docs/StdParSupport.rst
@@ -0,0 +1,381 @@
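+==============================================================
+C++ Standard Parallelism Offload Support: Compiler And Runtime
+==============================================================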
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+This document describes the implementation of support for offloading the
+execution of standard C++ algorithms to accelerators that can be targeted via
+HIP. Furthermore, it enumerates restrictions on user defined code, as well as
+the interactions with runtimes.
+
+Algorithm Offload: What, Why, Where
+===================================
+
+C++17 introduced overloads
+`for most algorithms in the standard library `_
+which allow the user to specify a desired
+`execution policy `_.
+The `parallel_unsequenced_policy `_
+maps relatively well to the execution model of many accelerators, such as GPUs.
+This, coupled with the ubiquity of GPU accelerated algorithm libraries that
+implement most / all corresponding algorithms in the standard library
+(e.g. `rocThrust `_), makes
+it feasible to provide seamless accelerator offload for supported algorithms,
+when an accelerated version exists. Thus, it becomes possible to easily access
+the computational resources of an accelerator, via a well specified, familiar,
+algorithmic interface, without having to delve into low-level hardware specific
+details. Putting it all together:
+
+- **What**: standard library algorithms, when invoked with the
+  ``parallel_unsequenced_policy``
+- **Why**: democratise accelerator programming, without loss of user familiarity
+- **Where**: any and all accelerators that can be targeted by Clang/LLVM via HIP
+
+Small Example
+=============
+
+Given the following C++ code, which assumes the ``std`` namespace is included:
+
+.. code-block:: C++
+
+   bool has_the_answer(const vector<int>& v) {
+     return find(execution::par_unseq, cbegin(v), cend(v), 42) != cend(v);
+   }
+
+if Clang is invoked with the ``-stdpar --offload-target=foo`` flags, the call to
+``find`` will be offloaded to an accelerator that is part of the ``foo`` target
+family. If either ``foo`` or its runtime environment do not support transparent
+on-demand paging (such as that provided in Linux via
+`HMM `_), it is necessary to also include
+the ``--stdpar-interpose-alloc`` flag. If the accelerator specific algorithm
+library ``foo`` uses doesn't have an implementation of a particular algorithm,
+execution seamlessly falls back to the host CPU. It is legal to specify multiple
+``--offload-target`` flags. All the flags we introduce, as well as a thorough
+view of various restrictions and their implications, are described below.
+
+Implementation - General View
+=============================
+
+We built support for Algorithm Offload atop the pre-existing HIP
+infrastructure. More specifically, when one requests offload via ``-stdpar``,
+compilation is switched to HIP compilation, as if ``-x hip`` was specified.
+Similarly, linking is also switched to HIP linking, as if ``--hip-link`` was
+specified. Note that these are implicit, and one should not assume that any
+interop with HIP specific language constructs is available; e.g. ``__device__``
+annotations are neither necessary nor guaranteed to work.
+
+Since there are no language restriction mechanisms in place, it is necessary to
+relax HIP language specific semantic checks performed by the FE; they would
+identify otherwise valid, offloadable code, as invalid HIP code. Given that we
+know that the user intended only for certain algorithms to be offloaded, and
+encoded this