'include/cuda/cuda.h': Add parts necessary for nvptx-tools 'nvptx-run' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches))

2022-05-18 Thread Thomas Schwinge
Hi!

On 2017-01-13T19:11:23+0100, Jakub Jelinek  wrote:
> cuda.h header included
> in this patch

In order to be able to use that file without changes for
nvptx-tools 'nvptx-run', I've pushed to GCC master branch
commit 86f64400a5692499856d41462461327b93f82b8d
"'include/cuda/cuda.h': Add parts necessary for nvptx-tools 'nvptx-run'",
see attached.
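
As a rough illustration of how a caller might use the newly declared cuGetErrorName, the sketch below formats a driver error for a log message. It is not taken from nvptx-run. The `CUresult` typedef and the two error macros are stand-ins for the header's definitions, and `stub_get_error_name` and `describe_cu_error` are hypothetical names; the stub only mimics the calling convention so that the example builds without libcuda.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-ins for the header's definitions, so this builds without cuda.h.  */
typedef int CUresult;
#define CUDA_SUCCESS 0
#define CUDA_ERROR_INVALID_VALUE 1

/* Same shape as 'CUresult cuGetErrorName (CUresult, const char **)'.  */
typedef CUresult (*get_error_name_fn) (CUresult, const char **);

/* Hypothetical stub standing in for the real driver entry point.  */
static CUresult
stub_get_error_name (CUresult r, const char **name)
{
  switch (r)
    {
    case CUDA_SUCCESS:
      *name = "CUDA_SUCCESS";
      return CUDA_SUCCESS;
    case CUDA_ERROR_INVALID_VALUE:
      *name = "CUDA_ERROR_INVALID_VALUE";
      return CUDA_SUCCESS;
    default:
      /* Unknown code: report failure and hand back no string.  */
      *name = NULL;
      return CUDA_ERROR_INVALID_VALUE;
    }
}

/* Format R into BUF, falling back to the bare number if the lookup fails.
   Returns BUF for convenience.  */
static const char *
describe_cu_error (get_error_name_fn lookup, CUresult r,
                   char *buf, size_t size)
{
  const char *name = NULL;
  assert (buf != NULL && size > 0);
  if (lookup (r, &name) == CUDA_SUCCESS && name != NULL)
    snprintf (buf, size, "%s (%d)", name, (int) r);
  else
    snprintf (buf, size, "unknown CUDA error (%d)", (int) r);
  return buf;
}
```

With the real header and library available, one would pass cuGetErrorName itself as the lookup function; the numeric fallback keeps the message useful for codes that the lookup does not recognize.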


Regards
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 86f64400a5692499856d41462461327b93f82b8d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 29 Apr 2022 10:44:12 +0200
Subject: [PATCH] 'include/cuda/cuda.h': Add parts necessary for nvptx-tools
 'nvptx-run'

	include/
	* cuda/cuda.h (enum CUjit_option): Add
	'CU_JIT_GENERATE_DEBUG_INFO', 'CU_JIT_GENERATE_LINE_INFO'.
	(enum CUlimit): Add 'CU_LIMIT_STACK_SIZE',
	'CU_LIMIT_MALLOC_HEAP_SIZE'.
	(cuCtxSetLimit, cuGetErrorName): Add.
---
 include/cuda/cuda.h | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h
index d7105fb331e..3938d05d150 100644
--- a/include/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -97,7 +97,9 @@ typedef enum {
   CU_JIT_ERROR_LOG_BUFFER = 5,
   CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES = 6,
   CU_JIT_OPTIMIZATION_LEVEL = 7,
-  CU_JIT_LOG_VERBOSE = 12
+  CU_JIT_GENERATE_DEBUG_INFO = 11,
+  CU_JIT_LOG_VERBOSE = 12,
+  CU_JIT_GENERATE_LINE_INFO = 13,
 } CUjit_option;
 
 typedef enum {
@@ -117,6 +119,11 @@ enum {
   CU_STREAM_NON_BLOCKING = 1
 };
 
+typedef enum {
+  CU_LIMIT_STACK_SIZE = 0x00,
+  CU_LIMIT_MALLOC_HEAP_SIZE = 0x02,
+} CUlimit;
+
 #define cuCtxCreate cuCtxCreate_v2
 CUresult cuCtxCreate (CUcontext *, unsigned, CUdevice);
 #define cuCtxDestroy cuCtxDestroy_v2
@@ -128,6 +135,7 @@ CUresult cuCtxPopCurrent (CUcontext *);
 #define cuCtxPushCurrent cuCtxPushCurrent_v2
 CUresult cuCtxPushCurrent (CUcontext);
 CUresult cuCtxSynchronize (void);
+CUresult cuCtxSetLimit (CUlimit, size_t);
 CUresult cuDeviceGet (CUdevice *, int);
 #define cuDeviceTotalMem cuDeviceTotalMem_v2
 CUresult cuDeviceTotalMem (size_t *, CUdevice);
@@ -143,6 +151,7 @@ CUresult cuEventRecord (CUevent, CUstream);
 CUresult cuEventSynchronize (CUevent);
 CUresult cuFuncGetAttribute (int *, CUfunction_attribute, CUfunction);
 CUresult cuGetErrorString (CUresult, const char **);
+CUresult cuGetErrorName (CUresult, const char **);
 CUresult cuInit (unsigned);
 CUresult cuDriverGetVersion (int *);
 CUresult cuLaunchKernel (CUfunction, unsigned, unsigned, unsigned, unsigned,
-- 
2.35.1



'include/cuda/cuda.h': For C++, wrap in 'extern "C"' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches))

2022-05-18 Thread Thomas Schwinge
Hi!

On 2017-01-13T19:11:23+0100, Jakub Jelinek  wrote:
> cuda.h header included
> in this patch

To make this '#include'able in C++ code, I've pushed to master branch
commit bdd1dc1bfbe1492edf3ce5e4288cfbc55be329ab
"'include/cuda/cuda.h': For C++, wrap in 'extern "C"'", see attached.


Regards
 Thomas


>From bdd1dc1bfbe1492edf3ce5e4288cfbc55be329ab Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 29 Apr 2022 10:33:15 +0200
Subject: [PATCH] 'include/cuda/cuda.h': For C++, wrap in 'extern "C"'

	include/
	* cuda/cuda.h: For C++, wrap in 'extern "C"'.
---
 include/cuda/cuda.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h
index 5c813ad2cf8..d7105fb331e 100644
--- a/include/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -32,6 +32,10 @@ the proprietary CUDA toolkit.  */
 
 #define CUDA_VERSION 8000
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 typedef void *CUcontext;
 typedef int CUdevice;
 #if defined(__LP64__) || defined(_WIN64)
@@ -191,4 +195,8 @@ CUresult cuStreamQuery (CUstream);
 CUresult cuStreamSynchronize (CUstream);
 CUresult cuStreamWaitEvent (CUstream, CUevent, unsigned);
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* GCC_CUDA_H */
-- 
2.35.1
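
For readers unfamiliar with the idiom: without these guards, a C++ compiler would mangle the declared names and the program would fail to link against the C symbols exported by the driver library. The sketch below shows the same pattern on a made-up declaration. `demo_driver_version` is hypothetical and not part of cuda.h. Compiled as C, the guards vanish; compiled as C++, the declaration gets C linkage.

```c
#include <assert.h>

/* Begin C linkage when a C++ compiler reads this; a no-op in C.  */
#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical declaration, standing in for the cu* prototypes.  */
int demo_driver_version (int *version);

#ifdef __cplusplus
}
#endif

/* Definition, as a C library would provide it.  */
int
demo_driver_version (int *version)
{
  assert (version != 0);
  *version = 8000;  /* Mirrors the header's CUDA_VERSION value.  */
  return 0;
}
```

A caller, whether C or C++, then simply includes the header and calls `demo_driver_version (&v)`; the symbol name in the object file is the same either way.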



Re: Move 'libgomp/plugin/cuda/cuda.h' to 'include/cuda/cuda.h' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches))

2022-04-06 Thread Jakub Jelinek via Gcc-patches
On Wed, Apr 06, 2022 at 02:39:18PM +0200, Thomas Schwinge wrote:
> ... so that it may be used by other projects that inherit GCC's 'include'
> directory.
> 
>   include/
>   * cuda/cuda.h: New file.
>   libgomp/
>   * plugin/cuda/cuda.h: Remove file.
>   * plugin/plugin-nvptx.c [PLUGIN_NVPTX_DYNAMIC]: Include
>   "cuda/cuda.h" instead of .
>   * plugin/configfrag.ac : Don't set
>   'PLUGIN_NVPTX_CPPFLAGS'.
>   * configure: Regenerate.

Ok.

Jakub



Move 'libgomp/plugin/cuda/cuda.h' to 'include/cuda/cuda.h' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches))

2022-04-06 Thread Thomas Schwinge
Hi!

On 2017-01-13T19:11:23+0100, Jakub Jelinek wrote:
> Especially for distributions it is undesirable to need to have proprietary
> CUDA libraries and headers installed when building GCC.

> I've talked to our lawyers and they said that the cuda.h header included
> in this patch doesn't infringe anyone's copyright or is otherwise a fair
> use, it has been created by gathering all the cu*/CU* symbols from the
> current and older nvptx plugin and some oacc tests, then stubbing the
> pointer-ish typedefs, grabbing most enum values and function prototypes from
> https://raw.githubusercontent.com/shinpei0208/gdev/master/cuda/driver/cuda.h
> and verifying assembly with that header against assembly when compiled
> against NVidia's cuda.h.

..., and later was extended slightly, as necessary to use
further CUDA features in libgomp's nvptx plugin.

> --- libgomp/plugin/cuda/cuda.h.jj 2017-01-13 15:58:00.966544147 +0100
> +++ libgomp/plugin/cuda/cuda.h  2017-01-13 17:02:47.355817896 +0100
> @@ -0,0 +1,174 @@
> +/* CUDA API description.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or (at your option)
> +any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +GNU General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +.
> +
> +This header provides the minimum amount of typedefs, enums and function
> +declarations to be able to compile plugin-nvptx.c if cuda.h and
> +libcuda.so.1 are not available.  */
> +
> +#ifndef GCC_CUDA_H
> +#define GCC_CUDA_H
> +[...]
> +#endif /* GCC_CUDA_H */

OK to push the attached
"Move 'libgomp/plugin/cuda/cuda.h' to 'include/cuda/cuda.h'", so that I'm
also able to use that file in the nvptx-tools, which inherit GCC's
'include' directory?


Regards
 Thomas


>From a6f9d53277ff8408cdbd7b89f3e7595e40333d48 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 6 Apr 2022 14:12:29 +0200
Subject: [PATCH] Move 'libgomp/plugin/cuda/cuda.h' to 'include/cuda/cuda.h'

... so that it may be used by other projects that inherit GCC's 'include'
directory.

	include/
	* cuda/cuda.h: New file.
	libgomp/
	* plugin/cuda/cuda.h: Remove file.
	* plugin/plugin-nvptx.c [PLUGIN_NVPTX_DYNAMIC]: Include
	"cuda/cuda.h" instead of .
	* plugin/configfrag.ac : Don't set
	'PLUGIN_NVPTX_CPPFLAGS'.
	* configure: Regenerate.
---
 {libgomp/plugin => include}/cuda/cuda.h | 7 +++
 libgomp/configure   | 1 -
 libgomp/plugin/configfrag.ac| 1 -
 libgomp/plugin/plugin-nvptx.c   | 6 +-
 4 files changed, 8 insertions(+), 7 deletions(-)
 rename {libgomp/plugin => include}/cuda/cuda.h (97%)

diff --git a/libgomp/plugin/cuda/cuda.h b/include/cuda/cuda.h
similarity index 97%
rename from libgomp/plugin/cuda/cuda.h
rename to include/cuda/cuda.h
index 5c679c1767a..5c813ad2cf8 100644
--- a/libgomp/plugin/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -1,4 +1,4 @@
-/* CUDA API description.
+/* CUDA Driver API description.
Copyright (C) 2017-2022 Free Software Foundation, Inc.
 
 This file is part of GCC.
@@ -22,9 +22,8 @@ a copy of the GCC Runtime Library Exception along with this program;
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 .
 
-This header provides the minimum amount of typedefs, enums and function
-declarations to be able to compile plugin-nvptx.c if cuda.h and
-libcuda.so.1 are not available.  */
+This header provides parts of the CUDA Driver API, without having to rely on
+the proprietary CUDA toolkit.  */
 
 #ifndef GCC_CUDA_H
 #define GCC_CUDA_H
diff --git a/libgomp/configure b/libgomp/configure
index b1b620cabc3..f863aa2ead4 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -15297,7 +15297,6 @@ rm -f core conftest.err conftest.$ac_objext \
 		   && (test "x$CUDA_DRIVER_LIB" = x \
 			   || test "x$CUDA_DRIVER_LIB" = xno); then
 		  PLUGIN_NVPTX=1
-		  PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
 		  PLUGIN_NVPTX_LIBS='-ldl'
 		  PLUGIN_NVPTX_DYNAMIC=1
 		else
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index fc298391d4c..54d4b675c4e 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -156,7 +156,6 @@ if test 

Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-05-04 Thread Thomas Schwinge
Hi!

On Wed, 3 May 2017 11:00:14 +0200, Jakub Jelinek  wrote:
> On Sat, Jan 21, 2017 at 03:50:43PM +0100, Thomas Schwinge wrote:
> > > In order to configure gcc to load libcuda.so.1 dynamically,
> > > one has to either configure it --without-cuda-driver, or without
> > > --with-cuda-driver=/--with-cuda-driver-lib=/--with-cuda-driver-include=
> > > options if cuda.h and -lcuda aren't found in the default locations.

(I still have to follow up with my additional GCC changes...)


> > > The nvptx-tools change
> > 
> > (I'll get to that later.)
> 
> I'd like to ping the nvptx-tools change.  Shall I make a github pull request
> for that?

In the future, yes please.

This time, I've handled it in
.

> I have two further patches, the first one just to shut
> up a -Wformat-security warning

Tom had already submitted
 including the
same fix, which I've merged earlier today.

> the other one, discovered today, to fix the build
> against glibc trunk - they have changed getopt-related includes there

I handled that one in
.

Thanks!


Regards
 Thomas


Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-05-03 Thread Jakub Jelinek
On Sat, Jan 21, 2017 at 03:50:43PM +0100, Thomas Schwinge wrote:
> > In order to configure gcc to load libcuda.so.1 dynamically,
> > one has to either configure it --without-cuda-driver, or without
> > --with-cuda-driver=/--with-cuda-driver-lib=/--with-cuda-driver-include=
> > options if cuda.h and -lcuda aren't found in the default locations.
> 
> Would be good to have that documented ;-) -- done.
> 
> > The nvptx-tools change
> 
> (I'll get to that later.)

I'd like to ping the nvptx-tools change.  Shall I make a github pull request
for that?

I have two further patches: the first just shuts up a -Wformat-security
warning; the other, discovered today, fixes the build against glibc
trunk.  They have changed the getopt-related includes there, and
we get:
In file included from /usr/include/bits/getopt_posix.h:27:0,
 from /usr/include/unistd.h:872,
 from ../nvptx-ld.c:23:
/usr/include/bits/getopt_core.h:91:12: error: declaration of 'int getopt(int, 
char* const*, const char*) throw ()' has a different exception specifier
 extern int getopt (int ___argc, char *const *___argv, const char *__shortopts)
^~
In file included from ../nvptx-ld.c:22:0:
../include/getopt.h:113:12: note: from previous declaration 'int getopt(int, 
char* const*, const char*)'
 extern int getopt (int argc, char *const *argv, const char *shortopts);
^~

Jakub
--- nvptx-tools/configure.ac
+++ nvptx-tools/configure.ac
@@ -51,6 +51,7 @@ LIBS="$LIBS -lcuda"
 AC_CHECK_FUNCS([[cuGetErrorName] [cuGetErrorString]])
 AC_CHECK_DECLS([[cuGetErrorName], [cuGetErrorString]],
   [], [], [[#include ]])
+AC_CHECK_HEADERS(unistd.h sys/stat.h)
 
 AC_MSG_CHECKING([for extra programs to build requiring -lcuda])
 NVPTX_RUN=
--- nvptx-tools/include/libiberty.h
+++ nvptx-tools/include/libiberty.h
@@ -390,6 +390,17 @@ extern void hex_init (void);
 /* Save files used for communication between processes.  */
 #define PEX_SAVE_TEMPS 0x4
 
+/* Max number of alloca bytes per call before we must switch to malloc.
+
+   ?? Swiped from gnulib's regex_internal.h header.  Is this actually
+   the case?  This number seems arbitrary, though sane.
+
+   The OS usually guarantees only one guard page at the bottom of the stack,
+   and a page size can be as small as 4096 bytes.  So we cannot safely
+   allocate anything larger than 4096 bytes.  Also care for the possibility
+   of a few compiler-allocated temporary stack slots.  */
+#define MAX_ALLOCA_SIZE	4032
+
 /* Prepare to execute one or more programs, with standard output of
each program fed to standard input of the next.
FLAGS   As above.
--- nvptx-tools/nvptx-as.c
+++ nvptx-tools/nvptx-as.c
@@ -30,6 +30,9 @@
 #include 
 #include 
 #include 
+#ifdef HAVE_SYS_STAT_H
+#include 
+#endif
 #include 
 #define obstack_chunk_alloc malloc
 #define obstack_chunk_free free
@@ -42,6 +45,38 @@
 
 #include "version.h"
 
+#ifndef R_OK
+#define R_OK 4
+#define W_OK 2
+#define X_OK 1
+#endif
+
+#ifndef DIR_SEPARATOR
+#  define DIR_SEPARATOR '/'
+#endif
+
+#if defined (_WIN32) || defined (__MSDOS__) \
+|| defined (__DJGPP__) || defined (__OS2__)
+#  define HAVE_DOS_BASED_FILE_SYSTEM
+#  define HAVE_HOST_EXECUTABLE_SUFFIX
+#  define HOST_EXECUTABLE_SUFFIX ".exe"
+#  ifndef DIR_SEPARATOR_2 
+#define DIR_SEPARATOR_2 '\\'
+#  endif
+#  define PATH_SEPARATOR ';'
+#else
+#  define PATH_SEPARATOR ':'
+#endif
+
+#ifndef DIR_SEPARATOR_2
+#  define IS_DIR_SEPARATOR(ch) ((ch) == DIR_SEPARATOR)
+#else
+#  define IS_DIR_SEPARATOR(ch) \
+   (((ch) == DIR_SEPARATOR) || ((ch) == DIR_SEPARATOR_2))
+#endif
+
+#define DIR_UP ".."
+
 static const char *outname = NULL;
 
 static void __attribute__ ((format (printf, 1, 2)))
@@ -816,7 +851,7 @@ traverse (void **slot, void *data)
 }
 
 static void
-process (FILE *in, FILE *out)
+process (FILE *in, FILE *out, int verify, const char *outname)
 {
   symbol_table = htab_create (500, hash_string_hash, hash_string_eq,
   NULL);
@@ -824,6 +859,18 @@ process (FILE *in, FILE *out)
   const char *input = read_file (in);
   Token *tok = tokenize (input);
 
+  /* By default, when ptxas is not in PATH, do minimalistic verification,
+ just require that the first non-comment directive is .version.  */
+  if (verify < 0)
+{
+  size_t i;
+  for (i = 0; tok[i].kind == K_comment; i++)
+   ;
+  if (tok[i].kind != K_dotted || !is_keyword (&tok[i], "version"))
+   fatal_error ("missing .version directive at start of file '%s'",
+outname);
+}
+
   do
 tok = parse_file (tok);
   while (tok->kind);
@@ -897,9 +944,83 @@ fork_execute (const char *prog, char *const *argv)
   do_wait (prog, pex);
 }
 
+/* Determine if progname is available in PATH.  */
+static bool
+program_available (const char *progname)
+{
+  char *temp = getenv ("PATH");
+  if (temp)
+{
+  char *startp, *endp, *nstore, *alloc_ptr = NULL;
+  size_t 

Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-01-21 Thread Jakub Jelinek
On Sat, Jan 21, 2017 at 03:50:43PM +0100, Thomas Schwinge wrote:
> > In order to configure gcc to load libcuda.so.1 dynamically,
> > one has to either configure it --without-cuda-driver, or without
> > --with-cuda-driver=/--with-cuda-driver-lib=/--with-cuda-driver-include=
> > options if cuda.h and -lcuda aren't found in the default locations.
> 
> Would be good to have that documented ;-) -- done.

Thanks.

> I (obviously) agree with your intended (?) "--without-cuda-driver"
> semantics, but I think a "--with-cuda-driver" option should actually mean
> that the system's/installed CUDA driver package *must* be used (and
> similar for other "--with-cuda-driver*" options); and I also added
> "--with-cuda-driver=check" to allow overriding earlier such options (that
> is, restore the default "check" behavior).
> 
> I say 'intended (?) "--without-cuda-driver" semantics', because with your
> current patch/code, if I got that right, if one specifies
> "--without-cuda-driver" but actually does have a CUDA driver system
> installation available, then the nvptx libgomp plugin will still link
> against that one, instead of "dlopen"ing it.  So I changed that
> accordingly.

Agreed.

> > +PLUGIN_NVPTX_DYNAMIC=0
> 
> I find the name "PLUGIN_NVPTX_DYNAMIC" a bit misleading, as this isn't
> about the nvptx plugin being "dynamic" but rather it's about its usage of
> the CUDA driver library.  Thus renamed to "CUDA_DRIVER_DYNAMIC".

Ack.

> > --- libgomp/plugin/plugin-nvptx.c.jj  2017-01-13 12:07:56.0 +0100
> > +++ libgomp/plugin/plugin-nvptx.c   2017-01-13 18:00:39.693284346 +0100
> 
> > +/* -1 if init_cuda_lib has not been called yet, false
> > +   if it has been and failed, true if it has been and succeeded.  */
> > +static char cuda_lib_inited = -1;
> 
> Don't we actually have to worry here about multiple threads running into
> this in parallel -- thus need locking (or atomic accesses?) when
> accessing "cuda_lib_inited"?

I thought it is only accessed when a lock is held, but I could be wrong.
Also, please see my question about why we ever call cuInit in nvptx_init
(whether nvptx_get_num_devices doesn't have to be called first).

> > +/* Dynamically load the CUDA runtime library and initialize function
> 
> Not "CUDA runtime" but actually "CUDA driver" -- changed.

Ok.

> I'd like some GOMP_PLUGIN_debug output for this and the following "return
> false" cases -- added.

Ok.

> > --- libgomp/plugin/cuda/cuda.h.jj   2017-01-13 15:58:00.966544147 +0100
> > +++ libgomp/plugin/cuda/cuda.h  2017-01-13 17:02:47.355817896 +0100
> 
> > +#define CUDA_VERSION 8000
> 
> Does that make it compatible with CUDA 8.0 (and later) only?  (Not yet
> checked.)

The only reason for that is
#if CUDA_VERSION < 7000
  /* Specified in documentation and present in library from at least
 5.5.  Not declared in header file prior to 7.0.  */
  extern CUresult cuGetErrorString (CUresult, const char **);
#endif
I wanted to make it clear that cuGetErrorString prototype is provided.

I must say I don't know enough about ABI and API incompatibilities between
different CUDA versions.  I presume functions with defines like:
#define cuLinkCreate cuLinkCreate_v2
at some point weren't using the _v2 suffixes, but I have no idea whether
they had different arguments or what.  Perhaps that would be supportable by
having some fallback for when dlsym fails for those, or something.
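
The fallback idea can be sketched as follows. This is only an illustration under a strong assumption, namely that the suffixed and unsuffixed entry points share one prototype, which is exactly the open question above and would have to be checked per function. `lookup_with_fallback` is a hypothetical helper, not something in plugin-nvptx.c.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Look up NAME_v2 in handle H; if that is absent, fall back to plain NAME.
   Assumption: both variants take the same arguments.  Returns NULL if
   neither symbol exists.  */
static void *
lookup_with_fallback (void *h, const char *name)
{
  char versioned[128];
  int n = snprintf (versioned, sizeof versioned, "%s_v2", name);
  assert (n > 0 && (size_t) n < sizeof versioned);

  void *sym = dlsym (h, versioned);
  if (sym == NULL)
    sym = dlsym (h, name);
  return sym;
}
```

A caller would pass the handle returned by dlopen for libcuda.so.1 and a base name such as "cuLinkCreate". On older glibc versions this needs linking with -ldl, as the plugin's configure fragment already arranges.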

> @@ -48,26 +49,44 @@ AC_SUBST(CUDA_DRIVER_LIB)
>  CUDA_DRIVER_CPPFLAGS=
>  CUDA_DRIVER_LDFLAGS=
>  AC_ARG_WITH(cuda-driver,
> + [AS_HELP_STRING([--without-cuda-driver],
> + [do not use the system's CUDA driver package])])
> +AC_ARG_WITH(cuda-driver,
> + [AS_HELP_STRING([--with-cuda-driver=check],
> + [use the system's CUDA driver package, if usable [default]])])
> +AC_ARG_WITH(cuda-driver,
> + [AS_HELP_STRING([--with-cuda-driver],
> + [use the system's CUDA driver package])])
> +AC_ARG_WITH(cuda-driver,
>   [AS_HELP_STRING([--with-cuda-driver=PATH],
> - [specify prefix directory for installed CUDA driver package.
> -  Equivalent to --with-cuda-driver-include=PATH/include
> -  plus --with-cuda-driver-lib=PATH/lib])])
> + [use installed CUDA driver package, and specify prefix
> + directory.  Equivalent to
> + --with-cuda-driver-include=PATH/include plus
> + --with-cuda-driver-lib=PATH/lib])],
> + [],
> + [with_cuda_driver=check])

I admit my autoconf knowledge is limited, but it certainly looks strange
to have several AC_ARG_WITH for the same option.  Shouldn't we use
one AC_ARG_WITH(cuda-driver,
with multiple AS_HELP_STRING inside of its second argument?

>  AC_ARG_WITH(cuda-driver-include,
>   [AS_HELP_STRING([--with-cuda-driver-include=PATH],
> - [specify directory for installed CUDA driver include files])])
> + [use installed CUDA driver package, and specify directory for
> + include files])])
>  

Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-01-21 Thread Thomas Schwinge
Hi!

On Fri, 13 Jan 2017 19:11:23 +0100, Jakub Jelinek  wrote:
> This is something that has been discussed already during the last Cauldron.
> Especially for distributions it is undesirable to need to have proprietary
> CUDA libraries and headers installed when building GCC.

ACK.

> These two patches allow building GCC without CUDA around in a way that later
> on can offload to PTX if libcuda.so.1 is installed

Thanks!

I'd like to have some additional changes done; see the attached patch,
and also some further comments below.

> In order to configure gcc to load libcuda.so.1 dynamically,
> one has to either configure it --without-cuda-driver, or without
> --with-cuda-driver=/--with-cuda-driver-lib=/--with-cuda-driver-include=
> options if cuda.h and -lcuda aren't found in the default locations.

Would be good to have that documented ;-) -- done.

> The nvptx-tools change

(I'll get to that later.)

> --- libgomp/plugin/configfrag.ac.jj   2017-01-13 12:07:56.0 +0100
> +++ libgomp/plugin/configfrag.ac  2017-01-13 17:33:26.608240936 +0100
> @@ -58,10 +58,12 @@ AC_ARG_WITH(cuda-driver-include,
>  AC_ARG_WITH(cuda-driver-lib,
>   [AS_HELP_STRING([--with-cuda-driver-lib=PATH],
>   [specify directory for the installed CUDA driver library])])
> -if test "x$with_cuda_driver" != x; then
> -  CUDA_DRIVER_INCLUDE=$with_cuda_driver/include
> -  CUDA_DRIVER_LIB=$with_cuda_driver/lib
> -fi
> +case "x$with_cuda_driver" in
> +  x | xno) ;;
> +  *) CUDA_DRIVER_INCLUDE=$with_cuda_driver/include
> + CUDA_DRIVER_LIB=$with_cuda_driver/lib
> + ;;
> +esac

I (obviously) agree with your intended (?) "--without-cuda-driver"
semantics, but I think a "--with-cuda-driver" option should actually mean
that the system's/installed CUDA driver package *must* be used (and
similar for other "--with-cuda-driver*" options); and I also added
"--with-cuda-driver=check" to allow overriding earlier such options (that
is, restore the default "check" behavior).

I say 'intended (?) "--without-cuda-driver" semantics', because with your
current patch/code, if I got that right, if one specifies
"--without-cuda-driver" but actually does have a CUDA driver system
installation available, then the nvptx libgomp plugin will still link
against that one, instead of "dlopen"ing it.  So I changed that
accordingly.

> +PLUGIN_NVPTX_DYNAMIC=0

I find the name "PLUGIN_NVPTX_DYNAMIC" a bit misleading, as this isn't
about the nvptx plugin being "dynamic" but rather it's about its usage of
the CUDA driver library.  Thus renamed to "CUDA_DRIVER_DYNAMIC".

> @@ -167,9 +170,17 @@ if test x"$enable_offload_targets" != x;
>   LIBS=$PLUGIN_NVPTX_save_LIBS
>   case $PLUGIN_NVPTX in
> nvptx*)
> - PLUGIN_NVPTX=0
> - AC_MSG_ERROR([CUDA driver package required for nvptx support])
> - ;;
> + if test "x$CUDA_DRIVER_INCLUDE" = x \
> +&& test "x$CUDA_DRIVER_LIB" = x; then
> +   PLUGIN_NVPTX=1
> +   PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
> +   PLUGIN_NVPTX_LIBS='-ldl'
> +   PLUGIN_NVPTX_DYNAMIC=1
> + else
> +   PLUGIN_NVPTX=0
> +   AC_MSG_ERROR([CUDA driver package required for nvptx support])
> + fi
> +   ;;
>   esac

I reworked that logic to accommodate the additional
"--with-cuda-driver" usage.

> --- libgomp/plugin/plugin-nvptx.c.jj  2017-01-13 12:07:56.0 +0100
> +++ libgomp/plugin/plugin-nvptx.c 2017-01-13 18:00:39.693284346 +0100

> +/* -1 if init_cuda_lib has not been called yet, false
> +   if it has been and failed, true if it has been and succeeded.  */
> +static char cuda_lib_inited = -1;

Don't we actually have to worry here about multiple threads running into
this in parallel -- thus need locking (or atomic accesses?) when
accessing "cuda_lib_inited"?
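
One conventional way to sidestep that question, sketched here under assumptions and not taken from the plugin, is to funnel the one-time load through pthread_once. The names `cuda_lib_once`, `load_cuda_lib_once`, `init_cuda_lib_threadsafe` and `load_attempts` are hypothetical, and the loader body is a placeholder for the real dlopen/dlsym work.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_once_t cuda_lib_once = PTHREAD_ONCE_INIT;
static bool cuda_lib_ok;   /* Written only inside the once-routine.  */
static int load_attempts;  /* Counts loader runs, for demonstration only.  */

/* Placeholder for the dlopen/dlsym work; always "succeeds" here.  */
static void
load_cuda_lib_once (void)
{
  load_attempts++;
  cuda_lib_ok = true;
}

/* Safe to call from any thread: pthread_once runs the loader exactly once,
   and every caller returns only after that run has completed.  */
static bool
init_cuda_lib_threadsafe (void)
{
  int err = pthread_once (&cuda_lib_once, load_cuda_lib_once);
  assert (err == 0);
  return cuda_lib_ok;
}
```

This replaces the tri-state flag with a plain success flag, since "not yet called" is tracked by the once-control object. If the variable is in fact only touched while a lock is held, no such change is needed.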

> +/* Dynamically load the CUDA runtime library and initialize function

Not "CUDA runtime" but actually "CUDA driver" -- changed.

> +   pointers, return false if unsuccessful, true if successful.  */
> +static bool
> +init_cuda_lib (void)
> +{
> +  if (cuda_lib_inited != -1)
> +return cuda_lib_inited;
> +  const char *cuda_runtime_lib = "libcuda.so.1";
> +  void *h = dlopen (cuda_runtime_lib, RTLD_LAZY);
> +  cuda_lib_inited = false;
> +  if (h == NULL)
> +return false;

I'd like some GOMP_PLUGIN_debug output for this and the following "return
false" cases -- added.

> +# undef CUDA_ONE_CALL
> +# define CUDA_ONE_CALL(call) CUDA_ONE_CALL_1 (call)
> +# define CUDA_ONE_CALL_1(call) \
> +  cuda_lib.call = dlsym (h, #call);  \
> +  if (cuda_lib.call == NULL) \
> +return false;
> +  CUDA_CALLS
> +  cuda_lib_inited = true;
> +  return true;
>  }

> --- libgomp/plugin/cuda/cuda.h.jj 2017-01-13 15:58:00.966544147 +0100
> +++ libgomp/plugin/cuda/cuda.h  2017-01-13 17:02:47.355817896 +0100

> +#define CUDA_VERSION 8000

Does that make it compatible with CUDA 8.0 (and later) only?  (Not yet

Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-01-19 Thread Jakub Jelinek
On Thu, Jan 19, 2017 at 06:09:35PM +0300, Alexander Monakov wrote:
> > -#ifdef __LP64__
> > +#if defined(__LP64__) || defined(_WIN64)
> > 
> > (that is the right define for 64-bit MinGW, right?).
> 
> Yes, _WIN64; libsanitizer has a similar test.  Alternatively, I guess,
> 
>   #if __SIZEOF_POINTER__ == 8
> 
> > Otherwise, I think using uintptr_t is a problem, because we'd need to
> > #include  (the header only includes ).
> 
> Note that plugin-nvptx.c already includes .  But, anyway, I agree that
> there's value in defining the exact type via the #if.

I've committed then.

2017-01-19  Jakub Jelinek  

* plugin/cuda/cuda.h (CUdeviceptr): Typedef to unsigned long long even
for _WIN64.

--- libgomp/plugin/cuda/cuda.h  (revision 244570)
+++ libgomp/plugin/cuda/cuda.h  (working copy)
@@ -35,7 +35,7 @@ libcuda.so.1 are not available.  */
 
 typedef void *CUcontext;
 typedef int CUdevice;
-#ifdef __LP64__
+#if defined(__LP64__) || defined(_WIN64)
 typedef unsigned long long CUdeviceptr;
 #else
 typedef unsigned CUdeviceptr;


Jakub


Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-01-19 Thread Alexander Monakov
On Wed, 18 Jan 2017, Jakub Jelinek wrote:
> On Wed, Jan 18, 2017 at 10:52:32PM +0300, Alexander Monakov wrote:
> > Sorry for not noticing this earlier, but ...
> > 
> > > +#ifdef __LP64__
> > > +typedef unsigned long long CUdeviceptr;
> > > +#else
> > > +typedef unsigned CUdeviceptr;
> > > +#endif
> > 
> > I think this #ifdef doesn't do the right thing on MinGW.
> > Would it be fine to simplify it?  In my code I have
> > 
> >   typedef uintptr_t CUdeviceptr;
> 
> I think it depends on whether we want to use CUdeviceptr typed variables
> in printf-like format strings or C++ overloading; then the exact
> type is significant, and we should probably go for
> 
> -#ifdef __LP64__
> +#if defined(__LP64__) || defined(_WIN64)
> 
> (that is the right define for 64-bit MinGW, right?).

Yes, _WIN64; libsanitizer has a similar test.  Alternatively, I guess,

  #if __SIZEOF_POINTER__ == 8

> Otherwise, I think using uintptr_t is a problem, because we'd need to
> #include  (the header only includes ).

Note that plugin-nvptx.c already includes .  But, anyway, I agree that
there's value in defining the exact type via the #if.
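
To make the "exact type" point concrete, here is a small sketch. The typedef block is the one from the committed change; `format_devptr` is a hypothetical helper added only for illustration. Because the typedef resolves to a fixed builtin type on each target, a single cast to unsigned long long keeps the printf format valid in both branches, which would be less predictable with uintptr_t.

```c
#include <assert.h>
#include <stdio.h>

/* The check as committed: LP64 targets plus 64-bit Windows (LLP64).  */
#if defined(__LP64__) || defined(_WIN64)
typedef unsigned long long CUdeviceptr;
#else
typedef unsigned CUdeviceptr;
#endif

/* Hypothetical helper: render a device pointer as hex.  The cast to
   unsigned long long matches the "%llx" format for either typedef.  */
static int
format_devptr (char *buf, size_t size, CUdeviceptr p)
{
  assert (buf != NULL);
  return snprintf (buf, size, "0x%llx", (unsigned long long) p);
}
```

On a target where neither macro is defined, the 32-bit branch is taken and the cast simply widens the value before printing.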

Alexander


Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-01-18 Thread Jakub Jelinek
On Wed, Jan 18, 2017 at 10:52:32PM +0300, Alexander Monakov wrote:
> Sorry for not noticing this earlier, but ...
> 
> > +#ifdef __LP64__
> > +typedef unsigned long long CUdeviceptr;
> > +#else
> > +typedef unsigned CUdeviceptr;
> > +#endif
> 
> I think this #ifdef doesn't do the right thing on MinGW.
> Would it be fine to simplify it?  In my code I have
> 
>   typedef uintptr_t CUdeviceptr;

I think it depends on whether we want to use CUdeviceptr typed variables
in printf-like format strings or C++ overloading; then the exact
type is significant, and we should probably go for

-#ifdef __LP64__
+#if defined(__LP64__) || defined(_WIN64)

(that is the right define for 64-bit MinGW, right?).

Otherwise, I think using uintptr_t is a problem, because we'd need to
#include  (the header only includes ).

Another option is
typedef __UINTPTR_TYPE__ CUdeviceptr;

Jakub


Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-01-18 Thread Alexander Monakov
Hello Jakub,

Sorry for not noticing this earlier, but ...

> +#ifdef __LP64__
> +typedef unsigned long long CUdeviceptr;
> +#else
> +typedef unsigned CUdeviceptr;
> +#endif

I think this #ifdef doesn't do the right thing on MinGW.
Would it be fine to simplify it?  In my code I have

  typedef uintptr_t CUdeviceptr;

Thanks.
Alexander


Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-01-16 Thread Jeff Law

On 01/13/2017 11:28 AM, Jakub Jelinek wrote:
> On Fri, Jan 13, 2017 at 06:19:02PM +, Joseph Myers wrote:
> > > --- libgomp/plugin/cuda/cuda.h.jj   2017-01-13 15:58:00.966544147 +0100
> > > +++ libgomp/plugin/cuda/cuda.h  2017-01-13 17:02:47.355817896 +0100
> > > @@ -0,0 +1,174 @@
> > > +/* CUDA API description.
> > > +   Copyright (C) 2017 Free Software Foundation, Inc.
> > > +
> > > +This file is part of GCC.
> > > +
> > > +GCC is free software; you can redistribute it and/or modify
> > > +it under the terms of the GNU General Public License as published by
> > > +the Free Software Foundation; either version 3, or (at your option)
> > > +any later version.
> > > +
> > > +GCC is distributed in the hope that it will be useful,
> > > +but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > +GNU General Public License for more details.
> > > +
> > > +You should have received a copy of the GNU General Public License
> > > +along with GCC; see the file COPYING3.  If not see
> > > +.
> >
> > The new file should presumably have the runtime license exception.
>
> Agreed (though, most likely the file isn't copyrightable anyway, but
> we use copyright boilerplate for various files that might not be
> copyrightable).  But we should use it not just for cuda.h, but also
> for hsa.h and hsa_ext_finalize.h (CCing Martin who has added those).
>
> 2017-01-13  Jakub Jelinek  
>
> * plugin/cuda/cuda.h: Add GCC runtime library exception.
> * plugin/hsa.h: Likewise.
> * plugin/hsa_ext_finalize.h: Likewise.

Yea, seems like an oversight.  Certainly the intention is that using
cuda & hsa in and of itself doesn't require the resulting executable to
be GPL licensed.


jeff



Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-01-13 Thread Jakub Jelinek
On Fri, Jan 13, 2017 at 06:19:02PM +, Joseph Myers wrote:
> > --- libgomp/plugin/cuda/cuda.h.jj   2017-01-13 15:58:00.966544147 +0100
> > +++ libgomp/plugin/cuda/cuda.h  2017-01-13 17:02:47.355817896 +0100
> > @@ -0,0 +1,174 @@
> > +/* CUDA API description.
> > +   Copyright (C) 2017 Free Software Foundation, Inc.
> > +
> > +This file is part of GCC.
> > +
> > +GCC is free software; you can redistribute it and/or modify
> > +it under the terms of the GNU General Public License as published by
> > +the Free Software Foundation; either version 3, or (at your option)
> > +any later version.
> > +
> > +GCC is distributed in the hope that it will be useful,
> > +but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +GNU General Public License for more details.
> > +
> > +You should have received a copy of the GNU General Public License
> > +along with GCC; see the file COPYING3.  If not see
> > +<http://www.gnu.org/licenses/>.
> 
> The new file should presumably have the runtime license exception.

Agreed (though, most likely the file isn't copyrightable anyway, but
we use copyright boilerplate for various files that might not be
copyrightable).  But we should use it not just for cuda.h, but also
for hsa.h and hsa_ext_finalize.h (CCing Martin who has added those).

2017-01-13  Jakub Jelinek  

* plugin/cuda/cuda.h: Add GCC runtime library exception.
* plugin/hsa.h: Likewise.
* plugin/hsa_ext_finalize.h: Likewise.

--- libgomp/plugin/cuda/cuda.h.jj   2017-01-13 17:02:47.0 +0100
+++ libgomp/plugin/cuda/cuda.h  2017-01-13 19:21:06.307547518 +0100
@@ -13,8 +13,13 @@ but WITHOUT ANY WARRANTY; without even t
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 GNU General Public License for more details.
 
-You should have received a copy of the GNU General Public License
-along with GCC; see the file COPYING3.  If not see
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 <http://www.gnu.org/licenses/>.
 
 This header provides the minimum amount of typedefs, enums and function
--- libgomp/plugin/hsa.h.jj 2017-01-13 12:07:56.0 +0100
+++ libgomp/plugin/hsa.h  2017-01-13 19:21:37.230153569 +0100
@@ -13,8 +13,13 @@ but WITHOUT ANY WARRANTY; without even t
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 GNU General Public License for more details.
 
-You should have received a copy of the GNU General Public License
-along with GCC; see the file COPYING3.  If not see
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 <http://www.gnu.org/licenses/>.
 
 The contents of the file was created by extracting data structures, enum,
--- libgomp/plugin/hsa_ext_finalize.h.jj  2017-01-13 12:07:56.0 +0100
+++ libgomp/plugin/hsa_ext_finalize.h   2017-01-13 19:22:05.388794833 +0100
@@ -13,8 +13,13 @@ but WITHOUT ANY WARRANTY; without even t
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 GNU General Public License for more details.
 
-You should have received a copy of the GNU General Public License
-along with GCC; see the file COPYING3.  If not see
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 <http://www.gnu.org/licenses/>.
 
 The contents of the file was created by extracting data structures, enum,


Jakub


Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-01-13 Thread Joseph Myers
On Fri, 13 Jan 2017, Jakub Jelinek wrote:

> --- libgomp/plugin/cuda/cuda.h.jj 2017-01-13 15:58:00.966544147 +0100
> +++ libgomp/plugin/cuda/cuda.h  2017-01-13 17:02:47.355817896 +0100
> @@ -0,0 +1,174 @@
> +/* CUDA API description.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or (at your option)
> +any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +GNU General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.

The new file should presumably have the runtime license exception.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-01-13 Thread Jakub Jelinek
Hi!

This is something that has been discussed already during the last Cauldron.
Especially for distributions it is undesirable to need to have proprietary
CUDA libraries and headers installed when building GCC.

These two patches allow building GCC without CUDA around, in such a way that
it can later offload to PTX if libcuda.so.1 is installed (together with the
NVidia kernel driver; I haven't tried whether it works with nouveau, nor tried
any free CUDA replacements).  This is important because the former step can be
done when building the distribution packages, while the latter is a decision
of the user.  If the nvptx libgomp plugin is installed, but libcuda.so.1
can't be found, then the plugin behaves as if there are no PTX devices
available.  In order to configure gcc to load libcuda.so.1 dynamically,
one has to configure it either with --without-cuda-driver, or without any of the
--with-cuda-driver=/--with-cuda-driver-lib=/--with-cuda-driver-include=
options, if cuda.h and -lcuda aren't found in the default locations.
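
The fallback behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the function name `sketch_get_num_devices` and the two function-pointer typedefs are hypothetical, and the sketch assumes that `CUresult` is an int-sized enum in which 0 means success. It tries to load the driver library at run time and reports zero devices if the library or one of its symbols is missing.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Assumed shapes of the two driver entry points used here; both return a
   CUresult, treated as an int with 0 meaning success.  */
typedef int (*sketch_cuInit_fn) (unsigned int flags);
typedef int (*sketch_cuDeviceGetCount_fn) (int *count);

/* Return the number of devices reported by the driver library LIBNAME,
   or 0 if the library or a needed symbol is unavailable.  */
static int
sketch_get_num_devices (const char *libname)
{
  void *handle = dlopen (libname, RTLD_LAZY);
  if (handle == NULL)
    return 0;   /* No driver installed: behave as if there are no devices.  */

  sketch_cuInit_fn init = (sketch_cuInit_fn) dlsym (handle, "cuInit");
  sketch_cuDeviceGetCount_fn get_count
    = (sketch_cuDeviceGetCount_fn) dlsym (handle, "cuDeviceGetCount");

  int count = 0;
  if (init == NULL || get_count == NULL
      || init (0) != 0 || get_count (&count) != 0)
    count = 0;

  /* The handle is left open on purpose: a real plugin would keep using
     the library for the lifetime of the process.  */
  return count;
}
```

A caller would pass "libcuda.so.1"; on older glibc versions the program has to be linked with -ldl.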

I've talked to our lawyers and they said that the cuda.h header included
in this patch doesn't infringe anyone's copyright or is otherwise a fair
use.  It has been created by gathering all the cu*/CU* symbols from the
current and older nvptx plugin and some oacc tests, then stubbing the
pointer-ish typedefs, grabbing most enum values and function prototypes from
https://raw.githubusercontent.com/shinpei0208/gdev/master/cuda/driver/cuda.h
and verifying that the assembly generated with that header matches the
assembly generated when compiling against NVidia's cuda.h.

The nvptx-tools change to the nvptx-none-as binary is an important part of
this, although it is not a change to gcc itself.  The problem is that by
default nvptx-none-as was calling the ptxas program to verify that the
assembly is correct, which of course doesn't work very well when the
proprietary ptxas is not available.  So the patch makes it always invoke
ptxas only if a new --verify option is used; if --no-verify is used, then as
before it is not invoked; and without either of these options, ptxas is
invoked if it is found in $PATH, and otherwise only minimal verification is
done, good enough for gcc/configure purposes (it turned out to be
sufficient to error out if the .version directive is not the first
non-comment token; ptxas errors on that too).
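
As an illustration of the two ideas in the paragraph above, here is a small sketch of the three-way choice between --verify, --no-verify and the default, and of a "minimal verification" that only checks that .version is the first non-comment token. All names (`sketch_verify_mode`, `sketch_should_run_ptxas`, `sketch_version_is_first`) are hypothetical and this is not the nvptx-tools implementation; it assumes PTX comments are written as `//` or `/* ... */`.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

enum sketch_verify_mode
{
  SKETCH_VERIFY_DEFAULT,   /* neither option given */
  SKETCH_VERIFY_ALWAYS,    /* --verify */
  SKETCH_VERIFY_NEVER      /* --no-verify */
};

/* Decide whether ptxas should be invoked for the given mode.  */
static bool
sketch_should_run_ptxas (enum sketch_verify_mode mode, bool ptxas_in_path)
{
  switch (mode)
    {
    case SKETCH_VERIFY_ALWAYS:
      return true;
    case SKETCH_VERIFY_NEVER:
      return false;
    default:
      return ptxas_in_path;
    }
}

/* Skip whitespace and comments; return a pointer to the first token.  */
static const char *
sketch_skip_blanks_and_comments (const char *p)
{
  for (;;)
    {
      while (isspace ((unsigned char) *p))
        p++;
      if (p[0] == '/' && p[1] == '/')
        {
          while (*p != '\0' && *p != '\n')
            p++;
        }
      else if (p[0] == '/' && p[1] == '*')
        {
          const char *end = strstr (p + 2, "*/");
          if (end == NULL)
            return p + strlen (p);   /* Unterminated comment: no token.  */
          p = end + 2;
        }
      else
        return p;
    }
}

/* Return true if the first non-comment token of PTX is ".version".  */
static bool
sketch_version_is_first (const char *ptx)
{
  static const char directive[] = ".version";
  size_t len = sizeof directive - 1;
  const char *p = sketch_skip_blanks_and_comments (ptx);

  if (strncmp (p, directive, len) != 0)
    return false;
  /* Require a token boundary so that ".versionx" is rejected.  */
  return p[len] == '\0' || isspace ((unsigned char) p[len]);
}
```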

Tested on x86_64-linux, with CUDA around
(--with-cuda-driver=/usr/local/cuda-8.0) as well as without, and tested
in that case also both with libcuda.so.1 available and without.

Can the OpenACC hackers as well as Alex (or his colleagues) please also test
it?  Do you have any problems with the GCC patch (if not, I'd commit it
next week before stage3 closes)?  Is the nvptx-tools patch ok (and if so,
can you commit it; I guess I could create a github pull request for this
if needed).

P.S.: I'm not sure what the cuInit call in nvptx_init is good for; doesn't
libgomp always call nvptx_get_num_devices first and thus call cuInit already
there?  (I've kept it in the patch, though.)

2017-01-13  Jakub Jelinek  

* plugin/configfrag.ac: For --without-cuda-driver don't initialize
CUDA_DRIVER_INCLUDE nor CUDA_DRIVER_LIB.  If both
CUDA_DRIVER_INCLUDE and CUDA_DRIVER_LIB are empty and linking small
cuda program fails, define PLUGIN_NVPTX_DYNAMIC to 1 and use
plugin/include/cuda as include dir and -ldl instead of -lcuda as
library to link ptx plugin against.
* plugin/plugin-nvptx.c: Include dlfcn.h if PLUGIN_NVPTX_DYNAMIC.
(CUDA_CALLS): Define.
(cuda_lib, cuda_lib_inited): New variables.
(init_cuda_lib): New function.
(CUDA_CALL_PREFIX): Define.
(CUDA_CALL_ERET, CUDA_CALL_ASSERT): Use CUDA_CALL_PREFIX.
(CUDA_CALL): Use FN instead of (FN).
(CUDA_CALL_NOCHECK): Define.
(cuda_error, fini_streams_for_device, select_stream_for_async,
nvptx_attach_host_thread_to_device, nvptx_open_device, link_ptx,
event_gc, nvptx_exec, nvptx_async_test, nvptx_async_test_all,
nvptx_wait_all, nvptx_set_clocktick, GOMP_OFFLOAD_unload_image,
nvptx_stacks_alloc, nvptx_stacks_free, GOMP_OFFLOAD_run): Use
CUDA_CALL_NOCHECK.
(nvptx_init): Call init_cuda_lib, if it fails, return false.  Use
CUDA_CALL_NOCHECK.
(nvptx_get_num_devices): Call init_cuda_lib, if it fails, return 0.
Use CUDA_CALL_NOCHECK.
* plugin/cuda/cuda.h: New file.
* config.h.in: Regenerated.
* configure: Regenerated.
* Makefile.in: Regenerated.
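
The ChangeLog mentions a CUDA_CALLS list and an init_cuda_lib function. One common way to combine a list of names with run-time symbol lookup is an X-macro, sketched below. This is an assumption about the general technique, not a copy of the patch: the macro and type names are hypothetical, the list holds only two entries, and the members are plain `void *` rather than typed function pointers.

```c
#include <dlfcn.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-entry list; a real plugin would list every driver
   function it calls.  */
#define SKETCH_CUDA_CALLS \
  SKETCH_ONE_CALL (cuInit) \
  SKETCH_ONE_CALL (cuDeviceGetCount)

/* First expansion: one untyped pointer member per listed function.  */
#define SKETCH_ONE_CALL(name) void *name;
struct sketch_cuda_lib
{
  SKETCH_CUDA_CALLS
};
#undef SKETCH_ONE_CALL

/* Fill LIB from HANDLE (as returned by dlopen).  Return false if HANDLE
   is NULL or if any listed symbol cannot be found.  */
static bool
sketch_resolve_all (void *handle, struct sketch_cuda_lib *lib)
{
  if (handle == NULL)
    return false;

  /* Second expansion: look up each name and stop at the first miss.  */
#define SKETCH_ONE_CALL(name) \
  lib->name = dlsym (handle, #name); \
  if (lib->name == NULL) \
    return false;
  SKETCH_CUDA_CALLS
#undef SKETCH_ONE_CALL

  return true;
}
```

Keeping the names in a single list means that adding a function requires one new line, and both the struct member and its lookup follow from it.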

--- libgomp/plugin/configfrag.ac.jj 2017-01-13 12:07:56.0 +0100
+++ libgomp/plugin/configfrag.ac  2017-01-13 17:33:26.608240936 +0100
@@ -58,10 +58,12 @@ AC_ARG_WITH(cuda-driver-include,
 AC_ARG_WITH(cuda-driver-lib,
[AS_HELP_STRING([--with-cuda-driver-lib=PATH],
[specify directory for the installed CUDA driver library])])
-if test "x$with_cuda_driver" != x; then
-