Re: [OMPI users] cuda at Open MPI runtime

2017-12-21 Thread Sylvain Jeaugey
Hi David, Open MPI will try to load libcuda during MPI_Init when you are using either the openib BTL or the smcuda BTL (and CUDA support has been compiled in). It looks in /usr/lib64 and, I guess, in LD_LIBRARY_PATH as well. You should not need anything other than libcuda for the Open
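
A quick way to see whether the dynamic loader can find libcuda on a given node is to try opening it yourself. The sketch below is only an illustration and does not reproduce Open MPI's own lookup logic; the soname "libcuda.so.1" and the helper name can_load are assumptions, not something stated in the thread.

```python
import ctypes


def can_load(library_name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False
    return True


if __name__ == "__main__":
    # Assumption: "libcuda.so.1" is the soname of the NVIDIA driver library
    # on this system. Adjust it if your installation differs.
    name = "libcuda.so.1"
    print(f"{name}: {'found' if can_load(name) else 'not found'}")
```

If this reports "not found" on a compute node, adding the directory that holds libcuda to LD_LIBRARY_PATH and re-running is a cheap first check.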

Re: [OMPI users] OpenMPI installation issue or mpi4py compatibility problem

2017-09-21 Thread Sylvain Jeaugey
The issue is related to OpenCL, not NVML. So the correct export would be "export enable_opencl=no" (you may want to "export enable_nvml=no" as well). On 09/21/2017 12:32 AM, Tim Jim wrote: Hi, I tried as you suggested: export nvml_enable=no, then reconfigured and ran make all install

Re: [OMPI users] Build Open-MPI without OpenCL support

2017-09-08 Thread Sylvain Jeaugey
To solve the undefined references to cudaMalloc and cudaFree, you need to link the CUDA runtime. So you should replace -lcuda with -lcudart. For the OpenCL undefined references, I don't know where those are coming from ... could it be that hwloc is compiling OpenCL support but not adding
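
The -lcuda / -lcudart distinction follows the CUDA naming convention: runtime API functions start with "cuda" (cudaMalloc, cudaFree) and live in libcudart, while driver API functions start with "cu" (cuInit, cuMemAlloc) and live in libcuda. The sketch below turns a list of undefined symbols into the link flags they probably need. It is a naming heuristic only, the function names are hypothetical, and it does not cover other CUDA libraries such as cuBLAS.

```python
def cuda_library_for(symbol: str) -> str:
    """Guess which CUDA library exports a symbol, from its name prefix only."""
    if symbol.startswith("cuda"):
        return "cudart"  # runtime API, e.g. cudaMalloc, cudaFree
    if symbol.startswith("cu"):
        return "cuda"    # driver API, e.g. cuInit, cuMemAlloc
    raise ValueError(f"{symbol!r} does not look like a CUDA API symbol")


def link_flags(symbols):
    """Return the sorted, de-duplicated -l flags for the given symbols."""
    libs = sorted({cuda_library_for(s) for s in symbols})
    return [f"-l{lib}" for lib in libs]


if __name__ == "__main__":
    # The two symbols reported as undefined in the message above.
    print(link_flags(["cudaMalloc", "cudaFree"]))
```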

Re: [OMPI users] openmpi-master-201708110239-03544d7: NVIDIA: no NVIDIA devices found

2017-08-14 Thread Sylvain Jeaugey
Hi Siegmar, This has been fixed in the driver some time ago. Getting the latest driver should solve your problem. You can check the driver version with nvidia-smi, then go to http://www.nvidia.com/Download/index.aspx to get the latest. Sylvain On 08/14/2017 12:46 AM, Siegmar Gross wrote:

Re: [OMPI users] Crash in libopen-pal.so

2017-06-19 Thread Sylvain Jeaugey
Justin, can you try setting mpi_leave_pinned to 0 to disable libptmalloc2 and confirm this is related to ptmalloc? Thanks, Sylvain On 06/19/2017 03:05 PM, Justin Luitjens wrote: I have an application that works on other systems but on the current system I’m running I’m seeing the following
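
MCA parameters such as mpi_leave_pinned can be set on the mpirun command line (--mca mpi_leave_pinned 0) or through an environment variable named OMPI_MCA_<parameter>. The sketch below builds such an environment for a launcher script; the helper name env_with_mca and the application path in the comment are hypothetical.

```python
import os


def env_with_mca(params, base=None):
    """Return a copy of the environment with OMPI_MCA_<name> set per params."""
    env = dict(os.environ if base is None else base)
    for name, value in params.items():
        env[f"OMPI_MCA_{name}"] = str(value)
    return env


if __name__ == "__main__":
    env = env_with_mca({"mpi_leave_pinned": 0})
    print(env["OMPI_MCA_mpi_leave_pinned"])
    # The environment could then be handed to a launcher, for example:
    #   subprocess.run(["mpirun", "-np", "2", "./my_app"], env=env)
    # where "./my_app" stands in for the real application.
```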

Re: [OMPI users] Suppressing Nvidia warnings

2017-05-05 Thread Sylvain Jeaugey
this message with 2.1.0 built against CUDA 8.0. We're using libcuda.so.375.39. Has anyone had any luck suppressing these messages? Thanks, Ben On 27 Mar 2017, at 7:13 pm, Roland Fehrenbacher <r...@q-leap.de> wrote: "SJ" == Sylvain Jeaugey <sjeau...@

Re: [OMPI users] OpenMPI 2.1.0 + PGI 17.3 = asm test failures

2017-05-01 Thread Sylvain Jeaugey
I also saw IBM and ignored the email :-) Thanks for reporting the issue, I passed it to the PGI team. On 05/01/2017 11:49 AM, Prentice Bisbal wrote: Jeff, You probably were thrown off when I said I've only really seen this problem when people didn't cross-compile correctly on the Blue Gene/P

Re: [OMPI users] Suppressing Nvidia warnings

2017-03-24 Thread Sylvain Jeaugey
, until I find the code responsible for that, I can't say for sure. I'm sorry it's taking so long -- I'm on it though. On 03/24/2017 01:56 PM, Roland Fehrenbacher wrote: "SJ" == Sylvain Jeaugey <sjeau...@nvidia.com> writes: Hi Sylvain, SJ> Hi Roland, I can't find this mess

Re: [OMPI users] "Warning :: opal_list_remove_item" with openmpi-2.1.0rc4

2017-03-21 Thread Sylvain Jeaugey
If you installed CUDA libraries and includes in /usr, then it's not surprising hwloc finds them even without defining CFLAGS. I'm just saying I think you won't get the error message if Open MPI finds CUDA but hwloc does not. On 03/21/2017 11:05 AM, Roland Fehrenbacher wrote: "SJ"

Re: [OMPI users] "Warning :: opal_list_remove_item" with openmpi-2.1.0rc4

2017-03-21 Thread Sylvain Jeaugey
Hi Siegmar, I think this "NVIDIA : ..." error message comes from the fact that you add CUDA includes in the C*FLAGS. If you just use --with-cuda, Open MPI will compile with CUDA support, but hwloc will not find CUDA and that will be fine. However, setting CUDA in CFLAGS will make hwloc find

Re: [OMPI users] Suppressing Nvidia warnings

2017-03-16 Thread Sylvain Jeaugey
Hi Roland, I can't find this message in the Open MPI source code. Could it be hwloc? Some other library you are using? Sylvain On 03/16/2017 04:23 AM, Roland Fehrenbacher wrote: Hi, OpenMPI 2.0.2 built with cuda support brings up lots of warnings like NVIDIA: no NVIDIA devices found

Re: [OMPI users] Unable to compile OpenMPI 1.10.3 with CUDA

2016-10-28 Thread Sylvain Jeaugey
On 10/28/2016 10:33 AM, Craig tierney wrote: Sylvain, If I do not set --with-cuda, I get: configure:9964: result: no configure:10023: checking whether CU_POINTER_ATTRIBUTE_SYNC_MEMOPS is declared configure:10023: gcc -c -DNDEBUG conftest.c >&5 conftest.c:83:19: fatal error: /cuda.h: No

Re: [OMPI users] Unable to compile OpenMPI 1.10.3 with CUDA

2016-10-27 Thread Sylvain Jeaugey
I guess --with-cuda is disabling the default CUDA path, which is /usr/local/cuda. So you should either not set --with-cuda or set --with-cuda=$CUDA_HOME (no include). Sylvain On 10/27/2016 03:23 PM, Craig tierney wrote: Hello, I am trying to build OpenMPI 1.10.3 with CUDA but I am unable to
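
The "no include" advice means the option should point at the CUDA installation root, with configure looking for include/cuda.h underneath it. The sketch below normalizes a path that way and checks for the header before producing the flag. The helper name with_cuda_flag is hypothetical, and the --with-cuda=DIR spelling is the usual autoconf form rather than something quoted from the thread.

```python
import os


def with_cuda_flag(cuda_home: str) -> str:
    """Build a --with-cuda=DIR flag from a CUDA root (or its include dir)."""
    root = os.path.normpath(cuda_home)
    # Accept a path that mistakenly ends in "include" and step back to the root.
    if os.path.basename(root) == "include":
        root = os.path.dirname(root)
    header = os.path.join(root, "include", "cuda.h")
    if not os.path.isfile(header):
        raise FileNotFoundError(f"{header} not found")
    return f"--with-cuda={root}"


if __name__ == "__main__":
    # /usr/local/cuda is the default location mentioned in the message above.
    try:
        print(with_cuda_flag(os.environ.get("CUDA_HOME", "/usr/local/cuda")))
    except FileNotFoundError as err:
        print(f"CUDA headers not found: {err}")
```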

Re: [OMPI users] openmpi 1.10.2 and PGI 15.9

2016-07-19 Thread Sylvain Jeaugey
As a workaround, you can also try adding -noswitcherror to PGCC flags. On 07/11/2016 03:52 PM, Åke Sandgren wrote: Looks like you are compiling with slurm support. If so, you need to remove the "-pthread" from libslurm.la and libpmi.la On 07/11/2016 02:54 PM, Michael Di Domenico wrote: I'm

Re: [OMPI users] Why do I need a C++ linker while linking in MPI C code with CUDA?

2016-03-23 Thread Sylvain Jeaugey
Hi Durga, Sorry for the late reply and thanks for reporting that issue. As Rayson mentioned, CUDA is intrinsically C++ and indeed uses the host C++ compiler. Hence linking MPI + CUDA code may need to use mpic++. It happens to work with mpicc on various platforms where the libstdc++ is

Re: [OMPI users] configuring open mpi 10.1.2 with cuda on NVIDIA TK1

2016-01-22 Thread Sylvain Jeaugey
for the suggestion. I think I can say 'case closed' Spencer *From:* users <users-boun...@open-mpi.org> on behalf of Sylvain Jeaugey <sjeau...@nvidia.com> *Sent:* Friday, January 22, 2016 11:34 AM *To:* us...@

Re: [OMPI users] configuring open mpi 10.1.2 with cuda on NVIDIA TK1

2016-01-22 Thread Sylvain Jeaugey
Hi Spencer, Could you be more specific about what fails? Did the configure stop at some point? Or is it a compile error during the build? I'm not sure the errors you are seeing in config.log are actually the real problem (I'm seeing the same error traces on a perfectly working machine).