Re: [OMPI users] Problem building OpenMPI with CUDA 8.0

2016-10-24 Thread Gilles Gouaillardet

Brice,


Unless you want to enable or disable NVML at runtime, and assuming Open MPI 
does not need NVML, the easiest workaround IMHO is to update

https://github.com/open-mpi/ompi/blob/master/opal/mca/hwloc/hwloc1113/configure.m4

and add the one-liner

enable_nvml=no


A better option could be to update 
https://github.com/open-mpi/ompi/blob/master/opal/mca/hwloc/configure.m4
and pass the --enable-nvml option from Open MPI down to hwloc.
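
To illustrate the difference between the two options: the one-liner forces NVML off unconditionally, whereas passing the option down would let an explicit user choice win and fall back to "no" only when nothing was requested. Below is a minimal shell sketch of that defaulting logic. The helper name resolve_nvml is hypothetical (it is not part of Open MPI or hwloc), and the default of "no" is an assumption.

```shell
#!/bin/sh
# Hypothetical helper illustrating the defaulting logic; not Open MPI/hwloc code.
# $1: value requested by the user ("yes" or "no"), or empty if no option was given.
resolve_nvml() {
    case "$1" in
        yes|no) printf '%s\n' "$1" ;;   # an explicit choice wins
        "")     printf 'no\n' ;;        # assumed default: NVML disabled
        *)      echo "unexpected value: $1" >&2; return 1 ;;
    esac
}

# Use whatever is already in the environment, if anything.
enable_nvml=$(resolve_nvml "${enable_nvml:-}")
echo "enable_nvml=${enable_nvml}"
```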


Cheers,


Gilles




On 10/24/2016 4:45 PM, Brice Goglin wrote:

FWIW, I am still open to implementing something in hwloc to work around this.
It could be a shell variable such as HWLOC_DISABLE_NVML=yes for all our major
configured dependencies.

Brice



On 24/10/2016 02:12, Gilles Gouaillardet wrote:

Justin,


iirc, NVML is only used by hwloc (i.e. not by CUDA) and there is no
real benefit to having it.

as a workaround, you can

export enable_nvml=no

and then configure && make install

Cheers,

Gilles

On 10/20/2016 12:49 AM, Jeff Squyres (jsquyres) wrote:

Justin --

Fair point.  Can you work with Sylvain Jeaugey (at Nvidia) to submit
a pull request for this functionality?

Thanks.



On Oct 18, 2016, at 2:26 PM, Justin Luitjens <jluitj...@nvidia.com>
wrote:

After looking into this a bit more, it appears that the issue is that I am
building on a head node which does not have the driver installed.
Building on a back-end node resolves this issue.  In CUDA 8.0 the NVML
stubs can be found in the toolkit at the following path:
${CUDA_HOME}/lib64/stubs

For 8.0 I’d suggest updating the configure/make scripts to look
for NVML there and link in the stubs.  This way the build depends
only on the toolkit, not on the driver being installed.
   Thanks,
Justin
   From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of
Justin Luitjens
Sent: Tuesday, October 18, 2016 9:53 AM
To: users@lists.open-mpi.org
Subject: [OMPI users] Problem building OpenMPI with CUDA 8.0
   I have the release version of CUDA 8.0 installed and am trying to
build OpenMPI.
   Here is my configure and build line:
   ./configure --prefix=$PREFIXPATH --with-cuda=$CUDA_HOME --with-tm=
--with-openib= && make && sudo make install
   Where CUDA_HOME points to the cuda install path.
   When I run the above command it builds for quite a while but
eventually errors out with this:
   make[2]: Entering directory
`/home/jluitjens/Perforce/jluitjens_dtlogin_p4sw/sw/devrel/DevtechCompute/Internal/Tools/dtlogin/scripts/mpi/openmpi-1.10.1-gcc5.0_2014_11-cuda8.0/opal/tools/wrappers'
CCLD opal_wrapper
../../../opal/.libs/libopen-pal.so: undefined reference to
`nvmlInit_v2'
../../../opal/.libs/libopen-pal.so: undefined reference to
`nvmlDeviceGetHandleByIndex_v2'
../../../opal/.libs/libopen-pal.so: undefined reference to
`nvmlDeviceGetCount_v2'
 Any idea what I might need to change to get around this error?
   Thanks,
Justin
This email message is for the sole use of the intended recipient(s)
and may contain confidential information.  Any unauthorized review,
use, disclosure or distribution is prohibited.  If you are not the
intended recipient, please contact the sender by reply email and
destroy all copies of the original message.
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] Problem building OpenMPI with CUDA 8.0

2016-10-24 Thread Brice Goglin
FWIW, I am still open to implementing something in hwloc to work around this.
It could be a shell variable such as HWLOC_DISABLE_NVML=yes for all our major
configured dependencies.

Brice



On 24/10/2016 02:12, Gilles Gouaillardet wrote:
> Justin,
>
>
> iirc, NVML is only used by hwloc (i.e. not by CUDA) and there is no
> real benefit to having it.
>
> as a workaround, you can
>
> export enable_nvml=no
>
> and then configure && make install
>
> Cheers,
>
> Gilles
>
> On 10/20/2016 12:49 AM, Jeff Squyres (jsquyres) wrote:
>> Justin --
>>
>> Fair point.  Can you work with Sylvain Jeaugey (at Nvidia) to submit
>> a pull request for this functionality?
>>
>> Thanks.
>>
>>
>>> On Oct 18, 2016, at 2:26 PM, Justin Luitjens <jluitj...@nvidia.com>
>>> wrote:
>>>
>>> After looking into this a bit more, it appears that the issue is that I am
>>> building on a head node which does not have the driver installed.
>>> Building on a back-end node resolves this issue.  In CUDA 8.0 the NVML
>>> stubs can be found in the toolkit at the following path:
>>> ${CUDA_HOME}/lib64/stubs
>>>
>>> For 8.0 I’d suggest updating the configure/make scripts to look
>>> for NVML there and link in the stubs.  This way the build depends
>>> only on the toolkit, not on the driver being installed.
>>>   Thanks,
>>> Justin
>>>   From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of
>>> Justin Luitjens
>>> Sent: Tuesday, October 18, 2016 9:53 AM
>>> To: users@lists.open-mpi.org
>>> Subject: [OMPI users] Problem building OpenMPI with CUDA 8.0
>>>   I have the release version of CUDA 8.0 installed and am trying to
>>> build OpenMPI.
>>>   Here is my configure and build line:
>>>   ./configure --prefix=$PREFIXPATH --with-cuda=$CUDA_HOME --with-tm=
>>> --with-openib= && make && sudo make install
>>>   Where CUDA_HOME points to the cuda install path.
>>>   When I run the above command it builds for quite a while but
>>> eventually errors out with this:
>>>   make[2]: Entering directory
>>> `/home/jluitjens/Perforce/jluitjens_dtlogin_p4sw/sw/devrel/DevtechCompute/Internal/Tools/dtlogin/scripts/mpi/openmpi-1.10.1-gcc5.0_2014_11-cuda8.0/opal/tools/wrappers'
>>>CCLD opal_wrapper
>>> ../../../opal/.libs/libopen-pal.so: undefined reference to
>>> `nvmlInit_v2'
>>> ../../../opal/.libs/libopen-pal.so: undefined reference to
>>> `nvmlDeviceGetHandleByIndex_v2'
>>> ../../../opal/.libs/libopen-pal.so: undefined reference to
>>> `nvmlDeviceGetCount_v2'
>>> Any idea what I might need to change to get around this error?
>>>   Thanks,
>>> Justin
>>
>


Re: [OMPI users] Problem building OpenMPI with CUDA 8.0

2016-10-23 Thread Gilles Gouaillardet

Justin,


iirc, NVML is only used by hwloc (i.e. not by CUDA) and there is no real 
benefit to having it.


as a workaround, you can

export enable_nvml=no

and then configure && make install
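
Spelled out as a script, the workaround above might look like the following sketch. It assumes the embedded hwloc configure logic honours an enable_nvml value that is already set in the environment. PREFIXPATH and CUDA_HOME are the variables from the original configure line, and the other options from that line are omitted here. The check for ./configure is only a guard added for illustration, so the snippet is harmless outside an Open MPI source tree.

```shell
#!/bin/sh
# Sketch: disable hwloc's NVML support for this build via the environment.
# Assumption: configure picks up a pre-set enable_nvml when no
# --enable-nvml/--disable-nvml option is given on the command line.
export enable_nvml=no

# Guard added for illustration: only build when run from a configured source tree.
if [ -x ./configure ]; then
    ./configure --prefix="$PREFIXPATH" --with-cuda="$CUDA_HOME" \
        && make && make install
else
    echo "no ./configure here; run this from the Open MPI source tree"
fi
```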

Cheers,

Gilles

On 10/20/2016 12:49 AM, Jeff Squyres (jsquyres) wrote:

Justin --

Fair point.  Can you work with Sylvain Jeaugey (at Nvidia) to submit a pull 
request for this functionality?

Thanks.



On Oct 18, 2016, at 2:26 PM, Justin Luitjens <jluitj...@nvidia.com> wrote:

After looking into this a bit more, it appears that the issue is that I am 
building on a head node which does not have the driver installed.  Building on 
a back-end node resolves this issue.  In CUDA 8.0 the NVML stubs can be found 
in the toolkit at the following path:  ${CUDA_HOME}/lib64/stubs

For 8.0 I’d suggest updating the configure/make scripts to look for NVML there 
and link in the stubs.  This way the build depends only on the toolkit, not on 
the driver being installed.
  
Thanks,

Justin
  
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Justin Luitjens

Sent: Tuesday, October 18, 2016 9:53 AM
To: users@lists.open-mpi.org
Subject: [OMPI users] Problem building OpenMPI with CUDA 8.0
  
I have the release version of CUDA 8.0 installed and am trying to build OpenMPI.
  
Here is my configure and build line:
  
./configure --prefix=$PREFIXPATH --with-cuda=$CUDA_HOME --with-tm= --with-openib= && make && sudo make install
  
Where CUDA_HOME points to the cuda install path.
  
When I run the above command it builds for quite a while but eventually errors out with this:
  
make[2]: Entering directory `/home/jluitjens/Perforce/jluitjens_dtlogin_p4sw/sw/devrel/DevtechCompute/Internal/Tools/dtlogin/scripts/mpi/openmpi-1.10.1-gcc5.0_2014_11-cuda8.0/opal/tools/wrappers'

   CCLD opal_wrapper
../../../opal/.libs/libopen-pal.so: undefined reference to `nvmlInit_v2'
../../../opal/.libs/libopen-pal.so: undefined reference to 
`nvmlDeviceGetHandleByIndex_v2'
../../../opal/.libs/libopen-pal.so: undefined reference to 
`nvmlDeviceGetCount_v2'
  
  
Any idea what I might need to change to get around this error?
  
Thanks,

Justin





Re: [OMPI users] Problem building OpenMPI with CUDA 8.0

2016-10-19 Thread Jeff Squyres (jsquyres)
Justin --

Fair point.  Can you work with Sylvain Jeaugey (at Nvidia) to submit a pull 
request for this functionality?

Thanks.


> On Oct 18, 2016, at 2:26 PM, Justin Luitjens <jluitj...@nvidia.com> wrote:
> 
> After looking into this a bit more, it appears that the issue is that I am 
> building on a head node which does not have the driver installed.  Building 
> on a back-end node resolves this issue.  In CUDA 8.0 the NVML stubs can be 
> found in the toolkit at the following path:  ${CUDA_HOME}/lib64/stubs
>  
> For 8.0 I’d suggest updating the configure/make scripts to look for NVML 
> there and link in the stubs.  This way the build depends only on the 
> toolkit, not on the driver being installed.
>  
> Thanks,
> Justin
>  
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Justin 
> Luitjens
> Sent: Tuesday, October 18, 2016 9:53 AM
> To: users@lists.open-mpi.org
> Subject: [OMPI users] Problem building OpenMPI with CUDA 8.0
>  
> I have the release version of CUDA 8.0 installed and am trying to build 
> OpenMPI.
>  
> Here is my configure and build line:
>  
> ./configure --prefix=$PREFIXPATH --with-cuda=$CUDA_HOME --with-tm= 
> --with-openib= && make && sudo make install
>  
> Where CUDA_HOME points to the cuda install path.  
>  
> When I run the above command it builds for quite a while but eventually 
> errors out with this:
>  
> make[2]: Entering directory 
> `/home/jluitjens/Perforce/jluitjens_dtlogin_p4sw/sw/devrel/DevtechCompute/Internal/Tools/dtlogin/scripts/mpi/openmpi-1.10.1-gcc5.0_2014_11-cuda8.0/opal/tools/wrappers'
>   CCLD opal_wrapper
> ../../../opal/.libs/libopen-pal.so: undefined reference to `nvmlInit_v2'
> ../../../opal/.libs/libopen-pal.so: undefined reference to 
> `nvmlDeviceGetHandleByIndex_v2'
> ../../../opal/.libs/libopen-pal.so: undefined reference to 
> `nvmlDeviceGetCount_v2'
>  
>  
> Any idea what I might need to change to get around this error?
>  
> Thanks,
> Justin


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/


Re: [OMPI users] Problem building OpenMPI with CUDA 8.0

2016-10-18 Thread Justin Luitjens
After looking into this a bit more, it appears that the issue is that I am 
building on a head node which does not have the driver installed.  Building on 
a back-end node resolves this issue.  In CUDA 8.0 the NVML stubs can be found 
in the toolkit at the following path:  ${CUDA_HOME}/lib64/stubs

For 8.0 I'd suggest updating the configure/make scripts to look for NVML there 
and link in the stubs.  This way the build depends only on the toolkit, not on 
the driver being installed.
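
Until the configure scripts do this themselves, one possible manual approach is to point the linker at the stub directory. The sketch below only checks for the stub and prints a suggested configure invocation. It assumes the stub is named libnvidia-ml.so; the /usr/local/cuda fallback for CUDA_HOME and the use of LDFLAGS are also assumptions, not something the Open MPI configure script is known to handle on its own.

```shell
#!/bin/sh
# Sketch: locate the NVML stub shipped with the CUDA toolkit.
# Assumptions: stub file name libnvidia-ml.so; fallback CUDA_HOME path.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"
stub_dir="${CUDA_HOME}/lib64/stubs"

if [ -e "${stub_dir}/libnvidia-ml.so" ]; then
    # Let the linker resolve the nvml* symbols against the stub at build time.
    nvml_ldflags="-L${stub_dir}"
    echo "NVML stub found; try: ./configure LDFLAGS=${nvml_ldflags} ..."
else
    nvml_ldflags=""
    echo "no NVML stub in ${stub_dir}; consider disabling NVML instead"
fi
```

The stub only satisfies the link step; at run time the real library from the driver is still needed on nodes that actually use NVML.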

Thanks,
Justin

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Justin 
Luitjens
Sent: Tuesday, October 18, 2016 9:53 AM
To: users@lists.open-mpi.org
Subject: [OMPI users] Problem building OpenMPI with CUDA 8.0

I have the release version of CUDA 8.0 installed and am trying to build OpenMPI.

Here is my configure and build line:

./configure --prefix=$PREFIXPATH --with-cuda=$CUDA_HOME --with-tm= 
--with-openib= && make && sudo make install

Where CUDA_HOME points to the cuda install path.

When I run the above command it builds for quite a while but eventually errors 
out with this:

make[2]: Entering directory 
`/home/jluitjens/Perforce/jluitjens_dtlogin_p4sw/sw/devrel/DevtechCompute/Internal/Tools/dtlogin/scripts/mpi/openmpi-1.10.1-gcc5.0_2014_11-cuda8.0/opal/tools/wrappers'
  CCLD opal_wrapper
../../../opal/.libs/libopen-pal.so: undefined reference to `nvmlInit_v2'
../../../opal/.libs/libopen-pal.so: undefined reference to 
`nvmlDeviceGetHandleByIndex_v2'
../../../opal/.libs/libopen-pal.so: undefined reference to 
`nvmlDeviceGetCount_v2'


Any idea what I might need to change to get around this error?

Thanks,
Justin


[OMPI users] Problem building OpenMPI with CUDA 8.0

2016-10-18 Thread Justin Luitjens
I have the release version of CUDA 8.0 installed and am trying to build OpenMPI.

Here is my configure and build line:

./configure --prefix=$PREFIXPATH --with-cuda=$CUDA_HOME --with-tm= 
--with-openib= && make && sudo make install

Where CUDA_HOME points to the cuda install path.

When I run the above command it builds for quite a while but eventually errors 
out with this:

make[2]: Entering directory 
`/home/jluitjens/Perforce/jluitjens_dtlogin_p4sw/sw/devrel/DevtechCompute/Internal/Tools/dtlogin/scripts/mpi/openmpi-1.10.1-gcc5.0_2014_11-cuda8.0/opal/tools/wrappers'
  CCLD opal_wrapper
../../../opal/.libs/libopen-pal.so: undefined reference to `nvmlInit_v2'
../../../opal/.libs/libopen-pal.so: undefined reference to 
`nvmlDeviceGetHandleByIndex_v2'
../../../opal/.libs/libopen-pal.so: undefined reference to 
`nvmlDeviceGetCount_v2'


Any idea what I might need to change to get around this error?

Thanks,
Justin
