Re: [OMPI users] Problem building OpenMPI with CUDA 8.0
Brice,

Unless you want to enable/disable NVML at runtime, and assuming we do not need NVML in Open MPI, IMHO the easiest workaround is to update
https://github.com/open-mpi/ompi/blob/master/opal/mca/hwloc/hwloc1113/configure.m4
and add the one-liner

    enable_nvml=no

A better option could be to update
https://github.com/open-mpi/ompi/blob/master/opal/mca/hwloc/configure.m4
and pass the --enable-nvml option from Open MPI down to hwloc.

Cheers,

Gilles

On 10/24/2016 4:45 PM, Brice Goglin wrote:
> FWIW, I am still open to implementing something to work around this
> in hwloc. It could be a shell variable such as HWLOC_DISABLE_NVML=yes,
> for all our major configured dependencies.
>
> Brice
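The difference between the two options above (hard-coding enable_nvml=no versus forwarding a user choice down to hwloc) can be sketched in shell, the language configure.m4 fragments ultimately expand to. This is only an illustration: the function name resolve_enable_nvml and the variable ompi_user_enable_nvml are hypothetical and do not come from the Open MPI or hwloc sources, and the thread does not say how the forwarding would actually be implemented.

```shell
# Hypothetical helper: decide which value of enable_nvml to hand to the
# embedded hwloc. $1 is the value of an assumed Open MPI-level
# --enable-nvml/--disable-nvml switch, or empty if the user gave none.
resolve_enable_nvml() {
    case "$1" in
        yes|no)
            # Forward the user's explicit choice unchanged.
            echo "$1"
            ;;
        "")
            # No choice given: fall back to "no", which matches the
            # one-liner workaround (enable_nvml=no).
            echo "no"
            ;;
        *)
            echo "invalid enable_nvml value: $1" >&2
            return 1
            ;;
    esac
}

# How a configure fragment might consume it (variable name is illustrative):
#   enable_nvml=$(resolve_enable_nvml "$ompi_user_enable_nvml") || exit 1
```

With the one-liner alone, the result is always "no"; the forwarding variant only differs when the user explicitly asks for NVML.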
Re: [OMPI users] Problem building OpenMPI with CUDA 8.0
FWIW, I am still open to implementing something to work around this in hwloc. It could be a shell variable such as HWLOC_DISABLE_NVML=yes, for all our major configured dependencies.

Brice

On 24/10/2016 02:12, Gilles Gouaillardet wrote:
> Justin,
>
> iirc, NVML is only used by hwloc (i.e. not by CUDA) and there is no
> real benefit in having it.
>
> As a workaround, you can
>
>     export enable_nvml=no
>
> and then configure && make install
>
> Cheers,
>
> Gilles
Re: [OMPI users] Problem building OpenMPI with CUDA 8.0
Justin,

iirc, NVML is only used by hwloc (i.e. not by CUDA) and there is no real benefit in having it.

As a workaround, you can

    export enable_nvml=no

and then configure && make install

Cheers,

Gilles

On 10/20/2016 12:49 AM, Jeff Squyres (jsquyres) wrote:
> Justin --
>
> Fair point. Can you work with Sylvain Jeaugey (at Nvidia) to submit
> a pull request for this functionality?
>
> Thanks.
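Why exporting enable_nvml=no has any effect: configure scripts generated by Autoconf record an --enable-FEATURE or --disable-FEATURE switch in a shell variable named enable_FEATURE, and they test whether that variable is already set, so a value inherited from the environment is treated like a command-line choice. The sketch below imitates that test in a simplified form. The function name nvml_probe_decision is made up for illustration, and the real check inside hwloc's configure is assumed to be more involved.

```shell
# Simplified imitation of the decision an Autoconf-generated configure
# makes for an --enable-nvml/--disable-nvml switch. Not the real hwloc code.
nvml_probe_decision() {
    # "${enable_nvml+set}" expands to "set" if the variable exists at all,
    # whether it came from the command line or from the environment.
    if test "${enable_nvml+set}" = set && test "x$enable_nvml" = xno; then
        echo "skip"      # NVML detection is bypassed
    else
        echo "probe"     # configure goes on to look for NVML
    fi
}

# The workaround from the message above then reads (not executed here):
#   export enable_nvml=no
#   ./configure --prefix=$PREFIXPATH --with-cuda=$CUDA_HOME && make && sudo make install
```

The variable has to be exported (or set on the same command line as ./configure) so that the configure process actually inherits it.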
Re: [OMPI users] Problem building OpenMPI with CUDA 8.0
Justin --

Fair point. Can you work with Sylvain Jeaugey (at Nvidia) to submit a pull request for this functionality?

Thanks.

> On Oct 18, 2016, at 2:26 PM, Justin Luitjens wrote:
>
> After looking into this a bit more, it appears that the issue is that
> I am building on a head node which does not have the driver installed.
> Building on a back node resolves this issue. In CUDA 8.0 the NVML
> stubs can be found in the toolkit at the following path:
> ${CUDA_HOME}/lib64/stubs
>
> For 8.0 I'd suggest updating the configure/make scripts to look for
> nvml there and link in the stubs. This way the build depends only on
> the toolkit, not on the driver being installed.
>
> Thanks,
> Justin

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] Problem building OpenMPI with CUDA 8.0
After looking into this a bit more, it appears that the issue is that I am building on a head node which does not have the driver installed. Building on a back node resolves this issue.

In CUDA 8.0 the NVML stubs can be found in the toolkit at the following path:

    ${CUDA_HOME}/lib64/stubs

For 8.0 I'd suggest updating the configure/make scripts to look for nvml there and link in the stubs. This way the build depends only on the toolkit, not on the driver being installed.

Thanks,
Justin

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Justin Luitjens
Sent: Tuesday, October 18, 2016 9:53 AM
To: users@lists.open-mpi.org
Subject: [OMPI users] Problem building OpenMPI with CUDA 8.0

I have the release version of CUDA 8.0 installed and am trying to build OpenMPI.

Here is my configure and build line:

    ./configure --prefix=$PREFIXPATH --with-cuda=$CUDA_HOME --with-tm= --with-openib= && make && sudo make install

Where CUDA_HOME points to the cuda install path.

When I run the above command it builds for quite a while but eventually errors out with this:

    make[2]: Entering directory `/home/jluitjens/Perforce/jluitjens_dtlogin_p4sw/sw/devrel/DevtechCompute/Internal/Tools/dtlogin/scripts/mpi/openmpi-1.10.1-gcc5.0_2014_11-cuda8.0/opal/tools/wrappers'
      CCLD     opal_wrapper
    ../../../opal/.libs/libopen-pal.so: undefined reference to `nvmlInit_v2'
    ../../../opal/.libs/libopen-pal.so: undefined reference to `nvmlDeviceGetHandleByIndex_v2'
    ../../../opal/.libs/libopen-pal.so: undefined reference to `nvmlDeviceGetCount_v2'

Any idea what I might need to change to get around this error?

Thanks,
Justin
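The stub suggestion above could be tried by hand before any change lands in the configure scripts. The following is a rough sketch only: the helper name nvml_stub_ldflags is invented here, the stub file name libnvidia-ml.so is an assumption about the toolkit layout, and the thread does not confirm that passing this flag is enough to make a given Open MPI version link.

```shell
# Hypothetical helper: print a -L flag for the toolkit's NVML stub
# directory if a stub library is present there, and nothing otherwise.
nvml_stub_ldflags() {
    cuda_home=$1
    stub_dir="$cuda_home/lib64/stubs"
    # Assumed stub file name; adjust if the toolkit ships it differently.
    if [ -f "$stub_dir/libnvidia-ml.so" ]; then
        echo "-L$stub_dir"
    fi
}

# Possible use on a head node without the driver (not executed here):
#   ./configure --prefix=$PREFIXPATH --with-cuda=$CUDA_HOME \
#       LDFLAGS="$(nvml_stub_ldflags "$CUDA_HOME")"
```

A stub of this kind is meant to satisfy the linker at build time only; at run time the real library from the installed driver is still needed on the compute nodes.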
[OMPI users] Problem building OpenMPI with CUDA 8.0
I have the release version of CUDA 8.0 installed and am trying to build OpenMPI.

Here is my configure and build line:

    ./configure --prefix=$PREFIXPATH --with-cuda=$CUDA_HOME --with-tm= --with-openib= && make && sudo make install

Where CUDA_HOME points to the cuda install path.

When I run the above command it builds for quite a while but eventually errors out with this:

    make[2]: Entering directory `/home/jluitjens/Perforce/jluitjens_dtlogin_p4sw/sw/devrel/DevtechCompute/Internal/Tools/dtlogin/scripts/mpi/openmpi-1.10.1-gcc5.0_2014_11-cuda8.0/opal/tools/wrappers'
      CCLD     opal_wrapper
    ../../../opal/.libs/libopen-pal.so: undefined reference to `nvmlInit_v2'
    ../../../opal/.libs/libopen-pal.so: undefined reference to `nvmlDeviceGetHandleByIndex_v2'
    ../../../opal/.libs/libopen-pal.so: undefined reference to `nvmlDeviceGetCount_v2'

Any idea what I might need to change to get around this error?

Thanks,
Justin

---
This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
---
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users