I'm trying to get better information from the GPUs in my cluster, but I'm having trouble compiling gpu-loadsensor.c. After editing the Makefile, this is the output of a compilation attempt:

cc -DHAVE_NVML=1 -DHAVE_OPENCL=0 -I/usr/local/cuda/include -g -o gpu-loadsensor gpu-loadsensor.c -L/usr/local/cuda/lib64/stubs
/tmp/cc3zE1EA.o: In function `my_nvmlErrorString':
gpu-loadsensor.c:238: undefined reference to `nvmlErrorString'
/tmp/cc3zE1EA.o: In function `gpu_init':
gpu-loadsensor.c:325: undefined reference to `nvmlInit_v2'
gpu-loadsensor.c:331: undefined reference to `nvmlSystemGetNVMLVersion'
gpu-loadsensor.c:336: undefined reference to `nvmlSystemGetDriverVersion'
/tmp/cc3zE1EA.o: In function `shutdown':
gpu-loadsensor.c:358: undefined reference to `nvmlShutdown'
/tmp/cc3zE1EA.o: In function `set_n_dev':
gpu-loadsensor.c:369: undefined reference to `nvmlDeviceGetCount_v2'
/tmp/cc3zE1EA.o: In function `print_nvml':
gpu-loadsensor.c:418: undefined reference to `nvmlDeviceGetHandleByIndex_v2'
gpu-loadsensor.c:422: undefined reference to `nvmlDeviceGetName'
gpu-loadsensor.c:427: undefined reference to `nvmlDeviceGetMemoryInfo'
gpu-loadsensor.c:433: undefined reference to `nvmlDeviceGetComputeRunningProcesses'
gpu-loadsensor.c:442: undefined reference to `nvmlDeviceGetMaxClockInfo'
gpu-loadsensor.c:447: undefined reference to `nvmlDeviceGetUtilizationRates'
collect2: error: ld returned 1 exit status
make: *** [gpu-loadsensor] Error 1

The same errors occur if I use nvcc and/or if I install the drivers and pass '-L/usr/lib64/nvidia'. I'm using cuda-9.1, but the same issue occurred with 8.0. I can also confirm that libnvidia-ml.so is definitely there. Any ideas? Thanks.
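
For reference, here is a rough sketch of one way to check that libnvidia-ml.so is present and exports one of the symbols the linker reports as undefined. The function name check_nvml_lib is made up for illustration. The sketch assumes GNU nm from binutils is installed and that the library sits directly in one of the two directories mentioned in this message.

```shell
#!/bin/sh
# Hypothetical helper: report whether DIR holds libnvidia-ml.so and whether
# that file exports SYM (default: nvmlInit_v2). Assumes GNU nm is available.
check_nvml_lib() {
    dir=$1
    sym=${2:-nvmlInit_v2}
    lib="$dir/libnvidia-ml.so"
    if [ ! -e "$lib" ]; then
        echo "missing: $lib"
        return 1
    fi
    # -D lists the dynamic symbol table; --defined-only skips imports.
    if nm -D --defined-only "$lib" 2>/dev/null | grep -qw "$sym"; then
        echo "exports $sym: $lib"
    else
        echo "no $sym: $lib"
        return 2
    fi
}

# Directories taken from the commands quoted in this message.
for d in /usr/local/cuda/lib64/stubs /usr/lib64/nvidia; do
    check_nvml_lib "$d" nvmlInit_v2 || true
done
```

If the symbol is listed, the library itself is probably not the problem. Note that the link line above names a search directory with -L but contains no -l option, so whether the Makefile also needs something like -lnvidia-ml is an open guess that this check does not settle.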

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
