On Feb 7, 2014, at 9:45 AM, Brice Goglin wrote:
> On 06/02/2014 21:31, Brock Palen wrote:
>> Actually that did turn out to help. The nvml# devices appear to be numbered
>> in the way that CUDA_VISIBLE_DEVICES sees them, while the cuda# devices are
>> in the order that PBS and nvidia-smi see them.
By the way, did you have CUDA_VISIBLE_DEVICES
Brock Palen wrote on Thu 06 Feb 2014 21:31:42 +0100:
> GPU L#3 "nvml2"
> GPU L#5 "nvml3"
> GPU L#7 "nvml0"
> GPU L#9 "nvml1"
>
> Is the L# always going to be in the order I would expect? Because then I
> already have my map.
No,
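Rather than relying on either index order, the two enumerations can be correlated by PCI bus ID, which both NVML (`nvmlDeviceGetPciInfo`) and CUDA (`cudaDeviceGetPCIBusId`) report for each device. A minimal sketch of that matching logic, using hypothetical bus IDs in place of real query results:

```python
# Sketch: map NVML device indices to CUDA device indices by PCI bus ID.
# The bus IDs below are hypothetical; in practice they would come from
# nvmlDeviceGetPciInfo() and cudaDeviceGetPCIBusId() respectively.

# NVML enumerates in PCI (nvidia-smi) order -- hypothetical example.
nvml_busids = ["0000:02:00.0", "0000:03:00.0", "0000:83:00.0", "0000:84:00.0"]

# CUDA reorders "fastest first", so the same boards show up permuted.
cuda_busids = ["0000:83:00.0", "0000:84:00.0", "0000:02:00.0", "0000:03:00.0"]

def build_map(nvml, cuda):
    """Return {nvml_index: cuda_index} by matching PCI bus IDs."""
    cuda_by_busid = {busid: i for i, busid in enumerate(cuda)}
    return {i: cuda_by_busid[busid] for i, busid in enumerate(nvml)}

mapping = build_map(nvml_busids, cuda_busids)
print(mapping)  # {0: 2, 1: 3, 2: 0, 3: 1}
```

This sidesteps the ordering question entirely: the map stays correct no matter how either library chooses to number the boards.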
Hello Brock,
Some people reported the same issue in the past and that's why we added
the "nvml" objects. CUDA reorders devices by "performance".
Batch-schedulers are somehow supposed to use "nvml" for managing GPUs
without actually using them with CUDA directly. And the "nvml" order is
the
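The practical effect for a batch job can be simulated without touching CUDA at all. The sketch below assumes a hypothetical node with four GPUs in NVML/nvidia-smi order, where the scheduler has granted GPUs 1 and 3 and exported them via CUDA_VISIBLE_DEVICES; per the CUDA documentation, the runtime then enumerates only the listed devices, renumbered from 0 in the order of the list:

```python
import os

# Hypothetical physical GPUs, indexed in NVML / nvidia-smi order.
physical_gpus = ["GPU0", "GPU1", "GPU2", "GPU3"]

# The scheduler grants physical GPUs 1 and 3 to this job.
os.environ["CUDA_VISIBLE_DEVICES"] = "1,3"

# Simulate what the CUDA runtime would enumerate: only the listed
# devices, renumbered from 0 in the order of the list.
visible = [physical_gpus[int(i)]
           for i in os.environ["CUDA_VISIBLE_DEVICES"].split(",")]
print(visible)  # ['GPU1', 'GPU3']
```

So inside the application, cuda device 0 is physical GPU 1 and cuda device 1 is physical GPU 3; the scheduler-side (NVML) numbering never leaks into the job.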