And if there is no --cpu_bind on the cmd line? Do these not exist?
> On Oct 27, 2016, at 10:14 AM, Andy Riebs <andy.ri...@hpe.com> wrote:
>
> Hi Ralph,
>
> I think I've found the magic keys...
>
> $ srun --ntasks-per-node=2 -N1 --cpu_bind=none env | grep BIND
> SLURM_CPU_BIND_VERBOSE=quiet
> SLURM_CPU_BIND_TYPE=none
> SLURM_CPU_BIND_LIST=
> SLURM_CPU_BIND=quiet,none
> SLURM_CPU_BIND_VERBOSE=quiet
> SLURM_CPU_BIND_TYPE=none
> SLURM_CPU_BIND_LIST=
> SLURM_CPU_BIND=quiet,none
> $ srun --ntasks-per-node=2 -N1 --cpu_bind=core env | grep BIND
> SLURM_CPU_BIND_VERBOSE=quiet
> SLURM_CPU_BIND_TYPE=mask_cpu:
> SLURM_CPU_BIND_LIST=0x1111,0x2222
> SLURM_CPU_BIND=quiet,mask_cpu:0x1111,0x2222
> SLURM_CPU_BIND_VERBOSE=quiet
> SLURM_CPU_BIND_TYPE=mask_cpu:
> SLURM_CPU_BIND_LIST=0x1111,0x2222
> SLURM_CPU_BIND=quiet,mask_cpu:0x1111,0x2222
>
> Andy
>
> On 10/27/2016 11:57 AM, r...@open-mpi.org wrote:
>> Hey Andy
>>
>> Is there a SLURM envar that would tell us the binding option from the srun
>> cmd line? We automatically bind when direct launched due to user complaints
>> of poor performance if we don't. If the user specifies a binding option,
>> then we detect that we were already bound and don't do it.
>>
>> However, if the user specifies that they not be bound, then we think they
>> simply didn't specify anything - and that isn't the case. If we can see
>> something that tells us "they explicitly said not to do it", then we can
>> avoid the situation.
>>
>> Ralph
>>
>>> On Oct 27, 2016, at 8:48 AM, Andy Riebs <andy.ri...@hpe.com> wrote:
>>>
>>> Hi All,
>>>
>>> We are running Open MPI version 1.10.2, built with support for Slurm
>>> version 16.05.0. When a user specifies "--cpu_bind=none", MPI tries to
>>> bind by core, which segv's if there are more processes than cores.
>>>
>>> The user reports:
>>>
>>> What I found is that
>>>
>>> % srun --ntasks-per-node=8 --cpu_bind=none \
>>>     env SHMEM_SYMMETRIC_HEAP_SIZE=1024M bin/all2all.shmem.exe 0
>>>
>>> will have the problem, but:
>>>
>>> % srun --ntasks-per-node=8 --cpu_bind=none \
>>>     env SHMEM_SYMMETRIC_HEAP_SIZE=1024M ./bindit.sh bin/all2all.shmem.exe 0
>>>
>>> will run as expected and print out the usage message because I didn't
>>> provide the right arguments to the code.
>>>
>>> So, it appears that the binding has something to do with the issue. My
>>> binding script is as follows:
>>>
>>> % cat bindit.sh
>>> #!/bin/bash
>>>
>>> #echo SLURM_LOCALID=$SLURM_LOCALID
>>>
>>> stride=1
>>>
>>> if [ ! -z "$SLURM_LOCALID" ]; then
>>>     let bindCPU=$SLURM_LOCALID*$stride
>>>     exec numactl --membind=0 --physcpubind=$bindCPU $*
>>> fi
>>>
>>> $*
>>>
>>> %
>>>
>>> --
>>> Andy Riebs
>>> andy.ri...@hpe.com
>>> Hewlett-Packard Enterprise
>>> High Performance Computing Software Engineering
>>> +1 404 648 9024
>>> My opinions are not necessarily those of HPE
>>> May the source be with you!
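
A minimal sketch along the lines of Andy's bindit.sh, using the variables shown above to tell "no --cpu_bind given" apart from an explicit "--cpu_bind=none". It assumes, per the open question at the top of this message, that SLURM_CPU_BIND_TYPE is set only when a --cpu_bind option appears on the srun command line; the script name and messages are illustrative, not taken from the thread.

% cat check_bind.sh
#!/bin/bash
# Illustrative only: assumes SLURM_CPU_BIND_TYPE is unset when srun was
# invoked without any --cpu_bind option, and equals "none" for --cpu_bind=none.

if [ -z "${SLURM_CPU_BIND_TYPE+set}" ]; then
    echo "no --cpu_bind on the srun cmd line; launcher is free to bind"
elif [ "$SLURM_CPU_BIND_TYPE" = "none" ]; then
    echo "user explicitly requested no binding; do not bind"
else
    echo "srun already bound the task: $SLURM_CPU_BIND"
fi

# hand off to the real program, e.g. bin/all2all.shmem.exe
exec "$@"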