Hello,
Writing between lines...
On 04/10/2017 at 18:52, Jeffrey Frey wrote:
I didn't realize prior to this that the "--distribution" flag to "sbatch"
only affects how an "srun" within the batch script will make CPU allocations.
Prior to that happening, SLURM must allocate CPUs to the batch job, and _that_
distribution is dictated by how you have the "select/cons_res" plugin
configured:
SelectType=select/cons_res
SelectTypeParameters=CR_Core
The default behavior is to spread the allocation across the available
nodes -- thus, 4/4/3/3/3. If you'd rather "pack" allocations onto the nodes,
enable the CR_PACK_NODES option:
SelectType=select/cons_res
SelectTypeParameters=CR_Core,CR_Pack_Nodes
OK, I have added "CR_Pack_Nodes"...
This will produce the 4/4/4/4/1 allocation pattern. AFAIK there's no way
to alter which CPU allocation pattern gets used on a per-job basis.
Nope, the result is not 4/4/4/4/1... I am submitting with "sbatch" and
launching with "srun --mpi=pmi2" (not "mpirun"), after building Open MPI
with PMI support.
Once the job has been assigned nodes and CPUs on those nodes, the
"--distribution" option you provide informs "srun" how to distribute the tasks
it starts. When "srun" is not used to start the MPI program, Open MPI itself
knows nothing beyond seeing
SLURM_NODELIST=n[009-013]
SLURM_TASKS_PER_NODE=4(x2),3(x3)
in the environment which produces the host list
n009:4
n010:4
n011:3
n012:3
n013:3
for which the --map-by and --rank-by options to "mpirun" will affect the
distribution.
How could I test, with a small program, how the cores are being filled
step by step? My output file shows my "n" tasks line by line, but each
line starts with "Process 0 on", so all "n" tasks seem to report rank 0...
My "hostname" program is:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int numprocs, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);
    printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);
    MPI_Finalize();
    return 0;
}
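For reference, a job script along these lines exercises both launch paths (the program name and task count here are placeholders, not taken from the thread):

```shell
#!/bin/bash
#SBATCH --ntasks=12

# Launch through srun so Slurm's PMI2 plugin wires up the MPI ranks:
srun --mpi=pmi2 ./hostname_mpi

# Alternatively, let mpirun read SLURM_NODELIST / SLURM_TASKS_PER_NODE itself:
#mpirun ./hostname_mpi
```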
This small program, once compiled and run with sbatch+mpirun, shows
for 12 tasks running on 12 nodes:
Process 0 on clus01.hpc.local out of 12
Process 3 on clus04.hpc.local out of 12
Process 7 on clus08.hpc.local out of 12
Process 8 on clus09.hpc.local out of 12
Process 4 on clus05.hpc.local out of 12
Process 2 on clus03.hpc.local out of 12
Process 9 on clus10.hpc.local out of 12
Process 10 on clus11.hpc.local out of 12
Process 5 on clus06.hpc.local out of 12
Process 11 on clus12.hpc.local out of 12
Process 6 on clus07.hpc.local out of 12
Process 1 on clus02.hpc.local out of 12
However, if I run it with sbatch+srun, the output is:
Process 0 on clus01.hpc.local out of 1
Process 0 on clus05.hpc.local out of 1
Process 0 on clus03.hpc.local out of 1
Process 0 on clus04.hpc.local out of 1
Process 0 on clus02.hpc.local out of 1
Process 0 on clus06.hpc.local out of 1
Process 0 on clus12.hpc.local out of 1
Process 0 on clus08.hpc.local out of 1
Process 0 on clus11.hpc.local out of 1
Process 0 on clus07.hpc.local out of 1
Process 0 on clus10.hpc.local out of 1
Process 0 on clus09.hpc.local out of 1
Help, please...
On Oct 3, 2017, at 8:26 PM, Christopher Samuel [1] <[email protected]>
wrote:
On 02/10/17 20:51, Sysadmin CAOS wrote:
I'm executing my MPI program with "mpirun"... Could this be the
problem? Do I need to execute it with "srun"?
I suspect so; try it and see...
--
Christopher Samuel Senior Systems Administrator
Melbourne Bioinformatics - The University of Melbourne
Email: [2] [email protected] Phone: +61 (0)3 903 55545
::::::::::::::::::::::::::::::::::::::::::::::::::::::
Jeffrey T. Frey, Ph.D.
Systems Programmer V / HPC Management
Network & Systems Services / College of Engineering
University of Delaware, Newark DE 19716
Office: (302) 831-6034 Mobile: (302) 419-4976
::::::::::::::::::::::::::::::::::::::::::::::::::::::
[1] mailto:[email protected]
[2] mailto:[email protected]