This email announces the open-source release of the latest version of the job packs feature (support for heterogeneous resources and tight MPI-MPMD integration).
This feature was developed by Bull/Atos R&D over two years in close technical collaboration with major customers. The following SLUG presentations describe it:
https://slurm.schedmd.com/SLUG16/Job_Packs_SUG_2016.pdf
https://slurm.schedmd.com/SLUG15/Heterogeneous_Resources_and_MPMD.pdf

The initial plan was to integrate this feature into the official Slurm 17.02 release. Unfortunately, this will not happen because of a difference in architectural approaches. In future versions of Slurm, our MPMD API may not be preserved. In this context, we have decided to publish the job packs feature for 17.02 on the Bull/Atos GitHub "as is", under the GPL license. We have received several requests to do so, and in the interest of transparency we believe this will help community users experiment with the feature and give feedback on the functionality itself.

The code can be cloned from this branch:
https://github.com/RJMS-Bull/slurm/tree/dev-job-pack-17.02
and the documentation is available here:
https://github.com/RJMS-Bull/slurm/blob/dev-job-pack-17.02/doc/html/job_packs.shtml

For any feedback, you may contact [email protected] and [email protected].

Here is a selection of the most important changes provided by this branch:
- new job pack dependencies, including pack-leader and pack-members
- updated srun/salloc/sbatch specification to support different resource requests separated by semicolons
- updated resource selection algorithms to support job packs
- new --pack-group parameter for srun
- updated PMI and PMI2 libraries to support MPI-MPMD, so that different executables can communicate within the same MPI_COMM_WORLD
- new environment variables describing job packs
- updated sacct, squeue and sinfo to support job packs
- updated scancel to allow terminating a single pack-member without terminating the whole job pack
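To give a rough idea of how a single request can carry several resource demands, here is a hypothetical Python sketch that splits a semicolon-separated request into one entry per pack component. It is not Slurm's parser and does not reproduce the exact syntax of the branch (see job_packs.shtml for that). The function name split_pack_request, the dictionary layout, and the rule that the first component plays the pack-leader role are all assumptions made for illustration.

```python
# Hypothetical illustration only: this is NOT Slurm's actual parser.
# Assumptions: the whole request is one string, pack components are
# separated by ';', and the first non-empty component is treated as
# the pack leader while the others are pack members.

def split_pack_request(request: str) -> list[dict]:
    """Split a semicolon-separated request into per-component entries."""
    # Trim whitespace around each component and drop empty ones
    # (for example, the empty piece after a trailing ';').
    components = [part.strip() for part in request.split(";")]
    non_empty = [c for c in components if c]

    members = []
    for index, options in enumerate(non_empty):
        members.append({
            "index": index,
            # Assumed convention: first component acts as pack leader.
            "role": "leader" if index == 0 else "member",
            # Naive whitespace tokenization; a real parser must handle
            # quoting, option values with spaces, and so on.
            "options": options.split(),
        })
    return members


if __name__ == "__main__":
    request = "-N 2 -n 4 ./solver ; -N 1 -n 1 ./io_server"
    for member in split_pack_request(request):
        print(member["index"], member["role"], member["options"])
```

Each component keeps its own node and task counts and its own executable, which mirrors the MPMD case where different programs share one MPI_COMM_WORLD.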
