Re: [OMPI users] Open MPI 4.0.3 outside as well as inside a SimpleFOAM container: step creation temporarily disabled, retrying Requested nodes are busy

2023-03-01 Thread Rob Kudyba via users
>
> Do you invoke mpirun from **inside** the container?
>
> IIRC, mpirun is generally invoked from **outside** the container, could
> you try this if not already the case?
>
>
> The error message is from SLURM, so this is really a SLURM vs
> singularity issue.
>
> What if you
>
> srun -N 2 -n 2 hostname
>
> instead of
>
> mpirun ...



I checked with SchedMD, and this combination of commands worked:

salloc -t 0-04:00 -A ourgroup --nodes=2 --ntasks=4

mpirun -n 4 singularity exec openfoam10.sif /bin/bash -l -c 'source /opt/openfoam10/etc/bashrc& -parallel'

There are probably other ways to get this to work but the above did the
trick.
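
Note on the command above: as archived, the quoted string appears to have lost some text between "bashrc" and "-parallel" (most likely an "&&" and the solver name). The following is a minimal sketch of what the launch line might have looked like, assuming the solver is simpleFoam as in the original post. The helper name build_launch_cmd is hypothetical, and it only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the launch line instead of executing it.
# Assumptions: the solver is simpleFoam and the OpenFOAM environment file is
# /opt/openfoam10/etc/bashrc, as mentioned in the thread.
build_launch_cmd() {
    ntasks=$1   # total MPI ranks; should match --ntasks given to salloc
    image=$2    # Singularity image, e.g. openfoam10.sif
    solver=$3   # OpenFOAM solver to run inside the container
    printf "mpirun -n %s singularity exec %s /bin/bash -l -c 'source /opt/openfoam10/etc/bashrc && %s -parallel'\n" \
        "$ntasks" "$image" "$solver"
}

# Print the command for a 4-rank run so it can be inspected before use.
build_launch_cmd 4 openfoam10.sif simpleFoam
```

Printing first makes it easy to check the quoting: the single-quoted string is what the login shell inside the container receives, so the environment is sourced before the solver starts.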

Thanks for the suggestion.

Rob





Re: [OMPI users] Open MPI 4.0.3 outside as well as inside a SimpleFOAM container: step creation temporarily disabled, retrying Requested nodes are busy

2023-02-28 Thread Gilles Gouaillardet via users

Rob,


Do you invoke mpirun from **inside** the container?

IIRC, mpirun is generally invoked from **outside** the container, could 
you try this if not already the case?



The error message is from SLURM, so this is really a SLURM vs 
singularity issue.


What if you

srun -N 2 -n 2 hostname

instead of

mpirun ...


Cheers,


Gilles
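
A quick way to answer the inside/outside question is to look at the environment of the shell that runs mpirun. The sketch below is a heuristic only. It assumes that Singularity sets SINGULARITY_CONTAINER inside a container, that Slurm sets SLURM_JOB_ID inside an allocation, and that SLURM_STEP_ID is present when the shell itself is already a job step (for example one started with srun). The function name launch_context is hypothetical.

```shell
#!/bin/sh
# Reports where the current shell is running, based on environment variables.
# Assumptions (treat the result as a hint, not proof):
#   SINGULARITY_CONTAINER  set by Singularity inside a container
#   SLURM_STEP_ID          set when the shell is itself a Slurm job step
#   SLURM_JOB_ID           set inside any Slurm allocation
launch_context() {
    if [ -n "${SINGULARITY_CONTAINER:-}" ]; then
        echo "container"
    elif [ -n "${SLURM_STEP_ID:-}" ]; then
        echo "slurm-step"
    elif [ -n "${SLURM_JOB_ID:-}" ]; then
        echo "slurm-allocation"
    else
        echo "no-allocation"
    fi
}

launch_context
```

If this reports "container" or "slurm-step", mpirun is being started from a place where a further srun may have no free resources left to create a step, which would be consistent with the "Requested nodes are busy" message.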



[OMPI users] Open MPI 4.0.3 outside as well as inside a SimpleFOAM container: step creation temporarily disabled, retrying Requested nodes are busy

2023-02-28 Thread Rob Kudyba via users
Singularity 3.5.3 is installed on a RHEL 7 cluster, with Open MPI 4.0.3 on
the host and also inside a SimpleFOAM version 10 container. I've confirmed
that the Open MPI versions are the same. Perhaps this is a question for
Singularity users as well, but how can I troubleshoot why mpirun just
returns "step creation temporarily disabled, retrying Requested nodes are
busy"?

Singularity> mpirun -V
mpirun (Open MPI) 4.0.3
Report bugs to http://www.open-mpi.org/community/help/
Singularity> which mpirun
/usr/bin/mpirun
Singularity>

$ mpirun -V
mpirun (Open MPI) 4.0.3
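
The version check shown above can also be scripted. This is a minimal sketch, assuming the first line of "mpirun -V" has the form shown in this post. The function names ompi_version and compare_versions are hypothetical.

```shell
#!/bin/sh
# Extracts the version number from `mpirun -V` output read on stdin.
ompi_version() {
    sed -n 's/^mpirun (Open MPI) \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Compares the host and container version strings and reports the result.
compare_versions() {
    if [ "$1" = "$2" ]; then
        echo "match: $1"
    else
        echo "mismatch: host=$1 container=$2"
    fi
}

# Example using the output quoted in this post.
host=$(printf 'mpirun (Open MPI) 4.0.3\n' | ompi_version)
cont=$(printf 'mpirun (Open MPI) 4.0.3\nReport bugs to http://www.open-mpi.org/community/help/\n' | ompi_version)
compare_versions "$host" "$cont"   # prints "match: 4.0.3"
```

On the cluster, the two values would come from "mpirun -V | ompi_version" on the host and "singularity exec openfoam10.sif mpirun -V | ompi_version" for the container.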

[myuser@node047 motorBike]$ mpirun -n 2 -mca plm_base_verbose 100 \
    --mca ras_base_verbose 100 --mca rss_base_verbose 100 \
    --mca rmaps_base_verbose 100 \
    singularity exec openfoam10.sif simpleFoam -parallel | tee log.simpleFoam
[node047:11650] mca: base: components_register: registering framework plm components
[node047:11650] mca: base: components_register: found loaded component slurm
[node047:11650] mca: base: components_register: component slurm register function successful
[node047:11650] mca: base: components_register: found loaded component isolated
[node047:11650] mca: base: components_register: component isolated has no register or open function
[node047:11650] mca: base: components_register: found loaded component rsh
[node047:11650] mca: base: components_register: component rsh register function successful
[node047:11650] mca: base: components_open: opening plm components
[node047:11650] mca: base: components_open: found loaded component slurm
[node047:11650] mca: base: components_open: component slurm open function successful
[node047:11650] mca: base: components_open: found loaded component isolated
[node047:11650] mca: base: components_open: component isolated open function successful
[node047:11650] mca: base: components_open: found loaded component rsh
[node047:11650] mca: base: components_open: component rsh open function successful
[node047:11650] mca:base:select: Auto-selecting plm components
[node047:11650] mca:base:select:(  plm) Querying component [slurm]
[node047:11650] mca:base:select:(  plm) Query of component [slurm] set priority to 75
[node047:11650] mca:base:select:(  plm) Querying component [isolated]
[node047:11650] mca:base:select:(  plm) Query of component [isolated] set priority to 0
[node047:11650] mca:base:select:(  plm) Querying component [rsh]
[node047:11650] mca:base:select:(  plm) Query of component [rsh] set priority to 10
[node047:11650] mca:base:select:(  plm) Selected component [slurm]
[node047:11650] mca: base: close: component isolated closed
[node047:11650] mca: base: close: unloading component isolated
[node047:11650] mca: base: close: component rsh closed
[node047:11650] mca: base: close: unloading component rsh
[node047:11650] mca: base: components_register: registering framework ras components
[node047:11650] mca: base: components_register: found loaded component slurm
[node047:11650] mca: base: components_register: component slurm register function successful
[node047:11650] mca: base: components_register: found loaded component simulator
[node047:11650] mca: base: components_register: component simulator register function successful
[node047:11650] mca: base: components_open: opening ras components
[node047:11650] mca: base: components_open: found loaded component slurm
[node047:11650] mca: base: components_open: component slurm open function successful
[node047:11650] mca: base: components_open: found loaded component simulator
[node047:11650] mca:base:select: Auto-selecting ras components
[node047:11650] mca:base:select:(  ras) Querying component [slurm]
[node047:11650] mca:base:select:(  ras) Query of component [slurm] set priority to 50
[node047:11650] mca:base:select:(  ras) Querying component [simulator]
[node047:11650] mca:base:select:(  ras) Selected component [slurm]
[node047:11650] mca: base: close: unloading component simulator
[node047:11650] mca: base: components_register: registering framework rmaps components
[node047:11650] mca: base: components_register: found loaded component seq
[node047:11650] mca: base: components_register: component seq register function successful
[node047:11650] mca: base: components_register: found loaded component rank_file
[node047:11650] mca: base: components_register: component rank_file register function successful
[node047:11650] mca: base: components_register: found loaded component resilient
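
The verbose output above is long, but the useful part is which component each framework ended up selecting. Here both plm and ras select slurm, meaning mpirun starts its daemons through Slurm, which fits with the error text coming from Slurm rather than from Open MPI. A minimal sketch for pulling out just those lines, assuming the log format shown above with one log entry per line; the function name selected_components is hypothetical:

```shell
#!/bin/sh
# Reads Open MPI verbose output on stdin and prints one "framework=component"
# line for every "Selected component" entry.
selected_components() {
    sed -n 's/.*mca:base:select:( *\([a-z]*\)) Selected component \[\([a-z_0-9]*\)].*/\1=\2/p'
}

# Example with lines taken from the log in this post.
printf '%s\n' \
    '[node047:11650] mca:base:select:(  plm) Querying component [slurm]' \
    '[node047:11650] mca:base:select:(  plm) Selected component [slurm]' \
    '[node047:11650] mca:base:select:(  ras) Selected component [slurm]' \
    | selected_components
# prints:
# plm=slurm
# ras=slurm
```

Because mpirun writes these messages to stderr, a real run would need something like "mpirun ... 2>&1 | selected_components" to capture them.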