Yes, I tried it, but with the same result:
openmpi@4.0.3 -cuda +cxx_exceptions fabrics=ucx -java -legacylaunchers
-memchecker +pmi schedulers=slurm -sqlite3 -thread_multiple +vt
WRF compiles, and when you sbatch the job it appears to be running, but it
doesn't do anything. We get the same result, with WCHAN=hrtime:
0 S 4556 87383 87361 0 80 0 - 126676 hrtime ? 00:05:25 real.exe
------------------------------
Message: 2
Date: Mon, 1 Jun 2020 16:56:05 +0000
From: "Pritchard Jr., Howard" <[email protected]>
To: Slurm User Community List <[email protected]>
Subject: Re: [slurm-users] [EXTERNAL] problems with OpenMPI 4.0.3
Message-ID: <[email protected]>
Content-Type: text/plain; charset="utf-8"
Hi Angelines,
Could you try reinstalling with fabrics=ucx and rerunning?
UCX is the preferred way to use InfiniBand in the Open MPI 4.0.x release
stream.
Howard
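A minimal sketch of the reinstall suggested above, assuming the same Spack setup described elsewhere in this thread. The spec is trimmed to the variants under discussion, and the `run` helper and `DRY_RUN` variable are hypothetical conveniences, not part of Spack or Open MPI; by default the script only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: print each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Spec trimmed to the variants discussed in this thread (assumption).
SPEC="openmpi@4.0.3 fabrics=ucx schedulers=slurm +pmi"

# Rebuild Open MPI against UCX instead of verbs.
run spack install $SPEC

# Show the parameters of the UCX PML component, if it was built.
run ompi_info --param pml ucx --level 9

# Ask Open MPI to use the UCX PML explicitly in jobs started from this shell.
export OMPI_MCA_pml=ucx
```

Setting OMPI_MCA_pml=ucx in the job environment should make Open MPI abort when UCX cannot be used rather than quietly choosing another transport, which may help show whether UCX is really in play.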
On 6/1/20, 10:29 AM, "slurm-users on behalf of Alberto Morillas,
Angelines" <[email protected] on behalf of
[email protected]> wrote:
Hello Howard
I installed it with spack:
openmpi@4.0.3 -cuda +cxx_exceptions fabrics=verbs -java
-legacylaunchers -memchecker +pmi schedulers=slurm -sqlite3 -thread_multiple
+vt
where "-" means the variant is disabled
and "+" means the variant is enabled.
Thanks in advance.
________________________________________________
Angelines Alberto Morillas
Unidad de Arquitectura Informática
Despacho: 22.1.32
Telf.: +34 91 346 6119
Fax: +34 91 346 6537
skype: angelines.alberto
CIEMAT
Avenida Complutense, 40
28040 MADRID
________________________________________________
------------------------------
Message: 2
Date: Mon, 1 Jun 2020 16:13:11 +0000
From: "Pritchard Jr., Howard" <[email protected]>
To: Slurm User Community List <[email protected]>
Subject: Re: [slurm-users] [EXTERNAL] problems with OpenMPI 4.0.3
Message-ID: <[email protected]>
Content-Type: text/plain; charset="utf-8"
Hello Angelines,
Do you know how the Open MPI 4.0.3 package was configured and
built? That information would be useful to help diagnose the problem.
Thanks,
Howard
From: slurm-users <[email protected]> on behalf
of "Alberto Morillas, Angelines" <[email protected]>
Reply-To: Slurm User Community List <[email protected]>
Date: Friday, May 29, 2020 at 4:25 AM
To: "[email protected]" <[email protected]>
Subject: [EXTERNAL] [slurm-users] problems with OpenMPI 4.0.3
Good morning,
We have a cluster with two kinds of InfiniBand cards: ConnectX-4
and ConnectX-6.
Open MPI 3.1.3 works fine, but when we started using the ConnectX-6 cards we
moved to Open MPI 4.0.3 (which supports ConnectX-6). Programs that run in
several parts, first a call to a sequential program and, inside it, a call
to a parallel program (in our case the program is WRF, but we have others
like this with the same problem), suddenly stop:
...
0 S 4556 87383 87361 0 80 0 - 126676 hrtime ? 00:05:25 real.exe
0 S 4556 87384 87361 0 80 0 - 126677 hrtime ? 00:05:33 real.exe
0 S 4556 87385 87361 0 80 0 - 126675 hrtime ? 00:05:28 real.exe
...
WCHAN is hrtime, and the processes look as if they are running, but they
are not actually doing any work.
We don't know whether this could be a problem between Slurm and this
version of Open MPI. Any ideas?
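For readers decoding the listing above, here is a small sketch that labels the relevant columns. It assumes the lines come from `ps -l`, whose long format is F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD; the function name `summarize_ps_l` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: pick PID, state, wait channel, CPU time and command
# out of `ps -l` long-format lines (assumed layout:
# F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD).
summarize_ps_l() {
    awk 'NF >= 14 && $4 ~ /^[0-9]+$/ {
        printf "pid=%s state=%s wchan=%s cputime=%s cmd=%s\n", $4, $2, $11, $13, $14
    }'
}

# Feed it one of the lines quoted in this thread.
printf '%s\n' \
    '0 S 4556 87383 87361 0 80 0 - 126676 hrtime ? 00:05:25 real.exe' \
    | summarize_ps_l

# On a live node (procps ps assumed): ps -l -C real.exe | summarize_ps_l
```

State S is an interruptible sleep, and hrtime is most likely a truncated kernel symbol for a timed sleep, so the processes appear to be waiting rather than computing even though they have accumulated several minutes of CPU time.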
________________________________________________
Angelines Alberto Morillas
Unidad de Arquitectura Informática
Despacho: 22.1.32
Telf.: +34 91 346 6119
Fax: +34 91 346 6537
skype: angelines.alberto
CIEMAT
Avenida Complutense, 40
28040 MADRID
________________________________________________
------------------------------
Message: 3
Date: Mon, 1 Jun 2020 16:16:00 +0000
From: Songpon Srisawai <[email protected]>
To: Slurm User Community List <[email protected]>
Subject: Re: [slurm-users] Slurm Job Count Credit system
Message-ID: <9666f3be-d648-4ee9-9ad2-80df973f87cc@Spark>
Content-Type: text/plain; charset="utf-8"
I greatly appreciate your help. I will try to implement
your suggestion.
On 1 Jun 2020 22:23 +0700, Renfro, Michael <[email protected]>,
wrote:
Even without the slurm-bank system, you can enforce a limit on
resources with a QOS applied to those users. Something like:
=====
sacctmgr add qos bank1 flags=NoDecay,DenyOnLimit
sacctmgr modify qos bank1 set grptresmins=cpu=1000
sacctmgr add account bank1
sacctmgr modify account name=bank1 set qos+=bank1
sacctmgr add user someuser account=bank1
sacctmgr modify user someuser set qos+=bank1
=====
You can do lots with a QOS, including limiting the number of
simultaneous running jobs, simultaneous running/queued jobs, etc.
Unfortunately, the NoDecay flag is only documented to work on GrpTRESMins,
GrpWall, and UsageRaw, not on the job count.
So if you can live with limiting the number of simultaneous jobs
instead of a total number of jobs per time period, that's possible with QOS.
Otherwise, maybe someone else will have an idea.
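The simultaneous-job limits mentioned above can be sketched as follows, reusing the bank1 QOS from the example. The limit values 5 and 20 are placeholders, and the `run` helper with its `DRY_RUN` switch is a hypothetical wrapper that only prints the commands by default.

```shell
#!/bin/sh
# Hypothetical helper: print each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

QOS=bank1        # QOS name taken from the example above
MAX_RUNNING=5    # placeholder: jobs a user may have running at once
MAX_SUBMIT=20    # placeholder: jobs a user may have running or queued at once

# Cap simultaneous running jobs and running+queued jobs per user.
run sacctmgr -i modify qos "$QOS" set \
    MaxJobsPerUser="$MAX_RUNNING" MaxSubmitJobsPerUser="$MAX_SUBMIT"

# Display the resulting limits.
run sacctmgr show qos "$QOS" format=Name,MaxJobsPerUser,MaxSubmitJobsPerUser
```

These limits apply to jobs present at the same time; they do not count jobs over a period, which is the gap described above.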
--
Mike Renfro, PhD / HPC Systems Administrator, Information
Technology Services
931 372-3601 / Tennessee Tech University
On May 31, 2020, at 11:35 AM, Songpon Srisawai
<[email protected]> wrote:
Hello all,
I'm a Slurm beginner trying to set up our cluster. I would like
to know whether there is any Slurm credit/token system plugin that works on,
for example, the number of jobs.
I found Slurm-bank, which deposits hours into an account, but I would
like to deposit job tokens instead of hours.
Thanks for any recommendations.
Songpon
End of slurm-users Digest, Vol 32, Issue 2
******************************************
End of slurm-users Digest, Vol 32, Issue 3
******************************************