That SIGTERM message means something is telling slurmdbd to quit.
Check your cron jobs, maintenance scripts, etc.; slurmdbd is being told
to shut down. If you are running it in the foreground, a ^C does that.
If you run a kill or killall on it, you will get that same message.
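If nothing obvious turns up, an audit rule can catch the sender; a
minimal sketch assuming auditd is running (the key name is arbitrary,
and the rule matches any SIGTERM, so filter the results):

    auditctl -a always,exit -F arch=b64 -S kill -F a1=15 -k slurmdbd-sigterm
    ausearch -k slurmdbd-sigterm   # shows the PID/command that sent the signal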
Brian Andrus
Oh, to address the train that has already passed:
Restore the archive data with "sacctmgr archive load", then you can do
as you need.
From man sacctmgr:
    archive {dump|load}
        Write database information to a flat file or load information
        that has previously been written to a file.
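For example, to restore a previously dumped archive (the file path is
hypothetical):

    sacctmgr archive load file=/var/spool/slurm/archive/job_table_archive_2023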
Brian Andrus
Instead of using the archive files, couldn't you query the db directly
for the info you need?
I would recommend sacct/sreport if they can provide it.
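As a sketch, a direct pull of historic job records from the accounting
db (the date range and field list are placeholders):

    sacct --allusers --starttime=2023-01-01 --endtime=2024-01-01 \
          --format=JobID,User,Account,Partition,Elapsed,State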
Brian Andrus
On 5/28/2024 9:59 AM, O'Neal, Doug (NIH/NCI) [C] via slurm-users wrote:
My organization needs to access historic job
On 5/23/2024 6:16 AM, Christopher Samuel via slurm-users wrote:
On 5/22/24 3:33 pm, Brian Andrus via slurm-users wrote:
A simple example is when you have nodes with and without GPUs.
You can build slurmd packages without for those nodes and with for
the ones that have them.
FWIW we have both GPU
Not that I recommend it much, but you can build them for each
environment and install the ones needed in each.
A simple example is when you have nodes with and without GPUs.
You can build slurmd packages without for those nodes and with for the
ones that have them.
Generally, so long as
Rike,
Assuming the data, scripts, and other dependencies are already on the
cluster, you could just ssh and execute the sbatch command in a single
shot:

    ssh submitnode sbatch some_script.sh

It will ask for a password if appropriate; ssh keys can be used to
bypass that need.
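If you need the job ID back on the calling side, sbatch's --parsable
flag helps; a sketch with the same hypothetical host and script:

    jobid=$(ssh submitnode sbatch --parsable some_script.sh)
    echo "Submitted job $jobid"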
Brian Andrus
/...). Wouldn't Slurm pick up that one?
Thanks!
Jeff
On Fri, Apr 19, 2024 at 1:11 PM Brian Andrus via slurm-users
wrote:
This is because you have no slurm.conf in /etc/slurm, so it is
trying 'configless', which queries DNS to find out where to get the
config. It is failing because
This is because you have no slurm.conf in /etc/slurm, so it is trying
'configless', which queries DNS to find out where to get the config. It
is failing because you do not have DNS configured to tell nodes where to
ask about the config.
Simple solution: put a copy of slurm.conf in /etc/slurm on each node.
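If you would rather keep configless, the missing piece is a DNS SRV
record pointing at the slurmctld host; a zone-file sketch (the hostname
is hypothetical, 6817 is the default slurmctld port):

    _slurmctld._tcp 3600 IN SRV 10 0 6817 ctldhost.example.com.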
Xaver,
If you look at your slurmctld log, you will likely see messages about
each node's slurm.conf not being the same as the one on the master.
So yes, it can work temporarily, but unless some very specific settings
are in place, issues will arise. In the state you are in now, you
will
Yes. You can build the EL8 rpms on EL9. Look at 'mock' to do so. I did
similar when I still had to support EL7
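A sketch of the mock invocation (the chroot config and SRPM name are
illustrative):

    # builds inside a clean EL8 chroot on the EL9 build host
    mock -r rocky-8-x86_64 --rebuild slurm-23.11.5-1.el8.src.rpm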
Fairly generic plan, the devil is in the details and verifying each
step, but those are the basic bases you need to touch.
Brian Andrus
On 4/10/2024 1:48 PM, Steve Berg via slurm-users wrote:
Xaver,
You may want to look at the ResumeRate option in slurm.conf:
    ResumeRate
        The rate at which nodes in power save mode are returned to
        normal operation by ResumeProgram. The value is a number of
        nodes per minute and it can be used to prevent power surges
        if a large number of nodes in power save mode are assigned
        work at the same time.
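So in slurm.conf, something like (the number is just an example):

    ResumeRate=20   # power up at most 20 nodes per minute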
Quick correction, it is StateSaveLocation not SlurmSaveState.
Brian Andrus
On 3/25/2024 8:11 AM, Miriam Olmi via slurm-users wrote:
Dear all,
I am having trouble finalizing the configuration of the backup
controller for my slurm cluster.
In principle, if no job is running everything seems
Miriam,
You need to ensure the SlurmSaveState directory is the same for both.
And by 'the same', I mean all contents are exactly the same.
This is usually achieved by using a shared drive or replication.
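In slurm.conf terms, both controllers point the real option,
StateSaveLocation, at one shared path; a sketch (the path is
hypothetical):

    StateSaveLocation=/shared/slurm/statesave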
Brian Andrus
On 3/25/2024 8:11 AM, Miriam Olmi via slurm-users wrote:
Dear all,
I am
Wow, snazzy!
Looks very good. My compliments.
Brian Andrus
On 3/12/2024 11:24 AM, Victoria Hobson via slurm-users wrote:
Our website has gone through some much-needed change and we'd love for
you to explore it!
The new SchedMD.com is equipped with the latest information about
Slurm, your
Chip,
I use 'sacct' rather than sreport and get individual job data. That is
ingested into a db and PowerBI, which can then aggregate as needed.
sreport is pretty general and likely not the best for accurate
chargeback data.
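A sketch of the sacct pull that feeds such a db (dates and field list
are illustrative):

    sacct -a -S 2024-02-01 -E 2024-03-01 -P \
          -o JobID,User,Account,Partition,AllocCPUS,ElapsedRaw,State > jobs.psv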
Brian Andrus
On 3/4/2024 6:09 AM, Chip Seraphine via slurm-users wrote:
Joseph,
You will likely get many perspectives on this. I disable swap completely
on our compute nodes. I can be draconian that way. For the workflows we
support, this works and is a good thing.
Other workflows may benefit from swap.
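For what it's worth, disabling it amounts to the usual pattern (run as
root):

    swapoff -a                                # stop swapping immediately
    sed -i.bak '/\sswap\s/s/^/#/' /etc/fstab  # comment out swap for future boots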
Brian Andrus
On 3/3/2024 11:04 PM, John Joseph via slurm-users wrote:
Brian Andrus
On 2/28/2024 12:54 PM, Dan Healy wrote:
Are most of us using HAProxy or something else?
On Wed, Feb 28, 2024 at 3:38 PM Brian Andrus via slurm-users
wrote:
Magnus,
That is a feature of the load balancer. Most of them have that
these days.
Brian
Magnus,
That is a feature of the load balancer. Most of them have that these days.
Brian Andrus
On 2/28/2024 12:10 AM, Hagdorn, Magnus Karl Moritz via slurm-users wrote:
On Tue, 2024-02-27 at 08:21 -0800, Brian Andrus via slurm-users wrote:
for us, we put a load balancer in front
Josef,
for us, we put a load balancer in front of the login nodes with session
affinity enabled. This makes each user land on the same backend node each time.
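As a sketch, assuming HAProxy fronting SSH on two hypothetical login
nodes, source-IP affinity looks like:

    frontend ssh_in
        bind 10.0.0.10:22        # the load balancer's own address
        mode tcp
        default_backend login_nodes

    backend login_nodes
        mode tcp
        balance source           # same client IP -> same login node
        server login1 10.0.0.11:22 check
        server login2 10.0.0.12:22 check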
Also, for interactive X sessions, users start a desktop session on the
node and then use VNC to connect there. This accommodates
I imagine you could create a reservation for the node and then when you
are completely done, remove the reservation.
Each helper could then target the reservation for the job.
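A sketch with scontrol (node, user, and reservation names are
placeholders):

    scontrol create reservation ReservationName=fix_n001 \
        Nodes=n001 StartTime=now Duration=infinite Users=alan
    sbatch --reservation=fix_n001 helper_job.sh            # helpers target it
    scontrol delete reservation ReservationName=fix_n001   # when done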
Brian Andrus
On 2/9/2024 5:52 PM, Alan Stange via slurm-users wrote:
Chip,
Thank you for your prompt response. We