to view the presentations
later on. Can the broadcasts be made available for viewing later on?
Thanks,
Ole
--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark
On 18-08-2020 17:36, Jason Simms wrote:
Hello everyone! We have a script that queries our LDAP server for any
users that have an entitlement to use the cluster, and if they don't
already have an account on the cluster, one is created for them. In
addition, they need to be added to the Slurm
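For illustration, a minimal sketch of the kind of sacctmgr call such a
provisioning script might make (user and account names are hypothetical):
$ sacctmgr -i add user name=newuser account=physics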
the nodes as they become idle, thereby
performing the updates that you want. Remove the crontab job as part of
the update.sh script.
The update.sh script and instructions for usage are in:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/nodes
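For illustration only, a hypothetical cron entry of the kind described
above; the actual update.sh in the repository handles the details:
# /etc/cron.d/slurm-update (removed again by update.sh when done)
*/10 * * * * root /root/update.sh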
Comments are welcome.
/Ole
--
Ole Holm N
On 06-08-2020 19:13, Jason Simms wrote:
Later this month, I will have to bring down, patch, and reboot all nodes
in our cluster for maintenance. The two options available to set nodes
into a maintenance mode seem to be either: 1) creating a system-wide
reservation, or 2) setting all nodes into
https://bugs.schedmd.com/show_bug.cgi?id=9257
Upgrade to Slurm 20.02 is highly recommended.
/Ole
On 7/12/20 3:36 PM, Ole Holm Nielsen wrote:
In case your ARP cache is the problem, there is some advice in the Wiki page:
https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#configure-arp-cache-for-large-networks
I think there are other causes for ReqNodeNotAvail, for example, the
node being allocated for other jobs. The "scontrol
On 7/5/20 5:42 PM, Fred Liu wrote:
It looks like the job comment is not saved in the job completion data, because
I can't see it when I use sacct -c.
But I can see it when I use sacct (without -c).
Is it possible to make it work?
The sacct manual page explains the -c parameter:
-c, --completion
Use
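For illustration, -c/--completion reads the job completion data source
rather than the accounting database, so a comparison might look like:
$ sacct -j 12345 --format=JobID,Comment
$ sacct -c -j 12345
Note that whether the comment reaches the accounting database at all
depends on slurm.conf (the AccountingStoreJobComment parameter in this
Slurm generation).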
On 7/1/20 11:10 AM, Gestió Servidors wrote:
I want to limit which users are allowed SSH connections to compute nodes. I have
read at https://slurm.schedmd.com/pam_slurm_adopt.html that
“pam_slurm_adopt” allows a SSH connection if and only if that user has a
job (or more than one) running in that node.
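On the compute nodes this typically amounts to a line like the following
in /etc/pam.d/sshd (see the linked page for the full recipe):
account    sufficient    pam_slurm_adopt.so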
On 6/30/20 4:52 PM, Lawrence Stewart wrote:
How does one configure the runtime priority of a job? That is, how do you set
the CPU scheduling “nice” value?
We’re using Slurm to share a large (16 core 768 GB) server among FPGA
compilation jobs. Slurm handles core and memory reservations just
On 19-06-2020 18:55, Mark Hahn wrote:
The host-based SSH authentication is a good idea, but only inside the
cluster's security perimeter, and one should not trust computers
external to the cluster nodes in this way.
Even more than that! Hostbased allows you to define intersecting sets of
The scontrol command to set the nice level is on the list here:
https://wiki.fysik.dtu.dk/niflheim/SLURM#useful-commands
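For an already submitted job the command takes the form (job ID
hypothetical):
$ scontrol update JobId=12345 Nice=100
A regular user can only increase the nice value (lowering the job's
priority); negative values require administrator privileges.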
/Ole
On 6/18/20 8:05 AM, navin srivastava wrote:
Thanks,
What is the command to modify the Nice value of an already submitted job.
Regards
Navin
On Thu, Jun 18, 2020 at
On 6/16/20 3:17 PM, Durai Arasan wrote:
Thank you. We are planning to put ssh keys on login nodes only and use the
PAM module to control access to compute nodes. Will such a setup work? Or
is it necessary for PAM to work to have the ssh keys on the compute nodes
as well? I'm sorry but this is
Today we upgraded the controller node from 19.05 to 20.02.3, and
immediately all Slurm commands (on the controller node) give error
messages for all partitions:
# sinfo --version
sinfo: error: NodeNames=a[001-140] CPUs=1 match no Sockets,
Sockets*CoresPerSocket or
ockets"
before installing/upgrading to 20.02.
It may be a good idea to track updates to bug
https://bugs.schedmd.com/show_bug.cgi?id=9241
Best regards,
Ole
On 16-06-2020 11:12, Ole Holm Nielsen wrote:
Today we upgraded the controller node from 19.05 to 20.02.3, and
immediatel
On 11-06-2020 14:24, navin srivastava wrote:
Hi Team,
When I am trying to start the slurmd process I am getting the below error.
2020-06-11T13:11:58.652711+02:00 oled3 systemd[1]: Starting Slurm node
daemon...
2020-06-11T13:13:28.683840+02:00 oled3 systemd[1]: slurmd.service: Start
operation
be used in special cases.
On 6/9/20 11:45 AM, Michael Jennings wrote:
On Tuesday, 09 June 2020, at 12:43:34 (+0200),
Ole Holm Nielsen wrote:
in which case you need to set up SSH authorized_keys files for such
users.
I'll admit that I didn't know about this until I came to LANL, but
there's
On Tuesday, 09 June 2020, at 12:43:34 (+0200),
Ole Holm Nielsen wrote:
in which case you need to set up SSH authorized_keys files for such
users.
I'll admit that I didn't know about this until I came to LANL, but
there's actually a much better alternative than having to create user
key pairs
On 6/9/20 12:12 PM, Steve Brasier wrote:
Hi all, looking for some advice on the process to follow when doing one
of the reconfigurations which requires a slurm daemon restart (as listed
in docs for "scontrol reconfigure").
When reconfiguring slurm.conf, make sure to propagate that file to
orrect? So is it not sufficient for only the "slurm" linux user
to have passwordless ssh access to all nodes? Why do we have to give
passwordless ssh access to every user of the cluster?
Thanks,
Durai
Zentrum für Datenverarbeitung
Tübingen
On Mon, Jun 8, 2020 at 6:43 PM Ole Holm Nielsen
mail
On 08-06-2020 18:07, Jeffrey T Frey wrote:
There's a Slurm PAM module you can use to gate ssh access -- basically it
checks to see if the user has a job running on the node and moves any ssh
sessions to the first cgroup associated with that user on that node. If you
don't use cgroup resource
Hi Geoffrey,
I'm just curious as to what causes a user to decide that a given node
has an issue? If a node is healthy in all respects, why would a user
decide not to use the node?
We can certainly perform all sorts of node health checks from Slurm by
configuring the use of LBNL Node Health
the update before figuring out what is happening.
Best,
Ferran
--
*From:* slurm-users on behalf of
Ole Holm Nielsen
*Sent:* Tuesday, June 2, 2020 10:04:53 PM
*To:* slurm-users@lists.schedmd.com
*Subject:* Re: [slurm-user
file does not even exist.
Any idea?
Many thanks,
Ferran
*From:* slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Ole
Holm Nielsen <ole.h.niel...@fysik.dtu.dk>
*Sent
time, see how to make
the setup in
https://wiki.fysik.dtu.dk/niflheim/Slurm_database#setting-database-purge-parameters
You can check your database purge configuration by:
$ grep -i purge /etc/slurm/slurmdbd.conf
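The purge parameters in slurmdbd.conf look like the following; these
values are illustrative, not recommendations:
PurgeEventAfter=12months
PurgeJobAfter=12months
PurgeStepAfter=3months
PurgeSuspendAfter=1month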
Best regards,
Ole
-----Original Message-----
From: slurm-users On Behalf Of Ole
/var/run/munge
and my system now starts munge up fine during a reboot.
Renata
On Fri, 29 May 2020, Ole Holm Nielsen wrote:
Hi Ferran,
When you have a CentOS 7 system with the EPEL repo enabled, and you have
installed the munge RPM from EPEL, then things should be working correctly.
Since
On 6/2/20 10:16 AM, Sidhu, Khushwant wrote:
When a job is running & I use the command:
sacct --format "AveCPU, AveDiskRead, AveDiskWrite,user" -j 12345
I get values for all parameters.
However, when a job is completed, the same command returns no values for
all but ‘user’.
Is there a reason
ge:
Unable to talk to NTP daemon. Is it running?
It is the same message I get in the nodes that DO work. All nodes are
synced in time and date with the central node
*From:* slurm-users on behalf of
Ole Holm Nielsen
*S
On 29-05-2020 08:46, Sudeep Narayan Banerjee wrote:
also check:
a) whether NTP has been set up and is communicating with the master node
b) iptables may be flushed (iptables -L)
c) SELinux to be disabled; to check:
getenforce
vim /etc/sysconfig/selinux
(change SELINUX=enforcing to SELINUX=disabled and
Good check, but I don't believe it is necessary to disable SELinux in
order to run Munge correctly. Our slurmctld server (CentOS 7.8) reports
Enforcing.
/Ole
On 28-05-2020 20:37, Ree, Jan-Albert van wrote:
Have you checked if SELinux is perhaps blocking this?
Give a 'getenforce' command.
On 5/20/20 7:57 AM, Ole Holm Nielsen wrote:
On 20-05-2020 00:03, Flynn, David P. (Dave) wrote:
Where does Slurm keep track of the latest jobid? Since it is persistent
across reboots, I suspect it’s in a file somewhere.
$ scontrol show config | grep MaxJobId
Sorry, I should have written
On 20-05-2020 00:03, Flynn, David P. (Dave) wrote:
Where does Slurm keep track of the latest jobid? Since it is persistent
across reboots, I suspect it’s in a file somewhere.
$ scontrol show config | grep MaxJobId
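Illustrative output (the values depend on your slurm.conf):
$ scontrol show config | grep -i jobid
FirstJobId              = 1
MaxJobId                = 67043328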
, but still get counted against the user's current
requests.
------------
*From:* Ole Holm Nielsen
*Sent:* Friday, May 8, 2020 9:27 AM
*To:* slurm-users@lists.schedmd.com
*Cc:* Renfro, Michael
*Subject:* Re: [slurm-users] scontrol show assoc_mgr showing more
resources in use than squeue
Hi Mic
ncorrect sacct output).
I've also gone through sacctmgr show runaway to clean up any runaway
jobs. I had lots, but they were all from a different user, and had no
effect on this particular user's values.
*From:* slurm
Hi Michael,
Maybe you will find a couple of my Slurm tools useful for displaying
data from the Slurm database in a more user-friendly format:
showjob: Show status of Slurm job(s). Both queue information and
accounting information are printed.
showuserlimits: Print Slurm resource user limits
be mixed as follows:
slurmdbd >= slurmctld >= slurmd >= commands
You cannot mix 20.02 with an old 17.02 or 17.11 Slurm!
/Ole
On 5/1/2020 10:45 AM, Ole Holm Nielsen wrote:
On 01-05-2020 09:21, Felix Farcas wrote:
I did install a new server ARC-CE with slurm-20.x
I installed only one node fo
On 01-05-2020 09:21, Felix Farcas wrote:
I did install a new server ARC-CE with slurm-20.x
I installed only one node for testing.
On my old nodes I do have slurm-17.x
Now I am asking: if I am going to move my old nodes to the new server
with slurm-20.x, do I have to remove the old Slurm and
On 30-04-2020 15:34, Bjørn-Helge Mevik wrote:
Gestió Servidors writes:
For example, with "scontrol show jobid" I can know what command has
been submitted, its workdir, the stderr file and the stdout one. This
information, I think, cannot be obtained when the job is finished and
I run "sacct".
b
in the spec file. Just makes it easier for people who aren't experts in
building packages. And slurmdbd will work with both mysql and mariadb
regardless of what it was built against.
Thanks for the update,
Christian.
On 20/04/2020 17.57, Ole Holm Nielsen wrote:
For the record: The Slurm developers h
On 21-04-2020 04:58, Haoyang Liu wrote:
I am setting up the latest slurm-20.02-1 on my clusters and trying to configure the
"configless" slurm on the compute nodes.
After following the instructions from
https://slurm.schedmd.com/configless_slurm.html, both slurmctld and slurmd
work fine.
The
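For reference, the configless setup amounts to enabling it on the
controller and pointing slurmd at it; a sketch, assuming the controller
host is named master:
SlurmctldParameters=enable_configless    (in slurm.conf on the controller)
$ slurmd --conf-server master            (on the compute nodes)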
For the record: The Slurm developers have found it tricky to write a
slurm.spec file which requires the mysql-devel package and still works
in all environments, see https://bugs.schedmd.com/show_bug.cgi?id=6488
My recommendation[1] is therefore to explicitly require mysql when
building Slurm
On 17-04-2020 11:47, Ole Holm Nielsen wrote:
On 17-04-2020 10:38, Christian Anthon wrote:
It would be neat to have these build requirements / install
requirements built into the spec file.
I agree with you, and it seems that the SchedMD pages no longer list the
build prerequisites (I think
would need to
have a support contract with SchedMD. We get a lot of benefit from
having such a support contract ;-)
Best regards,
Ole
On 17/04/2020 10.08, Ole Holm Nielsen wrote:
Hi Felix,
Please make sure to install all prerequisite packages on the Slurm
build host. I have summarized this i
Hi Felix,
Please make sure to install all prerequisite packages on the Slurm build
host. I have summarized this information in my Slurm Wiki page:
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#build-slurm-rpms
/Ole
On 17-04-2020 09:11, Felix Farcas wrote:
I am trying to build a
You might want to check the Munge section in my Slurm Wiki page:
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#munge-authentication-service
/Ole
On 15-04-2020 19:57, Dean Schulze wrote:
I've installed two new nodes onto my slurm cluster. One node works, but
the other one complains
Hi Alfonso,
You just need to get the CentOS 7 prerequisites right, check out my
Slurm installation Wiki page:
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#build-slurm-rpms
HTH,
Ole
On 07-04-2020 13:07, Alfonso Núñez Slagado wrote:
I'm trying to build rpm packages running
:00. Put the
numbers into an Excel spreadsheet.
As I said, you should look into proper Slurm accounting so that you can
get answers to relevant questions.
/Ole
On 02/04/20 5:34 pm, Ole Holm Nielsen wrote:
On 02-04-2020 13:27, Sudeep Narayan Banerjee wrote:
any help in getting the right
On 02-04-2020 13:27, Sudeep Narayan Banerjee wrote:
any help in getting the right flags ?
The question is not well-defined. If you just want to know the JobID
number in the cluster, you could run this command every day and watch
the NEXT_JOB_ID increase:
# scontrol show config | grep NEXT_JOB_ID
carefully: https://wiki.fysik.dtu.dk/niflheim/Slurm_installation
To remove existing packages:
rpm -qa | grep slurm
yum remove
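Combined into one step (a sketch; review the package list before
removing):
yum remove $(rpm -qa | grep slurm)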
/Ole
*From:* slurm-users on behalf of
Ole Holm Nielsen
*Sent:* Wednesday, March 25, 2020 3
Hi Nilesh,
You may find relevant information about installing on CentOS/RHEL 7 in my
Slurm Wiki: https://wiki.fysik.dtu.dk/niflheim/Slurm_installation
/Ole
On 3/25/20 2:48 AM, Dhumal, Dr. Nilesh wrote:
Hello,
I am installing Slurm on CentOS. I installed all supporting libraries
On 20-03-2020 00:24, Brian Christiansen wrote:
Google and SchedMD are pleased to announce the V3 release of the
slurm-gcp scripts. Check it out at:
https://github.com/schedmd/slurm-gcp
...
Send Questions / Feedback to the community:
google-cloud-slurm-disc...@google.com
On 11-03-2020 20:01, Will Dennis wrote:
I have one cluster running v16.05.4 that I would like to upgrade if
possible to 19.05.5; it was installed via a .deb package I created back
in 2016. I have located a 17.11.7 Ubuntu PPA
(https://launchpad.net/~jonathonf/+archive/ubuntu/slurm) and have
On 3/10/20 9:03 AM, sysadmin.caos wrote:
my SLURM cluster has configured a partition with a "TimeLimit" of 8 hours.
Now, a job has been running for 9h30m and it has not been cancelled. During
these nine and a half hours, a script has executed a "scontrol update
partition=mypartition state=down" for
On 3/4/20 10:12 AM, Alexander Grund wrote:
we have a Power9 partition with 44 processors having 4 cores each totaling
176.
What is your hardware configuration? Do you have 1 server with 44
processor sockets, and each processor has 4 CPU cores? Or is it maybe 1
server with 1 or more sockets
er reports as indicated?
I actually have a Slurm top-user accounting report tool which can select
partitions:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/slurmacct
but it doesn't generate the desired account utilization tree as provided
by sreport.
Thanks,
Ole
--
Ole Holm Nielsen
P
sacctmgr show association
You may use the (undocumented) format=... option to select only the
columns you want, for example:
sacctmgr show assoc format=user,account,qos
Usage of the format option is only given in the Examples section of the
sacctmgr page
On 2/19/20 3:10 PM, Ricardo Gregorio wrote:
I am putting together an upgrade plan for slurm on our HPC. We are
currently running old version 17.02.11. Would you guys advise us upgrading
to 18.08 or 19.05?
You should be able to upgrade 2 Slurm major versions in one step. The
18.08 version is
Hi Ole,
Thanks Ole.
After setting the Enforce option it worked.
I am new to Slurm, so thanks for helping me.
Regards
Navin
On Mon, Feb 17, 2020 at 5:36 PM Ole Holm Nielsen
<ole.h.niel...@fysik.dtu.dk> wrote:
Hi Navin,
I wonder if you have configured the Slurm da
On 2/17/20 1:19 PM, Parag Khuraswar wrote:
Hi Team,
Does Slurm provide cluster usage reports like those mentioned below?
Detailed reports about cluster usage statistics.
Reports of every user and jobs including their
monthly usage, node usage, percentage of
utilization, History tracking, number of
limit is set it should allow only 3 jobs at any point of time.
Regards
Navin.
On Mon, Feb 17, 2020 at 4:48 PM Ole Holm Nielsen
<ole.h.niel...@fysik.dtu.dk> wrote:
Hi Navin,
Why do you think the limit is not working? The MaxJobs limits the number
of running job
tava wrote:
Hi,
Thanks for your script.
With this I am able to show the limit that I set, but this limit is
not working.
MaxJobs = 3, current value = 0
Regards
Navin.
On Mon, Feb 17, 2020 at 4:13 PM Ole Holm Nielsen
<ole.h.niel...@fysik.dtu.dk> wrote:
On 2/17/20
On 2/17/20 11:16 AM, navin srivastava wrote:
I have an issue with the Slurm job limit. I applied the MaxJobs limit on a
user using
sacctmgr modify user navin1 set maxjobs=3
but still I see this is not getting applied. I am still able to submit
more jobs.
Slurm version is 17.11.x
Let me
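As the follow-up in this thread shows, the fix was enabling limit
enforcement; a sketch of the relevant slurm.conf line (slurmctld must be
restarted after changing it):
AccountingStorageEnforce=associations,limits,qos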
-----Original Message-----
From: slurm-users On Behalf Of Ole Holm
Nielsen
Sent: Friday, February 7, 2020 2:34 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Which ports does slurm use?
On 06-02-2020 22:40, Dean Schulze wrote:
I've moved two nodes to a different controller
On 06-02-2020 22:40, Dean Schulze wrote:
I've moved two nodes to a different controller. The nodes are wired and
the controller is networked via wifi. I had to open up ports 6817 and
6818 between the wired and wireless sides of our network to get any
connectivity.
Now when I do
srun -N2
On 27-01-2020 20:35, Mahmood Naderan wrote:
Hi
Is there any command to print current cgroup parameters or
configurations that are used by Slurm?
This works for me:
# scontrol show config | tail -22
Cgroup Support Configuration:
AllowedDevicesFile =
On 24-01-2020 20:22, Dean Schulze wrote:
Since there isn't a list for slurm development I'll ask here. Does the
slurm code include a library for making REST calls? I'm writing a
plugin that will make REST calls and if slurm already has one I'll use
that, otherwise I'll find one with an
and suggestions for improvement are welcome!
/Ole
On 1/18/20 12:06 PM, Ole Holm Nielsen wrote:
When we have created a new Slurm user with "sacctmgr create user
name=xxx", I would like inquire at a later date about the timestamp for
the user creation. As far as I can tell, the sacctm
Some examples are here:
https://wiki.fysik.dtu.dk/niflheim/Slurm_accounting#quality-of-service-qos
/Ole
On 19-12-2019 19:30, Prentice Bisbal wrote:
On 12/19/19 10:44 AM, Ransom, Geoffrey M. wrote:
The simplest is probably to just have a separate partition that will
only allow job times of
Hi Mike,
My showuserlimits tool prints user limits nicely from the Slurm database:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/showuserlimits
Maybe this can give you further insights into the source of problems.
/Ole
On 16-12-2019 17:27, Renfro, Michael wrote:
Hey, folks. I’ve
Forgot the link to the Wiki: https://wiki.fysik.dtu.dk/niflheim/SLURM
On 12/8/19 9:18 PM, Ole Holm Nielsen wrote:
Hi Dean,
You may want to look at the links in my Slurm Wiki page. Both the
official Slurm documentation and other resources are listed. I think most
of your requirements
Hi Dean,
You may want to look at the links in my Slurm Wiki page. Both the
official Slurm documentation and other resources are listed. I think
most of your requirements and questions are described in these pages.
My Wiki gives detailed deployment information for a CentOS 7 cluster,
but
On 11/28/19 11:47 AM, Nguyen Dai Quy wrote:
On Thu, Nov 28, 2019 at 11:20 AM Ole Holm Nielsen
<ole.h.niel...@fysik.dtu.dk> wrote:
On 11/28/19 10:35 AM, Nguyen Dai Quy wrote:
> Hi list,
> I can not submit my job:
> > sbatch submit.sh
> sba
On 11/28/19 10:35 AM, Nguyen Dai Quy wrote:
Hi list,
I can not submit my job:
> sbatch submit.sh
sbatch: error: Batch job submission failed: Invalid account or
account/partition combination specified
After checking slurmdbd.log, I see:
[2019-11-28T10:21:07.578] Accounting storage MYSQL
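A typical first check for this error is to verify the user's association
in the database, for example:
$ sacctmgr show assoc user=$USER format=cluster,account,user,partition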
On 13-11-2019 18:04, Bas van der Vlies wrote:
We currently have version 18.08.7 installed on our cluster and want to
upgrade to 19.03.3. So I wanted to start small and installed it on one of
our compute nodes. But if I start the 'slurmd' then our slurmctld will
complain that:
{{{
Hi Daniel,
Thanks for sharing your insights! I have updated my Wiki page
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#install-mariadb-database
now.
/Ole
On 11/12/19 8:52 AM, Daniel Letai wrote:
On 11/12/19 9:34 AM, Ole Holm Nielsen wrote:
On 11/11/19 10:14 PM, Daniel Letai
On 11/12/19 8:10 AM, Nguyen Dai Quy wrote:
I had the same issue when compiling the RPMs. Just add "--with mysql" to the
rpmbuild options and the error is gone :-)
HTH,
That's an interesting observation! Do you know what the "--with mysql"
option actually does?
IMHO, the Slurm .spec file should include all
s required by any of the mariadb packages, it'll get pulled
automatically. If not, you don't need it on the build system.
On 11/11/19 10:56 PM, Ole Holm Nielsen wrote:
Hi William,
Interesting experiences with MariaDB 10.4! I tried to collect the
instructions from the MariaDB page, but I'm un
server that will run slurmd, i.e. all compute
nodes. I expect that if I looked harder at the build options there may be a
way to do this, perhaps with linker flags.
For now, I can progress.
Thanks
William
-----Original Message-----
From: slurm-users On Behalf Of Ole Holm
Nielsen
Sent: 11
Hi,
Maybe my Slurm Wiki can help you build Slurm on CentOS/RHEL 7? See
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#build-slurm-rpms
Note in particular:
Important: Install the MariaDB (a replacement for MySQL) packages before you
build Slurm RPMs (otherwise some libraries will be
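On CentOS/RHEL 7 that means, for example:
# yum install mariadb-server mariadb-devel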
om:* slurm-users on behalf of
Ole Holm Nielsen
*Sent:* Friday, October 18, 2019 2:15 PM
*To:* slurm-users@lists.schedmd.com
*Subject:* [EXT] Re: [slurm-users] How to find core count per job per node
WARNING: This is an EXTERNAL email. Please think before RESPONDING or
CLICKING on links/a
FWIW, you may be interested in my Wiki on upgrading Slurm:
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm
You should also read the pages on upgrading in the presentation
"Technical: Field Notes From A MadMan" by Tim Wickberg, SchedMD, from last
month's Slurm User Group
On 18-10-2019 19:56, Tom Wurgler wrote:
I need to know how many cores a given job is using per node.
Say my nodes have 24 cores each and I run a 36-way job.
It takes a node and a half.
scontrol show job id
shows me 36 cores, and the 2 nodes it is running on.
But I want to know how it split the
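One way to see the per-node split: scontrol's detailed view lists the CPU
IDs allocated on each node, and sacct shows the totals (job ID
hypothetical):
$ scontrol -d show job 12345    # look for the Nodes=.../CPU_IDs=... lines
$ sacct -j 12345 --format=JobID,NodeList,AllocCPUS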
sacctmgr delete user XXX
I would also like to mention my Slurm account and user updating tools:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/slurmaccounts
/Ole
On 10/10/19 1:41 PM, Mahmood Naderan wrote:
Hi
I had created multiple test users, and then removed them.
However, I see
an
NHC check "check_fs_used /scratch 90%").
Best regards,
Ole
On 10-09-2019 20:41, Michael Jennings wrote:
On Monday, 02 September 2019, at 20:02:57 (+0200),
Ole Holm Nielsen wrote:
We have some users requesting that a certain minimum size of the
*Available* (i.e., free) TmpFS
You should be able to assign node weights to accommodate your
prioritization wishes. I've summarized this setting in my Slurm Wiki page:
https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#node-weight
I hope this helps.
/Ole
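In slurm.conf this is a per-node setting; nodes with the lowest Weight are
allocated first (node names hypothetical):
NodeName=node[001-010] Weight=10
NodeName=node[011-020] Weight=20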
On 9/5/19 5:48 PM, Douglas Duckworth wrote:
Hello
We added
e idea is to
make the prolog set up a "project" disk quota for the job on the
localtmp file system, and the epilog to remove it again.
I'm not 100% sure we will make it work, but I'm hopeful. Fingers
crossed! :)
On 9/2/19 8:02 PM, Ole Holm Nielsen wrote:> We have some users
requesting tha
ce? And how can users specify the minimum
*Available* disk space required by their jobs submitted by "sbatch"?
If this is not feasible, are there other techniques that achieve the
same goal? We're currently still at Slurm 18.08.
Thanks,
Ole
--
Ole Holm Nielsen
PhD, Senior HPC
Hi Guillaume,
The performance of the slurmctld server depends strongly on the server
hardware on which it is running! This should be taken into account when
considering your question.
SchedMD recommends that the slurmctld server should have only a few, but
very fast CPU cores, in order to
regards,
Ole
--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark
Andreas made a good suggestion of looking at the user's TRESRunMin from
sshare in order to answer Jeff's question about AssocGrpCPUMinutesLimit
for a job. However, getting at this information is in practice really
complicated, and I don't think any ordinary user will bother to look it up.
n really supportive
in testing showpartitions during development and comparing the output to
spart.
Thorsten Deilmann from University of Wuppertal has offered a number of
useful suggestions, including the colored output.
Best regards,
Ole
--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark
On 7/9/19 10:14 AM, Priya Mishra wrote:
Hi Ole,
I am using slurm emulator and would soon start working with the slurm
simulator. I need these larger topology files for the purpose of a
project and not actual job scheduling. If there are any suitable
resources for me to use, please let me
On 7/9/19 9:04 AM, Priya Mishra wrote:
Hi,
I am using the slurmibtopology tool to generate the topology.conf file
from the cluster at my institute which gives me a file with around 400
nodes. I need a topology file with a larger number of nodes for further use.
Is there any way of generating a
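For reference, topology.conf entries follow a simple pattern, so a larger
file can be generated with a short script (switch and node names
hypothetical):
SwitchName=leaf1 Nodes=node[0001-0500]
SwitchName=leaf2 Nodes=node[0501-1000]
SwitchName=spine Switches=leaf[1-2]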
Hi Edward,
The squeue command tells you about job status. You can get extra
information using format options (see the squeue man-page). I like to
set this environment variable for squeue:
export SQUEUE_FORMAT="%.18i %.9P %.6q %.8j %.8u %.8a %.10T %.9Q %.10M
%.10V %.9l %.6D %.6C %m %R"
Hi Edward,
Besides my Slurm Wiki page https://wiki.fysik.dtu.dk/niflheim/SLURM, I
have written a number of tools which we use for monitoring our cluster,
see https://github.com/OleHolmNielsen/Slurm_tools. I recommend in
particular these tools:
* pestat Prints a Slurm cluster nodes status
On 7/2/19 10:48 PM, Tina Fora wrote:
We run mysql on a dedicated machine with slurmctld and slurmdbd running on
another machine. Now I want to add another machine running slurmctld and
slurmdbd, and this machine will be on CentOS 7. The existing one is CentOS 6.
Is this possible? Can I run two
On 01-07-2019 21:47, HELLMERS Joe wrote:
I’m having trouble installing Slurm 18.08.7 on Red Hat 7.3.
I installed munge from source.
It may be easier for you to install Slurm with RPMs. A complete guide
is in my Slurm Wiki pages:
https://wiki.fysik.dtu.dk/niflheim/SLURM
On 6/28/19 9:57 AM, Valerio Bellizzomi wrote:
On Fri, 2019-06-28 at 09:39 +0200, Ole Holm Nielsen wrote:
On 6/28/19 9:18 AM, Valerio Bellizzomi wrote:
On Fri, 2019-06-28 at 08:51 +0200, Valerio Bellizzomi wrote:
On Thu, 2019-06-27 at 18:35 +0200, Valerio Bellizzomi wrote:
The nodes are now
On 6/28/19 9:18 AM, Valerio Bellizzomi wrote:
On Fri, 2019-06-28 at 08:51 +0200, Valerio Bellizzomi wrote:
On Thu, 2019-06-27 at 18:35 +0200, Valerio Bellizzomi wrote:
The nodes are now communicating however when I run the command
srun -w compute02 /bin/ls
it remains stuck and there is no
On 6/26/19 1:14 PM, John Marshall wrote:
On 26 Jun 2019, at 11:51, Ole Holm Nielsen wrote:
You should open a case with SchedMD containing your patch:
https://bugs.schedmd.com/
Yes, I considered creating a Bugzilla account at SchedMD so that I could send
them a three-line patch
On 6/26/19 12:23 PM, John Marshall wrote:
I have had $SQUEUE_FORMAT set in my environment for a long time, but have only
today learnt that sacct will also listen to an environment variable to set a
default output format. Previously I had only looked for it in the Environment
Variables section