Hi,
see the man page for slurm.conf:
TmpFS
Fully qualified pathname of the file system available to user jobs for
temporary storage. This parameter is used in
establishing a node's TmpDisk space. The default value is "/tmp".
So it is using /tmp. You need to change that parameter to
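For example, a minimal slurm.conf fragment pointing temporary storage at a node-local scratch file system (the /scratch path is an assumption, use whatever local file system your nodes actually have):

```
# slurm.conf fragment: TmpFS defaults to /tmp
TmpFS=/scratch
```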
> Yes, this is possible, but I would say it's discouraged to do so.
> With RHEL/CentOS 7 you really should be using firewalld, and forget about the
> old iptables. Here's a nice introduction:
> https://www.certdepot.net/rhel7-get-started-firewalld/
>
> Having worked with firewalld for a while
Alternatively you can
systemctl disable firewalld.service
systemctl mask firewalld.service
yum install iptables-services
systemctl enable iptables.service ip6tables.service
and configure iptables in /etc/sysconfig/iptables and
/etc/sysconfig/ip6tables, then
systemctl
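The whole switch could be sketched like this (the final start commands are an assumption):

```shell
# replace firewalld with the classic iptables services on RHEL/CentOS 7
systemctl disable firewalld.service
systemctl mask firewalld.service
yum install -y iptables-services
systemctl enable iptables.service ip6tables.service
# edit /etc/sysconfig/iptables and /etc/sysconfig/ip6tables, then
systemctl start iptables.service ip6tables.service
```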
-
tition name then within a given partition by increasing step id).
Regards,
Uwe Sauter
Is there a time limit set on the queue (rather than the user)?
On 04/26/2017 12:57 PM, Uwe Sauter wrote:
Hi all,
I have a mysterious situation where a user's job is killed after 24h though he specified
"-t 7-00:00:00"
on submission. This happened to several jobs of this user in the last few days.
The account he's using has MaxWall set to 7-00:00:00. There is no QoS used.
In
On modern systems, nscd or nslcd should have been replaced by sssd. sssd has
much better caching than the older services.
Am 11.04.2017 um 17:17 schrieb Benjamin Redling:
>
> AFAIK most request never hit LDAP servers.
> In production there is always a cache on the client side -- nscd might
>
Ray,
if you're going with the easy "copy" method just be sure that the nodes are all
in the same state (user management-wise) before
you do your first copy. Otherwise you might accidentally delete already
existing users.
I also encourage you to have a look into Ansible which makes it easy to
For someone with no experience in LDAP deployment, yes, LDAP is a big issue.
And depending on the cluster size, there are
different possibilities.
From a different point of view: tools like Salt/Ansible/… will almost every
time require some kind of local storage (a local installation of the OS).
Do you have limits (per partition / group), QoS (with limits per user), etc
configured?
Am 19.12.2016 um 15:52 schrieb Wiegand, Paul:
> Greetings,
>
>
> We were running slurm 16.05.0 and just upgraded to 16.05.7 during our Fall
> maintenance cycle along with other changes.
> Now we are
15.12.2016 um 11:26 schrieb Stefan Doerr:
> But this doesn't answer my question why it reports 10 times as much memory
> usage as it is actually using, no?
>
> On Wed, Dec 14, 2016 at 1:00 PM, Uwe Sauter <uwe.sauter...@gmail.com
> <mailto:uwe.sauter...@gmail.com>> wrote:
There are only two memory related options "--mem" and "--mem-per-cpu".
--mem tells slurm the memory requirement of the job (if used with sbatch) or
the step (if used with srun). But not the requirement
of each process.
--mem-per-cpu is used in combination with --ntasks and --cpus-per-task. If
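A sketch of the two options (the values are only examples, and memory is given in MB as that is the safest unit across versions):

```shell
# --mem: total memory for the job, in MB (8000 MB is just an example)
sbatch --mem=8000 job.sh

# --mem-per-cpu: memory per allocated CPU; 4 tasks x 2 CPUs x 2000 MB
# = 16000 MB for the whole job
sbatch --ntasks=4 --cpus-per-task=2 --mem-per-cpu=2000 job.sh
```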
Dear list,
this week I updated from 15.8.12 to 16.05.6. Together with this upgrade I also
changed some of the configuration
options to allow a shared usage (user exclusive) of nodes. Since then some of
my users report that their jobs get
killed when they allocate more than half of the
Also, CPUs=32 is wrong. You need
Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
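In slurm.conf that would look something like this (node names and memory are assumptions):

```
NodeName=node[01-16] Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=64000
```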
Am 12.09.2016 um 16:02 schrieb alex straza:
> hello,
>
> We have some slurm nodes that have 32 CPUS - two 8-core processors with
> hyperthreading - and are trying to run some
> "embarrassingly parallel" jobs.
Try SelectTypeParameters=CR_Core instead of CR_CPU
http://slurm.schedmd.com/cons_res.html
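A slurm.conf sketch (the SelectType line is an assumption based on the linked cons_res documentation):

```
SelectType=select/cons_res
SelectTypeParameters=CR_Core
```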
Am 12.09.2016 um 16:02 schrieb alex straza:
> hello,
>
> We have some slurm nodes that have 32 CPUS - two 8-core processors with
> hyperthreading - and are trying to run some
> "embarrassingly parallel"
On Mon, 29 Feb 2016 11:50:18 PM Uwe Sauter wrote:
>
>> Did you configure the RebootProgram parameter in slurm.conf and is that
>> script working? Remember: this script is run on the compute node, therefore
>> it must be available on the compute node and must be executable.
>
Did you configure the RebootProgram parameter in slurm.conf and is that script
working? Remember: this script is run on the
compute node, therefore it must be available on the compute node and must be
executable.
Am 01.03.2016 um 01:54 schrieb Christopher Samuel:
>
> Hi folks,
>
> We're at
as well).
Regards
Uwe
Am 16.02.2016 um 09:22 schrieb Diego Zuccato:
>
> Il 15/02/2016 12:39, Uwe Sauter ha scritto:
>
>> I am unsure how this can be implemented. If I call "ulimit -d
>> $((SLURM_MEM_PER_CPU * SLURM_NTASKS_PER_NODE * 1024))" in the
>> Pr
Hi all,
last paragraph of http://slurm.schedmd.com/cons_res_share.html states that
enforcement of memory allocation limits needs to be
done by setting appropriate system limits (I assume by using "ulimit").
I am unsure how this can be implemented. If I call "ulimit -d
$((SLURM_MEM_PER_CPU *
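A runnable sketch of that computation (the expression is taken from the quoted ulimit call; the fallback values are only for illustration, and whether `-d` is the right limit for your jobs is exactly the open question here):

```shell
#!/bin/bash
# Derive a per-node data-segment limit (in KB) from Slurm's environment.
# SLURM_MEM_PER_CPU is in MB, so multiply by 1024 to get KB.
: "${SLURM_MEM_PER_CPU:=2048}"      # fallback value, illustration only
: "${SLURM_NTASKS_PER_NODE:=4}"     # fallback value, illustration only
limit_kb=$((SLURM_MEM_PER_CPU * SLURM_NTASKS_PER_NODE * 1024))
echo "limit_kb=$limit_kb"
# set the soft data-segment limit; ignore failure outside a real prolog
ulimit -d "$limit_kb" 2>/dev/null || true
```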
scontrol update state=IDLE nodename=
Am 21.12.2015 um 21:40 schrieb Fany Pagés Díaz:
> When I start the server the nodes are down. I start /etc/init.d/slurm on
> the server and it's fine, but the nodes
> are down. I restart the nodes again and nothing. Any idea?
>
>
>
> *From:* Carlos
Hi,
depending on what you do with those nodes it might be a good idea to create a
maintenance reservation.
scontrol create reservation=Wartung flags=MAINT
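A fuller invocation might look like this (start time, duration, nodes and users are assumptions; the reservation name Wartung is from the original command):

```shell
scontrol create reservation=Wartung starttime=now duration=04:00:00 \
    nodes=node[01-04] users=root flags=MAINT
```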
or you can set the node to DOWN before stopping slurmd.
Regards,
Uwe
Am 18.12.2015 um 11:18 schrieb Danny Rotscher:
>
As far as I can recall I had to recompile my MPI libs when I upgraded from
14.03 to 14.11.
I think one of the issues was with changes in the PMI(2) interface.
Regards,
Uwe
Am 15.11.2015 um 01:09 schrieb Apolinar Martinez Melchor:
> Hi,
>
> We want to update SLURM 2.6.7 to SLURM
Hi,
will there be an official backport of the mentioned commits for the 14.11
branch?
Regards,
Uwe
Am 13.11.2015 um 23:59 schrieb Danny Auble:
>
> Slurm version 15.08.4 is now available; it includes about 25 bug fixes
> developed over the past couple of weeks.
>
> One notable fix
Hi,
the sbatch manpage states for the option --mem_bind:
[…]
The following informational environment variables are set when --mem_bind is in
use:
SLURM_MEM_BIND_VERBOSE
SLURM_MEM_BIND_TYPE
SLURM_MEM_BIND_LIST
See the ENVIRONMENT VARIABLES section for a more detailed
If he used 9000 cores for 24h…
Kidding aside, you need to give us more info to work with.
Regards,
Uwe
Am 24.07.2015 um 17:06 schrieb Martin, Eric:
Hi,
I need help determining what's going on here. The output of sreport says
user1 has used 12920941 minutes (215349
hours).
Hi,
because accounting in Slurm covers more than just accounting (it can also
limit users, etc.), you need to tell Slurm about your users. Having them
available on your systems (locally, via LDAP) is not enough.
Docs are here:
http://slurm.schedmd.com/documentation.html
I'm not aware of an option that allows the comment to appear in the slurmctld
log file but if you are already using
accounting take a look at the
AccountingStoreJobComment
option in slurm.conf.
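That is, a slurm.conf fragment like:

```
AccountingStoreJobComment=YES
```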
Regards,
Uwe
Am 26.06.2015 um 23:04 schrieb Cooper, Adam:
Hi,
Is there some way for
I had to install it manually when I wanted to use PMI2. Might be that this is
not necessary for the older PMI (version 1).
Am 18.06.2015 um 02:02 schrieb Christopher Samuel:
On 18/06/15 03:26, Uwe Sauter wrote:
just a dumb question, but did you actually build Slurm's PMI plugin
Hi,
just a dumb question, but did you actually build Slurm's PMI plugin? As it is
considered additional, you have to manually compile
and install it…
Regards,
Uwe
Am 17.06.2015 um 18:52 schrieb Wiegand, Paul:
Rémi,
This got me a bit farther, thanks.
The stack trace stuck in
sbatch is used to submit job scripts to the scheduler.
srun is used in job scripts (or interactively) for each job step that should
run on more than one core.
See the documentation:
http://slurm.schedmd.com/quickstart.html
http://slurm.schedmd.com/man_index.html
The impact of encryption very much depends on the instruction set of your CPU
(Intel AES-NI) and whether your library
will use those. If you have a recent enough CPU you won't see much difference
between normal SSH and HPN-SSH…
Am 02.06.2015 um 20:10 schrieb John Lockman:
This is a bit off
Before being able to answer your questions, you should probably tell us what
you want to achieve with a workload manager such as
Slurm.
Am 14.05.2015 um 17:58 schrieb Pradeep Bisht:
[Resending as I don't see my message in the archives]
I'm looking at SLURM and it
Trevor,
I don't know what your intent is or the machine you are preparing yourself for,
but in general login nodes and compute nodes share
a common filesystem, making it unnecessary to move data around (inside the
cluster).
If you really need to move data from node local space back to
documentation mentions Lustre and NFS but was
just curious because I have no experience with either.
Thanks,
Trevor
On May 7, 2015, at 7:28 PM, Uwe Sauter uwe.sauter...@gmail.com wrote:
Trevor,
I don't know what your intent is or the machine you are preparing yourself
for but in general
launch for job 352 failed:
Invalid job credential”. Any idea what might be causing this error? I’ve
turned my SlurmdDebug up to 7 but the log file
essentially says the exact same thing as the stdout when I try and submit a
job.
Thanks,
Trevor
On May 6, 2015, at 9:54 PM, Uwe Sauter uwe.sauter
Check the file permissions for libmpichcxx.so.1.2 as well as the permissions on
the parent directories. It might be that you are not
allowed to access the folder structure as the user running your
application.
Am 06.05.2015 um 07:57 schrieb Fany Pagés Díaz:
Hello,
When I throw an MPI
Perhaps this thread from last week helps:
first post: https://groups.google.com/forum/#!topic/slurm-devel/Z6-tnIzI1IE
my question about PMI2:
https://groups.google.com/d/msg/slurm-devel/Z6-tnIzI1IE/2nfrwocTNF4J
Regards,
Uwe
Am 27.04.2015 um 21:11 schrieb Ulf Markwardt:
Dear
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: treyd...@tamu.edu mailto:treyd...@tamu.edu
Jabber: treyd...@tamu.edu mailto:treyd...@tamu.edu
On Mon, Apr 20, 2015 at 2:13 PM, Uwe Sauter uwe.sauter...@gmail.com
mailto:uwe.sauter...@gmail.com
Hi,
no command that I'm aware of. I'm using pdsh for such occasions.
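For example, to restart slurmd across a whole node range (node names are an assumption):

```shell
pdsh -w 'node[01-16]' systemctl restart slurmd
```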
Regards,
Uwe
Am 20.04.2015 um 15:43 schrieb jupiter:
Hi,
If I have a centralized slurm.conf linked by all nodes, and if I change the
slurm.conf, I need to restart slurm daemons of all
nodes. Since slurm
Hi Trey,
is that a "will NOT just work" or a "will just work"?
Regards,
Uwe
Am 20.04.2015 um 20:14 schrieb Trey Dockendorf:
Just a heads up to anyone who uses MVAPICH2 with srun. The 2.1 docs for
MVAPICH2 have new configure flag values since
2.1 supports PMI-2 with SLURM. If you
Hi,
I have the case that OpenMPI was built against Slurm 14.03 (which provided
libslurm.so.27). Since upgrading to 14.11 I get errors
like:
[controller:35605] mca: base: component_find: unable to open
/opt/apps/openmpi/1.8.1/gcc/4.9/0/lib/openmpi/mca_ess_pmi:
libslurm.so.27: cannot open shared
and/or
dependencies. I'm afraid that you do indeed need to recompile
OMPI in that case. You probably need to rerun configure as well, just to be
safe.
Sorry - outside OMPI's control :-/
On Thu, Apr 16, 2015 at 5:22 AM, Uwe Sauter uwe.sauter...@gmail.com
mailto:uwe.sauter...@gmail.com wrote
setup.
On Thu, Apr 16, 2015 at 5:32 AM, Uwe Sauter uwe.sauter...@gmail.com
mailto:uwe.sauter...@gmail.com wrote:
Hi Ralph,
beside the mentioned libslurm.so.28 there is also a libslurm.so pointing
to the same libslurm.so.28.0.0 file. Perhaps OpenMPI
could use this link
Hi,
I think you need to specify the number of nodes when supplying a nodelist
containing more than one node.
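Something like (node names are assumptions):

```shell
# two nodes in the list, so request two nodes explicitly
sbatch -N 2 --nodelist=node[01-02] job.sh
```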
Regards,
Uwe
Am 05.04.2015 um 09:21 schrieb Edrisse Chermak:
Dear Slurm Developers and Users,
I get an 'sbatch error: Batch job submission failed: Node count
Yes! There are problems if the clean-up scripts for cgroups reside on NFSv4.
Nodes will lock-up when they try to remove a job's
cgroup.
Am 31.03.2015 um 17:06 schrieb Jeff Layton:
That's what I've done. Everything is in NFSv4 except for a
few bits:
/etc/slurm.conf
/etc/init.d/slurm
It would be helpful to see how you submitted the job. And the output from
scontrol show job 20.
Regards,
Uwe
Am 30.03.2015 um 19:49 schrieb Carl E. Fields:
Hello,
I have installed slurm version version 14.11.4 on a RHEL server with the
following specs:
Architecture:
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
Reason=Low socket*core*thread count, Low CPUs
[SlurmUser@2015-03-11T22:15:12]
[SlurmUser@sod264 services]$
Thanks,
Carl
On Mon, Mar 30, 2015 at 11:01 AM, Uwe Sauter uwe.sauter...@gmail.com
mailto:uwe.sauter
Please provide more information:
Which OS? Which Slurm version? Installed via package or from source?
Regards,
Uwe
Am 25.03.2015 um 13:09 schrieb suprita.bot...@wipro.com:
Hi
Can please someone help me in knowing that why slurmctld is getting killed in
very few seconds.
And if you are planning on using cgroups, don't use NFSv4. There are problems
that cause the NFS client process to freeze (and
with that freeze the node) when the cgroup removal script is called.
Regards,
Uwe Sauter
Am 24.03.2015 um 20:50 schrieb Paul Edmon:
Yup, that's exactly
to communicate and to drain the
node.
Uwe
Am 24.03.2015 um 21:12 schrieb Paul Edmon:
Interesting. Yeah we use v3 here. Hadn't tried out v4, and good thing we
didn't then.
-Paul Edmon-
On 03/24/2015 04:05 PM, Uwe Sauter wrote:
And if you are planning on using cgroups, don't
think of right now. I'll have another espresso
soon enough and will reply if anything else comes to mind. I hope
this helps!
John DeSantis
2015-03-12 4:59 GMT-04:00 Uwe Sauter uwe.sauter...@gmail.com:
No one able to give a hint?
Am 10.03.2015 um 17:05 schrieb Uwe Sauter:
Hi,
I have
No one able to give a hint?
Am 10.03.2015 um 17:05 schrieb Uwe Sauter:
Hi,
I have an account production configured with limitations GrpNodes=18,
MaxNodes=18, MaxWall=7-00:00:00, an associated user with
limitations MaxNodes=18, MaxWall=7-00:00:00 and a QoS with limitations
Priority=10
Hi,
there is a difference in the output of scontrol show job and sprio (14.11.4). I
have two jobs, one was submitted before slurmctld
was restarted, the other one after the restart.
sprio -l shows:
JOBID  USER  PRIORITY  AGE  FAIRSHARE  JOBSIZE  PARTITION  QOS  NICE
14115
Subject: [slurm-dev] Re: node getting again and again to drain or down state
What is the output of sinfo -R for this node ?
Le 10/03/2015 10:08, Uwe Sauter a écrit :
Check that your node resources in slurm.conf represent your actual
configuration, e.g. that the amount of memory in your
Hi,
I have an account production configured with limitations GrpNodes=18,
MaxNodes=18, MaxWall=7-00:00:00, an associated user with
limitations MaxNodes=18, MaxWall=7-00:00:00 and a QoS with limitations
Priority=10, GraceTime=00:00:00, PreemptMode=cluster,
Flags=DenyOnLimit, UsageFactor=1.0,
Check that your node resources in slurm.conf represent your actual
configuration, e.g. that the amount of memory in your node is
configured as equal or less in slurm.conf.
Am 10.03.2015 um 10:05 schrieb suprita.bot...@wipro.com:
Hi
Please help me if anyone can.
I am running
If you know the name of your output file you could probably do something like
this:
touch output
chmod 0666 output
chown user:group output
srun a.out
Am 05.03.2015 um 22:11 schrieb Slurm User:
Hi
I have a bash script which makes a call to srun
The srun command calls a simple a.out
I think the problem lies in your configuration. Having both CPUs=4 and
(SocketsPerBoard=2 CoresPerSocket=2) is
redundant. Please try with one or the other, preferably with SocketsPerBoard=2
CoresPerSocket=2 as this provides
information for CPU pinning.
Am 06.03.2015 um 00:02 schrieb Sarlo,
the new version?
-Original Message-
From: Uwe Sauter [mailto:uwe.sauter...@gmail.com]
Sent: Wednesday, February 11, 2015 9:29 AM
To: slurm-dev
Subject: [slurm-dev] Re: Upgrade Slurm to latest version
Hi,
as far as I know (and someone please correct me if I'm wrong) Slurm
Message-
From: Uwe Sauter [mailto:uwe.sauter...@gmail.com]
Sent: Wednesday, February 11, 2015 9:29 AM
To: slurm-dev
Subject: [slurm-dev] Re: Upgrade Slurm to latest version
Hi,
as far as I know (and someone please correct me if I'm wrong) Slurm will
support the two latest minor
Message-
From: Uwe Sauter [mailto:uwe.sauter...@gmail.com]
Sent: Tuesday, February 10, 2015 4:01 PM
To: slurm-dev
Subject: [slurm-dev] Re: Upgrade Slurm to latest version
Hi Mark,
most of your questions are answered in the upgrade section of this page:
http://slurm.schedmd.com
Hi Mark,
most of your questions are answered in the upgrade section of this page:
http://slurm.schedmd.com/quickstart_admin.html
If you have more questions after reading this, feel free to come back.
Regards,
Uwe
Am 10.02.2015 um 20:47 schrieb Los, Mark J:
I have been asked to
Is it really necessary to re-link MPI for every bug-fix release? I had
some trouble with MPI after the upgrade 14.03 -> 14.11 but I haven't seen
problems between bug-fix releases so far…
Could someone from SchedMD enlighten us?
Regards,
Uwe
Am 03.02.2015 um 01:36 schrieb Kevin Abbey:
Might be worth looking into node features (see
http://slurm.schedmd.com/slurm.conf.html).
Regards,
Uwe
Am 03.02.2015 um 18:36 schrieb John Desantis:
Hello all,
Unfortunately, I have some confusion regarding how to achieve a global
and single partition for our users with several
Hi all,
there seems to be a small bug in scontrol show job output when using
stderr redirection. %j is not substituted with the jobID.
This is on 14.11.3.
Submit like:
#sbatch -o test-o%j.txt -e test-e%j.txt -N2 -n4 -A admins -t 300
./test.sh
Submitted batch job 12032
#scontrol show job
Hi all,
re-configuring my cluster to use NFSv3 instead of v4 makes the situation
go away. I'll leave it that way for now…
Thanks for the tip,
Uwe
Am 19.01.2015 um 23:29 schrieb Christopher Samuel:
On 19/01/15 19:46, Uwe Sauter wrote:
yes, going back to Scientific 6.5 make
Hi Trey
Am 19.01.2015 um 01:52 schrieb Trey Dockendorf:
Uwe,
Sorry for delayed response, for some reason messages from slurm-dev are
not making it to my inbox so had to find the response via google groups
page.
Don't worry, it's the weekend all around the globe…
We also had numerous
Hi,
is there a list of SLURM environment variables which I can access in the
different prolog/epilog scripts?
Specifically, is it possible to get a list of nodes for a job in the
PrologSlurmctld script although this runs on the controller host?
Perhaps this information could be added to
Stolarek:
2015-01-19 17:35 GMT+01:00 Uwe Sauter uwe.sauter...@gmail.com
mailto:uwe.sauter...@gmail.com:
Hi,
is there a list of SLURM environment variables which I can access in the
different prolog/epilog scripts?
Specifically is it possible to get a list of nodes
Hi Trey, Christopher,
I am running into a lock up situation since updating from Scientific
Linux 6.5 to 6.6 in mid of December (2.6.32-431.29.2 to 2.6.32-504.3.3),
running Slurm 14.11.2.
My cluster runs from NFSv4 root but has local disks for TMP. Jobs that
don't use local TMP much run fine
Hi,
have a look into Slurm accounting/QoS. There are options to limit jobs
per user, jobs per group, etc. pp.
http://slurm.schedmd.com/accounting.html
http://slurm.schedmd.com/qos.html
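For example, per-user job limits via a QoS (the QoS name, user name and limit values are all assumptions; older Slurm versions may spell the limits slightly differently):

```shell
sacctmgr add qos limited
sacctmgr modify qos limited set MaxJobsPerUser=4 MaxSubmitJobsPerUser=8
sacctmgr modify user alice set qos=limited
```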
Regards,
Uwe
Am 14.01.2015 um 10:14 schrieb Loris Bennett:
Hi,
I have a test partition in
branches, this is the point
where the tree analogy fails to apply.
Regards,
Uwe
Am 14.01.2015 um 14:47 schrieb Loris Bennett:
Hi,
Uwe Sauter uwe.sauter...@gmail.com
writes:
Hi,
an association is the combination of
* QoS
* partition
* account
* cluster
If I understand
,
Uwe Sauter uwe.sauter...@gmail.com
writes:
Hi,
have a look into Slurm accounting/QoS. There are options to limit jobs
per user, jobs per group, etc. pp.
http://slurm.schedmd.com/accounting.html
http://slurm.schedmd.com/qos.html
I saw that a QOS can restrict the number of jobs per
configuration to restrict number of procs. Per
user using associations.
Could you please suggest any command.. or Configuration related stuff..?
Thanks,
Tejas
-Original Message-
From: Uwe Sauter [mailto:uwe.sauter...@gmail.com]
Sent: Wednesday, January 14, 2015 5:27 PM
To: slurm
Am 10.12.2014 um 17:52 schrieb Uwe Sauter:
Hi all,
Friday afternoon I accidentally upgraded from 14.03.9 to 14.11.1 (just
wanted to compile but then a symlink was changed and the new version was
started). My users were still using the older version of the tools.
Since Monday
to have something
like "although enough nodes are available, the job cannot run due to
[reason]".
Regards,
Uwe
Am 11.12.2014 um 10:25 schrieb Uwe Sauter:
Hi all,
I flushed my database and downgraded to 14.03.10 (with --enable-debug)
but the problem still exists. What confuses me most
Hi all,
there is an error in the usage message of sacctmgr.
# sacctmgr --help
[...]
One can get an number of characters by following the field option with
a %NUMBER option. i.e. format=name%30 will print 30 chars of field name.
Account- Account, CoordinatorList,
Thanks Moe,
that'll make it easier to see why jobs are in pending state though there
are enough nodes available.
Regards,
Uwe
Am 11.12.2014 um 18:37 schrieb je...@schedmd.com:
Quoting Uwe Sauter uwe.sauter...@gmail.com:
Hi all,
I was able to resolve this issue. The problem
Hi all,
Friday afternoon I accidentally upgraded from 14.03.9 to 14.11.1 (just
wanted to compile but then a symlink was changed and the new version was
started). My users were still using the older version of the tools.
Since Monday (but probably since the update) users weren't able to
submit
Hi Anna,
I'm sorry to inform you that you have to have the user information on
all nodes. You cannot run jobs with UIDs from users the local system
does not know.
If you don't want to distribute your /etc/passwd, /etc/shadow and
/etc/group every time a user is added or removed, the best option
Kostikova:
Dear Uwe,
Thanks a lot for your quick help and explanation. Indeed, we use openldap
right now, but was wondering whether another solution is possible. So, it
seems like the best (only) solution is LDAP indeed.
Thanks a lot again,
Anna
On 5 Nov 2014, at 20:37, Uwe
Hi all,
I'm trying to configure the scheduling parameter max_switch_wait on
14.03.8 but
a) there seems to be a mismatch between the html documentation and the
salloc/sbatch/srun man pages.
b) Slurm doesn't seem to know the parameters referenced in the
documentation.
Regarding a)
Manpages
=max_switch_wait=864000,...
Regards,
Carles Fenoy
Barcelona Supercomputing Center
On Mon, Oct 20, 2014 at 10:27 AM, Uwe Sauter uwe.sauter...@gmail.com
mailto:uwe.sauter...@gmail.com wrote:
Hi all,
I'm trying to configure the scheduling parameter max_switch_wait on
14.03.8
Hi Chris,
you're right that the job array size is limited to 64k in 14.03 and
before. With the upcoming 14.11 this limit is raised to 4M IIRC. You
could check this year's SLURM user group presentations
(http://slurm.schedmd.com/publications.html) where this was mentioned.
As far as I know there
Hi all,
could someone please confirm that the variable
const uint32_t plugin_legacy
found in http://slurm.schedmd.com/plugins.html - Data Objects was
replaced with
const uint32_t min_plug_version
found in several code files in src/plugins/* sometime in the past?
If this is correct, could
Hi Monica,
Am 09.10.2014 19:59, schrieb Monica Marathe:
Hi Uwe,
Thanks for your help on the slurm error.
I created a new slurm.conf using the easy configurator but am still
facing the following error:
[root@control-machine Monica]# slurmctld -D -vv
slurmctld: pidfile not locked,
And port 6817 as well
Am 09.10.2014 21:04, schrieb Monica Marathe:
Hey Michael,
I did build my configuration file:
# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
Hi,
there are some limited capabilities for the tools to send/query clusters
they don't belong to (search for the -M option).
Tools belong to the cluster that is configured in the slurm.conf they use.
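For example (the cluster name is an assumption):

```shell
# query another cluster's queue without changing your slurm.conf
squeue -M othercluster
# submit a job to that cluster
sbatch -M othercluster job.sh
```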
And there is some work taking place at CSCS (Switzerland) that was
presented last week on the
Hi Monica,
Am 01.10.2014 21:55, schrieb Monica Marathe:
Hey,
It's my first time using SLURM and I'm getting the following error when
I run slurmctld:
[root@localhost ~]# slurmctld -D -vv
slurmctld: debug2: No last_config_lite file (/tmp/last_config_lite) to
recover
slurmctld:
Hi all,
taken from the current 14.03.7 version of the sinfo manpage:
snip
-o output_format, --format=output_format
Specify the information to be displayed using an sinfo format
string. Format strings transparently used by sinfo when running with various
options are
Hi Moe,
thank you. Can this format specifier also be used in the job name field?
Best,
Uwe
Am 16.09.2014 18:27, schrieb je...@schedmd.com:
Documentation updated. see:
https://github.com/SchedMD/slurm/commit/3e5864b6486bbd95ceacd695a503f85b3c0c4b8c
Quoting Uwe Sauter
Hi Erica,
you need munge running and slurm installed. The local slurm.conf needs
to point to the control server (ControlAddr and/or ControlMachine).
The easiest way is to use the same config file as for the cluster.
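A minimal slurm.conf fragment for such a submit host (host name and address are assumptions):

```
ControlMachine=headnode
ControlAddr=192.168.100.1
```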
Regards,
Uwe
Am 08.09.2014 19:26, schrieb Erica Riello:
Hi,
I have
Hi all,
*bump*
I can't believe no one has an explanation for this parameter...
Regards,
Uwe
Am 02.09.2014 um 16:30 schrieb Uwe Sauter:
Hi all,
I'm a bit confused by the explanation of the BatchStartTimeout option.
It states:
Specifies how long to wait after a batch job
Hi all,
I'm a bit confused by the explanation of the BatchStartTimeout option.
It states:
Specifies how long to wait after a batch job start request is issued
before we expect the batch job to be running on the compute node.
Depending upon how nodes are returned to service, this value may need
show hostnames $1`
for host in $hosts
do
echo sudo /share/system/bin/node_poweroff $host /var/log/power_save.log
sudo /share/system/bin/node_poweroff $host /var/log/power_save.log
done
On Fri, 2014-08-29 at 02:36 -0700, Uwe Sauter wrote:
Hi,
thanks for the suggestion
with scontrol and checking the
log file.
On 28 Aug 2014 19:18, Uwe Sauter uwe.sauter...@gmail.com wrote:
Hi all,
(configuration and scripts below text)
I have configured SLURM to power down idle nodes but it probably is
misconfigured. I aim for a configuration where after a certain period
Hi Dennis,
I started using SLURM only a few weeks ago but I suspect that an update
from 2.4.x to 14.03.x in a single step is not possible because of too
many changes in internal structures (both job state information and
database).
There is an entry in the FAQ
Hi Louis,
depending on the usage scenario of your cluster you will have different
requirements.
You can find general information about SLURM configuration on the
SchedMD website: http://slurm.schedmd.com/
There you will also find more specific subpages regarding
* cluster configuration for
? Or is there another way that I don't see?
Best regards,
Uwe Sauter
others are not using it.
Bill.
--
Bill Barth, Ph.D., Director, HPC
bba...@tacc.utexas.edu| Phone: (512) 232-7069
Office: ROC 1.435 | Fax: (512) 475-9445
On 8/14/14, 4:11 AM, Uwe Sauter uwe.sauter...@gmail.com wrote:
Hi all,
I got a question about