Hi,
I'm just configuring a script to deploy worker nodes. I've realised that
version #1, made many moons ago, installed MySQL/MariaDB.
But now that I look at my worker nodes, I don't think that they need mysql
on them.
Can anyone confirm or deny whether they do?
cheers
L.
--
Is it just me, or is schedmd.com having issues at the moment?
I'm getting intermittent responses, nothing from the downloads page
cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
Gene Soudlenkov wrote:
> It does look like there is a problem! Doesn't work for me either
>
> Gene
>
>
> On 07/04/16 11:51, Lachlan Musicman wrote:
>
> Is it just me, or is schedmd.com having issues at the moment?
>
> I'm getting intermittent responses, nothing from the downloads page
Hola
There are a number of places in the slurm configuration where we need to
enter hostnames.
The docs (almost?) always recommend the short hostname; the slurm.conf docs
are the only place I've found that explicitly states that it should be the
value returned by `hostname -s`.
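For example (host names hypothetical), the two values to compare on a node:
hostname -s   # short name, e.g. slurm-01 - what NodeName should match
hostname -f   # fully qualified, e.g. slurm-01.example.org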
Our systems have
Is it just me, or have the slurm plugins in 16.05pre2 moved from
/usr/local/lib to /usr/lib64?
cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
I think I saw something like this just now - are you running:
systemctl start slurm
or
systemctl start slurmd ?
And slurmctld is running on the head?
Cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 12 April 2016 at 13:04, Joh
Hi,
While running the tests, I'm seeing a lot of this error:
error: slurm_jobcomp plugin context not initialized
AFAICT, slurmdbd is set up correctly, and slurm.conf has
JobCompType=jobcomp/slurmdbd
What do I do to fix this error?
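(For reference, a sketch of the slurmdbd settings I'd double-check - host
name hypothetical; if job records should come via accounting rather than a
separate completion log, JobCompType can stay at its default:)
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=dbhost
JobCompType=jobcomp/none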
cheers
L.
--
l doesn't work. Looks like
it needs a db schema inserted?
Where might I find that or ?
Cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 13 April 2016 at 10:59, Lachlan Musicman wrote:
> Hi,
>
I was reading about this today. Isn't OpenMPI compiled --with-slurm by
default when installing with one of the pkg managers?
https://www.open-mpi.org/faq/?category=building#build-rte
Cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
I'm finding this a little confusing.
We have a very simple script we are using to test/train staff how to use
SLURM (16.05-pre2). They are moving from an old Torque/Maui system.
I have a test partition set up,
from slurm.conf
NodeName=slurm-[01-02] CPUs=8 RealMemory=32000 Sockets=1 CoresPerSock
I would back up /etc/slurm.
That's about it.
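Something like this is enough (destination path hypothetical):
tar czf /root/slurm-etc-$(date +%F).tgz /etc/slurm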
Cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 5 May 2016 at 07:36, Balaji Deivam wrote:
> Hello,
>
> Right now we are using Slurm 14.11.3 and planning to upgrade to the
> latest ve
We haven't got it in production yet, but I don't see why not. There's a
section in the docs that talks about sssd, so I presume it "just works"
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 18 May 2016 at 17:46, David Ramírez wrote:
I know "it will be ready when it's ready" but I am about to deploy to
production - how far off is the official 16.05 release?
cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
> On May 29, 2016 9:01:22 PM PDT, Lachlan Musicman
> wrote:
>>
>> I know "it will be ready when it's ready" but I am about to deploy to
>> production - how far off is the official 16.05 release?
>>
>> cheers
>> L.
Thanks and congrats!
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 1 June 2016 at 08:03, Danny Auble wrote:
>
> We are pleased to announce the release of 16.05.0! It contains many new
> features and performance enhancements. Please read
Hola,
I built the newest slurm release for installation. The docs say to install
on the head and worker nodes:
- slurm
- slurm-devel
- slurm-munge
- slurm-perlapi
- slurm-plugins
- slurm-sjobexit
- slurm-sjstat
- slurm-torque
My RPM folder also contains:
slurm-openlava
On 1 June 2016 at 15:39, Lachlan Musicman wrote:
> Hola,
>
> I built the newest slurm release for installation. The docs say to install
> on the head and worker nodes:
>
>
>- slurm
>- slurm-devel
>- slurm-munge
>- slurm-perlapi
>- sl
Remi,
The obvious questions are:
Have you set up the accounting? Added a cluster, added some users, etc?
ie, on the link below, there's a section under "Tools" and "Database
Configuration" that might apply?
http://slurm.schedmd.com/accounting.html
I think that this section is ripe for a how-to.
Hi,
I've just run the testsuite and got a couple of failures.
The first I can't solve is 30.1
FAILURE: there was an error during the rpmbuild
spawn ls /tmp/built_rpms/RPMS
spawn /usr/bin/bash -c exec rpm -qpl /tmp/built_rpms/RPMS//Slurm-0-0..rpm |
grep srun.1
error: open of /tmp/built_rpms/RPMS/
The sbatch command
http://slurm.schedmd.com/sbatch.html
has the flag
--kill-on-invalid-dep=
Which we would like to turn on by default (i.e., =yes).
The man page indicates that there would be a slurm.conf setting
kill_invalid_depend but I don't see it in the default slurm.conf?
I do see KillOnBa
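(A sketch of the form I'd try - assuming it's a SchedulerParameters option
rather than a standalone key, which is worth verifying against your version:)
SchedulerParameters=kill_invalid_depend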
Thanks Chris!
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 15 June 2016 at 14:12, Christopher Samuel wrote:
>
> On 15/06/16 14:03, Lachlan Musicman wrote:
>
> > The man page indicates that there
sacctmgr isn't the easiest thing to wrap your head around :)
I've just successfully run this command:
sacctmgr add user pers...@domain.com DefaultAccount=prod Partition=prod
(we use sssd for login against an AD, hence the @domain.com)
I tried to modify a user I had added previously:
sacctmgr m
Ah! I was using the example on the Accounting page
http://slurm.schedmd.com/accounting.html
Cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 15 June 2016 at 15:16, Christopher Samuel wrote:
>
> On 15
Hi,
I would like some clarification on upgrading slurm.conf.
As we discover things needing to be added or changed, we update a central
slurm.conf and distribute to all nodes, AllocNodes and head nodes via
ansible. This works a treat.
Next, we would like to have our new slurm.conf applied without
On 15 June 2016 at 15:16, Christopher Samuel wrote:
>
> On 15/06/16 15:10, Lachlan Musicman wrote:
>
> > sacctmgr modify user set Partition=prod where User=pers...@domain.com
>
> I *think* you need to have the where before the set,
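For the archive, the reordered form following Chris's advice (address elided
as in the original):
sacctmgr modify user where User=pers...@domain.com set Partition=prod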
I forgot the important info, sorry! We're running slurm 16.05,
and the subject should read "Updating slurm.conf kills the queue".
Cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 16 June 2016 at 13:24, Lachl
I think that's the AllocNode on the Partition?
See here http://slurm.schedmd.com/slurm.conf.html
and http://slurm.schedmd.com/scontrol.html
(search for AllocNode on both)
Cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 18 June
Morning!
We have a scenario where I *think* the problem is a write cache issue, but
I'm not 100% sure.
We have JobB dependent on JobA.
JobA internally (ie, not via --output) writes three small files to
nfs-shared disk, the first of which is then parsed by JobB - hence the
dependency (using --dep
We worked this one out - it was pebkac, not slurm :/
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 20 June 2016 at 10:37, Lachlan Musicman wrote:
> Morning!
>
> We have a scenario where I *think* the problem is
t out is to do a 'slurmctld -Dv' and it will fail and
> tell you what the issue is.
>
> Hopefully this helps.
>
> ---
> Nicholas McCollum
> HPC Systems Administrator
> Alabama Supercomputer Authority
>
>
> On Wed, 15 Jun 2016, Lachlan Musicman
We are transitioning from Torque/Maui to SLURM and have only just noticed
that SLURM puts all files in /tmp and doesn't create a per job/user TMPDIR.
On searching, we have found a number of options for creation of TMPDIR on
the fly using SPANK and lua and prolog/epilog.
I am looking for something
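(For reference, the simplest prolog-based sketch I've seen - untested here,
scratch path hypothetical:)
#!/bin/bash
# node Prolog: create a per-job scratch dir owned by the job's user
mkdir -p "/scratch/$SLURM_JOB_ID"
chown "$SLURM_JOB_UID" "/scratch/$SLURM_JOB_ID"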
> This was discussed numbers of times before. You can check the list
> archive, or start for instance with:
> https://github.com/fafik23/slurm_plugins/tree/master/bindtmp
>
> cheers
> marcin
>
> 2016-06-24 7:22 GMT+02:00 Lachlan Musicman :
>
>> We are transitioning from Torque/Maui to SLURM and have only just noticed
>> that SLURM puts all files in /
Chris,
Are the AllowGroups groups from the system groups?
cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 16 June 2016 at 15:35, Christopher Samuel wrote:
>
> On 16/06/16 14:28, Lachlan Musicman wrote:
Hmm thanks. I'm not seeing it working unfortunately.
:/
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 29 June 2016 at 14:51, Christopher Samuel wrote:
>
> On 29/06/16 14:39, Lachlan Musicman wrote:
>
tion". Looks like you are adding a user.
gah.
Cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 29 June 2016 at 15:19, Lachlan Musicman wrote:
> Hmm thanks. I'm not seeing it working unfortunately.
On 29 June 2016 at 15:28, Christopher Samuel wrote:
>
> On 29/06/16 15:21, Lachlan Musicman wrote:
>
> > Hmm thanks. I'm not seeing it working unfortunately.
>
> You need to make sure you've got SSSD set to enumerate (unless you're on
> Slurm
Is it possible to set a Default Partition against a user?
So that they can
srun hostname
while AccountingStorageEnforce=associations
instead of
srun --partition=dev hostname
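(The only default I know of is cluster-wide rather than per-user - a
slurm.conf sketch, node list hypothetical:)
PartitionName=dev Nodes=slurm-[01-02] Default=YES State=UP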
cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
Yeah, I'm marking a lot of slurm list email as "not spam" in my gmail
account atm
Cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 9 July 2016 at 03:52, Michael Di Domenico wrote:
>
> On Fri, Jul 8, 2016 at 1:22 PM, Tim Wickberg
Regardless of what you put in, make sure that the end product (the text
conf file) has something that looks like:
NodeName=compute[01-02] CPUs=40 RealMemory=385000 Sockets=2
CoresPerSocket=10 ThreadsPerCore=2 State=UNKNOWN
If I recall correctly, the NodeNames can be comma delim:
NodeName=*A
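(That line got cut off; a hypothetical illustration of the comma-delimited
form:)
NodeName=nodeA,nodeB CPUs=40 RealMemory=385000 State=UNKNOWN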
Hola,
Because I built the cluster on the fly, I named it after my partner. The
boss didn't like this, so we wanted to change the name. (to rosalind for
Rosalind Franklin).
I took a dump of sacctmgr:
sacctmgr dump fiona File=/root/fiona_cluster.cfg
and took a look inside. I didn't read the enti
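(And the matching re-import after editing would be something along these
lines - syntax worth checking against the sacctmgr man page:)
sacctmgr load file=/root/fiona_cluster.cfg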
Is that expected behaviour for reasons I haven't read yet, or am I just
thinking about sacctmgr all wrong?
cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 18 July 2016 at 11:21, Lachlan Musicman wrote:
>
Hola,
Just looking for some clarification of the nature of job suspension.
I have just had some success with it and want to confirm that it was as
wonderful as it seemed.
Scenario: about to switch from dev to prod, boss asks me to reboot whole
cluster to show stakeholders that it will come back up
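(The commands in question, for the archive - job id is a placeholder:)
scontrol suspend <jobid>
scontrol resume <jobid>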
Maybe. My understanding is slightly different.
We use IdM (FreeIPA) for our users - since all the users need to be on all
the nodes with the same uid.
Munge is used by the slurmctld service ("head node") to communicate with
slurmd services on the worker nodes, i.e. for the underlying management part.
So t
I'm pretty sure it's very simple and smooth.
Backup /etc/slurm
If you have some accounting happening, back up the relevant database (it
will be listed in either slurm.conf or slurmdbd.conf).
You can update while slurm is running too - get everything prepared, update
your slurmdbd service, restart
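(For the database step, a sketch - slurm_acct_db is the default name, adjust
to whatever your config says:)
mysqldump slurm_acct_db > slurm_acct_db-$(date +%F).sql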
Sometimes we would like to run jobs in parallel without using arrays
because the files aren't well named. But the files are all in the same
folder.
We have written a small script that loops over each file, constructs the
command in question and runs it.
We only want each command to run once, but
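(Roughly this shape, with hypothetical paths and a hypothetical wrapper
script:)
for f in /data/project/*; do
    sbatch --wrap="process_one '$f'"
done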
of Auckland
> e: g.soudlen...@auckland.ac.nz
> p: +64 9 3737599 ext 89834 c: +64 21 840 825 f: +64 9 373 7453
> w: www.nesi.org.nz
>
>
> On 3/08/16 3:33 pm, Lachlan Musicman wrote:
>
>> Sometimes we would like to run jobs in parallel without using arrays
>> because
>
>
>
> srun python run_sgRNA.py $1 `basedir $1` #Sbatch has allocated the tasks,
> python can use 12 tasks safely.
>
>
>
> Then on your scheduling server just do like you did
>
> $for sdir in /pipeline/Runs/ProjectFolders/Project_Michael-He/Sample_*; do
> sba
On 27 June 2016 at 01:10, Marcin Stolarek wrote:
> This was discussed numbers of times before. You can check the list
> archive, or start for instance with:
> https://github.com/fafik23/slurm_plugins/tree/master/bindtmp
>
> cheers
> marcin
>
> 2016-06-24 7:22 G
Hi,
I noticed that sstat wasn't giving any data, so I looked into how to make
that happen.
After some reading, I cracked open slurm.conf and uncommented these two
lines:
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
after confirming that jobacct_gather_linux.so was in PluginD
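(For anyone following along, a quick way to test once a job is running - job
id is a placeholder; MaxRSS and AveCPU are standard sstat fields:)
sstat -j <jobid> --format=JobID,MaxRSS,AveCPU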
Oh! Thanks.
I presume that includes sruns that are in an sbatch file.
Cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 30 August 2016 at 12:12, Christopher Samuel
wrote:
>
> On 30/08/16 11:51, Lachlan Mu
James,
Would be great to know OS and SLURM version.
For instance, on Centos 7/Debian 8/Ubuntu 16.04, you might be using
systemctl status/start/restart slurmctld (head node)
systemctl status/start/restart slurmd (worker nodes)
instead?
Cheers
L.
--
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 31 August 2016 at 21:07, Christof Koehler <
christof.koeh...@bccms.uni-bremen.de> wrote:
> Hello everybody,
>
> If I understand the slurm documentation correctly the usual configuration
> in s
On 1 September 2016 at 18:16, Christof Koehler <
christof.koeh...@bccms.uni-bremen.de> wrote:
> Hello,
>
> On Wed, Aug 31, 2016 at 05:52:48PM -0700, Lachlan Musicman wrote:
> > I don't believe it's 100% necessary to use OverSubscribe Yes.
> >
> > We
You don't need --threads-per-core.
It's sufficient to have
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
then you should be able to get to all 36.
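A quick sanity check, assuming a 36-CPU node (expect 36 lines back):
srun -N1 -n36 hostname | wc -l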
cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 7 September 2016 at 10
On 7 September 2016 at 10:39, andrealphus wrote:
>
> Thanks Lachman, took threads-per-core and out same behavior, still
> limited to 18.
>
> On Tue, Sep 6, 2016 at 5:33 PM, Lachlan Musicman
> wrote:
On 7 September 2016 at 11:34, andrealphus wrote:
>
> Yup, thats what I expect too! Since Im brand new to slurm, not sure if
> there is some other config option or srun flag to enable
> multithreading
>
> On Tue, Sep 6, 2016 at 5:42 PM, Lachlan Musicman
> wrote:
> > Oh, I'm
I think you need a couple of things going on:
1. you have to have some sort of accounting organised and set up
2. your sbatch scripts need to launch steps with srun, not just the bare command (see the sketch below)
3. sinfo should then work on the job number.
When I asked, that was the response iirc.
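A minimal sketch of point 2, with a hypothetical command:
#!/bin/bash
#SBATCH --ntasks=4
srun mytool input.dat   # wrapped in srun so the step is tracked and sstat can see it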
cheers
L.
--
On 19 September 2016 at 12:07, Lachlan Musicman wrote:
> I think you need a couple of things going on:
>
> 1. you have to have some sort of accounting organised and set up
> 2. your sbatch scripts need to launch steps with srun, not just the bare command
> 3.
Gah, yes. sstat, not sinfo.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 19 September 2016 at 13:00, Peter A Ruprecht wrote:
> Igor,
>
> Would sstat give you what you need? (http://slurm.schedmd.com/sstat.html)
> It doesn't update in
Our nodes have
Sockets=2 CoresPerSocket=10 ThreadsPerCore=2
CPUs are set to 40 and SelectTypeParameters=CR_CPU
According to this FAQ, this is "not a typical configuration".
http://slurm.schedmd.com/faq.html#cpu_count
Which is fine, I am aware that this is the set up - I did the configuration.
cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 22 September 2016 at 14:51, Lachlan Musicman wrote:
> Our nodes have
>
> Sockets=2 CoresPerSocket=10 ThreadsPerCore=2
>
> CPUs are set to 40 a
s a hard limit of 20.
>
> On Wed, Sep 21, 2016 at 9:54 PM, Lachlan Musicman
> wrote:
> > On a side note, I have a minor documentation bug report:
> >
> > On this page http://slurm.schedmd.com/cpu_management.html there is a
> link to
> >
> > -s, --oversubscr
Is there a description of what each field is in the slurmdbd?
I'm looking, in particular, at the _job_table
fields:
exit_code
state
time_eligible
timelimit (units are minutes?)
tres_alloc
tres_req (well, mostly how this is calculated)
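(If it helps anyone, poking at the table directly - database and cluster
names hypothetical:)
mysql slurm_acct_db -e "SELECT exit_code, state, time_eligible, timelimit, tres_alloc, tres_req FROM mycluster_job_table LIMIT 5;"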
Cheers
L.
--
Hi,
cgroups have been on my radar since about two weeks after I started looking
into SLURM and I'm just getting around to looking at them now.
I note that the ProcTrackType docs say
> This plugin writes to disk often and can impact performance. If you are
running lots of
> short running jobs (le
I am surprised how hard I found it to find these as well - especially given
how frequently the question is asked.
This mob have made one, and it looks good, but all development has happened
on .deb systems, and I didn't have sufficient time (or skill) to unpack and
repack for rpm or generic.
http
Hi,
After some fun incidents with accidental monopolization of the cluster, we
decided to enforce some QOS.
I read the documentation. Thus far in the set up the only thing I've done
that's even close is I assigned "share" values when I set up each
association.
The cluster had a QOS called normal
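(Where I was headed, roughly - names and limits hypothetical, and
MaxTRESPerUser assumes a TRES-aware version:)
sacctmgr add qos cap MaxTRESPerUser=cpu=80
sacctmgr modify user where user=jane set qos+=cap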
On 29 September 2016 at 11:10, Lachlan Musicman wrote:
> Hi,
>
> After some fun incidents with accidental monopolization of the cluster, we
> decided to enforce some QOS.
>
> I read the documentation. Thus far
On 29 September 2016 at 22:01, Janne Blomqvist
wrote:
> On 2016-09-29 04:11, Lachlan Musicman wrote:
> > Hi,
> >
> > After some fun incidents with accidental monopolization of the cluster,
> > we decided to enforce some QOS.
> [snip]
> > What have I done wrong? I re
I started a thread on understanding QOS, but quickly realised I had made a
fundamental error in my configuration. I fixed that problem last week.
(ref:
https://groups.google.com/forum/#!msg/slurm-devel/dqL30WwmrmU/SoOMHmRVDAAJ )
Despite these changes, the issue remains, so I would like to ask again,
wrote:
>
> On 30/08/16 12:39, Lachlan Musicman wrote:
>
> > Oh! Thanks.
> >
> > I presume that includes sruns that are in an sbatch file.
>
> Yup, that's right.
>
> cheers!
> Chris
> --
> Christopher Samuel    Senior Systems Administrator
>
Jose,
Do all the nodes have access to either a shared /usr/lib64/slurm or do they
each have their own? And is there a file in that dir (on each machine)
called select_cons_res.so?
Also, when changing slurm.conf here's a quick and easy workflow:
1. change slurm.conf
2. deploy to all machines in c
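(The list above got cut off; the step at the end is presumably the usual
reload:)
scontrol reconfigure   # asks slurmctld and the slurmds to re-read slurm.conf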
Hola,
Just built the rpms as per the installation docs.
Noted that there were three new rpms:
slurm-openlava-16.05.5-1.el7.centos.x86_64.rpm
slurm-pam_slurm-16.05.5-1.el7.centos.x86_64.rpm
slurm-seff-16.05.5-1.el7.centos.x86_64.rpm
Is that due to a more sophisticated build machine or due to a
Check against the installed libs? check *-devel? Otherwise I'm not 100%
sure - unless the rpmbuild folder with all files still exists and there's
something in there?
FWIW, it's relatively easy to install all the libs that SLURM needs without
causing too many problems. The hardest I've found so far
Hola,
For reasons, our IT team needs some downtime on our authentication server
(FreeIPA/sssd).
We would like to minimize the disruption, but also not lose any work.
The current plan is for the nodes to be set to DRAIN on Friday afternoon
and on Monday morning we will suspend any running jobs, m
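(Sketch of the drain step - node list and reason hypothetical:)
scontrol update NodeName=node[01-10] State=DRAIN Reason="auth server maintenance"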
partition
- jobs running on that partition will continue to do so
cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 12 October 2016 at 10:35, Lachlan Musicman wrote:
> Hola,
>
> For reasons, our IT team ne
Mike, I would suggest that the limit is a SLURM limit rather than a ulimit.
What is the result of
scontrol show config | grep Mem
?
Because you have set SelectTypeParameters=CR_Core_Memory, memory is a
consumable resource, and jobs will fail if they go over the default memory
limit.
The SLURM head will kill j
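(If that's the cause, an explicit per-job request avoids the default - value
hypothetical:)
sbatch --mem=4G job.sh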
I've had consistent success with the documented system - "rpmbuild
slurm-.tgz" then yum installing the resulting files, using 15.x,
16.05 and 17.02.
Have on occasion needed to recompile - hdf5 support and for non main line
plugins, but otherwise it's been pretty easy.
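For the recompile case the same flow works from the tarball (version and
paths hypothetical):
rpmbuild -ta slurm-16.05.5.tar.bz2
yum install ~/rpmbuild/RPMS/x86_64/slurm-*.rpm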
Will happily support/debug y
On 21 October 2016 at 12:39, Christopher Samuel
wrote:
>
> On 21/10/16 12:29, Andrew Elwell wrote:
>
> > When running sreport (both 14.11 and 16.05) I'm seeing "duplicate"
> > user info with different timings. Can someone say what's being added
> > up separately here - it seems to be summing some
On 25 October 2016 at 08:42, Tuo Chen Peng wrote:
> Hello all,
>
> This is my first post in the mailing list - nice to join the community!
>
Welcome!
>
>
> I have a general question regarding slurm partition change:
>
> If I move one node from one partition to the other, will it cause any
> im
On 25 October 2016 at 09:17, Tuo Chen Peng wrote:
> Oh ok thanks for pointing this out.
>
> I thought ‘scontrol update’ command is for letting slurmctld to pick up
> any change in slurm.conf.
>
> But after reading the manual again, it seems this command is instead to
> change the setting at runti
Morning,
Yesterday we had some internal network issues that caused havoc on our
system. By the end of the day everything was ok on the whole.
This morning I came in to see one job on the queue (which was otherwise
relatively quiet) with the error message/Nodelist Reason (launch failed
requeued he
On 28 October 2016 at 09:20, Christopher Samuel
wrote:
>
> On 28/10/16 08:44, Lachlan Musicman wrote:
>
> > So I checked the system, noticed that one node was drained, resumed it.
> > Then I tried both
> >
> > scontrol requeue 230591
> > scontrol resume 2
I think it should. Can you send through your slurm.conf?
Also, the logs usually explicitly say why slurmctld/slurmd don't start, and
the best way to judge if slurm is running is with systemd:
systemctl status slurmctld
systemctl status slurmd
cheers
L.
--
On 8 November 2016 at 07:11, Peixin Qiao wrote:
> Hi,
>
> I install munge and restart my computer, then munge stopped work and
> restarting munge didn't work. It says:
>
> munged: Error: Failed to check pidfile dir "/var/run/munge": cannot
> canonicalize "/var/run/munge": No such file or directory
Peixin,
Again, depends on your OS and deployment methods, but essentially:
In slurm.conf set
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
SlurmctldLogFile=/var/log/slurm/slurm-ctld.log
SlurmdLogFile=/var/log/slur
Priority: Minor
I notice that this command works well:
sinfo -Nle -o '%C %t'
Tue Nov 8 11:38:09 2016
CPUS(A/I/O/T) STATE
40/0/0/40 alloc
38/2/0/40 mix
36/4/0/40 mix
36/4/0/40 mix
6/34/0/40 mix
0/40/0/40 idle
0/40/0/40 idle
0/40/0/40 idle
0/40/0/40 idle
0/40/0/40 idle
0/40/0/40 idle
0/40/0/40 idle
Arg, I see now (hit send too soon). My parsing of the man page was wrong.
cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 8 November 2016 at 11:39, Lachlan Musicman wrote:
> Priority: Minor
>
> I notice
On 9 November 2016 at 09:36, Christopher Samuel
wrote:
>
> But /tmp is almost certainly the second worst place (after /dev/shm).
>
I don't know Chris, I think that /dev/null would rate tbh. :)
cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- G
Hola,
We were looking for the ability to make jobs perfectly reproducible - while
the system is set up with environment modules with the increasing number of
package management tools - pip/conda; npm; CRAN/Bioconductor - and people
building increasingly more complex software stacks, our users have
lenv's if that is the case the switch to a container with rkt seems
> "normal" instead of a more intrusive one all mighty process to rule
> everything that docker had the last time I check, its probably better now.
>
> Saludos.
> Jean
>
> On Tue, Nov 15, 2016 a
Hey Devs,
The new design on the schedmd site is pretty - thanks!
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
Hi,
I've had a request from a user about the email system in SLURM. Basically,
there's a team collaboration and the request was:
is there an sbatch command such that two groups will get different sets of
emails.
Group 1: only get the email if the jobs FAIL
Group 2: get Begin, End and Fail
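Per submission the flags exist (addresses hypothetical), though I don't know
of a way to attach both sets to the one job:
sbatch --mail-type=FAIL --mail-user=group1@example.com job.sh
sbatch --mail-type=BEGIN,END,FAIL --mail-user=group2@example.com job.sh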
Cheers
On 8 December 2016 at 07:54, Mark R. Piercy wrote:
>
> Is it ever possible to submit jobs based on a users org affiliation? So
> if a user is in org (PI) "smith" then their jobs would automatically be
> sent to a particular partition. So no need to use the -p option in
> sbatch/srun job.
>
Hi David,
I dealt with this recently (see
https://groups.google.com/forum/#!topic/slurm-devel/DKcFng8c1zE for
instance )
In the end we went with this solution that has worked well for us:
https://slurm.schedmd.com/SUG14/private_tmp.pdf
which describes this plugin:
https://github.com/hpc2n/spank-private-tmp
Will,
I believe you do. While they aren't necessary in your case, I believe the
software has been built for maximum extensibility, and as such there needs
to be:
at least one cluster
at least one account
at least one user
and an association is the "grouping" of those three. The relevant part of
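A minimal sketch with hypothetical names:
sacctmgr add cluster mycluster
sacctmgr add account myaccount cluster=mycluster
sacctmgr add user will account=myaccount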
Not 100% sure what you are asking? The mail options are available from
within an sbatch script by using the commands you mention.
They can also be passed directly to slurm when invoking the commands
sbatch --mail-type=ALL --mail-user=e...@mail.com
Are you asking if there is a default "always ma
We use the SPANK plugin found here
https://github.com/hpc2n/spank-private-tmp
and find it works very well.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 21 January 2017 at 03:15, John Hearns wrote:
> As I remember, in SGE and in PbsPr
Interesting. To the best of my knowledge, if you are using Accounting, all
users actually need to be in an association - i.e. having a user account is
insufficient.
An Association is a tuple consisting of: cluster, user, account and
(optional) partition.
Is that the problem?
cheers
L.
--