On 5 November 2017 at 11:08, ايمان <435204...@student.ksu.edu.sa> wrote:
> Good morning;
>
>
> I want to run parallel Java code on more than one node, but it
> executes on only one node.
>
> How can I run it on more than one node?
>
Good morning Eman,
Without more details, it's hard to know what you
are referring to - do you need a
guide for SLURM or for pj2? Both can be found quickly with Google.
Cheers
L.
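For what it's worth, a minimal multi-node sbatch sketch for a Java job (node counts, jar path and class name are placeholders - and note that a plain JVM won't spread work across nodes by itself; the cross-node coordination has to come from the library, e.g. pj2's own middleware):

```shell
#!/bin/bash
#SBATCH --job-name=pj2-test
#SBATCH --nodes=2             # ask Slurm for two nodes
#SBATCH --ntasks-per-node=1   # one JVM per node

# srun starts one copy of the command per allocated task
srun java -cp /path/to/pj2.jar MyParallelTask
```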
>
> ----------
> *From:* Lachlan Musicman <data...@gmail.com>
> *Sent:* 10/Safar/1439 02:45 AM
> *To:* slurm-dev
> *Subject:* [slurm-de
On 30 October 2017 at 10:04, ايمان <435204...@student.ksu.edu.sa> wrote:
> I am programming a parallel program that uses the parallel Java library pj2.
>
> I want to run it using slurm, but I did not know if slurm supports this
> library.
>
> And what are the correct commands to run Java on a cluster
>
>
On 26 October 2017 at 13:27, Alex Chekholko wrote:
> Why can't you just do
>
> for fasta_file in `ls /path/to/fasta_files`; do sbatch
> --output=$fasta_file.out --error=$fasta_file.err myscript.sbatch
> $fasta_file; done
>
Because it was staring me in the face and I
Hi All,
I've now been asked twice in two days if there is any way to intelligently
name slurm output files.
Sometimes our users will do something like
for fasta_file in `ls /path/to/fasta_files`; do sbatch myscript.sbatch
$fasta_file; done
They would like their output and error files to be
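A hedged sketch of Alex's suggestion, wrapped in a hypothetical helper function. It uses a glob rather than parsing `ls` (so odd filenames survive), and `echo` keeps it a dry run - drop the echo to actually submit. `myscript.sbatch` is from the thread; everything else is a placeholder:

```shell
# One job per FASTA file, with output/error named after the input.
submit_fastas() {
  local dir=$1 fasta_file base
  for fasta_file in "$dir"/*; do
    base=$(basename "$fasta_file")
    # dry run: print the sbatch command instead of executing it
    echo sbatch --output="$base.out" --error="$base.err" myscript.sbatch "$fasta_file"
  done
}
```

Newer Slurm releases also expand filename patterns such as `%j` (job id) and `%x` (job name) in `--output`, which can remove the need for per-file flags entirely.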
On 19 October 2017 at 20:37, Chris Samuel wrote:
>
> On Thursday, 19 October 2017 7:41:37 PM AEDT Nadav Toledo wrote:
>
> > running : id -u domain_name\\username , does return its uid
>
> So your system is not finding users as just "username", but instead only as
>
greggish/status/873177525903609857
> On Thu, Oct 5, 2017 at 3:34 PM, Lachlan Musicman <data...@gmail.com>
> wrote:
>
>> On 6 October 2017 at 07:35, Doug Meyer <dameye...@gmail.com> wrote:
>>
>>> Within the cluster we have partitions that are shared and some that
On 6 October 2017 at 07:29, Jacob Chappell wrote:
> Is there a way (via scontrol for example) to disable accounting/qos policy
> enforcement for a single job? We'd like to be able to allow a job to go
> ahead and run, even though it may violate policy (MaxTRES) on a
>
On 6 October 2017 at 07:35, Doug Meyer wrote:
> Within the cluster we have partitions that are shared and some that are
> dedicated to specific groups. Is there a way to configure slurm so the
> private use partitions do not impact the priority system nor are they
> counted
On 24 September 2017 at 16:20, Daniel Letai wrote:
> Hello,
>
> B. We have active directory(AD) in our faculty, and We prefer manage
> users/groups from there , is it possible? any guide available somewhere?
>
>
> Search this mailing list, this question pops up every now and
On 21 September 2017 at 17:55, Fabrice Nininahazwe
wrote:
>
> Dear developer,
>
> I have encountered some of the nodes that are down, I can ping to node
> n003 and not node n001, I have run scontrol update to change the state with
> no success below is the result after
On 18 September 2017 at 11:13, Christopher Samuel <sam...@unimelb.edu.au>
wrote:
>
> On 14/09/17 16:04, Lachlan Musicman wrote:
>
> > It's worth noting that before this change cgroups couldn't get down to
> > the thread level. We would only consume at the core level -
On 15 September 2017 at 17:09, Dr. Thomas Orgis wrote:
> Hi Zhang,
>
> the default behaviour of slurm is to try to keep the environment
> variables from the submit node. I do not like that and in our
> installation, we urge users to always specify
>
> #SBATCH
c/profile/ /home/user1/.bashrc has already defined
> many variables. I think these are default variables; currently, every time,
> I also need to source them before using, which is not reasonable in my view.
>
> Is there a way to configure slurm to use the running node's env, not the
> sub
On 14 September 2017 at 19:41, Chaofeng Zhang wrote:
> On node A, I submit job file using sbatch command, the job is running on
> the node B, you will find that the output is not the env of node B, it is
> the env of node A.
>
>
>
> *#!/bin/bash*
>
> *#SBATCH
On 14 September 2017 at 11:06, Lachlan Musicman <data...@gmail.com> wrote:
>
> I've just implemented the change from
>
> NodeName=papr-res-compute[34-36] CPUs=8 RealMemory=31000 State=UNKNOWN
>
> to
>
> NodeName=papr-res-compute[34-36] CPUs=8 RealMemory=3100
On 14 September 2017 at 11:10, Christopher Samuel <sam...@unimelb.edu.au>
wrote:
>
> On 14/09/17 11:07, Lachlan Musicman wrote:
>
> > Node configuration differs from hardware: CPUs=8:8(hw) Boards=1:1(hw)
> > SocketsPerBoard=8:1(hw) CoresPerSocket=1:4(hw) ThreadsPerCo
On 13 September 2017 at 10:36, Christopher Samuel
wrote:
>
> On 13/09/17 07:22, Patrick Goetz wrote:
>
> > All I have to say to this is: um, what?
>
> My take has always been that ThreadsPerCore is really for HPC workloads
> where you've decided not to disable HT full stop
On 11 September 2017 at 20:11, Gennaro Oliva wrote:
>
> Hi Patrick,
>
> On Fri, Sep 08, 2017 at 01:17:33PM -0600, Patrick Goetz wrote:
> > After some
> > discussion on this list, someone convinced me that setting
> > "ThreadsPerCore=2" informs Slurm that each CPU actually
Hola,
I was under the impression that environments travelled with slurm when
sbatch was executed - so any node could execute any code as if it was the
env I executed from or built within my sbatch scripts.
We use Environment Modules and this has all worked just great. Very pleased.
Recently I
On 16 August 2017 at 00:14, Will French wrote:
> > On Aug 15, 2017, at 5:29 AM, Chris Samuel wrote:
> >
> >
> > On Tuesday, 15 August 2017 4:34:55 PM AEST John Hearns wrote:
> >
> >> For the /proc/self you need to start an interactive job under
On 15 August 2017 at 11:38, Christopher Samuel <sam...@unimelb.edu.au>
wrote:
> On 15/08/17 09:41, Lachlan Musicman wrote:
>
> > I guess I'm not 100% sure what I'm looking for, but I do see that there
> > is a
> >
> > 1:name=systemd:/user.slice/user-0.slice/ses
On 15 August 2017 at 07:41, Robbert Eggermont <r.eggerm...@tudelft.nl>
wrote:
>
> On 14-08-17 07:50, Lachlan Musicman wrote:
>
>> We have TaskPlugin=task/cgroup and when testing I noticed that the # of
>> threads/cpus being allocated was rounded up to the nearest even
On 14 August 2017 at 16:22, John Hearns wrote:
> Lachlan, forgive me if I am teaching granny to suck eggs...
> I have recently been working with cgroups.
> If you run an interactive job what do you see when cat /proc/self/cgroups
> Also have you explored in
Hola,
Two things: in the documentation for slurm.conf the reference to ProcTrack
= proctrack/cgroup tells people to see `man cgroup.conf` for more details.
That man page holds no details re proctrack.
https://slurm.schedmd.com/slurm.conf.html
The details in question are on
Yep, thanks Chris. I went with regular reboot and have now successfully used
scontrol reboot ASAP
Very handy!
L.
--
"The antidote to apocalypticism is *apocalyptic civics*. Apocalyptic civics
is the insistence that we cannot ignore the truth, nor should we panic
about it. It is a shared
e power plug.
> Let's see you deal with that one.
>
> On 7 August 2017 at 06:08, Lachlan Musicman <data...@gmail.com> wrote:
>
>> I've just been asked about implementing a "drain and reboot" for
>> nodes/partitions.
>>
>> In slurm.conf, there is a Re
I've just been asked about implementing a "drain and reboot" for
nodes/partitions.
In slurm.conf, there is a RebootProgram - does this need to be a direct
link to a bin or can it be a command?
RebootProgram=/usr/sbin/reboot
or
RebootProgram='systemctl disable reboot-guard; reboot'
Cheers
L.
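If it turns out RebootProgram only accepts a plain executable path (no shell syntax), a small wrapper script keeps the one-liner idea intact - a sketch, with `reboot-guard` being the service named above and the wrapper path hypothetical:

```shell
#!/bin/bash
# /usr/local/sbin/slurm-reboot (hypothetical path)
# referenced from slurm.conf as: RebootProgram=/usr/local/sbin/slurm-reboot
systemctl disable reboot-guard
exec /usr/sbin/reboot
```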
eong-gu, Daejeon
> Republic of Korea 305-701
> Tel. +82-10-2075-6911 <+82%2010-2075-6911>
>
> 2017-08-02 13:05 GMT+09:00 Lachlan Musicman <data...@gmail.com>:
>
>> [root@n6 /]# si
>>>
>>> PARTITION NODES NODES(A/I/O/T) S:C:T MEMORY T
>
> [root@n6 /]# si
>
> PARTITION NODES NODES(A/I/O/T) S:C:T MEMORY TMP_DISK
> TIMELIMIT AVAIL_FEATURES NODELIST
>
> debug* 6 0/6/0/6 1:4:2 7785 113264
> infinite (null) c[1-6]
>
> (for a moment)
>
> [root@n6 /]# si
>
> PARTITION
Sumin,
The error message is saying that the node is down.
When you say "works with sinfo", you need to show us what that means -
sinfo is a command that interrogates the state of nodes, whereas srun sends
commands *to* nodes. So sinfo is meant to work - even if the nodes are
down. It is the
On 28 July 2017 at 14:30, 허웅 wrote:
> I modified my slurm.conf like :
>
>
>
> NodeName=GO[1-5]
>
>
>
> PartitionName=party Default=yes Nodes=GO[1-5]
>
>
>
> and I restarted slurmctld and slurmd services.
>
>
>
> [root@GO1]~# systemctl start slurmctld
>
> [root@GO1]~#
>
> sgo3 1 party* idle
>
> sgo4 1 party* idle
>
> sgo5 1 party* idle
>
> [root@GO1]~# sn
> Fri Jul 28 09:55:53 2017
> HOSTNAMES
> GO1
> GO2
> GO3
>
e only way out is through, and the only way through is
together. "
*Greg Bloom* @greggish
https://twitter.com/greggish/status/873177525903609857
On 28 July 2017 at 10:47, Lachlan Musicman <data...@gmail.com> wrote:
> I think it's because hostname is so undemanding.
>
> How many CPUs
I think it's because hostname is so undemanding.
How many CPUs does each host have?
You may need to use ((number of cpus per host) + 1) to see action on
another node.
You can try using stress-ng to test higher loads?
https://www.cyberciti.biz/faq/stress-test-linux-unix-server-with-stress-ng/
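For example, something like this would load a node much harder than `hostname` does (node name and counts are hypothetical; `--cpu` and `--timeout` are standard stress-ng flags):

```shell
# occupy 9 CPU workers for 60 seconds on one node via Slurm
srun -w node02 -n1 -c9 stress-ng --cpu 9 --timeout 60s
```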
in/bash
> WDIR=$PWD
> #SBATCH -t 1:00
>
> the -t 1:00 will get ignored by sbatch
>
>
> On Thu, 29 Jun 2017, Lachlan Musicman wrote:
>
>> We have a 40min default time on our main partition.
>>
>> We are finding that researchers that use
>>
>> #SBATCH --ti
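To restate the quoted point as a sketch: sbatch stops honouring `#SBATCH` directives at the first non-comment line, so ordering is everything:

```shell
#!/bin/bash
#SBATCH --time=0-07:00:00   # honoured: appears before any command
WDIR=$PWD                   # first real command - directive parsing ends here
#SBATCH -t 1:00             # ignored: now just an ordinary comment
```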
We have a 40min default time on our main partition.
We are finding that researchers that use
#SBATCH --time=0-07:00:00
are still having their jobs terminated at 40 minutes.
Using slurm 17.2.04 on Centos 7.3
Has anyone else experienced this?
Cheers
L.
--
"Mission Statement: To provide
We did it in place, worked as noted on the tin. It was less painful than I
expected. TBH, your procedures are admirable, but you shouldn't worry -
it's a relatively smooth process.
cheers
L.
--
"Mission Statement: To provide hope and inspiration for collective action,
to build collective
On 9 June 2017 at 15:26, Lachlan Musicman <data...@gmail.com> wrote:
> On 9 June 2017 at 14:53, Nicholas C Santucci <santu...@uci.edu> wrote:
>
>> Those first two of your Gone list I noticed when 17.02.0 was released on
>> Feb 23.
>> A patch was
t; +Obsoletes: slurm-sjobexit slurm-sjstat slurm-seff
>> %description contribs
>> seff is a mail program used directly by the Slurm daemons. On completion
>> of a
>> job, wait for its accounting information to be available and include
>> that
>>
>> On 0
Hola,
I followed the instructions for building the 16.05.0 bz2 and installed the
resulting rpms as follows:
Each node got:
slurm.x86_64
slurm-devel.x86_64
slurm-munge.x86_64
slurm-perlapi.x86_64
slurm-plugins.x86_64
slurm-sjobexit.x86_64
slurm-sjstat.x86_64
slurm-torque.x86_64
The head
I'm pretty sure you need the MDCS.
Having said that, I know people run GNU Octave on clusters, can't speak to
it though.
R works on a cluster quite nicely.
cheers
L.
--
"Mission Statement: To provide hope and inspiration for collective action,
to build collective power, to achieve
Hi and welcome to SLURM.
It is late and I am tired, but:
1. SLURM is a cluster
2. The front end will run the slurmctld service. Compute nodes will run the
slurmd service. How that is divided is up to you.
cheers
L.
--
"Mission Statement: To provide hope and inspiration for collective action,
On 24 May 2017 at 13:18, Christopher Samuel <sam...@unimelb.edu.au> wrote:
>
> Hiya,
>
> On 24/05/17 13:10, Lachlan Musicman wrote:
>
> > Occasionally I'll see a bunch of processes "running" (sleeping) on a
> > node well after the job they are ass
Hola,
Occasionally I'll see a bunch of processes "running" (sleeping) on a node
well after the job they are associated with has finished.
How does this happen - does slurm not make sure all processes spawned by a
job have finished at completion?
cheers
L.
--
"Mission Statement: To provide
One user has recently started to see their jobs killed after roughly 40
minutes, even though they have asked for four hours.
40 minutes is partitions' default, but this user has
#SBATCH --time=04:00:00
in their sbatch file?
I have found this: https://bugs.schedmd.com/show_bug.cgi?id=2353 and
- Patrice Cullors, *Black Lives Matter founder*
On 23 May 2017 at 09:43, Lachlan Musicman <data...@gmail.com> wrote:
> Hola,
>
> One of my users has been given the PartitionTimeLimit reason for his jobs
> not running.
>
> He has requested 20 days for the job, but I don't reme
Hola,
One of my users has been given the PartitionTimeLimit reason for his jobs
not running.
He has requested 20 days for the job, but I don't remember setting a time
limit on any partition?
I do recall setting a default time, but not a time limit.
The docs claim:
On 11 May 2017 at 08:33, Batsirai Mabvakure wrote:
> Is there a command i can execute for slurm to update automatically without
> having to download it again?
>
>
Not really. Ubuntu packages SLURM IIRC, but you would need to wait until
they do their packaging and push the
ooted in
grief and rage but pointed towards vision and dreams."
- Patrice Cullors, *Black Lives Matter founder*
On 10 May 2017 at 10:57, Lachlan Musicman <data...@gmail.com> wrote:
> Running Slurm 16.05 on CentOS 7.3 I'm trying to start an interactive
> session with
>
> srun -w
Running Slurm 16.05 on CentOS 7.3 I'm trying to start an interactive
session with
srun -w papr-expanded01 --pty --mem 8192 -t 06:00 /bin/bash
--partition=expanded
srun -w papr-expanded01 --pty -t 06:00 /bin/bash --partition=expanded
srun -w papr-expanded01 --pty --mem 8192 /bin/bash
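One thing to check with the invocations above: srun passes everything after the executable to that executable, so `--partition=expanded` placed after `/bin/bash` goes to bash, not to srun. Reordered (same node and limits):

```shell
srun --partition=expanded -w papr-expanded01 --mem=8192 -t 06:00 --pty /bin/bash
```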
On 11 April 2017 at 02:36, Raymond Wan wrote:
>
> For SLURM to work, I understand from web pages such as
> https://slurm.schedmd.com/accounting.html that UIDs need to be shared
> across nodes. Based on this web page, it seems sharing /etc/passwd
> between nodes appears
mabvakure <batsir...@nicd.ac.za>
wrote:
> Thank you so much for the reply. Is there another way I can configure the
> nodes other than using mpich that allows me only to update the slurm.conf
> file and not install slurm on every new node every time I scale up?
>
>
> On 20
I don't know if you can split it at a GRES level, but I would put the node
in the two partitions, and then use QOS to only allow one partition access
to the single card and the other partition 3 cards.
cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
t
> > partitions. The cluster of 64 is the only one where I see this
> > happening. Unless that number of nodes is pushing the limit for a single
> > slurmctld (which I doubt) I'd be inclined to think it's more likely a
> > network issue but in that case I'd expect w
On 17 February 2017 at 03:02, Baker D.J. wrote:
> Hello,
>
>
>
> Thank you for the reply. There are two accounts on this cluster. What I
> was primarily trying to do was define a default QOS with a partition. My
> idea was to use sacctmgr to create an association between
On 16 February 2017 at 09:36, Christopher Samuel
wrote:
>
> We also have all our partitions (other than our debug one reserved for
> sysadmins) marked as "State=DOWN" in slurm.conf so that they won't start
> jobs when slurmctld is brought back up again.
>
Chris,
What's
If you are only in one account, you don't need to list it.
What version of slurm are you using? Someone else mentioned needing to
restart slurmctld for users to stick. Which is not something I've
experienced, but try that maybe?
I am presuming that your slurm.conf is set up correctly for
If you are looking to suspend and resume jobs, use scontrol:
scontrol suspend
scontrol resume
https://slurm.schedmd.com/scontrol.html
The docs you are pointing to look more like taking nodes offline in times
of low usage?
cheers
L.
--
The most dangerous phrase in the language is,
g
that change to all nodes, restarting slurmctld then running scontrol
reconfigure?
cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
> On Sat, Feb 11, 2017 at 3:26 AM Lachlan Musicman <data...@gmail.com>
> wrote:
>
1. As EV noted, to get Memory as a consumable resource, you will need to
add it to the line that says CR_CPU - change to CR_CPU_Memory
https://slurm.schedmd.com/slurm.conf.html
2. That's because of the CR_CPU combined with cons_res. Change to CR_CORE
for per core or CR_SOCKET for per socket. For
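As a slurm.conf sketch of the combination described above (illustrative only):

```
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory   # cores and memory both consumable
```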
There's always the --dependency flag for sbatch. So yes, depending on what
you wanted, you could line up another sbatch after the first if you liked.
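For instance (script names hypothetical; `--parsable` makes sbatch print just the job id):

```shell
jobid=$(sbatch --parsable first.sbatch)
sbatch --dependency=afterok:"$jobid" second.sbatch   # runs only if first succeeds
```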
cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 1 February 2017 at 08:38,
Trivial questions: does the node have the correct time wrt the head node? And is
the node correctly configured in slurm.conf? (# of CPUs, amount of memory, etc.)
cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 1 February 2017 at 08:03, E V
Check they all have the same time, or ntpd against the same server. I
found that the nodes that kept going down had their time out of sync.
Cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 25 January 2017 at 05:49, Allan Streib
stuser slurm_clu+
>
>
>
> one thing that I did notice when I add the user I see this error in the
> slurmctld log
>
>
>
> [2017-01-23T16:47:34.351] error: Update Association request from non-super
> user uid=450
>
>
>
> UID 450 happens to be the slurm user
&
Interesting. To the best of my knowledge, if you are using Accounting, all
users actually need to be in an association - ie having a user account is
insufficient.
An Association is a tuple consisting of: cluster, user, account and
(optional) partition.
Is that the problem?
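For reference, a hedged sketch of building that tuple with sacctmgr (all names are placeholders):

```shell
sacctmgr add cluster mycluster
sacctmgr add account genomics cluster=mycluster
sacctmgr add user alice account=genomics   # this step creates the association
```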
cheers
L.
--
The
We use the SPANK plugin found here
https://github.com/hpc2n/spank-private-tmp
and find it works very well.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 21 January 2017 at 03:15, John Hearns wrote:
> As I
Will,
I believe you do. While they aren't necessary in your case, I believe the
software has been built for maximum extensibility, and as such there needs
to be:
at least one cluster
at least one account
at least one user
and an association is the "grouping" of those three. The relevant part of
Hi David,
I dealt with this recently (see
https://groups.google.com/forum/#!topic/slurm-devel/DKcFng8c1zE for
instance )
In the end we went with this solution that has worked well for us:
https://slurm.schedmd.com/SUG14/private_tmp.pdf
which describes this plugin:
On 8 December 2016 at 07:54, Mark R. Piercy wrote:
>
> Is it ever possible to submit jobs based on a users org affiliation? So
> if a user is in org (PI) "smith" then their jobs would automatically be
> sent to a particular partition. So no need to use the -p option in
>
Hi,
I've had a request from a user about the email system in SLURM. Basically,
there's a team collaboration and the request was:
is there an sbatch command such that two groups will get different sets of
emails.
Group 1: only get the email if the jobs FAIL
Group 2: get Begin, End and Fail
Hey Devs,
The new design on the schedmd site is pretty - thanks!
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
pip inside
> virtualenv's if that is the case the switch to a container with rkt seems
> "normal" instead of a more intrusive one all mighty process to rule
> everything that docker had the last time I check, its probably better now.
>
> Saludos.
> Jean
>
> On Tue, Nov 15, 20
Hola,
We were looking for the ability to make jobs perfectly reproducible - while
the system is set up with environment modules with the increasing number of
package management tools - pip/conda; npm; CRAN/Bioconductor - and people
building increasingly more complex software stacks, our users
Arg, I see now (hit send too soon). My parsing of the man page was wrong.
cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 8 November 2016 at 11:39, Lachlan Musicman <data...@gmail.com> wrote:
> Priority:
Priority: Minor
I notice that this command works well:
sinfo -Nle -o '%C %t'
Tue Nov 8 11:38:09 2016
CPUS(A/I/O/T) STATE
40/0/0/40 alloc
38/2/0/40 mix
36/4/0/40 mix
36/4/0/40 mix
6/34/0/40 mix
0/40/0/40 idle
0/40/0/40 idle
0/40/0/40 idle
0/40/0/40 idle
0/40/0/40 idle
0/40/0/40 idle
0/40/0/40
Peixin,
Again, depends on your OS and deployment methods, but essentially:
In slurm.conf set
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
SlurmctldLogFile=/var/log/slurm/slurm-ctld.log
On 8 November 2016 at 07:11, Peixin Qiao wrote:
> Hi,
>
> I install munge and restart my computer, then munge stopped work and
> restarting munge didn't work. It says:
>
> munged: Error: Failed to check pidfile dir "/var/run/munge": cannot
> canonicalize "/var/run/munge": No
I think it should. Can you send through your slurm.conf?
Also, the logs usually explicitly say why slurmctld/slurmd don't start, and
the best way to judge if slurm is running is with systemd:
systemctl status slurmctld
systemctl status slurmd
cheers
L.
--
The most dangerous phrase in the
On 28 October 2016 at 09:20, Christopher Samuel <sam...@unimelb.edu.au>
wrote:
>
> On 28/10/16 08:44, Lachlan Musicman wrote:
>
> > So I checked the system, noticed that one node was drained, resumed it.
> > Then I tried both
> >
> > scontrol requeue 230
Morning,
Yesterday we had some internal network issues that caused havoc on our
system. By the end of the day everything was ok on the whole.
This morning I came in to see one job on the queue (which was otherwise
relatively quiet) with the error message/Nodelist Reason (launch failed
requeued
On 25 October 2016 at 09:17, Tuo Chen Peng wrote:
> Oh ok thanks for pointing this out.
>
> I thought ‘scontrol update’ command is for letting slurmctld to pick up
> any change in slurm.conf.
>
> But after reading the manual again, it seems this command is instead to
> change
On 25 October 2016 at 08:42, Tuo Chen Peng wrote:
> Hello all,
>
> This is my first post in the mailing list - nice to join the community!
>
Welcome!
>
>
> I have a general question regarding slurm partition change:
>
> If I move one node from one partition to the other,
On 21 October 2016 at 12:39, Christopher Samuel
wrote:
>
> On 21/10/16 12:29, Andrew Elwell wrote:
>
> > When running sreport (both 14.11 and 16.05) I'm seeing "duplicate"
> > user info with different timings. Can someone say what's being added
> > up separately here - it
I've had consistent success with the documented system - "rpmbuild
slurm-.tgz" then yum installing the resulting files, using 15.x,
16.05 and 17.02.
Have on occasion needed to recompile - hdf5 support and for non main line
plugins, but otherwise it's been pretty easy.
Will happily support/debug
Mike, I would suggest that the limit is a SLURM limit rather than a ulimit.
What is the result of
scontrol show config | grep Mem
?
Because you have set your
SelectTypeParameters=CR_Core_Memory
Memory will cause jobs to fail if they go over the default memory limit.
The SLURM head will kill
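A slurm.conf sketch of the knobs involved (values illustrative, parameter names real):

```
SelectTypeParameters=CR_Core_Memory
DefMemPerCPU=4096    # MB assumed when a job requests no memory
MaxMemPerCPU=16384   # hard per-CPU cap
```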
for that partition
- jobs running on that partition will continue to do so
cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this
way."
- Grace Hopper
On 12 October 2016 at 10:35, Lachlan Musicman <data...@gmail.com> wrote:
> Hola,
>
> For reason
Hola,
For reasons, our IT team needs some downtime on our authentication server
(FreeIPA/sssd).
We would like to minimize the disruption, but also not lose any work.
The current plan is for the nodes to be set to DRAIN on Friday afternoon
and on Monday morning we will suspend any running jobs,
Check against the installed libs? check *-devel? Otherwise I'm not 100%
sure - unless the rpmbuild folder with all files still exists and there's
something in there?
FWIW, it's relatively easy to install all the libs that SLURM needs without
causing too many problems. The hardest I've found so
Hola,
Just built the rpms as per the installation docs.
Noted that there were three new rpms:
slurm-openlava-16.05.5-1.el7.centos.x86_64.rpm
slurm-pam_slurm-16.05.5-1.el7.centos.x86_64.rpm
slurm-seff-16.05.5-1.el7.centos.x86_64.rpm
Is that due to a more sophisticated build machine or due to
Jose,
Do all the nodes have access to either a shared /usr/lib64/slurm or do they
each have their own? And is there a file in that dir (on each machine)
called select_cons_res.so?
Also, when changing slurm.conf here's a quick and easy workflow:
1. change slurm.conf
2. deploy to all machines in
he language is, "We've always done it this
way."
- Grace Hopper
>
> Doug Jacobsen, Ph.D.
> NERSC Computer Systems Engineer
> National Energy Research Scientific Computing Center
> <http://www.nersc.gov>
> dmjacob...@lbl.gov
>
> - __o
>
.@unimelb.edu.au>
wrote:
>
> On 30/08/16 12:39, Lachlan Musicman wrote:
>
> > Oh! Thanks.
> >
> > I presume that includes sruns that are in an sbatch file.
>
> Yup, that's right.
>
> cheers!
> Chris
> --
> Christopher Samuel        Senior Systems Admini
I started a thread on understanding QOS, but quickly realised I had made a
fundamental error in my configuration. I fixed that problem last week.
(ref:
https://groups.google.com/forum/#!msg/slurm-devel/dqL30WwmrmU/SoOMHmRVDAAJ )
Despite these changes, the issue remains, so I would like to ask again,
:01, Janne Blomqvist <janne.blomqv...@aalto.fi>
wrote:
> On 2016-09-29 04:11, Lachlan Musicman wrote:
> > Hi,
> >
> > After some fun incidents with accidental monopolization of the cluster,
> > we decided to enforce some QOS.
> [snip]
> > What have I done w
is, "We've always done it this
way."
- Grace Hopper
On 29 September 2016 at 11:10, Lachlan Musicman <data...@gmail.com> wrote:
> Hi,
>
> After some fun incidents with accidental monopolization of the cluster, we
> decided to enforce some QOS.
>
> I read the documentatio
Hi,
After some fun incidents with accidental monopolization of the cluster, we
decided to enforce some QOS.
I read the documentation. Thus far in the set up the only thing I've done
that's even close is I assigned "share" values when I set up each
association.
The cluster had a QOS called
Park // Dirac Crescent // Emersons
> Green // Bristol // BS16 7FR
>
> CFMS Services Ltd is registered in England and Wales No 05742022 - a
> subsidiary of CFMS Ltd
> CFMS Services Ltd registered office // Victoria House // 51 Victoria
> Street // Bristol // BS1 6AD
>
>
I am surprised how hard I found it to find these as well - especially given
how frequently the question is asked.
This mob have made one, and it looks good, but all development has happened
on .deb systems, and I didn't have sufficient time (or skill) to unpack and
repack for rpm or generic.
Hi,
cgroups have been on my radar since about two weeks after I started looking
into SLURM and I'm just getting around to looking at them now.
I note that the ProcTrackType docs say
> This plugin writes to disk often and can impact performance. If you are
running lots of
> short running jobs