t a day.
> That's how it supports everything else!
Yeah, the problem is the millions of single-node or single-core jobs that
get run; they'll never even try to use srun.
cheers!
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences
ir new
quota and then zero everyone else.
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
On 22/04/15 19:07, Maciej L. Olchowik wrote:
> I will suggest that to the resource allocation committee, see if they
> would be willing to change their ways :) Thanks for the thought!
My pleasure, glad our experiences are useful to others!
All the best,
Chris
--
Christopher
d people with lots of experience in this and isn't
very high traffic these days.
Drop me a line if you'd like to be subscribed (had to disable Mailman
web subscriptions as it was being abused to mailbomb people).
All the best,
Chris
--
Christopher Samuel    Senior Systems Admini
ore information about how Slurm interacts with various MPI
implementations on the SchedMD website here:
http://slurm.schedmd.com/mpi_guide.html
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email:
've fixed it in the interim.
Especially as for 14.11.3 the NEWS file says:
-- Fixed squeue core dump.
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 9
On 26/05/15 13:05, Christopher Samuel wrote:
> Given that Slurm 14.11.x is up to 14.11.6
Sorry, actually up to 14.11.7..
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55
ly verified by any authority – this number jumps up
# to ~40% with a sampling error bound of 3%. [...]
If anything that puts me off liking them even more. :-(
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Emai
user namespaces here:
https://lwn.net/Articles/532593/
Your filesystem has to support it though, otherwise it looks like you'll
get EINVAL back - as this comment from a user who was trying it on
filesystems not yet ported to it reports:
https://lwn.net/Articles/541787/
All the best,
Chris
--
om memory) and then you can login to unsubscribe.
No direct link from what I can see sorry!
I think Danny, Moe, David et al. monitor the list and unsubscribe people
who ask here, so it might have already happened for you (hence the CC).
All the best,
Chris
--
Christopher Samuel    Senio
Both Docker and CoreOS's rkt are Apache2. :-(
https://www.apache.org/licenses/GPL-compatibility.html
Oh well, makes the whole thing academic really for us.
Thanks for pointing that out Ralph!
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life S
their jobs, and SLURM will do so.
How did they deal with the license incompatibility that Ralph mentioned?
All the best
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903
ing
done regularly with the old paths hardcoded in them?
What users are they running as?
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=x
So they understand that they are running a single process that needs 'x'
cores.
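For example, a minimal sketch of such a job script (the program name and
core count are just placeholders):

  #!/bin/bash
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=8
  # Many threaded codes read OMP_NUM_THREADS, so match it to the allocation.
  export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
  ./my_threaded_program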
Best of luck!
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
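Those settings go in cgroup.conf; for them to take effect slurm.conf also
needs the cgroup plugins enabled, along these lines (a sketch - exact
plugin choices vary by site):

  ProctrackType=proctrack/cgroup
  TaskPlugin=task/cgroup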
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
so you can at
least confirm whether or not there are multiple cores allocated.
Best of luck!
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
ing at least.
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
ut
containerisation they recommended looking at LXD from Canonical, which is
an evolution of LXC.
http://www.ubuntu.com/cloud/tools/lxd
Of course getting it to run on anything other than Ubuntu might be a
challenge which could limit its usefulness.
All the best,
Chris
--
Christopher Samuel
n here in
Australia) to manage LDAP users and projects and it supports both Slurm
and Torque/MAM.
https://github.com/Karaage-Cluster/
Might be overkill for a small cluster though!
cheers,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Co
but if it ain't broke... ;-)
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
14.11.x).
So 14.11.x should support 14.03.x and 2.6.x.
I suspect that the OP actually has the wrong slurmd starting on the
compute nodes..
cheers,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu
few years now.
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
${slurm_ver} is the current version we're
running (currently 14.03.11).
You will need to fix up your resource limit settings for
maximum lockable memory too, but that shouldn't cause the
issue you're seeing.
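A typical way to raise the lockable memory limit is via
/etc/security/limits.conf, something like this (a sketch; site policy
may differ):

  *  soft  memlock  unlimited
  *  hard  memlock  unlimited

Remember slurmd inherits its limits from wherever it was started, so an
init script may need a matching "ulimit -l unlimited" too.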
Best of luck!
Chris
--
Christopher Samuel    Senior Systems Administrat
and?
We're on RHEL6 (no systemd) and don't start slurmd on boot; instead we
only start it by hand (if a node reboots due to a fault we want to go
and check it out first).
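On RHEL6 that just means disabling the init script and starting the
daemon by hand once the node has been checked out, e.g. (assuming the
init script is installed as "slurm"):

  chkconfig slurm off   # don't start at boot
  service slurm start   # start manually after checking the node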
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
i/v1.8/
There have been Slurm-related fixes in 1.8.4 and 1.8.5 (looking at the
change log they seem minor, though one is related to the configure tests
for PMI).
I'd strongly suggest trying that out just to confirm that it's not an
issue that's been fixed in the last 9 months.
Be
see online a lot of people set in the
> /etc/sysconfig/slurm file.
>
> Are there different sourced files in the CentOS 7 / systemd setup for
> slurm?
I would not be at all surprised if that was the case.
If the unit file is telling you where it is then I'd trust that.
Good luck!
it of that for us is that, as our current Intel clusters
have minimums of 4GB/core and 16GB/core, if a lot of people can get
away inside that limit then you can still fit larger jobs into the
memory and cores left over.
All the best,
Chris
--
Christopher Samuel    Senior System
etrlimit(2) for details. Use the string
infinity to configure no limit on a specific resource.
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
..
cheers!
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
not had to touch
Solaris for well over a decade now.
Best of luck,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
en use Karaage to manage LDAP users and their projects, and it
integrates with Slurm via scripts that call sacctmgr.
http://karaage.readthedocs.org/en/latest/introduction.html
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computat
e have:
echo export BASH_ENV=/etc/profile.d/module.sh
to try and get that into people's environments (basically ported over
from our old Torque setup).
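The idea is that non-interactive bash shells (such as batch scripts run
by slurmd) source whatever $BASH_ENV points at on startup, and so pick
up the module function. A quick check that it's working (the node name
is just an example):

  srun -w node01 bash -c 'type module'   # should report module as a shell function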
Hope that helps,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Emai
On 30/06/15 07:05, Apolinar Martinez Melchor wrote:
> *** JOB xxx CANCELLED AT yyy DUE TO TIME LIMIT ***
That sounds like the job hit its walltime; GrpCpuMins should control
whether a job can start in the first place.
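For reference, that limit is viewed and set with sacctmgr, something
like this (the account name is just an example):

  sacctmgr show assoc format=account,user,grpcpumins
  sacctmgr modify account myproject set GrpCPUMins=1000000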
Best of luck,
Chris
--
Christopher Samuel    Senior Syst
S reported for any running jobs, even the longest
currently (20 hours old).
We are running using slurmdbd with the MySQL backend.
Am I misunderstanding something?
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
that just use sbatch (and no srun) once they
complete.
The manual page for sbatch says that memory stats are collected by
default every 30 seconds.
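(That's the JobAcctGather polling interval.) For a running job you can
poll the same stats with sstat, e.g.:

  sstat -j ${jobid}.batch --format=JobID,MaxRSS,MaxVMSize

and once the job completes sacct will report the same fields.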
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...
d.
All the best!
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
Hiya,
On 16/07/15 14:25, Danny Auble wrote:
> sstat should work with mpi jobs as well, just as long as srun is the
> launcher.
Yup, we're using OpenMPI 1.6.x mpirun (which calls srun).
cheers!
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Li
11.x one to get to that version.
...and make sure you've got a database backup first! ;-)
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
d at" `date`
echo "print Scratch directory /scratch/avoca/$SLURM_JOB_ID has been allocated"
echo "print $SLURM_BG_NUM_NODES Blue Gene/Q compute nodes have been allocated"
Hope these help!
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victo
On 29/07/15 16:41, Thomas Orgis wrote:
> On Tue, 28 Jul 2015 23:03:54 -0700,
> Christopher Samuel wrote:
>
>> 1) To stop environment variable being exported we have this in our
>> slurm.conf file:
>>
>> PropagateResourceLimits=NONE
>
> Does that infl
their walltime then their jobs will get in..
Best of luck,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
On 11/08/15 23:22, Igor Chebotar wrote:
> The mpich and openmpi ./configure was set with default options with only
> --prefix=/sotware/storage/path
For Open-MPI you need to specify:
--with-slurm
to include Slurm support.
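So the configure line would become something like this (the prefix and
PMI path are site-specific examples; --with-pmi is only needed if you
want srun to launch ranks directly via PMI/PMI2):

  ./configure --prefix=/path/to/install --with-slurm \
      --with-pmi=/usr/local/slurm/latest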
Hope that helps!
Chris
--
Christopher Samuel    Senior S
central Slurm install instead and so need to tell OMPI
where to find it.
Best of luck!
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
tails about the whole scheduling process
> (backfill)?
I don't know the answer I'm afraid, but I can see that it would be
really useful to know what nodes are planned for a job (a bit like Moab
and its forward reservations for jobs).
All the best!
Chris
--
Christopher Samuel
d to that when we get to 14.11!
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
, there appears
to be no pmi2.h installed by "make install" and indeed there is a
contrib/pmi2 directory instead that has the pmi2.h header file and API
which appears to need manual intervention to install.
Has there been some misunderstanding about PMI2 in Slurm?
All the best
.h ever being installed on our systems in
14.03.x or 2.6.x.
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
..
[samuel@snowy-m ~]$ srun --mpi=list
srun: MPI types are...
[...]
srun: mpi/pmi2
srun: mpi/openmpi
[...]
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903
On 02/09/15 13:32, Danny Auble wrote:
> I'm fairly sure if you install via rpm it will be there.
> Contribs isn't built through the normal make, as was pointed out,
> but it is through the rpm process.
OK, thanks Danny, so it's not built by default.
cheers!
Chris
On 02/09/15 13:57, Ralph Castain wrote:
> Sorry for the confusion, Chris
Not a worry Ralph, this is what bring-up testing is for. :-)
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61
3.003] job_complete: JobID=959 State=0x8003 NodeCnt=2 done
Any ideas?
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/
night with a
different build, I'll have another go tomorrow.
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
m with it?
>
> ls -l /lib/slurm/mpi*so
Thanks for the pointer, I think I'd deleted enough for Open-MPI to not
detect PMI2 when building but not enough for srun to not list it. :-/
Fixed that aspect now. :-)
All the best,
Chris
--
Christopher Samuel    Senior Systems Administ
e job arrays (waiting on confirmation from them at the
moment).
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
iname '*pmi2*'
/usr/local/slurm/latest/lib/slurm/mpi_pmi2.la
/usr/local/slurm/latest/lib/slurm/mpi_pmi2.a
/usr/local/slurm/latest/lib/slurm/mpi_pmi2.so
which is what srun looks for to know what to list.
That seems counter-intuitive - should those files be installed as a
result of
On 03/09/15 05:17, Andy Riebs wrote:
> Have you specified mpi=pmi2, either in your command line or in slurm.conf?
Yup, tried all combinations of that.
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email:
The command line to generate this was:
[samuel@snowy-m PMI2]$ sbatch -p debug --wrap "srun -vv
--slurmd-debug=verbose ./testpmi2"
Submitted batch job 996
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Ini
single
node with David's command line to two nodes and never spotted I'd
dropped it out.
I'll test that again once the compute nodes are back online again.
cheers,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiat
On 04/09/15 15:44, Christopher Samuel wrote:
> I'll test that again once the compute nodes are back online again.
OK, back to the right error again.
[samuel@snowy-m PMI2]$ srun -p debug --mpi=pmi2 ./testpmi2
srun: job 1004 queued and waiting for resources
srun: job 1004 has been a
On 04/09/15 16:02, Christopher Samuel wrote:
> I've attached the output file for the version with debugging on, and I
> have a suspicion it's related to:
>
> srun: debug: slurm_forward_data: nodelist=snowy010,
> address=/tmp/sock.pmi2.1006.0, len=243
>
> We ar
queued up to try it out
once the current job finishes.
I would like Slurm to not use $TMPDIR (TmpFS) for that as we need
the tmpdir spank plugin to map each job's /tmp to our scratch FS
instead. Too many codes don't honour $TMPDIR/$TMP/etc..
All the best!
Chris
--
Christopher Samuel
HDF5
code that seems to need these MPI headers.
All the best,
Chris (finally in DC for the Slurm User Group)
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org
m option, and the default size is 1MB.
Your job asks for 50MB and so won't fit.
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.v
then is why is the HDF5 code finding it?
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
AccountingStorageType=accounting_storage/slurmdbd
to tell it to use slurmdbd instead?
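i.e. something along these lines in slurm.conf (the host name is just
an example):

  AccountingStorageType=accounting_storage/slurmdbd
  AccountingStorageHost=slurmdb.example.org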
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.
-lpthread -lhwloc
-Wl,-rpath=/usr/local/slurm/latest/lib
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
most looks like your LD_LIBRARY_PATH is not being propagated properly
to the compute nodes. What does:
ldd scriptmpi
say?
cheers,
Chris /* Apologies if threading is broken for this email */
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation In
On 19/09/15 05:44, Fany Pagés Díaz wrote:
> I need to send an executable with parameters. I can do it with the
> command srun?
>
> For example:
> srun myapplication database IP username password
Yes, that should work just fine.
All the best,
Chris
--
Christopher Samue
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
On 22/09/15 06:37, Andrew Petersen wrote:
> However if I use 2,
> #SBATCH --cpus-per-task=2
> I get the error "sbatch: error: Batch job submission failed: Requested
> node configuration is not available"
What does your slurm.conf look like?
--
Christopher Samuel
DEFAULT line sets everything that is general, and then for the
two larger-memory nodes we override that.
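In other words, something like this (the hardware numbers are made up
for illustration):

  NodeName=DEFAULT Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=64000
  NodeName=node[001-100]
  NodeName=bigmem[01-02] RealMemory=512000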
Hope this helps!
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http:
r that?
What is the program itself you're trying to run?
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
Best of luck,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
24T10:23:19
1292_1.0 xhpl_inte+ 2015-09-24T10:23:19
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
PMIX work; to me they seem to be extending PMI2, so I wonder if it would
be better for their API to be called PMI2X instead, to avoid confusion?
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unim
mgr will be talking to) is running on the same host as slurmctld.
For instance we have many clusters talking back to a central host
running slurmdbd (and only slurmdbd) and have a variety of systems that
talk to it for different tasks.
All the best,
Chris
--
Christopher Samuel
ll talk to the same slurmdbd to record their accounting records and to
get their limits, QoS, etc. from.
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http
ts via sacctmgr.
How's that?
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
d just request the binding for the job step
that needs it inside your batch job with:
srun --cpu_bind=cores ./my_program
In our recent acceptance testing I tried that with our HPL burn-in runs
and saw an improvement in performance.
All the best,
Chris
--
Christopher Samuel    Senior S
r
The bf_window is large enough for our max job time (1 month), and we use
bf_continue to make sure that it goes all the way through the queue
before returning to the top.
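That ends up as something like this in slurm.conf (values illustrative;
bf_window is in minutes, so a month is 44640):

  SchedulerType=sched/backfill
  SchedulerParameters=bf_window=44640,bf_continue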
Hope this helps!
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Init
patch which catches the cancellation) applies. It all
applies; 14.03 looks like it would need more work.
This fix seems to be:
commit 8e66e26773352e5a27445a6b60a2134b632c3453
Author: David Bigagli
Date: Wed Nov 11 13:04:28 2015 +0100
Fix job cancelation bug.
--
Christopher
On 16/11/15 11:39, Christopher Samuel wrote:
> It was a bug, fixed in 15.08.
That was meant to be "fixed in 15.08.4" but my caffeine levels have
become too dilute..
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiativ
On 16/11/15 18:28, Mikael Johansson wrote:
> Well, to get the total number that SLURM is aware of, a simple command
> would be "sinfo -o %C", which shows Allocated/Idle/Offline/Total CPUs.
There's also "sinfo -o %F" for the same information about nodes.
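For example (numbers illustrative):

  $ sinfo -o %C
  CPUS(A/I/O/T)
  620/20/8/648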
cheers
All the best!
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
On 30/11/15 16:44, Christopher Samuel wrote:
> We're looking at seeing if we can combine fair share with our existing
> quota system that uses GrpCPUMins.
>
> However, for fair share a decay factor is strongly suggested and I worry
> that there is an implication that the u
down and then up and
it'll go back to pending. :-)
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
On 04/12/15 10:49, Christopher Samuel wrote:
> the scheduler next tries to run it
Sorry, that should be "the scheduler next tries to schedule it".
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@u
On 11/12/15 11:56, je...@schedmd.com wrote:
> We are pleased to announce the availability of Slurm version 15.08.5
Thanks Moe - silly question - do you need to recompile plugins if going
from 15.08.4 to 15.08.5?
cheers,
Chris
--
Christopher Samuel    Senior Systems Administrator
VL
ng
was that it was much better these days.
To be honest we're willing to take a small performance hit to prevent
rogue jobs killing other people's jobs on the same node!
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation
ut I don't believe it was ever implemented.
Best of luck,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
daemon executes on every compute node. It
# resembles a remote shell daemon to export control to
# Slurm. Because slurmd initiates and manages user jobs,
# it must execute as the user root.
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sc
differ from a head, worker or
> accounting node, and is there anything special that I need to do to
> that node - apart from implementing some sort of authorization
> system?
The key for that is Munge - you need to make sure you've got that
configured & running everywhere
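A quick sanity check that the munge key matches across hosts (the node
name is just an example):

  munge -n | ssh node01 unmunge

If that decodes cleanly on the far end then Slurm's authentication
should be happy too.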
On 15/12/15 14:11, Christopher Samuel wrote:
> Hi Simpson,
Sorry, Lachlan! :-)
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ h
Hiya Trey,
On 16/12/15 03:24, Trey Dockendorf wrote:
> The same startup script is used to launch slurmctld and slurmd.
That's not the case for the systemd unit files, they each reference a
different binary.
All the best,
Chris
--
Christopher Samuel    Senior Systems Admin
's on the right version).
There is a section in the admin quickstart about upgrades here:
http://slurm.schedmd.com/quickstart_admin.html
Best of luck!
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.
e.service
Best of luck!
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
the job allocation. Other than the batch script
itself, Slurm does no movement of user files.
All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http:/
oup",
MS_NOSUID|MS_NODEV|MS_NOEXEC, "cpuacct") = -1 EBUSY (Device or resource busy)
...and it might be related to this existing mount courtesy
of systemd in /proc/mounts:
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup
rw,nosuid,nodev,noexec,relatime,cpuacct,cpu 0 0
Anyone else seen t
On 21/12/15 16:57, Simpson Lachlan wrote:
> So if I'm running the tests then that part of the filesystem should be
> shared?
Yup, that's the idea..
> Hmmm. Ok. Thanks, I'll look into this tomorrow.
No worries, best of luck!
--
Christopher Samuel    Senior Syst
Thanks for the confirmation!
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci