To what extent Slurm uses that information at present,
I'm not sure without perusing the code further.
cheers!
Chris
- --
Christopher Samuel, Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http
suggest it'd be a good idea to get involved on
hwloc-devel.
cheers!
Chris
wrinkle there being that a job script can launch N processes each of
which can allocate up to RLIMIT_AS.
We were hoping that Slurm's cgroups support would permit limiting the
memory allocated by the whole job.
cheers,
Chris
but
it should work.
Thanks, looks like it's well worth looking into.
For the pty there is an option for srun called --pty which allows
you to open a remote shell on the master compute node of the job
as if you were ssh'ing into it.
That's great, precisely what we needed!
All the best,
Chris
backlog but I'll try and test that out soon.
Thanks!
Chris
modify the behaviour of via the SLURM Job
Submit Plugin API?
cheers,
Chris
Is this something people have run into before, any ideas?
cheers!
Chris
, but the app does
not see the option.. :-(
cheers!
Chris
,
Chris
need (or to tell your users to use
something else, depending on how BOFH'ish you are feeling and how big
your budget is).
All the best,
Chris
the intention of RLIMIT_DATA)
or would it be better off as a configuration parameter?
All the best,
Chris
by VSizeFactor to
set RLIMIT_AS (see the slurm.conf man page). This means it won't be
enforced unless you set that to a non-default value.
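As a rough illustration of the VSizeFactor arithmetic, going by the slurm.conf man page (VSizeFactor gives the virtual memory limit as a percentage of the real memory limit, and the default of 0 disables enforcement), here is a small bash sketch. The function name vsize_limit_mb is made up for this example, and the integer truncation is this sketch's own choice rather than something taken from Slurm's code.

```shell
#!/bin/bash
# Hypothetical helper: compute the virtual memory (RLIMIT_AS) limit, in MB,
# that VSizeFactor would imply for a given real memory limit.
#   $1 = real memory limit in MB
#   $2 = VSizeFactor (a percentage; 0 means "not enforced")
vsize_limit_mb() {
    local real_mb=$1 vsize_factor=$2
    if (( vsize_factor == 0 )); then
        # The default of 0 disables virtual memory limit enforcement.
        echo "unlimited"
        return 0
    fi
    # Integer arithmetic, so any fractional MB is truncated.
    echo $(( real_mb * vsize_factor / 100 ))
}
```

For example, vsize_limit_mb 500 101 prints 505, i.e. a job with a 500MB real memory limit and VSizeFactor=101 would be held to roughly 505MB of virtual memory.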
Thanks again for the explanation, very much appreciated!
Chris
backported
to OMPI 1.6.x?
All the best,
Chris
On 23/07/13 17:06, Christopher Samuel wrote:
Bringing up a new IBM SandyBridge cluster I'm running a NAMD test
case and noticed that if I run it with srun rather than mpirun it
goes over 20% slower.
Following on from this issue, we've found
On 07/08/13 16:19, Christopher Samuel wrote:
Anyone seen anything similar, or any ideas on what could be going
on?
Sorry, this was with:
# ACCOUNTING
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30
Since those initial tests
memory per tasks,
but I'm not sure I understand how that could lead to Slurm thinking
the job is using vastly more memory than it actually is though.
cheers,
Chris
to be
accurate. This also goes for handling memory limits.
Thanks, and I understand why that is, it's just a shame that the
performance penalty for using srun with Open-MPI makes it unusable. :-(
cheers!
Chris
on the list before about
having their own copies but I've not found any available yet.
Before we go trying to (re?)invent the wheel, does anyone know of any
already publicly available out there?
All the best,
Chris
to Karl for sending me their version, it works nicely.
All the best,
Chris
On 12/08/13 16:16, Christopher Samuel wrote:
Finally gotten a system I can try this on, but I think I must be
missing something
It was a quoting problem in the Makefile I derived from the RPM spec
file, it now works really nicely, thanks!
All
On 19/08/13 12:02, Christopher Samuel wrote:
Chasing through the code it looks like getgrnam_r() fails in
get_group_members() in src/slurmctld/groups.c on our Slurm 2.6 boxes
(on RHEL 6.4)
Scratch that, restarting slurmctld doesn't provoke
very much..
Chris
the jobacct_gather/cgroup plugin will give
better numbers once it's had more work.
All the best,
Chris
that doesn't appear to happen (OFFLOAD_DEVICES is not set) and I don't
see any evidence of code to do that in the current slurm-2.6 branch.
Is it an oversight, or am I missing something?
Currently I'm using a taskprolog to set it to -1 if it's absent.
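A minimal sketch of such a TaskProlog in bash. It relies on the TaskProlog convention described in the slurm.conf man page, where lines of the form "export NAME=value" written to the script's standard output are added to the task's environment. The function name is hypothetical, and -1 as the placeholder value is simply carried over from the message above.

```shell
#!/bin/bash
# Hypothetical TaskProlog: give OFFLOAD_DEVICES a default of -1 when Slurm
# has not set it for the task. Slurm reads this script's stdout and applies
# any "export NAME=value" lines to the task's environment.
offload_devices_default() {
    # ${VAR+x} expands to "x" only if VAR is set (even to an empty string),
    # so this checks for "absent" rather than "empty".
    if [ -z "${OFFLOAD_DEVICES+x}" ]; then
        echo "export OFFLOAD_DEVICES=-1"
    fi
}

# Only emit output when run as a script, not when sourced for testing.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    offload_devices_default
fi
```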
All the best,
Chris
SLURM_EXCLUSIVE=1
Is there any documentation explaining why these are required, or is this
a bug?
All the best,
Chris
Subject: Slurmdbd tables
Date: Tue, 17 Sep 2013 09:11:50 +1000
From: Brett Pemberton b...@unimelb.edu.au
To: Christopher Samuel sam...@unimelb.edu.au
Chris,
The situation:
We need all tables to have primary keys defined (for galera
replication). However slurm has two tables per cluster
,
Chris
sites would have conniptions
if users were able to take nodes out at random. ;-)
cheers,
Chris
release and then scontrol
requeue it will just start again.
Moe et al., how easy would it be to have some form of:
scontrol requeue --hold $JOBID
?
cheers,
Chris
On 19/09/13 16:29, Arjun J Rao wrote:
I installed SLURM on a Scientific Linux 6.4 system (64bit)
Did you install from source, or from RPMs?
that redirected jobs of larger than 1 core to other
partitions and jobs of just 1 core to the serial partition is likely
to be the best way for now.
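The routing rule itself is simple. The bash function below only illustrates the decision such a job submit plugin (for instance one written for the job_submit/lua plugin) would make; "parallel" is a placeholder for whatever the other partitions are actually called, and a real plugin would take the core count from the submitted job's description rather than from an argument.

```shell
#!/bin/bash
# Illustrative only: the partition choice a job submit plugin might make.
# "serial" comes from the discussion above; "parallel" is a placeholder name.
route_partition() {
    local cores=$1
    if (( cores <= 1 )); then
        echo "serial"      # single-core jobs go to the serial partition
    else
        echo "parallel"    # anything larger is sent elsewhere
    fi
}
```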
cheers,
Chris
On 03/10/13 06:42, David Bigagli wrote:
Sounds good. :-) Thanks for the patch, it is going to be in Slurm
2.6.3.
Thanks for that David.
the Torque developers solved that issue (user
discussion was on torqueusers, dev stuff was on torquedev).
cheers!
Chris
need to check that.
Good luck!
Chris
the best,
Chris
probably want to base any development on that.
cheers!
Chris
whinges that /bin/hostname doesn't have the magic ELF
header to say it's been built for the CNK, but it does try and execute it.
Best of luck!
Chris
is another story. :-)
All the best,
Chris
should download a Linux
distro and install that instead - you'll save yourself a lot of pain.
Best of luck,
Chris
you can see this article:
https://lwn.net/Articles/574317/
and an article about the cgmanager project, and the systemd developers'
unwillingness to cooperate on a single API, is here:
https://lwn.net/Articles/575672/
All the best,
Chris
level, though
they'd only be affecting their own stuff.
Thoughts?
All the best,
Chris
there are no data
staging directives for sbatch to copy data onto and off a node
before/after a job (unlike Torque).
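One common workaround is to do the staging inside the batch script itself. Below is a minimal bash sketch, assuming a node-local scratch area reachable through mktemp (that is, $TMPDIR, or /tmp if it is unset) and that results should be copied back even if the program fails. The function name stage_and_run and its arguments are hypothetical, not a Slurm feature.

```shell
#!/bin/bash
# Hypothetical stage-in / run / stage-out helper for use inside a batch script.
#   $1   = directory whose contents are copied onto the node before the run
#   $2   = directory that receives the scratch contents after the run
#   $3.. = command to run inside the scratch directory
stage_and_run() {
    local input_dir=$1 output_dir=$2
    shift 2

    local scratch
    scratch=$(mktemp -d) || return 1   # node-local scratch ($TMPDIR or /tmp)

    # Stage in: copy the input directory's contents, including dotfiles.
    if ! cp -r -- "$input_dir"/. "$scratch"/; then
        rm -rf -- "$scratch"
        return 1
    fi

    # Run the command from inside the scratch directory, keeping its exit code.
    local rc=0
    ( cd -- "$scratch" && "$@" ) || rc=$?

    # Stage out regardless of whether the command succeeded, then clean up.
    mkdir -p -- "$output_dir"
    if ! cp -r -- "$scratch"/. "$output_dir"/; then
        [ "$rc" -ne 0 ] || rc=1   # report a failed stage-out if nothing else failed
    fi
    rm -rf -- "$scratch"
    return "$rc"
}
```

In a job script this might be called as stage_and_run "$HOME/project/input" "$HOME/project/results" ./my_program, where both paths and the program name are placeholders.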
All the best,
Chris
brave enough to restart those
with running jobs on them.. ;-)
To me it's reminiscent of this bug:
http://bugs.schedmd.com/show_bug.cgi?id=392
except our problem persists across a slurmctld (and slurmdbd) restart. :-(
Any ideas?
cheers,
Chris
On 08/01/14 11:03, Christopher Samuel wrote:
Here's the data we have with all numbers normalised to hours:
GrpCPUMins 71200
CPURunMins 58348
Raw Usage 12967
So that means GrpCPUMins-CPURunMins-RawUsage = -115 hours.
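The same subtraction as a tiny bash function, in case it is handy for checking other associations. It simply reproduces the calculation above; the function name is made up, and it assumes all three figures have already been normalised to the same units (hours here).

```shell
#!/bin/bash
# Hypothetical helper: GrpCPUMins minus CPURunMins minus Raw Usage,
# with all three values already normalised to the same units.
grp_headroom() {
    local grp_limit=$1 cpu_run=$2 raw_usage=$3
    echo $(( grp_limit - cpu_run - raw_usage ))
}
```

With the figures above, grp_headroom 71200 58348 12967 prints -115, since 58348 + 12967 = 71315.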
After a bit more
sure the compute nodes re-read
slurm.conf so everyone is on the same page again.
All the best,
Chris
with:
[root@merri-m ~]# cat /usr/local/bin/sinteractive
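#!/bin/bash
# Pass any srun options straight through ("$@" keeps quoted arguments intact),
# then start an interactive login shell on the allocated node.
exec srun "$@" --pty -u "${SHELL}" -i -l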
Hope this helps!
Chris
Weight=10
NodeName=barcoo[062-070] NodeAddr=barcoo[062-070] RealMemory=25 Gres=mic:2
Weight=100
cheers,
Chris
partitions at 8am and 8pm and bring that
partition up/down as appropriate.
My reading of the manual page isn't encouraging, but I could just be
not seeing the wood for the trees. Any ideas please?
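One possible approach outside Slurm itself is a cron job that runs scontrol update PartitionName=... State=UP or State=DOWN at those two times. A minimal bash sketch follows; the partition name, the script path in the crontab comment, the --apply flag, and the assumption that the partition should be UP from 08:00 to 20:00 and DOWN overnight are all placeholders to adjust.

```shell
#!/bin/bash
# Hypothetical cron-driven partition toggle, e.g. from a crontab entry like:
#   0 8,20 * * * /usr/local/sbin/toggle-partition --apply
# Assumes the partition should be UP between 08:00 and 20:00, DOWN otherwise.
PARTITION=${PARTITION:-daytime}   # placeholder partition name

# Print the state the partition should be in for a given hour (00-23).
desired_state() {
    local hour=$((10#$1))   # force base 10 so "08" and "09" aren't read as octal
    if (( hour >= 8 && hour < 20 )); then
        echo "UP"
    else
        echo "DOWN"
    fi
}

# Only touch Slurm when explicitly asked to, so the logic can be tested safely.
if [[ "${1:-}" == "--apply" ]]; then
    scontrol update PartitionName="$PARTITION" State="$(desired_state "$(date +%H)")"
fi
```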
All the best!
Chris
standard Slurm going too. Nothing like having lofty goals!
All the best!
Chris
I'm missing, or is this a bug?
All the best,
Chris
On 03/03/14 12:01, Christopher Samuel wrote:
and I can see the expected values when I do:
# sacctmgr show assoc where cluster=barcoo account=foo
but it doesn't appear to have any effect. I've tried restarting
slurmctld on the node just
showq to
treat those as blocked and hide them from other users.
cheers,
Chris
On 11/03/14 06:20, Andy Riebs wrote:
Has anyone seen this before? slurm.conf is on an NFS server, so
it's possible we've got a configuration error there.
We've seen this same problem too, lost a heap of jobs to it. :-(
clusters which have all only run 2.6.x.
The University has had some occasional network and DNS issues which
have meant that our slurmdbd has had transient issues talking to our
database server, so I don't know if that could be the trigger for us.
All the best,
Chris
.. :-)
Secondly, will a 14.03 slurmctld happily talk to (drained) 2.6.x
slurmd's running jobs so we can do a rolling upgrade? Or will we need
to drain the entire cluster of running jobs first and then upgrade
them in a single hit?
All the best,
Chris
(especially on BG/Q).
A setting that let you change that behaviour to mark nodes as draining
rather than down would be very handy.
cheers,
Chris
sorts of things for us, so I think we'd
rather just have Slurm not kill jobs unless we (or the user) tell
it to.
cheers,
Chris
only query their own jobs and they can do that
from any of our 4 HPC systems.
Take a look at:
http://slurm.schedmd.com/accounting.html
as a starting point.
All the best,
Chris
Moe!
:
[samuel@barcoo ~]$ sacct -j 1093417 -o jobid,nodelist
       JobID        NodeList
------------ ---------------
1093417            barcoo010
And showing it's not there by default:
[samuel@barcoo ~]$ sacct -j 1093417 -l | fgrep -i NodeList
[samuel@barcoo ~]$
Hope that helps!
Chris
not a great deal but it'd be good to have an idea how
long we could safely run without a slurmdbd to talk to.
thanks!
Chris
assume
you are not using 32MB of RAM.
32MB should be enough for anyone.. ;-)
All the best!
Chris
MySQL 5.6?
All the best!
Chris
: error: Invalid job_id job
says: error: Invalid job_id job
How can I cancel this job?
You can't as it is no longer running, so cancelling it doesn't make sense.
Hope that helps,
Chris
into the background (like a daemon would)? Even
then I'm not sure that srun would help you.
We have users running batch scripts all the time on x86-64 without
using srun, it's never been an issue for us.
All the best,
Chris
proposing
# that we turn PMI-2 off when under Slurm unless the user
# specifically requests we use it.
Not sure if this has been raised on slurm-dev yet.
All the best,
Chris
to the manual page the --ntasks-per-node=1 option for srun
should do what you want.
cheers,
Chris
to 1
Submitted batch job 1856638
A distributed job (MPI for instance) must have at least
one task on every node for this to make sense.
All the best,
Chris
when
creating MySQL tables, that way you don't need to remember.
All the best,
Chris
if it isn't available.
Wonderful, thanks so much!
(and no node specification) not work for that?
cheers,
Chris
and find a time to create a test partition there to check we
see the same.
We're (currently) on 2.6.5, what version are you on?
cheers,
Chris
On 22/08/14 04:43, Jesse Stroik wrote:
We recently noticed sporadic performance inconsistencies on one of our
clusters.
What distro is this? Are you using cgroups?
cheers,
Chris
to go and find out why before we
let it back into the cluster.
All the best,
Chris
Unfortunately it's not clear what you do from there (there is no
unsubscribe link), but if you click Profile Home from the top
of that page it takes you to:
http://lists.schedmd.com/cgi-bin/dada/mail.cgi/profile/
and you'll see Unsubscribe from this list links there.
Good luck!
Chris
2 cores to a single job if I can
dedicate each core individually to a job.
What does scontrol show node say?
cheers,
Chris
out of individual nodes but they are
pretty rare. One company that does this is ScaleMP.
All the best,
Chris
On 11/09/14 10:56, Christopher Samuel wrote:
This is an operating system kernel issue, not a queuing system issue, so
Slurm, LSF or Torque will all have the same issue.
To clarify, the Linux kernel can support these sorts of systems, but
you'll either need extra supporting software
and then copied back at the end of
the job.
Slurm doesn't do that, it defaults to writing the output (and errors) to
a file in the directory a job is running in, so there's no need to copy
it back at the end of the job.
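For reference, a small bash sketch of the default file names sbatch uses: slurm-%j.out for an ordinary job and slurm-%A_%a.out for a job array task, with stdout and stderr both going to that one file unless --output or --error say otherwise. The function is purely illustrative; Slurm performs this substitution itself.

```shell
#!/bin/bash
# Illustrative only: the default output file name sbatch would use, written
# to the job's working directory.
#   $1 = job ID (for array jobs, the array's master job ID, i.e. %A)
#   $2 = optional array task index (%a)
default_output_file() {
    local job_id=$1 array_task=${2:-}
    if [[ -n "$array_task" ]]; then
        echo "slurm-${job_id}_${array_task}.out"   # slurm-%A_%a.out
    else
        echo "slurm-${job_id}.out"                 # slurm-%j.out
    fi
}
```

To choose the name yourself, a directive such as #SBATCH --output=myjob-%j.out in the job script does it; "myjob" there is just a placeholder.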
Hope that helps,
Chris
it was triggered.
Hope this helps!
Chris
in recorded memory usage between the two.
Might take a while to have something to report though. :-)
All the best,
Chris
cheers!
Chris
the login nodes.
8) Set partitions back to up to start jobs going again.
Hope this helps folks..
cheers!
Chris
of which help in this scenario),
but just wondering if I've missed anything obvious.
cheers,
Chris
On 29/09/14 21:33, je...@schedmd.com wrote:
Slurm does not support this today.
Thanks Moe, we'll see if we can figure another way around it.
cheers!
Chris
On 30/09/14 02:39, je...@schedmd.com wrote:
About 70 people attended the Slurm User Group Meeting last week in
Lugano, Switzerland.
Thanks so much to you and everyone who organised the meeting and to
everyone who came, it was well worth attending.
All the best,
Chris
On 03/10/14 03:05, Michael Jennings wrote:
Is your /var/tmp mounted noexec by any chance?
I wondered that but I don't think that'll give the errors that Dennis is
seeing, he's seeing -EACCES for /usr/bin/perl, not for something in
/var/tmp. :-(
just a few seconds.
All the best,
Chris
servers.
The foreign constraints came from another database a colleague of mine
has installed, we could fix this :-)
:-)
cheers,
Chris
On 16/10/14 16:02, Christopher Samuel wrote:
No worries, we're going to test out ours in a sandbox as well, so we'll
be able to compare it to our (pretty beefy) DB servers.
It took around 2 minutes to add all the indexes in our sandbox, that's
with a total of about 6 million jobs across 5
.x as that can use PMI2 with
Slurm and fixes this scaling issue.
) is to migrate to Open-MPI 1.8.4, which
is due out shortly and should address this.
cheers,
Chris
that in the stdout or stderr files as a
result of upgrading to 14.03.10.
Yup, this is with 14.03.10.
I've only managed to provoke it with NAMD so far, but I guess we'll hear
from users if they see it with other codes too. :-)
All the best,
Chris
cgroup.event_control cgroup.procs freezer.state notify_on_release tasks
Perhaps we can put it at debug level as before as it may concern users.
If it is just cosmetic it'd be good I think.
All the best,
Chris