[slurm-dev] Re: --cpu_bind=none still binds jobs to CPUs

2012-11-06 Thread Christopher Samuel
. To what extent Slurm uses that information at present I'm not sure without perusing the code further.. cheers! Chris -- Christopher Samuel, Senior Systems Administrator, VLSCI - Victorian Life Sciences Computation Initiative, Email: sam...@unimelb.edu.au, Phone: +61 (0)3 903 55545, http

[slurm-dev] Re: --cpu_bind=none still binds jobs to CPUs

2012-11-08 Thread Christopher Samuel
suggest it'd be a good idea to get involved on hwloc-devel. cheers! Chris -- Christopher Samuel, Senior Systems Administrator, VLSCI - Victorian Life Sciences Computation Initiative, Email: sam...@unimelb.edu.au, Phone: +61 (0)3 903 55545, http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: Error message to terminal for job_submit plugin

2012-11-18 Thread Christopher Samuel
On 16/11/12 22:22, Loris Bennett wrote:
> Can anyone shed any light on the topic above?
I'm afraid not, but a capability like the Torque submit filter would certainly be of interest to us too! cheers, Chris

[slurm-dev] Re: Users sync across nodes

2012-12-05 Thread Christopher Samuel
On 05/12/12 00:45, Marcin Stolarek wrote:
> Common solution for your problem is ldap.
...and a network filesystem between all nodes for home directories..

[slurm-dev] Slurm, RHEL6, cgroups and not constraining memory

2013-01-17 Thread Christopher Samuel
a lot. Any ideas? cheers! Chris

[slurm-dev] Re: Slurm, RHEL6, cgroups and not constraining memory

2013-01-21 Thread Christopher Samuel
wrinkle there being that a job script can launch N processes each of which can allocate up to RLIMIT_AS. We were hoping that Slurm's cgroups support would permit limiting the memory allocated by the whole job. cheers, Chris
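
To illustrate the wrinkle (this is not from the thread): a limit set with bash's "ulimit -v", which maps to RLIMIT_AS on Linux, is inherited by each child process separately, and nothing sums usage across them. A minimal sketch, in which the function name and the 1 GiB figure are made up for the demonstration:

```bash
#!/usr/bin/env bash
# Hypothetical demo, not from the thread: every child process gets its own
# copy of the address-space limit rather than sharing one budget.
show_per_process_limits() {
    local limit_kb=$1 nprocs=$2 i
    (
        # Soft limit only, set inside a subshell so the caller is unaffected.
        ulimit -S -v "$limit_kb"
        for (( i = 0; i < nprocs; i++ )); do
            # Each background child reports the limit it inherited.
            ( echo "child $i limit: $(ulimit -S -v) kB" ) &
        done
        wait
    )
}

show_per_process_limits 1048576 4
```

Each of the four children reports the full limit, so between them they could in principle allocate four times that amount, which is why a per-job limit needs something like cgroups rather than rlimits.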

[slurm-dev] Re: Slurm, RHEL6, cgroups and not constraining memory

2013-01-22 Thread Christopher Samuel

[slurm-dev] Re: Slurm, RHEL6, cgroups and not constraining memory

2013-02-03 Thread Christopher Samuel
the important one). Sorry for the confusion! Here's a query from the Open Grid Engine folks last June on the same issue, being told that it's not implemented yet: https://lkml.org/lkml/2012/6/12/54 Hey ho.. All the best! Chris

[slurm-dev] Re: X11 forwarding for interactive jobs?

2013-02-13 Thread Christopher Samuel
> but it should work.
Thanks, looks like it's well worth looking into.
> For the pty there is an option for srun called --pty which allows you to open a remote shell on the master computer node of the job as if you were sshing it.
That's great, precisely what we needed! All the best, Chris

[slurm-dev] Re: X11 forwarding for interactive jobs?

2013-02-19 Thread Christopher Samuel
backlog but I'll try and test that out soon. Thanks! Chris

[slurm-dev] Re: Slurm versions 2.5.4 and 2.6.0-pre2 are now available

2013-03-10 Thread Christopher Samuel
modify the behaviour of via the SLURM Job Submit Plugin API? cheers, Chris

[slurm-dev] BGQ: How to get runjob to ignore --verbose passed to program via srun?

2013-03-20 Thread Christopher Samuel
. Is this something people have run into before, any ideas? cheers! Chris

[slurm-dev] Re: BGQ: How to get runjob to ignore --verbose passed to program via srun?

2013-03-21 Thread Christopher Samuel

[slurm-dev] Re: BGQ: How to get runjob to ignore --verbose passed to program via srun?

2013-03-27 Thread Christopher Samuel
, but the app does not see the option.. :-( cheers! Chris

[slurm-dev] Failed prolog killed running jobs?

2013-05-09 Thread Christopher Samuel

[slurm-dev] Re: Failed prolog killed running jobs?

2013-05-13 Thread Christopher Samuel
On 10/05/13 14:54, Christopher Samuel wrote:
> BG/Q is a special case, but whilst we try and sort out our scratch filesystem and track down the culprit do you have any suggestions on how we can prevent Slurm killing running jobs in future should

[slurm-dev] Re: slurm integration with FlexLM license manager

2013-07-02 Thread Christopher Samuel
need (or to tell your users to use something else, depending on how BOFH'ish you are feeling and how big your budget is). All the best, Chris

[slurm-dev] RLIMIT_DATA effectively a no-op on Linux

2013-07-18 Thread Christopher Samuel
the intention of RLIMIT_DATA) or would it be better off as a configuration parameter? All the best, Chris

[slurm-dev] Re: RLIMIT_DATA effectively a no-op on Linux

2013-07-23 Thread Christopher Samuel
> by VSizeFactor to set RLIMIT_AS (see the slurm.conf man page). This means it won't be enforced unless you set that to a non-default value.
Thanks again for the explanation, very much appreciated! Chris

[slurm-dev] Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun

2013-07-23 Thread Christopher Samuel
any ideas? cheers, Chris

[slurm-dev] Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun

2013-07-23 Thread Christopher Samuel
backported to OMPI 1.6.x ? All the best, Chris

[slurm-dev] slurm-dev Memory accounting issues with mpirun (was Re: Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun)

2013-08-07 Thread Christopher Samuel
On 23/07/13 17:06, Christopher Samuel wrote:
> Bringing up a new IBM SandyBridge cluster I'm running a NAMD test case and noticed that if I run it with srun rather than mpirun it goes over 20% slower.
Following on from this issue, we've found

[slurm-dev] Re: slurm-dev Memory accounting issues with mpirun (was Re: Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun)

2013-08-07 Thread Christopher Samuel
On 07/08/13 16:19, Christopher Samuel wrote:
> Anyone seen anything similar, or any ideas on what could be going on?
Sorry, this was with:

# ACCOUNTING
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30

Since those initial tests

[slurm-dev] Re: slurm-dev Memory accounting issues with mpirun (was Re: Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun)

2013-08-07 Thread Christopher Samuel
memory per tasks, but I'm not sure I understand how that could lead to Slurm thinking the job is using vastly more memory than it actually is though. cheers, Chris

[slurm-dev] Re: slurm-dev Memory accounting issues with mpirun (was Re: Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun)

2013-08-07 Thread Christopher Samuel
> to be accurate. This also goes for handling memory limits.
Thanks, and I understand why that is, it's just a shame that the performance penalty for using srun with Open-MPI makes it unusable. :-( cheers! Chris

[slurm-dev] showq wrapper for Slurm?

2013-08-11 Thread Christopher Samuel
on the list before about having their own copies but I've not found any available yet. Before we go trying to (re?)inventing the wheel, does anyone know of any already publicly available out there? All the best, Chris

[slurm-dev] Re: showq wrapper for Slurm?

2013-08-11 Thread Christopher Samuel
to Karl for sending me their version, it works nicely. All the best, Chris

[slurm-dev] Re: X11 forwarding for interactive jobs?

2013-08-12 Thread Christopher Samuel
-test ~]$ echo $DISPLAY
localhost:10.0

My x11.conf file has just:

optional /usr/local/slurm/plugins/x11/x11.so

I also knocked up a pretty basic Makefile for this as it seemed to entirely rely on a spec file if you're interested? All the best, Chris

[slurm-dev] Re: X11 forwarding for interactive jobs?

2013-08-12 Thread Christopher Samuel
On 12/08/13 16:16, Christopher Samuel wrote:
> Finally gotten a system I can try this on, but I think I must be missing something
It was a quoting problem in the Makefile I derived from the RPM spec file, it now works really nicely, thanks! All

[slurm-dev] Re: AllowGroups broken in Slurm 2.6.0?

2013-08-18 Thread Christopher Samuel
On 19/08/13 12:02, Christopher Samuel wrote:
> Chasing through the code it looks like getgrnam_r() fails in get_group_members() in src/slurmctld/groups.c on our Slurm 2.6 boxes (on RHEL 6.4)
Scratch that, restarting slurmctld doesn't provoke

[slurm-dev] Re: AllowGroups broken in Slurm 2.6.0?

2013-08-19 Thread Christopher Samuel
very much.. Chris

[slurm-dev] Re: Slurm User Group Meeting and New releases: v2.6.1, v13.12.0-pre1

2013-08-19 Thread Christopher Samuel

[slurm-dev] Re: [OMPI devel] slurm-dev Memory accounting issues with mpirun (was Re: Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun)

2013-08-19 Thread Christopher Samuel
the jobacct_gather/cgroup plugin will give better numbers once it's had more work. All the best, Chris

[slurm-dev] Slurm 2.6.0 MIC GRES support does not set OFFLOAD_DEVICES=-1 if no GRES requested

2013-08-21 Thread Christopher Samuel
. that doesn't appear to happen (OFFLOAD_DEVICES is not set) and I don't see any evidence of code to do that in the current slurm-2.6 branch. Is it an oversight, or am I missing something? Currently I'm using a taskprolog to set it to -1 if it's absent. All the best, Chris

[slurm-dev] OFFLOAD_DEVICES not set during prolog, how to find which MICs are allocated then?

2013-08-22 Thread Christopher Samuel
not appear to be set during the prolog. Any ideas how the prolog can determine which Phi card(s) have been allocated to the job? scontrol show job will only show whether 1 or more have been requested. Thanks! Chris

[slurm-dev] GRES job over 2 nodes fails saying nodes are busy, but OK with no GRES

2013-08-23 Thread Christopher Samuel

[slurm-dev] Re: GRES job over 2 nodes fails saying nodes are busy, but OK with no GRES

2013-08-25 Thread Christopher Samuel
SLURM_EXCLUSIVE=1 Is there any documentation explaining why these are required, or is this a bug? All the best, Chris

[slurm-dev] Fwd: Slurmdbd tables and Galera (clustered MariaDB)

2013-09-17 Thread Christopher Samuel
Subject: Slurmdbd tables
Date: Tue, 17 Sep 2013 09:11:50 +1000
From: Brett Pemberton b...@unimelb.edu.au
To: Christopher Samuel sam...@unimelb.edu.au

Chris, The situation: We need all tables to have primary keys defined (for galera replication). However slurm has two tables per cluster

[slurm-dev] Re: special job error state

2013-09-19 Thread Christopher Samuel

[slurm-dev] Re: special job error state

2013-09-19 Thread Christopher Samuel
sites would have conniptions if users were able to take nodes out at random. ;-) cheers, Chris

[slurm-dev] Re: special job error state

2013-09-21 Thread Christopher Samuel
release and then scontrol requeue it will just start again. Moe, et al., how easy would it be to have some form of: scontrol requeue --hold $JOBID ? cheers, Chris

[slurm-dev] Re: Can't start slurmd daemons

2013-09-21 Thread Christopher Samuel
On 19/09/13 16:29, Arjun J Rao wrote:
> I installed SLURM on a Scientific Linux 6.4 system (64bit)
Did you install from source, or from RPMs?

[slurm-dev] Re: Limiting number of CPUs per Job

2013-09-21 Thread Christopher Samuel
that redirected jobs of larger than 1 core to other partitions and jobs of just 1 core to the serial partition is likely to be the best way for now. cheers, Chris

[slurm-dev] Bug in Slurm time=days-hours:minutes parsing?

2013-10-01 Thread Christopher Samuel
for time_str2secs() in src/common/parse_time.c but as a sysadmin rather than a programmer I'm struggling to follow it, let alone see where the bug is. ;-) I'm pretty sure this is a bug, but I'd appreciate knowing if I've missed something! All the best, Chris
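
For readers trying to pin down the expected behaviour, the sbatch man page lists the accepted time formats as minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes and days-hours:minutes:seconds. The sketch below converts those forms to seconds. It is an illustration only: the function name is made up, and it is not the logic of time_str2secs() in src/common/parse_time.c.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not Slurm code: convert a Slurm-style time limit
# (as documented for sbatch --time) into seconds.
slurm_time_to_secs() {
    local spec=$1 days=0 hours=0 mins=0 secs=0
    local -a f
    if [[ $spec == *-* ]]; then
        # days-hours[:minutes[:seconds]]
        days=${spec%%-*}
        IFS=: read -r -a f <<< "${spec#*-}"
        hours=${f[0]:-0}; mins=${f[1]:-0}; secs=${f[2]:-0}
    else
        IFS=: read -r -a f <<< "$spec"
        case ${#f[@]} in
            1) mins=${f[0]} ;;                                # minutes
            2) mins=${f[0]}; secs=${f[1]} ;;                  # minutes:seconds
            3) hours=${f[0]}; mins=${f[1]}; secs=${f[2]} ;;   # hours:minutes:seconds
            *) return 1 ;;
        esac
    fi
    # 10# forces base 10 so fields such as "08" are not read as octal.
    echo $(( 10#$days * 86400 + 10#$hours * 3600 + 10#$mins * 60 + 10#$secs ))
}

slurm_time_to_secs 1-2:30   # days-hours:minutes
```

With this reading, 1-2:30 is one day, two hours and thirty minutes, which is a handy reference value to compare against whatever the scheduler reports for the same string.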

[slurm-dev] Re: Bug in Slurm time=days-hours:minutes parsing?

2013-10-03 Thread Christopher Samuel
On 03/10/13 06:42, David Bigagli wrote:
> Sounds good. :-) Thanks for the patch it is going to be in Slurm 2.6.3.
Thanks for that David.

[slurm-dev] Re: Interactive Jobs Not Launching Under High Load

2013-10-28 Thread Christopher Samuel
the Torque developers solved that issue (user discussion was on torqueusers, dev stuff was on torquedev). cheers! Chris

[slurm-dev] Re: Failed to contact primary controller : No route to host

2013-11-05 Thread Christopher Samuel
need to check that. Good luck! Chris

[slurm-dev] Re: Failed to contact primary controller : No route to host

2013-11-06 Thread Christopher Samuel

[slurm-dev] Re: Which branch of SLURM should I start with?

2013-11-12 Thread Christopher Samuel
probably want to base any development on that. cheers! Chris

[slurm-dev] Re: Which branch of SLURM should I start with?

2013-11-12 Thread Christopher Samuel
On 13/11/13 11:43, Leith Bade wrote:
> OK that will likely be the best. I assume I will need to install master on the nodes too if the data structures have changed.
You'll want the same everywhere. Good luck! Chris

[slurm-dev] Re: SLURM: Issue with exporting environmental variables on bg-q

2013-11-28 Thread Christopher Samuel
whinges that /bin/hostname doesn't have the magic ELF header to say it's been built for the CNK, but it does try and execute it. Best of luck! Chris

[slurm-dev] Mixing CR_Core and CR_Socket ?

2013-12-16 Thread Christopher Samuel
is another story. :-) All the best, Chris

[slurm-dev] Re: Compiling on Mac OS X 10.6.8

2013-12-17 Thread Christopher Samuel
should download a Linux distro and install that instead - you'll save yourself a lot of pain. Best of luck, Chris

[slurm-dev] Changes in Linux Kernel Control Group APIs

2013-12-22 Thread Christopher Samuel
you can see this article: https://lwn.net/Articles/574317/ and an article about the cgmanager project, and the systemd developers' unwillingness to cooperate on a single API, is here: https://lwn.net/Articles/575672/ All the best, Chris

[slurm-dev] Slurm, cgroups and user SSH sessions

2014-01-01 Thread Christopher Samuel
level, though they'd only be affecting their own stuff. Thoughts? All the best, Chris

[slurm-dev] Re: No standard output

2014-01-05 Thread Christopher Samuel
there are no data staging directives for sbatch to copy data onto and off a node before/after a job (unlike Torque). All the best, Chris

[slurm-dev] Odd quota bug - used more than allocated (and more than run)

2014-01-07 Thread Christopher Samuel
brave enough to restart those with running jobs on them.. ;-) To me it's reminiscent of this bug: http://bugs.schedmd.com/show_bug.cgi?id=392 except our problem persists across a slurmctld (and slurmdbd) restart. :-( Any ideas? cheers, Chris

[slurm-dev] Re: Odd quota bug - used more than allocated (and more than run)

2014-01-07 Thread Christopher Samuel
On 08/01/14 11:03, Christopher Samuel wrote:
> Here's the data we have with all numbers normalised to hours:
>
> GrpCPUMins 71200
> CPURunMins 58348
> Raw Usage  12967
>
> So that means GrpCPUMins-CPURunMins-RawUsage = -115 hours.
After a bit more
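
The subtraction quoted in this message can be checked directly. A small sketch, where the variable names are only labels for the three figures given (already normalised to hours):

```bash
#!/usr/bin/env bash
# Figures as quoted in the message, in hours; the names are just labels.
grp_cpu_limit=71200   # GrpCPUMins
cpu_run=58348         # CPURunMins
raw_usage=12967       # Raw Usage

# Limit minus committed running time minus usage already recorded.
remaining=$(( grp_cpu_limit - cpu_run - raw_usage ))
echo "remaining: ${remaining} hours"
```

A negative result means the running commitment plus recorded usage exceeds the limit, which is the oddity the thread is about.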

[slurm-dev] Re: recording sbatch options

2014-01-12 Thread Christopher Samuel

[slurm-dev] Re: error: Node xxxxx appears to have a different slurm.conf than the slurmctld

2014-01-12 Thread Christopher Samuel
sure the compute nodes re-read slurm.conf so everyone is on the same page again. All the best, Chris

[slurm-dev] Re: Memory Usage Fairshare

2014-01-15 Thread Christopher Samuel

[slurm-dev] Re: Fwd: Slurmdbd tables and Galera (clustered MariaDB)

2014-01-16 Thread Christopher Samuel
of our folks are off to a Galera training course soon, so hopefully we can finish the transition soon. All the best, Chris

[slurm-dev] Re: Memory Usage Fairshare

2014-01-16 Thread Christopher Samuel
, I'd not seen that before. I think for us with a mix of single CPU large memory jobs and MPI jobs with low per-core use we could end up reducing utilisation and throughput unnecessarily, but it could well work for others. cheers! Chris

[slurm-dev] Re: Interactive shells

2014-01-20 Thread Christopher Samuel
with:

[root@merri-m ~]# cat /usr/local/bin/sinteractive
#!/bin/bash
exec srun $* --pty -u ${SHELL} -i -l

Hope this helps! Chris
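
One caveat with the wrapper as posted: unquoted $* re-splits arguments on whitespace, so an option value containing a space would be broken up before srun sees it. A possible variation, written here as a shell function purely for illustration (the /bin/bash fallback for an unset SHELL is an assumption, not part of the original script):

```bash
#!/usr/bin/env bash
# Variation on the wrapper above, as a function rather than a script.
sinteractive() {
    # "$@" passes each argument through intact; unquoted $* would
    # split an argument such as "--job-name=my job" in two.
    srun "$@" --pty -u "${SHELL:-/bin/bash}" -i -l
}
```

In a standalone script the same line could keep the leading exec, as in the original, so that srun replaces the wrapper process. It would be called the same way, for example: sinteractive --time=30 --job-name="my test"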

[slurm-dev] Re: Large memory jobs block small memory job in same partition

2014-02-04 Thread Christopher Samuel
Weight=10
NodeName=barcoo[062-070] NodeAddr=barcoo[062-070] RealMemory=25 Gres=mic:2 Weight=100

cheers, Chris

[slurm-dev] Reserving a node for short jobs between specific hours

2014-02-13 Thread Christopher Samuel
partitions at 8am and 8pm and bring that partition up/down as appropriate. My reading of the manual page isn't encouraging, but I could just be not seeing the wood for the trees. Any ideas please? All the best! Chris

[slurm-dev] Slurm daemon on Xeon Phi cards?

2014-02-17 Thread Christopher Samuel
standard Slurm going too. Nothing like having lofty goals! All the best! Chris

[slurm-dev] Re: Slurm daemon on Xeon Phi cards?

2014-02-17 Thread Christopher Samuel
that won't help the case where you may want to treat them as offload devices as well and have Slurm intelligently juggle them so they don't get overcommitted, which is what would be ideal. cheers! Chris

[slurm-dev] Is MaxJobs enforced for associations?

2014-03-02 Thread Christopher Samuel
I'm missing, or is this a bug? All the best, Chris

[slurm-dev] Re: Is MaxJobs enforced for associations?

2014-03-02 Thread Christopher Samuel
On 03/03/14 12:01, Christopher Samuel wrote: and I can see the expected values when I do:

# sacctmgr show assoc where cluster=barcoo account=foo

but it doesn't appear to have any effect. I've tried restarting slurmctld on the node just

[slurm-dev] Re: slurm-dev slurm behaves differently than a serial app [was Re: openmpi misbehaves when started under slurm]

2014-03-03 Thread Christopher Samuel
=NONE That means we can have different, appropriate, limits on both login and compute nodes. cheers, Chris

[slurm-dev] Re: Is MaxJobs enforced for associations?

2014-03-03 Thread Christopher Samuel
showq to treat those as blocked and hide them from other users. cheers, Chris

[slurm-dev] Re: slurmd crashed on *some* nodes after scontrol reconfigure

2014-03-13 Thread Christopher Samuel
On 11/03/14 06:20, Andy Riebs wrote:
> Has anyone seen this before? slurm.conf is on an NFS server, so it's possible we've got a configuration error there.
We've seen this same problem too, lost a heap of jobs to it. :-(

[slurm-dev] RE: error: We have more allocated time than is possible...

2014-03-16 Thread Christopher Samuel
clusters which have all only run 2.6.x. The University has had some occasional network and DNS issues which have meant that our slurmdbd has had transient issues talking to our database server, so I don't know if that could be the trigger for us. All the best, Chris

[slurm-dev] Re: Slurm version 14.03.0 is now available

2014-03-26 Thread Christopher Samuel
.. :-) Secondly will a 14.03 slurmctld happily talk to (drained) 2.6.x slurmd's running jobs so we can do a rolling upgrade? Or will we need to drain the entire cluster of running jobs first and then upgrade them in a single hit? All the best, Chris

[slurm-dev] Re: Slurm version 14.03.0 is now available

2014-03-30 Thread Christopher Samuel
(especially on BG/Q). A setting that let you change that behaviour to mark nodes as draining rather than down would be very handy. cheers, Chris

[slurm-dev] Re: Slurm version 14.03.0 is now available

2014-03-31 Thread Christopher Samuel
sorts of things for us, so I think we'd rather just have Slurm not kill jobs unless we (or the user) tell it to. cheers, Chris

[slurm-dev] Re: sacct access for end users

2014-03-31 Thread Christopher Samuel
only query their own jobs and they can do that from any of our 4 HPC systems. Take a look at: http://slurm.schedmd.com/accounting.html as a starting point. All the best, Chris

[slurm-dev] Re: Slurm version 14.03.0 is now available

2014-03-31 Thread Christopher Samuel
Moe!

[slurm-dev] Re: Information about older jobs

2014-04-06 Thread Christopher Samuel
:

[samuel@barcoo ~]$ sacct -j 1093417 -o jobid,nodelist
       JobID        NodeList
------------ ---------------
     1093417       barcoo010

And showing it's not there by default:

[samuel@barcoo ~]$ sacct -j 1093417 -l | fgrep -i NodeList
[samuel@barcoo ~]$

Hope that helps! Chris

[slurm-dev] Guidance on planning a slurmdbd outage

2014-04-07 Thread Christopher Samuel
not a great deal but it'd be good to have an idea how long we could safely run without a slurmdbd to talk to. thanks! Chris

[slurm-dev] Re: Guidance on planning a slurmdbd outage

2014-04-08 Thread Christopher Samuel
assume you are not using 32MB of RAM. 32MB should be enough for anyone.. ;-) All the best! Chris

[slurm-dev] Anyone using slurmdbd with MySQL 5.6 ?

2014-04-30 Thread Christopher Samuel
MySQL 5.6? All the best! Chris

[slurm-dev] Re: why job status is always CG?

2014-05-11 Thread Christopher Samuel
: error: Invalid job_id job says: error: Invalid job_id job How can I cancel this job?
You can't as it is no longer running, so cancelling it doesn't make sense. Hope that helps, Chris

[slurm-dev] Re: why job status is always CG?

2014-05-12 Thread Christopher Samuel
into the background (like a daemon would)? Even then I'm not sure that srun would help you. We have users running batch scripts all the time on x86-64 without using srun, it's never been an issue for us. All the best, Chris

[slurm-dev] Re: PMI2 related error

2014-05-20 Thread Christopher Samuel
proposing
# that we turn PMI-2 off when under Slurm unless the user
# specifically requests we use it.
Not sure if this has been raised on slurm-dev yet. All the best, Chris

[slurm-dev] Re: pbsdsh -u equivalent

2014-06-30 Thread Christopher Samuel
to the manual page the --ntasks-per-node=1 option for srun should do what you want. cheers, Chris

[slurm-dev] Re: pbsdsh -u equivalent

2014-07-02 Thread Christopher Samuel
to 1 Submitted batch job 1856638 A distributed job (MPI for instance) must have at least one task on every node for this to make sense. All the best, Chris

[slurm-dev] Re: Documentation enhancement request- mysql

2014-07-03 Thread Christopher Samuel
when creating MySQL tables, that way you don't need to remember. All the best, Chris

[slurm-dev] Re: Documentation enhancement request- mysql

2014-07-03 Thread Christopher Samuel
engine is
# unavailable, enable the NO_ENGINE_SUBSTITUTION SQL mode. If
# the desired engine is unavailable, this setting produces an
# error instead of a warning, and the table is not created or
# altered. See Section 5.1.7, “Server SQL Modes”.
cheers! Chris

[slurm-dev] Re: Documentation enhancement request- mysql

2014-07-08 Thread Christopher Samuel
if it isn't available. Wonderful, thanks so much!

[slurm-dev] Re: heterogeneus number of processors per node, slurm wont use all processors

2014-07-22 Thread Christopher Samuel
(and no node specification) not work for that? cheers, Chris

[slurm-dev] Re: heterogeneus number of processors per node, slurm wont use all processors

2014-07-27 Thread Christopher Samuel
and find a time to create a test partition there to check we see the same. We're (currently) on 2.6.5, what version are you on? cheers, Chris

[slurm-dev] Re: Intel MPI Performance inconsistency (and workaround)

2014-08-21 Thread Christopher Samuel
On 22/08/14 04:43, Jesse Stroik wrote:
> We recently noticed sporadic performance inconsistencies on one of our clusters.
What distro is this? Are you using cgroups? cheers, Chris

[slurm-dev] Re: starting slurmd only after GPUs are fully initialized

2014-08-31 Thread Christopher Samuel
to go and find out why before we let it back into the cluster. All the best, Chris

[slurm-dev] Re: Unsubscribe

2014-09-03 Thread Christopher Samuel
Unfortunately it's not clear what you do from there (there is no unsubscribe link), but if you click Profile Home from the top of that page it takes you to: http://lists.schedmd.com/cgi-bin/dada/mail.cgi/profile/ and you'll see Unsubscribe from this list links there. Good luck! Chris

[slurm-dev] Re: Requested node configuration is not available when using -c

2014-09-08 Thread Christopher Samuel
2 cores to a single job if I can dedicate each core individually to a job. What does scontrol show node say? cheers, Chris

[slurm-dev] Re: Requested node configuration is not available when using -c

2014-09-10 Thread Christopher Samuel
out of individual nodes but they are pretty rare. One company that does this is ScaleMP. All the best, Chris

[slurm-dev] Re: Requested node configuration is not available when using -c

2014-09-10 Thread Christopher Samuel
On 11/09/14 10:56, Christopher Samuel wrote:
> This is an operating system kernel issue, not a queuing system issue, so Slurm, LSF or Torque will all have the same issue.
To clarify, the Linux kernel can support these sorts of systems, but you'll either need extra supporting software

[slurm-dev] Re: undelivered output

2014-09-10 Thread Christopher Samuel
and then copied back at the end of the job. Slurm doesn't do that, it defaults to writing the output (and errors) to a file in the directory a job is running in, so there's no need to copy it back at the end of the job. Hope that helps, Chris