[slurm-dev] Re: --cpu_bind=none still binds jobs to CPUs

2012-11-06 Thread Christopher Samuel
To what extent Slurm uses that information at present I'm not sure without perusing the code further. cheers! Chris -- Christopher Samuel, Senior Systems Administrator, VLSCI - Victorian Life Sciences Computation Initiative. Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545

[slurm-dev] Re: --cpu_bind=none still binds jobs to CPUs

2012-11-08 Thread Christopher Samuel
suggest it'd be a good idea to get involved on hwloc-devel. cheers! Chris

[slurm-dev] Re: Slurm, RHEL6, cgroups and not constraining memory

2013-01-21 Thread Christopher Samuel
wrinkle there being that a job script can launch N processes, each of which can allocate up to RLIMIT_AS. We were hoping that Slurm's cgroups support would permit limiting the memory allocated by the whole job. cheers, Chris

[slurm-dev] Re: Slurm, RHEL6, cgroups and not constraining memory

2013-01-22 Thread Christopher Samuel

[slurm-dev] Re: X11 forwarding for interactive jobs?

2013-02-13 Thread Christopher Samuel
but it should work. Thanks, looks like it's well worth looking into. For the pty there is an option for srun called --pty which allows you to open a remote shell on the master compute node of the job as if you were ssh'ing into it. That's great, precisely what we needed! All the best, Chris

[slurm-dev] Re: X11 forwarding for interactive jobs?

2013-02-19 Thread Christopher Samuel
backlog but I'll try and test that out soon. Thanks! Chris

[slurm-dev] Re: Slurm versions 2.5.4 and 2.6.0-pre2 are now available

2013-03-10 Thread Christopher Samuel
modify the behaviour of via the SLURM Job Submit Plugin API? cheers, Chris

[slurm-dev] BGQ: How to get runjob to ignore --verbose passed to program via srun?

2013-03-20 Thread Christopher Samuel
Is this something people have run into before? Any ideas? cheers! Chris

[slurm-dev] Re: BGQ: How to get runjob to ignore --verbose passed to program via srun?

2013-03-21 Thread Christopher Samuel

[slurm-dev] Re: BGQ: How to get runjob to ignore --verbose passed to program via srun?

2013-03-27 Thread Christopher Samuel
but the app does not see the option. :-( cheers! Chris

[slurm-dev] Failed prolog killed running jobs?

2013-05-09 Thread Christopher Samuel

[slurm-dev] Re: slurm integration with FlexLM license manager

2013-07-02 Thread Christopher Samuel
need (or to tell your users to use something else, depending on how BOFH'ish you are feeling and how big your budget is). All the best, Chris

[slurm-dev] RLIMIT_DATA effectively a no-op on Linux

2013-07-18 Thread Christopher Samuel
the intention of RLIMIT_DATA) or would it be better off as a configuration parameter? All the best, Chris

[slurm-dev] Re: RLIMIT_DATA effectively a no-op on Linux

2013-07-23 Thread Christopher Samuel
by VSizeFactor to set RLIMIT_AS (see the slurm.conf man page). This means it won't be enforced unless you set that to a non-default value. Thanks again for the explanation, very much appreciated! Chris
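
As the slurm.conf man page describes it, VSizeFactor is a percentage of the job's real memory limit, and the default of 0 disables the virtual memory (RLIMIT_AS) limit entirely. A minimal sketch of that arithmetic, assuming limits expressed in MB; `vsize_limit_mb` is a hypothetical helper, not part of Slurm:

```sh
# Hypothetical helper: virtual memory limit implied by VSizeFactor.
# $1 = real memory limit in MB, $2 = VSizeFactor (percent of the real limit).
vsize_limit_mb() {
  real_mb=$1
  factor=$2
  if [ "$factor" -eq 0 ]; then
    # Default VSizeFactor=0: no virtual memory limit is enforced.
    echo "unlimited"
  else
    # Integer arithmetic, so any fractional MB is truncated.
    echo $(( real_mb * factor / 100 ))
  fi
}

vsize_limit_mb 500 101   # prints 505
```

With the default left in place the function reports "unlimited", which matches the observation above that the limit is not enforced unless VSizeFactor is changed.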

[slurm-dev] Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun

2013-07-23 Thread Christopher Samuel
backported to OMPI 1.6.x? All the best, Chris

[slurm-dev] slurm-dev Memory accounting issues with mpirun (was Re: Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun)

2013-08-07 Thread Christopher Samuel
On 23/07/13 17:06, Christopher Samuel wrote: Bringing up a new IBM SandyBridge cluster I'm running a NAMD test case and noticed that if I run it with srun rather than mpirun it goes over 20% slower. Following on from this issue, we've found

[slurm-dev] Re: slurm-dev Memory accounting issues with mpirun (was Re: Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun)

2013-08-07 Thread Christopher Samuel
On 07/08/13 16:19, Christopher Samuel wrote: Anyone seen anything similar, or any ideas on what could be going on? Sorry, this was with:
# ACCOUNTING
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30
Since those initial tests

[slurm-dev] Re: slurm-dev Memory accounting issues with mpirun (was Re: Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun)

2013-08-07 Thread Christopher Samuel
memory per task, but I'm not sure I understand how that could lead to Slurm thinking the job is using vastly more memory than it actually is. cheers, Chris

[slurm-dev] Re: slurm-dev Memory accounting issues with mpirun (was Re: Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun)

2013-08-07 Thread Christopher Samuel
to be accurate. This also goes for handling memory limits. Thanks, and I understand why that is; it's just a shame that the performance penalty for using srun with Open-MPI makes it unusable. :-( cheers! Chris

[slurm-dev] showq wrapper for Slurm?

2013-08-11 Thread Christopher Samuel
on the list before about having their own copies but I've not found any available yet. Before we go trying to (re?)invent the wheel, does anyone know of any already publicly available out there? All the best, Chris

[slurm-dev] Re: showq wrapper for Slurm?

2013-08-11 Thread Christopher Samuel
to Karl for sending me their version; it works nicely. All the best, Chris

[slurm-dev] Re: X11 forwarding for interactive jobs?

2013-08-12 Thread Christopher Samuel
On 12/08/13 16:16, Christopher Samuel wrote: Finally gotten a system I can try this on, but I think I must be missing something. It was a quoting problem in the Makefile I derived from the RPM spec file; it now works really nicely, thanks!

[slurm-dev] Re: AllowGroups broken in Slurm 2.6.0?

2013-08-18 Thread Christopher Samuel
On 19/08/13 12:02, Christopher Samuel wrote: Chasing through the code it looks like getgrnam_r() fails in get_group_members() in src/slurmctld/groups.c on our Slurm 2.6 boxes (on RHEL 6.4). Scratch that, restarting slurmctld doesn't provoke

[slurm-dev] Re: AllowGroups broken in Slurm 2.6.0?

2013-08-19 Thread Christopher Samuel
very much. Chris

[slurm-dev] Re: [OMPI devel] slurm-dev Memory accounting issues with mpirun (was Re: Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun)

2013-08-19 Thread Christopher Samuel
the jobacct_gather/cgroup plugin will give better numbers once it's had more work. All the best, Chris

[slurm-dev] Slurm 2.6.0 MIC GRES support does not set OFFLOAD_DEVICES=-1 if no GRES requested

2013-08-21 Thread Christopher Samuel
That doesn't appear to happen (OFFLOAD_DEVICES is not set) and I don't see any evidence of code to do that in the current slurm-2.6 branch. Is it an oversight, or am I missing something? Currently I'm using a TaskProlog to set it to -1 if it's absent. All the best, Chris
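
A task prolog along the lines described above could look like the sketch below. It relies on Slurm's TaskProlog convention that lines printed in the form `export NAME=value` are added to the task's environment; the function name is hypothetical, and -1 is simply the default the message above chose, not something verified here against any offload runtime.

```sh
#!/bin/sh
# Hypothetical TaskProlog sketch: default OFFLOAD_DEVICES to -1 when Slurm has
# not set it (e.g. no MIC GRES was requested). Slurm adds any
# "export NAME=value" lines a TaskProlog prints to the task's environment.
set_offload_default() {
  # ${VAR+set} expands to "set" only if the variable exists (even if empty).
  if [ -z "${OFFLOAD_DEVICES+set}" ]; then
    echo "export OFFLOAD_DEVICES=-1"
  fi
}

set_offload_default
```

If Slurm already exported a value for the allocated cards, the prolog prints nothing and that value is left alone.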

[slurm-dev] GRES job over 2 nodes fails saying nodes are busy, but OK with no GRES

2013-08-23 Thread Christopher Samuel

[slurm-dev] Re: GRES job over 2 nodes fails saying nodes are busy, but OK with no GRES

2013-08-25 Thread Christopher Samuel
SLURM_EXCLUSIVE=1 Is there any documentation explaining why these are required, or is this a bug? All the best, Chris

[slurm-dev] Fwd: Slurmdbd tables and Galera (clustered MariaDB)

2013-09-17 Thread Christopher Samuel
Subject: Slurmdbd tables Date: Tue, 17 Sep 2013 09:11:50 +1000 From: Brett Pemberton b...@unimelb.edu.au To: Christopher Samuel sam...@unimelb.edu.au Chris, The situation: We need all tables to have primary keys defined (for galera replication). However slurm has two tables per cluster

[slurm-dev] Re: special job error state

2013-09-19 Thread Christopher Samuel

[slurm-dev] Re: special job error state

2013-09-19 Thread Christopher Samuel
sites would have conniptions if users were able to take nodes out at random. ;-) cheers, Chris

[slurm-dev] Re: special job error state

2013-09-21 Thread Christopher Samuel
release and then scontrol requeue it will just start again. Moe et al., how easy would it be to have some form of: scontrol requeue --hold $JOBID ? cheers, Chris

[slurm-dev] Re: Can't start slurmd daemons

2013-09-21 Thread Christopher Samuel
On 19/09/13 16:29, Arjun J Rao wrote: I installed SLURM on a Scientific Linux 6.4 system (64bit). Did you install from source, or from RPMs?

[slurm-dev] Re: Limiting number of CPUs per Job

2013-09-21 Thread Christopher Samuel
that redirected jobs larger than 1 core to other partitions and jobs of just 1 core to the serial partition is likely to be the best way for now. cheers, Chris
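
The routing rule suggested above is simple enough to state as a function. This sketch shows only the decision; in Slurm the logic would live in a job submit plugin (for example job_submit/lua), and while the `serial` partition comes from the message, the `parallel` name is an invented placeholder.

```sh
# Hypothetical routing rule: single-core jobs go to the serial partition,
# anything larger goes elsewhere. "parallel" is a placeholder partition name.
pick_partition() {
  ncpus=$1
  if [ "$ncpus" -le 1 ]; then
    echo "serial"
  else
    echo "parallel"
  fi
}

pick_partition 1    # prints serial
pick_partition 16   # prints parallel
```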

[slurm-dev] Re: Bug in Slurm time=days-hours:minutes parsing?

2013-10-03 Thread Christopher Samuel
On 03/10/13 06:42, David Bigagli wrote: Sounds good. :-) Thanks for the patch; it is going to be in Slurm 2.6.3. Thanks for that, David.
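
For reference, a `days-hours:minutes` limit such as `1-02:30` means 1 day, 2 hours and 30 minutes, i.e. 1590 minutes. The sketch below converts only that one format to minutes; it is an illustration rather than Slurm's own parser, the helper names are hypothetical, and it ignores the other accepted forms (such as plain minutes or hours:minutes:seconds).

```sh
# Strip leading zeros so values like "08" are not treated as octal by $(( )).
to_dec() {
  n=$1
  while [ "${#n}" -gt 1 ] && [ "${n#0}" != "$n" ]; do
    n=${n#0}
  done
  echo "$n"
}

# Convert a "days-hours:minutes" time limit (e.g. 1-02:30) to minutes.
dhm_to_minutes() {
  days=${1%%-*}
  rest=${1#*-}
  hours=${rest%%:*}
  mins=${rest#*:}
  echo $(( $(to_dec "$days") * 1440 + $(to_dec "$hours") * 60 + $(to_dec "$mins") ))
}

dhm_to_minutes 1-02:30   # prints 1590
```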

[slurm-dev] Re: Interactive Jobs Not Launching Under High Load

2013-10-28 Thread Christopher Samuel
the Torque developers solved that issue (user discussion was on torqueusers, dev stuff was on torquedev). cheers! Chris

[slurm-dev] Re: Failed to contact primary controller : No route to host

2013-11-05 Thread Christopher Samuel
need to check that. Good luck! Chris

[slurm-dev] Re: Failed to contact primary controller : No route to host

2013-11-06 Thread Christopher Samuel

[slurm-dev] Re: Which branch of SLURM should I start with?

2013-11-12 Thread Christopher Samuel
probably want to base any development on that. cheers! Chris

[slurm-dev] Re: SLURM: Issue with exporting environmental variables on bg-q

2013-11-28 Thread Christopher Samuel
whinges that /bin/hostname doesn't have the magic ELF header to say it's been built for the CNK, but it does try and execute it. Best of luck! Chris

[slurm-dev] Mixing CR_Core and CR_Socket ?

2013-12-16 Thread Christopher Samuel
is another story. :-) All the best, Chris

[slurm-dev] Re: Compiling on Mac OS X 10.6.8

2013-12-17 Thread Christopher Samuel
should download a Linux distro and install that instead - you'll save yourself a lot of pain. Best of luck, Chris

[slurm-dev] Changes in Linux Kernel Control Group APIs

2013-12-22 Thread Christopher Samuel
you can see this article: https://lwn.net/Articles/574317/ and an article about the cgmanager project, and the systemd developers' unwillingness to cooperate on a single API, is here: https://lwn.net/Articles/575672/ All the best, Chris

[slurm-dev] Slurm, cgroups and user SSH sessions

2014-01-01 Thread Christopher Samuel
level, though they'd only be affecting their own stuff. Thoughts? All the best, Chris

[slurm-dev] Re: No standard output

2014-01-05 Thread Christopher Samuel
there are no data staging directives for sbatch to copy data onto and off a node before/after a job (unlike Torque). All the best, Chris

[slurm-dev] Odd quota bug - used more than allocated (and more than run)

2014-01-07 Thread Christopher Samuel
brave enough to restart those with running jobs on them. ;-) To me it's reminiscent of this bug: http://bugs.schedmd.com/show_bug.cgi?id=392 except our problem persists across a slurmctld (and slurmdbd) restart. :-( Any ideas? cheers, Chris

[slurm-dev] Re: Odd quota bug - used more than allocated (and more than run)

2014-01-07 Thread Christopher Samuel
On 08/01/14 11:03, Christopher Samuel wrote: Here's the data we have with all numbers normalised to hours:
GrpCPUMins 71200
CPURunMins 58348
Raw Usage  12967
So that means GrpCPUMins - CPURunMins - RawUsage = -115 hours. After a bit more
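
Spelling out the arithmetic with the figures above (all in hours): subtracting the time committed to running jobs and the usage already recorded from the group limit gives a negative number, meaning running plus used time already exceeds GrpCPUMins. A trivial sketch with a hypothetical helper name:

```sh
# Remaining headroom = GrpCPUMins - CPURunMins - RawUsage (same units throughout).
quota_headroom() {
  echo $(( $1 - $2 - $3 ))
}

quota_headroom 71200 58348 12967   # prints -115
```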

[slurm-dev] Re: recording sbatch options

2014-01-12 Thread Christopher Samuel

[slurm-dev] Re: error: Node xxxxx appears to have a different slurm.conf than the slurmctld

2014-01-12 Thread Christopher Samuel
sure the compute nodes re-read slurm.conf so everyone is on the same page again. All the best, Chris
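
One quick way to see which nodes are out of step is to compare a checksum of each node's slurm.conf against the controller's copy. A sketch using POSIX `cksum`; the function name and the example paths are hypothetical, and fetching each node's copy (over a shared filesystem, ssh, etc.) is left out.

```sh
# Succeed if two copies of a config file have identical contents,
# judged by their POSIX cksum output (CRC and byte count).
same_config() {
  [ "$(cksum < "$1")" = "$(cksum < "$2")" ]
}

# Example (hypothetical paths):
# same_config /etc/slurm/slurm.conf /tmp/node01-slurm.conf || echo "node01 differs"
```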

[slurm-dev] Re: Memory Usage Fairshare

2014-01-15 Thread Christopher Samuel

[slurm-dev] Re: Interactive shells

2014-01-20 Thread Christopher Samuel
with:
[root@merri-m ~]# cat /usr/local/bin/sinteractive
#!/bin/bash
exec srun $* --pty -u ${SHELL} -i -l
Hope this helps! Chris
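
One caveat with a wrapper like this: an unquoted `$*` re-splits arguments on whitespace, so an option such as `--job-name='my job'` would reach srun as two words. Forwarding with `"$@"` (e.g. `exec srun "$@" --pty -u "${SHELL}" -i -l`) keeps each argument intact. The difference can be demonstrated with hypothetical helper functions:

```sh
# Print how many arguments a command actually receives.
count_args() {
  echo "$#"
}

# Forward arguments the way the wrapper above does (unquoted $*).
via_star() {
  count_args $*
}

# Forward arguments with "$@", which keeps each one intact.
via_at() {
  count_args "$@"
}

via_star --time=1:00:00 "--job-name=my job"   # prints 3
via_at --time=1:00:00 "--job-name=my job"     # prints 2
```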

[slurm-dev] Re: Large memory jobs block small memory job in same partition

2014-02-04 Thread Christopher Samuel
Weight=10
NodeName=barcoo[062-070] NodeAddr=barcoo[062-070] RealMemory=25 Gres=mic:2 Weight=100
cheers, Chris

[slurm-dev] Reserving a node for short jobs between specific hours

2014-02-13 Thread Christopher Samuel
partitions at 8am and 8pm and bring that partition up/down as appropriate. My reading of the manual page isn't encouraging, but I could just be not seeing the wood for the trees. Any ideas please? All the best! Chris
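
A cron-driven approach like the one described could decide the partition state from the hour of day and pass it to `scontrol update PartitionName=... State=...`. This sketch assumes the partition should be UP from 8am until 8pm and DOWN otherwise; that direction, the partition name `short` and the helper name are illustrative assumptions, not the original setup.

```sh
# Decide the desired partition state for a given hour (00-23).
# Assumption: the partition is open from 08:00 until 20:00.
partition_state_for_hour() {
  hour=$1
  # test(1) compares decimal integers, so a zero-padded "08" is fine here.
  if [ "$hour" -ge 8 ] && [ "$hour" -lt 20 ]; then
    echo "UP"
  else
    echo "DOWN"
  fi
}

# Example, run from cron at 08:00 and 20:00 (partition name is hypothetical):
# scontrol update PartitionName=short State="$(partition_state_for_hour "$(date +%H)")"
```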

[slurm-dev] Slurm daemon on Xeon Phi cards?

2014-02-17 Thread Christopher Samuel
standard Slurm going too. Nothing like having lofty goals! All the best! Chris

[slurm-dev] Is MaxJobs enforced for associations?

2014-03-02 Thread Christopher Samuel
I'm missing, or is this a bug? All the best, Chris

[slurm-dev] Re: Is MaxJobs enforced for associations?

2014-03-02 Thread Christopher Samuel
On 03/03/14 12:01, Christopher Samuel wrote: and I can see the expected values when I do:
# sacctmgr show assoc where cluster=barcoo account=foo
but it doesn't appear to have any effect. I've tried restarting slurmctld on the node just

[slurm-dev] Re: Is MaxJobs enforced for associations?

2014-03-03 Thread Christopher Samuel
showq to treat those as blocked and hide them from other users. cheers, Chris

[slurm-dev] Re: slurmd crashed on *some* nodes after scontrol reconfigure

2014-03-13 Thread Christopher Samuel
On 11/03/14 06:20, Andy Riebs wrote: Has anyone seen this before? slurm.conf is on an NFS server, so it's possible we've got a configuration error there. We've seen this same problem too; we lost a heap of jobs to it. :-(

[slurm-dev] RE: error: We have more allocated time than is possible...

2014-03-16 Thread Christopher Samuel
clusters which have all only run 2.6.x. The University has had some occasional network and DNS issues which have meant that our slurmdbd has had transient issues talking to our database server, so I don't know if that could be the trigger for us. All the best, Chris

[slurm-dev] Re: Slurm version 14.03.0 is now available

2014-03-26 Thread Christopher Samuel
:-) Secondly, will a 14.03 slurmctld happily talk to (drained) 2.6.x slurmds running jobs so we can do a rolling upgrade? Or will we need to drain the entire cluster of running jobs first and then upgrade them in a single hit? All the best, Chris

[slurm-dev] Re: Slurm version 14.03.0 is now available

2014-03-30 Thread Christopher Samuel
(especially on BG/Q). A setting that let you change that behaviour to mark nodes as draining rather than down would be very handy. cheers, Chris

[slurm-dev] Re: Slurm version 14.03.0 is now available

2014-03-31 Thread Christopher Samuel
sorts of things for us, so I think we'd rather just have Slurm not kill jobs unless we (or the user) tell it to. cheers, Chris

[slurm-dev] Re: sacct access for end users

2014-03-31 Thread Christopher Samuel
only query their own jobs and they can do that from any of our 4 HPC systems. Take a look at: http://slurm.schedmd.com/accounting.html as a starting point. All the best, Chris

[slurm-dev] Re: Slurm version 14.03.0 is now available

2014-03-31 Thread Christopher Samuel
Moe!

[slurm-dev] Re: Information about older jobs

2014-04-06 Thread Christopher Samuel
[samuel@barcoo ~]$ sacct -j 1093417 -o jobid,nodelist
JobID        NodeList
------------ ---------------
1093417      barcoo010
And showing it's not there by default:
[samuel@barcoo ~]$ sacct -j 1093417 -l | fgrep -i NodeList
[samuel@barcoo ~]$
Hope that helps! Chris

[slurm-dev] Guidance on planning a slurmdbd outage

2014-04-07 Thread Christopher Samuel
not a great deal but it'd be good to have an idea how long we could safely run without a slurmdbd to talk to. thanks! Chris

[slurm-dev] Re: Guidance on planning a slurmdbd outage

2014-04-08 Thread Christopher Samuel
assume you are not using 32MB of RAM. 32MB should be enough for anyone. ;-) All the best! Chris

[slurm-dev] Anyone using slurmdbd with MySQL 5.6 ?

2014-04-30 Thread Christopher Samuel
MySQL 5.6? All the best! Chris

[slurm-dev] Re: why job status is always CG?

2014-05-11 Thread Christopher Samuel
error: Invalid job_id job says: error: Invalid job_id job How can I cancel this job? You can't, as it is no longer running, so cancelling it doesn't make sense. Hope that helps, Chris

[slurm-dev] Re: why job status is always CG?

2014-05-12 Thread Christopher Samuel
into the background (like a daemon would)? Even then I'm not sure that srun would help you. We have users running batch scripts all the time on x86-64 without using srun; it's never been an issue for us. All the best, Chris

[slurm-dev] Re: PMI2 related error

2014-05-20 Thread Christopher Samuel
proposing # that we turn PMI-2 off when under Slurm unless the user # specifically requests we use it. Not sure if this has been raised on slurm-dev yet. All the best, Chris

[slurm-dev] Re: pbsdsh -u equivalent

2014-06-30 Thread Christopher Samuel
to the manual page the --ntasks-per-node=1 option for srun should do what you want. cheers, Chris

[slurm-dev] Re: pbsdsh -u equivalent

2014-07-02 Thread Christopher Samuel
to 1
Submitted batch job 1856638
A distributed job (MPI for instance) must have at least one task on every node for this to make sense. All the best, Chris

[slurm-dev] Re: Documentation enhancement request- mysql

2014-07-03 Thread Christopher Samuel
when creating MySQL tables; that way you don't need to remember. All the best, Chris

[slurm-dev] Re: Documentation enhancement request- mysql

2014-07-08 Thread Christopher Samuel
if it isn't available. Wonderful, thanks so much!

[slurm-dev] Re: heterogeneus number of processors per node, slurm wont use all processors

2014-07-22 Thread Christopher Samuel
(and no node specification) not work for that? cheers, Chris

[slurm-dev] Re: heterogeneus number of processors per node, slurm wont use all processors

2014-07-27 Thread Christopher Samuel
and find a time to create a test partition there to check we see the same. We're (currently) on 2.6.5; what version are you on? cheers, Chris

[slurm-dev] Re: Intel MPI Performance inconsistency (and workaround)

2014-08-21 Thread Christopher Samuel
On 22/08/14 04:43, Jesse Stroik wrote: We recently noticed sporadic performance inconsistencies on one of our clusters. What distro is this? Are you using cgroups? cheers, Chris

[slurm-dev] Re: starting slurmd only after GPUs are fully initialized

2014-08-31 Thread Christopher Samuel
to go and find out why before we let it back into the cluster. All the best, Chris

[slurm-dev] Re: Unsubscribe

2014-09-03 Thread Christopher Samuel
Unfortunately it's not clear what you do from there (there is no unsubscribe link), but if you click "Profile Home" at the top of that page it takes you to: http://lists.schedmd.com/cgi-bin/dada/mail.cgi/profile/ and you'll see "Unsubscribe from this list" links there. Good luck! Chris

[slurm-dev] Re: Requested node configuration is not available when using -c

2014-09-08 Thread Christopher Samuel
2 cores to a single job if I can dedicate each core individually to a job. What does "scontrol show node" say? cheers, Chris

[slurm-dev] Re: Requested node configuration is not available when using -c

2014-09-10 Thread Christopher Samuel
out of individual nodes but they are pretty rare. One company that does this is ScaleMP. All the best, Chris

[slurm-dev] Re: Requested node configuration is not available when using -c

2014-09-10 Thread Christopher Samuel
On 11/09/14 10:56, Christopher Samuel wrote: This is an operating system kernel issue, not a queuing system issue, so Slurm, LSF or Torque will all have the same issue. To clarify, the Linux kernel can support these sorts of systems, but you'll either need extra supporting software

[slurm-dev] Re: undelivered output

2014-09-10 Thread Christopher Samuel
and then copied back at the end of the job. Slurm doesn't do that; it defaults to writing the output (and errors) to a file in the directory the job is running in, so there's no need to copy it back at the end of the job. Hope that helps, Chris

[slurm-dev] Re: Override memory limits with --exclusive?

2014-09-18 Thread Christopher Samuel
it was triggered. Hope this helps! Chris

[slurm-dev] Re: overcounting of SysV shared memory segments?

2014-09-19 Thread Christopher Samuel
in recorded memory usage between the two. Might take a while to have something to report though. :-) All the best, Chris

[slurm-dev] Using control groups to restrict resource usage on BG/Q launch node?

2014-09-22 Thread Christopher Samuel
. cheers! Chris

[slurm-dev] Re: shellshock patch uses a different function export, caused some errors on our Slurm cluster

2014-09-28 Thread Christopher Samuel
the login nodes. 8) Set partitions back to UP to start jobs going again. Hope this helps, folks. cheers! Chris
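The final step can be scripted with scontrol's partition update. This is a minimal Python sketch that builds one `scontrol update PartitionName=<name> State=UP` command per partition. It only prints the commands unless dry_run is turned off. The partition names and the function names are hypothetical placeholders, and this snippet does not include the earlier steps of the procedure.

```python
import subprocess


def partition_state_commands(partitions: list[str], state: str) -> list[list[str]]:
    """Build one 'scontrol update' argv per partition; state is e.g. 'UP' or 'DOWN'."""
    return [["scontrol", "update", f"PartitionName={name}", f"State={state}"]
            for name in partitions]


def set_partitions(partitions: list[str], state: str, dry_run: bool = True) -> None:
    for argv in partition_state_commands(partitions, state):
        if dry_run:
            print(" ".join(argv))  # show what would be run
        else:
            subprocess.run(argv, check=True)  # requires Slurm administrator rights


# Placeholder partition names; prints the commands without running them.
set_partitions(["main", "debug"], "UP")
```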

[slurm-dev] Restrict access to a frontend to a partition in 2.6 ?

2014-09-29 Thread Christopher Samuel
of which help in this scenario), but just wondering if I've missed anything obvious. cheers, Chris

[slurm-dev] Re: Restrict access to a frontend to a partition in 2.6 ?

2014-09-29 Thread Christopher Samuel
On 29/09/14 21:33, je...@schedmd.com wrote: Slurm does not support this today. Thanks, Moe, we'll see if we can figure another way around it. cheers! Chris

[slurm-dev] Re: Slurm User Group Meeting 2014: Presentations now online

2014-09-29 Thread Christopher Samuel
On 30/09/14 02:39, je...@schedmd.com wrote: About 70 people attended the Slurm User Group Meeting last week in Lugano, Switzerland. Thanks so much to you and everyone who organised the meeting and to everyone who came; it was well worth attending. All the best, Chris

[slurm-dev] Re: Build error on CentOS 5.10

2014-10-02 Thread Christopher Samuel
On 03/10/14 03:05, Michael Jennings wrote: Is your /var/tmp mounted noexec by any chance? I wondered that but I don't think that'll give the errors that Dennis is seeing, he's seeing -EACCES for /usr/bin/perl, not for something in /var/tmp. :-(
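Whether a filesystem is mounted noexec shows up in its mount options. This minimal Python sketch reads Linux /proc/mounts format, where the fields are device, mount point, type and comma-separated options. It checks a given mount point for noexec. It only matches exact mount points, so for a path such as /usr/bin/perl you would need to check the filesystem that actually contains it. The sample text and the function names are illustrative assumptions.

```python
from typing import Optional


def mount_options(mounts_text: str, mount_point: str) -> Optional[set[str]]:
    """Return the option set for mount_point from /proc/mounts-style text, or None if absent."""
    found = None
    for line in mounts_text.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[1] == mount_point:
            found = set(parts[3].split(","))  # later entries shadow earlier ones
    return found


def is_noexec(mounts_text: str, mount_point: str) -> bool:
    opts = mount_options(mounts_text, mount_point)
    return opts is not None and "noexec" in opts


SAMPLE = """/dev/sda1 / ext4 rw,relatime 0 0
/dev/sda3 /var/tmp ext4 rw,nosuid,nodev,noexec,relatime 0 0"""

print(is_noexec(SAMPLE, "/var/tmp"))  # True
print(is_noexec(SAMPLE, "/"))         # False

# On a live Linux system:
# with open("/proc/mounts") as f:
#     print(is_noexec(f.read(), "/var/tmp"))
```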

[slurm-dev] Re: Prune database before migration to 14.11 ?

2014-10-13 Thread Christopher Samuel
just a few seconds. All the best, Chris

[slurm-dev] Re: Prune database before migration to 14.11 ?

2014-10-15 Thread Christopher Samuel
servers. The foreign constraints came from another database a colleague of mine has installed; we could fix this :-) :-) cheers, Chris

[slurm-dev] Re: Prune database before migration to 14.11 ?

2014-10-21 Thread Christopher Samuel
On 16/10/14 16:02, Christopher Samuel wrote: No worries, we're going to test out ours in a sandbox as well, so we'll be able to compare it to our (pretty beefy) DB servers. It took around 2 minutes to add all the indexes in our sandbox, that's with a total of about 6 million jobs across 5

[slurm-dev] Re: SLURM and SSSD group enumeration

2014-10-29 Thread Christopher Samuel

[slurm-dev] Re: Failure tolerance in slurm + openmpi

2014-10-29 Thread Christopher Samuel
.x as that can use PMI2 with Slurm and fixes this scaling issue.

[slurm-dev] slurmstepd: _slurm_cgroup_destroy: problem deleting step cgroup path

2014-11-06 Thread Christopher Samuel
) is to migrate to Open MPI 1.8.4, which is due out shortly and should address this. cheers, Chris

[slurm-dev] Re: slurmstepd: _slurm_cgroup_destroy: problem deleting step cgroup path

2014-11-06 Thread Christopher Samuel
that in the stdout or stderr files as a result of upgrading to 14.03.10. Yup, this is with 14.03.10. I've only managed to provoke it with NAMD so far, but I guess we'll hear from users if they see it with other codes too. :-) All the best, Chris

[slurm-dev] Re: slurmstepd: _slurm_cgroup_destroy: problem deleting step cgroup path

2014-11-06 Thread Christopher Samuel
/ cgroup.event_control cgroup.procs freezer.state notify_on_release tasks Perhaps we can put it at debug level as before, as it may concern users. If it is just cosmetic, it'd be good, I think. All the best, Chris
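Some context on the files listed: in a cgroup v1 hierarchy such as the freezer one shown, you cannot rmdir a cgroup directory while its tasks file still lists any process IDs. A minimal, hypothetical Python check for that condition follows. It is demonstrated on a temporary directory standing in for a real step cgroup. It does not claim to identify the actual cause of the message in this thread.

```python
import os
import tempfile


def pending_tasks(cgroup_dir: str) -> list[int]:
    """Read PIDs from a cgroup v1 'tasks' file; rmdir of the cgroup fails while any remain."""
    with open(os.path.join(cgroup_dir, "tasks")) as f:
        return [int(line) for line in f if line.strip()]


# Stand-in for a step cgroup directory; a real one lives under the cgroup mount.
with tempfile.TemporaryDirectory() as fake_step:
    with open(os.path.join(fake_step, "tasks"), "w") as f:
        f.write("4242\n")
    print(pending_tasks(fake_step))  # [4242]
```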
