[slurm-dev] Re: Question about prologging

2015-04-15 Thread Christopher Samuel
t a day. > That's how it supports everything else! Yeah, the problem is the millions of single node or core jobs that get run, they'll never even try to use srun. cheers! Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences

[slurm-dev] Re: slurmdbd association lifetime/expiry

2015-04-21 Thread Christopher Samuel
ir new quota and then zero everyone else. All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: slurmdbd association lifetime/expiry

2015-04-22 Thread Christopher Samuel
On 22/04/15 19:07, Maciej L. Olchowik wrote: > I will suggest that to the resource allocation committee, see if they > would be willing to change their ways :) Thanks for the thought! My pleasure, glad our experiences are useful to others! All the best, Chris -- Christopher

[slurm-dev] Re: Pulling program results from nodes

2015-05-07 Thread Christopher Samuel
d people with lots of experience in this and isn't very high traffic these days. Drop me a line if you'd like to be subscribed (had to disable Mailman web subscriptions as it was being abused to mailbomb people). All the best, Chris -- Christopher SamuelSenior Systems Admini

[slurm-dev] Re: IB and MPI with Slurm

2015-05-24 Thread Christopher Samuel
ore information about how Slurm interacts with various MPI implementations on the SchedMD website here: http://slurm.schedmd.com/mpi_guide.html All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email:

[slurm-dev] Re: squeue produces segmentation fault.

2015-05-25 Thread Christopher Samuel
've fixed it in the interim. Especially as for 14.11.3 the NEWS file says: -- Fixed squeue core dump. All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 9

[slurm-dev] Re: squeue produces segmentation fault.

2015-05-25 Thread Christopher Samuel
On 26/05/15 13:05, Christopher Samuel wrote: > Given that Slurm 14.11.x is up to 14.11.6 Sorry, actually up to 14.11.7.. -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55

[slurm-dev] Re: Slurm and docker/containers

2015-05-31 Thread Christopher Samuel
ly verified by any authority – this number jumps up # to ~40% with a sampling error bound of 3%. [...] If anything that puts me off liking them even more. :-( All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Emai

[slurm-dev] Re: Slurm and docker/containers

2015-06-04 Thread Christopher Samuel
ser namespaces here: https://lwn.net/Articles/532593/ Your filesystem has to support it though, otherwise it looks like you'll get EINVAL back - as this comment from a user who was trying it on filesystems not yet ported to it reports: https://lwn.net/Articles/541787/ All the best, Chris --

[slurm-dev] Re: How to unsubscribe?

2015-06-04 Thread Christopher Samuel
om memory) and then you can log in to unsubscribe. No direct link from what I can see, sorry! I think Danny, Moe, David et al. monitor the list and unsubscribe people who ask here, so it might have already happened for you (hence the CC). All the best, Chris -- Christopher SamuelSenio

[slurm-dev] Re: Slurm and docker/containers

2015-06-04 Thread Christopher Samuel
oth Docker and CoreOS's rkt are Apache2. :-( https://www.apache.org/licenses/GPL-compatibility.html Oh well, makes the whole thing academic really for us. Thanks for pointing that out Ralph! All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life S

[slurm-dev] Re: Slurm and docker/containers

2015-06-04 Thread Christopher Samuel
ir jobs, and SLURM will do so. How did they deal with the license incompatibility that Ralph mentioned? All the best Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903

[slurm-dev] Re: old version of scontrol, sacct keeps resurrecting

2015-06-04 Thread Christopher Samuel
ing done regularly with the old paths hardcoded in them? What users are they running as? All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.

[slurm-dev] Re: cgroup setup and cpuset issues

2015-06-09 Thread Christopher Samuel
SBATCH --ntasks=1 #SBATCH --cpus-per-task=x So they understand that they are running a single process that needs 'x' cores. Best of luck! Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au
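A minimal sketch of a batch script along those lines, assuming a threaded (OpenMP-style) program. The value 4 and the program name are placeholders, not from the thread. SLURM_CPUS_PER_TASK is only set when --cpus-per-task is given, so the script falls back to 1 and can be dry-run outside a job.

```shell
#!/bin/sh
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4

# One process, several cores: size the thread count from what Slurm allocated.
# Falls back to 1 when the variable is not set (e.g. when run outside a job).
threads="${SLURM_CPUS_PER_TASK:-1}"
export OMP_NUM_THREADS="$threads"

echo "one task, ${threads} core(s)"
# ./my_threaded_program    # hypothetical binary, uncomment in a real job
```

Tying the thread count to the allocation means users only have to change the number in one place.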

[slurm-dev] Re: cgroup setup and cpuset issues

2015-06-09 Thread Christopher Samuel
nCores=yes ConstrainRAMSpace=yes ConstrainSwapSpace=yes -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: cgroup setup and cpuset issues

2015-06-10 Thread Christopher Samuel
so you can at least confirm whether or not there are multiple cores allocated. Best of luck! Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: slurm on systemd

2015-06-11 Thread Christopher Samuel
ing at least. All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: Slurm and docker/containers

2015-06-11 Thread Christopher Samuel
ut containerisation they recommended looking at LXD from Canonical which is an evolution of LXC. http://www.ubuntu.com/cloud/tools/lxd Of course getting it to run on anything other than Ubuntu might be a challenge which could limit its usefulness. All the best, Chris -- Christopher Samuel

[slurm-dev] Re: UserAccounts on SLURMDBD

2015-06-14 Thread Christopher Samuel
n here in Australia) to manage LDAP users and projects and it supports both Slurm and Torque/MAM. https://github.com/Karaage-Cluster/ Might be overkill for a small cluster though! cheers, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Co

[slurm-dev] Re: UserAccounts on SLURMDBD

2015-06-15 Thread Christopher Samuel
but if it ain't broke... ;-) All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: slurm-dev error: slurm_receive_msg: Invalid Protocol Version

2015-06-15 Thread Christopher Samuel
14.11.x). So 14.11.x should support 14.03.x and 2.6.x. I suspect that the OP is actually having the wrong slurmd start on the compute nodes.. cheers, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu

[slurm-dev] Re: Problem running OpenMPI over slurm

2015-06-17 Thread Christopher Samuel
few years now. All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: Problem running OpenMPI over slurm

2015-06-17 Thread Christopher Samuel
${slurm_ver} is the current version we're running (currently 14.03.11). You will need to fix up your resource limit settings for maximum lockable memory too, but that shouldn't cause the issue you're seeing. Best of luck! Chris -- Christopher SamuelSenior Systems Administrat

[slurm-dev] Re: Problem running OpenMPI over slurm

2015-06-17 Thread Christopher Samuel
and? We're on RHEL6 (no systemd) and don't start them on boot, instead we will only start it by hand (as if a node reboots due to a fault we want to go and check it out first). All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: Problem running OpenMPI over slurm

2015-06-17 Thread Christopher Samuel
i/v1.8/ There have been Slurm related fixes in 1.8.4 and 1.8.5 (though looking at the change log they seem minor, but one is related to configure tests for PMI). I'd strongly suggest trying that out just to confirm that it's not an issue that's been fixed in the last 9 months. Be

[slurm-dev] Re: Problem running OpenMPI over slurm

2015-06-17 Thread Christopher Samuel
see online a lot of people set in the > /etc/sysconfig/slurm file. > > Are there different sourced files in the CentOS 7 / systemd setup for > slurm? I would not be at all surprised if that was the case. If the unit file is telling you where it is then I'd trust that. Good luck!

[slurm-dev] Re: setting DefMemPerCPU in a heterogeneous cluster

2015-06-18 Thread Christopher Samuel
it of that for us is that, as our current Intel clusters have minimums of 4GB/core and 16GB/core, if a lot of people can get away inside that limit then you can still fit larger jobs into the memory and cores left over. All the best, Chris -- Christopher SamuelSenior System

[slurm-dev] Re: srun + openmpi : Missing locality information

2015-06-22 Thread Christopher Samuel
etrlimit(2) for details. Use the string infinity to configure no limit on a specific resource. -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: srun + openmpi : Missing locality information

2015-06-23 Thread Christopher Samuel
.. cheers! Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: successful building in sparc solaris 10

2015-06-24 Thread Christopher Samuel
not had to touch Solaris for well over a decade now. Best of luck, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: Off-topic: What accounting system do you use?

2015-06-24 Thread Christopher Samuel
en use Karaage to manage LDAP users, their projects and it integrates with Slurm via scripts that call sacctmgr. http://karaage.readthedocs.org/en/latest/introduction.html All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computat

[slurm-dev] Re: module environment

2015-06-29 Thread Christopher Samuel
e have: echo export BASH_ENV=/etc/profile.d/module.sh to try and get that into people's environments (basically ported over from our old Torque setup). Hope that helps, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Emai

[slurm-dev] Re: slurm-dev Understand GrpCPUMins

2015-06-29 Thread Christopher Samuel
On 30/06/15 07:05, Apolinar Martinez Melchor wrote: > *** JOB xxx CANCELLED AT yyy DUE TO TIME LIMIT *** That sounds like the job hit its walltime, GrpCpuMins should control whether a job can start in the first place. Best of luck, Chris -- Christopher SamuelSenior Syst

[slurm-dev] Accounting frequency, running jobs and sacct

2015-07-09 Thread Christopher Samuel
S reported for any running jobs, even the longest currently (20 hours old). We are running using slurmdbd with the MySQL backend. Am I misunderstanding something? All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative

[slurm-dev] Re: Accounting frequency, running jobs and sacct

2015-07-12 Thread Christopher Samuel
that just use sbatch (and no srun) once they complete. The manual page for sbatch says that memory stats are collected by default every 30 seconds. All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...

[slurm-dev] Re: Accounting frequency, running jobs and sacct

2015-07-15 Thread Christopher Samuel
d. All the best! Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: Accounting frequency, running jobs and sacct

2015-07-15 Thread Christopher Samuel
Hiya, On 16/07/15 14:25, Danny Auble wrote: > sstat should work with mpi jobs as well, just as long as srun is the > launcher. Yup, we're using OpenMPI 1.6.x mpirun (which calls srun). cheers! Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Li

[slurm-dev] Re: Direct upgrade without running jobs from 2.4.5 to 14.11?

2015-07-15 Thread Christopher Samuel
11.x one to get to that version. ...and make sure you've got a database backup first! ;-) All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: Concept of job environment with sbatch or salloc

2015-07-28 Thread Christopher Samuel
d at" `date` echo "print Scratch directory /scratch/avoca/$SLURM_JOB_ID has been allocated" echo "print $SLURM_BG_NUM_NODES Blue Gene/Q compute nodes have been allocated" Hope these help! All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victo
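For context, echo lines of this shape look like what a Slurm TaskProlog script emits: stdout lines starting with "print" are written to the task's output, and lines starting with "export" are added to the task's environment. A minimal sketch, assuming the script is configured via TaskProlog= in slurm.conf. The scratch path is a hypothetical placeholder, the BASH_ENV line is borrowed from the "module environment" reply earlier in this listing, and the job ID fallback only exists so it runs outside a job.

```shell
#!/bin/sh
# Sketch of a TaskProlog script. Slurm reads its stdout:
#   "print ..."        -> written to the task's stdout
#   "export NAME=val"  -> added to the task's environment
job_id="${SLURM_JOB_ID:-0}"             # fallback for running outside a job
scratch="/scratch/example/${job_id}"    # hypothetical path

task_prolog() {
    echo "print Job ${job_id} started at $(date)"
    echo "print Scratch directory ${scratch} has been allocated"
    echo "export BASH_ENV=/etc/profile.d/module.sh"
}

task_prolog
```

Running it by hand shows exactly the directives Slurm would see, which is a cheap way to check the script before pointing slurm.conf at it.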

[slurm-dev] Re: Concept of job environment with sbatch or salloc

2015-07-29 Thread Christopher Samuel
On 29/07/15 16:41, Thomas Orgis wrote: > Am Tue, 28 Jul 2015 23:03:54 -0700 > schrieb Christopher Samuel : > >> 1) To stop environment variable being exported we have this in our >> slurm.conf file: >> >> PropagateResourceLimits=NONE > > Does that infl

[slurm-dev] Re: Set an endtime for a cluster

2015-08-06 Thread Christopher Samuel
their walltime then their jobs will get in.. Best of luck, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: Problem with MPI applications at the end of the job

2015-08-11 Thread Christopher Samuel
On 11/08/15 23:22, Igor Chebotar wrote: > The mpich and openmpi ./configure was set with default options with only > --prefix=/sotware/storage/path For Open-MPI you need to specify: --with-slurm to include Slurm support. Hope that helps! Chris -- Christopher SamuelSenior S
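A rough sketch of what that configure line might look like, assuming it is run from the top of an Open MPI source tree. The install prefix and the RUN_CONFIGURE switch are made up for illustration; only --with-slurm comes from the reply.

```shell
#!/bin/sh
# Sketch: assemble Open MPI configure options with Slurm support enabled.
prefix="${PREFIX:-/opt/openmpi}"    # hypothetical install prefix
configure_args="--prefix=${prefix} --with-slurm"

echo "./configure ${configure_args}"

# Only actually run configure when explicitly asked to (hypothetical switch).
if [ "${RUN_CONFIGURE:-0}" = 1 ]; then
    # Unquoted on purpose so the options are split into separate arguments.
    ./configure $configure_args
fi
```

Printing the command first makes it easy to confirm the Slurm option is really present before a long build.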

[slurm-dev] Re: Problem with MPI applications at the end of the job

2015-08-11 Thread Christopher Samuel
central Slurm install instead and so need to tell OMPI where to find it. Best of luck! Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: Estimated nodes for queued jobs

2015-08-13 Thread Christopher Samuel
tails about the whole scheduling process > (backfill)? I don't know the answer I'm afraid, but I can see that it would be really useful to know what nodes are planned for a job (a bit like Moab and its forward reservations for jobs). All the best! Chris -- Christopher Samuel

[slurm-dev] Re: Estimated nodes for queued jobs

2015-08-16 Thread Christopher Samuel
d to that when we get to 14.11! All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] PMI2 in Slurm 14.11.8 ?

2015-09-01 Thread Christopher Samuel
, there appears to be no pmi2.h installed by "make install" and indeed there is a contrib/pmi2 directory instead that has the pmi2.h header file and API which appears to need manual intervention to install. Has there been some misunderstanding about PMI2 in Slurm? All the best

[slurm-dev] Re: PMI2 in Slurm 14.11.8 ?

2015-09-01 Thread Christopher Samuel
.h ever being installed on our systems in 14.03.x or 2.6.x. All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: PMI2 in Slurm 14.11.8 ?

2015-09-01 Thread Christopher Samuel
.. [samuel@snowy-m ~]$ srun --mpi=list srun: MPI types are... [...] srun: mpi/pmi2 srun: mpi/openmpi [...] All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903

[slurm-dev] Re: PMI2 in Slurm 14.11.8 ?

2015-09-01 Thread Christopher Samuel
On 02/09/15 13:32, Danny Auble wrote: > I'm fairly sure if you install via rpm it will be there. > Contribs isn't built through the normal make as was pointed out, > but it is through the rpm process. OK, thanks Danny, so it's not built by default. cheers! Chris

[slurm-dev] Re: PMI2 in Slurm 14.11.8 ?

2015-09-01 Thread Christopher Samuel
On 02/09/15 13:57, Ralph Castain wrote: > Sorry for the confusion, Chris Not a worry Ralph, this is what bring-up testing is for. :-) -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61

[slurm-dev] srun: error: mpi/pmi2: failed to send temp kvs to compute nodes

2015-09-01 Thread Christopher Samuel
3.003] job_complete: JobID=959 State=0x8003 NodeCnt=2 done Any ideas? All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/

[slurm-dev] Re: srun: error: mpi/pmi2: failed to send temp kvs to compute nodes

2015-09-02 Thread Christopher Samuel
night with a different build, I'll have another go tomorrow. All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: PMI2 in Slurm 14.11.8 ?

2015-09-02 Thread Christopher Samuel
m with it? > > ls -l /lib/slurm/mpi*so Thanks for the pointer, I think I'd deleted enough for Open-MPI to not detect PMI2 when building but not enough for srun to not list it. :-/ Fixed that aspect now. :-) All the best, Chris -- Christopher SamuelSenior Systems Administ

[slurm-dev] Re: jobs arrays & slurm_load_job changes in 14.11+

2015-09-03 Thread Christopher Samuel
e job arrays (waiting on confirmation from them at the moment). All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: PMI2 in Slurm 14.11.8 ?

2015-09-03 Thread Christopher Samuel
iname '*pmi2*' /usr/local/slurm/latest/lib/slurm/mpi_pmi2.la /usr/local/slurm/latest/lib/slurm/mpi_pmi2.a /usr/local/slurm/latest/lib/slurm/mpi_pmi2.so which is what srun looks for to know what to list. That seems counter-intuitive, should those files be installed as a result of

[slurm-dev] Re: PMI2 in Slurm 14.11.8 ?

2015-09-03 Thread Christopher Samuel
On 03/09/15 05:17, Andy Riebs wrote: > Have you specified mpi=pmi2, either in your command line or in slurm.conf? Yup, tried all combinations of that. -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email:

[slurm-dev] Re: srun: error: mpi/pmi2: failed to send temp kvs to compute nodes

2015-09-03 Thread Christopher Samuel
The command line to generate this was: [samuel@snowy-m PMI2]$ sbatch -p debug --wrap "srun -vv --slurmd-debug=verbose ./testpmi2" Submitted batch job 996 All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Ini

[slurm-dev] Re: srun: error: mpi/pmi2: failed to send temp kvs to compute nodes

2015-09-03 Thread Christopher Samuel
single node with David's command line to two nodes and never spotted I'd dropped it out. I'll test that again once the compute nodes are back online again. cheers, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiat

[slurm-dev] Re: srun: error: mpi/pmi2: failed to send temp kvs to compute nodes

2015-09-03 Thread Christopher Samuel
On 04/09/15 15:44, Christopher Samuel wrote: > I'll test that again once the compute nodes are back online again. OK, back to the right error again. [samuel@snowy-m PMI2]$ srun -p debug --mpi=pmi2 ./testpmi2 srun: job 1004 queued and waiting for resources srun: job 1004 has been a

[slurm-dev] Re: srun: error: mpi/pmi2: failed to send temp kvs to compute nodes

2015-09-03 Thread Christopher Samuel
On 04/09/15 16:02, Christopher Samuel wrote: > I've attached the output file for the version with debugging on, and I > have a suspicion it's related to: > > srun: debug: slurm_forward_data: nodelist=snowy010, > address=/tmp/sock.pmi2.1006.0, len=243 > > We ar

[slurm-dev] Re: srun: error: mpi/pmi2: failed to send temp kvs to compute nodes

2015-09-04 Thread Christopher Samuel
queued up to try it out once the current job finishes. I would like Slurm to not use $TMPDIR (TmpFS) for that as we need the tmpdir spank plugin to map each job's /tmp to our scratch FS instead. Too many codes don't honour $TMPDIR/$TMP/etc.. All the best! Chris -- Christopher Samuel

[slurm-dev] Re: Compile of 15.08.0 fails on trusty missing mpio.h

2015-09-12 Thread Christopher Samuel
HDF5 code that seems to need these MPI headers. All the best, Chris (finally in DC for the Slurm User Group) -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org

[slurm-dev] Re: Requested configuration not available

2015-09-14 Thread Christopher Samuel
m option, and the default size is 1MB. Your job asks for 50MB and so won't fit. All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.v

[slurm-dev] Re: Compile of 15.08.0 fails on trusty missing mpio.h

2015-09-14 Thread Christopher Samuel
then is why is the HDF5 code finding it? All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: Issue with slurmctld daemon

2015-09-15 Thread Christopher Samuel
ngStorageType=accounting_storage/slurmdbd to tell it to use slurmdbd instead? All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.

[slurm-dev] Re: I need send a example script of job with mpi

2015-09-17 Thread Christopher Samuel
-lpthread -lhwloc -Wl,-rpath=/usr/local/slurm/latest/lib All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: Now I have this error with libltdl.so.7

2015-09-20 Thread Christopher Samuel
most looks like your LD_LIBRARY_PATH is not being propagated properly to the compute nodes. What does: ldd scriptmpi say? cheers, Chris /* Apologies if threading is broken for this email */ -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation In

[slurm-dev] Re: I need send an executable with parameters with srun

2015-09-20 Thread Christopher Samuel
On 19/09/15 05:44, Fany Pagés Díaz wrote: > I need to send an executable with parameters. I can do it with the > command srun? > > For example: > srun myapplication database IP username password Yes, that should work just fine. All the best, Chris -- Christopher Samue

[slurm-dev] Re: Large job socket timed out errors.

2015-09-21 Thread Christopher Samuel
the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: CR_cores, --cpus_per_task=2 not working

2015-09-21 Thread Christopher Samuel
On 22/09/15 06:37, Andrew Petersen wrote: > However if I use 2, > #SBATCH --cpus-per-task=2 > I get the error "sbatch: error: Batch job submission failed: Requested > node configuration is not available" What does your slurm.conf look like? -- Christopher Samuel

[slurm-dev] Re: CR_cores, --cpus_per_task=2 not working

2015-09-21 Thread Christopher Samuel
DEFAULT line says everything that is general and then for the two larger memory nodes we override that. Hope this helps! Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http:

[slurm-dev] Re: slurm-dev RV: Re: Now I have this error with libltdl.so.7

2015-09-22 Thread Christopher Samuel
r that? What is the program itself you're trying to run? All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: slurm-dev RV: Re: I need send an executable with parameters with srun

2015-09-22 Thread Christopher Samuel
t of luck, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Job start time discrepancy for job array with 14.11.9

2015-09-23 Thread Christopher Samuel
24T10:23:19 1292_1.0 xhpl_inte+ 2015-09-24T10:23:19 All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: What follows PMI-2?

2015-09-24 Thread Christopher Samuel
PMIX work, to me they seem to be extending PMI2 so I wonder if it would be better for their API to be called PMI2X instead to avoid confusion? All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unim

[slurm-dev] Re: Limit access to reconfiguration of Slurm (p.ex. accounting, limits) to certain hosts?

2015-09-29 Thread Christopher Samuel
mgr will be talking to) is running on the same host as slurmctld. For instance we have many clusters talking back to a central host running slurmdbd (and only slurmdbd) and have a variety of systems that talk to it for different tasks. All the best, Chris -- Christopher Samuel

[slurm-dev] Re: Limit access to reconfiguration of Slurm (p.ex. accounting, limits) to certain hosts?

2015-09-30 Thread Christopher Samuel
ll talk to the same slurmdbd to record their accounting records, get their limits, QoS, etc from. All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http

[slurm-dev] Re: slurm job priorities depending on the number of each user's jobs

2015-10-06 Thread Christopher Samuel
ts via sacctmgr. How's that? All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: CPU affinity question

2015-10-07 Thread Christopher Samuel
d just request the binding for the job step that needs inside your batch job with: srun --cpu_bind=cores ./my_program In our recent acceptance testing I tried that with our HPL burn in runs and saw an improvement in performance with that. All the best, Chris -- Christopher SamuelSenior S
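A minimal sketch of a batch script that asks for core binding on just one job step, as described. The task count is illustrative and ./my_program is the placeholder name from the post. The guard is only there so the script does something sensible when run outside a Slurm job.

```shell
#!/bin/sh
#SBATCH --ntasks=4          # illustrative size, not from the thread

# Request core binding only for the job step that needs it.
step_cmd="srun --cpu_bind=cores ./my_program"

if command -v srun >/dev/null 2>&1 && [ -n "${SLURM_JOB_ID:-}" ]; then
    # Unquoted on purpose so the command string is split into words.
    $step_cmd
else
    echo "not inside a Slurm job; would run: $step_cmd"
fi
```

Other steps in the same script can still be launched with a plain srun and keep the default binding.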

[slurm-dev] Re: Jobs waiting even when I have idle slots

2015-10-07 Thread Christopher Samuel
r The bf_window is large enough for our max job time (1 month), and we use bf_continue to make sure that it goes all the way through the queue before returning to the top. Hope this helps! Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Init

[slurm-dev] Re: jobs canceled by other user than owner

2015-11-15 Thread Christopher Samuel
patch which catches the cancellation) applies. It all applies, 14.03 looks like it would need more work. This fix seems to be: commit 8e66e26773352e5a27445a6b60a2134b632c3453 Author: David Bigagli Date: Wed Nov 11 13:04:28 2015 +0100 Fix job cancelation bug. -- Christopher

[slurm-dev] Re: jobs canceled by other user than owner

2015-11-15 Thread Christopher Samuel
On 16/11/15 11:39, Christopher Samuel wrote: > It was a bug, fixed in 15.08. That was meant to be "fixed in 15.08.4" but my caffeine levels have become too dilute.. -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiativ

[slurm-dev] Re: Finding out total number of cores in the cluster

2015-11-18 Thread Christopher Samuel
On 16/11/15 18:28, Mikael Johansson wrote: > Well, to get the total number that SLURM is aware of, a simple command > would be "sinfo -o %C", which shows Allocated/Idle/Offline/Total CPUs. There's also "sinfo -o %F" for the same information about nodes. cheers
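If you want the numbers in a script, the slash-separated string is easy to split. A small sketch, assuming the four-field layout that the sinfo man page documents as allocated/idle/other/total; the sample value in the fallback branch is made up.

```shell
#!/bin/sh
# Split sinfo's "allocated/idle/other/total" string into named fields.
cpu_summary() {
    old_ifs=$IFS
    IFS=/
    set -- $1          # word-split the argument on "/"
    IFS=$old_ifs
    echo "allocated=$1 idle=$2 other=$3 total=$4"
}

if command -v sinfo >/dev/null 2>&1; then
    cpu_summary "$(sinfo -h -o %C)"    # -h suppresses the header line
else
    cpu_summary "12/4/0/16"            # made-up sample when sinfo is absent
fi
```

The same function works on the "sinfo -o %F" string for nodes, since it uses the same slash-separated layout.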

[slurm-dev] Combing fair share with GrpCPUMins caps

2015-11-29 Thread Christopher Samuel
l the best! Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: Combing fair share with GrpCPUMins caps

2015-12-03 Thread Christopher Samuel
On 30/11/15 16:44, Christopher Samuel wrote: > We're looking at seeing if we can combine fair share with our existing > quota system that uses GrpCPUMins. > > However, for fair share a decay factor is strongly suggested and I worry > that there is an implication that the u

[slurm-dev] Re: squeue: AssocMaxJobsLimit despite restriction being lifted

2015-12-03 Thread Christopher Samuel
down and then up and it'll go back to pending. :-) All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: squeue: AssocMaxJobsLimit despite restriction being lifted

2015-12-03 Thread Christopher Samuel
On 04/12/15 10:49, Christopher Samuel wrote: > the scheduler next tries to run it Sorry, that should be "the scheduler next tries to schedule it". -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@u

[slurm-dev] Re: Slurm version 15.08.5 now available

2015-12-10 Thread Christopher Samuel
On 11/12/15 11:56, je...@schedmd.com wrote: > We are pleased to announce the availability of Slurm version 15.08.5 Thanks Moe - silly question - do you need to recompile plugins if going from 15.08.4 to 15.08.5 ? cheers, Chris -- Christopher SamuelSenior Systems Administrator VL

[slurm-dev] Re: Shared node and exclusive node within the same cluster

2015-12-14 Thread Christopher Samuel
ng was that it was much better these days. To be honest we're willing to take a small performance hit to prevent rogue jobs killing other people's jobs on the same node! All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation

[slurm-dev] Re: Shared node and exclusive node within the same cluster

2015-12-14 Thread Christopher Samuel
ut I don't believe it was ever implemented. Best of luck, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: Launch Failed Requeued Held

2015-12-14 Thread Christopher Samuel
daemon executes on every compute node. It # resembles a remote shell daemon to export control to # Slurm. Because slurmd initiates and manages user jobs, # it must execute as the user root. All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sc

[slurm-dev] Re: Head vs worker vs accounting nodes. Where to run the daemons?

2015-12-14 Thread Christopher Samuel
differ from a head, worker or > accounting node, and is there anything special that I need to do to > that node - apart from implementing some sort of authorization > system? The key for that is Munge - you need to make sure you've got that configured & running everywher

[slurm-dev] Re: Head vs worker vs accounting nodes. Where to run the daemons?

2015-12-14 Thread Christopher Samuel
On 15/12/15 14:11, Christopher Samuel wrote: > Hi Simpson, Sorry, Lachlan! :-) -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ h

[slurm-dev] Re: Head vs worker vs accounting nodes. Where to run the daemons?

2015-12-15 Thread Christopher Samuel
Hiya Trey, On 16/12/15 03:24, Trey Dockendorf wrote: > The same startup script is used to launch slurmctld and slurmd. That's not the case for the systemd unit files, they each reference a different binary. All the best, Chris -- Christopher SamuelSenior Systems Admin

[slurm-dev] RE: Slurm version 15.08.6 now available

2015-12-20 Thread Christopher Samuel
's on the right version). There is a section in the admin quickstart about upgrades here: http://slurm.schedmd.com/quickstart_admin.html Best of luck! Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.

[slurm-dev] RE: Worker nodes down, draining?

2015-12-20 Thread Christopher Samuel
e.service Best of luck! Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: Shared file systems?

2015-12-20 Thread Christopher Samuel
the job allocation. Other than the batch script itself, Slurm does no movement of user files. All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http:/

[slurm-dev] slurmd can't mount cpuacct cgroup namespace on RHEL 7.2 ?

2015-12-20 Thread Christopher Samuel
oup", MS_NOSUID|MS_NODEV|MS_NOEXEC, "cpuacct") = -1 EBUSY (Device or resource busy) ...and it might be related to this existing mount courtesy of systemd in /proc/mounts: cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpuacct,cpu 0 0 Anyone else seen t
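To check whether a node already has a cgroup hierarchy carrying cpuacct, as in the /proc/mounts line quoted above, something along these lines could be used. It is only a sketch: the function name is hypothetical, the sample text is the single line from the message rather than a full mount table, and it assumes the usual space-separated /proc/mounts layout.

```python
from typing import List, Tuple


def cgroup_mounts_with(controller: str, mounts_text: str) -> List[Tuple[str, List[str]]]:
    """Return (mount point, options) for each cgroup mount whose options name `controller`."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] != "cgroup":
            continue
        options = fields[3].split(",")
        if controller in options:
            hits.append((fields[1], options))
    return hits


# The one line quoted in the message; on a live node the text would be
# read from /proc/mounts instead.
sample = (
    "cgroup /sys/fs/cgroup/cpu,cpuacct cgroup "
    "rw,nosuid,nodev,noexec,relatime,cpuacct,cpu 0 0\n"
)
for mount_point, options in cgroup_mounts_with("cpuacct", sample):
    print(mount_point, options)
```

A hit where cpuacct shares a mount with cpu, as here, would be consistent with the suspicion that the existing systemd mount is what makes the separate cpuacct mount fail with EBUSY, though the sketch itself does not prove that.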

[slurm-dev] Re: Shared file systems?

2015-12-20 Thread Christopher Samuel
On 21/12/15 16:57, Simpson Lachlan wrote: > So if I'm running the tests then that part of the filesystem should be > shared? Yup, that's the idea.. > Hmmm. Ok. Thanks, I'll look into this tomorrow. No worries, best of luck! -- Christopher SamuelSenior Syst

[slurm-dev] Re: slurmd can't mount cpuacct cgroup namespace on RHEL 7.2 ?

2015-12-21 Thread Christopher Samuel
Thanks for the confirmation! Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
