[slurm-dev] Is MaxJobs enforced for associations?

2014-03-02 Thread Christopher Samuel
e something I'm missing, or is this a bug? All the best, Chris
--
Christopher Samuel, Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci

[slurm-dev] Re: Is MaxJobs enforced for associations?

2014-03-02 Thread Christopher Samuel
On 03/03/14 12:01, Christopher Samuel wrote:
> and I can see the expected values when I do:
>
> # sacctmgr show assoc where cluster=barcoo account=foo
>
> but it doesn't appear to have any effect.

I've tried restarting

[slurm-dev] Re: slurm-dev slurm behaves differently than a serial app [was Re: openmpi misbehaves when started under slurm]

2014-03-03 Thread Christopher Samuel
) with: PropagateResourceLimits=NONE That means we can have different, appropriate, limits on both login and compute nodes. cheers, Chris
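
A minimal sketch, not from the thread, for checking which value a given slurm.conf-style file carries for a setting such as PropagateResourceLimits. The helper name `slurm_conf_value` is hypothetical, and it assumes the setting sits on its own `Key=Value` line with no leading whitespace.

```shell
# Print the value of a "Key=Value" setting from a slurm.conf-style file.
# Usage: slurm_conf_value FILE KEY
# Comment lines (starting with "#") never match because the text before
# the first "=" is compared against the key as a whole.
slurm_conf_value() {
    awk -F= -v key="$2" '
        tolower($1) == tolower(key) {
            sub(/^[^=]*=/, "")   # drop "Key=" and keep the rest of the line
            print
            exit
        }' "$1"
}
```

Something like `slurm_conf_value /etc/slurm/slurm.conf PropagateResourceLimits` could then be run on a login node and on a compute node to compare them; the path is an assumption and varies between installations.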

[slurm-dev] Re: Is MaxJobs enforced for associations?

2014-03-03 Thread Christopher Samuel
then patch our local showq to treat those as "blocked" and hide them from other users. cheers, Chris

[slurm-dev] Re: Is MaxJobs enforced for associations?

2014-03-04 Thread Christopher Samuel
On 04/03/14 21:23, Marcin Stolarek wrote:
> bf_max_job_user=#

We already do this, but it doesn't change the state of the skipped jobs to reflect the fact that they're not eligible to start. cheers, Chris

[slurm-dev] Re: slurmd crashed on *some* nodes after "scontrol reconfigure"

2014-03-13 Thread Christopher Samuel
On 11/03/14 06:20, Andy Riebs wrote:
> Has anyone seen this before? slurm.conf is on an NFS server, so
> it's possible we've got a configuration error there.

We've seen this same problem too, lost a heap of jobs to it. :-

[slurm-dev] Re: missing SLURM environment variables with export=NONE

2014-03-13 Thread Christopher Samuel
-SLURM_SUBMIT_HOST SLURM_JOB_NUM_NODES

[slurm-dev] RE: error: We have more allocated time than is possible...

2014-03-16 Thread Christopher Samuel
d our Intel clusters which have all only run 2.6.x. The University has had some occasional network and DNS issues which have meant that our slurmdbd has had transient issues talking to our database server, so I don't know if that could be the trigger for us. All the best, Chris

[slurm-dev] Re: Slurm version 14.03.0 is now available

2014-03-26 Thread Christopher Samuel
o be sure.. :-) Secondly will a 14.03 slurmctld happily talk to (drained) 2.6.x slurmd's running jobs so we can do a rolling upgrade? Or will we need to drain the entire cluster of running jobs first and then upgrade them in a single hit? All the best, Chris

[slurm-dev] Re: Slurm version 14.03.0 is now available

2014-03-30 Thread Christopher Samuel
d we've lost a heap of jobs from this behaviour (especially on BG/Q). A setting that let you change that behaviour to mark nodes as draining rather than down would be very handy. cheers, Chris

[slurm-dev] Re: Slurm version 14.03.0 is now available

2014-03-31 Thread Christopher Samuel
tely down. Open-MPI catches those sorts of things for us, so I think we'd rather just have Slurm not kill jobs unless we (or the user) tell it to. cheers, Chris

[slurm-dev] Re: Slurm version 14.03.0 is now available

2014-03-31 Thread Christopher Samuel
is running on the front end is the srun). cheers, Chris

[slurm-dev] Re: sacct access for end users

2014-03-31 Thread Christopher Samuel
users can only query their own jobs and they can do that from any of our 4 HPC systems. Take a look at: http://slurm.schedmd.com/accounting.html as a starting point. All the best, Chris

[slurm-dev] Re: Slurm version 14.03.0 is now available

2014-03-31 Thread Christopher Samuel
nks Moe!

[slurm-dev] Re: Information about older jobs

2014-04-06 Thread Christopher Samuel
For instance:

[samuel@barcoo ~]$ sacct -j 1093417 -o jobid,nodelist
JobID        NodeList
------------ ---------------
1093417      barcoo010

And showing it's not there by default:

[samuel@barcoo ~]$ sacct -j 1093417 -l | fgrep -i NodeList
[samuel@barcoo ~]$

Hope that helps! Chris

[slurm-dev] Guidance on planning a slurmdbd outage

2014-04-06 Thread Christopher Samuel
it's not a great deal but it'd be good to have an idea how long we could safely run without a slurmdbd to talk to. thanks! Chris

[slurm-dev] Re: Guidance on planning a slurmdbd outage

2014-04-08 Thread Christopher Samuel
> ave enough free memory but I assume
> you are not using 32MB of RAM.

32MB should be enough for anyone.. ;-) All the best! Chris

[slurm-dev] Re: Patch to show backfill scheduled nodes

2014-04-08 Thread Christopher Samuel
> s already
> calculated by the scheduler.

Would it make more sense to display that with "scontrol show job" rather than in squeue? But yes, anything that gives more insight into what Slurm is trying to do for backfill is good! All the best, Chris

[slurm-dev] Anyone using slurmdbd with MySQL 5.6 ?

2014-04-30 Thread Christopher Samuel
lurmdbd against MySQL 5.6? All the best! Chris

[slurm-dev] Re: Anyone using slurmdbd with MySQL 5.6 ?

2014-04-30 Thread Christopher Samuel
or now). Has anyone tested running 2.6.x slurmctld's against a 14.03.x slurmdbd for extended periods of time? cheers! Chris

[slurm-dev] Re: why job status is always CG?

2014-05-11 Thread Christopher Samuel
> ecutes
>
> [root@master ~]# scancel job 14
> scancel: error: Invalid job_id job
>
> says: error: Invalid job_id job
>
> How can I cancel this job?

You can't as it is no longer running, so cancelling it doesn't make sense. Hope that helps, Chris

[slurm-dev] Re: why job status is always CG?

2014-05-12 Thread Christopher Samuel
script went into the background (like a daemon would)? Even then I'm not sure that srun would help you. We have users running batch scripts all the time on x86-64 without using srun, it's never been an issue for us. All the best, Chris

[slurm-dev] Re: PMI2 related error

2014-05-20 Thread Christopher Samuel
# -2 has a definite advantage, so I'm proposing
# that we turn PMI-2 "off" when under Slurm unless the user
# specifically requests we use it.

Not sure if this has been raised on slurm-dev yet. All the best, Chris

[slurm-dev] Re: srun interactive failure after upgrade

2014-05-20 Thread Christopher Samuel
Hopefully we can abstract it out into our "sinteractive" script somehow.. All the best, Chris

[slurm-dev] Re: srun interactive failure after upgrade

2014-05-21 Thread Christopher Samuel
On 22/05/14 04:49, David Bigagli wrote:
> By popular acclaim we keep supporting the old behaviour for
> historical reasons. The fix will be available in 14.03.4. The
> commit is 6aadcf15355dfe.

Thanks David!

[slurm-dev] Re: Reserving a cpu core for each gpu

2014-05-28 Thread Christopher Samuel
# jobs using GPUs (enforce using
# job_submit plugin)

Is that what you were after? cheers, Chris

[slurm-dev] Re: pbsdsh -u equivalent

2014-06-30 Thread Christopher Samuel
According to the manual page the --ntasks-per-node=1 option for srun should do what you want. cheers, Chris
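
A hedged sketch of how that option might be wrapped for use inside a batch script. The function name `once_per_node` is made up for illustration, and whether one task per node is what you actually get depends on the rest of the job's request, so treat it as a starting point rather than a drop-in pbsdsh -u replacement.

```shell
# Run a command with one task per node of the current allocation.
# Assumes it is called from inside a Slurm job where srun is available.
once_per_node() {
    srun --ntasks-per-node=1 "$@"
}
```

Inside a job script this would be called as, for example, `once_per_node hostname`.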

[slurm-dev] Re: pbsdsh -u equivalent

2014-07-02 Thread Christopher Samuel
etting nnodes to 1
Submitted batch job 1856638

A distributed job (MPI for instance) must have at least one task on every node for this to make sense. All the best, Chris

[slurm-dev] Re: Documentation enhancement request- mysql

2014-07-03 Thread Christopher Samuel
t should use Engine=InnoDB when creating MySQL tables, that way you don't need to remember. All the best, Chris

[slurm-dev] Re: Documentation enhancement request- mysql

2014-07-03 Thread Christopher Samuel
# r if the desired engine is
# unavailable, enable the NO_ENGINE_SUBSTITUTION SQL mode. If
# the desired engine is unavailable, this setting produces an
# error instead of a warning, and the table is not created or
# altered. See Section 5.1.7, “Server SQL Modes”.

cheers! Chris
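
As a rough illustration (not something posted in the thread), a shell helper can test whether a comma-separated sql_mode string includes NO_ENGINE_SUBSTITUTION. The name `has_sql_mode` is hypothetical.

```shell
# Succeed if the comma-separated mode list in $1 contains the mode named in $2.
# Wrapping both in commas avoids matching a mode that is only a substring
# of another mode's name.
has_sql_mode() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

Assuming the mysql client is installed and credentials are already configured, the live value could be fed in with `has_sql_mode "$(mysql -N -e 'SELECT @@GLOBAL.sql_mode')" NO_ENGINE_SUBSTITUTION`.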

[slurm-dev] Re: BLCR does not checkpoint jobs

2014-07-06 Thread Christopher Samuel
this to work. Caveat: We've never used this, so YMMV. All the best, Chris

[slurm-dev] Re: Documentation enhancement request- mysql

2014-07-08 Thread Christopher Samuel
> isting and
> halt if it isn't available.

Wonderful, thanks so much!

[slurm-dev] Re: heterogeneus number of processors per node, slurm wont use all processors

2014-07-22 Thread Christopher Samuel
es --ntasks=32 (and no node specification) not work for that? cheers, Chris

[slurm-dev] Re: heterogeneus number of processors per node, slurm wont use all processors

2014-07-27 Thread Christopher Samuel
counts so I'll try and find a time to create a test partition there to check we see the same. We're (currently) on 2.6.5, what version are you on? cheers, Chris

[slurm-dev] Re: How to change how frequently SLURM updates the output file (stdout)?

2014-08-07 Thread Christopher Samuel
it difficult for me to check on the status of my jobs. I have a suspicion that's related to your filesystem, not Slurm. Certainly we don't see any such issue using Slurm 2.6.x with GPFS and Panasas filesystems. All the best, Chris

[slurm-dev] Re: Intel MPI Performance inconsistency (and workaround)

2014-08-21 Thread Christopher Samuel
On 22/08/14 04:43, Jesse Stroik wrote:
> We recently noticed sporadic performance inconsistencies on one of our
> clusters.

What distro is this? Are you using cgroups? cheers, Chris

[slurm-dev] Re: starting slurmd only after GPUs are fully initialized

2014-08-31 Thread Christopher Samuel
so if one does go bad we want to go and find out why before we let it back into the cluster. All the best, Chris

[slurm-dev] Re: Unsubscribe

2014-09-03 Thread Christopher Samuel
slurmdev Unfortunately it's not clear what you do from there (there is no unsubscribe link), but if you click "Profile Home" from the top of that page it takes you to: http://lists.schedmd.com/cgi-bin/dada/mail.cgi/profile/ and you'll see "Unsubscribe from this list" links the

[slurm-dev] Re: "Requested node configuration is not available" when using -c

2014-09-08 Thread Christopher Samuel
icate 2 cores to a single job if I can dedicate each core individually to a job. What does "scontrol show node" say? cheers, Chris

[slurm-dev] Re: "Requested node configuration is not available" when using -c

2014-09-10 Thread Christopher Samuel
o build SMP systems out of individual nodes but they are pretty rare. One company that does this is ScaleMP. All the best, Chris

[slurm-dev] Re: "Requested node configuration is not available" when using -c

2014-09-10 Thread Christopher Samuel
On 11/09/14 10:56, Christopher Samuel wrote:
> This is an operating system kernel issue, not a queuing system issue, so
> Slurm, LSF or Torque will all have the same issue.

To clarify, the Linux kernel can support these sorts of systems, but you'll either need extra supporting

[slurm-dev] Re: undelivered output

2014-09-10 Thread Christopher Samuel
d then copied back at the end of the job. Slurm doesn't do that, it defaults to writing the output (and errors) to a file in the directory a job is running in, so there's no need to copy it back at the end of the job. Hope that helps, Chris

[slurm-dev] Re: Override memory limits with --exclusive?

2014-09-18 Thread Christopher Samuel
ensure it was triggered. Hope this helps! Chris

[slurm-dev] Re: overcounting of SysV shared memory segments?

2014-09-19 Thread Christopher Samuel
hows a difference in recorded memory usage between the two. Might take a while to have something to report though. :-) All the best, Chris

[slurm-dev] Using control groups to restrict resource usage on BG/Q launch node?

2014-09-22 Thread Christopher Samuel
ndle this gracefully. cheers! Chris

[slurm-dev] Re: shellshock patch uses a different function export, caused some errors on our Slurm cluster

2014-09-28 Thread Christopher Samuel
can't submit jobs to it by accident.
7) Let users back onto the login nodes.
8) Set partitions back to "up" to start jobs going again.

Hope this helps folks.. cheers! Chris

[slurm-dev] Restrict access to a frontend to a partition in 2.6 ?

2014-09-28 Thread Christopher Samuel
nd groups, neither of which help in this scenario), but just wondering if I've missed anything obvious. cheers, Chris

[slurm-dev] Re: Restrict access to a frontend to a partition in 2.6 ?

2014-09-29 Thread Christopher Samuel
On 29/09/14 21:33, je...@schedmd.com wrote:
> Slurm does not support this today.

Thanks Moe, we'll see if we can figure another way around it. cheers! Chris

[slurm-dev] Re: Slurm User Group Meeting 2014: Presentations now online

2014-09-29 Thread Christopher Samuel
On 30/09/14 02:39, je...@schedmd.com wrote:
> About 70 people attended the Slurm User Group Meeting last week in
> Lugano Switzerland.

Thanks so much to you and everyone who organised the meeting and to everyone who came, it was well worth attending. All the best, Chris

[slurm-dev] Re: shellshock patch uses a different function export, caused some errors on our Slurm cluster

2014-10-01 Thread Christopher Samuel
On 29/09/14 15:28, Christopher Samuel wrote:
> B) If you update a compute node when there are jobs queued under the
> previous bash then they will fail when they run there (also cannot find
> modules, even though a prologue of ours sets BASH_ENV to force the env
> vars to get

[slurm-dev] Re: Build error on CentOS 5.10

2014-10-01 Thread Christopher Samuel
ll) What do the following commands say?

file /usr/bin/perl
ls -l /usr/bin/perl
getfacl /usr/bin/perl

Do you have SE Linux enabled? cheers, Chris

[slurm-dev] Re: Build error on CentOS 5.10

2014-10-02 Thread Christopher Samuel
On 03/10/14 03:05, Michael Jennings wrote:
> Is your /var/tmp mounted "noexec" by any chance?

I wondered that but I don't think that'll give the errors that Dennis is seeing, he's seeing -EACCES for /usr/bin/perl, not for something in /var/tmp. :-(

[slurm-dev] Re: Build error on CentOS 5.10

2014-10-02 Thread Christopher Samuel
e say? cheers! Chris

[slurm-dev] Re: Prune database before migration to 14.11 ?

2014-10-13 Thread Christopher Samuel
ion row table in another database took just a few seconds. All the best, Chris

[slurm-dev] Re: Running batch jobs to handle change management via Puppet

2014-10-15 Thread Christopher Samuel
th 'nohup'. cheers, Chris

[slurm-dev] Re: Prune database before migration to 14.11 ?

2014-10-15 Thread Christopher Samuel
r (pretty beefy) DB servers.

> The "foreign constraints" came from another database a colleague of mine
> has installed, we could fix this :-)

:-) cheers, Chris

[slurm-dev] Re: Associations and DefaultQOS

2014-10-16 Thread Christopher Samuel
> . and that new user submits a job, the
> default qos I've configured is not respected.

Hmm, could you try and mark a partition as UP with scontrol and see if that helps? It's something we do here on Slurm 2.6 and (I believe) resolves this for us. All the best, Chris

[slurm-dev] Re: Prune database before migration to 14.11 ?

2014-10-21 Thread Christopher Samuel
On 16/10/14 16:02, Christopher Samuel wrote:
> No worries, we're going to test out ours in a sandbox as well, so we'll
> be able to compare it to our (pretty beefy) DB servers.

It took around 2 minutes to add all the indexes in our sandbox, that's with a total of about 6 milli

[slurm-dev] Re: including config files

2014-10-23 Thread Christopher Samuel
ch that each cluster includes local.conf but have local.conf as a symlink to $cluster.conf (and then exclude local.conf from git/rsync or however else that is managed/distributed, if it is automated). We do this trick already with other tools here at VLSCI. cheers, Chris
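
The symlink trick described above could be sketched roughly as follows. The function name `link_local_conf` and its arguments are illustrative assumptions, not something from the thread; the directory would be wherever the shared config files live on a given cluster.

```shell
# Point local.conf at this cluster's own file (for example barcoo.conf).
# Usage: link_local_conf CONF_DIR CLUSTER_NAME
# The link target is relative, so the directory can be copied elsewhere
# without the link breaking.
link_local_conf() {
    ( cd "$1" && ln -sf "$2.conf" local.conf )
}
```

Each cluster would run this once with its own name, and local.conf itself stays out of git/rsync as the message suggests.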

[slurm-dev] Re: SLURM and SSSD group enumeration

2014-10-29 Thread Christopher Samuel
can't test this.

commit 18d809fabfdf654facc480d114f6f3694b6bdd84
Author: Morris Jette
Date: Wed Aug 21 14:46:52 2013 -0700

    Allocate group member buffer size as needed

    Replace fixed size buffer with a buffer that can grow as needed.

[slurm-dev] Re: SLURM and SSSD group enumeration

2014-10-29 Thread Christopher Samuel
sion of Slurm are you using? cheers! Chris

[slurm-dev] Re: Failure tolerance in "slurm + openmpi"

2014-10-29 Thread Christopher Samuel
en-MPI 1.8.x as that can use PMI2 with Slurm and fixes this scaling issue.

[slurm-dev] Re: Failure tolerance in "slurm + openmpi"

2014-10-29 Thread Christopher Samuel
en Slurm decides to clean things up for you at that point.

[slurm-dev] slurmstepd: _slurm_cgroup_destroy: problem deleting step cgroup path

2014-11-06 Thread Christopher Samuel
the list (after SC14) is to migrate to Open-MPI 1.8.4 which is due out shortly which should address this. cheers, Chris

[slurm-dev] Re: slurmstepd: _slurm_cgroup_destroy: problem deleting step cgroup path

2014-11-06 Thread Christopher Samuel
> no longer has that in the stdout or stderr files as a
> result of upgrading to 14.03.10.

Yup, this is with 14.03.10. I've only managed to provoke it with NAMD so far, but I guess we'll hear from users if they see it with other codes too. :-) All the best, Chris

[slurm-dev] Re: slurmstepd: _slurm_cgroup_destroy: problem deleting step cgroup path

2014-11-06 Thread Christopher Samuel
id_500/job_2497190/ cgroup.event_control cgroup.procs freezer.state notify_on_release tasks

> Perhaps we can put it at debug level as before as it may concern users.

If it is just cosmetic it'd be good I think. All the best, Chris

[slurm-dev] Re: Is /tmp guaranteed to be writable?

2014-11-11 Thread Christopher Samuel
teable then it's a bug, not a feature. All the best, Chris

[slurm-dev] RE: Is /tmp guaranteed to be writable?

2014-11-11 Thread Christopher Samuel
des? Speaking as a sysadmin, any application should honour and use $TMPDIR if defined. If that isn't set then all you can really rely on is /tmp and /var/tmp. All the best, Chris
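
In shell terms, the advice above comes down to something like this minimal sketch. The function name `make_scratch_dir` and the `scratch.` prefix are arbitrary choices for illustration.

```shell
# Create a private scratch directory under $TMPDIR, falling back to /tmp
# when TMPDIR is unset or empty. Prints the new directory's path.
make_scratch_dir() {
    mktemp -d "${TMPDIR:-/tmp}/scratch.XXXXXX"
}
```

A job script would capture the result with `workdir=$(make_scratch_dir)` and remove it again when the job finishes.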

[slurm-dev] The (simple) VLSCI PBS to Slurm script converter

2014-11-20 Thread Christopher Samuel
I hope this is useful to people, have at it please! All the best, Chris

[slurm-dev] Re: Debugging backfill in production - how?

2014-11-24 Thread Christopher Samuel
None".

SchedulerParameters=bf_window=43200,bf_resolution=600,bf_max_job_user=5,max_job_bf=1,bf_continue,defer

Everything seems to perform well with those settings, slurmctld is at around 8GB virtual and only ~35MB RSS for instance. Best of luck! Chris
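
A long SchedulerParameters value like the one above is easier to review with one option per line. This is a small sketch with a made-up helper name, `split_sched_params`. As far as I understand the slurm.conf documentation, bf_window is given in minutes (so 43200 would be 30 days) and bf_resolution in seconds, but check the man page for your version.

```shell
# Print each comma-separated SchedulerParameters option on its own line.
split_sched_params() {
    printf '%s\n' "$1" | tr ',' '\n'
}

# The value quoted in the message above.
params='bf_window=43200,bf_resolution=600,bf_max_job_user=5,max_job_bf=1,bf_continue,defer'
split_sched_params "$params"
```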

[slurm-dev] Re: CR_Core_Memory

2014-11-26 Thread Christopher Samuel
accounting as it was experimental and could give incorrect info, but it seems fine in 14.03.x. Hope this helps! All the best, Chris

[slurm-dev] Re: slurmdbd - error: as_mysql_step_complete: Not inputing this job, it has no submit time.

2014-11-27 Thread Christopher Samuel
(upgraded from 2.6.x recently). :-( Best of luck! Chris

[slurm-dev] Consistent memory units for sacct ?

2014-11-27 Thread Christopher Samuel
d whilst I could munge things based on the units output it'd be nice if I could tell it to just report in KB for everything. This is 14.03.10 BTW. Any ideas anyone? cheers! Chris

[slurm-dev] Re: Job Resource Report

2014-12-01 Thread Christopher Samuel
oo0010 0.00M

[slurm-dev] Re: Slurm RPMS

2014-12-03 Thread Christopher Samuel
pendent. All the best, Chris

[slurm-dev] Overall memory usage for a job

2014-12-21 Thread Christopher Samuel
up
* If there's more then pick the high water mark for all the steps and add that to the batch job.

Does that sound about right? It doesn't need to be byte-accurate, this is more to give users a good indication of how much memory their jobs are using. All the best, Chris
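
The visible part of that rule (batch step plus the highest step) might be sketched like this. The input format is an assumption made up for the example: one `STEP_NAME MAXRSS_KB` pair per line with the batch step literally named `batch`. Real sacct output would need converting to that shape first, and the earlier bullets of the message are cut off in this archive, so they are not covered.

```shell
# Estimate a job's memory use as: MaxRSS of the batch step plus the
# highest MaxRSS among all other steps. Values are in KB.
# Reads "STEP_NAME MAXRSS_KB" lines on stdin (hypothetical format).
estimate_job_mem_kb() {
    awk '
        $1 == "batch"   { batch = $2 + 0; next }   # remember the batch step
        $2 + 0 > peak   { peak = $2 + 0 }          # track the high water mark
        END             { print batch + peak }
    '
}
```

For example, `printf 'batch 1000\n0 5000\n1 7000\n' | estimate_job_mem_kb` adds the batch step to the larger of the two other steps.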

[slurm-dev] Re: rounding in memory scheduling - or overcommit

2014-12-21 Thread Christopher Samuel
nodes in preference to higher ones unless the lower memory nodes are taken. All the best, Chris

[slurm-dev] Re: CR_Core_Memory

2014-12-22 Thread Christopher Samuel
tc/cgroup_agents"
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes

Hope this helps! All the best, Chris

[slurm-dev] Re: Running java application with HugePages on Slurm

2014-12-22 Thread Christopher Samuel
appserver/java-hotspottm-64-bit-server-vm-warning-failed-to-reserve-shared-memory-errno-12 This can also affect Open-MPI over IB too: http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages-user Best of luck, Chris

[slurm-dev] Re: slurmstepd: mpi/pmi2: invalid kvs seq from srun

2015-01-05 Thread Christopher Samuel
. All the best & happy new year! Chris

[slurm-dev] LLVM clang warning shows bug in slurm-15.08.0-0pre1 & master

2015-01-05 Thread Christopher Samuel
add_user_space_avail += bb_ptr->size;
}
bb_ptr = bb_ptr->next;

Hope this helps, Chris

[slurm-dev] Re: Failed to initialize accounting storage plugin

2015-01-07 Thread Christopher Samuel
the rename on systems being upgraded). You can see more information here: https://tracker.debian.org/pkg/slurm-llnl If you're on Wheezy that will give you Slurm 2.3.4 (Jessie will have Slurm 14.03.9). Hope this helps! Chris

[slurm-dev] Re: squeue SEGV error in 14.11.2

2015-01-07 Thread Christopher Samuel
Date: Mon Dec 15 15:24:39 2014 -0800

    Revert "Commit 38068d21 expanded the reason for unavailable jobs but"
    as it may cause core dumo in squeue.

    This reverts commit 322c783cc437800052827d524e653313d2bed9b6.

[slurm-dev] RE: license scheduling

2015-01-11 Thread Christopher Samuel
On 09/01/15 16:31, gareth.willi...@csiro.au wrote:
> I'm still stewing on this. Does anyone have sacctmgr/remote license
> setup working?

We've never attempted it here sorry!

[slurm-dev] Re: FlexLM integration - roughly how much work?

2015-01-15 Thread Christopher Samuel
go around it would work against their customers' interest. Yours, Cynical of Melbourne..

[slurm-dev] Re: Lock ups with NFSv4 [was: Connection Refused with job cancel]

2015-01-18 Thread Christopher Samuel
On 17/01/15 21:05, Uwe Sauter wrote:
> Trey: My Slurm installation completely lies on NFSv4.

I've seen weird problems with NFSv4 in RHEL6 in the past, so we just use NFSv3 now. Does the problem go away if you drop back to the 6.5 kernel on the compute nodes? cheers, Chris

[slurm-dev] Re: Lock ups with NFSv4 [was: Connection Refused with job cancel]

2015-01-19 Thread Christopher Samuel
rnel panics when we boot our 4 racks at once). Could I suggest perhaps trying the Beowulf list for this? It might be a better forum for general Linux distro and kernel problems in HPC: http://beowulf.org/ Caveat: I run the Beowulf list these days. cheers, Chris

[slurm-dev] Re: SLURM with VASP

2015-01-28 Thread Christopher Samuel
batch script just before calling VASP to capture what is actually getting set, just in case there's something odd going on.. We had people using VASP last year with Slurm and OpenMPI and they didn't seem to have any issues. Best of luck! Chris

[slurm-dev] Re: SLURM with VASP

2015-01-29 Thread Christopher Samuel
rom inside the VASP batch job (just before it starts) so we can see what the limits are please? thanks, Chris
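
The request in the two "SLURM with VASP" posts above, capturing the limits from inside the batch job just before the application starts, could look something like the sketch below. The posts are truncated and do not show the exact command, so the use of `ulimit -a` is an assumption, and the job name and the `report_limits` helper are hypothetical names chosen only for illustration.

```shell
#!/bin/bash
#SBATCH --job-name=limits-check
#SBATCH --ntasks=1

# Hypothetical helper: print the resource limits this shell actually
# inherited, tagged with the node name so the output is easy to find
# in the job's output file.
report_limits() {
    echo "== limits on $(hostname) =="
    ulimit -a
}

# Call it immediately before the application launch so the output
# reflects what the application itself will see.
report_limits

# The real application launch would follow here (not shown).
```

Comparing this output with `ulimit -a` on a login node shows whether the limits differ between the two environments.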

[slurm-dev] Re: Paramater Analagous to MAXLOAD on Torque/Maui?

2015-01-29 Thread Christopher Samuel
t off and consider using cgroups to contain jobs. Best of luck! Chris

[slurm-dev] Re: unexpected reject after create reservation .. account=

2015-02-02 Thread Christopher Samuel
lem. Could you check to see if it's the same for you? All the best, Chris

[slurm-dev] Re: sreport -- customize scripting -- percentages

2015-02-17 Thread Christopher Samuel
rt? In my testing here (14.03.11) that worked for all those examples, *except* for the last one. All the best, Chris

[slurm-dev] Re: Debugging lingering srun job allocations

2015-02-18 Thread Christopher Samuel
oms to what you are seeing. It also links to the Slurm docs on high throughput computing which may well be useful for your configuration, you can find them here: http://www.schedmd.com/slurmdocs/high_throughput.html Other than that I've no ideas sorry! Best of luck, Chris

[slurm-dev] Re: slurm on NFS for a cluster?

2015-03-24 Thread Christopher Samuel
h isn't going to happen if it's also ro? cheers! Chris

[slurm-dev] Re: slurm on NFS for a cluster?

2015-03-24 Thread Christopher Samuel
que on everything from RH7.3 (yes, pre RHEL), SLES9, SLES10, RHEL 3, 4 & 5 (we moved to Slurm when we went to RHEL6). All the best, Chris

[slurm-dev] Re: slurm on NFS for a cluster - Part II

2015-03-25 Thread Christopher Samuel
thing we just install as an RPM (from EPEL) and our xCAT takes care of ensuring the keys are correct. All the best, Chris

[slurm-dev] Re: slurm on NFS for a cluster - Part II

2015-03-25 Thread Christopher Samuel
On 26/03/15 14:11, Fred Liu wrote: > Just curious, What is xCAT? It's a cluster management suite, we've used it on SGI as well as IBM gear: http://xcat.sourceforge.net/ All the best, Chris

[slurm-dev] Re: Problems running job

2015-03-30 Thread Christopher Samuel
' state waiting on some form of device I/O. I know some people have reported strange interactions between Slurm being on an NFSv4 mount (NFSv3 is fine). Good luck! Chris

[slurm-dev] Re: Question about prologging

2015-04-14 Thread Christopher Samuel
at created commands that got run with srun in parallel on our BG/Q system). cheers, Chris

[slurm-dev] Re: Question about prologging

2015-04-15 Thread Christopher Samuel
t Tools) at SC14. :-) http://www.ugent.be/hpc/hust14.html I really liked the look of XALT, but the reliance on people running applications with srun pretty much nixed it here. :-( All the best! Chris
