[slurm-dev] 2.6.0 html documentation says CR_Core_Memory Not yet implemented

2013-08-06 Thread Jeff Tan
Dear SchedMD, Just verifying what might be no more than a missed doc update: is CR_Core_Memory definitely implemented in 2.6.0? The cons_res.html that comes with it says it isn't, but it seems to work when activated. Regards Jeff Dr. Jeff Tan High Performance Computing Specialist IBM

[slurm-dev] Re: reservation/priority problems

2014-01-16 Thread Jeff Tan
tried it just now on an x86 cluster and it also went off straight away. Perhaps some extra debugging might reveal why the 2.6.5 was holding the jobs back? Regards Jeff Jeff Tan High Performance Computing Specialist IBM Research Collaboratory for Life Sciences, Melbourne, Australia Phone: +61 3

[slurm-dev] Fw: experience with temporary absence of slurmdbd on running system

2014-04-07 Thread Jeff Tan
D-oh. Chris and I sent the same query within a couple of minutes of one another. Please ignore this one and respond to Chris Samuel's instead (Guidance on planning a slurmdbd outage). Regards Jeff Tan High Performance Computing Specialist IBM Research Collaboratory for Life Sciences

[slurm-dev] slurmdbd: more time than is possible

2014-09-09 Thread Jeff Tan
are missing. Any suggestions would be appreciated. Regards Jeff -- Jeff Tan High Performance Computing Specialist IBM Research Collaboratory for Life Sciences, Melbourne, Australia

[slurm-dev] Re: slurmdbd: more time than is possible

2014-09-09 Thread Jeff Tan
My mistake! I was looking at the wrong figure/column: I have logs where the reported d_cpu matches the total number of CPU-seconds for an hour during the hourly rollup, but sometimes the number is higher and sometimes lower. No, d_cpu did not quite match the total number of CPU-seconds on

[slurm-dev] Re: slurmdbd errors

2014-09-15 Thread Jeff Tan
in a script or (as I did above) just on the command line? Also, there are two places where cluster names are defined: slurm.conf and via sacctmgr. Perhaps one or the other has the spelling wrong? Regards Jeff Jeff Tan High Performance Computing Specialist IBM Research Collaboratory for Life Sciences
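The two places mentioned above can be compared directly. The following is a minimal sketch, not a definitive check: the helper names `cluster_name_from_conf` and `name_is_registered` are hypothetical, the `/etc/slurm/slurm.conf` path is an assumption about the install location, and the case-insensitive comparison is an assumption chosen to surface spelling differences rather than case differences.

```shell
#!/bin/sh
# Hypothetical helper: print the ClusterName value from a
# slurm.conf-style file whose path is given in $1.
cluster_name_from_conf() {
    awk -F= 'tolower($1) ~ /^[ \t]*clustername[ \t]*$/ {
        gsub(/[ \t]/, "", $2); print $2; exit
    }' "$1"
}

# Hypothetical helper: succeed if the name in $1 matches a whole line
# of the newline-separated list in $2 (case-insensitive by assumption).
name_is_registered() {
    printf '%s\n' "$2" | grep -qixF -- "$1"
}

# Usage on a live system (left commented out; the path is an assumption):
# conf_name=$(cluster_name_from_conf /etc/slurm/slurm.conf)
# db_names=$(sacctmgr -n -P show cluster format=cluster)
# name_is_registered "$conf_name" "$db_names" || echo "mismatch: $conf_name"
```

If the last command prints a mismatch, the name in slurm.conf is not among the clusters known to the accounting database, which is the situation the question above is probing for.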

[slurm-dev] Re: slurmdbd: more time than is possible

2014-09-16 Thread Jeff Tan
received the complaint in the rollup only when the allocation was at 93.75% and higher, and never for hours when allocation was lower. Regards Jeff Jeff Tan High Performance Computing Specialist IBM Research Collaboratory for Life Sciences, Melbourne, Australia Jeff Tan/Australia/IBM wrote on 09/09
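For readers following the arithmetic in this thread, an hourly allocation percentage can be read as allocated CPU-seconds divided by the hour's capacity (number of CPUs times 3600 seconds). The sketch below only illustrates that ratio; the function name `alloc_percent` and the sample figures are hypothetical and are not taken from slurmdbd or from the poster's logs.

```shell
#!/bin/sh
# Hypothetical helper: percentage of one hour's CPU capacity allocated.
# $1 = allocated CPU-seconds in the hour, $2 = number of CPUs
# (both are illustrative inputs, not values from the thread).
alloc_percent() {
    awk -v a="$1" -v c="$2" \
        'BEGIN { printf "%.2f\n", 100 * a / (c * 3600) }'
}

# Example with made-up figures: 54000 CPU-seconds on 16 CPUs
# alloc_percent 54000 16    # prints 93.75
```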

[slurm-dev] Re: slurmdbd errors

2014-09-17 Thread Jeff Tan
Hi Brian Glad to have helped with the errors, but I'm not sure what you mean regarding sshare. What does the output look like when you run the command? Regards Jeff Jeff Tan High Performance Computing Specialist IBM Research Collaboratory for Life Sciences, Melbourne, Australia From

[slurm-dev] Re: jobs canceled by other user than owner

2015-11-15 Thread Jeff Tan
Hi Markus Just to clarify: "When switching to user '71187' and executing a 'scancel' to a job from user '70032' (similar to the last entry of the output above), it is impossible to get a job cancelled." So is user 71187 able to cancel the job submitted by user 70032 with scancel or not?

[slurm-dev] Re: Question about -m cyclic and --exclusive options to slurm

2017-01-03 Thread Jeff Tan
normally wait for all resources to be lined up before it starts the job otherwise. Also, as I understand it, -m (--distribution) does not change Slurm's behavior to line up all the CPUs required in total before starting the job (unless it's a job array). Regards Jeff -- Jeff Tan Infra

[slurm-dev] Re: Question about -m cyclic and --exclusive options to slurm

2017-01-04 Thread Jeff Tan
supports this if you'd like. Regards Jeff -- Jeff Tan Infrastructure Services & Technologies IBM Research - Australia From: "Koziol, Lucas" <lucas.koz...@exxonmobil.com> To: "slurm-dev" <slurm-dev@schedmd.com> Date: 05/01/2017 03:26 Subject:

[slurm-dev] Re: slurmdb error

2017-04-26 Thread Jeff Tan
Hi Mahmood
> [root@cluster ~]# ps aux | grep slurmdb
> root 3406 0.0 0.0 338636 2672 ? Sl 00:26 0:01 /usr/sbin/slurmdbd
> root 17146 0.0 0.0 105308 888 pts/2 S+ 13:26 0:00 grep slurmdb
That's good. What does its /var/log/slurm/slurmdbd.log say? Any errors?
>

[slurm-dev] Re: slurmdb error

2017-04-26 Thread Jeff Tan
Hi Mahmood
> [root@cluster ~]# sacctmgr -i create cluster Rocks-Cluster
> sacctmgr: error: slurmdbd: Sending DbdInit msg: Unable to connect to database
> sacctmgr: error: Problem talking to the database: Unable to connect to database
You need to narrow that down. If you're using sacctmgr, you