[slurm-dev] Re: jobfilter plugin

2014-09-24 Thread Eva Hocks
JobSubmitPlugins=job_submit/defaults On Wed, 24 Sep 2014, David Bigagli wrote: What is in your slurm.conf: JobSubmitPlugins=? On Wed 24 Sep 2014 01:50:25 PM PDT, Eva Hocks wrote: Thanks for the help. Unfortunately I get the same error when using [2014-09-24T13:47:52.151

[slurm-dev] Re: jobfilter plugin

2014-09-24 Thread Eva Hocks
] error: cannot create job_submit context for job_submit/defaults [2014-09-24T15:14:49.709] fatal: failed to initialize job_submit plugin On Wed, 24 Sep 2014, Eva Hocks wrote: JobSubmitPlugins=job_submit/defaults On Wed, 24 Sep 2014, David Bigagli wrote: What is in your slurm.conf
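
A sketch of the slurm.conf form the man page of this era documents, assuming the stock job_submit plugins are installed under PluginDir: the value is the plugin name alone, without the job_submit/ prefix (the error text suggests the prefix was included in the config).

    # slurm.conf
    JobSubmitPlugins=defaults
    # for a lua filter instead (job_submit.lua goes next to slurm.conf):
    # JobSubmitPlugins=lua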

[slurm-dev] job pending, not starting

2014-09-23 Thread Eva Hocks
How can I get a job started after it was pending with JobState=PENDING Reason=AssociationJobLimit? I removed the qos job limit with no success; I removed the user from the qos with no success; I tried scontrol StartTime=now with no success. So how can I get the job running? What is the
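
The usual diagnostic sequence for AssociationJobLimit, sketched here with placeholder names: the limit can sit on the association rather than on the QOS, so both need checking before releasing the job.

    # limits actually in effect for the user's association and QOS
    sacctmgr show assoc where user=<user> format=Cluster,Account,User,MaxJobs,GrpJobs
    sacctmgr show qos format=Name,MaxJobs,GrpJobs
    # after clearing the right limit, nudge the job
    scontrol release <jobid>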

[slurm-dev] Requested node configuration is not available

2014-09-23 Thread Eva Hocks
slurm Version=14.03: I am trying to run a simple job with #SBATCH --nodes=1-1 #SBATCH --ntasks=2 #SBATCH --cpus-per-task=1 on a test cluster with 2 nodes both configured: CPUAlloc=0 CPUErr=0 CPUTot=8 but whenever I try sbatch it refuses: Requested node configuration is not available. The

[slurm-dev] Re: Requested node configuration is not available

2014-09-23 Thread Eva Hocks
If I use #SBATCH -N 1-1 #SBATCH -c 2 the job starts and uses NumNodes=1 NumCPUs=8 CPUs/Task=2 Eva On Tue, 23 Sep 2014, Eva Hocks wrote: slurm Version=14.03: I am trying to run a simple job with #SBATCH --nodes=1-1 #SBATCH --ntasks=2 #SBATCH --cpus-per-task=1 on a test

[slurm-dev] Re: Requested node configuration is not available

2014-09-23 Thread Eva Hocks
? Doesn't look like it. Thanks Eva On Tue, 23 Sep 2014, Eva Hocks wrote: If I use #SBATCH -N 1-1 #SBATCH -c 2 the job starts and uses NumNodes=1 NumCPUs=8 CPUs/Task=2 Eva On Tue, 23 Sep 2014, Eva Hocks wrote: slurm Version=14.03: I am trying to run a simple job

[slurm-dev] Re: Requested node configuration is not available

2014-09-23 Thread Eva Hocks
- Original Message - From: Eva Hocks ho...@sdsc.edu To: slurm-dev slurm-dev@schedmd.com Sent: Tuesday, September 23, 2014 5:59:28 PM Subject: [slurm-dev] Re: Requested node configuration is not available I found #SBATCH --ntasks-per-node=2 and now the job runs
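
The directive combination that resolved this thread, written out as a complete header for reference:

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=2
    #SBATCH --cpus-per-task=1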

[slurm-dev] sacctmgr

2014-09-16 Thread Eva Hocks
trying to set limits for qos with sacctmgr. Unfortunately the syntax mentioned on http://slurm.schedmd.com/sacctmgr.html doesn't seem to work: modify qos name=expedite set MaxNodesPerJob=5 MaxJobs=4 MaxProcSecondsPerJob=20 FairShare=999 MaxWallDurationPerJob=40 Unknown option:
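
A hedged sketch of the same command with option names the sacctmgr man page of this era accepts; MaxProcSecondsPerJob is not among them, and FairShare belongs to associations rather than to a QOS, so it is set separately:

    sacctmgr modify qos name=expedite set MaxJobs=4 MaxWall=40:00
    sacctmgr modify user name=<user> set FairShare=999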

[slurm-dev] jobfilter plugin

2014-09-16 Thread Eva Hocks
trying to use the jobfilter plugin but slurmctld does not start at all, logging the following error: [2014-09-16T15:23:44.303] slurmctld version 14.03.6 started on cluster hpcdev-005 [2014-09-16T15:23:44.304] error: Couldn't find the specified plugin name for job_submit/default looking at all
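
A quick hedged check that the plugin named in the error actually exists in the plugin directory (the path is the common default seen elsewhere in this archive, not confirmed for this site):

    ls /usr/lib64/slurm/job_submit_*.so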

[slurm-dev] Re: slurm salloc

2014-09-12 Thread Eva Hocks
SLURM_SUBMIT_DIR=/nfs/admins/adm17 SLURM_JOB_NODELIST=n523601 SLURM_JOB_CPUS_PER_NODE=40 SLURM_SUBMIT_HOST=frontend SLURM_JOB_PARTITION=foo SLURM_JOB_NUM_NODES=1 Regards, Uwe On 12.09.2014 at 00:45, Eva Hocks wrote: I am trying to configure the latest slurm 14.03

[slurm-dev] slurm salloc

2014-09-11 Thread Eva Hocks
I am trying to configure the latest slurm 14.03 and am running into a problem trying to prevent slurm from running jobs on the control node. sinfo shows 3 nodes configured in the slurm.conf: active up 2:00:00 1 down* hpc-0-5 active up 2:00:00 1 mix hpc-0-4 active up
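
The standard way to keep jobs off the control host is simply not to list it as a compute node; a sketch using this site's node names from the sinfo output above (the CPUs value is a placeholder):

    # slurm.conf - only real compute nodes appear in NodeName/Nodes lines
    NodeName=hpc-0-[4-5] CPUs=... State=UNKNOWN
    PartitionName=active Nodes=hpc-0-[4-5] Default=YES MaxTime=2:00:00 State=UP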

[slurm-dev] Re: How to spread jobs among nodes?

2014-05-12 Thread Eva Hocks
in slurm 14.03.1-2 scontrol: error: Bad SelectTypeParameter: LLN while CR_LLN works. Is that compatible with the priority setting in moab/maui: PRIORITYF='-LOAD + .01*AMEM - 10*JOBCOUNT' Thanks Eva On Thu, 8 May 2014, Atom Powers wrote: Thank you Morris, LLN looks like it may be what
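
As the error indicates, the parameter keeps the CR_ prefix in slurm.conf; a sketch of the global spelling, combined with the consumable-resource base parameter:

    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory,CR_LLN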

[slurm-dev] squeue: error: undefined symbol: slurm_auth_get_arg_desc

2014-04-30 Thread Eva Hocks
when running slurm commands as a user (root works just fine) the commands show: squeue: symbol lookup error: /usr/lib64/slurm/auth_munge.so: undefined symbol: slurm_auth_get_arg_desc scontrol: symbol lookup error: /usr/lib64/slurm/auth_munge.so: undefined symbol: slurm_auth_get_arg_desc
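
This symbol error usually means the commands and the plugins come from different slurm builds; two hedged checks for a version mismatch:

    rpm -qa | grep -i slurm            # look for mixed package versions
    ldd /usr/bin/squeue | grep slurm   # which libslurm the binary actually loads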

[slurm-dev] slurm allocation manager

2014-04-28 Thread Eva Hocks
Does slurm have a plugin for an allocation manager, i.e. one that manages resource allocations and grants a job the right to use a particular amount of resources? Thanks Eva

[slurm-dev] Re: select/cons_res CR_CORE, CR_CPU: what's the difference?

2013-09-05 Thread Eva Hocks
Does that work with SelectTypeParameters=CR_Core_Memory as well? I need to use CR_Core_Memory since I have 1 partition where I do not want to monitor memory and only CR_Core is allowed for partition configuration. Thanks Eva On Thu, 5 Sep 2013, Moe Jette wrote: Slurm is allocating one
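
A sketch of that layout, assuming the per-partition override documented for cons_res (partitions may override the global setting only with core- or socket-based values, which matches the constraint described above):

    SelectTypeParameters=CR_Core_Memory
    PartitionName=nomem Nodes=... SelectTypeParameters=CR_Core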

[slurm-dev] backfill - node allocation

2013-09-05 Thread Eva Hocks
I have configured the SchedulerType=sched/backfill with 30 nodes in the partition gpu-1-4 1 batch* mixed 32 2:8:2 258433 0 2051220 gtx680,mat none gpu-1-[5-7] 3 batch* idle 32 2:8:2 258433+ 0 2051220 gtx680,mat none

[slurm-dev] slurm preemption: Unable to allocate resources

2013-09-03 Thread Eva Hocks
with slurm 2.6.0 I configured preemption: PreemptMode=SUSPEND,GANG PreemptType=preempt/partition_prio and 3 partitions with PartitionName=batch PreemptMode=ON Priority=1 PartitionName=active PreemptMode=OFF Priority=1000 PartitionName=gpu PreemptMode=ON Priority=100 Nevertheless when
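
The same configuration laid out one line per entry, with the extra requirement (hedged, from the preemption guide of this era) that suspend/gang preemption needs the consumable-resource selector:

    SelectType=select/cons_res
    PreemptType=preempt/partition_prio
    PreemptMode=SUSPEND,GANG
    PartitionName=batch  PreemptMode=ON  Priority=1
    PartitionName=active PreemptMode=OFF Priority=1000
    PartitionName=gpu    PreemptMode=ON  Priority=100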

[slurm-dev] list of jobs per node

2013-08-28 Thread Eva Hocks
What is the command (other than squeue|grep) to list all jobs running on a specific node? I am looking for an output similar to pbsnodes xxx-14-65 xxx-14-65 state = free np = 16 properties = native,flash ntype = cluster jobs = 0/886627.gordon-fe2.local
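
squeue can filter on a node directly, which is the closest slurm equivalent to the pbsnodes "jobs =" line:

    squeue --nodelist=xxx-14-65
    # or with explicit columns: jobid, user, state, cpus
    squeue -w xxx-14-65 -o "%i %u %t %C"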

[slurm-dev] Re: list of jobs per node

2013-08-28 Thread Eva Hocks
as far as I know I can get partition information from sinfo, but no jobs per node. Any --format option I missed? Thanks Eva On Wed, 28 Aug 2013, Jonathan Mills wrote: Have you looked at the output of 'sinfo'? On 08/28/2013 04:35 PM, Eva Hocks wrote: What is the command (other

[slurm-dev] Re: list of jobs per node

2013-08-28 Thread Eva Hocks
option ? Sincerely, Damien. On 28 August 2013 at 22:54, Eva Hocks wrote: as far as I know I can get partition information from sinfo, but no jobs per node. Any --format option I missed? Thanks Eva On Wed, 28 Aug 2013, Jonathan Mills wrote: Have you looked

[slurm-dev] CR_Core_Memory and CR_Core versus CR_CPU_Memory

2013-08-09 Thread Eva Hocks
I am struggling to set slurm to allocate the correct CPUs for a job requesting 1 task. With hyperthreading enabled and the CR_Core setting slurm allocates 2 CPUs per job requesting 1 CPU: srun -n 1 --pty --ntasks-per-node=1 shows: NodeList=gpu-1-9 NumNodes=2 NumCPUs=2 CPUs/Task=1 The
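
With ThreadsPerCore=2 and CR_Core, a whole core (both hyperthreads, hence NumCPUs=2) is the smallest allocatable unit; two hedged ways around it:

    # count logical CPUs (threads) instead of cores:
    SelectTypeParameters=CR_CPU_Memory
    # or keep CR_Core and bind one thread per core at submit time:
    srun -n 1 --hint=nomultithread --pty bash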

[slurm-dev] Re: CR_Core_Memory and CR_Core versus CR_CPU_Memory

2013-08-09 Thread Eva Hocks
for other jobs. Or is this just an accounting issue? Martin From: Eva Hocks ho...@sdsc.edu To: slurm-dev slurm-dev@schedmd.com, Date: 08/09/2013 11:41 AM Subject:[slurm-dev] CR_Core_Memory and CR_Core versus CR_CPU_Memory I am struggling to set slurm to allocate

[slurm-dev] Re: slurm (ReqNodeNotAvail) for jobs using --ntasks-per-node

2013-08-08 Thread Eva Hocks
the ReqNodeNotAvail resolved? I tried to hold and release the jobs with no success. Any help is much appreciated Thanks Eva On Wed, 7 Aug 2013, Eva Hocks wrote: All jobs on the partition which use --ntasks-per-node in the sbatch script are not scheduled any more. The log shows: [2013-08-07T12:07:32.016

[slurm-dev] Re: cons_res: Can't use Partition SelectType

2013-08-07 Thread Eva Hocks
. You are using CR_CPU_Memory. Best regards, Magnus On 2013-08-05 23:13, Eva Hocks wrote: I am getting spam messages in the logs: [2013-08-05T14:04:32.000] cons_res: Can't use Partition SelectType unless using CR_Socket or CR_Core and CR_ALLOCATE_FULL_SOCKET The slurm.conf

[slurm-dev] cons_res: Can't use Partition SelectType

2013-08-05 Thread Eva Hocks
I am getting spam messages in the logs: [2013-08-05T14:04:32.000] cons_res: Can't use Partition SelectType unless using CR_Socket or CR_Core and CR_ALLOCATE_FULL_SOCKET The slurm.conf settings are: SelectType=select/cons_res SelectTypeParameters=CR_CPU_Memory and I have set one
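
The message is spelling out the requirement: per-partition SelectType overrides are only honored when the global setting is socket- or core-based and includes the full-socket flag. A sketch:

    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory,CR_ALLOCATE_FULL_SOCKET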

[slurm-dev] idle nodes and jobs PD (Resources)

2013-08-05 Thread Eva Hocks
I have 17 idle nodes in one partition: gpu up infinite 3 down* gpu-1-[12-13,15] gpu up infinite 9 mix gpu-1-[4-11,17] gpu up infinite 17 idle gpu-1-[14,16],gpu-2-[4-17],gpu-3-9 but jobs do not get scheduled with (Resources). There should be
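
A hedged first diagnostic, reading the scheduler's own reason for each stuck job:

    squeue -t PD -o "%i %u %t %r"     # %r is the pending reason
    scontrol show job <jobid> | grep -Ei 'reason|reqnodelist|numnodes'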

[slurm-dev] job information

2013-07-29 Thread Eva Hocks
How can I obtain full job information in slurm (equivalent to torque qstat -f)? I am looking for information on Variable_List Output_Path Error_Path Thanks Eva
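
The qstat -f equivalents: scontrol for the live record (versions of this era include the StdOut/StdErr paths and working directory), sacct once the job has finished:

    scontrol show job <jobid>
    sacct -j <jobid> --format=JobID,JobName,State,ExitCode,Elapsed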

[slurm-dev] srun: error: Unable to allocate resources: Access/permission denied

2013-07-18 Thread Eva Hocks
I am trying to use a login node, other than the primary slurm scheduler node, for batch job submission but it does not work. on the master node: fe1 ~]$ sbatch sbatch Submitted batch job 59 on any batch node: gpu-ln1 ~]$ sbatch sbatch sbatch: error: Batch job submission failed:

[slurm-dev] Re: srun: error: Unable to allocate resources: Access/permission denied

2013-07-18 Thread Eva Hocks
in scontrol manual. Sent from my iPhone On 18 Jul 2013, at 22:02, Eva Hocks ho...@sdsc.edu wrote: I am trying to use a login node, other than the primary slurm scheduler node, for batch job submission but it does not work. on the master node: fe1 ~]$ sbatch sbatch Submitted
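
The scontrol manual reference is presumably to the partition's AllocNodes field, which restricts the hosts allocations may be created from; a hedged sketch of checking and widening it, plus the usual munge sanity test:

    scontrol show partition | grep -i AllocNodes
    scontrol update PartitionName=batch AllocNodes=ALL
    munge -n | unmunge     # run on the login node; must decode cleanly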

[slurm-dev] Re: scontrol show nodes

2013-07-17 Thread Eva Hocks
,unk --format=%12n %20f %20H %12u %32E Hope that helps, Matt On 7/17/13 5:14 PM, Eva Hocks wrote: How can I list information on only down nodes which were offlined due to a Reason? I am looking for a command similar to torque's pbsnodes -ln, a short list with not all the node
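
The short-list command, completed as a sketch (format codes per the sinfo man page: %n hostname, %u user who set the reason, %H timestamp, %E reason):

    sinfo -R
    sinfo -t down,drain --format="%12n %12u %20H %E"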

[slurm-dev] Re: scontrol show nodes

2013-07-17 Thread Eva Hocks
-05-06T01:09:31 puck3 HTH Michael On Wed, Jul 17, 2013 at 2:14 PM, Eva Hocks ho...@sdsc.edu wrote: How can I list information on only down nodes which were offlined due to a Reason? I am looking for a command similar to torque's pbsnodes -ln, a short list with not all

[slurm-dev] Re: scontrol show nodes

2013-07-17 Thread Eva Hocks
HTH Michael On Wed, Jul 17, 2013 at 2:14 PM, Eva Hocks ho...@sdsc.edu wrote: How can I list information on only down nodes which were offlined due to a Reason? I am looking for a command similar to torque's pbsnodes -ln

[slurm-dev] Re: Slurm task assignment by actual physical core

2013-07-11 Thread Eva Hocks
On Wed, 10 Jul 2013, Danny Auble wrote: The llnl documentation is very out of date. You should be consulting http://slurm.schedmd.com. I found I still need it for some critical information. Only in the llnl documentation did I find the missing information about what the unit for memory

[slurm-dev] Re: slurm 2.6 partition node specification problem

2013-07-11 Thread Eva Hocks
to reproduce the problem with a quick test. You have config lines similar to these? NodeName=gpu-1-[4-17],gpu-2-[4,6-16],gpu-3-9 ... PartitionName=... Nodes=gpu-1-[4-17],gpu-2-[4,6-16],gpu-3-9 Regards, John On 2013-07-10 19:20, Eva Hocks wrote: Thanks, John

[slurm-dev] slurm 2.6 partition node specification problem

2013-07-10 Thread Eva Hocks
The entry in partiton.conf: PartitionName=CLUSTER Default=yes State=UP nodes=gpu-[1]-[4-17],gpu-[2]-[4,6-16],gpu-[3]-[9] causes slurmctld to crash: [2013-07-10T16:03:22.923] error: find_node_record: lookup failure for gpu-[2]-[4] [2013-07-10T16:03:22.923] error: node_name2bitmap: invalid
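
The spelling that parses, matching the config lines John suggests in the reply above: brackets wrap only the numeric ranges, and single-value segments take no brackets at all.

    PartitionName=CLUSTER Default=yes State=UP Nodes=gpu-1-[4-17],gpu-2-[4,6-16],gpu-3-9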

[slurm-dev] Re: slurm 2.6 partition node specification problem

2013-07-10 Thread Eva Hocks
after the failure) Thanks Eva On Wed, 10 Jul 2013, John Thiltges wrote: On 07/10/2013 06:16 PM, Eva Hocks wrote: The entry in partiton.conf: PartitionName=CLUSTER Default=yes State=UP nodes=gpu-[1]-[4-17],gpu-[2]-[4,6-16],gpu-[3]-[9] causes slurmctl to crash: 2013-07-10T16:03

[slurm-dev] slurm integration with FlexLM license manager

2013-07-01 Thread Eva Hocks
The documentation has announced the integration since 2.4. I am running slurm 2.4.3. Could anyone please point me to where I can find how to configure the FlexLM license manager integration with slurm? Thanks Eva
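
In this era slurm does not query FlexLM live; what it offers is a static license count in slurm.conf that jobs draw down, sketched here with hypothetical license names:

    # slurm.conf
    Licenses=matlab:10,ansys:2
    # a job then reserves one:
    sbatch -L matlab:1 job.sh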

[slurm-dev] Re: jobacct_gather plugins

2013-06-19 Thread Eva Hocks
I second that! Sounds like the correct approach for data intensive computing. Thanks Eva -- University of California, San Diego SDSC, MC 0505 9500 Gilman Drive La Jolla, Ca 92093-0505 Web: http://www.sdsc.edu/~hocks (858) 822-0954 email: ho...@sdsc.edu

[slurm-dev] sacctmgr: error: Problem talking to the database: Connection refused

2013-06-18 Thread Eva Hocks
I am new to slurm and have fought my way to get a server and nodes up and even jobs running. But I do not seem to be able to get any accounting information due to DB problems. I am running the slurm rocks roll slurm-6.1.0-1.x86_64 from sourceforge which is slurm 2.5.0 and the munge daemon is
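
Connection refused points at slurmdbd (or the mysql server behind it) not listening; a hedged checklist of the pieces that have to line up:

    # slurm.conf
    AccountingStorageType=accounting_storage/slurmdbd
    AccountingStorageHost=<dbd host>
    # on the dbd host: slurmdbd must be running, and slurmdbd.conf needs
    #   StorageType=accounting_storage/mysql with a reachable mysqld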

[slurm-dev] Re: sacctmgr: error: Problem talking to the database: Connection refused

2013-06-18 Thread Eva Hocks
starting the MYSQL plugin. I swapped to AccountingStorageType=accounting_storage/filetxt and don't have any problems with that. Thanks Eva On Tue, 18 Jun 2013, John Thiltges wrote: On 06/18/2013 05:53 PM, Eva Hocks wrote: any sacct command returns: slurmctld.log:[2013-06-18T14:45:57-07:00