JobSubmitPlugins=job_submit/defaults
On Wed, 24 Sep 2014, David Bigagli wrote:
What is in your slurm.conf:
JobSubmitPlugins=?
On Wed 24 Sep 2014 01:50:25 PM PDT, Eva Hocks wrote:
Thanks for the help. Unfortunately I get the same error when using
[2014-09-24T13:47:52.151] error: cannot create job_submit context for job_submit/defaults
[2014-09-24T15:14:49.709] fatal: failed to initialize job_submit plugin
On Wed, 24 Sep 2014, Eva Hocks wrote:
JobSubmitPlugins=job_submit/defaults
On Wed, 24 Sep 2014, David Bigagli wrote:
What is in your slurm.conf
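For what it's worth, the usual cause of both errors above is that the named plugin was never built or installed. A hedged check (the plugin directory is an assumption; match it to PluginDir in your slurm.conf):

```shell
# Sketch, not a verified fix: list the job_submit plugins actually installed.
# /usr/lib64/slurm is an assumed PluginDir; adjust to your slurm.conf.
ls /usr/lib64/slurm/job_submit_*.so

# Then name an installed one in slurm.conf, for example:
#   JobSubmitPlugins=job_submit/lua
```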
How can I get a job started after it was pending with
JobState=PENDING Reason=AssociationJobLimit
I removed the QOS job limit with no success; I removed the user from the
QOS with no success. I tried scontrol StartTime=now with no success.
So how can I get the job running? What is the
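One way to chase a Reason=AssociationJobLimit down (a sketch; field names follow the sacctmgr of that era, and <jobid> is a placeholder):

```shell
# Show the limits attached to the user's association and QOS:
sacctmgr show assoc where user=$USER format=Account,User,QOS,MaxJobs,GrpJobs
sacctmgr show qos format=Name,MaxJobs,GrpJobs

# After raising or clearing the limit, nudge the job:
scontrol release <jobid>
```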
slurm Version=14.03:
I am trying to run a simple job with
#SBATCH --nodes=1-1
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=1
on a test cluster with 2 nodes both configured:
CPUAlloc=0 CPUErr=0 CPUTot=8
but whenever I try sbatch it refuses: Requested node configuration is
not available. The
If I use
#SBATCH -N 1-1
#SBATCH -c 2 the job starts and uses
NumNodes=1 NumCPUs=8 CPUs/Task=2
Eva
On Tue, 23 Sep 2014, Eva Hocks wrote:
slurm Version=14.03:
I am trying to run a simple job with
#SBATCH --nodes=1-1
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=1
on a test
? Doesn't look like it.
Thanks
Eva
On Tue, 23 Sep 2014, Eva Hocks wrote:
If I use
#SBATCH -N 1-1
#SBATCH -c 2 the job starts and uses
NumNodes=1 NumCPUs=8 CPUs/Task=2
Eva
On Tue, 23 Sep 2014, Eva Hocks wrote:
slurm Version=14.03:
I am trying to run a simple job
- Original Message -
From: Eva Hocks ho...@sdsc.edu
To: slurm-dev slurm-dev@schedmd.com
Sent: Tuesday, September 23, 2014 5:59:28 PM
Subject: [slurm-dev] Re: Requested node configuration is not available
I found
#SBATCH --ntasks-per-node=2
and now the job runs
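Put together, a minimal script for the working request (a sketch; the srun step is illustrative):

```shell
#!/bin/bash
#SBATCH --nodes=1-1
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=1
# Two tasks, one CPU each, on a single node:
srun hostname
```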
trying to set limits for qos with sacctmgr. Unfortunately the syntax
mentioned on http://slurm.schedmd.com/sacctmgr.html
doesn't seem to work:
modify qos name=expedite set MaxNodesPerJob=5 MaxJobs=4 MaxProcSecondsPerJob=20
FairShare=999 MaxWallDurationPerJob=40
Unknown option:
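A hedged guess at the failure: sacctmgr rejects the first token it does not recognize, and as far as I can tell FairShare is an association attribute rather than a QOS attribute, while MaxProcSecondsPerJob does not match any option name my sacctmgr accepts. Dropping those, a sketch that should parse:

```shell
sacctmgr modify qos name=expedite set MaxNodesPerJob=5 MaxJobs=4 \
    MaxWallDurationPerJob=40
# Verify what was stored:
sacctmgr show qos expedite format=Name,MaxNodes,MaxJobs,MaxWall
```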
trying to use the job submit filter plugin, but slurmctld does not start
at all, logging the following error:
[2014-09-16T15:23:44.303] slurmctld version 14.03.6 started on cluster
hpcdev-005
[2014-09-16T15:23:44.304] error: Couldn't find the specified plugin name for job_submit/default looking at all files
SLURM_SUBMIT_DIR=/nfs/admins/adm17
SLURM_JOB_NODELIST=n523601
SLURM_JOB_CPUS_PER_NODE=40
SLURM_SUBMIT_HOST=frontend
SLURM_JOB_PARTITION=foo
SLURM_JOB_NUM_NODES=1
Regards,
Uwe
On 12.09.2014 at 00:45, Eva Hocks wrote:
I am trying to configure the latest slurm 14.03
I am trying to configure the latest slurm 14.03 and am running into
problem to prevent slurm from running jobs on the control node.
sinfo shows 3 nodes configure in the slurm.conf:
active    up   2:00:00    1  down*  hpc-0-5
active    up   2:00:00    1  mix    hpc-0-4
active    up
in slurm 14.03.1-2
scontrol: error: Bad SelectTypeParameter: LLN
while CR_LLN works. Is that comparable to the priority setting in
moab/maui: PRIORITYF='-LOAD + .01*AMEM - 10*JOBCOUNT'?
Thanks
Eva
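For reference, the 14.03-era spellings that do parse (a config sketch; the partition-level LLN=YES line is my assumption of what was wanted):

```
SelectType=select/cons_res
SelectTypeParameters=CR_LLN      # accepted; bare "LLN" is rejected
# or least-loaded placement for a single partition:
PartitionName=active Nodes=... LLN=YES
```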
On Thu, 8 May 2014, Atom Powers wrote:
Thank you Morris,
LLN looks like it may be what
when running slurm commands as a user (root works just fine) the
commands show:
squeue: symbol lookup error: /usr/lib64/slurm/auth_munge.so: undefined symbol: slurm_auth_get_arg_desc
scontrol: symbol lookup error: /usr/lib64/slurm/auth_munge.so: undefined symbol: slurm_auth_get_arg_desc
Does slurm have a plugin to use an allocation manager which
manages resource allocations where a resource allocation grants
a job a right to use a particular amount of resources?
Thanks
Eva
Does that work with SelectTypeParameters=CR_Core_Memory as well? I need
to use CR_Core_Memory since I have 1 partition where I do not want to
monitor memory and only CR_Core is allowed for partition configuration.
Thanks
Eva
On Thu, 5 Sep 2013, Moe Jette wrote:
Slurm is allocating one
I have configured the SchedulerType=sched/backfill
with 30 node in the partition
gpu-1-4      1  batch*  mixed  32  2:8:2  258433   0  205122  0  gtx680,mat  none
gpu-1-[5-7]  3  batch*  idle   32  2:8:2  258433+  0  205122  0  gtx680,mat  none
with slurm 2.6.0 I configured preemption:
PreemptMode=SUSPEND,GANG
PreemptType=preempt/partition_prio
and 3 partitions with
PartitionName=batch PreemptMode=ON Priority=1
PartitionName=active PreemptMode=OFF Priority=1000
PartitionName=gpu PreemptMode=ON Priority=100
Nevertheless when
What is the command (other than squeue|grep) to list all jobs running on
a specific node?
I am looking for output similar to
pbsnodes xxx-14-65
xxx-14-65
state = free
np = 16
properties = native,flash
ntype = cluster
jobs = 0/886627.gordon-fe2.local
as far as I know I can get partition information from sinfo, but no jobs
per node. Any --format option I missed?
Thanks
Eva
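squeue can in fact filter by node, which is the closest match to pbsnodes here (a sketch; the node name is taken from the example above):

```shell
squeue --nodelist=xxx-14-65
# or with an explicit format (jobid, user, name, time, nodes):
squeue -w xxx-14-65 -o "%8i %10u %12j %10M %R"
```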
On Wed, 28 Aug 2013, Jonathan Mills wrote:
Have you looked at the output of 'sinfo'?
On 08/28/2013 04:35 PM, Eva Hocks wrote:
What is the command (other
option ?
Sincerly
damien
On 28 Aug 2013, at 22:54, Eva Hocks wrote:
as far as I know I can get partition information from sinfo, but no jobs
per node. Any --format option I missed?
Thanks
Eva
On Wed, 28 Aug 2013, Jonathan Mills wrote:
Have you looked
I am struggling to set slurm to allocate the correct CPUs for a job
requesting 1 task.
With hyperthreading enabled and the CR_Core setting slurm allocates 2
CPUs per job requesting 1 CPU:
srun -n 1 --pty --ntasks-per-node=1
shows:
NodeList=gpu-1-9
NumNodes=2 NumCPUs=2 CPUs/Task=1
The
for other
jobs. Or is this just an accounting issue?
Martin
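A possible reading of the CR_Core behaviour (hedged): with hyperthreading on, the smallest unit CR_Core can allocate is one core, i.e. both of its hardware threads, so a 1-CPU request is rounded up to 2. Two sketches of ways to deal with it:

```shell
# 1) Count hardware threads as CPUs instead of cores (slurm.conf):
#      SelectTypeParameters=CR_CPU_Memory
# 2) Keep CR_Core but bind the task to one thread of its core at launch
#    (the allocation is still a whole core):
srun -n 1 --ntasks-per-node=1 --hint=nomultithread --pty bash
```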
From: Eva Hocks ho...@sdsc.edu
To: slurm-dev slurm-dev@schedmd.com,
Date: 08/09/2013 11:41 AM
Subject:[slurm-dev] CR_Core_Memory and CR_Core versus
CR_CPU_Memory
I am struggling to set slurm to allocate
the ReqNodeNotAvail resolved? I tried to hold and release
the jobs with no success.
Any help is much appreciated
Thanks
Eva
On Wed, 7 Aug 2013, Eva Hocks wrote:
All jobs on the partition which use --ntasks-per-node in the sbatch
script are not scheduled any more. The log shows:
[2013-08-07T12:07:32.016
.
You are using CR_CPU_Memory.
Best regards,
Magnus
On 2013-08-05 23:13, Eva Hocks wrote:
I am getting spam messages in the logs:
[2013-08-05T14:04:32.000] cons_res: Can't use Partition SelectType
unless using CR_Socket or CR_Core and CR_ALLOCATE_FULL_SOCKET
The slurm.conf
I am getting spam messages in the logs:
[2013-08-05T14:04:32.000] cons_res: Can't use Partition SelectType
unless using CR_Socket or CR_Core and CR_ALLOCATE_FULL_SOCKET
The slurm.conf settings are:
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
and I have set one
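For completeness, the combination the log message itself asks for, if per-partition select behaviour is really needed (config sketch, 2.6-era syntax):

```
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory,CR_ALLOCATE_FULL_SOCKET
```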
I have 17 idle node in one partition :
gpu up infinite 3 down* gpu-1-[12-13,15]
gpu up infinite 9 mix gpu-1-[4-11,17]
gpu up infinite 17 idle gpu-1-[14,16],gpu-2-[4-17],gpu-3-9
but jobs do not get scheduled with (Resources). There should be
How can I obtain the full job information in slurm (equivalent to torque's
qstat -f)? I am looking for information on
Variable_List
Output_Path
Error_Path
Thanks
Eva
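The closest equivalents I know of (a sketch; <jobid> is a placeholder):

```shell
scontrol show job <jobid>      # full record, incl. WorkDir and StdOut/StdErr paths
scontrol -d show job <jobid>   # adds CPU/node layout detail
sacct -j <jobid> -l            # accounting view once the job has run
```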
I am trying to use a login node, other than the primary slurm scheduler
node, for batch job submission but it does not work.
on the master node:
fe1 ~]$ sbatch sbatch
Submitted batch job 59
on any batch node:
gpu-ln1 ~]$ sbatch sbatch
sbatch: error: Batch job submission failed:
in scontrol manual.
Sent from my iPhone
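The scontrol and slurm.conf manuals describe an AllocNodes partition parameter that restricts which hosts may submit; a hedged checklist for the login node (paths are the usual defaults, not verified here):

```shell
scontrol show partition | grep -i alloc   # is the login node excluded via AllocNodes?
md5sum /etc/munge/munge.key               # must match the controller's key
# slurm.conf on the login node must also match the controller's copy
```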
On 18 Tem 2013, at 22:02, Eva Hocks ho...@sdsc.edu wrote:
I am trying to use a login node, other than the primary slurm scheduler
node, for batch job submission but it does not work.
on the master node:
fe1 ~]$ sbatch sbatch
Submitted
,unk --format=%12n %20f %20H %12u %32E
Hope that helps,
Matt
On 7/17/13 5:14 PM, Eva Hocks wrote:
How can I list information on only down nodes which were offlined due
to a Reason? I am looking for a command similar to torque's
pbsnodes -ln, a short list with not all the node
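sinfo has a dedicated switch for exactly this (a sketch; %E is the Reason column):

```shell
sinfo -R                                # down/drained nodes, one line per reason
sinfo -N -t down,drain -o "%N %T %E"    # node, state, reason
```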
-05-06T01:09:31 puck3
HTH
Michael
On Wed, Jul 17, 2013 at 2:14 PM, Eva Hocks ho...@sdsc.edu wrote:
How can I list information on only down nodes which were offlined due
to a Reason? I am looking for a command similar to torque's
pbsnodes -ln, a short list with not all
HTH
Michael
On Wed, Jul 17, 2013 at 2:14 PM, Eva Hocks ho...@sdsc.edu
mailto:ho...@sdsc.edu wrote:
How can I list information on only down nodes which were
offlined due
to a Reason? I am looking for a command similar to torque's
pbsnodes -ln
On Wed, 10 Jul 2013, Danny Auble wrote:
The llnl documentation is very out of date. You should be
consulting http://slurm.schedmd.com.
I found I still need it for some critical information.
Only in the LLNL documentation did I find the missing information
about what the unit for memory
to reproduce the problem with a quick test. Do you have
config lines similar to these?
NodeName=gpu-1-[4-17],gpu-2-[4,6-16],gpu-3-9 ...
PartitionName=... Nodes=gpu-1-[4-17],gpu-2-[4,6-16],gpu-3-9
Regards,
John
On 2013-07-10 19:20, Eva Hocks wrote:
Thanks, John
The entry in partition.conf:
PartitionName=CLUSTER Default=yes State=UP
nodes=gpu-[1]-[4-17],gpu-[2]-[4,6-16],gpu-[3]-[9]
causes slurmctld to crash:
[2013-07-10T16:03:22.923] error: find_node_record: lookup failure for gpu-[2]-[4]
[2013-07-10T16:03:22.923] error: node_name2bitmap: invalid
after the failure)
Thanks
Eva
On Wed, 10 Jul 2013, John Thiltges wrote:
On 07/10/2013 06:16 PM, Eva Hocks wrote:
The entry in partition.conf:
PartitionName=CLUSTER Default=yes State=UP
nodes=gpu-[1]-[4-17],gpu-[2]-[4,6-16],gpu-[3]-[9]
causes slurmctld to crash:
[2013-07-10T16:03
The documentation has announced the integration since 2.4. I am running
slurm 2.4.3.
Could anyone please point me to where I can find how to configure the
FlexLM license manager integration with slurm?
Thanks
Eva
I second that! Sounds like the correct approach for data intensive
computing.
Thanks
Eva
--
University of California, San Diego
SDSC, MC 0505
9500 Gilman Drive
La Jolla, CA 92093-0505  Web: http://www.sdsc.edu/~hocks
(858) 822-0954  email: ho...@sdsc.edu
I am new to slurm and have fought my way to get a server and nodes up
and even jobs running. But I do not seem to be able to get any
accounting information due to DB problems.
I am running the slurm rocks roll slurm-6.1.0-1.x86_64 from sourceforge
which is slurm 2.5.0 and the munge daemon is
starting
the MYSQL plugin.
I swapped to AccountingStorageType=accounting_storage/filetxt
and don't have any problems with that.
Thanks
Eva
On Tue, 18 Jun 2013, John Thiltges wrote:
On 06/18/2013 05:53 PM, Eva Hocks wrote:
any sacct command returns:
slurmctld.log:[2013-06-18T14:45:57-07:00