Dear Matteo,

I believe the only MaxTime setting you have is in the stanza that begins with "PartitionName=DEFAULT", where you establish MaxTime=2-0, and that this is the limit you are exceeding.
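(As a quick sanity check--assuming you can run the client commands on your cluster--scontrol will print the time limit actually in force on a partition, so you can confirm which limit is rejecting the job:)

```shell
# Show the effective time limit on the foff2 partition;
# MaxTime=2-00:00:00 here would confirm the 2-day cap is being applied.
scontrol show partition foff2 | grep -o 'MaxTime=[^ ]*'
```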
I believe you would need to increase this to 7-0, and then establish lower
limits through other QoSs. Or possibly (I haven't tried this) you can set
this limit in the stanzas for only the partitions that you wish to be able
to run longer--that is, only on the lines for PartitionName=foff1 and foff2.

Hope some of that helps.

Best wishes,
Lyn

On Mon, Oct 31, 2011 at 5:42 AM, Matteo Guglielmi <[email protected]> wrote:

> Dear All,
>
> I'm trying to create a simple qos called 1week which
> I would like to associate to those users who do need
> to run for one week instead of 2 days at maximum:
>
> ### slurm.conf ###
> EnforcePartLimits=YES
> TaskPlugin=task/affinity
> TaskPluginParam=Sched
> TopologyPlugin=topology/none
> TrackWCKey=no
> SchedulerType=sched/backfill
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core_Memory
> PriorityType=priority/multifactor
> PriorityDecayHalfLife=7-0
> PriorityCalcPeriod=5
> PriorityFavorSmall=YES
> PriorityMaxAge=7-0
> PriorityUsageResetPeriod=NONE
> PriorityWeightAge=1000
> PriorityWeightFairshare=1000
> PriorityWeightJobSize=10000
> PriorityWeightPartition=10000
> PriorityWeightQOS=10000
> AccountingStorageEnforce=limits,qos
> AccountingStorageType=accounting_storage/slurmdbd
> JobCompType=jobcomp/none
> JobAcctGatherType=jobacct_gather/linux
> PreemptMode=suspend,gang
> PreemptType=preempt/partition_prio
>
> NodeName=DEFAULT TmpDisk=16384 State=UNKNOWN
>
> NodeName=foff[01-08] Procs=8 CoresPerSocket=4 Sockets=2 ThreadsPerCore=1 RealMemory=7000 Weight=1 Feature=X5482,foff,fofflm
> NodeName=foff[09-13] Procs=48 CoresPerSocket=12 Sockets=4 ThreadsPerCore=1 RealMemory=127000 Weight=1 Feature=6176,foff,foffhm
>
> PartitionName=DEFAULT DefaultTime=60 MinNodes=1 MaxNodes=UNLIMITED MaxTime=2-0 PreemptMode=SUSPEND Shared=FORCE:1 State=UP Default=NO
> PartitionName=batch Nodes=foff[01-13] Default=YES
> PartitionName=foff1 Nodes=foff[01-08] Priority=1000
> PartitionName=foff2 Nodes=foff[09-13] Priority=1000
> #################
>
> sacctmgr list associations format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd
>
>    Account    Cluster       User     Share  Partition   Def QOS                  QOS
> ---------- ---------- ---------- --------- ---------- --------- --------------------
>       root     superb                    1                                    normal
>       root     superb       root        1                                    normal
>         sb     superb                    1                                    normal
>         sb     superb   belushki        1      batch                         normal
>         sb     superb     fiocco        1      batch                         normal
>      gr-fo     superb                    1                                    normal
>      gr-fo     superb   belushki        1      foff1                         normal
>      gr-fo     superb   belushki        1      foff2                         normal
>      gr-fo     superb     fiocco        1      foff1                         normal
>      gr-fo     superb     fiocco        1      foff2                         normal
>
> sacctmgr add qos Name=1week MaxWall=7-0 Priority=100 PreemptMode=Cluster Flags=PartitionTimeLimit
>
> sacctmgr modify user name=belushki Account=gr-fo set qos+=1week
>
> sacctmgr list associations format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd
>
>    Account    Cluster       User     Share  Partition   Def QOS                  QOS
> ---------- ---------- ---------- --------- ---------- --------- --------------------
>       root     superb                    1                                    normal
>       root     superb       root        1                                    normal
>         sb     superb                    1                                    normal
>         sb     superb   belushki        1      batch                         normal
>         sb     superb     fiocco        1      batch                         normal
>      gr-fo     superb                    1                                    normal
>      gr-fo     superb   belushki        1      foff1                   1week,normal
>      gr-fo     superb   belushki        1      foff2                   1week,normal
>      gr-fo     superb     fiocco        1      foff1                         normal
>      gr-fo     superb     fiocco        1      foff2                         normal
>
> /etc/init.d/slurmd restart (same command was issued on all nodes too)
>
> su - belushki
>
> srun -p foff2 -A gr-fo --qos=1week -t 7-0 hostname
> srun: error: Unable to allocate resources: Requested time limit is invalid (exceeds some limit)
>
> Could you tell me what I still miss in order to make it working for user "belushki"?
>
> Thanks,
>
> --matt
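For the per-partition approach I mentioned--again, untested on my side--the partition stanzas of your slurm.conf would, I believe, look something like this, with everything else left unchanged:

```
PartitionName=DEFAULT DefaultTime=60 MinNodes=1 MaxNodes=UNLIMITED MaxTime=2-0 PreemptMode=SUSPEND Shared=FORCE:1 State=UP Default=NO
PartitionName=batch Nodes=foff[01-13] Default=YES
# Override the inherited 2-day MaxTime on just these two partitions:
PartitionName=foff1 Nodes=foff[01-08] Priority=1000 MaxTime=7-0
PartitionName=foff2 Nodes=foff[09-13] Priority=1000 MaxTime=7-0
```

After editing slurm.conf you would need to run "scontrol reconfigure" (or restart the daemons) for the new limits to take effect.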
