Dear Matteo,

I believe the only MaxTime setting you have is in the stanza that begins
with "PartitionName=DEFAULT", where you establish MaxTime=2-0, and that
this is the limit you are exceeding.

I believe you would need to increase this to 7-0 and then impose the lower
limits through other QoSs.  Or possibly (I haven't tried this) you can set
the higher limit in the stanzas for only the partitions that you wish to be
able to run longer--that is, only on the PartitionName=foff1 and
PartitionName=foff2 lines.
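In case it helps, here is an untested sketch of both ideas, based only on
the stanzas you quoted (I'm not sure how a partition's MaxTime interacts
with EnforcePartLimits=YES and the QOS PartitionTimeLimit flag on your
Slurm version, so treat this as a starting point rather than a recipe; the
"2days" QOS name below is one I made up):

```
# Option 1 (untested): raise the default partition limit to 7 days...
PartitionName=DEFAULT DefaultTime=60 MinNodes=1 MaxNodes=UNLIMITED MaxTime=7-0 PreemptMode=SUSPEND Shared=FORCE:1 State=UP Default=NO
# ...then cap ordinary users at 2 days through a QOS, e.g.:
#   sacctmgr add qos Name=2days MaxWall=2-0
# and attach it to the relevant associations, analogous to your
# existing "qos+=1week" command.

# Option 2 (untested): keep MaxTime=2-0 in the DEFAULT stanza and
# override it only on the partitions that should allow longer jobs.
PartitionName=foff1 Nodes=foff[01-08] Priority=1000 MaxTime=7-0
PartitionName=foff2 Nodes=foff[09-13] Priority=1000 MaxTime=7-0
```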

Hope some of that helps.

Best wishes,
Lyn

On Mon, Oct 31, 2011 at 5:42 AM, Matteo Guglielmi
<[email protected]> wrote:

> Dear All,
>
> I'm trying to create a simple qos called 1week which
> I would like to associate to those users who do need
> to run for one week instead of 2 days at maximum:
>
> ### slurm.conf ###
> EnforcePartLimits=YES
> TaskPlugin=task/affinity
> TaskPluginParam=Sched
> TopologyPlugin=topology/none
> TrackWCKey=no
> SchedulerType=sched/backfill
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core_Memory
> PriorityType=priority/multifactor
> PriorityDecayHalfLife=7-0
> PriorityCalcPeriod=5
> PriorityFavorSmall=YES
> PriorityMaxAge=7-0
> PriorityUsageResetPeriod=NONE
> PriorityWeightAge=1000
> PriorityWeightFairshare=1000
> PriorityWeightJobSize=10000
> PriorityWeightPartition=10000
> PriorityWeightQOS=10000
> AccountingStorageEnforce=limits,qos
> AccountingStorageType=accounting_storage/slurmdbd
> JobCompType=jobcomp/none
> JobAcctGatherType=jobacct_gather/linux
> PreemptMode=suspend,gang
> PreemptType=preempt/partition_prio
>
> NodeName=DEFAULT TmpDisk=16384 State=UNKNOWN
>
> NodeName=foff[01-08] Procs=8  CoresPerSocket=4  Sockets=2 ThreadsPerCore=1
> RealMemory=7000   Weight=1 Feature=X5482,foff,fofflm
> NodeName=foff[09-13] Procs=48 CoresPerSocket=12 Sockets=4 ThreadsPerCore=1
> RealMemory=127000 Weight=1 Feature=6176,foff,foffhm
>
> PartitionName=DEFAULT DefaultTime=60 MinNodes=1 MaxNodes=UNLIMITED
> MaxTime=2-0 PreemptMode=SUSPEND Shared=FORCE:1 State=UP Default=NO
> PartitionName=batch   Nodes=foff[01-13] Default=YES
> PartitionName=foff1   Nodes=foff[01-08] Priority=1000
> PartitionName=foff2   Nodes=foff[09-13] Priority=1000
> #################
>
> sacctmgr list associations
> format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd
>
>             Account    Cluster       User     Share  Partition   Def QOS                  QOS
> -------------------- ---------- ---------- --------- ---------- --------- --------------------
> root                     superb                    1                                    normal
>  root                    superb       root         1                                    normal
>  sb                      superb                    1                                    normal
>  sb                      superb   belushki         1      batch                         normal
>  sb                      superb     fiocco         1      batch                         normal
>  gr-fo                   superb                    1                                    normal
>   gr-fo                  superb   belushki         1      foff1                         normal
>   gr-fo                  superb   belushki         1      foff2                         normal
>   gr-fo                  superb     fiocco         1      foff1                         normal
>   gr-fo                  superb     fiocco         1      foff2                         normal
>
>
> sacctmgr add qos Name=1week MaxWall=7-0 Priority=100 PreemptMode=Cluster
> Flags=PartitionTimeLimit
>
> sacctmgr modify user name=belushki Account=gr-fo set qos+=1week
>
> sacctmgr list associations
> format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd
>
>             Account    Cluster       User     Share  Partition   Def QOS                  QOS
> -------------------- ---------- ---------- --------- ---------- --------- --------------------
> root                     superb                    1                                    normal
>  root                    superb       root         1                                    normal
>  sb                      superb                    1                                    normal
>  sb                      superb   belushki         1      batch                         normal
>  sb                      superb     fiocco         1      batch                         normal
>  gr-fo                   superb                    1                                    normal
>   gr-fo                  superb   belushki         1      foff1                   1week,normal
>   gr-fo                  superb   belushki         1      foff2                   1week,normal
>   gr-fo                  superb     fiocco         1      foff1                         normal
>   gr-fo                  superb     fiocco         1      foff2                         normal
>
> /etc/init.d/slurmd restart (the same command was issued on all nodes)
>
> su - belushki
>
> srun -p foff2 -A gr-fo --qos=1week -t 7-0 hostname
> srun: error: Unable to allocate resources: Requested time limit is invalid
> (exceeds some limit)
>
>
> Could you tell me what I'm still missing to make this work for user
> "belushki"?
>
> Thanks,
>
> --matt
>
