What does the submit script and/or the 'sbatch' command look like? What's the
error in slurmctld.log?

- Trey

=============================

Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: [email protected]
Jabber: [email protected]

On Thu, Aug 6, 2015 at 3:53 PM, Gerry Creager - NOAA Affiliate <[email protected]> wrote:

> We recently tried to implement accounting and fair-share scheduling. For
> completeness, the system is a Cray XE6m.
>
> In slurm.conf, we have:
> AccountingStorageType=accounting_storage/slurmdbd
> AccountingStorageHost=sdb
> AccountingStorageEnforce=limits
> PriorityType=priority/multifactor
>
> PriorityWeightAge=1000
> PriorityWeightFairshare=10000
> PriorityWeightJobSize=1000
> PriorityWeightPartition=1000
> PriorityWeightQOS=0 # don't use the qos factor
>
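For reference, the multifactor plugin computes a job's priority as a weighted sum of factors, and each factor lies between 0.0 and 1.0. The Python sketch below applies the weights above to made-up factor values. The function and dictionary names are only illustrative. The real plugin also handles any TRES weights and the nice adjustment, which are left out here, so treat the result as an approximation.

```python
# Approximate sketch of Slurm's multifactor priority: a weighted sum of
# normalized factors, each in [0.0, 1.0]. TRES weights and the nice
# adjustment handled by the real plugin are deliberately left out.

WEIGHTS = {
    "age": 1000,         # PriorityWeightAge
    "fairshare": 10000,  # PriorityWeightFairshare
    "jobsize": 1000,     # PriorityWeightJobSize
    "partition": 1000,   # PriorityWeightPartition
    "qos": 0,            # PriorityWeightQOS (QOS factor unused)
}


def job_priority(factors: dict[str, float],
                 weights: dict[str, int] = WEIGHTS) -> int:
    """Weighted sum of the given factors; missing factors count as 0."""
    total = 0.0
    for name, weight in weights.items():
        value = factors.get(name, 0.0)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"factor {name!r} must be in [0, 1], got {value}")
        total += weight * value
    return int(total)  # truncate to an integer priority


if __name__ == "__main__":
    # Hypothetical factor values for one pending job.
    example = {"age": 0.5, "fairshare": 0.25, "jobsize": 0.5, "partition": 1.0}
    print(job_priority(example))
```

With these weights, fair-share dominates. It can contribute up to 10000 points, while age, job size and partition can each add at most 1000.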
> MessageTimeout=45 # problems with race condition!
>
> # PARTITIONS
> PartitionName=workq Default=YES Priority=1 DefaultTime=60 MaxTime=06:00:00 AllowGroups=ALL Nodes=nid00[002-007,024-029,040-043,046-049,052-055,064-071,088-091,094-099,100-103,120-127,136-151,160-167,184-199,216-223,232-247,256-263,280-287] MaxNodes=135
> PartitionName=debugq Default=YES Priority=5 DefaultTime=60 MaxTime=4:00:00 AllowGroups=ALL Nodes=nid00[002-007,024-029] MaxNodes=4
> PartitionName=wofq Default=YES Priority=1 DefaultTime=60 MaxTime=06:00:00 AllowGroups=ALL Nodes=nid00[002-007,024-029,040-043,046-049,052-055,064-071,088-091,094-099,100-103,120-127,136-151,160-167,184-199,216-223,232-247,256-263,280-287] MaxNodes=135
>
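All three partitions above are marked Default=YES, but Slurm uses a single default partition. If I remember the slurmctld code correctly, each later Default=YES overrides the earlier one. That would make wofq, which is defined last, the partition for any job submitted without -p/--partition. Treat that last-one-wins behaviour as an assumption to confirm in your slurmctld.log.

The Python sketch below flags the situation. It assumes one PartitionName entry per line and uses a sample trimmed to a few keys. The function names are hypothetical.

```python
# Sketch: list partitions marked Default=YES in slurm.conf text and report
# which one would likely be the effective default, ASSUMING the last
# Default=YES entry parsed wins (to be confirmed against slurmctld.log).


def partitions(conf_text: str) -> list[dict[str, str]]:
    """Parse PartitionName lines into {key: value} dicts with lowercased keys."""
    parts = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line.lower().startswith("partitionname="):
            continue
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        parts.append({key.lower(): value for key, value in fields.items()})
    return parts


def default_partitions(conf_text: str) -> list[str]:
    """Names of all partitions with Default=YES, in file order."""
    return [p["partitionname"] for p in partitions(conf_text)
            if p.get("default", "NO").upper() == "YES"]


# The three partitions above, trimmed to the keys that matter here.
SAMPLE = """\
PartitionName=workq Default=YES Priority=1 MaxTime=06:00:00 MaxNodes=135
PartitionName=debugq Default=YES Priority=5 MaxTime=4:00:00 MaxNodes=4
PartitionName=wofq Default=YES Priority=1 MaxTime=06:00:00 MaxNodes=135
"""

if __name__ == "__main__":
    defaults = default_partitions(SAMPLE)
    print("Default=YES on:", defaults)
    if len(defaults) > 1:
        print("More than one default; effective default likely:", defaults[-1])
```

Run against the sample, it lists all three partitions and names wofq as the likely effective default. Leaving Default=YES on only one partition removes the ambiguity.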
> In sacctmgr, I have the following associations:
>    Cluster    Account       User  Partition     Share MaxNodes  MaxCPUs MaxSubmit     MaxWall       QOS
> ---------- ---------- ---------- ---------- --------- -------- -------- --------- ----------- ---------
>       loki       root                               1                                            normal
>       loki       root       root                    1                                            normal
>       loki      debug                             200        4                  2    00:15:00    normal
>       loki      debug   chenghao     debugq       200        4                  2    00:15:00    normal
>       loki      debug chris.kar+     debugq       200        4                  2    00:15:00    normal
>       loki      debug    cpotvin     debugq       200        4                  2    00:15:00    normal
>       loki      debug      gerry     debugq       200        4                  2    00:15:00    normal
>       loki      debug james.cor+     debugq       200        4                  2    00:15:00    normal
>       loki      debug      jdgao     debugq       200        4                  2    00:15:00    normal
>       loki      debug   kknopf83     debugq       200        4                  2    00:15:00    normal
>       loki      debug    mansell     debugq       200        4                  2    00:15:00    normal
>       loki      debug     mflora     debugq       200        4                  2    00:15:00    normal
>       loki      debug   nyussouf     debugq       200        4                  2    00:15:00    normal
>       loki      debug   skinnerp     debugq       200        4                  2    00:15:00    normal
>       loki      debug    tajones     debugq       200        4                  2    00:15:00    normal
>       loki      debug     wicker     debugq       200        4                  2    00:15:00    normal
>       loki      debug        wof     debugq       200        4                  2    00:15:00    normal
>       loki largequeue                             100       96     1024              06:00:00    normal
>       loki largequeue   chenghao      workq       100       96     1024              06:00:00    normal
>       loki largequeue chris.kar+      workq       100       96     1024              06:00:00    normal
>       loki largequeue    cpotvin      workq       100       96     1024              06:00:00    normal
>       loki largequeue      gerry      workq       100       96     1024              06:00:00    normal
>       loki largequeue james.cor+      workq       100       96     1024              06:00:00    normal
>       loki largequeue      jdgao      workq       100       96     1024              06:00:00    normal
>       loki largequeue   kknopf83      workq       100       96     1024              06:00:00    normal
>       loki largequeue    mansell      workq       100       96     1024              06:00:00    normal
>       loki largequeue     mflora      workq       100       96     1024              06:00:00    normal
>       loki largequeue   nyussouf      workq       100       96     1024              06:00:00    normal
>       loki largequeue   skinnerp      workq       100       96     1024              06:00:00    normal
>       loki largequeue    tajones                  100       96     1024              06:00:00    normal
>       loki largequeue     wicker      workq       100       96     1024              06:00:00    normal
>       loki largequeue        wof      workq       100       96     1024              06:00:00    normal
>       loki   realtime                            1000      128     4096              01:00:00    normal
>       loki   realtime        wof       wofq      1000      128     4096              01:00:00    normal
>       loki smallqueue                             100       36      288              06:00:00    normal
>       loki smallqueue   chenghao      workq       100       36      288              06:00:00    normal
>       loki smallqueue chris.kar+      workq       100       36      288              06:00:00    normal
>       loki smallqueue    cpotvin      workq       100       36      288              06:00:00    normal
>       loki smallqueue      gerry      workq       100       36      288              06:00:00    normal
>       loki smallqueue james.cor+      workq       100       36      288              06:00:00    normal
>       loki smallqueue      jdgao      workq       100       36      288              06:00:00    normal
>       loki smallqueue   kknopf83      workq       100       36      288              06:00:00    normal
>       loki smallqueue    mansell      workq       100       36      288              06:00:00    normal
>       loki smallqueue     mflora      workq       100       36      288              06:00:00    normal
>       loki smallqueue   nyussouf      workq       100       36      288              06:00:00    normal
>       loki smallqueue   skinnerp      workq       100       36      288              06:00:00    normal
>       loki smallqueue    tajones      workq       100       36      288              06:00:00    normal
>       loki smallqueue     wicker      workq       100       36      288              06:00:00    normal
>       loki smallqueue        wof      workq       100       36      288              06:00:00    normal
>
> I have a user whose jobs keep erroring out with a claim that she has an
> account/partition mismatch. The partition named in the error (wofq),
> however, does not appear anywhere in her Slurm submission script.
>
> I'm baffled. Any suggestions?
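On the mismatch itself: in the association table above, the only association on wofq is wof in the realtime account. Apart from root, the only association with no partition set is tajones in largequeue.

As I understand Slurm's lookup (an assumption worth checking against the sacctmgr docs), the job's user, account and partition are matched against an association. If no partition-specific association exists, Slurm falls back to one with no partition. If neither exists, the job is rejected.

The sketch below shows that check. It uses a few rows copied from the table, and the names Assoc and find_assoc are hypothetical.

```python
# Sketch of an association check for a (user, account, partition) triple,
# ASSUMING a partition-specific association is tried first and an
# association with no partition ("") acts as a catch-all fallback.

from typing import NamedTuple, Optional


class Assoc(NamedTuple):
    account: str
    user: str
    partition: str  # "" means the association applies to any partition


def find_assoc(assocs: list[Assoc], user: str, account: str,
               partition: str) -> Optional[Assoc]:
    """Return the matching association, or None if the job would be rejected."""
    for wanted in (partition, ""):
        for a in assocs:
            if (a.user, a.account, a.partition) == (user, account, wanted):
                return a
    return None


# A few rows from the association table above.
ASSOCS = [
    Assoc("smallqueue", "mflora", "workq"),
    Assoc("debug", "mflora", "debugq"),
    Assoc("realtime", "wof", "wofq"),
    Assoc("largequeue", "tajones", ""),
]

if __name__ == "__main__":
    print(find_assoc(ASSOCS, "mflora", "smallqueue", "workq"))   # exact match
    print(find_assoc(ASSOCS, "mflora", "smallqueue", "wofq"))    # no match
    print(find_assoc(ASSOCS, "tajones", "largequeue", "wofq"))   # fallback
```

Under that assumption, any workq-only user whose job lands in wofq would get exactly the account/partition error described.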
> --
> Gerry Creager
> NSSL/CIMMS
> 405.325.6371
> ++++++++++++++++++++++
> “Big whorls have little whorls,
> That feed on their velocity;
> And little whorls have lesser whorls,
> And so on to viscosity.”
> Lewis Fry Richardson (1881-1953)
>
