On Fri, 4 Nov 2011 08:47:22 -0700
Don Lipari wrote:

Hi Don,

> FastSchedule=0 is the appropriate setting.  But the slurmd is
> apparently only seeing one proc.  Try running "scontrol show slurmd"
> on the compute node to confirm.  Compare that against /proc/cpuinfo.
> If the cause is still not obvious, try turning up the SlurmdDebug
> level and looking for clues in a more verbose log file.

# scontrol show slurmd
Active Steps             = NONE
Actual CPUs              = 4
Actual sockets           = 2
Actual cores             = 2
Actual threads per core  = 1
Actual real memory       = 7985 MB
Actual temp disk space   = 1990 MB
Boot time                = 2011-11-04T10:43:50
Hostname                 = tditaller002.pic.es
Last slurmctld msg time  = 2011-11-07T10:30:59
Slurmd PID               = 21650
Slurmd Debug             = 3
Slurmd Logfile           = /var/log/slurm/SlurmdLogFile
Version                  = 2.3.1

[...]
[2011-11-07T10:38:13] debug3: Trying to load plugin /usr/lib64/slurm/topology_none.so
[2011-11-07T10:38:13] topology NONE plugin loaded
[2011-11-07T10:38:13] debug3: Success.
[2011-11-07T10:38:13] debug3: NodeName    = tditaller002.pic.es
[2011-11-07T10:38:13] debug3: TopoAddr    = tditaller002.pic.es
[2011-11-07T10:38:13] debug3: TopoPattern = node
[2011-11-07T10:38:13] debug3: CacheGroups = 0
[2011-11-07T10:38:13] debug3: Confile     = `/etc/slurm/slurm.conf'
[2011-11-07T10:38:13] debug3: Debug       = 9
[2011-11-07T10:38:13] debug3: CPUs        = 1  (CF:  1, HW:  4)
[2011-11-07T10:38:13] debug3: Sockets     = 1  (CF:  1, HW:  2)
[2011-11-07T10:38:13] debug3: Cores       = 1  (CF:  1, HW:  2)
[2011-11-07T10:38:13] debug3: Threads     = 1  (CF:  1, HW:  1)
[2011-11-07T10:38:13] debug3: UpTime      = 429911 = 4-23:25:11
[...]
[2011-11-07T10:38:13] Procs=1 Sockets=1 Cores=1 Threads=1 Memory=7985 TmpDisk=1990 Uptime=429911

I see no errors or anything strange...
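Although, looking at the CF: column, it seems slurmd is reading CPUs=1 from slurm.conf while detecting 4 in hardware. Just as a sketch of what I would expect the node definition to look like if it matched the detected topology (the NodeName is from my log above, the rest mirrors the HW: values):

```
# slurm.conf sketch -- values copied from the "HW:" column in the
# slurmd debug log above; not necessarily what my file contains now
NodeName=tditaller002.pic.es Sockets=2 CoresPerSocket=2 ThreadsPerCore=1 RealMemory=7985
```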


> > 2.-) CPU_factor
> > In torque we define cpu_factor, a way to normalize cpu_time between
> > two different hosts. (host A is fast, host B slow; so 1 second on
> > host A equals 2 on host B).
> > 
> > Is this configurable in slurm? What name do you use for that?
> 
> The closest analog to this is the "Weight" setting in the slurm.conf
> file.

I don't think this is the same as cpu_factor. With Weight I can order
the priority of nodes for scheduling, but it is not going to normalize
CPU times.

In your clusters, for billing purposes, do you make any distinction
between time consumed on faster nodes versus slower ones? If so, how?
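To illustrate what I mean by normalizing (the host names and factor values below are invented, just for the example):

```python
# Toy sketch of torque-style cpu_factor normalization.
# host-a is the reference machine (factor 1.0); host-b is assumed to
# be half as fast, so its CPU seconds count half as much for billing.
cpu_factor = {"host-a": 1.0, "host-b": 0.5}

def billed_seconds(host, raw_cpu_seconds):
    """Scale raw CPU seconds by the host's speed factor."""
    return raw_cpu_seconds * cpu_factor[host]

# 2 raw seconds on the slow host bill the same as 1 on the fast one.
print(billed_seconds("host-a", 1) == billed_seconds("host-b", 2))  # True
```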


 
> > 3.-) max node load.
> > May I configure a max amount of load in a node? i.e a node with 4
> > cpus will run 4 jobs, but if running 3 it reaches some load I'd
> > like slurm to NOT send more jobs to that node.
> 
> Here you should look at the "Shared" partition configuration setting
> in the slurm.conf man page.  Shared=Yes:3 could be appropriate for
> this scenario.

That looks OK, but can I configure it on a per-node basis, e.g. as a
function of the number of cores?
BTW, the man page says that setting is ONLY recommended with the gang
scheduling plugin. I can't guess why...
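For reference, this is how I understand the suggestion would look in slurm.conf (the partition and node names here are just placeholders):

```
# slurm.conf sketch: allow up to 3 jobs to share each resource in
# this partition (per the Shared=YES:3 suggestion above)
PartitionName=taller Nodes=tditaller002.pic.es Shared=YES:3 Default=YES
```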

 
> > 4.-) How is the file copy between client and server done
> > (input/output)? ssh? NFS? Is it configurable?
> 
> The easiest mechanism to share files across login and compute
> nodes is a shared file system.  However, if you need to push
> files around, look at the sbcast command.

great! 
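In case it is useful to anyone else, a minimal batch-script sketch of how I understand sbcast would be used (the file names are made up):

```
#!/bin/sh
#SBATCH -N 4
# Copy the executable from the submit side to local disk on every
# allocated node, then run it from the local copy.
sbcast my_app /tmp/my_app
srun /tmp/my_app
```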
 
> > Well, I think I've asked enough questions for my first mail :-)
> > Could anyone answer some (or all) of these questions? Could anyone
> > send me a link to presentations/wiki/extended_doc?
> 
> http://schedmd.com/slurmdocs/publications.html has some publications
> that are not bundled with the html pages included in the SLURM
> distribution.

Wow... nice, more docs to read!


> Don
Many thanks for your replies Don!

Cheers,
Arnau
