Hi all,

My name is Arnau Bria and I work as a sysadmin at PIC (a data center in Barcelona). We have a cluster of ~300 nodes and 3300 job slots under torque/maui. Our current workload, more than 6k jobs, causes serious problems for torque/maui, so we're studying alternatives, and it seems that slurm has the same torque/maui features (or more) and scales much better.
So, I'm starting to read some slurm docs, and I've been able to install a server, configure some partitions and a couple of nodes, and send some jobs. I've learned some basic commands to manage partitions/nodes/queues/jobs. Now I'd like to start a deeper investigation, so I'm trying to "import" torque's configuration into slurm and see what still makes sense and what does not:

1.-) Node resources

From: https://computing.llnl.gov/linux/slurm/faq.html#fast_schedule
"How can I configure SLURM to use the resources actually found on a node rather than what is defined in slurm.conf?"

All my nodes (which have 4 cpus) show only 1 cpu. I can't get slurm to detect the node resources automatically.

This is my conf:

    [...]
    SelectType=select/cons_res
    SelectTypeParameters=CR_CPU
    FastSchedule=0
    [...]
    NodeName=DEFAULT State=UNKNOWN
    NodeName=tditaller002.pic.es,tditaller005.pic.es

Node log:

    [...]
    Nov 3 14:40:20 tditaller002 slurmd[8245]: slurmd version 2.3.1 started
    Nov 3 14:40:20 tditaller002 slurmd[8245]: slurmd started on Thu 03 Nov 2011 14:40:20 +0100
    Nov 3 14:40:20 tditaller002 slurmd[8245]: Procs=1 Sockets=1 Cores=1 Threads=1 Memory=7985 TmpDisk=1990 Uptime=98838

2.-) CPU_factor

In torque we define cpu_factor, a way to normalize cpu_time between two different hosts (host A is fast and host B is slow, so 1 second on host A equals 2 seconds on host B). Is this configurable in slurm? What name do you use for it?

3.-) Max node load

Can I configure a maximum load for a node? I.e. a node with 4 cpus will normally run 4 jobs, but if it reaches a certain load while running 3, I'd like slurm NOT to send more jobs to that node.

4.-) File copy

How is the file copy between client and server done (input/output)? ssh? NFS? Is it configurable?

Well, I think I've asked enough questions for my first mail :-)

Could anyone answer some (or all) of these questions? Could anyone send me a link to presentations, a wiki or extended docs?

Many thanks in advance,
Cheers,
Arnau
