Hi,

Serendipitously, I just came across the same problem as you describe in question 1. You will probably find it useful to look at the replies to my question:
http://groups.google.com/group/slurm-devel/browse_thread/thread/0d0c4c41975cd941#

Cheers,
Andy

On 5 November 2011 04:47, Lipari, Don <[email protected]> wrote:
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]]
>> On Behalf Of Arnau Bria
>> Sent: Thursday, November 03, 2011 6:52 AM
>> To: [email protected]
>> Subject: [slurm-dev] new to slurm
>>
>> Hi all,
>>
>> My name is Arnau Bria and I work as a sysadmin at PIC (a data center in
>> Barcelona). We have a cluster of ~300 nodes and 3,300 job slots under
>> Torque/Maui. Our current workload of more than 6,000 jobs causes serious
>> problems for Torque/Maui, so we are studying alternatives, and it seems
>> that SLURM has the same features as Torque/Maui (and more) and scales
>> much better.
>>
>> So, I'm starting to read some SLURM docs, and I've been able to install a
>> server, configure some partitions and a couple of nodes, and submit some
>> jobs. I've learned some basic commands to manage
>> partitions/nodes/queues/jobs.
>>
>> Now I'd like to start a deeper investigation. I'm trying to "import"
>> Torque's configuration into SLURM and see what still makes sense and
>> what does not:
>>
>> 1.-) From:
>> https://computing.llnl.gov/linux/slurm/faq.html#fast_schedule
>> "How can I configure SLURM to use the resources actually found
>> on a node rather than what is defined in slurm.conf?"
>>
>> All my nodes (which have 4 CPUs) show only 1 CPU. I can't make SLURM
>> detect node resources automatically. This is my conf:
>> [...]
>> SelectType=select/cons_res
>> SelectTypeParameters=CR_CPU
>> FastSchedule=0
>> [...]
>> NodeName=DEFAULT State=UNKNOWN
>> NodeName=tditaller002.pic.es,tditaller005.pic.es
>>
>> Node log:
>> [...]
>> Nov 3 14:40:20 tditaller002 slurmd[8245]: slurmd version 2.3.1 started
>> Nov 3 14:40:20 tditaller002 slurmd[8245]: slurmd started on Thu 03 Nov
>> 2011 14:40:20 +0100
>> Nov 3 14:40:20 tditaller002 slurmd[8245]: Procs=1 Sockets=1 Cores=1
>> Threads=1 Memory=7985 TmpDisk=1990 Uptime=98838
>
> FastSchedule=0 is the appropriate setting, but the slurmd is apparently
> only seeing one proc. Try running "scontrol show slurmd" on the compute
> node to confirm, and compare that against /proc/cpuinfo. If the cause is
> still not obvious, try turning up the SlurmdDebug level and looking for
> clues in a more verbose log file.
>
>> 2.-) CPU factor
>> In Torque we define a CPU factor: a way to normalize CPU time between
>> two different hosts (host A is fast, host B is slow, so 1 second on
>> host A equals 2 seconds on host B).
>>
>> Is this configurable in SLURM? What name do you use for it?
>
> The closest analog to this is the "Weight" setting in the slurm.conf file.
>
>> 3.-) Max node load
>> Can I configure a maximum load on a node? I.e., a node with 4 CPUs
>> will run 4 jobs, but if it reaches some load while running only 3, I'd
>> like SLURM NOT to send more jobs to that node.
>
> Here you should look at the "Shared" partition configuration setting in
> the slurm.conf man page. Shared=Yes:3 could be appropriate for this
> scenario.
>
>> 4.-) How is the file copy between client/server done (input/output)?
>> ssh? NFS? Is it configurable?
>
> The easiest mechanism for sharing files across login and compute nodes is
> a shared file system. However, if you need to push files around, look at
> the sbcast command.
>
>> Well, I think I've asked enough questions for my first mail :-)
>> Could anyone answer some (or all) of these questions? Could anyone send
>> me a link to presentations/wikis/extended docs?
>
> http://schedmd.com/slurmdocs/publications.html has some publications that
> are not bundled with the HTML pages included in the SLURM distribution.
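Pulling Don's answers to questions 1-3 together, a slurm.conf fragment might look roughly like the sketch below. This is only an illustration: the node names come from the thread, but the explicit CPU count, the Weight values, and the partition name are assumptions, not settings anyone in the thread actually posted.

```
# --- Sketch only; CPU counts, weights, and partition name are illustrative ---

# Q1: with FastSchedule=0 the slurmd reports the hardware it actually finds,
# but the expected resources can also be stated explicitly per node:
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
FastSchedule=0
NodeName=tditaller002.pic.es,tditaller005.pic.es CPUs=4 State=UNKNOWN

# Q2: "Weight" orders nodes for allocation -- lower-weight nodes are
# preferred -- so fast nodes get used first. Note it does not rescale
# accounted CPU time the way Torque's cpu_factor does.
#NodeName=fastnode[01-10] CPUs=4 Weight=1
#NodeName=slownode[01-10] CPUs=4 Weight=10

# Q3: Shared=YES:3 on the partition allows at most 3 jobs to share
# each resource, capping oversubscription per node:
PartitionName=main Nodes=tditaller002.pic.es,tditaller005.pic.es Shared=YES:3 Default=YES
```

Changing node or partition definitions requires an "scontrol reconfigure" (or a slurmctld restart) to take effect.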
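For question 4, sbcast runs inside an allocation and copies a file from the submit host to every node of the job, which avoids stressing a shared file system with input staging. A minimal batch-script sketch (the file names and program are hypothetical, and this of course requires a working SLURM installation):

```
#!/bin/sh
#SBATCH -N 2

# Hypothetical staging pattern: copy ./input.dat from the submit
# directory to node-local /tmp on every allocated node.
sbcast input.dat /tmp/input.dat

# Each task then reads its node-local copy; stdout is collected in
# the usual slurm-<jobid>.out file.
srun ./my_program /tmp/input.dat
```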
>
> Don
>
>> Many thanks in advance,
>> Cheers,
>> Arnau

--
Andrew Punnett <[email protected]>
Centre for Theoretical Chemistry and Physics (CTCP), Bldg. 40,
Massey University (Albany Campus), Private Bag 102 904,
Auckland 0745, NEW ZEALAND
Phone +64 (0)9 414 0800 ext. 9886
http://ctcp.massey.ac.nz/~punnett
