Hi,

Serendipitously, I just came across the same problem as you describe in question 1. You will probably find it useful to look at the replies to my question:
http://groups.google.com/group/slurm-devel/browse_thread/thread/0d0c4c41975cd941#

Cheers,
Andy

On 5 November 2011 04:47, Lipari, Don <[email protected]> wrote:
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]]
>> On Behalf Of Arnau Bria
>> Sent: Thursday, November 03, 2011 6:52 AM
>> To: [email protected]
>> Subject: [slurm-dev] new to slurm
>>
>> Hi all,
>>
>> My name is Arnau Bria and I work as a sysadmin at PIC (a data center in
>> Barcelona). We have a cluster of ~300 nodes and 3,300 job slots under
>> Torque/Maui. Our current workload of more than 6,000 jobs causes serious
>> problems for Torque/Maui, so we are studying alternatives, and it seems
>> that SLURM has the same features as Torque/Maui (and more) and scales
>> much better.
>>
>> So, I'm starting to read some SLURM docs, and I've been able to install a
>> server, configure some partitions and a couple of nodes, and submit some
>> jobs. I've learned some basic commands to manage
>> partitions/nodes/queues/jobs.
>>
>> Now I'd like to start a deeper investigation. I'm trying to "import"
>> Torque's configuration into SLURM and see what still makes sense and
>> what does not:
>>
>> 1.-) From:
>> https://computing.llnl.gov/linux/slurm/faq.html#fast_schedule
>> "How can I configure SLURM to use the resources actually found
>> on a node rather than what is defined in slurm.conf?"
>>
>> All my nodes (which have 4 CPUs) show only 1 CPU. I can't make SLURM
>> detect node resources automatically. This is my conf:
>> [...]
>> SelectType=select/cons_res
>> SelectTypeParameters=CR_CPU
>> FastSchedule=0
>> [...]
>> NodeName=DEFAULT State=UNKNOWN
>> NodeName=tditaller002.pic.es,tditaller005.pic.es
>>
>> Node log:
>> [...]
>> Nov 3 14:40:20 tditaller002 slurmd[8245]: slurmd version 2.3.1 started
>> Nov 3 14:40:20 tditaller002 slurmd[8245]: slurmd started on Thu 03 Nov
>> 2011 14:40:20 +0100
>> Nov 3 14:40:20 tditaller002 slurmd[8245]: Procs=1 Sockets=1 Cores=1
>> Threads=1 Memory=7985 TmpDisk=1990 Uptime=98838
>
> FastSchedule=0 is the appropriate setting, but the slurmd is apparently
> only seeing one proc. Try running "scontrol show slurmd" on the compute
> node to confirm, and compare that against /proc/cpuinfo. If the cause is
> still not obvious, try turning up the SlurmdDebug level and looking for
> clues in a more verbose log file.
>
>> 2.-) CPU factor
>> In Torque we define a CPU factor: a way to normalize CPU time between
>> two different hosts (host A is fast, host B is slow, so 1 second on
>> host A equals 2 seconds on host B).
>>
>> Is this configurable in SLURM? What name do you use for it?
>
> The closest analog to this is the "Weight" setting in the slurm.conf file.
>
>> 3.-) Max node load
>> Can I configure a maximum load on a node? I.e., a node with 4 CPUs
>> will run 4 jobs, but if it reaches some load while running only 3, I'd
>> like SLURM NOT to send more jobs to that node.
>
> Here you should look at the "Shared" partition configuration setting in
> the slurm.conf man page. Shared=Yes:3 could be appropriate for this
> scenario.
>
>> 4.-) How is the file copy between client/server done (input/output)?
>> ssh? NFS? Is it configurable?
>
> The easiest mechanism for sharing files across login and compute nodes is
> a shared file system. However, if you need to push files around, look at
> the sbcast command.
>
>> Well, I think I've asked enough questions for my first mail :-)
>> Could anyone answer some (or all) of these questions? Could anyone send
>> me a link to presentations/wikis/extended docs?
>
> http://schedmd.com/slurmdocs/publications.html has some publications that
> are not bundled with the HTML pages included in the SLURM distribution.
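Pulling Don's answers to questions 1-3 together, a slurm.conf fragment might look roughly like the sketch below. This is only an illustration: the node names come from the thread, but the explicit CPU count, the Weight values, and the partition name are assumptions, not settings anyone in the thread actually posted.

```
# --- Sketch only; CPU counts, weights, and partition name are illustrative ---

# Q1: with FastSchedule=0 the slurmd reports the hardware it actually finds,
# but the expected resources can also be stated explicitly per node:
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
FastSchedule=0
NodeName=tditaller002.pic.es,tditaller005.pic.es CPUs=4 State=UNKNOWN

# Q2: "Weight" orders nodes for allocation -- lower-weight nodes are
# preferred -- so fast nodes get used first. Note it does not rescale
# accounted CPU time the way Torque's cpu_factor does.
#NodeName=fastnode[01-10] CPUs=4 Weight=1
#NodeName=slownode[01-10] CPUs=4 Weight=10

# Q3: Shared=YES:3 on the partition allows at most 3 jobs to share
# each resource, capping oversubscription per node:
PartitionName=main Nodes=tditaller002.pic.es,tditaller005.pic.es Shared=YES:3 Default=YES
```

Changing node or partition definitions requires an "scontrol reconfigure" (or a slurmctld restart) to take effect.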
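For question 4, sbcast runs inside an allocation and copies a file from the submit host to every node of the job, which avoids stressing a shared file system with input staging. A minimal batch-script sketch (the file names and program are hypothetical, and this of course requires a working SLURM installation):

```
#!/bin/sh
#SBATCH -N 2

# Hypothetical staging pattern: copy ./input.dat from the submit
# directory to node-local /tmp on every allocated node.
sbcast input.dat /tmp/input.dat

# Each task then reads its node-local copy; stdout is collected in
# the usual slurm-<jobid>.out file.
srun ./my_program /tmp/input.dat
```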
>
> Don
>
>> Many thanks in advance,
>> Cheers,
>> Arnau

--
Andrew Punnett <[email protected]>
Centre for Theoretical Chemistry and Physics (CTCP), Bldg. 40,
Massey University (Albany Campus), Private Bag 102 904,
Auckland 0745, NEW ZEALAND
Phone +64 (0)9 414 0800 ext. 9886
http://ctcp.massey.ac.nz/~punnett
