Michel, FYI, I just tested this on 2.4.0-pre3. ALLOCATE_FULL_SOCKET seems to be working on that version as documented.
SelectType=select/cons_res
SelectTypeParameters=CR_Socket
TaskPlugin=task/affinity
TaskPluginParam=sched

WITH ALLOCATE_FULL_SOCKET = 0 (allocate only the requested number of CPUs):

[sulu] (slurm) etc> srun -n1 -c1 --cpu_bind=verbose,sockets ...
[2] 2629
[sulu] (slurm) etc> cpu_bind=MASK - bones, task 0 0 [26337]: mask 0x1 set

WITH ALLOCATE_FULL_SOCKET = 1 (allocate all CPUs on the consumed sockets):

[sulu] (slurm) etc> srun -n1 -c1 --cpu_bind=verbose,sockets ...
[2] 9074
[sulu] (slurm) etc> cpu_bind=MASK - bones, task 0 0 [26466]: mask 0x55 set

Martin Perry
Bull Phoenix


[slurm-dev] Re: select/cons_res ALLOCATE_FULL_SOCKET
From: Michel Bourget
To: slurm-dev
Date: 02/03/2012 07:21 PM

On 02/03/2012 08:15 PM, Moe Jette wrote:
> I do believe that it works as described in the comments, although it
> hasn't been tested in a while, so verify that it works as desired. It
> was added after the original code was developed, which is why both
> CR_SOCKET and CR_CORE are set (it accomplishes the desired
> functionality with minimal code changes).
>

Hi Moe, thanks for your feedback.

Well, by default CR_CORE and CR_SOCKET both lead to alloc_cores=true, according to the _cyclic_sync_core_bitmap/_block_sync_core_bitmap distribution code. That is a little confusing, to be honest.

Anyway, I tried ALLOCATE_FULL_SOCKET=1 and didn't notice any change when CR_Socket is selected in SelectTypeParameters. IOW, for Procs=32 Sockets=4 CoresPerSocket=4 ThreadsPerCore=2, using N=1 n=1 c=5, I still obtain the mask 0x70007000, whereas I'd expect 0xF000F000. That is, I'd expect ALLOCATE_FULL_SOCKET to take effect. Hence my initial intuition that something is wrong or incomplete. Or I totally missed it; that could also be true. (Btw, Slurm 2.2.3.)

Since memory locality (ldoms) and socket memory zones are tightly coupled, at least on SGI UV systems, using "--cpu_bind=ldoms" seems like a workaround (WAR).
With --cpu_bind=ldoms, I get the expected mask, 0xF000F000.

    if (cr_type & CR_CORE)
        alloc_cores = true;
#ifdef ALLOCATE_FULL_SOCKET
    if (cr_type & CR_SOCKET)
        alloc_sockets = true;
#else
    if (cr_type & CR_SOCKET)
        alloc_cores = true;
#endif

At any rate, the task/affinity plugin always gets the same input mask from the srun credentials:

slurmd: Cred: job_core_bitmap 12-14

Unless I am really wrong, the above should be 12-15 when ALLOCATE_FULL_SOCKET is defined as 1.

slurmd: Cred: step_core_bitmap 12-14
slurmd: Cred: job_nhosts 1
slurmd: debug3: task/affinity: slurmctld s 4 c 4; hw s 4 c 4 t 2
slurmd: debug3: task/affinity: job 816.0 CPU mask from slurmctld: 0x7000
slurmd: debug3: task/affinity: job 816.0 CPU final mask for local node: 0x3F000000
slurmd: debug: task affinity : after lllp distribution cpu bind method is 'verbose,mask_cpu' (0x70007000)

What am I missing?

> #if(0)
> /* Using CR_SOCKET or CR_SOCKET_MEMORY will not allocate a socket to more
>  * than one job at a time, but it also will not grant a job access to more
>  * CPUs on the socket than requested. If ALLOCATE_FULL_SOCKET is defined,
>  * then a job will be given access to every core on each allocated socket.
>  */
> #define ALLOCATE_FULL_SOCKET 1
> #endif
>
> Quoting Michel Bourget:
>
>> Hi,
>>
>> given the description in the source, I am considering enabling this
>> feature. But it seems incomplete. Am I missing something? What
>> puzzles me is that CR_SOCKET and CR_CORE seem to be "equal" when
>> ALLOCATE_FULL_SOCKET is disabled. What is the rationale there? A todo?
>>
>> TIA
>>
>> --
>>
>> -----------------------------------------------------------
>> Michel Bourget - SGI - Linux Software Engineering
>> "Past BIOS POST, everything else is extra" (travis)
>> -----------------------------------------------------------
