On 02/09/2012 06:17 PM, [email protected] wrote:
>
> Michel,
> FYI, I just tested this on 2.4.0-pre3. ALLOCATE_FULL_SOCKET seems to
> be working on that version as documented.
>
> SelectType=select/cons_res
> SelectTypeParameters=CR_Socket
> TaskPlugin=task/affinity
> TaskPluginParam=sched
But I use cpusets; that should not make a difference.
>
> WITH ALLOCATE_FULL_SOCKET = 0 (allocate only the requested number of
> cpus):
>
> [sulu] (slurm) etc> srun -n1 -c1 --cpu_bind=verbose,sockets ...
> [2] 2629
> [sulu] (slurm) etc> cpu_bind=MASK - bones, task 0 0 [26337]: mask 0x1 set
>
> WITH ALLOCATE_FULL_SOCKET = 1 (allocate all cpus on the consumed
> sockets):
>
> [sulu] (slurm) etc> srun -n1 -c1 --cpu_bind=verbose,sockets ...
> [2] 9074
> [sulu] (slurm) etc> cpu_bind=MASK - bones, task 0 0 [26466]: mask 0x55 set
>
Hi Martin,
thanks for the feedback.
I'd like to see the NUMA cpumap output so that I can trust that mask. I wonder
whether you tried, for example, -n1 -c2. I get strange results: we have four
quad-core sockets, and I obtain the following masks (always using -n1):
                             11 1111 1111 2222 2222 2233
mask        -c   0123 4567 8901 2345 6789 0123 4567 8901
0xf000f000   1   0000 0000 0000 1111 0000 0000 0000 1111
0xf000f000   2   0000 0000 0000 1111 0000 0000 0000 1111
0x70007000   3   0000 0000 0000 1110 0000 0000 0000 1110   uh?
0xf000f000   4   0000 0000 0000 1111 0000 0000 0000 1111
0x70007000   5   0000 0000 0000 1110 0000 0000 0000 1110   uh?
0xf000f000   7   0000 0000 0000 1111 0000 0000 0000 1111
0xf000f000   8   0000 0000 0000 1111 0000 0000 0000 1111
0x3f003f00   9   0000 0000 1111 1100 0000 0000 1111 1100   uh?
Don't we agree this is wrong? Do you see the same odd behavior on your end?
Btw, this is 2.3.3. Same for you?
See you,
> Martin Perry
> Bull Phoenix
>
> *[slurm-dev] Re: select/cons_res ALLOCATE_FULL_SOCKET*
>
> *Michel Bourget* to: slurm-dev
> 02/03/2012 07:21 PM
>
> From: Michel Bourget <[email protected]>
> To: "slurm-dev" <[email protected]>
> *Please respond to "slurm-dev" <[email protected]>*
>
> On 02/03/2012 08:15 PM, Moe Jette wrote:
> > I do believe that it works as described in the comments, although it
> > hasn't been tested in a while, so please verify that it works as
> > desired. It was added after the original code was developed, which is
> > why both CR_SOCKET and CR_CORE are set (it accomplishes the desired
> > functionality with minimal code changes).
> >
>
> Hi Moe,
>
> thanks for your feedback.
>
> Well, by default CR_CORE == CR_SOCKET --> alloc_cores=true, as seen in the
> _cyclic_sync_core_bitmap/_block_sync_core_bitmap distribution code. That
> is a little confusing, to be honest. Anyway, I tried ALLOCATE_FULL_SOCKET=1
> and didn't notice any change when the CR_Socket SelectTypeParameters value
> is selected. In other words, the mask I still obtain for
>
> Procs=32 Sockets=4 CoresPerSocket=4 ThreadsPerCore=2
>
> using N=1 n=1 c=5 is 0x70007000, whereas I'd expect 0xF000F000, i.e. I'd
> expect ALLOCATE_FULL_SOCKET to take effect. Hence my initial intuition
> that something is wrong or incomplete. Or I totally missed it; that could
> also be true.
> (Btw, slurm 2.2.3.)
>
> Since memory locality domains (ldoms) and socket memory zones are closely
> tied, at least on SGI UV systems, using "--cpu_bind=ldoms" seems like a
> workaround (WAR): with it, I get the expected mask, 0xF000F000.
>
> if (cr_type & CR_CORE)
>         alloc_cores = true;
> #ifdef ALLOCATE_FULL_SOCKET
> if (cr_type & CR_SOCKET)
>         alloc_sockets = true;
> #else
> if (cr_type & CR_SOCKET)
>         alloc_cores = true;
> #endif
>
>
> At any rate, task/affinity always gets the same input mask from the srun
> credential:
> slurmd: Cred: job_core_bitmap 12-14
>
> Unless I am badly mistaken, the above should be 12-15 when
> ALLOCATE_FULL_SOCKET is defined as 1.
>
> slurmd: Cred: step_core_bitmap 12-14
> slurmd: Cred: job_nhosts 1
> slurmd: debug3: task/affinity: slurmctld s 4 c 4; hw s 4 c 4 t 2
> slurmd: debug3: task/affinity: job 816.0 CPU mask from slurmctld: 0x7000
> slurmd: debug3: task/affinity: job 816.0 CPU final mask for local node: 0x3F000000
> slurmd: debug: task affinity : after lllp distribution cpu bind method is 'verbose,mask_cpu' (0x70007000)
>
> What am I missing?
>
> > #if(0)
> > /* Using CR_SOCKET or CR_SOCKET_MEMORY will not allocate a socket to more
> >  * than one job at a time, but it also will not grant a job access to more
> >  * CPUs on the socket than requested. If ALLOCATE_FULL_SOCKET is defined,
> >  * then a job will be given access to every cores on each allocated socket.
> >  */
> > #define ALLOCATE_FULL_SOCKET 1
> > #endif
> >
> > Quoting Michel Bourget <[email protected]>:
> >
> >> Hi,
> >>
> >> Given the description in the source, I am considering enabling this
> >> feature, but it seems incomplete. Am I missing something? What puzzles
> >> me is that CR_SOCKET and CR_CORE seem to be "equal" when
> >> ALLOCATE_FULL_SOCKET is disabled. What is the rationale there? A TODO?
> >>
> >> Thanks in advance
> >>
> >> --
> >>
> >> -----------------------------------------------------------
> >> Michel Bourget - SGI - Linux Software Engineering
> >> "Past BIOS POST, everything else is extra" (travis)
> >> -----------------------------------------------------------