Martin,

Thank you for reporting the bug and submitting a patch.  It will be in the 
v2.2.3 release.

Don

From: [email protected] [mailto:[email protected]] On 
Behalf Of [email protected]
Sent: Friday, February 25, 2011 9:15 AM
To: [email protected]
Cc: [email protected]; [email protected]
Subject: [slurm-dev] Re: Slurmd abort when using task affinity with plane distribution


There is an error in the patch I sent yesterday for this problem.  The attached 
patch is a corrected version.
Regards,
Martin

Martin Perry/US/BULL
02/24/2011 02:04 PM

To: [email protected]
cc: [email protected], "[email protected]" <[email protected]>
Subject: Slurmd abort when using task affinity with plane distribution

Slurmd may abort when using task affinity with the plane distribution method.  
I think the problem is in the function _task_layout_plane in 
src/common/slurm_step_layout.c, which does not support heterogeneous 
allocations of CPUs across nodes (that is, allocations in which the nodes 
contribute different numbers of CPUs).  The following example illustrates the 
problem:

slurm.conf settings:
SelectType=select/cons_res
SelectTypeParameters=CR_Core
TaskPlugin=task/affinity
TaskPluginParam=sched,cores

command:
srun -p bones-chekov-scotty -N 3-3 -n 6 -l -m plane=2 hostname | sort

In this example, Slurm allocates 4 cores from one node and 1 core each from the 
other two nodes (block allocation method).  But _task_layout_plane distributes 
2 tasks to each node, even though two of the nodes have only 1 allocated core.  
When task affinity detects this condition, it aborts slurmd with the following 
error (from the slurmd log):
"error: task/affinity: only 1 bits in avail_map for 2 tasks!"
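To make the failure mode concrete, here is a minimal Python model of a plane 
layout that ignores the allocation: it hands out tasks to the nodes in blocks of 
`plane` tasks, round-robin, without checking how many CPUs each node was 
allocated.  This is a simplified illustration, not the actual 
_task_layout_plane code; the function name `naive_plane_layout` and the node 
order (scotty, chekov, bones) are assumptions made for the sketch.

```python
def naive_plane_layout(nodes, num_tasks, plane):
    """Hypothetical model: assign tasks round-robin in blocks of `plane`,
    ignoring how many CPUs each node was actually allocated."""
    layout = {node: [] for node in nodes}
    task = 0
    while task < num_tasks:
        for node in nodes:
            for _ in range(plane):
                if task >= num_tasks:
                    break
                layout[node].append(task)
                task += 1
    return layout


# Allocation from the example: 4 cores on scotty, 1 core each on chekov and bones.
cpus = {"scotty": 4, "chekov": 1, "bones": 1}
layout = naive_plane_layout(["scotty", "chekov", "bones"], num_tasks=6, plane=2)
for node, tasks in layout.items():
    status = "OK" if len(tasks) <= cpus[node] else "too many tasks"
    print(f"{node}: tasks {tasks}, cpus {cpus[node]} -> {status}")
```

Every node receives 2 tasks, so chekov and bones each end up with more tasks 
than allocated cores, which is the condition task/affinity reports.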

The attached patch fixes the problem for slurm version 2.2.1 by modifying 
_task_layout_plane to take the allocation into account when distributing tasks 
across nodes.  Here is the same example after the patch has been applied, 
showing that the job runs successfully and the tasks have been correctly 
distributed in accordance with the block allocation and plane=2 distribution:

[sulu] (slurm) mnp> srun -p bones-chekov-scotty -N 3-3 -n 6 -l -m plane=2 hostname | sort
0: scotty
1: scotty
2: chekov
3: bones
4: scotty
5: scotty
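One way to model the allocation-aware behaviour is a plane layout that still 
walks the nodes round-robin in blocks of up to `plane` tasks, but caps each 
block at the CPUs the node has left.  This is a hypothetical Python sketch, not 
the patched C code; the name `plane_layout`, the node order, and the rule of 
rejecting more tasks than allocated CPUs (no overcommit) are assumptions.

```python
def plane_layout(nodes, cpus, num_tasks, plane):
    """Hypothetical model: assign tasks round-robin in blocks of up to
    `plane`, never exceeding a node's allocated CPU count."""
    if num_tasks > sum(cpus[node] for node in nodes):
        # Assumption: this model does not handle overcommit.
        raise ValueError("more tasks than allocated CPUs")
    layout = {node: [] for node in nodes}
    task = 0
    while task < num_tasks:
        for node in nodes:
            # Room left on this node, capped at one plane per pass.
            room = min(plane, cpus[node] - len(layout[node]))
            for _ in range(room):
                if task >= num_tasks:
                    break
                layout[node].append(task)
                task += 1
    return layout


# Allocation from the example: 4 cores on scotty, 1 core each on chekov and bones.
cpus = {"scotty": 4, "chekov": 1, "bones": 1}
layout = plane_layout(["scotty", "chekov", "bones"], cpus, num_tasks=6, plane=2)

# Print "task: node" lines, like `srun -l ... hostname | sort`.
owner = {t: node for node, tasks in layout.items() for t in tasks}
for t in sorted(owner):
    print(f"{t}: {owner[t]}")
```

With this allocation the model produces the same task-to-node mapping as the 
srun output above: scotty gets tasks 0-1 on the first pass and 4-5 on the 
second, while chekov and bones each get a single task.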


Regards,
Martin

