Hi Eva,
I wasn't able to reproduce the problem with a quick test. Do you have
config lines similar to these?
NodeName=gpu-1-[4-17],gpu-2-[4,6-16],gpu-3-9 ...
PartitionName=... Nodes=gpu-1-[4-17],gpu-2-[4,6-16],gpu-3-9
Regards,
John
On 2013-07-10 19:20, Eva Hocks wrote:
>
> Thanks, John
>
> but this is what I have in the partition file:
>
> nodes=gpu-1-[4-17],gpu-2-[4,6-16],gpu-3-9
>
> slurm gets confused when it can't look up gpu-2-4 and then splits the
> gpu-2-[4,6-16] into gpu-[2]-[4] (failed lookup) and 6-16] (which is
> actually no node name at all but a wrong parsing after the failure)
>
> Thanks
> Eva
>
> On Wed, 10 Jul 2013, John Thiltges wrote:
>
>> On 07/10/2013 06:16 PM, Eva Hocks wrote:
>>> The entry in partition.conf:
>>> PartitionName=CLUSTER Default=yes State=UP
>>> nodes=gpu-[1]-[4-17],gpu-[2]-[4,6-16],gpu-[3]-[9]
>>> causes slurmctld to crash:
>>> [2013-07-10T16:03:22.923] error: find_node_record: lookup failure for
>>> gpu-[2]-[4]
>>> [2013-07-10T16:03:22.923] error: node_name2bitmap: invalid node specified
>>> gpu-[2]-[4]
>>> [2013-07-10T16:03:22.923] error: find_node_record: lookup failure for 6-16]
>>> [2013-07-10T16:03:22.923] error: node_name2bitmap: invalid node specified
>>> 6-16]
>>> [2013-07-10T16:03:22.923] fatal: Invalid node names in partition CLUSTER
>> It looks like the hostlist parser is confused by the brackets, finding
>> names of "6-16]" and "gpu-[2]-[4]".
>> Brackets are only needed when there is a range. If you take out the
>> extra brackets, it should parse OK:
>> nodes=gpu-1-[4-17],gpu-2-[4,6-16],gpu-3-9
>> Regards,
>> John
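For anyone following the thread, here is a rough sketch of how a bracketed
hostlist such as the one above expands into individual node names. It is
not Slurm's parser: the function names (split_top_level, expand_item,
expand_hostlist) are made up for illustration, and it assumes at most one
bracket group per comma-separated item. The point it shows is that commas
inside brackets must not be treated as item separators.

```python
import re


def split_top_level(hostlist):
    """Split on commas that are not inside square brackets."""
    parts, depth, current = [], 0, []
    for ch in hostlist:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


def expand_item(item):
    """Expand one item that has at most one bracket group (an assumption)."""
    m = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", item)
    if m is None:
        if "[" in item or "]" in item:
            # This sketch does not handle several bracket groups per item.
            raise ValueError(f"unsupported bracket layout: {item!r}")
        return [item]  # plain name such as gpu-3-9
    prefix, body, suffix = m.groups()
    names = []
    for piece in body.split(","):
        lo, _, hi = piece.partition("-")
        hi = hi or lo            # a single number is a range of one
        width = len(lo)          # preserve zero padding such as [01-04]
        for n in range(int(lo), int(hi) + 1):
            names.append(f"{prefix}{n:0{width}d}{suffix}")
    return names


def expand_hostlist(hostlist):
    """Expand a comma-separated hostlist into a flat list of names."""
    names = []
    for item in split_top_level(hostlist):
        names.extend(expand_item(item))
    return names


if __name__ == "__main__":
    for name in expand_hostlist("gpu-1-[4-17],gpu-2-[4,6-16],gpu-3-9"):
        print(name)
```

On a machine with Slurm installed, "scontrol show hostnames" followed by
the hostlist expression should print the expansion Slurm itself computes,
which is the authoritative check.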