A customer is having a problem using --distribution=arbitrary with a hostfile. Hosts can be listed in the hostfile either individually (node1,node2,...) or as a range or set of ranges in square brackets (node[1-10], node[20-30,50-100], ...). His job appears to fail after processing about 13090 hosts, with the following error:

srun: error: hostlist.c:1747 Too many ranges, can't process entire list: Invalid argument

and then:

srun: error: Hostlist is too long for the allocate RPC!
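
To illustrate the two hostfile notations, here is a minimal Python sketch that collapses runs of consecutively numbered hosts into bracketed ranges while keeping the original order and any repeated entries, since order matters for an arbitrary distribution. The names compress_hosts and _format_run are hypothetical helpers, not part of Slurm, and whether a range-compressed hostfile avoids the errors above is an untested assumption.

```python
import re

# Split a host name into a non-numeric prefix and a trailing number.
_HOST_RE = re.compile(r"^(.*?)(\d+)$")


def _format_run(prefix, first, last):
    """Render one run as 'prefixN' or 'prefix[N-M]'."""
    if first == last:
        return f"{prefix}{first}"
    return f"{prefix}[{first}-{last}]"


def compress_hosts(hosts):
    """Collapse consecutive runs of sequentially numbered hosts.

    Order and repeats are preserved: only neighbours in the input list
    that share a prefix, have the same digit width, and increase by
    exactly one are merged into a range.
    """
    out = []
    run = None  # [prefix, first_number_str, last_number_str]
    for host in hosts:
        m = _HOST_RE.match(host)
        if m is None:
            # No trailing number: flush any open run, keep the name as-is.
            if run is not None:
                out.append(_format_run(*run))
                run = None
            out.append(host)
            continue
        prefix, num = m.group(1), m.group(2)
        if (run is not None and run[0] == prefix
                and len(num) == len(run[2])
                and int(num) == int(run[2]) + 1):
            run[2] = num  # extend the current run
        else:
            if run is not None:
                out.append(_format_run(*run))
            run = [prefix, num, num]  # start a new run
    if run is not None:
        out.append(_format_run(*run))
    return out


if __name__ == "__main__":
    print(",".join(compress_hosts(["node1", "node2", "node3", "node7"])))
```

The sketch is deliberately conservative: it does not merge across a change in digit width (node9, node10), and it never reorders hosts.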
The constants defined in hostlist.h and hostlist.c suggest a limit much larger than 13090 hosts. I am not sure where the RPC limit is defined. All the hosts in the job's hostfile are listed individually; there are no ranges.

Does anyone know the following limits for the hostfile?

- Maximum number of hosts
- Maximum number of host ranges
- Maximum size of the hostfile in bytes

Thanks,
Martin
