Thanks Moe. That's exactly what we need.

Martin
From: Moe Jette <[email protected]>
To: "slurm-dev" <[email protected]>
Date: 07/05/2013 03:29 PM
Subject: [slurm-dev] Re: Limits on hostfile used with --distribution=arbitrary

Hi Martin,

The original logic could not process a hostlist larger than 64k bytes in length. The hostlist functions should be able to handle up to 64k nodes. The patch below will be in v2.6 and should handle a hostlist of any size:

https://github.com/SchedMD/slurm/commit/41ba94015036d34ab0cb32de07f2e42f39409ec7

Quoting [email protected]:

> A customer is having a problem using --distribution=arbitrary with a
> hostfile. Hosts can be represented in the hostfile either individually
> (node1,node2,...) or as a range or set of ranges between square brackets
> (node[1-10], node[20-30,50-100], ...). His job appears to fail after
> processing about 13090 hosts, with the following errors:
>
> srun: error: hostlist.c:1747 Too many ranges, can't process entire list: Invalid argument
>
> and then
>
> srun: error: Hostlist is too long for the allocate RPC!
>
> The constants defined in hostlist.h and hostlist.c suggest a limit much
> larger than 13090 hosts. Not sure where the RPC limit is defined. All
> the hosts in the job's hostfile are listed individually. There are no
> ranges. Does anyone know the following limits regarding the hostfile?
>
> Maximum number of hosts.
> Maximum number of host ranges.
> Maximum size of the hostfile in bytes.
>
> Thanks,
> Martin
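For readers unfamiliar with the bracketed range syntax described above (node[1-10], node[20-30,50-100]), here is a minimal sketch of how such an expression expands into individual host names. This is an illustration only, not Slurm's actual hostlist.c implementation, which additionally handles zero-padding, multiple bracketed segments, and comma-separated prefixes.

```python
import re

def expand_hostlist(expr):
    """Expand a simple hostlist expression such as "node[1-3,7]"
    into individual host names. Illustrative sketch only; Slurm's
    hostlist.c supports a richer grammar than this."""
    m = re.fullmatch(r"([^\[\]]+)\[([\d,\-]+)\]", expr)
    if not m:
        return [expr]  # plain host name with no bracketed ranges
    prefix, ranges = m.groups()
    hosts = []
    for part in ranges.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            hosts.extend(f"{prefix}{i}" for i in range(int(lo), int(hi) + 1))
        else:
            hosts.append(f"{prefix}{part}")
    return hosts
```

A hostfile listing every host individually, as in the customer's case, is simply the fully expanded form of such expressions, which is why its byte size grows much faster than the equivalent range notation.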
