Do you have any limits (per partition or group), QoS settings (with per-user 
limits), etc. configured?



On 19.12.2016 at 15:52, Wiegand, Paul wrote:
> Greetings,
> 
> 
> We were running Slurm 16.05.0 and just upgraded to 16.05.7 during our Fall 
> maintenance cycle, along with other changes. 
> Now we are having a very strange problem:
> 
> 
> * When a regular user requests 6 or fewer nodes, the request succeeds without 
> issue;
> 
> 
> * When a regular user requests 7 or more nodes, the request fails with the following error:
> 
> 
> hostlist.c:1007: hostrange shift: malloc failed
> Aborted (core dumped)
> 
> 
> * It doesn't matter which nodes are being requested ... I've verified that I 
> *can* allocate across any node ... just not more than 6;
> 
> 
> * The root user can request any number of nodes;
> 
> 
> 
> I should say that I do *not* believe this is due to the upgrade per se.  
> There were other changes made during downtime,
> including to how groups are named and handled.  I expect it to be a 
> permissions problem somewhere, but the logs are not
> helpful.  Indeed, I rolled back to 16.05.0 and still get this error.
> 
> 
> Any help on where to look would be appreciated.
> 
> 
> Thanks,
> 
> Paul.
> 
> 
