What version of OMPI are you using?

On Feb 27, 2014, at 2:54 PM, Sten Wolf <[email protected]> wrote:

> 
> Hi,
> I have seen this on several different clusters running different apps: users 
> can start a task directly with Open MPI and it performs reasonably, but when 
> started through Slurm it either runs very slowly or complains and dies.
> Most recently, jobs died because of the open-files limit, even though both 
> the hard and soft limits are high enough on the nodes (the Open MPI job works) 
> and slurmd is set to that same limit (with ulimit -n in /etc/sysconfig/slurm). 
> The propagate parameter was tried, but it made no difference either from the 
> command line or in slurm.conf (by default all limits should already be 
> propagated).
> The slurm version itself is somewhat old (I think 2.4.5) but can't simply be 
> upgraded (any changes to the cluster require a review process), so answers in 
> the form of "upgrade to latest and see if it still exists" might not be very 
> helpful.
> I'll have more data (including access to logs) during next week, but for now 
> - can anyone make a guess as to what might be going on?
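
One way to narrow down the open-files problem described above is to print the
limit that a process actually sees, once when started directly and once when
started through Slurm. The following is only a minimal sketch: the script name
check_nofile.py and the helper names are made up for illustration, and it
assumes Python is available on the compute nodes.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-files limit of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    # Print both limits so the direct run and the Slurm run can be compared.
    print("nofile soft=%s hard=%s" % (describe(soft), describe(hard)))
```

Running "python check_nofile.py" from a login shell and then
"srun python check_nofile.py" should show whether the two launch paths end up
with different limits. If the numbers differ, the limit is presumably being
lowered somewhere between slurmd and the task rather than in the application.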
