[slurm-dev] openmpi misbehaves when started under slurm

Sten Wolf Thu, 27 Feb 2014 14:54:26 -0800

Hi,

I have seen this on several different clusters running different apps: users can start a task directly with openmpi and it performs reasonably, but when started through slurm it either runs very slowly, or complains and dies.

Most recently, jobs died because they hit the number of open files limit. However, both the hard and soft limits are high enough on the nodes (the openmpi job works when started directly), and slurmd is set to that same limit (with ulimit -n in /etc/sysconfig/slurm). The propagate parameter was tried but made no difference, either from the command line or in slurm.conf (by default all limits should already be propagated).

The slurm version itself is somewhat old (I think 2.4.5) but can't simply be upgraded (any changes to the cluster require a review process), so answers of the form "upgrade to latest and see if it still exists" might not be very helpful.

I'll have more data (including access to logs) during next week, but for now, can anyone make a guess as to what might be going on?
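One way to narrow down the open-files question is to print the limit that a process actually sees, once when started directly and once when started through slurm. The following is only a minimal sketch using Python's standard resource module; the function names and the script name check_nofile.py are hypothetical, and it assumes Python 3 is available on the compute nodes.

```python
import resource
import socket


def nofile_limits():
    """Return the (soft, hard) open-files limit of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def format_limit(value):
    """Render RLIM_INFINITY as 'unlimited', anything else as a number."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    # One line per process, tagged with the host name so output from
    # several nodes can be told apart.
    print(f"{socket.gethostname()}: nofile "
          f"soft={format_limit(soft)} hard={format_limit(hard)}")
```

Running it as "python3 check_nofile.py" on a node and then as "srun python3 check_nofile.py" should show whether the two launch paths really end up with the same limit. If the numbers differ, the limit is being changed somewhere between slurmd and the task; if they match, the open-files error probably has another cause.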
