What version of OMPI are you using?

On Feb 27, 2014, at 2:54 PM, Sten Wolf <[email protected]> wrote:
> Hi,
>
> I have seen this in several different clusters running different apps: users
> can start a task directly with openmpi and it will perform reasonably, but when
> started through slurm it either runs very slowly or complains and dies.
>
> Most recently, jobs died because of the open-files limit. However, both the hard
> and soft limits are high enough on the nodes (the openmpi job works), and
> slurmd is set to that same limit (with ulimit -n in /etc/sysconfig/slurm).
> The propagate parameter was tried but made no difference, either from the
> command line or in slurm.conf (by default all limits should already be
> propagated).
>
> The slurm version itself is somewhat old (I think 2.4.5) but can't simply be
> upgraded (any changes to the cluster require a review process), so answers in
> the form of "upgrade to the latest and see if it still happens" might not be very
> helpful.
>
> I'll have more data (including access to logs) next week, but for now,
> can anyone guess what might be going on?
