Noam Bernstein <noam.bernst...@nrl.navy.mil> writes:

> We specifically switched to 1.7.3 because of a bug in 1.6.4 (lock up in some 
> collective communication), but now I'm wondering whether I should just test
> 1.6.5.

What bug, exactly?  As you mentioned vasp, is it specifically affecting
that?

We have seen apparent deadlocks with vasp -- which users assure me are
due to malfunctioning hardware and/or batch system -- but I don't think
there was any evidence of them being due to openmpi (1.4 and 1.6 on
different systems here).  I didn't have the padb --deadlock mode working
properly at the time I looked at one, but it seemed just to be stuck
with some ranks in broadcast and the rest in barrier.  Someone else put
a parallel debugger on it, but I'm not sure if there was a conclusive
result, and I'm not very interested in debugging proprietary programs.
