Noam Bernstein <noam.bernst...@nrl.navy.mil> writes:

> We specifically switched to 1.7.3 because of a bug in 1.6.4 (lock up in some
> collective communication), but now I'm wondering whether I should just test
> 1.6.5.

What bug, exactly? Since you mentioned VASP, is it specifically affecting that?

We have seen apparent deadlocks with VASP -- which users assure me are due to malfunctioning hardware and/or the batch system -- but I don't think there was any evidence of them being due to Open MPI (1.4 and 1.6 on different systems here). I didn't have the padb --deadlock mode working properly at the time I looked at one, but it seemed just to be stuck with some ranks in broadcast and the rest in barrier. Someone else put a parallel debugger on it, but I'm not sure whether there was a conclusive result, and I'm not very interested in debugging proprietary programs.