On 19/01/15 19:46, Uwe Sauter wrote:

> yes, going back to Scientific 6.5 make the problem disappear. But due to
> our setup here I cannot run an all-up-to-date Scientific 6.6 with the
> 6.5 kernel.

In that case it sounds very like (yet another) bug in the RHEL 6.6
kernel (we've hit a few with the Mellanox drivers, for instance).

In your position I'd drop back to 6.5 if I couldn't run an older
kernel (we already run the RHEL 6.2 kernel on our RHEL 6.4 BG/Q
service node Power7 LPAR due to bugs in the 6.4 kernel that affect
bonded IB performance with TSM, and the 6.3 kernel panics when we
boot our 4 racks at once).

Could I suggest perhaps trying the Beowulf list for this? It might
be a better forum for general Linux distro and kernel problems in HPC:

http://beowulf.org/

Caveat: I run the Beowulf list these days.

cheers,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: [email protected] Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci
