Hi, I just upgraded an older cluster to SGE6.2u5, and afterwards none of the nodes could run parallel jobs any longer: they always got a SIGABRT in PE start. This has come up on the list a couple of times, so I am posting my findings here. Before the upgrade the cluster ran SGE6.1u3 without any problems.
The OS on the nodes was openSUSE 11.3 with kernel 2.6.34.7-0.5-default. As I had never seen this behavior myself before, I updated the OS to the latest available patches for openSUSE 11.3, which changed the kernel to 2.6.34.10-0.4-default, and guess what: it's working again.

I don't have the source for SGE6.1u3, but to me it looks like:

- Some versions of Linux will send a SIGABRT for unknown reasons when the shepherd starts
- SGE6.1u3 ignored the signal in the shepherd

-- Reuti
