Hi,

I just upgraded an older cluster to SGE6.2u5, and afterwards none of the nodes 
could run parallel jobs any longer: they always got a SIGABRT in PE start. This 
has come up on the list a couple of times, so I am posting my findings here. 
Before the upgrade the cluster ran SGE6.1u3 without any problems.

The OS on the nodes was openSUSE 11.3 with kernel 2.6.34.7-0.5-default. As I 
had never seen this behavior myself before, I upgraded the OS to the latest 
available patches of openSUSE 11.3, which changed the kernel to 
2.6.34.10-0.4-default, and guess what: it's working again.

I don't have the source for SGE6.1u3 but to me it looks like:

- Some versions of Linux will send a SIGABRT for unknown reasons when the 
shepherd starts
- SGE6.1u3 ignored the signal in the shepherd
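
To illustrate what "ignoring the signal" would mean, here is a small Python 
sketch. It is not SGE code and says nothing about how the shepherd is really 
implemented; the function name survives_sigabrt is made up for this example. 
It forks a child, optionally sets SIGABRT to be ignored, sends the child a 
SIGABRT, and reports whether the child still exited normally.

```python
import os
import signal


def survives_sigabrt(ignore: bool) -> bool:
    """Return True if a child process exits normally after getting SIGABRT.

    Hypothetical helper for illustration only; unrelated to SGE's shepherd.
    """
    pid = os.fork()
    if pid == 0:
        # Child: optionally discard SIGABRT instead of using the default
        # action (terminate the process, possibly with a core dump).
        if ignore:
            signal.signal(signal.SIGABRT, signal.SIG_IGN)
        os.kill(os.getpid(), signal.SIGABRT)
        # Only reached if the signal did not terminate the child.
        os._exit(0)
    # Parent: wait for the child and inspect how it ended.
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    print("ignoring SIGABRT:", survives_sigabrt(True))
    print("default action:  ", survives_sigabrt(False))
```

If the older shepherd did something along the lines of the ignore=True case, a 
stray SIGABRT would simply go unnoticed, which would fit what I saw. This is 
only a guess on my side.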

-- Reuti
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
