Hi Daniel,

"Daniel Gruner" <[email protected]> writes:

> I haven't seen any postings on the list for about a month - well,
> except for my own. What is going on with the xcpu project?
>
> I've reported a bunch of bugs, specifically with bjs, but never heard
> anything from anybody (Abhishek?). I've gone "production" (with
> trepidation) with my cluster, and these bugs are quite a nuisance.
> Furthermore, yesterday two of my nodes crashed - in fact appeared to
> be completely powered off - while running stuff. No warnings or
> apparent problems, but I am still investigating. One of the worst
> issues with that is that bjs ends up in a weird state, hanging up
> rather than reporting that there are two less nodes available, and it
> does not recover until xstat shows that all the nodes are back online.
>
> Then there is the MPI problem...

I'm still here. Not much use to you, I know. I haven't had time to do any further debugging because term has started and I'm running to keep up. Soon, I hope.

Cheers,
Roger
