Hi All,

I haven't seen any postings on the list for about a month, except for my own. What is going on with the xcpu project?
I've reported a bunch of bugs, specifically with bjs, but have never heard back from anybody (Abhishek?). I've gone "production" (with trepidation) with my cluster, and these bugs are quite a nuisance.

Furthermore, yesterday two of my nodes crashed while running jobs; in fact, they appeared to be completely powered off. There were no warnings or apparent problems, but I am still investigating. One of the worst aspects is that bjs ends up in a weird state: it hangs rather than reporting that two fewer nodes are available, and it does not recover until xstat shows that all the nodes are back online.

Then there is the MPI problem...

Regards,
Daniel

p.s. IF we can get most of these issues resolved, and xstat manages to scale up significantly, then I would be very interested in testing xcpu on our new 4,000-node cluster, which should be available by the end of April. I'd love to be able to run xcpu on it (it would be quite an uphill battle, but if it is shown to work, it would be quite a coup)...
