Hi all,

I've been running several OSCAR clusters for over six months now, and
they have worked almost seamlessly. However...

We've just had a fairly serious power outage here, which took down one
of the clusters in the middle of processing. Since it came back up I've
been having trouble with the PBS queue. The symptoms are as follows:

- Running showq reports 16 free nodes (there are 16 compute nodes in
total, plus the master node).
- Running pbsnodes -l returns nothing, i.e. there are no marked nodes.
- Running pbsnodes -a reports that all of the nodes are up and running,
and that all of them are free.
- Running the OSCAR test script (from the installation GUI), the
following tests fail:
   - PBS HDF5 Test
   - Checking for 16 free nodes
   Both of these tests fail with the error message: "Not enough free
nodes. Tests incomplete."
- The MPICH over PBS tests pass.
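
As a cross-check on the showq and pbsnodes figures above, the free nodes
can be counted directly from the pbsnodes -a output. This is only a
minimal sketch: the function name count_free_nodes is hypothetical, and
it assumes each node record contains a line of the form "state = free",
which may differ between PBS versions.

```shell
#!/bin/sh
# Count nodes whose state is exactly "free" in `pbsnodes -a` style output.
# Assumption: each node record has a line of the form "state = free".
# Nodes with any other state (e.g. "down", "job-exclusive") are ignored.
count_free_nodes() {
    awk '$1 == "state" && $2 == "=" && $3 == "free" { n++ } END { print n + 0 }'
}

# Usage on the master node (requires the PBS client tools):
#   pbsnodes -a | count_free_nodes
```

If the number printed matches what showq reports but the test script
still complains, the disagreement is more likely in how the test counts
nodes than in the node states themselves.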

To summarise: for some reason the PBS tests fail, yet the MPICH tests,
which run over PBS, pass. I'm not really sure where to start looking
for the cause of this problem.

Any help you can give me would be much appreciated. Thanks for your
time,

Rich

--------------------------------
Richard Bruin
PhD Student
Department of Earth Sciences
University of Cambridge


