The code works fine on the student cluster, where we compiled it with the extra '-assu buff' flag to alleviate NFS problems. The optimisation jobs appear to run well there.
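For reference, a minimal sketch of how that flag might be applied when recompiling, assuming the Intel Fortran compiler ('-assu buff' is the abbreviated form of '-assume buffered_io'); the $WIENROOT/OPTIONS file and FOPT variable names below follow a typical siteconfig_lapw setup and may differ on other installations:

    # Sketch: add buffered I/O to the WIEN2k Fortran options, then recompile.
    # Assumes Intel Fortran; '-assu buff' is shorthand for '-assume buffered_io'.
    # File and variable names follow a typical siteconfig setup (assumption).
    cd $WIENROOT
    grep FOPT OPTIONS          # inspect the current Fortran options, e.g.
                               #   current:FOPT:-FR -mp1 -w -O1 -assume buffered_io
    ./siteconfig_lapw          # edit the compiler options, then recompile all programs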
There are still problems with the optimisation jobs running on the planck cluster, though. There seem to be two main problems:

1) One node appears to be doing nearly all the work while the others do little, as seen in the dayfile. However, if we log in to the job nodes while the job is running and run the top command, all cores seem to be at ~100% CPU and the load is normal. Also, 'cat /proc/meminfo' shows there is plenty of free memory (there should be, as these nodes each have 32 GB RAM).

2) After some time it becomes impossible to log in to your home directory, and I cannot even log in to the job node from the console on the machine. I also cannot delete the job from the queues. This means I then have to turn the queuing system off (qterm -t quick), remove the job files from the jobs directory (rm -rf /usr/local/torque/server_priv/jobs/2627.planck.*) and then restart the queuing system (/usr/local/torque/sbin/pbs_server -t warm). I then still have to reboot the nodes that were involved with that job. This is a problem (a consolidated sketch of this recovery sequence follows below).

The main difference between the planck cluster and the student cluster is that the planck cluster has a GPFS parallel file system and does not use NFS (well, actually, GPFS uses something like NFS). The problems we were seeing on the student cluster disappeared when we recompiled with the extra '-assu buff' flag. I am recompiling wien2k on the planck cluster with this flag, but it does not fix the problems.

Other than that, both machines are running the RHEL 5.3 operating system, and openmpi and fftw have been compiled the same way, as has wien2k.

Thanks
Said
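A minimal sketch of the manual recovery sequence described in point 2), assuming Torque is installed under /usr/local/torque as in the message; the job id is the example from the message, and paths and node names would need to be adjusted for the site:

    #!/bin/sh
    # Sketch of the manual Torque/PBS recovery steps described above.
    # Assumes Torque under /usr/local/torque and must be run as root;
    # the job id below is the example taken from the message.
    JOBID=2627.planck
    qterm -t quick                                         # stop the pbs_server
    rm -rf /usr/local/torque/server_priv/jobs/${JOBID}.*   # remove the stuck job's files
    /usr/local/torque/sbin/pbs_server -t warm              # warm-restart the server
    # The nodes that ran the job still have to be rebooted afterwards
    # (from their consoles, since logins were hanging at that point).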

