The code works fine on the student cluster, where we compiled it with the extra '-assu buff' flag to alleviate NFS problems. The optimisation jobs appear to run well there.
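For reference, a minimal sketch of how that flag might be applied when recompiling, assuming the Intel Fortran compiler ('-assu buff' is the abbreviated form of '-assume buffered_io'); the $WIENROOT/OPTIONS file and FOPT variable names below follow a typical siteconfig_lapw setup and may differ on other installations:

    # Sketch: add buffered I/O to the WIEN2k Fortran options, then recompile.
    # Assumes Intel Fortran; '-assu buff' is shorthand for '-assume buffered_io'.
    # File and variable names follow a typical siteconfig setup (assumption).
    cd $WIENROOT
    grep FOPT OPTIONS          # inspect the current Fortran options, e.g.
                               #   current:FOPT:-FR -mp1 -w -O1 -assume buffered_io
    ./siteconfig_lapw          # edit the compiler options, then recompile all programs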
There are still problems with the optimisation jobs running on the planck cluster, though. There seem to be two main problems:

1) One node appears to be doing nearly all the work while the others do little, as seen in the dayfile. However, if we log in to the job nodes while the job is running and run the top command, all cores seem to be at ~100% CPU and the load is normal. Also, 'cat /proc/meminfo' shows there is plenty of free memory (there should be, as these nodes each have 32 GB RAM).

2) After some time it becomes impossible to log in to your home directory, and I cannot even log in to the job node from the console on the machine. I also cannot delete the job from the queues. This means I then have to turn the queuing system off (qterm -t quick), remove the job files from the jobs directory (rm -rf /usr/local/torque/server_priv/jobs/2627.planck.*) and then restart the queuing system (/usr/local/torque/sbin/pbs_server -t warm). I then still have to reboot the nodes that were involved with that job. This is a problem (a consolidated sketch of this recovery sequence follows below).

The main difference between the planck cluster and the student cluster is that the planck cluster has a GPFS parallel file system and does not use NFS (well, actually, GPFS uses something like NFS). The problems we were seeing on the student cluster disappeared when we recompiled with the extra '-assu buff' flag. I am recompiling wien2k on the planck cluster with this flag, but it does not fix the problems.

Other than that, both machines are running the RHEL 5.3 operating system, and openmpi and fftw have been compiled the same way, as has wien2k.

Thanks
Said
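A minimal sketch of the manual recovery sequence described in point 2), assuming Torque is installed under /usr/local/torque as in the message; the job id is the example from the message, and paths and node names would need to be adjusted for the site:

    #!/bin/sh
    # Sketch of the manual Torque/PBS recovery steps described above.
    # Assumes Torque under /usr/local/torque and must be run as root;
    # the job id below is the example taken from the message.
    JOBID=2627.planck
    qterm -t quick                                         # stop the pbs_server
    rm -rf /usr/local/torque/server_priv/jobs/${JOBID}.*   # remove the stuck job's files
    /usr/local/torque/sbin/pbs_server -t warm              # warm-restart the server
    # The nodes that ran the job still have to be rebooted afterwards
    # (from their consoles, since logins were hanging at that point).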

