There are many possibilities; a few:

a) If you request only 1 core per node, most queuing systems (qsub/msub etc.) will allocate the remaining cores on that node to other jobs. Your timings then depend heavily on what those other jobs are doing. The normal approach is to use all the cores on a given node.
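As an illustration only (the exact syntax depends on your scheduler; these PBS-style directives are just an assumption about what you might be using), the two request patterns differ roughly like this:

    # 1 core on each of 8 nodes: the scheduler can fill the remaining
    # cores with other users' jobs, which then compete with yours
    #PBS -l nodes=8:ppn=1

    # one whole 8-core node: all cores belong to your job
    #PBS -l nodes=1:ppn=8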
b) When you run on cluster B, in addition to a), it is inefficient to run MPI communications across nodes; it is much better to run on a single node across its cores. Are you using a .machines file with eight "1:nodeA" lines (for instance), or one with a single "1:nodeA nodeB ..." line? The first does not use MPI, the second does. To use MPI within a node you would use lines such as "1:node:8". Knowing what your .machines file looks like will help people assist you (see the example layouts in the P.S. at the end of this message).

c) The memory on those clusters is very small; whoever bought them was not thinking about large-scale jobs. I look for at least 4 GB/core, and 2 GB/core is barely acceptable. You are going to have to use MPI.

d) All MPI is equal, but some MPI is more equal than others. Depending on whether you have InfiniBand or Ethernet, OpenMPI or Intel MPI, and how everything was compiled, you can see enormous differences. One thing to look at is the difference between the CPU time and the wall time (both in case.dayfile and at the bottom of case.output1_*). With a good MPI setup the wall time should be 5-10% more than the CPU time; with a bad setup it can be several times larger.

On Thu, Oct 17, 2013 at 8:44 AM, Yundi Quan <quanyu...@gmail.com> wrote:
> Hi,
> I have access to two clusters as a low-level user. One cluster (cluster A)
> consists of nodes with 8 cores and 8 GB of memory per node. The other
> cluster (cluster B) has 24 GB of memory per node, and each node has 14
> cores or more. The cores on cluster A are Xeon CPU E5620 @ 2.40 GHz, while
> the cores on cluster B are Xeon CPU X5550 @ 2.67 GHz. From the
> specifications (2.40 GHz + 12288 KB cache vs 2.67 GHz + 8192 KB cache), the
> two machines should be very close in performance, but that does not seem
> to be the case.
>
> I have a job with 72 atoms per unit cell. I initialized the job on cluster
> A and ran it for a few iterations. Each iteration took 2 hours. Then I
> moved the job to cluster B (14 cores per node at 2.67 GHz). Now it takes
> more than 8 hours to finish one iteration. On both clusters, I request one
> core per node and 8 nodes per job (8 is the number of k-points). I compiled
> WIEN2k_13 on cluster A without MPI. On cluster B, WIEN2k_12 was compiled by
> the administrator with MPI.
>
> What could have caused the poor performance of cluster B? Is it because of
> MPI?
>
> An unrelated question: sometimes memory runs out on cluster B, which has
> 24 GB of memory per node, even though the same job runs smoothly on
> cluster A, which has only 8 GB per node.
>
> Thanks.

--
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu
1-847-491-3996
"Research is to see what everybody else has seen, and to think what nobody
else has thought"
Albert Szent-Gyorgi
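P.S. For reference, here are minimal sketches of the .machines layouts mentioned in b). The node names (nodeA, nodeB, ...) are placeholders; use whatever hosts your queuing system actually assigns to the job.

k-point parallel, no MPI (eight single-core slots, here all on one node):

    1:nodeA
    1:nodeA
    1:nodeA
    1:nodeA
    1:nodeA
    1:nodeA
    1:nodeA
    1:nodeA

MPI spread across eight nodes (likely the slow pattern if the interconnect is poor):

    1:nodeA nodeB nodeC nodeD nodeE nodeF nodeG nodeH

MPI within a single node, using its 8 cores:

    1:nodeA:8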