Dear Rajiv,

 

It is impossible to answer your question, since we have no idea what kinds of 
jobs you are trying to run (# of k-points, matrix size, memory requirements, 
etc.) or the capabilities of your cluster (e.g. I suspect your nodes do not have 
56 cores). It is best to work with your cluster administrator to answer these 
kinds of performance questions. You will have to test the performance and 
scaling of WIEN2k on your cluster yourself, since we do not have access to it.

 

A few hints:

 

*         The case.dayfile contains output from the unix ‘time’ command 
between the various steps of the SCF cycle, which can help you determine 
which of your job options is faster. For example, after lapw0 runs:

 

-------- .machine0 : 12 processors

3.254u 0.122s 0:02.36 142.7%    0+0k 0+120io 0pf+0w

 

The :log file also gives you the start time of each command.

 

*         WIEN2k can look bad on cluster CPU efficiency metrics (depending on 
the job) because it is I/O bandwidth intensive, i.e. reading and writing large 
vectors to disk takes walltime but not cputime.

*         A good question to ask is how your job scales with the resources 
you give it. If you give a job 32 cores and it completes an SCF cycle in 5 
minutes, but the same job with 64 cores takes 4 minutes, you are probably 
hitting a scaling limit (usually node I/O) and wasting resources. A small 
sketch for making this comparison from the dayfile timings follows these hints.

*         From your screenshot, it looks like you are running more than one job 
simultaneously on node001, since lapw1 and lapw2c are running at the same 
time. Running a more controlled test will help you better measure job 
performance/scaling.
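
For comparing such runs quantitatively, here is a minimal Python sketch (not 
part of WIEN2k) that sums the wall-clock portions of the csh ‘time’ lines in a 
case.dayfile and compares two runs of the same case at different core counts. 
The script and file names are hypothetical, and the regular expression assumes 
the "3.254u 0.122s 0:02.36 ..." format shown above, which may vary between 
WIEN2k versions, so treat it as a starting point only.

import re
import sys

# Matches csh 'time' lines such as "3.254u 0.122s 0:02.36 142.7% ...".
# Steps longer than an hour may be reported as h:mm:ss; extend the
# pattern if your dayfile contains such lines.
TIME_RE = re.compile(r'^\s*([\d.]+)u\s+([\d.]+)s\s+(\d+):([\d.]+)\s')

def dayfile_seconds(path):
    """Sum user, system and wall-clock seconds over all 'time' lines in a dayfile."""
    user = system = wall = 0.0
    with open(path) as handle:
        for line in handle:
            match = TIME_RE.match(line)
            if match:
                user += float(match.group(1))
                system += float(match.group(2))
                wall += 60.0 * int(match.group(3)) + float(match.group(4))
    return user, system, wall

if __name__ == '__main__':
    # Hypothetical usage, comparing saved copies of the dayfile from two runs:
    #   python dayfile_scaling.py 32 case.dayfile.32 64 case.dayfile.64
    base_cores, base_file = int(sys.argv[1]), sys.argv[2]
    test_cores, test_file = int(sys.argv[3]), sys.argv[4]
    base_wall = dayfile_seconds(base_file)[2]
    test_wall = dayfile_seconds(test_file)[2]
    speedup = base_wall / test_wall
    efficiency = speedup / (float(test_cores) / base_cores)
    print('speedup %.2f, parallel efficiency %.0f%%' % (speedup, 100.0 * efficiency))

For the 32-core vs. 64-core example above (5 minutes vs. 4 minutes), this gives 
a speedup of 1.25 and a parallel efficiency of about 63%, which is the kind of 
number that tells you the extra cores are largely wasted.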

 

Beyond that I can’t be much help. Good luck and remember to search the mailing 
list!

 

--

Dr. Eamon McDermott

CEA Grenoble

DRT/LETI/DTSI/SCMC

 

 

 

From: Wien [mailto:wien-boun...@zeus.theochem.tuwien.ac.at] On Behalf Of Rajiv 
Chouhan
Sent: Saturday, December 03, 2016 19:59
To: A Mailing list for WIEN2k users <wien@zeus.theochem.tuwien.ac.at>
Subject: Re: [Wien] Parallel execution in clusters

 

Hi All,

Please reply to my previous email. It is very important for me to understand 
how the code runs in parallel mode on the cluster.

Thank you,

Rajiv

 

On Wed, Nov 30, 2016 at 10:53 AM, Rajiv Chouhan <chouhanraji...@gmail.com 
<mailto:chouhanraji...@gmail.com> > wrote:

 

Hi,

 

I am using WIEN2k on a cluster with MPI parallelization. I have written a PBS 
script to run the job on a machine within the SLURM platform of the same 
cluster. I have attached a link to a snapshot of the jobs running on two nodes, 
node001 and node004. On these nodes the allocations are 52 and 56 cores, 
respectively. The top command shows the utilization in the attached snapshot. 
Can anyone tell me which node is utilizing its resources fully and which 
execution is faster? Also, the jobs of "raji..+" (on node001) were submitted 
with PBS, while the jobs of the other users (on node004) were submitted with a 
SLURM script.

 https://www.dropbox.com/s/a8ikape71f43cva/Capture-02.jpg?dl=0

 

Thank you,

Rajiv

 

 

