Dear Yasser,
I have to investigate this issue. There is no reason GIPAW
should work only with a specific number of CPUs and pools.
Best,
Davide
On 07/16/2017 12:06 PM, Yasser Fowad AlWahedi wrote:
> Thanks Davide,
>
> I am running using the smearing option since the system is metallic.
>
> I also noticed an interesting relation: GIPAW runs succeed only if the number
> of cores (np) is at most the number of k-points per pool. I checked this in
> the 38-atom case, which kept failing whenever I chose more processors than
> k-points per pool, even though the SCF run finished successfully every time.
> I observed the same in other cases. Is this a general rule?
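The empirical condition reported here (a run succeeds only when np does not exceed the k-points handled by each pool) can be checked before submitting a job. A minimal sketch, using the numbers from the 38-atom case; the variable names are illustrative, not part of any QE input:

```shell
# Empirical rule reported in this thread: GIPAW runs succeeded only when the
# number of MPI processes (np) was at most the number of k-points per pool.
nk=25      # total number of k-points (38-atom case)
npool=1    # number of k-point pools (-npool)
np=16      # MPI processes requested

kpp=$(( (nk + npool - 1) / npool ))   # k-points handled by each pool, rounded up
if [ "$np" -le "$kpp" ]; then
  echo "np=$np <= $kpp k-points/pool: run expected to succeed"
else
  echo "np=$np > $kpp k-points/pool: run observed to fail in this thread"
fi
```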
>
> I will send you the files privately.
>
> Yasser
>
> -----Original Message-----
> From: Davide Ceresoli [mailto:[email protected]]
> Sent: Sunday, July 16, 2017 1:49 PM
> To: Yasser Fowad AlWahedi <[email protected]>; PWSCF Forum
> <[email protected]>
> Subject: Re: [Pw_forum] GIPAW acceleration
>
> Dear Yasser,
> no problem! First of all, it seems to me that I/O is not a problem.
> In fact, CPU time ~= wall time, and the davcio (I/O) routines consume only 1.88 s.
>
> I compared calculations of similar size and I've got:
> wollastonite: 30 atoms, 36 k-points: 10h40m
> coesite: 48 atoms, 32 k-points: 19h20m
> on a rather old (2008) Xeon E5520 2.27 GHz, 8 cores.
>
> My timings are more favorable than your C1 results. However, if your system
> is a slab, the empty space carries a non-negligible extra cost.
> You can try to minimize it as much as possible: NMR interactions are
> short-ranged, contrary to electrostatic interactions.
>
> Is your system metallic? Even if it has a small band gap, I suggest using
> occupations='smearing'. This will speed up the linear response in GIPAW and
> improve convergence with respect to k-points.
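The smearing suggestion above would go in the &SYSTEM namelist of the pw.x input. A minimal fragment; the smearing type and degauss value are illustrative choices that must be converged for the system at hand:

```fortran
&SYSTEM
   occupations = 'smearing'
   smearing    = 'mv'      ! Marzari-Vanderbilt cold smearing; illustrative choice
   degauss     = 0.01      ! in Ry; illustrative value, converge for your system
/
```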
>
> Finally, the clock difference between the i7 (3.5 GHz) and the Xeon (2.2 GHz)
> can explain the difference in timing. The clock ratio is ~1.6, similar to the
> walltime ratio.
>
> In any case, if you send me input and output files privately, I can look at
> them in detail.
>
> Best wishes,
> Davide
>
>
> On 07/16/2017 10:26 AM, Yasser Fowad AlWahedi wrote:
>> Dear Davide,
>>
>> Thanks for your support and my apologies for the late reply. PW and GIPAW
>> are compiled using GNU compilers and the intel MKL libs.
>>
>> I am running DFT of Ni2P clusters of various surfaces over two computational
>> rigs:
>>
>> 1) The university cluster: each node consists of dual 8-core/8-thread Xeon
>> CPUs clocked at 2.2 GHz, with 64 GB of RAM. I use only one node per
>> simulation. Storage is a mechanical hard drive. (Referred to below as C1.)
>>
>> 2) My home PC: equipped with an i7-5930K processor (6 cores, 12 threads)
>> clocked at 3.9 GHz, with 128 GB of RAM (referred to below as C2). For
>> storage I use a Samsung 850 EVO SSD.
>>
>> The table below summarizes the completed and running cases, with the time to
>> finish (or the expected time to finish, assuming linear extrapolation).
>>
>>
>> # of atoms   npool   Cores   k-points per pool   Computer   Time (hrs)
>> 30           2       16      17                  C1          28.9
>> 38           1       16      25                  C1          31.3
>> 49           1       16      34                  C1         124.9*
>> 50           2       16      17                  C1         474.6*
>> 52           1       10      34                  C2         295.2*
>>
>> * estimated time to finish
>>
>> I understand that the cases are different and will therefore require more or
>> less time to finish.
>>
>> But I noticed that the 50- and 52-atom cases, which are quite similar (the
>> same total number of k-points and similar numbers of atoms) but run on two
>> different machines, show substantially different times to finish. My guess is
>> that this is due to the SSD used for writing the data, considering that C2
>> uses fewer computational threads and handles more atoms, yet is expected to
>> finish faster.
>>
>> I also noticed an interesting relation: GIPAW runs succeed only if the number
>> of cores (np) is at most the number of k-points per pool. I checked this in
>> the 38-atom case, which kept failing whenever I chose more processors than
>> k-points per pool, even though the SCF run finished successfully every time.
>> I observed the same in other cases. Is this a general rule?
>>
>> Below is the timing output of the 38 atoms case:
>>
>> gipaw_setup : 0.46s CPU 0.50s WALL ( 1 calls)
>>
>> Linear response
>> greenf : 20177.91s CPU 20207.68s WALL ( 600 calls)
>> cgsolve : 20057.24s CPU 20086.82s WALL ( 600 calls)
>> ch_psi : 19536.93s CPU 19563.75s WALL ( 44231 calls)
>> h_psiq : 13685.97s CPU 13707.40s WALL ( 44231 calls)
>>
>> Apply operators
>> h_psi : 44527.30s CPU 46802.35s WALL ( 5434310 calls)
>> apply_vel : 262.98s CPU 263.30s WALL ( 525 calls)
>>
>> Induced current
>> j_para : 559.19s CPU 560.39s WALL ( 675 calls)
>> biot_savart : 0.05s CPU 0.06s WALL ( 1 calls)
>>
>> Other routines
>>
>> General routines
>> calbec : 39849.22s CPU 37474.79s WALL (10917262 calls)
>> fft : 0.12s CPU 0.15s WALL ( 42 calls)
>> ffts : 0.01s CPU 0.01s WALL ( 10 calls)
>> fftw : 8220.39s CPU 9116.72s WALL (27084278 calls)
>> davcio : 0.02s CPU 1.88s WALL ( 400 calls)
>>
>> Parallel routines
>> fft_scatter : 3533.10s CPU 3242.29s WALL (27084330 calls)
>>
>> Plugins
>>
>> GIPAW : 112557.79s CPU 112726.12s WALL ( 1 calls)
>>
>> Yasser
>>
>>
>>
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]]
>> On Behalf Of Davide Ceresoli
>> Sent: Thursday, July 13, 2017 8:30 PM
>> To: PWSCF Forum <[email protected]>
>> Subject: Re: [Pw_forum] GIPAW acceleration
>>
>> Dear Yasser,
>> how many atoms? How many k-points? I/O can always be the reason, but in my
>> experience, if the system is very large, the time is dominated by
>> computation, not I/O.
>> You should get some speedup with diagonalization='cg' in GIPAW.
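The cg suggestion above goes in the gipaw.x input namelist. A minimal sketch; prefix and tmp_dir are placeholders that must match the preceding pw.x run:

```fortran
&inputgipaw
   job             = 'nmr'
   prefix          = 'mysystem'    ! placeholder; must match the pw.x prefix
   tmp_dir         = './scratch/'  ! placeholder; must match the pw.x outdir
   diagonalization = 'cg'          ! conjugate-gradient instead of Davidson
/
```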
>>
>> Anyway, if I have time, I will introduce a "disk_io" variable in the input
>> file, to try to keep more data in memory instead of on disk.
>>
>> Best regards,
>> Davide
>>
>>
>> On 07/13/2017 10:02 AM, Yasser Fowad AlWahedi wrote:
>>> Dear GIPAW users,
>>>
>>>
>>>
>>> I am suffering from the extreme slowness of GIPAW NMR shift calculations.
>>> I have noticed that GIPAW frequently writes results to disk for restart
>>> purposes. In our cluster, this data is stored on mechanical hard drives.
>>> Could that be a reason for the slowness?
>>>
>>>
>>>
>>> Yasser Al Wahedi
>>>
>>> Assistant Professor
>>>
>>> Khalifa University of Science and Technology
>>>
>>
>
--
+--------------------------------------------------------------+
Davide Ceresoli
CNR Institute of Molecular Science and Technology (CNR-ISTM)
c/o University of Milan, via Golgi 19, 20133 Milan, Italy
Email: [email protected]
Phone: +39-02-50314276, +39-347-1001570 (mobile)
Skype: dceresoli
+--------------------------------------------------------------+
_______________________________________________
Pw_forum mailing list
[email protected]
http://pwscf.org/mailman/listinfo/pw_forum