Dear Yasser,
no problem! First of all, it seems to me that I/O is not the bottleneck:
in fact cputime ~= walltime, and the davcio routines consume only 1.88 s.
I compared calculations of similar size and I've got:
wollastonite: 30 atoms, 36 k-points: 10h40m
coesite: 48 atoms, 32 k-points: 19h20m
on a rather old (2008) Xeon E5520 2.27 GHz, 8 cores.
My timings are more favorable than your C1 results. However, if your
system is a slab, the empty space carries a non-negligible extra cost;
you can try to minimize it as much as possible. NMR interactions are
short-ranged, contrary to electrostatic interactions.
Is your system metallic? Even if it has only a small band gap, I suggest
using occupations='smearing'. This will speed up the linear response in
GIPAW and the convergence with respect to k-points.
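For example, a minimal &system fragment with smearing could look like the
following (the smearing type and degauss value are placeholders to be tested
and converged, not recommendations from this thread):

```
&system
    ...
    occupations = 'smearing'
    smearing    = 'mv'      ! Marzari-Vanderbilt cold smearing
    degauss     = 0.01      ! smearing width in Ry; converge this value
/
```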
Finally, the clock difference between the i7 (3.5 GHz) and the Xeon (2.2 GHz)
can explain the difference in timing. The clock ratio is ~1.6, similar to
the walltime ratio.
In any case, if you send me the input and output files privately, I can
look at them in detail.
Best wishes,
Davide
On 07/16/2017 10:26 AM, Yasser Fowad AlWahedi wrote:
> Dear Davide,
>
> Thanks for your support and my apologies for the late reply. PW and GIPAW
> are compiled using GNU compilers and the intel MKL libs.
>
> I am running DFT of Ni2P clusters of various surfaces over two computational
> rigs:
>
> 1) The university cluster: each node consists of dual 8-core/8-thread Xeon
> CPUs clocked at 2.2 GHz with 64 GB of RAM. I only use one node per
> simulation. For storage it uses a mechanical hard drive. (Later called C1)
>
> 2) My home PC: equipped with an i7-5930K processor (6 cores, 12 threads)
> clocked at 3.9 GHz with 128 GB of RAM (later called C2). For storage I use
> a Samsung 850 EVO SSD.
>
> The table below summarizes the cases performed/running and the time to
> finish, or the expected time to finish assuming linear extrapolation.
>
>
> # atoms   npool   Cores   # k-points per pool   Computer   Time (hrs)
>   30        2      16            17                C1          28.9
>   38        1      16            25                C1          31.3
>   49        1      16            34                C1         124.9*
>   50        2      16            17                C1         474.6*
>   52        1      10            34                C2         295.2*
>
> * estimated time of finish
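As a side note, the linear extrapolation behind the starred entries can be
sketched as follows (function and variable names are mine, purely
illustrative):

```python
def estimated_total_hours(elapsed_hours: float, done: int, total: int) -> float:
    """Extrapolate total runtime linearly from the fraction of work completed
    (e.g. SCF iterations or k-points processed so far)."""
    if done <= 0:
        raise ValueError("need at least one completed unit of work")
    return elapsed_hours * total / done

# e.g. 30 h elapsed after 12 of 50 units of work:
print(estimated_total_hours(30.0, 12, 50))  # -> 125.0
```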
>
> I understand that the cases are different and as such will require more or
> less time to finish.
>
> But I noticed that the 50-atom and 52-atom cases, which are quite similar
> (same number of k-points per pool and a similar number of atoms) but run on
> two different systems, have substantially different expected finish times.
> My guess is that this is due to the SSD used to write the data: even though
> C2 uses fewer computational threads and handles more atoms, it is expected
> to finish faster.
>
> I also noticed an interesting relation: GIPAW runs succeed only if the
> number of cores (np) is <= the number of k-points per pool. I checked this
> in the 38-atom case, which kept failing whenever I chose a number of
> processors higher than the number of k-points per pool, even though the SCF
> run always finished successfully. This was also observed in other cases. Is
> this a general rule?
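The observed constraint can be written down explicitly; the helper below
simply encodes Yasser's empirical rule from this thread (it is not a
documented GIPAW requirement, and the function name is mine):

```python
def satisfies_observed_rule(ncores: int, nkpoints: int, npool: int) -> bool:
    """Empirical rule observed in this thread: GIPAW runs succeeded only when
    the number of cores did not exceed the number of k-points per pool."""
    return ncores <= nkpoints / npool

# 38-atom case: 16 cores, 25 k-points, npool=1 -> ran fine
print(satisfies_observed_rule(16, 25, 1))   # True
# same case with more cores than k-points per pool -> kept failing
print(satisfies_observed_rule(32, 25, 1))   # False
```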
>
> Below is the timing output of the 38 atoms case:
>
> gipaw_setup : 0.46s CPU 0.50s WALL ( 1 calls)
>
> Linear response
> greenf : 20177.91s CPU 20207.68s WALL ( 600 calls)
> cgsolve : 20057.24s CPU 20086.82s WALL ( 600 calls)
> ch_psi : 19536.93s CPU 19563.75s WALL ( 44231 calls)
> h_psiq : 13685.97s CPU 13707.40s WALL ( 44231 calls)
>
> Apply operators
> h_psi : 44527.30s CPU 46802.35s WALL ( 5434310 calls)
> apply_vel : 262.98s CPU 263.30s WALL ( 525 calls)
>
> Induced current
> j_para : 559.19s CPU 560.39s WALL ( 675 calls)
> biot_savart : 0.05s CPU 0.06s WALL ( 1 calls)
>
> Other routines
>
> General routines
> calbec : 39849.22s CPU 37474.79s WALL (10917262 calls)
> fft : 0.12s CPU 0.15s WALL ( 42 calls)
> ffts : 0.01s CPU 0.01s WALL ( 10 calls)
> fftw : 8220.39s CPU 9116.72s WALL (27084278 calls)
> davcio : 0.02s CPU 1.88s WALL ( 400 calls)
>
> Parallel routines
> fft_scatter : 3533.10s CPU 3242.29s WALL (27084330 calls)
>
> Plugins
>
> GIPAW : 112557.79s CPU 112726.12s WALL ( 1 calls)
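As a quick check of where the time above actually goes, the dominant routines
can be expressed as fractions of the total GIPAW time (numbers copied
verbatim from the timing block; h_psi and calbec together account for roughly
three quarters of the run):

```python
# CPU seconds taken from the timing report above (38-atom case).
total_cpu = 112557.79
routines = {"h_psi": 44527.30, "calbec": 39849.22, "fftw": 8220.39}

for name, seconds in routines.items():
    print(f"{name}: {seconds / total_cpu:.1%}")
```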
>
> Yasser
>
>
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On
> Behalf Of Davide Ceresoli
> Sent: Thursday, July 13, 2017 8:30 PM
> To: PWSCF Forum <[email protected]>
> Subject: Re: [Pw_forum] GIPAW acceleration
>
> Dear Yasser,
> how many atoms? how many k-points? I/O can always be the reason, but in
> my experience if the system is very large, time is dominated by computation,
> not I/O.
> You should get some speedup if diagonalization='cg' in GIPAW.
>
> Anyway, if I have time, I will introduce a "disk_io" variable in the input
> file, to try to keep more data in memory instead of on disk.
>
> Best regards,
> Davide
>
>
> On 07/13/2017 10:02 AM, Yasser Fowad AlWahedi wrote:
>> Dear GIPAW users,
>>
>>
>>
>> For NMR shift calculations, I am suffering from the extreme slowness of
>> GIPAW. I have noticed that GIPAW writes out its results frequently for
>> restart purposes. In our clusters we have mechanical hard drives which
>> store this data. Could that be a reason for its slowness?
>>
>>
>>
>> Yasser Al Wahedi
>>
>> Assistant Professor
>>
>> Khalifa University of Science and Technology
>>
>
--
+--------------------------------------------------------------+
Davide Ceresoli
CNR Institute of Molecular Science and Technology (CNR-ISTM)
c/o University of Milan, via Golgi 19, 20133 Milan, Italy
Email: [email protected]
Phone: +39-02-50314276, +39-347-1001570 (mobile)
Skype: dceresoli
+--------------------------------------------------------------+
_______________________________________________
Pw_forum mailing list
[email protected]
http://pwscf.org/mailman/listinfo/pw_forum