Dear Yasser,
I have to investigate this issue. There is no reason GIPAW
should work only with a specific number of CPUs and pools.
Best,
Davide
On 07/16/2017 12:06 PM, Yasser Fowad AlWahedi wrote:
> Thanks Davide,
>
> I am running using the smearing option since the system is metallic.
>
> I also noticed an interesting relation: GIPAW runs succeed only if the number
> of cores (np) is at most the number of k-points per pool. I checked this in
> the 38-atom case, which kept failing whenever I chose more processors than
> k-points per pool, even though the SCF run finished successfully every time.
> I observed the same in other cases. Is this a general rule?
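The empirical condition reported here (a run succeeds only when np does not exceed the k-points handled by each pool) can be checked before submitting a job. A minimal sketch, using the numbers from the 38-atom case; the variable names are illustrative, not part of any QE input:

```shell
# Empirical rule reported in this thread: GIPAW runs succeeded only when the
# number of MPI processes (np) was at most the number of k-points per pool.
nk=25      # total number of k-points (38-atom case)
npool=1    # number of k-point pools (-npool)
np=16      # MPI processes requested

kpp=$(( (nk + npool - 1) / npool ))   # k-points handled by each pool, rounded up
if [ "$np" -le "$kpp" ]; then
  echo "np=$np <= $kpp k-points/pool: run expected to succeed"
else
  echo "np=$np > $kpp k-points/pool: run observed to fail in this thread"
fi
```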
>
> I will send you the files privately.
>
> Yasser
>
> -----Original Message-----
> From: Davide Ceresoli [mailto:[email protected]]
> Sent: Sunday, July 16, 2017 1:49 PM
> To: Yasser Fowad AlWahedi <[email protected]>; PWSCF Forum
> <[email protected]>
> Subject: Re: [Pw_forum] GIPAW acceleration
>
> Dear Yasser,
> no problem! First of all, it seems to me that I/O is not a problem.
> In fact, CPU time ~= wall time, and the davcio (I/O) routines consume only 1.88 s.
>
> I compared calculations of similar size and I've got:
> wollastonite: 30 atoms, 36 k-points: 10h40m
> coesite: 48 atoms, 32 k-points: 19h20m
> on a rather old (2008) Xeon E5520 2.27 GHz, 8 cores.
>
> My timings are more favorable than your C1 results. However, if your system
> is a slab, the empty space carries a non-negligible extra cost.
> You can try to minimize it as much as possible: NMR interactions are
> short-ranged, contrary to electrostatic interactions.
>
> Is your system metallic? Even if it has a small band gap, I suggest using
> occupations='smearing'. This will speed up the linear response in GIPAW and
> improve convergence with respect to k-points.
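The smearing suggestion above would go in the &SYSTEM namelist of the pw.x input. A minimal fragment; the smearing type and degauss value are illustrative choices that must be converged for the system at hand:

```fortran
&SYSTEM
   occupations = 'smearing'
   smearing    = 'mv'      ! Marzari-Vanderbilt cold smearing; illustrative choice
   degauss     = 0.01      ! in Ry; illustrative value, converge for your system
/
```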
>
> Finally, the clock difference between the i7 (3.5 GHz) and the Xeon (2.2 GHz)
> can explain the difference in timing. The clock ratio is ~1.6, similar to the
> walltime ratio.
>
> In any case, if you send me input and output files privately, I can look at
> them in detail.
>
> Best wishes,
> Davide
>
>
> On 07/16/2017 10:26 AM, Yasser Fowad AlWahedi wrote:
>> Dear Davide,
>>
>> Thanks for your support and my apologies for the late reply. PW and GIPAW
>> are compiled using GNU compilers and the intel MKL libs.
>>
>> I am running DFT of Ni2P clusters of various surfaces over two computational
>> rigs:
>>
>> 1) The university cluster: each node consists of dual 8-core/8-thread Xeon
>> CPUs clocked at 2.2 GHz, with 64 GB of RAM. I use only one node per
>> simulation. Storage is a mechanical hard drive. (Referred to below as C1.)
>>
>> 2) My home PC: equipped with an i7-5930K processor (6 cores, 12 threads)
>> clocked at 3.9 GHz, with 128 GB of RAM (referred to below as C2). For
>> storage I use a Samsung 850 EVO SSD.
>>
>> The table below summarizes the completed and running cases, with the time to
>> finish (or the expected time to finish, assuming linear extrapolation).
>>
>>
>> # of atoms   npool   Cores   k-points per pool   Computer   Time (hrs)
>> 30           2       16      17                  C1          28.9
>> 38           1       16      25                  C1          31.3
>> 49           1       16      34                  C1         124.9*
>> 50           2       16      17                  C1         474.6*
>> 52           1       10      34                  C2         295.2*
>>
>> * estimated time to finish
>>
>> I understand that the cases are different and will therefore require more or
>> less time to finish.
>>
>> But I noticed that the 50- and 52-atom cases, which are quite similar (the
>> same total number of k-points and similar numbers of atoms) but run on two
>> different machines, show substantially different times to finish. My guess is
>> that this is due to the SSD used for writing the data, considering that C2
>> uses fewer computational threads and handles more atoms, yet is expected to
>> finish faster.
>>
>> I also noticed an interesting relation: GIPAW runs succeed only if the number
>> of cores (np) is at most the number of k-points per pool. I checked this in
>> the 38-atom case, which kept failing whenever I chose more processors than
>> k-points per pool, even though the SCF run finished successfully every time.
>> I observed the same in other cases. Is this a general rule?
>>
>> Below is the timing output of the 38 atoms case:
>>
>> gipaw_setup : 0.46s CPU 0.50s WALL ( 1 calls)
>>
>> Linear response
>> greenf : 20177.91s CPU 20207.68s WALL ( 600 calls)
>> cgsolve : 20057.24s CPU 20086.82s WALL ( 600 calls)
>> ch_psi : 19536.93s CPU 19563.75s WALL ( 44231 calls)
>> h_psiq : 13685.97s CPU 13707.40s WALL ( 44231 calls)
>>
>> Apply operators
>> h_psi : 44527.30s CPU 46802.35s WALL ( 5434310 calls)
>> apply_vel : 262.98s CPU 263.30s WALL ( 525 calls)
>>
>> Induced current
>> j_para : 559.19s CPU 560.39s WALL ( 675 calls)
>> biot_savart : 0.05s CPU 0.06s WALL ( 1 calls)
>>
>> Other routines
>>
>> General routines
>> calbec : 39849.22s CPU 37474.79s WALL (10917262 calls)
>> fft : 0.12s CPU 0.15s WALL ( 42 calls)
>> ffts : 0.01s CPU 0.01s WALL ( 10 calls)
>> fftw : 8220.39s CPU 9116.72s WALL (27084278 calls)
>> davcio : 0.02s CPU 1.88s WALL ( 400 calls)
>>
>> Parallel routines
>> fft_scatter : 3533.10s CPU 3242.29s WALL (27084330 calls)
>>
>> Plugins
>>
>> GIPAW : 112557.79s CPU 112726.12s WALL ( 1 calls)
>>
>> Yasser
>>
>>
>>
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]]
>> On Behalf Of Davide Ceresoli
>> Sent: Thursday, July 13, 2017 8:30 PM
>> To: PWSCF Forum <[email protected]>
>> Subject: Re: [Pw_forum] GIPAW acceleration
>>
>> Dear Yasser,
>> how many atoms? How many k-points? I/O can always be the reason, but in my
>> experience, if the system is very large, the time is dominated by
>> computation, not I/O.
>> You should get some speedup with diagonalization='cg' in GIPAW.
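The cg suggestion above goes in the gipaw.x input namelist. A minimal sketch; prefix and tmp_dir are placeholders that must match the preceding pw.x run:

```fortran
&inputgipaw
   job             = 'nmr'
   prefix          = 'mysystem'    ! placeholder; must match the pw.x prefix
   tmp_dir         = './scratch/'  ! placeholder; must match the pw.x outdir
   diagonalization = 'cg'          ! conjugate-gradient instead of Davidson
/
```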
>>
>> Anyway, if I have time, I will introduce a "disk_io" variable in the input
>> file, to try to keep more data in memory instead of on disk.
>>
>> Best regards,
>> Davide
>>
>>
>> On 07/13/2017 10:02 AM, Yasser Fowad AlWahedi wrote:
>>> Dear GIPAW users,
>>>
>>>
>>>
>>> I am suffering from the extreme slowness of GIPAW NMR shift calculations.
>>> I have noticed that GIPAW frequently writes results to disk for restart
>>> purposes. In our cluster, this data is stored on mechanical hard drives.
>>> Could that be a reason for the slowness?
>>>
>>>
>>>
>>> Yasser Al Wahedi
>>>
>>> Assistant Professor
>>>
>>> Khalifa University of Science and Technology
>>>
>>
>
--
+--------------------------------------------------------------+
Davide Ceresoli
CNR Institute of Molecular Science and Technology (CNR-ISTM)
c/o University of Milan, via Golgi 19, 20133 Milan, Italy
Email: [email protected]
Phone: +39-02-50314276, +39-347-1001570 (mobile)
Skype: dceresoli
+--------------------------------------------------------------+
_______________________________________________
Pw_forum mailing list
[email protected]
http://pwscf.org/mailman/listinfo/pw_forum