Hi,

This is not totally unexpected.

When you hit 128 cores you effectively have only ~30 orbitals per
core. This means that when the diagonalization takes place, its blocks will
be ~30x30 matrices.
You are hitting the bottleneck of the diagonalization routines, which take
up a considerable amount of time in this case. So there isn't much to do :)
Your system is simply too small to scale to larger core counts.
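To make the saturation concrete, here is a small sketch (not part of SIESTA; just the timings quoted below) that computes speedup and parallel efficiency relative to the 2-core run. The ~3840-orbital total is inferred from the ~30 orbitals/core figure at 128 cores and should be treated as a rough estimate.

```python
# Rough strong-scaling check using the seconds-per-SCF-step numbers
# reported in the message below. Efficiency is measured relative to
# the 2-core baseline, so perfect scaling would stay at 100%.

timings = {  # cores -> seconds per SCF step (from the original message)
    2: 753.0, 4: 376.8, 8: 235.3, 16: 181.4,
    32: 114.5, 64: 86.3, 128: 71.9,
}

base_cores = 2
base_time = timings[base_cores]

for cores, t in sorted(timings.items()):
    speedup = base_time / t
    efficiency = speedup * base_cores / cores
    print(f"{cores:4d} cores: speedup {speedup:5.2f}, efficiency {efficiency:4.0%}")

# Assuming ~30 orbitals/core at 128 cores, the total is roughly
# 30 * 128 = 3840 orbitals; each core's diagonalization block is then
# only ~30x30, too small to amortize the communication cost.
```

The efficiency drops to roughly 16% at 128 cores, which is consistent with the diagonalization being the serial/communication-bound part.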

Also:

1. The METIS decomposition has no influence on the diagonalization routines.
2. The vacuum region adds very little overhead, so it shouldn't affect the
calculation time much; the same goes for the dipole correction.

The fact that 2. doesn't influence your speed-up is a further indication
that the diagonalization routines are the bottleneck.

On Wed, 21 Oct 2020 at 22:03, Karen Fidanyan <
fidan...@fhi-berlin.mpg.de> wrote:

> Dear colleagues,
>
> I started working with Siesta (v4.0.2-17, Intel compiler) recently, and
> cannot understand its performance. I do a 6x6x7 Pd slab with a water
> molecule and have a long time per SCF step:
> #Cores  seconds_per_scf_step
>      2                 753.0
>      4                 376.8
>      8                 235.3
>     16                 181.4
>     32                 114.5
>     64                  86.3
>    128                  71.9
>
> I'm puzzled by the fact that performance saturates at 64 cores,
> although SCF is still pretty slow. I tried: METIS domain decomposition,
> reducing vacuum layer almost to zero and disabling dipole correction,
> but the numbers are similar in all those cases.
> I enclose the archive with my inputs and selected outputs. If someone
> could point out what is wrong, I would be very grateful.
>
> Best regards,
> Karen Fidanyan
>
>
> --
> SIESTA is supported by the Spanish Research Agency (AEI) and by the
> European H2020 MaX Centre of Excellence (http://www.max-centre.eu/)
>


-- 
Kind regards Nick
