Hi,

This is not totally surprising.
When you hit 128 cores, each core effectively holds only ~30 orbitals. This means that when the diagonalization takes place, its blocks will be ~30x30 matrices. You are reaching the bottleneck of the diagonalization routines, which take up a considerable amount of time in this case. So there isn't much to do :) Your system is simply too small to scale to larger core counts.

Also:

1. The METIS decomposition has no influence on the diagonalization routines.
2. The vacuum region adds very little overhead, so it shouldn't affect the calculation time much; the same goes for the dipole correction.

The fact that 2. doesn't influence your speed-up is a further indication that the diagonalization routines are the bottleneck.

On Wed, 21 Oct 2020 at 22:03, Karen Fidanyan <fidan...@fhi-berlin.mpg.de> wrote:

> Dear colleagues,
>
> I started working with Siesta (v4.0.2-17, Intel compiler) recently and
> cannot understand its performance. I run a 6x6x7 Pd slab with a water
> molecule and see a long time per SCF step:
>
>   #Cores   seconds_per_scf_step
>        2   753.0
>        4   376.8
>        8   235.3
>       16   181.4
>       32   114.5
>       64    86.3
>      128    71.9
>
> I am puzzled by the fact that performance saturates at 64 cores,
> although SCF is still pretty slow. I tried METIS domain decomposition,
> reducing the vacuum layer almost to zero, and disabling the dipole
> correction, but the numbers are similar in all those cases.
> I enclose an archive with my inputs and selected outputs. If someone
> could point out what is wrong, I would be very grateful.
>
> Best regards,
> Karen Fidanyan
>
> --
> SIESTA is supported by the Spanish Research Agency (AEI) and by the
> European H2020 MaX Centre of Excellence (http://www.max-centre.eu/)

--
Kind regards Nick
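The scaling argument above can be sketched numerically. This is a minimal illustration, not SIESTA output: it assumes ~3840 total orbitals (implied by 30 orbitals/core at 128 cores) and a hypothetical Amdahl-style model in which the poorly-scaling diagonalization acts like a serial fraction, which is enough to reproduce the qualitative saturation seen in the timings.

```python
# Illustrative sketch only. Assumptions (not from SIESTA itself):
#   - ORBITALS = 3840 total basis orbitals (128 cores x 30 orbitals/core)
#   - the diagonalization behaves like a serial fraction s in Amdahl's law
ORBITALS = 3840

def orbitals_per_core(cores):
    """Local problem size each core ends up with."""
    return ORBITALS // cores

def amdahl_time(t1, serial_fraction, cores):
    """Predicted wall time per SCF step for a serial fraction that
    does not speed up with more cores (Amdahl's law)."""
    return t1 * (serial_fraction + (1.0 - serial_fraction) / cores)

# t1 = 1500 s and s = 0.08 are made-up values chosen only to show the shape:
# the predicted time flattens out as the core count grows.
for cores in (2, 4, 8, 16, 32, 64, 128):
    print(f"{cores:4d} cores: {orbitals_per_core(cores):4d} orbitals/core, "
          f"~{amdahl_time(1500.0, 0.08, cores):.1f} s/step")
```

With any nonzero serial fraction the per-step time approaches `t1 * s` rather than zero, which is why adding cores beyond ~64 barely helps here even though each step is still slow.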