Dear SIESTA Community and Developers,
I have located a small but important bug in the verlet2 subroutine
implemented in the SIESTA code, which affects its performance when running
in parallel mode. The verlet2 subroutine is part of the dynamics.f file
and is employed to run standard microcanonical MD simulations.
I realized of the error while performing MD simulations of the finite
temperature properties of small atomic clusters. At first, it just seemed
that some runs were finished with a Cholesky-type error (a recurrent
subject in this list). But after a few days I clearly saw that the error
only appears in the verlet option (microcanonical ensemble) and not in any
of the other MD options (Nose, Parrinello-Rahman, anneal). Moreover, it
only appears when a verlet run is restarted (no problem if the initial
velocities are generated from a MB distribution) and only in parallel runs
(the same simulation, in serial mode, did not stop with an error message).
I could also see that the density matrix does not achieve self-consistency
in the MD steps previous to the Cholesky error message (I think this has
been mentioned in this list previously). Finally, the energy conservation
is poor (a clear signature that something is going wrong with Verlet) when
restarting an MD run in parallel mode.
As I found no solution to this problem in the e-mail archive, I decided to
track the error myself. The solution is quite simple once the error was
located; the location itself took longer. The error is in the variable
old_dt. This is used only to calculate the velocities in the first time
step (the ones read from the XV file are one time step behind, so they are
not the current velocities for the first time step). old_dt is read by
Node zero from the VERLET_RESTART file, but then is not broadcasted to the
other nodes. Broadcasting is necessary, as old_dt is used afterwards by
all nodes in order to calculate the atomic velocities. To correct the
error, simply change
call broadcast(accold(1:3,1:natoms))
call broadcast(vold(1:3,1:natoms))
by
call broadcast(old_dt)
call broadcast(accold(1:3,1:natoms))
call broadcast(vold(1:3,1:natoms))
right after the VERLET_RESTART file is read. Then parallel and serial
versions give the same results, there is no drift in total energy, no
self-consistency problems, and, finally, no Cholesky error!
I am not sure if this will solve all the Cholesky errors reported in this
list (one must be aware that there may be "physical" reasons for errors of
this type, such as two atoms becoming too close), but hopefully will
elliminate many of them, so it is important that you fix this error in
your versions of the SIESTA code.
Hope this is useful to all of you. Best wishes,
Dr. Andres Aguado
Department of Theoretical Physics
University of Valladolid
47011 Valladolid
SPAIN