Hi,

## Summary:
I am running pw.x on 128-1024 processors on a bulk 64-atom Si cell at the Gamma point with a "large" number of (extra) bands. Gamma tricks are not used, because they are incompatible with the subsequent calculations. No problems are reported when nbnd is small. With 128-256 processors and nbnd > 1300, the Davidson diagonalization exits before completing the first SCF step with a Cholesky decomposition failure, and the conjugate-gradient diagonalization (cg) fails at the same stage with the error "(ZHEGV*) failed". The system is a Cray XT4.
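For a sense of scale: because gamma tricks are not used, the full plane-wave set is kept at Gamma. A standard free-electron-sphere estimate of its size is N_pw ≈ Ω G_c³ / (6π²), with G_c = √ecutwfc in Rydberg atomic units (G in 1/bohr). The sketch below applies it to the cell and cutoff of the input file in this post (ibrav = 8 with celldm(1) = 20.52 bohr and celldm(2) = celldm(3) = 1, so a cube; ecutwfc = 35 Ry). It is an approximation, not the exact count pw.x computes, and the 4 × nbnd Davidson workspace size is an assumption inferred from the "( 5120,5120)" output line quoted for nbnd = 1280.

```python
"""Rough count of plane waves at Gamma for the 64-atom Si cell in this post.

Free-electron-sphere estimate (not pw.x's exact count): the number of G vectors
with |G|^2 <= ecutwfc (Rydberg units, G in 1/bohr) is about
Omega * Gc^3 / (6 pi^2), with Gc = sqrt(ecutwfc).
"""
import math


def estimate_npw(volume_bohr3: float, ecutwfc_ry: float) -> float:
    """Approximate number of plane waves inside the cutoff sphere (full set, no gamma tricks)."""
    g_cut = math.sqrt(ecutwfc_ry)  # |G|^2 in bohr^-2 equals kinetic energy in Ry
    return volume_bohr3 * g_cut ** 3 / (6 * math.pi ** 2)


if __name__ == "__main__":
    a = 20.52                     # celldm(1) in bohr; celldm(2) = celldm(3) = 1
    volume = a ** 3               # ibrav = 8 with b = c = a, i.e. a cube
    npw = estimate_npw(volume, 35.0)
    print(f"cell volume   : {volume:.1f} bohr^3")
    print(f"estimated npw : {npw:.0f} (about {npw / 2:.0f} with gamma tricks)")
    for nbnd in (1280, 2500, 3328):
        # Assumed Davidson workspace of 4 * nbnd vectors (inferred, see lead-in).
        print(f"nbnd={nbnd}: Davidson workspace 4*nbnd = {4 * nbnd}")
```

The estimate comes out at roughly 30,000 plane waves (about half that with gamma tricks). A Davidson workspace with more vectors than plane waves could not be linearly independent, but even at nbnd = 3328 the assumed workspace of 4 × 3328 = 13,312 vectors stays well below that count, which suggests the basis size was not the bottleneck even before raising the cutoff.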
## Purpose:
Reproducing the beautiful results of Physical Review B 79, 201104 (2009) for GWW educational purposes. :)

## Background:
I have found similar-looking problems reported here and have tried several of the recommendations (switching to ndiag 1 at runtime to use serial instead of parallel subspace diagonalization; switching from david to cg). I have also tried increasing the plane-wave cutoff, to provide more plane waves relative to the number of requested bands for the sake of the Davidson diagonalization, but this does not really help. I also attempted a regular SCF calculation with no nbnd specification, followed by an NSCF calculation with the extra bands specified. The same errors are obtained.

## Current status:
I am now trying to rule out memory-related errors by running on more nodes, and will update this thread if the problem turns out to be related to memory requirements. Running on 512 processors permitted nbnd = 2500 (converged results should require ~3300 bands for this particular calculation, according to my understanding of the cited paper), and I have some 1024-processor runs queued up. It does not seem to me that such a system, even with so many states, should have such large memory demands, so I am wondering whether I am doing something stupendously wrong (or, if not exactly wrong, failing to do something glaringly obvious that would solve the problem). Below is my input file, followed by some brief technical specs in case they are helpful.

## Sample input file:
 &control
    calculation='scf'
    restart_mode='from_scratch',
    prefix='si'
    outdir='/scr/josepht/espresso/bsi64/Large_GAMMA/STEP_B/tmp'
    pseudo_dir='/scr/josepht/espresso/bsi64/pseudo'
 /
 &system
    ibrav= 8, celldm(1)= 20.52, celldm(2)= 1, celldm(3)=1,
    nat= 64, ntyp= 1,
    ecutwfc = 35.0,
    nosym=.true.
    nbnd = 3328,
 /
 &electrons
    diagonalization='david',
    conv_thr = 1.0d-8,
    mixing_beta = 0.5,
 /
ATOMIC_SPECIES
 Si 1. Si.pbe-rrkj.UPF
ATOMIC_POSITIONS (bohr)
Si 0.00000000 0.00000000 0.00000000
Si 5.13000000 5.13000000 0.00000000
Si 0.00000000 5.13000000 5.13000000
Si 5.13000000 0.00000000 5.13000000
Si 2.56500000 2.56500000 2.56500000
Si 7.69500000 7.69500000 2.56500000
Si 7.69500000 2.56500000 7.69500000
Si 2.56500000 7.69500000 7.69500000
Si 10.26000000 0.00000000 0.00000000
Si 15.39000000 5.13000000 0.00000000
Si 10.26000000 5.13000000 5.13000000
Si 15.39000000 0.00000000 5.13000000
Si 12.82500000 2.56500000 2.56500000
Si 17.95500000 7.69500000 2.56500000
Si 17.95500000 2.56500000 7.69500000
Si 12.82500000 7.69500000 7.69500000
Si 0.00000000 10.26000000 0.00000000
Si 5.13000000 15.39000000 0.00000000
Si 0.00000000 15.39000000 5.13000000
Si 5.13000000 10.26000000 5.13000000
Si 2.56500000 12.82500000 2.56500000
Si 7.69500000 17.95500000 2.56500000
Si 7.69500000 12.82500000 7.69500000
Si 2.56500000 7.69500000 7.69500000
Si 10.26000000 0.00000000 0.00000000
Si 15.39000000 5.13000000 0.00000000
Si 10.26000000 5.13000000 5.13000000
Si 15.39000000 0.00000000 5.13000000
Si 12.82500000 2.56500000 2.56500000
Si 17.95500000 7.69500000 2.56500000
Si 17.95500000 2.56500000 7.69500000
Si 12.82500000 7.69500000 7.69500000
Si 0.00000000 10.26000000 0.00000000
Si 5.13000000 15.39000000 0.00000000
Si 0.00000000 15.39000000 5.13000000
Si 5.13000000 10.26000000 5.13000000
Si 2.56500000 12.82500000 2.56500000
Si 7.69500000 17.95500000 2.56500000
Si 7.69500000 12.82500000 7.69500000
Si 2.56500000 17.95500000 7.69500000
Si 0.00000000 0.00000000 10.26000000
Si 5.13000000 5.13000000 10.26000000
Si 0.00000000 5.13000000 15.39000000
Si 5.13000000 0.00000000 15.39000000
Si 2.56500000 2.56500000 12.82500000
Si 7.69500000 7.69500000 12.82500000
Si 7.69500000 2.56500000 17.95500000
Si 2.56500000 7.69500000 17.95500000
Si 10.26000000 10.26000000 0.00000000
Si 15.39000000 15.39000000 0.00000000
Si 10.26000000 15.39000000 5.13000000
Si 15.39000000 10.26000000 5.13000000
Si 12.82500000 12.82500000 2.56500000
Si 17.95500000 17.95500000 2.56500000
Si 17.95500000 12.82500000 7.69500000
Si 12.82500000 17.95500000 7.69500000
Si 10.26000000 0.00000000 10.26000000
Si 15.39000000 5.13000000 10.26000000
Si 10.26000000 5.13000000 15.39000000
Si 15.39000000 0.00000000 15.39000000
Si 12.82500000 2.56500000 12.82500000
Si 17.95500000 7.69500000 12.82500000
Si 17.95500000 2.56500000 17.95500000
Si 12.82500000 7.69500000 17.95500000
Si 0.00000000 10.26000000 10.26000000
Si 5.13000000 15.39000000 10.26000000
Si 0.00000000 15.39000000 15.39000000
Si 5.13000000 10.26000000 15.39000000
Si 2.56500000 12.82500000 12.82500000
Si 7.69500000 17.95500000 12.82500000
Si 7.69500000 12.82500000 17.95500000
Si 2.56500000 17.95500000 17.95500000
Si 10.26000000 10.26000000 10.26000000
Si 15.39000000 15.39000000 10.26000000
Si 10.26000000 15.39000000 15.39000000
Si 15.39000000 10.26000000 15.39000000
Si 12.82500000 12.82500000 12.82500000
Si 17.95500000 17.95500000 12.82500000
Si 17.95500000 12.82500000 17.95500000
Si 12.82500000 17.95500000 17.95500000
K_POINTS
 1
 0.0 0.0 0.0 1.0
##END OF INPUT

The above file runs when nbnd = 1280, and (possibly) relevant output from the successful run includes:

    Each subspace H/S matrix 400.00 Mb ( 5120,5120)

## Technical specs:
The code was compiled on a Cray XT4 (I am unsure whether compilation details would be helpful), and runs were performed on Cray XT4 nodes with two quad-core 2.3 GHz AMD Opteron processors and 16 GB of usable memory (requesting 4 cores per node). I've read here that the problem might be related to libraries/compilers (issues with PGI, ACML, etc.). If that is likely the case, I would be interested in insight regarding optimal compilation on the Cray.

Thanks in advance for any assistance, and I apologize if this question has essentially already been answered on the forum. I searched but did not come across an explicit solution to something matching this, though I admit that the general theme is present in several independent threads.
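The "Each subspace H/S matrix 400.00 Mb ( 5120,5120)" line from the nbnd = 1280 run can be used to extrapolate how the Davidson subspace matrices grow with nbnd. The sketch below is a back-of-envelope estimate, not pw.x's own memory accounting. It assumes a subspace dimension of 4 × nbnd (inferred from 5120 = 4 × 1280) and 16 bytes per double-precision complex entry. The parameters `copies` (how many full-size matrices exist) and `diag_procs` (how many processes share them) are hypothetical knobs: the real values depend on whether the subspace diagonalization is serial (ndiag 1) or distributed, and on library workspace.

```python
"""Back-of-envelope memory estimate for pw.x Davidson subspace matrices.

Assumptions (inferred, not read from the pw.x source):
  * subspace dimension = david_ndim * nbnd, with david_ndim = 4 inferred from
    "Each subspace H/S matrix 400.00 Mb ( 5120,5120)" at nbnd = 1280;
  * entries are double-precision complex numbers (16 bytes each);
  * `copies` and `diag_procs` are hypothetical knobs, not measured values.
"""

BYTES_PER_COMPLEX = 16   # two 8-byte reals
MIB = 1024 ** 2          # pw.x's "Mb" matches MiB here: 5120*5120*16 B = 400 MiB


def subspace_matrix_mib(nbnd: int, david_ndim: int = 4) -> float:
    """Size in MiB of one square complex matrix of dimension david_ndim*nbnd."""
    dim = david_ndim * nbnd
    return dim * dim * BYTES_PER_COMPLEX / MIB


def per_process_gib(nbnd: int, copies: int, diag_procs: int = 1,
                    david_ndim: int = 4) -> float:
    """GiB per process if `copies` full matrices are split evenly over `diag_procs`.

    diag_procs = 1 models serial subspace diagonalization (every process keeps
    whole matrices); larger values crudely model a distributed layout,
    ignoring block-cyclic padding and library workspace.
    """
    return copies * subspace_matrix_mib(nbnd, david_ndim) / diag_procs / 1024


if __name__ == "__main__":
    budget_gb = 16 / 4  # 16 GB per node shared by 4 MPI processes (from the post)
    for nbnd in (1280, 1300, 2500, 3328):
        one = subspace_matrix_mib(nbnd)
        serial = per_process_gib(nbnd, copies=3)  # copies=3 is a guess (H, S, eigenvectors)
        print(f"nbnd={nbnd:5d}: {one:8.2f} MiB per matrix, "
              f"~{serial:5.2f} GiB/process serial (budget ~{budget_gb:.0f} GB)")
```

Because the matrix size grows as nbnd², going from nbnd = 1280 to 3328 multiplies each subspace matrix by (3328/1280)² = 6.76, from 400 MiB to 2704 MiB. Whether that alone explains the failures depends on how many copies each process actually holds, which this arithmetic does not establish.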
Joseph Turnbull
Department of Physics
NC State University
