Hi Laurence, I will try all you have said. I didn't know about the -assu buff option - I suppose it is valid for ifort, right?
My scratch is already set. In fact, it was one of the variables I took care to set, because I saw the size of the vector files (scary...).

Finally, no problem with slowing things down a little. I'd rather have things slowed down by a few seconds than lose two hours (that is roughly the time it takes to run lapw1 -up/dn for my system), plus the hours when nothing happens until I realize the job has died... And I still have to include U and spin-orbit after that!

Thanks a lot,
Marcos

On Wed, Jul 28, 2010 at 9:00 PM, Laurence Marks <L-marks at northwestern.edu> wrote:
> This could be quite a lot of work. Some simpler suggestions:
>
> 1) In param.inc in SRC_lapw[0-2] change to
>       PARAMETER (restrict_output= 1)
>    This will reduce the size of the log files
>
> 2) Use -assu buff in your compilation options -- this writes data in
>    big chunks not line-by-line and is much friendlier on file servers.
>
> 3) Set the environmental variable SCRATCH (export it from bash should
>    work) so large data files such as the case.vector_X are local.
>
> 4) In $WIENROOT/parallel_options add (or edit)
>       set sleepy = XX   # additional sleep before checking
>       set delay = YY
>
>    where XX, YY are adjusted to try and reduce AFS problems ( 0.5 ? --
>    this will slow things down but...)
>
> 2010/7/28 Marcos Veríssimo Alves <marcos.verissimo.alves at gmail.com>:
> > Hi all,
> > I have managed to run Wien2k in our cluster, with k-point parallelization.
> > However, it looks like our NFS system (which is actually an AFS one) is
> > still a bit unstable, since the cluster has been upgraded and re-assembled
> > very recently. Problem is, the sysadmins have gone on vacations, so I'll
> > have to find a way of getting around this the best I can until the beginning
> > of next month.
> > My current problem is that it looks like some nodes of our cluster have been
> > losing connection with the AFS server intermittently, and from what I see
> > (please correct me if I'm wrong) all the writing is done over the network to
> > the home directory. So, during the writing of the energy_up files, if the
> > connection is lost then lapw2 will crash. Indeed, one of the instances of
> > lapw1 resulted in an energyup file, in the end, with 0 size. This in turn
> > made lapw2 crash, and this has happened overnight.
> > My question is, I would like to make a small (I guess) change in the
> > scripts, wherever needed. Instead of writing some files (only the ones that
> > are critical for the execution of the next code) to the home, which would be
> > done over AFS, they would be written to the scratch directory, which is local.
> > Then, at the end of the execution, they would be copied to the home
> > directory, possibly with a check on the success of the operation. I don't
> > know if this would be better, but at least the problems with network load
> > would be much more localized, and it could also be more amenable to error
> > control.
> > Since I do not have much knowledge of csh programming (I'm mostly a bash
> > guy) and the Wien2k scripts are pretty complex beasts with which I am not very
> > acquainted, could you give your opinions on the feasibility of my
> > suggestions, and if they are not too complex to implement, possible changes
> > and/or places to be changed in the scripts?
> > Best regards,
> > Marcos
> > _______________________________________________
> > Wien mailing list
> > Wien at zeus.theochem.tuwien.ac.at
> > http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>
> --
> Laurence Marks
> Department of Materials Science and Engineering
> MSE Rm 2036 Cook Hall
> 2220 N Campus Drive
> Northwestern University
> Evanston, IL 60208, USA
> Tel: (847) 491-3996 Fax: (847) 491-7820
> email: L-marks at northwestern dot edu
> Web: www.numis.northwestern.edu
> Chair, Commission on Electron Crystallography of IUCR
> www.numis.northwestern.edu/
> Electron crystallography is the branch of science that uses electron
> scattering and imaging to study the structure of matter.
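
For reference, the "write locally, then copy home with a check" idea from the original message could look roughly like the bash sketch below. This is only an illustration under assumptions: copy_back is a hypothetical helper, not part of Wien2k, the file and directory names in the usage comment are made up, and where (or whether) such a step could be hooked into the Wien2k csh scripts is not addressed here. It copies a file to a temporary name, compares it byte by byte with the source, and retries a few times before giving up.

```shell
#!/bin/bash
# Hypothetical helper (not part of Wien2k): copy one file from the local
# scratch directory to the network home directory and verify the copy.
# Usage: copy_back SRC DEST_DIR [RETRIES] [WAIT_SECONDS]
copy_back() {
    local src=$1 dest_dir=$2 retries=${3:-3} wait_s=${4:-5}
    local dest attempt
    dest="$dest_dir/$(basename "$src")"

    # A missing or zero-size source (like the empty energyup file) is
    # treated as an error before any copying is attempted.
    if [ ! -s "$src" ]; then
        echo "copy_back: $src is missing or empty" >&2
        return 1
    fi

    for (( attempt = 1; attempt <= retries; attempt++ )); do
        # Copy to a temporary name first, so a half-written file never
        # appears under the final name; cmp verifies the content.
        if cp "$src" "$dest.tmp" && cmp -s "$src" "$dest.tmp" \
                && mv "$dest.tmp" "$dest"; then
            return 0
        fi
        echo "copy_back: attempt $attempt failed for $src" >&2
        rm -f "$dest.tmp"
        # Wait before the next try, but not after the last one.
        if [ "$attempt" -lt "$retries" ]; then
            sleep "$wait_s"
        fi
    done
    return 1
}

# Example with made-up names: bring files home and stop on the first failure.
# for f in "$SCRATCH"/case.energyup_*; do
#     copy_back "$f" "$HOME/case" || exit 1
# done
```

The retry count and wait time play a role similar in spirit to the sleepy/delay settings mentioned above: they trade a little time for a better chance of surviving a short network outage.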