You are on the right track in analyzing your problem. Did you define a $SCRATCH variable pointing to a local directory?
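A minimal sketch of that check (the default path /tmp/scratch is an assumption for illustration, not something from this thread — substitute whatever local directory your nodes actually provide):

```shell
# Verify that SCRATCH is defined and points to a writable local directory.
# Run this on every compute node listed in .machines, not just the login node.
SCRATCH=${SCRATCH:-/tmp/scratch}   # assumed fallback for this sketch
mkdir -p "$SCRATCH"
if [ -d "$SCRATCH" ] && [ -w "$SCRATCH" ]; then
    echo "SCRATCH ok: $SCRATCH"
else
    echo "SCRATCH not usable: $SCRATCH" >&2
fi
```

If this fails on any node, lapw2para will not find the vector files it expects there, which matches the `.in.tmp` errors below.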
Otherwise, try to verify and isolate the problem by running the steps "by hand":

x lapw2 -p

on either n01 or n02, or even:

ssh n02 "cd $PWD; time lapw2 lapw2_1.def"

Yongsheng Zhang wrote:
> Additional information:
>
> lapw2 in parallel only crashes when data transfer between nodes (or the
> connection between nodes) is involved, i.e. there is no problem when I
> run the job on the local node in 2-CPU parallel. But if I use the local
> node as the main node (which runs lapw0) and use one of the other nodes
> to run in parallel, the "lapw2 parallel crashed" error occurs. I am not
> sure I have stated my problem clearly, so here is an example: there are
> two nodes, called n01 and n02 (each node has 2 CPUs). When I log in to
> n01 and directly use its 2 CPUs to run in parallel, my .machines file is:
>
> granularity:1
> 1: n01
> 1: n01
>
> and everything is fine.
>
> But if I stay on n01 and use n02 to run in parallel, my .machines file
> correspondingly changes to:
>
> granularity:1
> 1: n02
> 1: n02
>
> Then my parallel lapw2 crashes after lapw1.
>
> Thank you very much
> Zhang
>
> zhang at fhi-berlin.mpg.de wrote:
>> Dear all,
>>
>> The latest WIEN2k version (8.1) compiled successfully on our IBM Linux
>> cluster, using the Intel Fortran compiler (version 9.0), cc as the C
>> compiler, and the MKL 9.0 libraries.
>>
>> For small jobs such as bulk systems, there is no problem running on a
>> single CPU or k-point parallel over several nodes (2 CPUs on each
>> node). For large systems, it only works if I run on a single CPU or on
>> 2 CPUs in parallel within one node.
>> But if the k-point parallel run includes more than one node, lapw2
>> crashes after lapw1 has finished successfully, with the following
>> output (example of a k-parallel run on 2 nodes, 4 CPUs):
>>
>> LAPW0 END
>> LAPW1 END
>> LAPW1 END
>> LAPW1 END
>> LAPW1 END
>> LAPW2 - FERMI; weighs written
>> Segmentation fault
>> Segmentation fault
>> LAPW2 END
>> LAPW2 END
>> cp: cannot stat `.in.tmp': No such file or directory
>> rm: cannot remove `.in.tmp': No such file or directory
>> rm: cannot remove `.in.tmp1': No such file or directory
>>
>>>   stop error
>>
>> Among the lapw2 output files, case.scf2_1(2) contain all the expected
>> information, but case.scf2_3(4) have only one line. The lapw2.error
>> file says:
>>
>> **  testerror: Error in Parallel LAPW2
>>
>> And the dayfile says:
>>
>> **  LAPW2 crashed!
>> 0.473u 0.412s 0:15.32 5.7% 0+0k 0+0io 0pf+0w
>> error: command /batch/mfh/yzhang/wien-08-t/lapw2para lapw2.def failed
>>
>> On the same machine, my old WIEN2k version runs without any problem, so
>> I wonder whether there is something wrong in the new version's
>> lapw2para_lapw.
>>
>> BTW: I am sure my .machines file is correct and uses the "real"
>> machine names.
>>
>> Thanks,
>> Zhang
>> _______________________________________________
>> Wien mailing list
>> Wien at zeus.theochem.tuwien.ac.at
>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>

--
P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-15671             FAX: +43-1-58801-15698
Email: blaha at theochem.tuwien.ac.at    WWW: http://info.tuwien.ac.at/theochem/
--------------------------------------------------------------------------