The .machines file looks fine to me, but one of the others might see something that I didn't notice (besides the WIEN2k command not being there at the bottom of the file - likely missed in the copy and paste).

The main problem seems to be the "bash: lapw1: command not found" error, unless something happened earlier that is not shown. Tracking down error messages from a parallel run is more complicated than for a serial one. A serial calculation can print its standard output and standard error directly to the terminal on a desktop. A parallel calculation on a cluster with a queue system can instead write them to a standard output file (-o) and a standard error file (-e), or to a combined output/error file (-j), with user-specified name(s) [1,2]. They can also be written to hidden dot files such as .time* or .stdout*, as mentioned before [3,4,5].
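As a rough sketch of the options above, the job file header could name the output and error files explicitly, and a small helper could list the hidden dot files in the case directory. The file names wien2k_job.out and wien2k_job.err and the function list_hidden_logs are only illustrative assumptions, not something from your setup:

```shell
#!/bin/bash
#PBS -o wien2k_job.out   # standard output file (name is an assumption)
#PBS -e wien2k_job.err   # standard error file (name is an assumption)
# Alternatively, merge standard error into the standard output file:
# #PBS -j oe

# Hypothetical helper: print the hidden .time* and .stdout* files found
# in a directory (defaults to the current directory).
list_hidden_logs() {
    dir="${1:-.}"
    for f in "$dir"/.time* "$dir"/.stdout*; do
        # An unmatched glob stays literal, so test that the file exists.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

Running list_hidden_logs in the case directory after the job has crashed should show which of these files exist, so they can be inspected with cat or less.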

The "lapw1: command not found" might be because $WIENROOT didn't get added to the PATH on one of the nodes [ http://www.supercluster.org/pipermail/torqueusers/2010-March/010143.html ]. Did you try checking whether the path to WIEN2k is in the PATH, for example by looking at PBS_O_PATH in the output of qstat -f jobid [ http://stackoverflow.com/questions/21248406/sleep-command-not-found-in-torque-pbs-but-works-in-shell ]?
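One way to do the PATH check from inside the job itself is sketched below. It assumes that "module load wien2k" sets $WIENROOT on your cluster, which I cannot confirm; path_contains is just a hypothetical helper name:

```shell
# Return success if directory $1 is one of the colon-separated entries in $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Place this in the job file after "module load wien2k" (assumption: the
# module sets WIENROOT). The message ends up in the job's output/error file.
if path_contains "$WIENROOT" "$PATH"; then
    echo "WIENROOT ($WIENROOT) is in PATH on $(uname -n)"
else
    echo "WIENROOT ($WIENROOT) is NOT in PATH on $(uname -n)" >&2
fi
```

This only reports on the node that runs the job script, so the other nodes listed in .machines still need to be checked separately.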

Did you try to ssh into all 8 nodes and see if you can see lapw1 on each node? For example,

ssh n024
ls -l $WIENROOT/lapw1

ssh n225
ls -l $WIENROOT/lapw1

...
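Instead of logging in to each node by hand, the same check could be looped over the hosts listed in .machines. This is only a sketch: it assumes passwordless ssh between the nodes and that a non-interactive ssh shell sets $WIENROOT, neither of which I know for your cluster. The function names machines_hosts and check_nodes are made up for this example:

```shell
# Print the unique host names from the "N:host" lines of a .machines file.
# The lapw0:, granularity: and extrafine: lines are skipped.
machines_hosts() {
    grep '^[0-9][0-9]*:' "$1" | cut -d: -f2 | awk '{print $1}' | sort -u
}

# Run "ls -l $WIENROOT/lapw1" on every host found in the .machines file.
# The single quotes make $WIENROOT expand on the remote node, not locally.
check_nodes() {
    for host in $(machines_hosts "${1:-.machines}"); do
        echo "== $host =="
        ssh "$host" 'ls -l "$WIENROOT/lapw1"' || echo "lapw1 not found on $host"
    done
}
```

Calling check_nodes in the case directory would then report any node where lapw1 cannot be listed. If $WIENROOT is empty in the non-interactive shell, the ls fails too, which would point to the same kind of environment problem as the "command not found" message.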

Above, I'm just guessing about the commands/configuration for your system, but the administrator or helpdesk for your cluster should know everything about your system and be able to help you much better with resolving the command not found error.

[1] http://beige.ucs.indiana.edu/I590/node39.html
[2] https://wikis.nyu.edu/display/NYUHPC/Tutorial+-+Submitting+a+job+using+qsub
[3] http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg13598.html
[4] http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg14148.html
[5] http://zeus.theochem.tuwien.ac.at/pipermail/wien/2017-March/026109.html

On 3/13/2017 1:25 PM, shaymlal dayananda wrote:
Dear developers and users

I was trying to do a volume optimization and an scf calculation with spin polarization in parallel mode. But both of my jobs crash and I get the following error file. However, both cases run correctly when parallel mode is removed.
............................................................................
'LAPW2' - can't open unit: 30
 'LAPW2' -        filename: case.energyup_1
**  testerror: Error in Parallel LAPW2
.................................................................................
Also, in STDOUT, I see the following errors:

.......................................................................
bash: lapw1: command not found
...
....
.....
FERMI - Error
grep: *scf1dn*: No such file or directory
0.381u 0.507s 1:12.66 1.2%    0+0k 128+1736io 1pf+0w
Test-TiC-VOl-parallel.scf1dn_1: No such file or directory.
.............................................................................


I copied my machine file and the job file here. But I think this is not correct, and I am not sure whether I need to have separate lines for lapw2 and lapwsp. Any help in correcting this is highly appreciated.

".machines" file
.............................
#
lapw0:n024  n225  n220  n218  n045  n044  n043  n043
1:n024
1:n225
1:n220
1:n218
1:n045
1:n044
1:n043
1:n043
granularity:1
extrafine:1

......................................................

job file is copied below.


# example for 8 nodes
#PBS -l procs=8
#PBS -l pmem=2048mb
#PBS -l walltime=4:00:00

module load wien2k

# change into your working directory
cd $PBS_O_WORKDIR
#start creating .machines
cat $PBS_NODEFILE |cut -c1-6 >.machines_current
aa=`cat .machines_current | wc -l`
echo '#' > .machines

# example for an MPI parallel lapw0
echo -n 'lapw0:' >> .machines
i=1
while [ $i -lt $aa ]
do
echo -n `cat $PBS_NODEFILE |head -$i | tail -1` ' ' >>.machines
i=$((i+1))
done
echo  `cat $PBS_NODEFILE |head -$i|tail -1` ' ' >>.machines

#example for k-point parallel lapw1/2
i=1
while [ $i -le $aa ]
do
echo -n '1:' >>.machines
head -$i .machines_current |tail -1 >> .machines
i=$((i+1))
done

echo 'granularity:1' >>.machines
echo 'extrafine:1' >>.machines

#define here your WIEN2k command


....................................................................


Thank you

Chami
_______________________________________________
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
