At http://www.nersc.gov/users/software/applications/materials-science/wien2k/ , the last line of the job file has a star (*) after .machine. It seems to be missing from the last line of your job file. Without it, the old .machines file is not removed, and that may prevent the new .machines file from being created.
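
To illustrate the point about the missing star, here is a minimal sketch that can be run in a scratch directory. The file names .machines and .machine0 are only stand-ins for whatever a previous run left behind, and the temporary directory is an assumption made for the demonstration, not part of the NERSC job file.

```shell
# Sketch only: show what "rm -fr .machine" and "rm -fr .machine*" each remove.
workdir=$(mktemp -d)          # scratch directory, assumed for this demo
cd "$workdir"
touch .machines .machine0     # stand-in leftovers from an earlier run

rm -fr .machine               # no file has exactly this name; -f hides the error
before=$(ls -A | wc -l)       # both leftovers are still present

rm -fr .machine*              # the glob matches .machines and .machine0
after=$(ls -A | wc -l)        # nothing is left

echo "files left after 'rm -fr .machine':  $before"
echo "files left after 'rm -fr .machine*': $after"
```

Because of the -f flag, the first rm exits quietly even though it removed nothing, which is probably why the job file gave no hint that the cleanup step was not working.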

Also, I suggest you talk to the consultant who administers the cluster. They should be better able to tell you why you are getting the error "ssh: connect to host nid01855 port 204: Connection refused". They might have a firewall set up to block port 204, or they might have disabled ssh access to node nid01855.
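
When you contact the consultant, it may help to state exactly which host and port refused the connection and how often. The following is a rough sketch that summarizes the ssh errors. The sample lines are copied from your output, and the temporary log file is an assumption (in practice you would point sed at the actual SLURM output file).

```shell
# Sketch only: summarize "Connection refused" errors by host and port.
log=$(mktemp)                 # stand-in for the real job output file
cat > "$log" <<'EOF'
ssh: connect to host nid01855 port 204: Connection refused
ssh: connect to host nid01855 port 204: Connection refused
ssh: connect to host nid01855 port 204: Connection refused
EOF

# Extract "host port" pairs, count duplicates, and print one line per pair.
summary=$(sed -n 's/^ssh: connect to host \([^ ]*\) port \([0-9]*\): Connection refused.*/\1 \2/p' "$log" \
  | sort | uniq -c | awk '{print $2 ":" $3 " refused " $1 " time(s)"}')

echo "$summary"
```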

On 2/18/2016 8:31 AM, Dr. K. C. Bhamu wrote:
Dear Users and developers

I ran my job via a SLURM job file on a remote server (2 nodes/64 cores). Everything went fine up to DOS, but when I ran "x optic -p" through the job file, the following message occurred:

[1] 1371
ssh: connect to host nid01855 port 204: Connection refused
[1] + Exit 255 ( $remote $machine[$p] "cd $PWD;$t $taskset0 $exe ${def}_${loop}.def;rm -f .lock_$lockfile[$p]" ) >> .timeop_$loop
[1] 1375
ssh: connect to host nid01855 port 204: Connection refused
[1] + Exit 255 ( $remote $machine[$p] "cd $PWD;$t $taskset0 $exe ${def}_${loop}.def;rm -f .lock_$lockfile[$p]" ) >> .timeop_$loop
[1] 1379
ssh: connect to host nid01855 port 204: Connection refused
[1] + Exit 255 ( $remote $machine[$p] "cd $PWD;$t $taskset0 $exe ${def}_${loop}.def;rm -f .lock_$lockfile[$p]" ) >> .timeop_$loop
**  OPTIC crashed!
0.840u 1.800s 1:50.21 2.3%      0+0k 82495+1135io 4pf+0w
error: command /usr/common/software/wien2k-ccm/14.2/opticpara optic.def failed
...............

I went through the list and found a couple of threads, but the error is still not solved.

Please look into this.

The job completed successfully on a local two-CPU cluster (4 GB RAM each).

The job file was:
--------------------------------------------------------
#!/bin/bash -l
#SBATCH -N 2
#SBATCH -n 64
#SBATCH -t 00:20:00
#SBATCH -p regular
#SBATCH -J orthorhombic_1
#SBATCH --ccm

#module load wien2k-ccm
#generating .machines file for k-point and mpi parallel lapw1/2
let ntasks_per_kgroup=1
gen.machines -m $ntasks_per_kgroup

#need to disable SLURM envs hereafter
unset `env|grep SLURM_|awk -F= '{print $1}'`

#put your Wien2k command here
x optic -p
#remove leftover .machines file
rm -fr .machine
---------------------------------------------------------------------------

regards
Bhamu
_______________________________________________
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
