There is a small incompatibility (bug) in opticpara_lapw when it is used with -hf (hybrid calculations).

Because of the nature of the hf-module, we have to set the scratch variable to blank in any def file which has the -hf switch.

This is done in x_lapw :

....
if ($hf == hf) then
  set scratch=
  set scratchstring=
  setenv SCRATCH $scratch
endif
....

This means that any calculation with -hf must NOT use scratch, neither for case.vectorhf_XX nor for case.symmat_XX, .....

opticpara_lapw has 2 "logical" problems:

...
if ( $?SCRATCH ) then
  set scratch=`echo $SCRATCH  | sed -e 's/\/$//'`/ # we are afraid
                                # different settings in different
                                # computing centers
                                #use global variable for scratch if set
endif
...

This should be done only if SCRATCH is not empty. Since x_lapw has set it to empty, it is also empty in opticpara, and the sed command followed by the appended "/" then turns the empty value into "/".
The solution is the same as what is already in x_lapw: change the if statement to

if ( $?SCRATCH && $SCRATCH != '') then
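
To illustrate why an empty value ends up as "/", here is a minimal sketch in plain sh (not the tcsh of the script itself). The function names old_scratch and new_scratch are made up for this illustration and do not exist in WIEN2k; they only mirror the logic before and after the change.

```shell
#!/bin/sh
# Hypothetical helpers that mirror the scratch handling in opticpara_lapw.

# Unpatched logic: strip one trailing slash, then always append one.
old_scratch() {
    printf '%s/\n' "$(printf '%s\n' "$1" | sed -e 's/\/$//')"
}

# Patched logic: an empty value stays empty, anything else is normalized.
new_scratch() {
    if [ -n "$1" ]; then
        old_scratch "$1"
    else
        printf '\n'
    fi
}

# Compare both variants for a few typical settings of SCRATCH.
for value in './' '/scratch/user' ''; do
    printf 'SCRATCH="%s"  old="%s"  new="%s"\n' \
        "$value" "$(old_scratch "$value")" "$(new_scratch "$value")"
done
```

For "./" and "/scratch/user" both variants give the same directory with exactly one trailing slash. Only for the empty value does the old logic produce "/", which is the root directory seen in the error messages.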

------------
second problem:
opticpara calls opticcopy_lapw, but this should also NOT be done when -hf is specified (the symmat files are already local and not on a remote node in $SCRATCH).
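
As a rough sketch of this second change, again in plain sh rather than tcsh: the copy step is simply guarded by the hybrid switch. The function copy_back is a hypothetical stand-in for opticcopy_lapw (it only prints a message here), and maybe_copy_back is likewise an invented name.

```shell
#!/bin/sh
# Hypothetical stand-in for opticcopy_lapw; here it only reports that it ran.
copy_back() {
    echo "copying files back from scratch"
}

# Run the copy step unless the hybrid switch is set.
maybe_copy_back() {
    hf=$1
    if [ "$hf" != "hf" ]; then
        copy_back
    else
        echo "hybrid run: files are already local, copy step skipped"
    fi
}

maybe_copy_back ""     # normal run: the copy step is executed
maybe_copy_back "hf"   # hybrid run: the copy step is skipped
```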

I attach a patched   opticpara_lapw script.

Regards
Peter Blaha

On 11/14/19 4:01 AM, Oleg Rubel wrote:
Dear Wien2k community,

I ran into a problem when performing optics calculations in parallel mode (not MPI), hybrid with SOC. The calculation is for Si, but it is just a stepping stone to heavier materials where SOC really matters.

Optics is executed as

[rubel@gra690 optics]$ x optic -so  -hf -p
running OPTIC in parallel mode
[1] 4932
 OPTIC END
[1]  + Done                          ( cd $PWD; $t $exe ${def}_${loop}.def; rm -f .lock_$lockfile[$p] ) >> .timeop_$loop
[1] 4937
...
   Summary of opticpara:
   localhost     user=0  wallclock=203580
scratch=/
touch: cannot touch '/optics.symmat': Read-only file system
touch: cannot touch '/optics.mommat2': Read-only file system
touch: cannot touch '/optics.mat_diag': Read-only file system
touch: cannot touch '/optics.mme': Read-only file system
/optics.symmat: Read-only file system.
/optics.symma1: Read-only file system.
/optics.symma2: Read-only file system.
/optics.mat_diag: Read-only file system.
/optics.mme: Read-only file system.
rm: cannot remove '/optics.symmat_1': No such file or directory
rm: cannot remove '/optics.mat_diag_1': No such file or directory
rm: cannot remove '/optics.mme_1': No such file or directory
...

The output shows that optic actually ends OK, but the script gets stuck because the result files point to the root directory "/" for some reason. Of course, I have no permission to write there. The same problem was reported earlier on the mailing list
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg17103.html

The problem is the value of the "scratch" variable. I edited the file

[rubel@gra690 optics]$ vim $WIENROOT/opticcpara

to display the variable. As you can see in the output above, the value is "scratch=/" in spite of the fact that

[rubel@gra690 optics]$ echo $SCRATCH
./

The workaround is to make changes in the file $WIENROOT/opticcpara

if ( $?SCRATCH ) then
  set scratch=`echo $SCRATCH  | sed -e 's/\/$//'`/ # we are afraid
                                # different settings in different
                                # computing centers
                                #use global variable for scratch if set
  echo "scratch=$scratch" # OLEG
  set scratch=$SCRATCH # OLEG
endif

I am not sure what the whole "sed ..." command is supposed to do. Why do we need to change the value of $SCRATCH? I tried it in different shells

[rubel@gra-login1 optics]$ scratch=`echo $SCRATCH  | sed -e 's/\/$//'`/
[rubel@gra-login1 optics]$ echo $scratch
./
[rubel@gra-login1 optics]$ /bin/csh
[rubel@gra-login1 optics]$ set scratch=`echo $SCRATCH  | sed -e 's/\/$//'`/
[rubel@gra-login1 optics]$ echo $scratch
./
[rubel@gra-login1 optics]$ echo $shell
/bin/tcsh

I could not reproduce "/" on the command line, but in the script the value is different for some reason.

Any thoughts are welcome :)

Thank you in advance
Oleg


--

                                      P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: bl...@theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at/TC_Blaha
--------------------------------------------------------------------------
#!/bin/tcsh -f
#
# Run optic in parallel mode
#
# $Author: M.Lee $
# 

touch .lock_
foreach i (.lock_*)
    rm $i
end

onintr exit
set name        = $0
set bin         = $name:h       #default directory for WIEN-executables
if !(-d $bin) set bin = .
set name        = $name:t

unalias rm
alias   testinput       'if (! -e \!:1 ||  -z \!:1) goto \!:2'
alias   testerror       'if (! -z \!:1.error) goto error'

set t       = time  
set cmplx
set log     = :parallel
set defmach = `hostname`
set updn                        # spinpolarization switch
set dnup    = 'dn'              # spinpolarization switch
set sc                          # semicore-switch
set hf                          # hybrid-switch
set so                          # spinorbit-switch
set remote = ssh
set tmp_dir = /tmp
set init = init:
set res  = residue:
set taskset0
set taskset='no'  

set scratch =      

if ( $?SCRATCH && $SCRATCH != '') then
  set scratch=`echo $SCRATCH  | sed -e 's/\/$//'`/ # we are afraid
                                # different settings in different
                                # computing centers
                                #use global variable for scratch if set
endif


############################################################################
# In this section use 0 to turn off an option, 1 to turn it on, 
# respectively choose a value

set useremote   = 1             # using remote shell to launch processes
setenv DELAY 0.1                  # delay launching of processes by n seconds
set debug       = 0             # set verbosity of debugging output

############################################################################
# and now we look if we should override the defaults
if (-e $bin/parallel_options) then
      source $bin/parallel_options
endif
if ( $?TASKSET ) then
        set taskset="$TASKSET"
endif
if ( $?USE_REMOTE ) then
        set useremote = $USE_REMOTE
endif
############################################################################

set tmp  = $tmp_dir/opticpara.$user.$$
set tmp2 = $tmp_dir/opticpara.$user.$$_2

if ($#argv < 1) then
        echo usage: $0 deffile
        exit
endif

while ($#argv)
  switch ($1)
  case -h:
  case -H: 
    set help
    shift; breaksw
  case -c:
    set cmplx = c
    shift; breaksw
  case -up:
    set updn = 'up'
    set dnup = 'dn'
    shift; breaksw
  case -dn:
    set updn = 'dn'
    set dnup = 'up'
    shift; breaksw
  case -hf:
    set hf = 'hf'
    shift; breaksw
  case -so:
    set so = 'so'
    shift; breaksw
  case -sc:
    set sc = 's'
    shift; breaksw
  default:
    set def = $1:r
    shift; breaksw
  endsw
end

set exe = $bin/optic$cmplx
set exe = optic$cmplx

#are we running parallel?
testinput .processes single
echo "running OPTIC in parallel mode"
echo "RUNNING" >.opticpara

#before we start, we wipe away all parallel error files
if ( -e optic_1.error ) rm *optic_*.error
if ( -e .timeop_1) rm .timeop_*

if (-e .machines.help) rm .machines.help

grep -v $init .processes|grep : | grep -v $res >$tmp2
set mist     = `wc $tmp2 `
set maxproc  = $mist[1]
#set machine  = `grep $init .processes |cut -f2 -d: | xargs`
set machine  = `grep -v $init .processes |grep : | grep -v $res | cut -f2 -d: | xargs`

set lockfile = `cut -f2 -d: $tmp2 | awk '{print $1 NR}'|xargs`
set residue  = `grep $res .processes|cut -f2 -d:`
if ($residue == "") unset residue

if ($debug > 0) echo machines: $machine

echo "** " Error in Parallel OPTIC >$def.error
#bounding cpus
set p_cpu_bound = ($machine)
set i=1
set cpu=0
set old=old
while ($i <= $#p_cpu_bound)
  if($old != $p_cpu_bound[$i]) then
     set cpu=0
  endif
  set old=$p_cpu_bound[$i]
  set p_cpu_bound[$i] = $cpu
  @ cpu ++
  @ i ++
end
#echo $machine
#echo $p_cpu_bound

#get name of case
setenv PWD `pwd|sed "s/tmp_mnt\///"`
#echo $PWD
setenv PWD $cwd
set case    = $PWD
set case    = $case:t          
if ($case == "") then
  echo "ERROR: cannot detect working directory $cwd -> exit"
  exit 1
endif
####set case    = $case:r           #head of file-names
if ($debug > 0) echo Setting up case $case for parallel execution
if ($debug > 0) echo of OPTIC
if ($debug > 0) echo "  "
#

#creating  def files
if ($debug > 0) echo " "
if ($debug > 0) echo -n "creating "$def"_*.def:  "
set i = 1
while ($i <= $maxproc)
  if ($debug > 0) echo -n "$i "
  cp $def.def .tmp
  #substituting in files:
  cat <<theend >.script
s/vector$hf$so$dnup/&_$i/w .mist
s/vector$hf$so$updn/&_$i/w .mist
s/outputop/&_$i/w .mist
s/symmat/&_$i/w .mist
s/symma1/&_$i/w .mist
s/symma2/&_$i/w .mist
s/mommat2/&_$i/w .mist
s/mat_diag/&_$i/w .mist
s/mme/&_$i/w .mist
s/symop/&_$i/w .mist

theend

  sed -f .script .tmp > .tmp1
  sed "s/vector_${i}_$i\&dn/vectordn_$i/" .tmp1>.tmp2 
  sed "s/vector_${i}_$i\&up/vectorup_$i/" .tmp2>.tmp1
  sed "s/vector_${i}dn_$i/vectordn_$i/" .tmp1>.tmp2
  sed "s/vector_${i}_$i/vector_$i/" .tmp2>.tmp1
  sed "s/vectorhf_${i}dn_$i/vectorhfdn_$i/" .tmp1> .tmp2
  sed "s/vectorso_${i}dn_$i/vectorsodn_$i/" .tmp2> "$def"_$i.def
#similar fix for SO necessary

  @ i ++
end
if ($debug > 0) echo " "


#starting processes
if ($debug > 0) echo " "
if ($debug > 0) echo "starting process:  "

echo "->  "starting parallel optic at `date` >>$log

set loop    = 0
set endloop = 0
set runmach = ""
echo "files:$maxproc" >.processes2
# change working dir because of problems with automounted directories
#   cd $cwd
while ($loop < $maxproc)
  set p = 1
  if ($?residue && $?resok) set p = 2
  while ($p <= $#machine)
    if ($loop < $maxproc) then
        if !(-e .lock_$lockfile[$p]) then
            @ loop ++
            echo "${loop}:${maxproc} : $p_cpu_bound[$p]" >.processes2
            if ($debug > 0) echo prepare $loop on $machine[$p]
            set runmach = ($runmach $machine[$p])
            echo $runmach >>.processes2
            if ($debug > 1) echo "   >  $exe ${def}_${loop}.def on $machine[$p]"
            if ($debug > 1) echo "   >  $exe ${def}_${loop}.def on $machine[$p]">>$log
            if ($useremote == 1) then
            if ($debug > 1) echo use remote
                touch .lock_$lockfile[$p]
                echo -n "$runmach[$loop] ">.timeop_$loop
                if("$taskset" != 'no') set taskset0="$taskset $p_cpu_bound[$p]"
                ($remote $machine[$p] "cd $PWD;$t $taskset0 $exe ${def}_${loop}.def;rm -f .lock_$lockfile[$p]") >>.timeop_$loop  &
         else
                if ($debug > 1) echo not using remote shell
                touch .lock_$lockfile[$p]
                echo -n "$runmach[$loop] " >.timeop_$loop
                (cd $PWD;$t $exe ${def}_${loop}.def;rm -f .lock_$lockfile[$p]) >>.timeop_$loop  &
            endif
        endif
    if ($debug > 1) echo sleeping for $DELAY seconds
    sleep $DELAY
            jobs -l >.optic${cmplx}para.$$.`hostname`
    endif
    @ p ++
  end
end

#wait for execution to be completed
if ($debug > 0) echo " "
if ($debug > 0) echo "waiting for processes:  "
wait


set i = 1
while ($i <= $maxproc)
  testerror "$def"_$i
  @ i ++
end

#cpu summary:
set i = 1
while ($i <= $maxproc)
#    echo "      "`cat .timeop_$i`
#fix for bash timing
    bashtime2csh.pl_lapw .timeop_$i > .time_tmp
    mv .time_tmp .timeop_$i
    echo "      "`cat .timeop_$i` >>$log
    @ i ++
end

# postanalysis
echo "   Summary of opticpara:" >$tmp
set p = 1
while ($p <= $#machine)
    set m = $runmach[$p]
    cat .timeop_* | grep $m | tr "():" " " | \
            awk '{u += $2; cl += 60*$4+$5} \
                END {print "   '$m'\t user=" u "\t wallclock=" cl}' >>$tmp
    @ p ++
end

uniq < $tmp |tee -a $log

echo "<-  "done at `date` >>$log

# concatenating the case.symmat files and case.mommat files

if (-e $case.symop)  rm $case.symop
if (-e ${scratch}$case.mme$updn)  rm ${scratch}$case.mme$updn
if (-e ${scratch}$case.symmat$updn)  rm ${scratch}$case.symmat$updn
if (-e ${scratch}$case.mommat2$updn)  rm ${scratch}$case.mommat2$updn
if (-e ${scratch}$case.mat_diag$updn) rm ${scratch}$case.mat_diag$updn
touch ${scratch}$case.symmat$updn
touch ${scratch}$case.mommat2$updn
touch ${scratch}$case.mat_diag$updn
touch ${scratch}$case.mme$updn

mv $case.symop_1 $case.symop
rm $case.symop_*
if( "$hf" != "hf" ) then
   opticcopy_lapw
endif

set i = 1
while ($i <= $maxproc)
 if ( $i == 1 ) then
#    testinput $case.symmat_$i$updn scratchwarning
    cat ${scratch}$case.symmat_$i$updn    >  ${scratch}$case.symmat$updn
    cat ${scratch}$case.symma1_${i}$updn    >  ${scratch}$case.symma1$updn
    cat ${scratch}$case.symma2_${i}$updn    >  ${scratch}$case.symma2$updn
    if (-e ${scratch}$case.mommat2_$i$updn) then
      cat ${scratch}$case.mommat2_$i$updn   >  ${scratch}$case.mommat2$updn
    endif
    cat ${scratch}$case.mat_diag_$i$updn  >  ${scratch}$case.mat_diag$updn
    cat ${scratch}$case.mme_$i$updn  >  ${scratch}$case.mme$updn
 else
    tail -n +2  ${scratch}$case.symmat_$i$updn   >>  ${scratch}$case.symmat$updn
    tail -n +2  ${scratch}$case.symma1_${i}$updn   >>  ${scratch}$case.symma1$updn
    tail -n +2  ${scratch}$case.symma2_${i}$updn   >>  ${scratch}$case.symma2$updn
    if (-e ${scratch}$case.mommat2_$i$updn) then
      tail -n +2  ${scratch}$case.mommat2_$i$updn   >>  ${scratch}$case.mommat2$updn
    endif
    tail -n +2  ${scratch}$case.mat_diag_$i$updn >>  ${scratch}$case.mat_diag$updn
    tail -n +2  ${scratch}$case.mme_$i$updn >>  ${scratch}$case.mme$updn
 endif
    rm ${scratch}$case.symmat_$i$updn
    rm -f ${scratch}$case.mommat2_$i$updn
    rm ${scratch}$case.mat_diag_$i$updn
    rm ${scratch}$case.mme_$i$updn
    @ i ++
end

echo "<-  "done at `date` >>$log
echo "-----------------------------------------------------------------">>$log
rm $def.error
#rm .in.tmp .in.tmp1
touch $def.error
rm $tmp* >&/dev/null
rm .optic${cmplx}para.$$.`hostname` >&/dev/null
echo "DONE" >.opticpara
exit 0

single:
echo "running in single mode"
$exe $def.def
rm $tmp* >&/dev/null
rm .optic${cmplx}para.$$.`hostname` >&/dev/null
exit 0

scratchwarning:
echo "Could not find $case.symmat_$i$updn , which is probably because you used a scratch directory $SCRATCH"
echo "Copy these files  from the remote machines and concatenate them yourself"
echo "with commands like (for all your parallel calculations i):"
echo " cat $case.symmat_"'$i'"$updn >> $case.symmat$updn           when $i =1a "
echo " tail -n +2  $case.symmat_"'$i'"$updn >> $case.symmat$updn   else"
exit 1

error:
echo "** " OPTIC crashed!
echo "** " OPTIC crashed at `date`>>$log
echo "** " check ERROR FILES! >>$log
echo "-----------------------------------------------------------------">>$log
echo "** " Error in Parallel OPTIC >>$def.error
rm $tmp* >&/dev/null
rm .optic${cmplx}para.$$.`hostname` >&/dev/null
echo "ERROR" >.opticpara
exit 1