Thank you very much for all your responses.

I did some more testing to provide more information.

1. I recompiled the code (since Gavin had no problems with my calculation, I thought it might be a compilation issue), but nothing changed.

2. Adding "x" to the opticpara script (-fx instead of -f in the first line) shows that the script loops on:

while ( 0 < 2 )
set p = 1
if ( 0 && 0 ) set p = 2
while ( 1 < = 0 )

which corresponds to lines 213-246 (in opticpara):

while ($loop < $maxproc)
  set p = 1
  if ($?residue && $?resok) set p = 2
  while ($p <= $#machine)

I tracked it down to line 126:

set machine = `grep -v $init .processes |grep : | grep -v $res | cut -f2 -d: | xargs`

gives me nothing (the output of this command is just blank).
It is supposed to take the second column from my .processes file (without the init:* lines), which in my case is empty:
1 :  :  143 : 1 : 1 : 0
2 :  :  143 : 1 : 2 : 0
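
Why an empty host list would leave the script spinning can be illustrated with a small sh sketch. This is not the opticpara code: spin_check is a hypothetical helper, and it assumes that the outer counter is advanced only inside the inner loop, which is what the repeated "while ( 0 < 2 )" in the trace suggests. A cap on the number of passes is added so the demonstration terminates.

```shell
#!/bin/sh
# Sketch only: mimics the loop shape seen in the trace, not the real opticpara body.
# Assumption: the outer counter advances only inside the inner loop.
spin_check() {
    machines=$1        # space-separated host list (may be empty)
    maxproc=$2
    set -- $machines   # word-split the list into positional parameters
    nmach=$#           # plays the role of $#machine in csh
    loop=0
    passes=0
    while [ "$loop" -lt "$maxproc" ]; do
        passes=$((passes + 1))
        # Safety cap so the demonstration ends instead of hanging.
        if [ "$passes" -gt 5 ]; then
            echo "stuck"
            return 0
        fi
        p=1
        while [ "$p" -le "$nmach" ]; do
            loop=$((loop + 1))
            p=$((p + 1))
        done
    done
    echo "done"
}

spin_check "" 2              # empty host list
spin_check "node1 node2" 2   # two hypothetical hosts
```

With an empty list the inner loop body never runs, so under the stated assumption the outer condition never changes.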

What is supposed to be in that column? Isn't that the node names? .processes is generated automatically from .machines, and my .machines looks OK (and it works for previous calculations):


There is line 125:

set machine  = `grep $init .processes |cut -f2 -d: | xargs`

which is commented out, but it would make more sense to use it here. I commented out line 126, uncommented line 125, and it seems to work now, but I don't know if it has any other consequences. Can I leave it like that? Someone wiser than me commented out that line, and they probably had some reason for doing so.
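
The difference between the two "set machine" lines can be reproduced outside WIEN2k with a minimal sh sketch. The values of $init and $res below are assumptions (the real definitions are in opticpara), and the two init: rows with the host names node1 and node2 are hypothetical; only the two numbered rows are taken from the .processes file quoted above.

```shell
#!/bin/sh
# Assumed values; the real definitions of $init and $res live in opticpara.
init='init:'
res='residue:'

work=$(mktemp -d)
# Sample .processes: the two "init:" rows are hypothetical,
# the two numbered rows are the ones quoted above (empty second column).
cat > "$work/.processes" <<'EOF'
init:node1
init:node2
1 :  :  143 : 1 : 1 : 0
2 :  :  143 : 1 : 2 : 0
EOF

# Line 126 style: second column of the non-init rows.
machine_126=$(grep -v "$init" "$work/.processes" | grep : | grep -v "$res" | cut -f2 -d: | xargs)
# Line 125 style: second column of the init rows.
machine_125=$(grep "$init" "$work/.processes" | cut -f2 -d: | xargs)

echo "line 126 gives: '$machine_126'"
echo "line 125 gives: '$machine_125'"
rm -rf "$work"
```

In this sketch the line 126 variant yields an empty string because the second column of the numbered rows holds only blanks, while the line 125 variant picks up the host names from the init: rows. Whether the two lists are interchangeable in a real run depends on how the rest of opticpara uses $machine.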

I'm not really sure what to do next. Any help would be appreciated. Please tell me if there is any other info that you might need.

Best regards,

Maciej Polak

P.S. the answers to your other questions:
1. All the files that are created after "x optic -p" is executed:

-rw------- 1 mpolak grant045     172 04-23 02:53 .script
-rw------- 1 mpolak grant045 17 04-23 02:53 .running.100962.wn0926.2304025353
-rw------- 1 mpolak grant045       8 04-23 02:53 .processes2
-rw------- 1 mpolak grant045    7793 04-23 02:53 :parallel
-rw------- 1 mpolak grant045       8 04-23 02:53 .opticpara
-rw------- 1 mpolak grant045      28 04-23 02:53 optic.error
-rw------- 1 mpolak grant045    1475 04-23 02:53 optic.def
-rw------- 1 mpolak grant045    1495 04-23 02:53 optic_2.def
-rw------- 1 mpolak grant045    1495 04-23 02:53 optic_1.def
-rw------- 1 mpolak grant045    1115 04-23 02:53 .mist
-rw------- 1 mpolak grant045    2449 04-23 02:53 :log
-rw------- 1 mpolak grant045       5 04-23 02:53 .lapw1para
-rw------- 1 mpolak grant045       0 04-23 02:53 lapw1.error

2. "ps -ef | grep optic" gives:

mpolak 102451  97092  0 03:04 ? 00:00:00 /bin/csh -f /home/mpolak/WIEN2k/x optic -p
mpolak 102465 102451 11 03:04 ? 00:00:03 /bin/csh -fx /home/mpolak/WIEN2k/opticpara optic.def

On 04/22/2016 07:27 AM, Peter Blaha wrote:
First, one needs detailed information about which files are really generated in order to see where it gets stuck. ls -alsrt lists the files with full information (empty or non-empty files, date and time of last write).

Then you should do a ps -ef and see what is running in connection with optic (maybe add |grep optic)

If it does not start the parallel optic calculations, you may edit opticpara and replace -f by -fx in the first line of this script.

It will give you a very lengthy, hard to read output, but basically this should help to find the exact position/reason where it got stuck.
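
The edit described above can be sketched in sh as follows. enable_trace is a hypothetical helper, not part of WIEN2k; it assumes the first line of the script ends in " -f" and keeps a .bak copy so the change can be undone.

```shell
#!/bin/sh
# Sketch: switch on csh command tracing by changing "-f" to "-fx" on line 1.
# enable_trace is a hypothetical helper; it keeps a .bak copy of the script.
enable_trace() {
    script=$1
    cp "$script" "$script.bak" || return 1
    # Only line 1 is touched, and only if it ends in " -f".
    sed '1s/ -f$/ -fx/' "$script.bak" > "$script"
}
```

A call would look like: enable_trace /path/to/opticpara (the path is a placeholder). Copying the .bak file back restores the original script.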

PS: I guess you have tried to reproduce this in a fresh directory?

On 22.04.2016 at 05:08, Gavin Abo wrote:
If you haven't already done so, I would suggest looking at the content
in the files .timeop_1, .timeop_2, ... , and .timeop_X (e.g., while in
the case directory: cat .timeop_*), because an error message might be
logged in these files for a parallel optic calculation.
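
A slightly more verbose variant of "cat .timeop_*" prints each file name before its content, so a message can be matched to the k-point job that wrote it. show_timeop is a hypothetical helper name; the sketch only assumes that the files are called .timeop_1, .timeop_2, and so on, as described above.

```shell
#!/bin/sh
# Sketch: print every .timeop_* file in a case directory with a header line.
# show_timeop is a hypothetical helper; the default directory is the current one.
show_timeop() {
    dir=${1:-.}
    for f in "$dir"/.timeop_*; do
        [ -e "$f" ] || continue   # the glob matched nothing
        echo "== $f =="
        cat "$f"
    done
}
```

Run without arguments inside the case directory, or pass the directory as the first argument.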

On 4/21/2016 3:44 PM, Maciej Polak wrote:
Dear WIEN2k Community,

I want to calculate the joint density of states, but I ran into some
problems with parallel execution of x optic. I use only k-point
parallelization and run the newest 14.2 version of WIEN2k.

When I do sequential calculations, it all works fine. But for bigger
cases, and many K-points it is impossible to finish on one CPU. After
I add the -p flag to the relevant procedures, the last output I see
is: running OPTIC in parallel mode. From then, nothing happens. The
optic_X.def files are generated, and an optic.error file containing
"Error in Parallel OPTIC", nothing else. The code just stands still
after that, no activity on CPUs.

A simple minimalistic example to reproduce the error:

init_lapw -bw -vxc 5 -rkmax 7 -numk 1000 -red 2
run_lapw -p
x kgen <<< 10000
x lapw1 -p
x lapw2 -fermi -p
x optic -p

The same set of calculations, without the -p flag, would work just
fine. However, when I generate a bigger k-mesh and have a large number
of atoms it is absolutely impossible to perform the calculations on a
single core.

Regular k-point calculations (geometry optimization, bandstructures,
etc.) work perfectly.

I attached my *.struct and *.inop, but they are not the problem in
this case, since they work as intended with the sequential version.
This is a very simple FCC Si calculation, used only for testing.

I would really appreciate any help. I tried to read through the
mailing list, but couldn't find a similar problem.

Best regards,

Maciej Polak
Wroclaw University of Science and Technology
Wien mailing list
