Re: [Wien] Logging output issue for parallel jobs?

2024-05-02 Thread Peter Blaha

No, not really.

lapw0 writes only to :parallel_lapw0

susi:/area51/WIEN2k_23> grep 'starting parallel' *_lapw
dstartpara_lapw:echo "starting parallel dstart at `date`"
dstartpara_lapw:echo "starting parallel dstart at `date`" >>$log
hfpara_lapw:echo "->  "starting parallel hf$cmplx at `date` >>$log
irreppara_lapw:echo "->  "starting parallel irrep at `date` >>$log
lapw0para_lapw:echo "starting parallel lapw0 at `date`"
lapw0para_lapw:echo "starting parallel lapw0 at `date`" >>$log
lapw1para_lapw:echo "starting parallel lapw1 at `date`"
lapw1para_lapw:echo "starting parallel lapw1 at `date`" >>$log
lapw1para_lapw:echo "->  starting parallel LAPW1 jobs at `date`"
lapw2para_lapw:echo "->  "starting parallel lapw2$cmplx at `date` >>$log
lapwdmpara_lapw:echo "->  "starting parallel lapwdm$cmplx at `date` >>$log
lapwsopara_lapw:echo "->  "starting parallel lapwso at `date` >>$log
nlvdwpara_lapw:echo "starting parallel nlvdw at `date`"
nlvdwpara_lapw:echo "starting parallel nlvdw at `date`" >>$log
opticpara_lapw:echo "->  "starting parallel optic at `date` >>$log

and

susi:/area51/WIEN2k_23> grep 'done at' *_lapw
dstartpara_lapw:echo "<- " done at `date`>>$log
hfpara_lapw:echo "<-  "done at `date` >>$log
hfpara_lapw:echo "<-  "done at `date` >>$log
irreppara_lapw:echo "<-  "done at `date` >>$log
irreppara_lapw:echo "<-  "done at `date` >>$log
lapw0para_lapw:echo "<- " done at `date`>>$log
lapw1para_lapw:echo "<- " done at `date`>>$log
lapw2para_lapw:echo "<-  "done at `date` >>$log
lapw2para_lapw:echo "<-  "done at `date` >>$log
lapwdmpara_lapw:echo "<-  "done at `date` >>$log
lapwsopara_lapw:echo "<-  "done at `date` >>$log
lapwsopara_lapw:echo "<-  "done at `date` >>$log
nlvdwpara_lapw:echo "<- " done at `date`>>$log
opticpara_lapw:echo "<-  "done at `date` >>$log
opticpara_lapw:echo "<-  "done at `date`
opticpara_lapw:echo "<-  "done at `date` >>$log

where log is set as:

susi:/area51/WIEN2k_23> grep 'set log' *_lapw
checkparam_lapw:set logfile = checkparam.log
dstartpara_lapw:set log = :parallel_dstart
hfpara_lapw:set log = :parallel
init_lapw:set logfile = :log
init_so_lapw:set logfile   = :log
init_w2w_lapw:set logfile = :log
irreppara_lapw:set log = :parallel
kill_w2web_lapw:set logpid="$tmp_dir/w2web.pid.$$"
kill_w2web_lapw:set logpid1=`cat $logpid`
lapw0para_lapw:set log = :parallel_lapw0
lapw1para_lapw:set log = :parallel

.
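
To make the mapping explicit, here is a minimal Python sketch (not part of WIEN2k) that reads a subset of the grep output above and reports which log file each script writes to. The function name log_targets and the sample string are illustrative assumptions; only lines of the exact form "set log = <file>" are considered, so the "set logfile" lines are skipped.

```python
# Subset of the `grep 'set log' *_lapw` output quoted in this message.
GREP_OUTPUT = """\
dstartpara_lapw:set log = :parallel_dstart
hfpara_lapw:set log = :parallel
init_lapw:set logfile = :log
irreppara_lapw:set log = :parallel
lapw0para_lapw:set log = :parallel_lapw0
lapw1para_lapw:set log = :parallel
"""


def log_targets(grep_output):
    """Map each script name to the file assigned by `set log = <file>`."""
    targets = {}
    for line in grep_output.splitlines():
        # grep prefixes every hit with "<filename>:".
        script, _, statement = line.partition(":")
        words = statement.split()
        # Keep only csh assignments to the variable named exactly `log`.
        if len(words) == 4 and words[:3] == ["set", "log", "="]:
            targets[script] = words[3]
    return targets


targets = log_targets(GREP_OUTPUT)
shared = sorted(s for s, f in targets.items() if f == ":parallel")
print("lapw0para_lapw writes to", targets["lapw0para_lapw"])
print("scripts sharing :parallel:", shared)
```

For this subset, lapw0para_lapw is the only script mapped to :parallel_lapw0, while lapw1para_lapw shares :parallel with hfpara_lapw and irreppara_lapw.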

---

I agree, the :parallel* files are a bit confusing. If you monitor the 
scf cycle, I'd concentrate on case.dayfile, although the dayfile gets 
overwritten by a new scf cycle.


Regards


On 02.05.2024 at 17:47, Straus, Daniel B wrote:


Hi,

I’m running ver 23.2 on a SLURM-managed cluster. I’ve been looking at 
the :parallel file for job step timing to optimize performance.


I think there is a minor bug in the output, where it refers to lapw0 
as lapw1.


Here is relevant text from a :parallel file:

starting parallel lapw1 at Wed May  1 16:25:39 CDT 2024
<-  done at Wed May 1 16:25:52 CDT 2024
-
starting parallel lapw1 at Wed May  1 16:26:18 CDT 2024
cypress01-029 cypress01-029 cypress01-029 cypress01-029 cypress01-029 cypress01-029 cypress01-029 cypress01-029 cypress01-029(3) 0.019u 0.056s 23:55.17 0.0%   0+0k 0+8io 0pf+0w
cypress01-031 cypress01-031 cypress01-031 cypress01-031 cypress01-031 cypress01-031 cypress01-031 cypress01-031 cypress01-031(2) 0.031u 0.053s 40:08.71 0.0%   0+0k 0+16io 0pf+0w
cypress01-032 cypress01-032 cypress01-032 cypress01-032 cypress01-032 cypress01-032 cypress01-032 cypress01-032 cypress01-032(2) 0.025u 0.066s 56:58.72 0.0%   0+0k 0+16io 0pf+0w
cypress01-036 cypress01-036 cypress01-036 cypress01-036 cypress01-036 cypress01-036 cypress01-036 cypress01-036 cypress01-036(2) 0.032u 0.067s 1:13:47.00 0.0% 0+0k 0+16io 0pf+0w
   Summary of lapw1para:
cypress01-029 k=3 user=0.019 wallclock=1435.17
cypress01-031 k=2 user=0.031 wallclock=2408.71
cypress01-032 k=2 user=0.025 wallclock=3418.72
cypress01-036 k=2 user=0.032  wallclock=73
<-  done at Wed May 1 17:40:07 CDT 2024
-

The first “lapw1” should be lapw0.
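
Since the goal here is job step timing, the start and end stamps in the two files can be paired up mechanically. The following is a minimal Python sketch under stated assumptions: the helper names parse_date and step_durations are made up for illustration, the time zone token is simply dropped (so all stamps are assumed to share one zone), and each "starting parallel" line is assumed to be closed by the next "done at" line.

```python
from datetime import datetime

MONTHS = {name: i for i, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# Start/end lines as they appear in the :parallel and :parallel_lapw0 excerpts.
PARALLEL = """\
starting parallel lapw1 at Wed May  1 16:25:39 CDT 2024
<-  done at Wed May 1 16:25:52 CDT 2024
starting parallel lapw1 at Wed May  1 16:26:18 CDT 2024
<-  done at Wed May 1 17:40:07 CDT 2024
"""
PARALLEL_LAPW0 = """\
starting parallel lapw0 at Wed May  1 16:25:52 CDT 2024
<-  done at Wed May 1 16:26:18 CDT 2024
starting parallel lapw0 at Wed May  1 18:55:36 CDT 2024
<-  done at Wed May 1 18:56:00 CDT 2024
"""


def parse_date(text):
    """Parse `date` output such as 'Wed May  1 16:25:39 CDT 2024'."""
    # Assumption: weekday and zone are ignored; only the local clock is used.
    _weekday, month, day, clock, _zone, year = text.split()
    hour, minute, second = (int(x) for x in clock.split(":"))
    return datetime(int(year), MONTHS[month], int(day), hour, minute, second)


def step_durations(log_text):
    """Return (step, seconds) for each start line and the next 'done at' line."""
    durations = []
    current = None
    for line in log_text.splitlines():
        if "starting parallel" in line:
            rest = line.split("starting parallel", 1)[1]
            step, _, stamp = rest.partition(" at ")
            current = (step.strip(), parse_date(stamp))
        elif "done at" in line and current is not None:
            end = parse_date(line.split("done at", 1)[1])
            durations.append((current[0], (end - current[1]).total_seconds()))
            current = None
    return durations


for name, text in ((":parallel", PARALLEL), (":parallel_lapw0", PARALLEL_LAPW0)):
    for step, seconds in step_durations(text):
        print(f"{name}: {step} took {seconds:.0f} s")
```

On these excerpts, the first lapw1 entry in :parallel ends at 16:25:52, the same stamp at which the first lapw0 entry in :parallel_lapw0 starts, and that lapw0 entry ends at 16:26:18, when the second lapw1 entry starts. The two files therefore appear to record consecutive steps rather than the same step under two names.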

A separate file called :parallel_lapw0 is also generated. This refers 
to lapw0 correctly.


starting parallel lapw0 at Wed May  1 16:25:52 CDT 2024
<-  done at Wed May 1 16:26:18 CDT 2024
-
starting parallel lapw0 at Wed May  1 18:55:36 CDT 2024
<-  done at Wed May 1 18:56:00 CDT 2024
-

Perhaps this is as intended, though I got confused by it.

Daniel Straus

Assistant Professor

Department of Chemistry

Tulane University

5088 Percival Stern Hall

6400 

[Wien] Logging output issue for parallel jobs?

2024-05-02 Thread Straus, Daniel B
Hi,

I'm running ver 23.2 on a SLURM-managed cluster. I've been looking at the 
:parallel file for job step timing to optimize performance.

I think there is a minor bug in the output, where it refers to lapw0 as lapw1.

Here is relevant text from a :parallel file:


starting parallel lapw1 at Wed May  1 16:25:39 CDT 2024
<-  done at Wed May 1 16:25:52 CDT 2024
-
starting parallel lapw1 at Wed May  1 16:26:18 CDT 2024
cypress01-029 cypress01-029 cypress01-029 cypress01-029 cypress01-029 cypress01-029 cypress01-029 cypress01-029 cypress01-029(3) 0.019u 0.056s 23:55.17 0.0%   0+0k 0+8io 0pf+0w
cypress01-031 cypress01-031 cypress01-031 cypress01-031 cypress01-031 cypress01-031 cypress01-031 cypress01-031 cypress01-031(2) 0.031u 0.053s 40:08.71 0.0%   0+0k 0+16io 0pf+0w
cypress01-032 cypress01-032 cypress01-032 cypress01-032 cypress01-032 cypress01-032 cypress01-032 cypress01-032 cypress01-032(2) 0.025u 0.066s 56:58.72 0.0%   0+0k 0+16io 0pf+0w
cypress01-036 cypress01-036 cypress01-036 cypress01-036 cypress01-036 cypress01-036 cypress01-036 cypress01-036 cypress01-036(2) 0.032u 0.067s 1:13:47.00 0.0% 0+0k 0+16io 0pf+0w
Summary of lapw1para:
cypress01-029 k=3 user=0.019  wallclock=1435.17
cypress01-031 k=2 user=0.031  wallclock=2408.71
cypress01-032 k=2 user=0.025  wallclock=3418.72
cypress01-036 k=2 user=0.032  wallclock=73
<-  done at Wed May 1 17:40:07 CDT 2024
-

The first "lapw1" should be lapw0.
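
For anyone reading the per-host lines for timing, the elapsed field of each timing line (for example 23:55.17) can be converted to seconds and compared with the wallclock value in the summary. Below is a minimal Python sketch; the function name elapsed_to_seconds and the ROWS table are illustrative assumptions, with the numbers copied from the excerpt above.

```python
def elapsed_to_seconds(field):
    """Convert 'MM:SS.ss' or 'H:MM:SS.ss' into seconds."""
    seconds = 0.0
    # Each colon-separated part is worth 60 times the part after it.
    for part in field.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# (host, elapsed field from the timing line, wallclock from the summary)
ROWS = [
    ("cypress01-029", "23:55.17", 1435.17),
    ("cypress01-031", "40:08.71", 2408.71),
    ("cypress01-032", "56:58.72", 3418.72),
    ("cypress01-036", "1:13:47.00", 73.0),
]

for host, elapsed, reported in ROWS:
    computed = elapsed_to_seconds(elapsed)
    flag = "" if abs(computed - reported) < 0.01 else "  <-- differs"
    print(f"{host}: computed {computed:.2f} s, summary {reported:.2f} s{flag}")
```

The first three hosts agree with the summary. For cypress01-036 the elapsed field 1:13:47.00 corresponds to 4427 s, while the summary shows wallclock=73, so that summary value should probably not be taken at face value when the elapsed time exceeds one hour. This is an inference from the numbers shown, not a confirmed description of how the script computes it.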

A separate file called :parallel_lapw0 is also generated. This refers to lapw0 
correctly.

starting parallel lapw0 at Wed May  1 16:25:52 CDT 2024
<-  done at Wed May 1 16:26:18 CDT 2024
-
starting parallel lapw0 at Wed May  1 18:55:36 CDT 2024
<-  done at Wed May 1 18:56:00 CDT 2024
-

Perhaps this is as intended, though I got confused by it.


Daniel Straus
Assistant Professor
Department of Chemistry
Tulane University
5088 Percival Stern Hall
6400 Freret Street
New Orleans, LA 70118
(504) 862-3585
http://straus.tulane.edu/


___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html