[Wien] delays in parallel work

2020-10-06 Thread Lyudmila Dobysheva

Dear all,

I have started working on a supercomputer and sometimes I see delays 
during execution. They occur randomly, most often during lapw0, but in 
other programs as well (an extra 7-20 min). The administrators say that 
the network speed can occasionally be a problem.
But I do not understand this: at the moment I take only one node with 16 
processors. I would expect that if the task is sent to a single node, 
network problems between computers should not affect it until the whole 
job ends.

Maybe I have set the SCRATCH variable wrongly?
In .bashrc:
export SCRATCH=./

During execution I can see how the cycle proceeds, that is, after lapw0 
I already see its output files. This means that after lapw0 the compute 
node sends the files to the head node, and perhaps this is where it 
waits? Is this behavior correct? I expected that I would not see the 
intermediate stages until the job ends.
And the programs themselves (lapw0, lapw1, lapw2, lcore, mixer) - are 
they perhaps reloaded onto the compute node anew in every cycle?


Best regards
Lyudmila Dobysheva

Some details: WIEN2k_19.2
ifort 64 19.1.0.166
---
parallel_options:
setenv TASKSET "srun "
if ( ! $?USE_REMOTE ) setenv USE_REMOTE 1
if ( ! $?MPI_REMOTE ) setenv MPI_REMOTE 0
setenv WIEN_GRANULARITY 1
setenv DELAY 0.1
setenv SLEEPY 1
if ( ! $?WIEN_MPIRUN) setenv WIEN_MPIRUN "srun -K -N_nodes_ -n_NP_ 
-r_offset_ _PINNING_ _EXEC_"

if ( ! $?CORES_PER_NODE) setenv CORES_PER_NODE  16
--
WIEN2k_OPTIONS:
current:FOPT:-O -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -assume buffered_io -I$(MKLROOT)/include
current:FPOPT:-O -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -assume buffered_io -I$(MKLROOT)/include
current:OMP_SWITCH:-qopenmp
current:LDFLAGS:$(FOPT) -L$(MKLROOT)/lib/$(MKL_TARGET_ARCH) -lpthread 
-lm -ldl -liomp5

current:DPARALLEL:'-DParallel'
current:R_LIBS:-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core
current:FFTWROOT:/home/u/.local/
current:FFTW_VERSION:FFTW3
current:FFTW_LIB:lib
current:FFTW_LIBNAME:fftw3
current:LIBXCROOT:
current:LIBXC_FORTRAN:
current:LIBXC_LIBNAME:
current:LIBXC_LIBDNAME:
current:SCALAPACKROOT:$(MKLROOT)/lib/
current:SCALAPACK_LIBNAME:mkl_scalapack_lp64
current:BLACSROOT:$(MKLROOT)/lib/
current:BLACS_LIBNAME:mkl_blacs_intelmpi_lp64
current:ELPAROOT:
current:ELPA_VERSION:
current:ELPA_LIB:
current:ELPA_LIBNAME:
current:MPIRUN:srun -K -N_nodes_ -n_NP_ -r_offset_ _PINNING_ _EXEC_
current:CORES_PER_NODE:16
current:MKL_TARGET_ARCH:intel64

--
http://ftiudm.ru/content/view/25/103/lang,english/
Physics-Techn.Institute,
Udmurt Federal Research Center, Ural Br. of Rus.Ac.Sci.
426000 Izhevsk Kirov str. 132
Russia
---
Tel. +7 (34I2)43-24-59 (office), +7 (9I2)OI9-795O (home)
Skype: lyuka18 (office), lyuka17 (home)
E-mail: lyuk...@mail.ru (office), lyuk...@gmail.com (home)
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] delays in parallel work

2020-10-06 Thread Peter Blaha
Compare the times in the dayfile with ls -alsrt (the times when the 
corresponding files were created).
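
For example, from the case directory (case stands for the actual case name):

  cat case.dayfile          # start times of lapw0, lapw1, ... in each cycle
  ls -alsrt                 # modification times of the output files, oldest first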


You did not say how you run the parallel calculations (.machines file).
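
For a single 16-core node, a plain k-point-parallel .machines file would look 
roughly like the sketch below (n001 stands for the node name the queueing 
system gives you; 16 such "1:" lines in total, one per parallel job):

  # sketch of a k-point parallel .machines file (n001 is a placeholder)
  1:n001
  1:n001
  1:n001
  1:n001
  # ... 16 such lines in total for 16 single-core k-point jobs ...
  granularity:1
  extrafine:1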

Definitely, ask the administrator how to access a local (local to the 
compute node) directory and set the SCRATCH variable to this directory.
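
For example, something along these lines in the job script (untested sketch; 
the actual node-local path -- /tmp, $TMPDIR, a local /scratch, ... -- depends 
on the cluster, so ask the administrators):

  # bash job-script fragment; the node-local path is an assumption
  export SCRATCH=${TMPDIR:-/tmp/$USER/wien_scratch}
  mkdir -p $SCRATCH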


If this does not help, you may even copy the whole case directory to a 
local directory, change into it, run the SCF cycle there, and at the end 
copy all files back.
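
A minimal sketch of such a job-script fragment (Slurm/bash assumed, as the 
srun lines above suggest; it assumes the job is submitted from the case 
directory, and paths and convergence criterion are placeholders):

  LOCAL=/tmp/$USER/$SLURM_JOB_ID           # node-local directory (assumption)
  mkdir -p $LOCAL
  cp -r $SLURM_SUBMIT_DIR/* $LOCAL/        # copy the case files to local disk
  cd $LOCAL
  run_lapw -p -ec 0.0001                   # run the parallel SCF cycle there
  cp -r $LOCAL/* $SLURM_SUBMIT_DIR/        # copy everything back at the end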


For more info: on many supercomputers you can ssh to a node that is 
allocated to you. If you can do this, run top and check where the delays 
occur.
If you cannot ssh to the node, one can usually ask for an interactive 
session in the queueing system. Then you should have access to the 
allocated node.
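
On a Slurm system this would look roughly like the following (partition name 
and time limit are placeholders):

  salloc -N 1 -n 16 -p compute -t 00:30:00   # request one node interactively
  srun --pty bash                            # open a shell on the allocated node
  top                                        # see what runs and where time is lost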


It could also be a problem of "pinning".
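If you can get onto the node, the actual binding can be checked with something 
like this (PID is a placeholder for the process id of a running lapw1):

  ps -eLo pid,psr,comm | grep lapw   # which core each WIEN2k thread runs on
  taskset -cp PID                    # CPU-affinity mask of one process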

As Laurence said: probably only a good sys.admin can help. But the 
problem could be that the system is not set up properly, and then the 
sys.admins cannot help you ...




--
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300   FAX: +43-1-58801-165982
Email: bl...@theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW: http://www.imc.tuwien.ac.at/tc_blaha




Re: [Wien] delays in parallel work

2020-10-06 Thread Laurence Marks
Dear Lyudmila,

This is almost certainly an OS problem, and there is little that you can do
except find a better supercomputer!

It could be an NFS problem, and setting SCRATCH to a local directory on each
compute node might then help. Alternatively, while you are supposed to
have all of any given node, things might not be running that way -- a lot
depends upon how srun is configured.

One thing to test is internal mpi (same node) versus cross-node mpi. The
first should always be fast.
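
One way to check this, assuming the Intel MPI Benchmarks happen to be installed
on the cluster (IMB-MPI1 is an assumption; any small ping-pong benchmark will do):

  srun -N 1 -n 2 IMB-MPI1 PingPong                       # both ranks on one node
  srun -N 2 -n 2 --ntasks-per-node=1 IMB-MPI1 PingPong   # ranks on two different nodes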

And buy a sysadmin a beer (or vodka) and have him/her explain in more
detail how they have things configured.



-- 
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu
Corrosion in 4D: www.numis.northwestern.edu/MURI
Co-Editor, Acta Cryst A
"Research is to see what everybody else has seen, and to think what nobody
else has thought"
Albert Szent-Gyorgi


Re: [Wien] View structure in a remote machine

2020-10-06 Thread Rubel, Oleg
Hi Pablo,

Of course it is up to you to select the software, but Zoom is great for video 
conferencing and probably not an ideal solution for remote control. Here is 
a Wikipedia quote: "Virtual Network Computing is a graphical desktop-sharing 
system that uses the Remote Frame Buffer protocol to remotely control another 
computer." There is plenty of information on the internet about how to set it 
up. VNC is easier on the network and gives a "smoother" control experience 
when it comes to xcrysden rendering. (At least this was my experience.)
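
A minimal sketch of the usual tunnelled setup (user, remote.machine, and a 
TigerVNC-style vncserver are assumptions; display :1 corresponds to TCP port 5901):

  ssh user@remote.machine vncserver :1 -geometry 1600x900   # start a VNC desktop remotely
  ssh -L 5901:localhost:5901 user@remote.machine            # tunnel the VNC port over ssh
  # then point a VNC viewer on the local machine to localhost:5901
  # and run xcrysden inside that remote desktop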

Good luck
Oleg


From: Wien  on behalf of delamora 

Sent: Monday, October 5, 2020 21:22
To: A Mailing list for WIEN2k users
Subject: Re: [Wien] View structure in a remote machine

I have used "zoom" and it works in my machine and the other people that are 
connected, but not in the remote computer.
Maybe there is a trick that I do not see.


Oleg, sorry for my late reply.
When I am working directly on one computer and want to see the crystal 
structure, I push the "view structure" button and the structure appears on the 
screen. But when I am working with WIEN2k on a remote computer and push this 
button, the crystal structure is displayed on the other computer and not on 
mine.
Does VNC solve this problem?
Cheers

Pablo

From: Wien  on behalf of Rubel, Oleg 

Sent: Friday, October 2, 2020, 02:59 p.m.
To: A Mailing list for WIEN2k users 
Subject: Re: [Wien] View structure in a remote machine

I would suggest trying VNC (https://www.youtube.com/watch?v=EWkrqqnOgdo). It 
has a number of advantages over X11 display forwarding.

Oleg


From: Wien  on behalf of delamora 

Sent: Friday, October 2, 2020 15:39
To: A Mailing list for WIEN2k users
Subject: [Wien] View structure in a remote machine

Dear WIEN2k community,
I am doing calculations on a remote machine ("remote.machine").
I can connect with
remote.machine:7890
and with this I can do calculations on that machine,
but when I try to see the crystal structure with xcrysden, the image appears 
on "remote.machine", so I cannot see it on my machine at home.
Is there a way to visualize the crystal structure on my machine at home?
The same seems to happen with electron density plots.

Cheers

Pablo