Dear Prof. Peter Blaha and WIEN2k users,

Now I have loaded openmpi/4.1.0 and compiled WIEN2k. The admin told me that I can use your script from http://www.wien2k.at/reg_user/faq/slurm.job. I added these lines to it as well:

module load openmpi/4.1.0_gcc620
module load ifort
module load mkl

but this error happened: "bash: mpirun: command not found". In an interactive session, "x lapw0 -p" and "x lapw2 -p" run under MPI, but "x lapw1 -p" stops with the following error:

w2k_dispatch_signal(): received: Segmentation fault

I noticed that the FFTW3 and OpenMPI installed on the cluster are both compiled with gfortran, but I have compiled WIEN2k with Intel ifort. I am not sure whether the problem originates from this inconsistency between gfortran and ifort. I have checked that lapw1 itself compiled correctly.
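For completeness, the top of my job script now looks roughly like this (only a sketch: the SBATCH values here are illustrative and the rest is the script from the FAQ page):

#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=28
#SBATCH --time=20:00:00

# the MPI/compiler environment must be loaded before anything calls mpirun
module load openmpi/4.1.0_gcc620
module load ifort
module load mkl

# ... remainder of the slurm.job script from the FAQ page ...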
Sincerely yours,
Leila

On Fri, Apr 23, 2021 at 7:26 PM Peter Blaha <[email protected]> wrote:

> Recompile with LI, since mpirun is supported (after loading the proper
> mpi).
>
> PS: Ask them if -np and -machinefile are still possible to use. Otherwise
> you cannot mix k-parallel and mpi-parallel, and for sure, for smaller
> cases it is a severe limitation to have only ONE mpi job with many
> k-points, small matrix size and many mpi cores.
>
> Am 23.04.2021 um 16:04 schrieb leila mollabashi:
> > Dear Prof. Peter Blaha and WIEN2k users,
> >
> > Thank you for your assistance.
> >
> > Here is the admin's reply:
> >
> > * The mpirun/mpiexec command is supported after loading the proper
> >   module (I suggest openmpi/4.1.0 with gcc 6.2.0 or icc).
> > * You have to describe the needed resources (I suggest --nodes and
> >   --ntasks-per-node; please use whole nodes, so ntasks-per-node =
> >   28, 32 or 48, depending on the partition).
> > * Yes, our cluster has "tight integration with mpi", but the other
> >   way around: our MPI libraries are compiled with SLURM support, so
> >   when you describe the resources at the beginning of the batch
> >   script, you do not have to use the "-np" and "-machinefile"
> >   options for mpirun/mpiexec.
> > * The error message "btl_openib_component.c:1699:init_one_device" is
> >   caused by an "old" mpi library, so please recompile your
> >   application (WIEN2k) using openmpi/4.1.0_icc19.
> >
> > Now should I compile WIEN2k with SL or LI?
> >
> > Sincerely yours,
> >
> > Leila Mollabashi
> >
> > On Wed, Apr 14, 2021 at 10:34 AM Peter Blaha
> > <[email protected]> wrote:
> >
> > It cannot initialize an mpi job, because it is missing the interface
> > software.
> >
> > You need to ask the computing center / system administrators how one
> > executes an mpi job on this computer.
> >
> > It could be that "mpirun" is not supported on this machine. You may try
> > a wien2k installation with system "LS" in siteconfig. This will
> > configure the parallel environment/commands using "slurm" commands like
> > srun -K -N_nodes_ -n_NP_ ..., replacing mpirun.
> > We used it once on our hpc machine, since it was recommended by the
> > computing center people. However, it turned out that the standard mpirun
> > installation was more stable, because the "slurm controller" died too
> > often, leading to many random crashes. Anyway, if your system has what
> > is called "tight integration of mpi", it might be necessary.
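If I understand this correctly, the practical difference is only which launcher the WIEN2k parallel scripts call. A rough sketch (not the exact commands siteconfig generates):

# mpirun-based setup ("LI"): uses -np and -machinefile
mpirun -np 4 -machinefile .machine0 $WIENROOT/lapw0_mpi lapw0.def

# slurm-based setup ("LS"): srun replaces mpirun
srun -K -N 1 -n 4 $WIENROOT/lapw0_mpi lapw0.def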
> >
> > Am 13.04.2021 um 21:47 schrieb leila mollabashi:
> > > Dear Prof. Peter Blaha and WIEN2k users,
> > >
> > > Then by running x lapw1 -p:
> > >
> > > starting parallel lapw1 at Tue Apr 13 21:04:15 CEST 2021
> > > -> starting parallel LAPW1 jobs at Tue Apr 13 21:04:15 CEST 2021
> > > running LAPW1 in parallel mode (using .machines)
> > > 2 number_of_parallel_jobs
> > > [1] 14530
> > > [e0467:14538] mca_base_component_repository_open: unable to open
> > > mca_btl_uct: libucp.so.0: cannot open shared object file: No such file
> > > or directory (ignored)
> > > WARNING: There was an error initializing an OpenFabrics device.
> > > Local host: e0467
> > > Local device: mlx4_0
> > > MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
> > > with errorcode 0.
> > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> > > You may or may not see output from other processes, depending on
> > > exactly when Open MPI kills them.
> > > [e0467:14567] 1 more process has sent help message
> > > help-mpi-btl-openib.txt / error in device init
> > > [e0467:14567] Set MCA parameter "orte_base_help_aggregate" to 0 to see
> > > all help / error messages
> > > [warn] Epoll MOD(1) on fd 27 failed. Old events were 6; read change was
> > > 0 (none); write change was 2 (del): Bad file descriptor
> > >
> > > > Somewhere there should be some documentation how one runs an mpi
> > > > job on your system.
> > >
> > > I only found this:
> > >
> > > Before ordering a task, it should be encapsulated in an appropriate
> > > script understandable by the queue system, e.g.:
> > > /home/users/user/submit_script.sl
> > >
> > > Sample SLURM script:
> > >
> > > #!/bin/bash -l
> > > #SBATCH -N 1
> > > #SBATCH --mem 5000
> > > #SBATCH --time=20:00:00
> > > /sciezka/do/pliku/binarnego/plik_binarny.in > /sciezka/do/pliku/wyjsciowego.out
> > >
> > > To send a task to a specific queue, use the #SBATCH -p parameter, e.g.:
> > >
> > > #!/bin/bash -l
> > > #SBATCH -N 1
> > > #SBATCH --mem 5000
> > > #SBATCH --time=20:00:00
> > > #SBATCH -p standard
> > > /sciezka/do/pliku/binarnego/plik_binarny.in > /sciezka/do/pliku/wyjsciowego.out
> > >
> > > The task must then be submitted using the sbatch command:
> > > sbatch /home/users/user/submit_script.sl
> > >
> > > Ordering interactive tasks
> > >
> > > Interactive tasks can be divided into two groups:
> > > * an interactive task working in text mode
> > > * an interactive task
> > >
> > > Interactive task (working in text mode):
> > > Ordering an interactive task is very simple; in the simplest case it
> > > comes down to issuing the command below:
> > >
> > > srun --pty /bin/bash
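For my WIEN2k tests I request the cores explicitly. Following the admin's advice to use whole nodes, an interactive allocation would look roughly like this (values purely illustrative):

srun -N 1 --ntasks-per-node=28 -p standard --pty /bin/bash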
> > >
> > > Sincerely yours,
> > >
> > > Leila Mollabashi
> > >
> > > On Wed, Apr 14, 2021 at 12:03 AM leila mollabashi
> > > <[email protected]> wrote:
> > >
> > > Dear Prof. Peter Blaha and WIEN2k users,
> > >
> > > Thank you for your assistance.
> > >
> > > > At least now the error "lapw0 not found" is gone. Do you
> > > > understand why??
> > >
> > > Yes, I think it is because now the path is clearly known.
> > >
> > > > How many slots do you get by this srun command?
> > >
> > > Usually I get a node with 28 CPUs.
> > >
> > > > Is this the node with the name e0591???
> > >
> > > Yes, it is.
> > >
> > > > Of course the .machines file must be consistent (dynamically
> > > > adapted) with the actual nodename.
> > >
> > > Yes, to do this I use my script.
> > >
> > > When I use "srun --pty -n 8 /bin/bash", which goes to a node with 8
> > > free cores, and run x lapw0 -p, then this happens:
> > >
> > > starting parallel lapw0 at Tue Apr 13 20:50:49 CEST 2021
> > > -------- .machine0 : 4 processors
> > > [1] 12852
> > > [e0467:12859] mca_base_component_repository_open: unable to open
> > > mca_btl_uct: libucp.so.0: cannot open shared object file: No such
> > > file or directory (ignored)
> > > [e0467][[56319,1],1][btl_openib_component.c:1699:init_one_device]
> > > error obtaining device attributes for mlx4_0; errno says Protocol not
> > > supported
> > > [e0467:12859] mca_base_component_repository_open: unable to open
> > > mca_pml_ucx: libucp.so.0: cannot open shared object file: No such
> > > file or directory (ignored)
> > > LAPW0 END
> > > [1] Done mpirun -np 4 -machinefile .machine0
> > > /home/users/mollabashi/v19.2/lapw0_mpi lapw0.def >> .time00
> > >
> > > Sincerely yours,
> > >
> > > Leila Mollabashi
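The script I mentioned for adapting .machines does essentially the following. This is a stripped-down sketch, not the exact script; it assumes one mpi job of 4 cores per assigned node, and the exact .machines layout of course depends on the case:

#!/bin/bash
# build .machines from the nodes slurm actually assigned to this job
nodes=( $(scontrol show hostnames "$SLURM_JOB_NODELIST") )
rm -f .machines
echo "lapw0:${nodes[0]}:4" >> .machines   # mpi lapw0 on the first node
for n in "${nodes[@]}"; do
    echo "1:$n:4" >> .machines            # one k-parallel job, 4 mpi cores, on node $n
done
echo "granularity:1" >> .machines
echo "extrafine:1" >> .machines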
>
> --
> Peter BLAHA, Inst. f. Materials Chemistry, TU Vienna, A-1060 Vienna
> Phone: +43-1-58801-165300  FAX: +43-1-58801-165982
> Email: [email protected]  WIEN2k: http://www.wien2k.at
> WWW: http://www.imc.tuwien.ac.at
_______________________________________________
Wien mailing list
[email protected]
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at: http://www.mail-archive.com/[email protected]/index.html

