Re: [Wien] Error in mpi+k point parallelization across multiple nodes

2015-05-06 Thread Gavin Abo
Ok, now it is clear that there are no additional error messages. Unfortunately, I cannot tell specifically what went wrong from those error messages. You might try replacing mpirun with mpirun_rsh. As you can see at
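For reference, the switch to mpirun_rsh would be made in $WIENROOT/parallel_options. A minimal sketch, assuming the MVAPICH2 installation path quoted later in this thread and that mpirun_rsh accepts the usual -np/-hostfile arguments (both are assumptions about the poster's installation):

    setenv WIEN_MPIRUN "/usr/local/mvapich2-icc/bin/mpirun_rsh -np _NP_ -hostfile _HOSTS_ _EXEC_"

The WIEN2k parallel scripts substitute _NP_, _HOSTS_, and _EXEC_ at run time.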

Re: [Wien] Error in mpi+k point parallelization across multiple nodes

2015-05-06 Thread lung Fermin
Thanks for the reply. Please see below. As I asked before, did you give us all the error information in the case.dayfile and from standard output? It is not entirely clear in your previous posts, but it looks to me that you might have only provided information from the case.dayfile and the

Re: [Wien] Error in mpi+k point parallelization across multiple nodes

2015-05-06 Thread Gavin Abo
See below for my comments. Thanks for all the information and suggestions. I have tried to change -lmkl_blacs_intelmpi_lp64 to -lmkl_blacs_lp64 and recompile. However, I got the following error message in the screen output LAPW0 END [cli_14]: [cli_15]: [cli_6]: aborting job: Fatal error

Re: [Wien] Error in mpi+k point parallelization across multiple nodes

2015-05-05 Thread lung Fermin
Thanks for all the information and suggestions. I have tried to change -lmkl_blacs_intelmpi_lp64 to -lmkl_blacs_lp64 and recompile. However, I got the following error message in the screen output LAPW0 END [cli_14]: [cli_15]: [cli_6]: aborting job: Fatal error in PMPI_Comm_size: Invalid
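For context, the BLACS library is chosen on the parallel linker line (RP_LIBS) set by siteconfig, and the mpi programs must be recompiled after changing it. A hedged sketch of the kind of edit discussed here; the surrounding flags are illustrative, not taken from the poster's actual options file:

    before: RP_LIBS = ... -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 ...
    after:  RP_LIBS = ... -lmkl_scalapack_lp64 -lmkl_blacs_lp64 ...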

Re: [Wien] Error in mpi+k point parallelization across multiple nodes

2015-05-04 Thread lung Fermin
-lmkl_blacs_intelmpi_lp64? Thanks a lot for all the suggestions. Regards, Fermin -Original Message- From: wien-boun...@zeus.theochem.tuwien.ac.at [mailto:wien-boun...@zeus.theochem.tuwien.ac.at] On Behalf Of Peter Blaha To: A Mailing list for WIEN2k users Subject: Re: [Wien] Error in mpi+k point

Re: [Wien] Error in mpi+k point parallelization across multiple nodes

2015-05-04 Thread Peter Blaha
No!!! (You can use it only if you are using intelmpi.) I'm not sure (and it may even depend on the compiler version) which MPI versions are supported by Intel. But maybe try the simplest version, -lmkl_blacs_lp64. On 04.05.2015 at 08:03, lung Fermin wrote: Is it ok to use

Re: [Wien] Error in mpi+k point parallelization across multiple nodes

2015-05-04 Thread Gavin Abo
On page 131 of the User's Guide for Intel MKL 11.1 for Linux [ https://software.intel.com/en-us/mkl_11.1_ug_lin_pdf ], it states: libmkl_blacs_intelmpi_lp64.so = LP64 version of BLACS routines for Intel MPI and MPICH2. So -lmkl_blacs_intelmpi_lp64 might also work with MPICH2. From the compile
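For orientation, MKL of that era ships several LP64 BLACS variants, and the one linked in must match the MPI implementation used at run time. Roughly, per the MKL documentation (the openmpi and sgimpt entries are listed for completeness and are not discussed in this thread):

    libmkl_blacs_lp64.so          - MPICH
    libmkl_blacs_intelmpi_lp64.so - Intel MPI and MPICH2
    libmkl_blacs_openmpi_lp64.so  - Open MPI
    libmkl_blacs_sgimpt_lp64.so   - SGI MPT

Since the poster is running MVAPICH2 (an MPICH2 derivative), this is why the intelmpi variant can be a plausible, but not guaranteed, match.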

Re: [Wien] Error in mpi+k point parallelization across multiple nodes

2015-05-04 Thread Laurence Marks
To reiterate what everyone else said, you should change your BLACS library; the intelmpi version only works if you are using impi (I am 98% certain). Normally this leads to a weird but understandable error when lapw0/lapw1 initiate the mpi routines; not sure why this did not show up in your case. On

Re: [Wien] Error in mpi+k point parallelization across multiple nodes

2015-05-03 Thread lung Fermin
I have tried to set MPI_REMOTE=0 and used 32 cores (on 2 nodes) for distributing the mpi job. However, the problem still persists... but the error message looks different this time: $ cat *.error Error in LAPW2 ** testerror: Error in Parallel LAPW2 and the output on screen: Warning: no access to

Re: [Wien] Error in mpi+k point parallelization across multiple nodes

2015-05-03 Thread Peter Blaha
It seems as if lapw0_mpi runs properly?? This suggests that mpi and fftw are ok. Please check that you have NEW (check the date with ls -als)!! and valid case.vsp/vns files, which can be used in e.g. a sequential lapw1 step. The problems seem to start in lapw1_mpi, and this program requires in addition
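A minimal way to run this check, assuming the commands are issued in the case directory and that the case has no inversion symmetry (the -c switch matches the lapw1c_mpi executable mentioned elsewhere in this thread):

    ls -als case.vsp case.vns    # timestamps should match the recent lapw0 run
    x lapw1 -c                   # sequential lapw1 using those potential files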

Re: [Wien] Error in mpi+k point parallelization across multiple nodes

2015-04-29 Thread lung Fermin
Thanks for your comment, Prof. Marks. Each node on the cluster has 32GB memory and each core (16 in total) on the node is limited to 2GB of memory usage. For the current system, I used RKMAX=6, and the smallest RMT=2.25. I have tested the calculation with a single k-point and mpi on 16 cores

Re: [Wien] Error in mpi+k point parallelization across multiple nodes

2015-04-29 Thread Laurence Marks
As an addendum, the calculation may be too big for a single node. How much memory does the node have, and what are the RKMAX, the smallest RMT, and the unit cell size? Maybe use in your machines file (reconstructed below): 1:z1-2:16 z1-13:16 lapw0: z1-2:16 z1-13:16 granularity:1 extrafine:1 Check the size using x lapw1 -c -p
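The suggested .machines file appears to have been flattened by the archive; reconstructed as a sketch, assuming nodes named z1-2 and z1-13 with 16 cores each, so that the single k-point group runs one 32-core MPI job spanning both nodes:

    1:z1-2:16 z1-13:16
    lapw0: z1-2:16 z1-13:16
    granularity:1
    extrafine:1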

Re: [Wien] Error in mpi+k point parallelization across multiple nodes

2015-04-29 Thread Peter Blaha
Try setting setenv MPI_REMOTE 0 in parallel_options. On 29.04.2015 at 09:44, lung Fermin wrote: Thanks for your comment, Prof. Marks. Each node on the cluster has 32GB memory and each core (16 in total) on the node is limited to 2GB of memory usage. For the current system, I used RKMAX=6,
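This switch lives in $WIENROOT/parallel_options. A sketch of the change:

    setenv MPI_REMOTE 0

As I understand the WIEN2k parallel scripts, with MPI_REMOTE 0 the mpirun command is launched directly from the node where the job script runs, rather than first connecting by ssh to the head of each machine group; this avoids one layer of remote-shell startup that can fail on some clusters.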

Re: [Wien] Error in mpi+k point parallelization across multiple nodes

2015-04-28 Thread Laurence Marks
You appear to be missing the line setenv WIEN_MPIRUN=... This is set up when you run siteconfig, and it provides the information on how mpi is run on your system. N.B., did you set up and compile the mpi code? ___ Professor Laurence Marks Department of Materials Science and

Re: [Wien] Error in mpi+k point parallelization across multiple nodes

2015-04-28 Thread lung Fermin
Thanks for Prof. Marks' comment. 1. In the previous email, I forgot to copy the line setenv WIEN_MPIRUN /usr/local/mvapich2-icc/bin/mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_ It was in the parallel_options file. Sorry about that. 2. I have checked that the running program was lapw1c_mpi. Besides,
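For completeness, in a standard parallel_options the value is quoted as a single string; a sketch using the path the poster reports:

    setenv WIEN_MPIRUN "/usr/local/mvapich2-icc/bin/mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_"

_NP_, _HOSTS_, and _EXEC_ are placeholders that the WIEN2k parallel scripts replace with the process count, the machines file, and the executable name.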

Re: [Wien] Error in mpi+k point parallelization across multiple nodes

2015-04-28 Thread Laurence Marks
Unfortunately it is hard to know what is going on. A Google search on "Error while reading PMI socket" indicates that the message you have means the startup did not work, but it is not specific about why. Some suggestions: a) Try mpiexec (slightly different arguments). You just edit parallel_options.
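A hedged sketch of such an edit, assuming a Hydra-style mpiexec as shipped with MPICH2/MVAPICH2, which takes -n for the process count and -f for the host file (the path is the poster's MVAPICH2 install, and whether its mpiexec is Hydra-based is an assumption):

    setenv WIEN_MPIRUN "/usr/local/mvapich2-icc/bin/mpiexec -n _NP_ -f _HOSTS_ _EXEC_"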

[Wien] Error in mpi+k point parallelization across multiple nodes

2015-04-28 Thread lung Fermin
Dear Wien2k community, I am trying to perform a calculation on a system of ~100 inequivalent atoms using mpi+k-point parallelization on a cluster. Everything goes fine when the program is run on a single node. However, if I perform the calculation across different nodes, the following error occurs.