OK, now it is clear that there are no additional error messages.
Unfortunately, I cannot tell specifically what went wrong from those
error messages.
You might try replacing mpirun with mpirun_rsh. As you can see at
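If you try that, the change would go into the WIEN_MPIRUN line of parallel_options. A sketch only, assuming MVAPICH2's mpirun_rsh and the install path quoted later in this thread:

setenv WIEN_MPIRUN "/usr/local/mvapich2-icc/bin/mpirun_rsh -np _NP_ -hostfile _HOSTS_ _EXEC_"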
Thanks for the reply. Please see below.
As I asked before, did you give us all the error information in the
case.dayfile and from standard output? It is not entirely clear in your
previous posts, but it looks to me that you might have only provided
information from the case.dayfile and the
See below for my comments.
Thanks for all the information and suggestions.
I have tried to change -lmkl_blacs_intelmpi_lp64 to -lmkl_blacs_lp64 and
recompile. However, I got the following error message in the screen output:
LAPW0 END
[cli_14]: [cli_15]: [cli_6]: aborting job:
Fatal error in PMPI_Comm_size:
Invalid
-lmkl_blacs_intelmpi_lp64?
Thanks a lot for all the suggestions.
Regards,
Fermin
-Original Message-
From: wien-boun...@zeus.theochem.tuwien.ac.at [mailto:
wien-boun...@zeus.theochem.tuwien.ac.at] On Behalf Of Peter Blaha
To: A Mailing list for WIEN2k users
Subject: Re: [Wien] Error in mpi+k point
No!!! (You can use it only if you are using intelmpi.)
I'm not sure (and it may even depend on the compiler version) which
mpi-versions are supported by intel. But maybe try the simplest version
-lmkl_blacs_lp64
On 04.05.2015 at 08:03, lung Fermin wrote:
Is it ok to use
On page 131 of the User's Guide for Intel MKL 11.1 for Linux [
https://software.intel.com/en-us/mkl_11.1_ug_lin_pdf ], it says:
libmkl_blacs_intelmpi_lp64.so = LP64 version of BLACS routines for
Intel MPI and MPICH2
So -lmkl_blacs_intelmpi_lp64 might also work with MPICH2.
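For orientation, the BLACS choice enters through the parallel library list (RP_LIBS in siteconfig). This is a sketch only; the exact set of MKL libraries depends on the MKL and compiler version, so take the list from Intel's link-line advisor for your setup:

-lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core \
 -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm

Swapping -lmkl_blacs_intelmpi_lp64 for -lmkl_blacs_lp64 (or another mkl_blacs_* variant) is the only change being discussed here.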
From the compile
To reiterate what everyone else said, you should change your blacs; the
intelmpi version only works if you are using impi (I am 98% certain).
Normally this leads to a weird but understandable error when lapw0/lapw1
initiate the mpi routines; I am not sure why this did not show up in your case.
On
I have tried to set MPI_REMOTE=0 and used 32 cores (on 2 nodes) for
distributing the mpi job. However, the problem still persists... but the
error message looks different this time:
$ cat *.error
Error in LAPW2
** testerror: Error in Parallel LAPW2
and the output on screen:
Warning: no access to
It seems as if lapw0_mpi runs properly?? Please check that you have NEW
(check the date with ls -als!!) and valid case.vsp/vns files, which could be
used e.g. in a sequential lapw1 step.
This suggests that mpi and fftw are ok.
The problems seem to start in lapw1_mpi, and this program requires in addition
ScaLAPACK and BLACS.
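In practice that check could look like the following, with "case" standing for the actual case name and -c used because this appears to be a complex case (lapw1c_mpi is mentioned later in the thread):

ls -als case.vsp case.vns   # timestamps should be from the just-finished lapw0 run
x lapw1 -c                  # run a sequential lapw1 step as a cross-check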
Thanks for your comment, Prof. Marks.
Each node on the cluster has 32GB memory and each core (16 in total) on the
node is limited to 2GB of memory usage. For the current system, I used
RKMAX=6, and the smallest RMT=2.25.
I have tested the calculation with single k point and mpi on 16 cores
As an addendum, the calculation may be too big for a single node. How much
memory does the node have, and what are the RKMAX, the smallest RMT, and the
unit cell size? Maybe use in your .machines file (see the annotated sketch below):
1:z1-2:16 z1-13:16
lapw0: z1-2:16 z1-13:16
granularity:1
extrafine:1
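For clarity, this is how I read those .machines lines (the trailing annotations are explanation only, not part of the file syntax):

1:z1-2:16 z1-13:16        # one k-point job, run as a single 32-process mpi job on z1-2 and z1-13
lapw0: z1-2:16 z1-13:16   # run lapw0_mpi across the same 32 cores
granularity:1             # hand each job its k-points in one chunk
extrafine:1               # distribute any leftover k-points one at a time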
Check the size using
x lapw1 -c -p
Try setting
setenv MPI_REMOTE 0
in parallel_options.
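For reference, the parallel_options file in $WIENROOT usually contains something like the sketch below; it is generated by siteconfig, so the exact entries and paths will differ on your installation:

setenv USE_REMOTE 1
setenv MPI_REMOTE 0
setenv WIEN_GRANULARITY 1
setenv WIEN_MPIRUN "/usr/local/mvapich2-icc/bin/mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_"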
On 29.04.2015 at 09:44, lung Fermin wrote:
Thanks for your comment, Prof. Marks.
Each node on the cluster has 32GB memory and each core (16 in total) on
the node is limited to 2GB of memory usage. For the current system, I
used RKMAX=6,
You appear to be missing the line
setenv WIEN_MPIRUN=...
This is set up when you run siteconfig, and provides the information on how
mpi is run on your system.
N.B., did you set up and compile the mpi code?
___
Professor Laurence Marks
Department of Materials Science and
Thanks for Prof. Marks' comment.
1. In the previous email, I missed copying the line
setenv WIEN_MPIRUN /usr/local/mvapich2-icc/bin/mpirun -np _NP_ -hostfile
_HOSTS_ _EXEC_
It was in parallel_options. Sorry about that.
2. I have checked that the running program was lapw1c_mpi. Besides,
Unfortunately it is hard to know what is going on. A Google search on
"Error while reading PMI socket" indicates that the message you have means
it did not work, and is not specific. Some suggestions:
a) Try mpiexec (slightly different arguments). You just edit
parallel_options.
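Concretely, that would mean changing the WIEN_MPIRUN line in parallel_options to call mpiexec instead of mpirun. A sketch, assuming MVAPICH2's Hydra mpiexec (note the different flags for the process count and host file):

setenv WIEN_MPIRUN "/usr/local/mvapich2-icc/bin/mpiexec -n _NP_ -f _HOSTS_ _EXEC_"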
Dear Wien2k community,
I am trying to perform a calculation on a system of ~100 inequivalent atoms
using mpi + k-point parallelization on a cluster. Everything goes fine when
the program is run on a single node. However, if I perform the calculation
across different nodes, the following error occurs.