Re: [OMPI users] compilation problem with ifort

2014-09-04 Thread Gus Correa

Hi Elie

I really think you need to direct your questions to the EPW and QE lists 
or developers.

This is clearly a problem in their configuration scripts and makefiles,
which they should address.
Otherwise, since it works here, it should work for you also,
assuming you follow the same recipe that I did and sent to you.
There are differences in compiler versions,
maybe also in MPI versions, maybe in the Linux distribution,
but configuration scripts are meant to take care of such differences.
The EPW and QE lists are also the right forum for questions that are
specific to EPW and QE, which don't belong on the OpenMPI list.


I don't know how to help you further.

***

As for your question about libraries.

Compilers in principle link the executables to shared libraries,
when they are available.  These belong to the computer where
the compilation happened. Hence, porting the executable to
another computer requires the same shared libraries (same versions, etc.)
located in the same directories as on the original computer.
(Well, there are ways to get around this, but this would only
make things even more confusing to you.)
That is why the epw.x executable I created here is useless
to you.
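
For instance, a quick way to see what an executable actually depends on
(illustrative only; the binary names are just examples):

ldd ./epw.x
ldd $(which mpirun)

ldd prints the shared libraries the binary was linked against, with the
paths and versions found on that particular machine.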

My Intel compiler is newer than yours, and it also has the MKL
libraries.
However, the QE configure script found the MKL BLAS and LAPACK
libraries in your case, but it didn't find them in my case (it seems
to have found the lapack and blas libraries from Linux instead).
I.e., my configure ended up with this:

The following libraries have been found:
  BLAS_LIBS= -lblas
  LAPACK_LIBS= -llapack
  FFT_LIBS=
Please check if this is what you expect.


Why that is so is a question to be posed to the EPW and QE developers.

Nevertheless, the EPW and QE code seems to come with code for
all the required libraries (blas, lapack, fft) and to build
them.  At least that is what seems to have happened on my computer.
So, I don't think you need any other libraries.
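
If you ever do want configure to pick up a specific BLAS/LAPACK location,
the LIBDIRS hint that configure itself prints (quoted in your message below)
can be used for that. A minimal sketch, assuming the MKL path from your
configure output:

cd espresso-4.0.3
./configure CC=icc F77=ifort LIBDIRS="/opt/intel/mkl/10.0.011/lib/em64t"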

Good luck,
Gus Correa


On 09/04/2014 04:17 PM, Elio Physics wrote:

Dear Gus,

Firstly I really need to thank you for the effort you are doing to help
me and write all these e-mails and in details explaining every step.
Secondly, I did all what you wrote; the EPW is indeed inside the QE
espresso but I still get the same annoying error. I actually deleted all
the tar files and the files themselves and started afresh...

However I still did not tackle the LIBRARIES ISSUE..I did not quite
understand what you said about libraries..How do I know the path of the
openmpi libraries...Sorry I am really "dumb" in Fortran...Can you just
explain ONLY that part please in more details.

Another thing when configure was successful, at the end there were those
lines:

"The following libraries have been found:
   BLAS_LIBS=-L/opt/intel/mkl/10.0.011/lib/em64t -lmkl_em64t
   LAPACK_LIBS= -L/opt/intel/mkl/10.0.011/lib/em64t -lmkl_em64t
   FFT_LIBS=
Please check if this is what you expect.

If any libraries are missing, you may specify a list of directories
to search and retry, as follows:
   ./configure LIBDIRS="list of directories, separated by spaces" "

Do I need other libraries?

Thanks a lot for your efforts

ELIO MOUJAES

 > Date: Thu, 4 Sep 2014 12:48:44 -0400
 > From: g...@ldeo.columbia.edu
 > To: us...@open-mpi.org
 > Subject: Re: [OMPI users] compilation problem with ifort
 >
 > Hi Elie
 >
 > The executable generated in my computer will be useless to you,
 > because these days most if not all libraries linked to an executable are
 > dynamic/shared libraries.
 > You won't have the same in your computer, or the equivalent will be
 > located in different places, may be from different versions, etc.
 > (E.g. your Intel compiler libraries will be from a different version,
 > in a different location, and likewise for OpenMPI libraries etc.)
 > Take any executable that you may have in your computer and do "ldd
 > executable_name" to see the list of shared libraries.
 >
 > The error you reported suggests a misconfiguration of Makefiles,
 > or better, a mispositioning of directories.
 >
 > **
 >
 > First thing I would try is to start fresh.
 > Delete or move the old directory trees,
 > download everything again on blank directories,
 > and do the recipe all over again.
 > Leftovers of previous compilations are often a hurdle,
 > so you do yourself a favor by starting over from scratch.
 >
 > **
 > Second *really important* item to check:
 >
 > The top directories of QE and EPW *must* follow this hierarchy:
 >
 > espresso-4.0.3
 > |-- EPW-3.0.0
 >
 > Is this what you have?
 > The EPW web site just hints this in their recipe step 3.
 > The Makefiles will NOT work if this directory hierarchy is incorrect.
 >
 > The error you reported in your first email *suggests* that the Makefiles
 > in the EPW tarball are not finding the Makefiles in the QE tarball,
 > which indicates that the directories may not have a correct relative
 > location.
 >
 > I.e. the EPW top directory must be right under the QE top directory.
 >
 > **
 >

Re: [OMPI users] compilation problem with ifort

2014-09-04 Thread Elio Physics
Something else I have realized. Within the make.sys file of espresso-4.0.3, I
have got:

IFLAGS = -I../include
MODFLAGS   = -I./  -I../Modules  -I../iotk/src \
             -I../PW  -I../PH

Which are lacking other stuff such as:

LIBOBJS = ../../flib/ptools.a ../../flib/flib.a \
          ../../clib/clib.a ../../iotk/src/libiotk.a
W90LIB = ../../W90/libwannier.a

Do I need to insert these before doing "make"?
Regards
Elie

From: elio-phys...@live.com
To: us...@open-mpi.org
Date: Thu, 4 Sep 2014 23:17:10 +0300
Subject: Re: [OMPI users] compilation problem with ifort




Dear Gus,
Firstly I really need to thank you for the effort you are doing to help me and
write all these e-mails and in details explaining every step. Secondly, I did
all what you wrote; the EPW is indeed inside the QE espresso but I still get
the same annoying error. I actually deleted all the tar files and the files
themselves and started afresh...

However I still did not tackle the LIBRARIES ISSUE..I did not quite understand
what you said about libraries..How do I know the path of the openmpi
libraries...Sorry I am really "dumb" in Fortran...Can you just explain ONLY
that part please in more details.

Another thing when configure was successful, at the end there were those lines:

"The following libraries have been found:
   BLAS_LIBS=-L/opt/intel/mkl/10.0.011/lib/em64t -lmkl_em64t
   LAPACK_LIBS= -L/opt/intel/mkl/10.0.011/lib/em64t -lmkl_em64t
   FFT_LIBS=
Please check if this is what you expect.

If any libraries are missing, you may specify a list of directories
to search and retry, as follows:
   ./configure LIBDIRS="list of directories, separated by spaces" "
Do I need other libraries?
Thanks a lot for your efforts
ELIO MOUJAES
> Date: Thu, 4 Sep 2014 12:48:44 -0400
> From: g...@ldeo.columbia.edu
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] compilation problem with ifort
> 
> Hi Elie
> 
> The executable generated in my computer will be useless to you,
> because these days most if not all libraries linked to an executable are
> dynamic/shared libraries.
> You won't have the same in your computer, or the equivalent will be
> located in different places, may be from different versions, etc.
> (E.g. your Intel compiler libraries will be from a different version,
> in a different location, and likewise for OpenMPI libraries etc.)
> Take any executable that you may have in your computer and do "ldd 
> executable_name" to see the list of shared libraries.
> 
> The error you reported suggests a misconfiguration of Makefiles,
> or better, a mispositioning of directories.
> 
> **
> 
> First thing I would try is to start fresh.
> Delete or move the old directory trees,
> download everything again on blank directories,
> and do the recipe all over again.
> Leftovers of previous compilations are often a hurdle,
> so you do yourself a favor by starting over from scratch.
> 
> **
> Second *really important* item to check:
> 
> The top directories of QE and EPW *must* follow this hierarchy:
> 
> espresso-4.0.3
> |-- EPW-3.0.0
> 
> Is this what you have?
> The EPW web site just hints this in their recipe step 3.
> The Makefiles will NOT work if this directory hierarchy is incorrect.
> 
> The error you reported in your first email *suggests* that the Makefiles
> in the EPW tarball are not finding the Makefiles in the QE tarball,
> which indicates that the directories may not have a correct relative
> location.
> 
> I.e. the EPW top directory must be right under the QE top directory.
> 
> **
> 
> Third thing, is that you have to follow the recipe strictly (and on
> the EPW web site there seems to be typos and omissions):
> 
> 1) untar the QE tarball:
> 
> tar -zxf espresso-4.0.3.tar.gz
> 
> 2) move the EPW tarball to the QE top directory produced by step 1 
> above, something like this:
> 
> mv EPW-3.0.0.tar.gz espresso-4.0.3
> 
> 3) untar the EPW tarball you copied/moved in step 2 above,
> something like this:
> 
> cd espresso-4.0.3
> tar -zxf  EPW-3.0.0.tar.gz
> 
> 4) Set up your OpenMPI environment (assuming you are using OpenMPI
> and that it is not installed in a standard location such as /usr/local):
> 
> 
> [bash/sh]
> export PATH=/your/openmpi/bin:$PATH
> export LD_LIBRARY_PATH=/your/openmpi/lib:$LD_LIBRARY_PATH
> 
> [tcsh/csh]
> setenv PATH /your/openmpi/bin:$PATH
> setenv LD_LIBRARY_PATH /your/openmpi/lib:$LD_LIBRARY_PATH
> 
> 5) configure espresso-4.0.3, i.e., assuming you already are in the
> espresso-4.0.3 directory, do:
> 
> ./configure CC=icc F77=ifort
> 
> (assuming you are using Intel compilers, and that you compiled OpenMPI 
> with them, if you did
> not, say, if you used gcc and gfortran, use CC=gcc FC=gfortran instead)
> 
> 6) Run "make" on the top EPW directory:
> 
> cd EPW-3.0.0
> make
> 
> When you configure QE it doesn't compile anything.
> It just generates/sets up a bunch of Makefiles in the QE directory tree.
> 
> When you do "make" on the EPW-3.0.0 directory the top Makefile just
 > says (cd src; make).

Re: [OMPI users] Issues with OpenMPI 1.8.2, GCC 4.9.1, and SLURM Interactive Jobs

2014-09-04 Thread Matt Thompson
Jeff,

Some limited testing shows that that srun does seem to work where the
quote-y one did not. I'm working with our admins now to make sure it lets
the prolog work as expected as well.

I'll keep you informed,
Matt


On Thu, Sep 4, 2014 at 1:26 PM, Jeff Squyres (jsquyres) 
wrote:

> Try this (typed in editor, not tested!):
>
> #! /usr/bin/perl -w
>
> use strict;
> use warnings;
>
> use FindBin;
>
> # Specify the path to the prolog.
> my $prolog = '--task-prolog=/gpfsm//.task.prolog';
>
> # Build the path to the SLURM srun command.
> my $srun_slurm = "${FindBin::Bin}/srun.slurm";
>
> # Add the prolog option, but abort if the user specifies a prolog option.
> my @command = split(/ /, "$srun_slurm $prolog");
> foreach (@ARGV) {
> if (/^--task-prolog=/) {
> print("The --task-prolog option is unsupported at . Please " .
>   "contact the  for assistance.\n");
> exit(1);
> } else {
> push(@command, $_);
> }
> }
> system(@command);
>
>
>
> On Sep 4, 2014, at 1:21 PM, Matt Thompson  wrote:
>
> > Jeff,
> >
> > Here is the script (with a bit of munging for safety's sake):
> >
> > #! /usr/bin/perl -w
> >
> > use strict;
> > use warnings;
> >
> > use FindBin;
> >
> > # Specify the path to the prolog.
> > my $prolog = '--task-prolog=/gpfsm//.task.prolog';
> >
> > # Build the path to the SLURM srun command.
> > my $srun_slurm = "${FindBin::Bin}/srun.slurm";
> >
> > # Add the prolog option, but abort if the user specifies a prolog option.
> > my $command = "$srun_slurm $prolog";
> > foreach (@ARGV) {
> > if (/^--task-prolog=/) {
> > print("The --task-prolog option is unsupported at . Please "
> .
> >   "contact the  for assistance.\n");
> > exit(1);
> > } else {
> > $command .= " $_";
> > }
> > }
> > system($command);
> >
> > Ideas?
> >
> >
> >
> > On Thu, Sep 4, 2014 at 10:51 AM, Ralph Castain  wrote:
> > Still begs the bigger question, though, as others have used script
> wrappers before - and I'm not sure we (OMPI) want to be in the business of
> dictating the scripting language they can use. :-)
> >
> > Jeff and I will argue that one out
> >
> >
> > On Sep 4, 2014, at 7:38 AM, Jeff Squyres (jsquyres) 
> wrote:
> >
> >> Ah, if it's perl, it might be easy. It might just be the difference
> between system("...string...") and system(@argv).
> >>
> >> Sent from my phone. No type good.
> >>
> >> On Sep 4, 2014, at 8:35 AM, "Matt Thompson"  wrote:
> >>
> >>> Jeff,
> >>>
> >>> I actually misspoke earlier. It turns out our srun is a *Perl* script
> around the SLURM srun. I'll speak with our admins to see if they can
> massage the script to not interpret the arguments. If possible, I'll ask
> them if I can share the script with you (privately or on the list) and
> maybe you can see how it is affecting Open MPI's argument passage.
> >>>
> >>> Matt
> >>>
> >>>
> >>> On Thu, Sep 4, 2014 at 8:04 AM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
> >>> On Sep 3, 2014, at 9:27 AM, Matt Thompson  wrote:
> >>>
> >>> > Just saw this, sorry. Our srun is indeed a shell script. It seems to
> be a wrapper around the regular srun that runs a --task-prolog. What it
> does...that's beyond my ken, but I could ask. My guess is that it probably
> does something that helps keep our old PBS scripts running (sets
> $PBS_NODEFILE, say). We used to run PBS but switched to SLURM recently. The
> admins would, of course, prefer all future scripts be SLURM-native scripts,
> but there are a lot of production runs that use many, many PBS scripts.
> Converting that would need slow, careful QC to make sure any "pure SLURM"
> versions act as expected.
> >>>
> >>> Ralph and I haven't had a chance to discuss this in detail yet, but I
> have thought about this quite a bit.
> >>>
> >>> What is happening is that one of the $argv OMPI passes is of the form
> "foo;bar".  Your srun script is interpreting the ";" as the end of the
> command and the "bar" as the beginning of a new command, and mayhem ensues.
> >>>
> >>> Basically, your srun script is violating what should be a very safe
> assumption: that the $argv we pass to it will not be interpreted by a
> shell.  Put differently: your "srun" script behaves differently than
> SLURM's "srun" executable.  This violates OMPI's expectations of how srun
> should behave.
> >>>
> >>> My $0.02 is that if we "fix" this in OMPI, we're effectively
> penalizing all other SLURM installations out there that *don't* violate
> this assumption (i.e., all of them).  Ralph may disagree with me on this
> point, BTW -- like I said, we haven't talked about this in detail since
> Tuesday.  :-)
> >>>
> >>> So here's my question: is there any chance you can change your "srun"
> script to a script language that doesn't recombine $argv?  This is a common
> problem, actually -- sh/csh/etc. script languages tend to recombine $argv,
> but other languages such as perl and python do not (e.g.,
> http://stackoverflow.com/questions/6981533/how-to-preserve-single-and-double-quotes-in-shell-script-arguments-without-the-a).

Re: [OMPI users] compilation problem with ifort

2014-09-04 Thread Elio Physics
Dear Gus,
Firstly I really need to thank you for the effort you are doing to help me and
write all these e-mails and in details explaining every step. Secondly, I did
all what you wrote; the EPW is indeed inside the QE espresso but I still get
the same annoying error. I actually deleted all the tar files and the files
themselves and started afresh...

However I still did not tackle the LIBRARIES ISSUE..I did not quite understand
what you said about libraries..How do I know the path of the openmpi
libraries...Sorry I am really "dumb" in Fortran...Can you just explain ONLY
that part please in more details.

Another thing when configure was successful, at the end there were those lines:

"The following libraries have been found:
   BLAS_LIBS=-L/opt/intel/mkl/10.0.011/lib/em64t -lmkl_em64t
   LAPACK_LIBS= -L/opt/intel/mkl/10.0.011/lib/em64t -lmkl_em64t
   FFT_LIBS=
Please check if this is what you expect.

If any libraries are missing, you may specify a list of directories
to search and retry, as follows:
   ./configure LIBDIRS="list of directories, separated by spaces" "
Do I need other libraries?
Thanks a lot for your efforts
ELIO MOUJAES
> Date: Thu, 4 Sep 2014 12:48:44 -0400
> From: g...@ldeo.columbia.edu
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] compilation problem with ifort
> 
> Hi Elie
> 
> The executable generated in my computer will be useless to you,
> because these days most if not all libraries linked to an executable are
> dynamic/shared libraries.
> You won't have the same in your computer, or the equivalent will be
> located in different places, may be from different versions, etc.
> (E.g. your Intel compiler libraries will be from a different version,
> in a different location, and likewise for OpenMPI libraries etc.)
> Take any executable that you may have in your computer and do "ldd 
> executable_name" to see the list of shared libraries.
> 
> The error you reported suggests a misconfiguration of Makefiles,
> or better, a mispositioning of directories.
> 
> **
> 
> First thing I would try is to start fresh.
> Delete or move the old directory trees,
> download everything again on blank directories,
> and do the recipe all over again.
> Leftovers of previous compilations are often a hurdle,
> so you do yourself a favor by starting over from scratch.
> 
> **
> Second *really important* item to check:
> 
> The top directories of QE and EPW *must* follow this hierarchy:
> 
> espresso-4.0.3
> |-- EPW-3.0.0
> 
> Is this what you have?
> The EPW web site just hints this in their recipe step 3.
> The Makefiles will NOT work if this directory hierarchy is incorrect.
> 
> The error you reported in your first email *suggests* that the Makefiles
> in the EPW tarball are not finding the Makefiles in the QE tarball,
> which indicates that the directories may not have a correct relative
> location.
> 
> I.e. the EPW top directory must be right under the QE top directory.
> 
> **
> 
> Third thing, is that you have to follow the recipe strictly (and on
> the EPW web site there seems to be typos and omissions):
> 
> 1) untar the QE tarball:
> 
> tar -zxf espresso-4.0.3.tar.gz
> 
> 2) move the EPW tarball to the QE top directory produced by step 1 
> above, something like this:
> 
> mv EPW-3.0.0.tar.gz espresso-4.0.3
> 
> 3) untar the EPW tarball you copied/moved in step 2 above,
> something like this:
> 
> cd espresso-4.0.3
> tar -zxf  EPW-3.0.0.tar.gz
> 
> 4) Set up your OpenMPI environment (assuming you are using OpenMPI
> and that it is not installed in a standard location such as /usr/local):
> 
> 
> [bash/sh]
> export PATH=/your/openmpi/bin:$PATH
> export LD_LIBRARY_PATH=/your/openmpi/lib:$LD_LIBRARY_PATH
> 
> [tcsh/csh]
> setenv PATH /your/openmpi/bin:$PATH
> setenv LD_LIBRARY_PATH /your/openmpi/lib:$LD_LIBRARY_PATH
> 
> 5) configure espresso-4.0.3, i.e., assuming you already are in the
> espresso-4.0.3 directory, do:
> 
> ./configure CC=icc F77=ifort
> 
> (assuming you are using Intel compilers, and that you compiled OpenMPI 
> with them, if you did
> not, say, if you used gcc and gfortran, use CC=gcc FC=gfortran instead)
> 
> 6) Run "make" on the top EPW directory:
> 
> cd EPW-3.0.0
> make
> 
> When you configure QE it doesn't compile anything.
> It just generates/sets up a bunch of Makefiles in the QE directory tree.
> 
> When you do "make" on the EPW-3.0.0 directory the top Makefile just
> says (cd src; make).
> If you look into the "src" subdirectory you will see that the Makefile
> therein points to library and include directories two levels above,
> which means that they are in the *QE* directory tree:
> 
> *
> IFLAGS   = -I../../include
> MODFLAGS = -I./ -I../../Modules -I../../iotk/src \
> -I../../PW -I../../PH -I../../PP
> LIBOBJS  = ../../flib/ptools.a ../../flib/flib.a \
> ../../clib/clib.a ../../iotk/src/libiotk.a
> W90LIB   = ../../W90/libwannier.a
> **
> 
> Hence, if your QE directory is not immediately above your EPW directory
> everything will fail, because the EPW Makefile won't be able to find
> the bits and parts of QE that it needs.

Re: [OMPI users] Issues with OpenMPI 1.8.2, GCC 4.9.1, and SLURM Interactive Jobs

2014-09-04 Thread Jeff Squyres (jsquyres)
Try this (typed in editor, not tested!):

#! /usr/bin/perl -w

use strict;
use warnings;

use FindBin;

# Specify the path to the prolog.
my $prolog = '--task-prolog=/gpfsm//.task.prolog';

# Build the path to the SLURM srun command.
my $srun_slurm = "${FindBin::Bin}/srun.slurm";

# Add the prolog option, but abort if the user specifies a prolog option.
my @command = split(/ /, "$srun_slurm $prolog");
foreach (@ARGV) {
if (/^--task-prolog=/) {
print("The --task-prolog option is unsupported at . Please " .
  "contact the  for assistance.\n");
exit(1);
} else {
push(@command, $_);
}
}
system(@command);
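
(The essential difference from the wrapper quoted below is that the options
are collected in the @command array and passed to system() as a list, so srun
is executed directly with each argument preserved, instead of concatenating
one big string that a shell gets to re-parse.)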



On Sep 4, 2014, at 1:21 PM, Matt Thompson  wrote:

> Jeff,
> 
> Here is the script (with a bit of munging for safety's sake):
> 
> #! /usr/bin/perl -w
> 
> use strict;
> use warnings;
> 
> use FindBin;
> 
> # Specify the path to the prolog.
> my $prolog = '--task-prolog=/gpfsm//.task.prolog';
> 
> # Build the path to the SLURM srun command.
> my $srun_slurm = "${FindBin::Bin}/srun.slurm";
> 
> # Add the prolog option, but abort if the user specifies a prolog option.
> my $command = "$srun_slurm $prolog";
> foreach (@ARGV) {
> if (/^--task-prolog=/) {
> print("The --task-prolog option is unsupported at . Please " .
>   "contact the  for assistance.\n");
> exit(1);
> } else {
> $command .= " $_";
> }
> }
> system($command);
> 
> Ideas?
> 
> 
> 
> On Thu, Sep 4, 2014 at 10:51 AM, Ralph Castain  wrote:
> Still begs the bigger question, though, as others have used script wrappers 
> before - and I'm not sure we (OMPI) want to be in the business of dictating 
> the scripting language they can use. :-)
> 
> Jeff and I will argue that one out
> 
> 
> On Sep 4, 2014, at 7:38 AM, Jeff Squyres (jsquyres)  
> wrote:
> 
>> Ah, if it's perl, it might be easy. It might just be the difference between 
>> system("...string...") and system(@argv). 
>> 
>> Sent from my phone. No type good. 
>> 
>> On Sep 4, 2014, at 8:35 AM, "Matt Thompson"  wrote:
>> 
>>> Jeff,
>>> 
>>> I actually misspoke earlier. It turns out our srun is a *Perl* script 
>>> around the SLURM srun. I'll speak with our admins to see if they can 
>>> massage the script to not interpret the arguments. If possible, I'll ask 
>>> them if I can share the script with you (privately or on the list) and 
>>> maybe you can see how it is affecting Open MPI's argument passage.
>>> 
>>> Matt
>>> 
>>> 
>>> On Thu, Sep 4, 2014 at 8:04 AM, Jeff Squyres (jsquyres) 
>>>  wrote:
>>> On Sep 3, 2014, at 9:27 AM, Matt Thompson  wrote:
>>> 
>>> > Just saw this, sorry. Our srun is indeed a shell script. It seems to be a 
>>> > wrapper around the regular srun that runs a --task-prolog. What it 
>>> > does...that's beyond my ken, but I could ask. My guess is that it 
>>> > probably does something that helps keep our old PBS scripts running (sets 
>>> > $PBS_NODEFILE, say). We used to run PBS but switched to SLURM recently. 
>>> > The admins would, of course, prefer all future scripts be SLURM-native 
>>> > scripts, but there are a lot of production runs that uses many, many PBS 
>>> > scripts. Converting that would need slow, careful QC to make sure any 
>>> > "pure SLURM" versions act as expected.
>>> 
>>> Ralph and I haven't had a chance to discuss this in detail yet, but I have 
>>> thought about this quite a bit.
>>> 
>>> What is happening is that one of the $argv OMPI passes is of the form 
>>> "foo;bar".  Your srun script is interpreting the ";" as the end of the 
>>> command and the "bar" as the beginning of a new command, and mayhem ensues.
>>> 
>>> Basically, your srun script is violating what should be a very safe 
>>> assumption: that the $argv we pass to it will not be interpreted by a 
>>> shell.  Put differently: your "srun" script behaves differently than 
>>> SLURM's "srun" executable.  This violates OMPI's expectations of how srun 
>>> should behave.
>>> 
>>> My $0.02 is that if we "fix" this in OMPI, we're effectively penalizing all 
>>> other SLURM installations out there that *don't* violate this assumption 
>>> (i.e., all of them).  Ralph may disagree with me on this point, BTW -- like 
>>> I said, we haven't talked about this in detail since Tuesday.  :-)
>>> 
>>> So here's my question: is there any chance you can change your "srun" 
>>> script to a script language that doesn't recombine $argv?  This is a common 
>>> problem, actually -- sh/csh/etc. script languages tend to recombine $argv, 
>>> but other languages such as perl and python do not (e.g., 
>>> http://stackoverflow.com/questions/6981533/how-to-preserve-single-and-double-quotes-in-shell-script-arguments-without-the-a).
>>> 
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to: 
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> ___
>>> users mailing list
>>> us...@open-mpi.org

Re: [OMPI users] Issues with OpenMPI 1.8.2, GCC 4.9.1, and SLURM Interactive Jobs

2014-09-04 Thread Matt Thompson
Jeff,

Here is the script (with a bit of munging for safety's sake):

#! /usr/bin/perl -w

use strict;
use warnings;

use FindBin;

# Specify the path to the prolog.
my $prolog = '--task-prolog=/gpfsm//.task.prolog';

# Build the path to the SLURM srun command.
my $srun_slurm = "${FindBin::Bin}/srun.slurm";

# Add the prolog option, but abort if the user specifies a prolog option.
my $command = "$srun_slurm $prolog";
foreach (@ARGV) {
if (/^--task-prolog=/) {
print("The --task-prolog option is unsupported at . Please " .
  "contact the  for assistance.\n");
exit(1);
} else {
$command .= " $_";
}
}
system($command);

Ideas?



On Thu, Sep 4, 2014 at 10:51 AM, Ralph Castain  wrote:

> Still begs the bigger question, though, as others have used script
> wrappers before - and I'm not sure we (OMPI) want to be in the business of
> dictating the scripting language they can use. :-)
>
> Jeff and I will argue that one out
>
>
> On Sep 4, 2014, at 7:38 AM, Jeff Squyres (jsquyres) 
> wrote:
>
>  Ah, if it's perl, it might be easy. It might just be the difference
> between system("...string...") and system(@argv).
>
> Sent from my phone. No type good.
>
> On Sep 4, 2014, at 8:35 AM, "Matt Thompson"  wrote:
>
>   Jeff,
>
>  I actually misspoke earlier. It turns out our srun is a *Perl* script
> around the SLURM srun. I'll speak with our admins to see if they can
> massage the script to not interpret the arguments. If possible, I'll ask
> them if I can share the script with you (privately or on the list) and
> maybe you can see how it is affecting Open MPI's argument passage.
>
>  Matt
>
>
> On Thu, Sep 4, 2014 at 8:04 AM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
>
>> On Sep 3, 2014, at 9:27 AM, Matt Thompson  wrote:
>>
>> > Just saw this, sorry. Our srun is indeed a shell script. It seems to be
>> a wrapper around the regular srun that runs a --task-prolog. What it
>> does...that's beyond my ken, but I could ask. My guess is that it probably
>> does something that helps keep our old PBS scripts running (sets
>> $PBS_NODEFILE, say). We used to run PBS but switched to SLURM recently. The
>> admins would, of course, prefer all future scripts be SLURM-native scripts,
>> but there are a lot of production runs that use many, many PBS scripts.
>> Converting that would need slow, careful QC to make sure any "pure SLURM"
>> versions act as expected.
>>
>>  Ralph and I haven't had a chance to discuss this in detail yet, but I
>> have thought about this quite a bit.
>>
>> What is happening is that one of the $argv OMPI passes is of the form
>> "foo;bar".  Your srun script is interpreting the ";" as the end of the
>> command and the "bar" as the beginning of a new command, and mayhem ensues.
>>
>> Basically, your srun script is violating what should be a very safe
>> assumption: that the $argv we pass to it will not be interpreted by a
>> shell.  Put differently: your "srun" script behaves differently than
>> SLURM's "srun" executable.  This violates OMPI's expectations of how srun
>> should behave.
>>
>> My $0.02 is that if we "fix" this in OMPI, we're effectively penalizing
>> all other SLURM installations out there that *don't* violate this
>> assumption (i.e., all of them).  Ralph may disagree with me on this point,
>> BTW -- like I said, we haven't talked about this in detail since Tuesday.
>> :-)
>>
>> So here's my question: is there any chance you can change your "srun"
>> script to a script language that doesn't recombine $argv?  This is a common
>> problem, actually -- sh/csh/etc. script languages tend to recombine $argv,
>> but other languages such as perl and python do not (e.g.,
>> http://stackoverflow.com/questions/6981533/how-to-preserve-single-and-double-quotes-in-shell-script-arguments-without-the-a
>> ).
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>  Link to this post:
>> http://www.open-mpi.org/community/lists/users/2014/09/25263.php
>>
>
>
>
>  --
>  "And, isn't sanity really just a one-trick pony anyway? I mean all you
>  get is one trick: rational thinking. But when you're good and crazy,
>  oooh, oooh, oooh, the sky is the limit!" -- The Tick
>
>___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/09/25264.php
>
>  ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/09/25269.php
>
>
>
> ___

Re: [OMPI users] compilation problem with ifort

2014-09-04 Thread Gus Correa

Hi Elie

The executable generated in my computer will be useless to you,
because these days most if not all libraries linked to an executable are
dynamic/shared libraries.
You won't have the same in your computer, or the equivalent will be
located in different places, may be from different versions, etc.
(E.g. your Intel compiler libraries will be from a different version,
in a different location, and likewise for OpenMPI libraries etc.)
Take any executable that you may have in your computer and do "ldd 
executable_name" to see the list of shared libraries.


The error you reported suggests a misconfiguration of Makefiles,
or better, a mispositioning of directories.

**

First thing I would try is to start fresh.
Delete or move the old directory trees,
download everything again on blank directories,
and do the recipe all over again.
Leftovers of previous compilations are often a hurdle,
so you do yourself a favor by starting over from scratch.

**
Second *really important* item to check:

The top directories of QE and EPW *must* follow this hierarchy:

espresso-4.0.3
|-- EPW-3.0.0

Is this what you have?
The EPW web site just hints this in their recipe step 3.
The Makefiles will NOT work if this directory hierarchy is incorrect.

The error you reported in your first email *suggests* that the Makefiles
in the EPW tarball are not finding the Makefiles in the QE tarball,
which indicates that the directories may not have a correct relative
location.


I.e. the EPW top directory must be right under the QE top directory.

**

Third thing, is that you have to follow the recipe strictly (and on
the EPW web site there seems to be typos and omissions):

1) untar the QE tarball:

tar -zxf espresso-4.0.3.tar.gz

2) move the EPW tarball to the QE top directory produced by step 1 
above, something like this:


mv EPW-3.0.0.tar.gz espresso-4.0.3

3) untar the EPW tarball you copied/moved in step 2 above,
something like this:

cd espresso-4.0.3
tar -zxf  EPW-3.0.0.tar.gz

4) Set up your OpenMPI environment (assuming you are using OpenMPI
and that it is not installed in a standard location such as /usr/local):


[bash/sh]
export PATH=/your/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/your/openmpi/lib:$LD_LIBRARY_PATH

[tcsh/csh]
setenv PATH /your/openmpi/bin:$PATH
setenv LD_LIBRARY_PATH /your/openmpi/lib:$LD_LIBRARY_PATH
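
A quick sanity check, just as a suggestion, that the intended OpenMPI is the
one actually picked up after setting these variables:

which mpirun
ompi_info | head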

5) configure espresso-4.0.3, i.e., assuming you already are in the
espresso-4.0.3 directory, do:

./configure CC=icc F77=ifort

(assuming you are using Intel compilers and that you compiled OpenMPI
with them; if you did not, say, if you used gcc and gfortran,
use CC=gcc FC=gfortran instead)

6) Run "make" on the top EPW directory:

cd EPW-3.0.0
make

When you configure QE it doesn't compile anything.
It just generates/sets up a bunch of Makefiles in the QE directory tree.
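
If you want to confirm that step, one thing configure does generate is the
make.sys file at the top of the QE tree, which the Makefiles include
(a simple check, with the path as in this recipe):

ls espresso-4.0.3/make.sys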

When you do "make" on the EPW-3.0.0 directory the top Makefile just
says (cd src; make).
If you look into the "src" subdirectory you will see that the Makefile
therein points to library and include directories two levels above,
which means that they are in the *QE* directory tree:

*
IFLAGS   = -I../../include
MODFLAGS = -I./ -I../../Modules -I../../iotk/src \
   -I../../PW -I../../PH -I../../PP
LIBOBJS  = ../../flib/ptools.a ../../flib/flib.a \
   ../../clib/clib.a ../../iotk/src/libiotk.a
W90LIB   = ../../W90/libwannier.a
**

Hence, if your QE directory is not immediately above your EPW directory
everything will fail, because the EPW Makefile won't be able to find
the bits and parts of QE that it needs.
And this is *exactly what the error message in your first email showed*,
a bunch of object files that were not found.
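
One way to check the layout before running make (a minimal sketch; the
directories are the ones referenced by the EPW src Makefile excerpt above):

cd espresso-4.0.3/EPW-3.0.0/src
ls -d ../../include ../../Modules ../../iotk/src ../../PW ../../PH

If any of these come back as "No such file or directory", the EPW tree is not
sitting directly under the QE tree, and the build will fail as described above.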

***

Sorry, but I cannot do any better than this.
I hope this helps,
Gus Correa

On 09/03/2014 08:59 PM, Elio Physics wrote:

Ray and Gus,

Thanks a lot for your help. I followed Gus' steps. I still have the same
problem for the compilation (I didn't check the libraries part though!).
The executables for quantum espresso work pretty fine. I have got them
in espresso-4.0.3/bin:
dynmat.x  iotk  iotk_print_kinds.x  iotk.x  matdyn.x  ph.x  pw.x  q2r.x.
The problem is the EPW executables and I don't understand why.

Gus, do me a favor: can you send me all the EPW executables that you have
produced, in the epw.x? I guess this resolves the problem for the moment.

Regards

ELIO

 > Date: Wed, 3 Sep 2014 19:45:32 -0400
 > From: g...@ldeo.columbia.edu
 > To: us...@open-mpi.org
 > Subject: Re: [OMPI users] compilation problem with ifort
 >
 > Hi Elio
 >
 > For what it is worth, I followed the instructions on
 > the EPW web site, and the program compiled flawlessly.
 > Sorry, I don't know how to use/run it,
 > don't have the time to learn it, and won't even try.
 >
 > **
 >
 > 1) Environment:
 >
 > If your MPI/OpenMPI is not installed in a standard location,
 > you need to set up these environment variables:
 >
 > [bash/sh]
 > export PATH=/your/openmpi/bin:$PATH
 > export LD_LIBRARY_PATH=/your/openmpi/lib:$LD_LIBRARY_PATH

Re: [OMPI users] Issues with OpenMPI 1.8.2, GCC 4.9.1, and SLURM Interactive Jobs

2014-09-04 Thread Ralph Castain
Still begs the bigger question, though, as others have used script wrappers 
before - and I'm not sure we (OMPI) want to be in the business of dictating the 
scripting language they can use. :-)

Jeff and I will argue that one out


On Sep 4, 2014, at 7:38 AM, Jeff Squyres (jsquyres)  wrote:

> Ah, if it's perl, it might be easy. It might just be the difference between 
> system("...string...") and system(@argv). 
> 
> Sent from my phone. No type good. 
> 
> On Sep 4, 2014, at 8:35 AM, "Matt Thompson"  wrote:
> 
>> Jeff,
>> 
>> I actually misspoke earlier. It turns out our srun is a *Perl* script around 
>> the SLURM srun. I'll speak with our admins to see if they can massage the 
>> script to not interpret the arguments. If possible, I'll ask them if I can 
>> share the script with you (privately or on the list) and maybe you can see 
>> how it is affecting Open MPI's argument passage.
>> 
>> Matt
>> 
>> 
>> On Thu, Sep 4, 2014 at 8:04 AM, Jeff Squyres (jsquyres)  
>> wrote:
>> On Sep 3, 2014, at 9:27 AM, Matt Thompson  wrote:
>> 
>> > Just saw this, sorry. Our srun is indeed a shell script. It seems to be a 
>> > wrapper around the regular srun that runs a --task-prolog. What it 
>> > does...that's beyond my ken, but I could ask. My guess is that it probably 
>> > does something that helps keep our old PBS scripts running (sets 
>> > $PBS_NODEFILE, say). We used to run PBS but switched to SLURM recently. 
>> > The admins would, of course, prefer all future scripts be SLURM-native 
>> > scripts, but there are a lot of production runs that use many, many PBS
>> > scripts. Converting that would need slow, careful QC to make sure any 
>> > "pure SLURM" versions act as expected.
>> 
>> Ralph and I haven't had a chance to discuss this in detail yet, but I have 
>> thought about this quite a bit.
>> 
>> What is happening is that one of the $argv OMPI passes is of the form 
>> "foo;bar".  Your srun script is interpreting the ";" as the end of the 
>> command and the "bar" as the beginning of a new command, and mayhem ensues.
>> 
>> Basically, your srun script is violating what should be a very safe 
>> assumption: that the $argv we pass to it will not be interpreted by a shell. 
>>  Put differently: your "srun" script behaves differently than SLURM's "srun" 
>> executable.  This violates OMPI's expectations of how srun should behave.
>> 
>> My $0.02 is that if we "fix" this in OMPI, we're effectively penalizing all 
>> other SLURM installations out there that *don't* violate this assumption 
>> (i.e., all of them).  Ralph may disagree with me on this point, BTW -- like 
>> I said, we haven't talked about this in detail since Tuesday.  :-)
>> 
>> So here's my question: is there any chance you can change your "srun" script 
>> to a script language that doesn't recombine $argv?  This is a common 
>> problem, actually -- sh/csh/etc. script languages tend to recombine $argv, 
>> but other languages such as perl and python do not (e.g., 
>> http://stackoverflow.com/questions/6981533/how-to-preserve-single-and-double-quotes-in-shell-script-arguments-without-the-a).
>> 
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/09/25263.php
>> 
>> 
>> 
>> -- 
>> "And, isn't sanity really just a one-trick pony anyway? I mean all you
>>  get is one trick: rational thinking. But when you're good and crazy, 
>>  oooh, oooh, oooh, the sky is the limit!" -- The Tick
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/09/25264.php
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/09/25269.php



Re: [OMPI users] Issues with OpenMPI 1.8.2, GCC 4.9.1, and SLURM Interactive Jobs

2014-09-04 Thread Jeff Squyres (jsquyres)
Ah, if it's perl, it might be easy. It might just be the difference between 
system("...string...") and system(@argv).

Sent from my phone. No type good.

On Sep 4, 2014, at 8:35 AM, "Matt Thompson" <fort...@gmail.com> wrote:

Jeff,

I actually misspoke earlier. It turns out our srun is a *Perl* script around 
the SLURM srun. I'll speak with our admins to see if they can massage the 
script to not interpret the arguments. If possible, I'll ask them if I can 
share the script with you (privately or on the list) and maybe you can see how 
it is affecting Open MPI's argument passage.

Matt


On Thu, Sep 4, 2014 at 8:04 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
On Sep 3, 2014, at 9:27 AM, Matt Thompson <fort...@gmail.com> wrote:

> Just saw this, sorry. Our srun is indeed a shell script. It seems to be a 
> wrapper around the regular srun that runs a --task-prolog. What it 
> does...that's beyond my ken, but I could ask. My guess is that it probably 
> does something that helps keep our old PBS scripts running (sets 
> $PBS_NODEFILE, say). We used to run PBS but switched to SLURM recently. The 
> admins would, of course, prefer all future scripts be SLURM-native scripts, 
> but there are a lot of production runs that use many, many PBS scripts.
> Converting that would need slow, careful QC to make sure any "pure SLURM" 
> versions act as expected.

Ralph and I haven't had a chance to discuss this in detail yet, but I have 
thought about this quite a bit.

What is happening is that one of the $argv OMPI passes is of the form 
"foo;bar".  Your srun script is interpreting the ";" as the end of the command 
the the "bar" as the beginning of a new command, and mayhem ensues.

Basically, your srun script is violating what should be a very safe assumption: 
that the $argv we pass to it will not be interpreted by a shell.  Put 
differently: your "srun" script behaves differently than SLURM's "srun" 
executable.  This violates OMPI's expectations of how srun should behave.
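
To see the failure mode in isolation, here is a tiny shell illustration (not
from the original thread) of what re-parsing does to such an argument:

arg='foo;bar'

# Passed along as a separate argv element, the semicolon is just data:
printf '%s\n' "$arg"          # prints: foo;bar

# Re-joined into one command string and re-parsed by a shell, the ";"
# becomes a command separator and "bar" is run as a command:
sh -c "printf '%s\n' $arg"    # prints: foo, then fails with "bar: command not found"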

My $0.02 is that if we "fix" this in OMPI, we're effectively penalizing all 
other SLURM installations out there that *don't* violate this assumption (i.e., 
all of them).  Ralph may disagree with me on this point, BTW -- like I said, we 
haven't talked about this in detail since Tuesday.  :-)

So here's my question: is there any chance you can change your "srun" script to 
a script language that doesn't recombine $argv?  This is a common problem, 
actually -- sh/csh/etc. script languages tend to recombine $argv, but other 
languages such as perl and python do not (e.g., 
http://stackoverflow.com/questions/6981533/how-to-preserve-single-and-double-quotes-in-shell-script-arguments-without-the-a).

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/09/25263.php



--
"And, isn't sanity really just a one-trick pony anyway? I mean all you
 get is one trick: rational thinking. But when you're good and crazy,
 oooh, oooh, oooh, the sky is the limit!" -- The Tick

___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/09/25264.php


Re: [OMPI users] SGE and openMPI

2014-09-04 Thread Ralph Castain
Just to help separate out the issues, you might try running the hello_c program 
in the OMPI examples directory - this will verify whether the problem is in the 
mpirun command or in your program
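
A sketch of such a test job (the PE name and install path are taken from the
job scripts quoted below in this thread; the location of the examples/
directory is an assumption, adjust it to your Open MPI source tree):

#!/bin/bash
#$ -S /bin/bash
#$ -pe orte 64
#$ -cwd
#$ -o ./hello.out
#$ -e ./hello.err

export PATH=/home/SWcbbc/openmpi-1.6.5/bin:$PATH
export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH

# hello_c.c ships in the examples/ directory of the Open MPI source tree.
mpicc /path/to/openmpi-1.6.5/examples/hello_c.c -o hello_c
mpirun ./hello_c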


On Sep 4, 2014, at 6:26 AM, Donato Pera  wrote:

> Hi,
> 
> the text was in the file.err file; in the file.out file I get only the name
> of the node where the program ran.
> 
> Thanks Donato.
> 
> 
> On 04/09/2014 15:14, Reuti wrote:
>> Hi,
>> 
>> Am 04.09.2014 um 14:43 schrieb Donato Pera:
>> 
>>> using this script :
>>> 
>>> #!/bin/bash
>>> #$ -S /bin/bash
>>> #$ -pe orte 64
>>> #$ -cwd
>>> #$ -o ./file.out
>>> #$ -e ./file.err
>>> 
>>> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
>>> export OMP_NUM_THREADS=1
>>> 
>>> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
>>> PP_PATH=/home/tanzi
>>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun ${CPMD_PATH}cpmd.x  input
>>> ${PP_PATH}/PP/ > out
>> Is this text below in out, file.out or file.err - any hint in the other 
>> files?
>> 
>> -- Reuti
>> 
>> 
>>> The program ran for about 2 minutes and then I get this error
>>> 
>>> WARNING: A process refused to die!
>>> 
>>> Host: compute-2-2.local
>>> PID:  24897
>>> 
>>> This process may still be running and/or consuming resources.
>>> 
>>> --
>>> [compute-2-2.local:24889] 25 more processes have sent help message
>>> help-odls-default.txt / odls-default:could-not-kill
>>> [compute-2-2.local:24889] Set MCA parameter "orte_base_help_aggregate"
>>> to 0 to see all help / error messages
>>> [compute-2-2.local:24889] 27 more processes have sent help message
>>> help-odls-default.txt / odls-default:could-not-kill
>>> --
>>> mpirun has exited due to process rank 0 with PID 24896 on
>>> node compute-2-2.local exiting improperly. There are two reasons this
>>> could occur:
>>> 
>>> 1. this process did not call "init" before exiting, but others in
>>> the job did. This can cause a job to hang indefinitely while it waits
>>> for all processes to call "init". By rule, if one process calls "init",
>>> then ALL processes must call "init" prior to termination.
>>> 
>>> 2. this process called "init", but exited without calling "finalize".
>>> By rule, all processes that call "init" MUST call "finalize" prior to
>>> exiting or it will be considered an "abnormal termination"
>>> 
>>> This may have caused other processes in the application to be
>>> terminated by signals sent by mpirun (as reported here).
>>> --
>>> [compute-2-2.local:24889] 1 more process has sent help message
>>> help-odls-default.txt / odls-default:could-not-kill
>>> 
>>> 
>>> Thanks and Regards Donato
>>> 
>>> 
>>> 
>>> 
>>> On 03/09/2014 13:19, Reuti wrote:
 Am 03.09.2014 um 13:11 schrieb Donato Pera:
 
> I get
> 
> ompi_info | grep grid
>   MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.5)
 Good.
 
 
> and using this script
> 
> #!/bin/bash
> #$ -S /bin/bash
> #$ -pe orte 64
> #$ -cwd
> #$ -o ./file.out
> #$ -e ./file.err
> 
> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
> export OMP_NUM_THREADS=1
> 
> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
> PP_PATH=/home/tanzi
> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib -np 64
> -machinefile $TMPDIR/machines  ${CPMD_PATH}cpmd.x  input ${PP_PATH}/PP/
 In the PE "orte" is no "start_proc_args" defined which could generate the 
 machinefile. Please try to start the application with:
 
 /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib ${CPMD_PATH}cpmd.x  
 input ${PP_PATH}/PP/
 
 -- Reuti
 
 
>> out
> I get this error
> 
> Open RTE was unable to open the hostfile:
>  /tmp/21213.1.debug.q/machines
> Check to make sure the path and filename are correct.
> --
> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
> base/rmaps_base_support_fns.c at line 207
> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
> rmaps_rr.c at line 82
> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
> base/rmaps_base_map_job.c at line 88
> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
> base/plm_base_launch_support.c at line 105
> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
> plm_rsh_module.c at line 1173
> 
> 
> 
> 
> 
> Instead using this script
> 
> 
> #!/bin/bash
> #$ -S /bin/bash
> #$ -pe orte 64
> #$ -cwd
> #$ -o ./file.out
> #$ -e ./file.err
> 
> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib

Re: [OMPI users] SGE and openMPI

2014-09-04 Thread Donato Pera
Hi,

the text was in the file.err file; in the file.out file I get only the name
of the node where the program ran.

Thanks Donato.


On 04/09/2014 15:14, Reuti wrote:
> Hi,
>
> Am 04.09.2014 um 14:43 schrieb Donato Pera:
>
>> using this script :
>>
>> #!/bin/bash
>> #$ -S /bin/bash
>> #$ -pe orte 64
>> #$ -cwd
>> #$ -o ./file.out
>> #$ -e ./file.err
>>
>> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
>> export OMP_NUM_THREADS=1
>>
>> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
>> PP_PATH=/home/tanzi
>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun ${CPMD_PATH}cpmd.x  input
>> ${PP_PATH}/PP/ > out
> Is this text below in out, file.out or file.err - any hint in the other files?
>
> -- Reuti
>
>
>> The program ran for about 2 minutes and then I get this error
>>
>> WARNING: A process refused to die!
>>
>> Host: compute-2-2.local
>> PID:  24897
>>
>> This process may still be running and/or consuming resources.
>>
>> --
>> [compute-2-2.local:24889] 25 more processes have sent help message
>> help-odls-default.txt / odls-default:could-not-kill
>> [compute-2-2.local:24889] Set MCA parameter "orte_base_help_aggregate"
>> to 0 to see all help / error messages
>> [compute-2-2.local:24889] 27 more processes have sent help message
>> help-odls-default.txt / odls-default:could-not-kill
>> --
>> mpirun has exited due to process rank 0 with PID 24896 on
>> node compute-2-2.local exiting improperly. There are two reasons this
>> could occur:
>>
>> 1. this process did not call "init" before exiting, but others in
>> the job did. This can cause a job to hang indefinitely while it waits
>> for all processes to call "init". By rule, if one process calls "init",
>> then ALL processes must call "init" prior to termination.
>>
>> 2. this process called "init", but exited without calling "finalize".
>> By rule, all processes that call "init" MUST call "finalize" prior to
>> exiting or it will be considered an "abnormal termination"
>>
>> This may have caused other processes in the application to be
>> terminated by signals sent by mpirun (as reported here).
>> --
>> [compute-2-2.local:24889] 1 more process has sent help message
>> help-odls-default.txt / odls-default:could-not-kill
>>
>>
>> Thanks and Regards Donato
>>
>>
>>
>>
>> On 03/09/2014 13:19, Reuti wrote:
>>> Am 03.09.2014 um 13:11 schrieb Donato Pera:
>>>
 I get

 ompi_info | grep grid
MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.5)
>>> Good.
>>>
>>>
 and using this script

 #!/bin/bash
 #$ -S /bin/bash
 #$ -pe orte 64
 #$ -cwd
 #$ -o ./file.out
 #$ -e ./file.err

 export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
 export OMP_NUM_THREADS=1

 CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
 PP_PATH=/home/tanzi
 /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib -np 64
 -machinefile $TMPDIR/machines  ${CPMD_PATH}cpmd.x  input ${PP_PATH}/PP/
>>> In the PE "orte" there is no "start_proc_args" defined which could generate the
>>> machinefile. Please try to start the application with:
>>>
>>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib ${CPMD_PATH}cpmd.x  
>>> input ${PP_PATH}/PP/
>>>
>>> -- Reuti
>>>
>>>
> out
 I get this error

 Open RTE was unable to open the hostfile:
   /tmp/21213.1.debug.q/machines
 Check to make sure the path and filename are correct.
 --
 [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
 base/rmaps_base_support_fns.c at line 207
 [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
 rmaps_rr.c at line 82
 [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
 base/rmaps_base_map_job.c at line 88
 [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
 base/plm_base_launch_support.c at line 105
 [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
 plm_rsh_module.c at line 1173





 Instead using this script


 #!/bin/bash
 #$ -S /bin/bash
 #$ -pe orte 64
 #$ -cwd
 #$ -o ./file.out
 #$ -e ./file.err

 export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
 export OMP_NUM_THREADS=1

 CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
 PP_PATH=/home/tanzi
 /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib -np 64
 $TMPDIR/machines  ${CPMD_PATH}cpmd.x  input ${PP_PATH}/PP/ > out


 I get
 Executable: /tmp/21214.1.debug.q/machines
 Node: compute-2-0.local

 while attempting to start process rank 0.
 

Re: [OMPI users] SGE and openMPI

2014-09-04 Thread Reuti
Hi,

Am 04.09.2014 um 14:43 schrieb Donato Pera:

> using this script :
> 
> #!/bin/bash
> #$ -S /bin/bash
> #$ -pe orte 64
> #$ -cwd
> #$ -o ./file.out
> #$ -e ./file.err
> 
> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
> export OMP_NUM_THREADS=1
> 
> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
> PP_PATH=/home/tanzi
> /home/SWcbbc/openmpi-1.6.5/bin/mpirun ${CPMD_PATH}cpmd.x  input
> ${PP_PATH}/PP/ > out

Is this text below in out, file.out or file.err - any hint in the other files?

-- Reuti


> 
> The program ran for about 2 minutes and then I get this error
> 
> WARNING: A process refused to die!
> 
> Host: compute-2-2.local
> PID:  24897
> 
> This process may still be running and/or consuming resources.
> 
> --
> [compute-2-2.local:24889] 25 more processes have sent help message
> help-odls-default.txt / odls-default:could-not-kill
> [compute-2-2.local:24889] Set MCA parameter "orte_base_help_aggregate"
> to 0 to see all help / error messages
> [compute-2-2.local:24889] 27 more processes have sent help message
> help-odls-default.txt / odls-default:could-not-kill
> --
> mpirun has exited due to process rank 0 with PID 24896 on
> node compute-2-2.local exiting improperly. There are two reasons this
> could occur:
> 
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
> 
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
> 
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --
> [compute-2-2.local:24889] 1 more process has sent help message
> help-odls-default.txt / odls-default:could-not-kill
> 
> 
> Thanks and Regards Donato
> 
> 
> 
> 
> On 03/09/2014 13:19, Reuti wrote:
>> Am 03.09.2014 um 13:11 schrieb Donato Pera:
>> 
>>> I get
>>> 
>>> ompi_info | grep grid
>>>MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.5)
>> Good.
>> 
>> 
>>> and using this script
>>> 
>>> #!/bin/bash
>>> #$ -S /bin/bash
>>> #$ -pe orte 64
>>> #$ -cwd
>>> #$ -o ./file.out
>>> #$ -e ./file.err
>>> 
>>> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
>>> export OMP_NUM_THREADS=1
>>> 
>>> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
>>> PP_PATH=/home/tanzi
>>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib -np 64
>>> -machinefile $TMPDIR/machines  ${CPMD_PATH}cpmd.x  input ${PP_PATH}/PP/
>> In the PE "orte" there is no "start_proc_args" defined which could generate the
>> machinefile. Please try to start the application with:
>> 
>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib ${CPMD_PATH}cpmd.x  
>> input ${PP_PATH}/PP/
>> 
>> -- Reuti
>> 
>> 
 out
>>> 
>>> I get this error
>>> 
>>> Open RTE was unable to open the hostfile:
>>>   /tmp/21213.1.debug.q/machines
>>> Check to make sure the path and filename are correct.
>>> --
>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
>>> base/rmaps_base_support_fns.c at line 207
>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
>>> rmaps_rr.c at line 82
>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
>>> base/rmaps_base_map_job.c at line 88
>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
>>> base/plm_base_launch_support.c at line 105
>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
>>> plm_rsh_module.c at line 1173
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Instead using this script
>>> 
>>> 
>>> #!/bin/bash
>>> #$ -S /bin/bash
>>> #$ -pe orte 64
>>> #$ -cwd
>>> #$ -o ./file.out
>>> #$ -e ./file.err
>>> 
>>> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
>>> export OMP_NUM_THREADS=1
>>> 
>>> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
>>> PP_PATH=/home/tanzi
>>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib -np 64
>>> $TMPDIR/machines  ${CPMD_PATH}cpmd.x  input ${PP_PATH}/PP/ > out
>>> 
>>> 
>>> I get
>>> Executable: /tmp/21214.1.debug.q/machines
>>> Node: compute-2-0.local
>>> 
>>> while attempting to start process rank 0.
>>> --
>>> 
>>> can you help me
>>> 
>>> 
>>> Thanks and Regards Donato
>>> 
>>> 
>>> 
>>> 
>>> On 03/09/2014 12:28, Reuti wrote:
 ompi_info | grep grid

Re: [OMPI users] SGE and openMPI

2014-09-04 Thread Donato Pera
Hi,

using this script :

#!/bin/bash
#$ -S /bin/bash
#$ -pe orte 64
#$ -cwd
#$ -o ./file.out
#$ -e ./file.err

export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
export OMP_NUM_THREADS=1

CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
PP_PATH=/home/tanzi
/home/SWcbbc/openmpi-1.6.5/bin/mpirun ${CPMD_PATH}cpmd.x  input
${PP_PATH}/PP/ > out



The program ran for about 2 minutes and then I got this error:

WARNING: A process refused to die!

Host: compute-2-2.local
PID:  24897

This process may still be running and/or consuming resources.

--
[compute-2-2.local:24889] 25 more processes have sent help message
help-odls-default.txt / odls-default:could-not-kill
[compute-2-2.local:24889] Set MCA parameter "orte_base_help_aggregate"
to 0 to see all help / error messages
[compute-2-2.local:24889] 27 more processes have sent help message
help-odls-default.txt / odls-default:could-not-kill
--
mpirun has exited due to process rank 0 with PID 24896 on
node compute-2-2.local exiting improperly. There are two reasons this
could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--
[compute-2-2.local:24889] 1 more process has sent help message
help-odls-default.txt / odls-default:could-not-kill
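
For what it's worth, the aggregated output above hides most of the per-process
messages; a minimal sketch of rerunning with aggregation disabled (assuming the
same job script and paths as above) would be:

# Hypothetical rerun of the same mpirun line, with help-message aggregation
# turned off so that every failing process reports its own error:
/home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca orte_base_help_aggregate 0 \
${CPMD_PATH}cpmd.x  input ${PP_PATH}/PP/ > out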


Thanks and Regards Donato




On 03/09/2014 13:19, Reuti wrote:
> On 03.09.2014 at 13:11, Donato Pera wrote:
>
>> I get
>>
>> ompi_info | grep grid
>> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.5)
> Good.
>
>
>> and using this script
>>
>> #!/bin/bash
>> #$ -S /bin/bash
>> #$ -pe orte 64
>> #$ -cwd
>> #$ -o ./file.out
>> #$ -e ./file.err
>>
>> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
>> export OMP_NUM_THREADS=1
>>
>> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
>> PP_PATH=/home/tanzi
>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib -np 64
>> -machinefile $TMPDIR/machines  ${CPMD_PATH}cpmd.x  input ${PP_PATH}/PP/ > out
> In the PE "orte" no "start_proc_args" is defined that could generate the
> machinefile. Please try to start the application with:
>
> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib ${CPMD_PATH}cpmd.x  
> input ${PP_PATH}/PP/
>
> -- Reuti
>
>
>>
>> I get this error
>>
>> Open RTE was unable to open the hostfile:
>>    /tmp/21213.1.debug.q/machines
>> Check to make sure the path and filename are correct.
>> --
>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
>> base/rmaps_base_support_fns.c at line 207
>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
>> rmaps_rr.c at line 82
>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
>> base/rmaps_base_map_job.c at line 88
>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
>> base/plm_base_launch_support.c at line 105
>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
>> plm_rsh_module.c at line 1173
>>
>>
>>
>>
>>
>> Instead using this script
>>
>>
>> #!/bin/bash
>> #$ -S /bin/bash
>> #$ -pe orte 64
>> #$ -cwd
>> #$ -o ./file.out
>> #$ -e ./file.err
>>
>> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
>> export OMP_NUM_THREADS=1
>>
>> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
>> PP_PATH=/home/tanzi
>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib -np 64
>> $TMPDIR/machines  ${CPMD_PATH}cpmd.x  input ${PP_PATH}/PP/ > out
>>
>>
>> I get
>> Executable: /tmp/21214.1.debug.q/machines
>> Node: compute-2-0.local
>>
>> while attempting to start process rank 0.
>> --
>>
>> Can you help me?
>>
>>
>> Thanks and Regards Donato
>>
>>
>>
>>
>> On 03/09/2014 12:28, Reuti wrote:
>>> ompi_info | grep grid

Re: [OMPI users] Issues with OpenMPI 1.8.2, GCC 4.9.1, and SLURM Interactive Jobs

2014-09-04 Thread Matt Thompson
Jeff,

I actually misspoke earlier. It turns out our srun is a *Perl* script
around the SLURM srun. I'll speak with our admins to see if they can
massage the script to not interpret the arguments. If possible, I'll ask
them if I can share the script with you (privately or on the list) and
maybe you can see how it is affecting Open MPI's argument passing.

Matt


On Thu, Sep 4, 2014 at 8:04 AM, Jeff Squyres (jsquyres) 
wrote:

> On Sep 3, 2014, at 9:27 AM, Matt Thompson  wrote:
>
> > Just saw this, sorry. Our srun is indeed a shell script. It seems to be
> a wrapper around the regular srun that runs a --task-prolog. What it
> does...that's beyond my ken, but I could ask. My guess is that it probably
> does something that helps keep our old PBS scripts running (sets
> $PBS_NODEFILE, say). We used to run PBS but switched to SLURM recently. The
> admins would, of course, prefer all future scripts be SLURM-native scripts,
> but there are a lot of production runs that use many, many PBS scripts.
> Converting that would need slow, careful QC to make sure any "pure SLURM"
> versions act as expected.
>
> Ralph and I haven't had a chance to discuss this in detail yet, but I have
> thought about this quite a bit.
>
> What is happening is that one of the $argv OMPI passes is of the form
> "foo;bar".  Your srun script is interpreting the ";" as the end of the
> command and the "bar" as the beginning of a new command, and mayhem ensues.
>
> Basically, your srun script is violating what should be a very safe
> assumption: that the $argv we pass to it will not be interpreted by a
> shell.  Put differently: your "srun" script behaves differently than
> SLURM's "srun" executable.  This violates OMPI's expectations of how srun
> should behave.
>
> My $0.02 is that if we "fix" this in OMPI, we're effectively penalizing
> all other SLURM installations out there that *don't* violate this
> assumption (i.e., all of them).  Ralph may disagree with me on this point,
> BTW -- like I said, we haven't talked about this in detail since Tuesday.
> :-)
>
> So here's my question: is there any chance you can change your "srun"
> script to a script language that doesn't recombine $argv?  This is a common
> problem, actually -- sh/csh/etc. script languages tend to recombine $argv,
> but other languages such as perl and python do not (e.g.,
> http://stackoverflow.com/questions/6981533/how-to-preserve-single-and-double-quotes-in-shell-script-arguments-without-the-a
> ).
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>



-- 
"And, isn't sanity really just a one-trick pony anyway? I mean all you
 get is one trick: rational thinking. But when you're good and crazy,
 oooh, oooh, oooh, the sky is the limit!" -- The Tick


Re: [OMPI users] Issues with OpenMPI 1.8.2, GCC 4.9.1, and SLURM Interactive Jobs

2014-09-04 Thread Jeff Squyres (jsquyres)
On Sep 3, 2014, at 9:27 AM, Matt Thompson  wrote:

> Just saw this, sorry. Our srun is indeed a shell script. It seems to be a 
> wrapper around the regular srun that runs a --task-prolog. What it 
> does...that's beyond my ken, but I could ask. My guess is that it probably 
> does something that helps keep our old PBS scripts running (sets 
> $PBS_NODEFILE, say). We used to run PBS but switched to SLURM recently. The 
> admins would, of course, prefer all future scripts be SLURM-native scripts, 
> but there are a lot of production runs that use many, many PBS scripts. 
> Converting that would need slow, careful QC to make sure any "pure SLURM" 
> versions act as expected.

Ralph and I haven't had a chance to discuss this in detail yet, but I have 
thought about this quite a bit.

What is happening is that one of the $argv OMPI passes is of the form 
"foo;bar".  Your srun script is interpreting the ";" as the end of the command 
the the "bar" as the beginning of a new command, and mayhem ensues.

Basically, your srun script is violating what should be a very safe assumption: 
that the $argv we pass to it will not be interpreted by a shell.  Put 
differently: your "srun" script behaves differently than SLURM's "srun" 
executable.  This violates OMPI's expectations of how srun should behave.

My $0.02 is that if we "fix" this in OMPI, we're effectively penalizing all 
other SLURM installations out there that *don't* violate this assumption (i.e., 
all of them).  Ralph may disagree with me on this point, BTW -- like I said, we 
haven't talked about this in detail since Tuesday.  :-)

So here's my question: is there any chance you can change your "srun" script to 
a script language that doesn't recombine $argv?  This is a common problem, 
actually -- sh/csh/etc. script languages tend to recombine $argv, but other 
languages such as perl and python do not (e.g., 
http://stackoverflow.com/questions/6981533/how-to-preserve-single-and-double-quotes-in-shell-script-arguments-without-the-a).
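
To make the distinction concrete, here is a minimal sketch of a wrapper that
forwards its arguments verbatim versus one that lets a shell re-parse them.
The path /usr/bin/srun.real and the prolog path are hypothetical stand-ins,
not anything SLURM installs by default:

#!/bin/bash
# Safe: "$@" hands every argument to the real srun unchanged, so an argument
# containing ";" (like the "foo;bar" Open MPI passes) stays a single argument.
exec /usr/bin/srun.real --task-prolog=/path/to/prolog "$@"

# Unsafe: flattening the arguments into one string and re-evaluating it lets
# the shell treat ";" as a command separator again, splitting "foo;bar":
#   eval /usr/bin/srun.real --task-prolog=/path/to/prolog $*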

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/