Re: [OMPI users] openmpi configuration error?

2014-05-21 Thread Ben Lash
I know why it quit - M3EXIT was called - but thanks for looking.


On Wed, May 21, 2014 at 4:02 PM, Gus Correa  wrote:

> Hi Ben
>
> One of the ranks (52) called MPI_Abort.
> This may be a bug in the code, or a problem with the setup
> (e.g. a missing or incorrect input file).
> For instance, the CCTM Wiki says:
> "AERO6 expects emissions inputs for 13 new PM species. CCTM will crash if
> any emitted PM species is not included in the emissions input file"
> I am not familiar with CCTM, so these are just guesses.
>
> It doesn't look like an MPI problem, though.
>
> You may want to check any other logs that the CCTM code may
> produce, for any clue on where it fails.
> Otherwise, you could compile with -g -traceback (and remove any
> optimization options in FFLAGS, FCFLAGS, CFLAGS, etc.)
> It may also have a -DDEBUG or similar that can be turned on
> in the CPPFLAGS, which in many models makes a more verbose log.
> This *may* tell you where it fails (source file, subroutine and line),
> and may help understand why it fails.
> If it dumps a core file, you can trace the failure point with
> a debugger.
>
>
> I hope this helps,
> Gus
>
> On 05/21/2014 03:20 PM, Ben Lash wrote:
>
>> I used a different build of netcdf 4.1.3, and the code seems to run now.
>> I have a totally different, non-mpi related error in part of it, but
>> there's no way for the list to help; I mostly just wanted to report that
>> this particular problem seems to be solved for the record. It doesn't
>> seem to fail quite as gracefully anymore, but I'm still getting enough
>> of the error messages to know what's going on.
>>
>> MPI_ABORT was invoked on rank 52 in communicator MPI_COMM_WORLD
>> with errorcode 0.
>>
>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> You may or may not see output from other processes, depending on
>> exactly when Open MPI kills them.
>> 
>> --
>> [cn-099.davinci.rice.edu:26185 ]
>>
>> [[63355,0],4]-[[63355,1],52] mca_oob_tcp_msg_recv: readv failed:
>> Connection reset by peer (104)
>> [cn-099.davinci.rice.edu:26185 ]
>>
>> [[63355,0],4]-[[63355,1],54] mca_oob_tcp_msg_recv: readv failed:
>> Connection reset by peer (104)
>> [cn-099.davinci.rice.edu:26185 ]
>>
>> [[63355,0],4]-[[63355,1],55] mca_oob_tcp_msg_recv: readv failed:
>> Connection reset by peer (104)
>> [cn-158.davinci.rice.edu:12459 ]
>>
>> [[63355,0],1]-[[63355,1],15] mca_oob_tcp_msg_recv: readv failed:
>> Connection reset by peer (104)
>> [cn-158.davinci.rice.edu:12459 ]
>>
>> [[63355,0],1]-[[63355,1],17] mca_oob_tcp_msg_recv: readv failed:
>> Connection reset by peer (104)
>> [cn-099.davinci.rice.edu:26185 ]
>>
>> [[63355,0],4]-[[63355,1],56] mca_oob_tcp_msg_recv: readv failed:
>> Connection reset by peer (104)
>> [cn-099.davinci.rice.edu:26185 ]
>>
>> [[63355,0],4]-[[63355,1],53] mca_oob_tcp_msg_recv: readv failed:
>> Connection reset by peer (104)
>> [cn-099.davinci.rice.edu:26185 ]
>>
>> [[63355,0],4]-[[63355,1],51] mca_oob_tcp_msg_recv: readv failed:
>> Connection reset by peer (104)
>> [cn-099.davinci.rice.edu:26185 ]
>>
>> [[63355,0],4]-[[63355,1],57] mca_oob_tcp_msg_recv: readv failed:
>> Connection reset by peer (104)
>> forrtl: error (78): process killed (SIGTERM)
>> Image              PC        Routine   Line     Source
>>
>> 
>>
>> [cn-158.davinci.rice.edu:12459 ]
>>
>> [[63355,0],1]-[[63355,1],16] mca_oob_tcp_msg_recv: readv failed:
>> Connection reset by peer (104)
>> 
>> --
>> mpiexec has exited due to process rank 49 with PID 26187 on
>> node cn-099 exiting improperly. There are two reasons this could occur:
>>
>> 1. this process did not call "init" before exiting, but others in
>> the job did. This can cause a job to hang indefinitely while it waits
>> for all processes to call "init". By rule, if one process calls "init",
>> then ALL processes must call "init" prior to termination.
>>
>> 2. this process called "init", but exited without calling "finalize".
>> By rule, all processes that call "init" MUST call "finalize" prior to
>> exiting or it will be considered an "abnormal termination"
>>
>> This may have caused other processes in the application to be
>> terminated by signals sent by mpiexec (as reported here).
>> 
>> --
>> forrtl: error (78): process killed (SIGTERM)
>> Image              PC        Routine   Line     Source
>> CCTM_V5g_Linux2_x  

Re: [OMPI users] openmpi configuration error?

2014-05-21 Thread Gus Correa

Hi Ben

One of the ranks (52) called MPI_Abort.
This may be a bug in the code, or a problem with the setup
(e.g. a missing or incorrect input file).
For instance, the CCTM Wiki says:
"AERO6 expects emissions inputs for 13 new PM species. CCTM will crash 
if any emitted PM species is not included in the emissions input file"

I am not familiar with CCTM, so these are just guesses.

It doesn't look like an MPI problem, though.

You may want to check any other logs that the CCTM code may
produce, for any clue on where it fails.
Otherwise, you could compile with -g -traceback (and remove any
optimization options in FFLAGS, FCFLAGS, CFLAGS, etc.)
It may also have a -DDEBUG or similar that can be turned on
in the CPPFLAGS, which in many models makes a more verbose log.
This *may* tell you where it fails (source file, subroutine and line),
and may help understand why it fails.
If it dumps a core file, you can trace the failure point with
a debugger.
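
For instance, something along these lines (a sketch only, assuming the
Intel compilers; the variable names are illustrative and should be
matched to the actual CCTM build scripts):

   # debug build: symbols and runtime tracebacks, optimization off,
   # plus whatever verbose-logging macro the model provides
   FFLAGS="-g -traceback -O0"
   CPPFLAGS="-DDEBUG"

   # if a core file is dumped, a debugger shows the failure point:
   gdb ./CCTM_V5g_Linux2_x core     # core file name is system-dependent
   (gdb) bt                         # backtrace: image, routine, line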

I hope this helps,
Gus

On 05/21/2014 03:20 PM, Ben Lash wrote:

I used a different build of netcdf 4.1.3, and the code seems to run now.
I have a totally different, non-mpi related error in part of it, but
there's no way for the list to help; I mostly just wanted to report that
this particular problem seems to be solved for the record. It doesn't
seem to fail quite as gracefully anymore, but I'm still getting enough
of the error messages to know what's going on.

MPI_ABORT was invoked on rank 52 in communicator MPI_COMM_WORLD
with errorcode 0.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--
[cn-099.davinci.rice.edu:26185 ]
[[63355,0],4]-[[63355,1],52] mca_oob_tcp_msg_recv: readv failed:
Connection reset by peer (104)
[cn-099.davinci.rice.edu:26185 ]
[[63355,0],4]-[[63355,1],54] mca_oob_tcp_msg_recv: readv failed:
Connection reset by peer (104)
[cn-099.davinci.rice.edu:26185 ]
[[63355,0],4]-[[63355,1],55] mca_oob_tcp_msg_recv: readv failed:
Connection reset by peer (104)
[cn-158.davinci.rice.edu:12459 ]
[[63355,0],1]-[[63355,1],15] mca_oob_tcp_msg_recv: readv failed:
Connection reset by peer (104)
[cn-158.davinci.rice.edu:12459 ]
[[63355,0],1]-[[63355,1],17] mca_oob_tcp_msg_recv: readv failed:
Connection reset by peer (104)
[cn-099.davinci.rice.edu:26185 ]
[[63355,0],4]-[[63355,1],56] mca_oob_tcp_msg_recv: readv failed:
Connection reset by peer (104)
[cn-099.davinci.rice.edu:26185 ]
[[63355,0],4]-[[63355,1],53] mca_oob_tcp_msg_recv: readv failed:
Connection reset by peer (104)
[cn-099.davinci.rice.edu:26185 ]
[[63355,0],4]-[[63355,1],51] mca_oob_tcp_msg_recv: readv failed:
Connection reset by peer (104)
[cn-099.davinci.rice.edu:26185 ]
[[63355,0],4]-[[63355,1],57] mca_oob_tcp_msg_recv: readv failed:
Connection reset by peer (104)
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine   Line     Source



[cn-158.davinci.rice.edu:12459 ]
[[63355,0],1]-[[63355,1],16] mca_oob_tcp_msg_recv: readv failed:
Connection reset by peer (104)
--
mpiexec has exited due to process rank 49 with PID 26187 on
node cn-099 exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpiexec (as reported here).
--
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine   Line     Source
CCTM_V5g_Linux2_x  007FEA29  Unknown   Unknown  Unknown
CCTM_V5g_Linux2_x  007FD3A0  Unknown   Unknown  Unknown
CCTM_V5g_Linux2_x  007BA9A2  Unknown   Unknown  Unknown
CCTM_V5g_Linux2_x  00759288  Unknown   Unknown  Unknown

...



On Wed, May 21, 2014 at 2:08 PM, Gus Correa > wrote:

Hi Ben

My guess is that your 

Re: [OMPI users] openmpi configuration error?

2014-05-21 Thread Ben Lash
I used a different build of netcdf 4.1.3, and the code seems to run now. I
have a totally different, non-mpi related error in part of it, but there's
no way for the list to help; I mostly just wanted to report that this
particular problem seems to be solved for the record. It doesn't seem to
fail quite as gracefully anymore, but I'm still getting enough of the error
messages to know what's going on.

MPI_ABORT was invoked on rank 52 in communicator MPI_COMM_WORLD
with errorcode 0.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--
[cn-099.davinci.rice.edu:26185] [[63355,0],4]-[[63355,1],52]
mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[cn-099.davinci.rice.edu:26185] [[63355,0],4]-[[63355,1],54]
mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[cn-099.davinci.rice.edu:26185] [[63355,0],4]-[[63355,1],55]
mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[cn-158.davinci.rice.edu:12459] [[63355,0],1]-[[63355,1],15]
mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[cn-158.davinci.rice.edu:12459] [[63355,0],1]-[[63355,1],17]
mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[cn-099.davinci.rice.edu:26185] [[63355,0],4]-[[63355,1],56]
mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[cn-099.davinci.rice.edu:26185] [[63355,0],4]-[[63355,1],53]
mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[cn-099.davinci.rice.edu:26185] [[63355,0],4]-[[63355,1],51]
mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[cn-099.davinci.rice.edu:26185] [[63355,0],4]-[[63355,1],57]
mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine   Line     Source



[cn-158.davinci.rice.edu:12459] [[63355,0],1]-[[63355,1],16]
mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
--
mpiexec has exited due to process rank 49 with PID 26187 on
node cn-099 exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpiexec (as reported here).
--
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine   Line     Source
CCTM_V5g_Linux2_x  007FEA29  Unknown   Unknown  Unknown
CCTM_V5g_Linux2_x  007FD3A0  Unknown   Unknown  Unknown
CCTM_V5g_Linux2_x  007BA9A2  Unknown   Unknown  Unknown
CCTM_V5g_Linux2_x  00759288  Unknown   Unknown  Unknown

...
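
For reference, the "init"/"finalize" rules in the mpiexec message are just
the standard MPI lifecycle; a minimal Fortran sketch (program and variable
names are illustrative, not from CCTM):

   program lifecycle
     implicit none
     include 'mpif.h'
     integer :: ierr, rank
     call MPI_Init(ierr)                 ! rule 1: init before any other MPI call
     call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
     ! ... application work; on a fatal error the code may call
     ! MPI_Abort(MPI_COMM_WORLD, errorcode, ierr), which is what
     ! produced the MPI_ABORT report above
     call MPI_Finalize(ierr)             ! rule 2: finalize before exiting
   end program lifecycle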



On Wed, May 21, 2014 at 2:08 PM, Gus Correa  wrote:

> Hi Ben
>
> My guess is that your sys admins may have built NetCDF
> with parallel support, pnetcdf, and the latter with OpenMPI,
> which could explain the dependency.
> Ideally, they should have built it again with the latest default OpenMPI
> (1.6.5?)
>
> Check if there is a NetCDF module that either doesn't have any
> dependence on MPI, or depends on the current Open MPI that
> you are using (1.6.5 I think).
> A  'module show netcdf/bla/bla'
> on the available netcdf modules will tell.
>
> If the application code is old as you said, it probably doesn't use
> any pnetcdf. In addition, it should work even with NetCDF 3.X.Y,
> which probably doesn't have any pnetcdf built in.
> Newer netcdf (4.Z.W > 4.1.3) should also work, and in this case
> pick one that requires the default OpenMPI, if available.
>
> Just out of curiosity, besides netcdf/4.1.3, did you load openmpi/1.6.5?
> Somehow the openmpi/1.6.5 should have been marked
> to conflict with 1.4.4.
> Is it?
> Anyway, you may want to do a 'which mpiexec' to see which one is
> taking precedence in your environment (1.6.5 or 1.4.4)
> Probably 1.6.5.
>
> Does the code work now, or does it continue to fail?
>
>
> I hope this helps,
> Gus Correa
>
>
>
> On 05/21/2014 02:36 PM, Ben Lash wrote:
>
>> Yep, there it is.
>>
>> [bl10@login2 USlogsminus10]$ module show netcdf/4.1.3
>> ---
>> /opt/apps/modulefiles/netcdf/4.1.3:
>>
>> module   load 

Re: [OMPI users] openmpi configuration error?

2014-05-21 Thread Gus Correa

Hi Ben

My guess is that your sys admins may have built NetCDF
with parallel support, pnetcdf, and the latter with OpenMPI,
which could explain the dependency.
Ideally, they should have built it again with the latest default OpenMPI 
(1.6.5?)


Check if there is a NetCDF module that either doesn't have any
dependence on MPI, or depends on the current Open MPI that
you are using (1.6.5 I think).
A  'module show netcdf/bla/bla'
on the available netcdf modules will tell.

If the application code is old as you said, it probably doesn't use
any pnetcdf. In addition, it should work even with NetCDF 3.X.Y,
which probably doesn't have any pnetcdf built in.
Newer netcdf (4.Z.W > 4.1.3) should also work, and in this case
pick one that requires the default OpenMPI, if available.

Just out of curiosity, besides netcdf/4.1.3, did you load openmpi/1.6.5?
Somehow the openmpi/1.6.5 should have been marked
to conflict with 1.4.4.
Is it?
Anyway, you may want to do a 'which mpiexec' to see which one is
taking precedence in your environment (1.6.5 or 1.4.4)
Probably 1.6.5.
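
For example (paths and output here are illustrative):

   $ which mpiexec
   /opt/apps/openmpi/1.6.5-intel/bin/mpiexec
   $ mpirun --version
   mpirun (Open MPI) 1.6.5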

Does the code work now, or does it continue to fail?

I hope this helps,
Gus Correa



On 05/21/2014 02:36 PM, Ben Lash wrote:

Yep, there it is.

[bl10@login2 USlogsminus10]$ module show netcdf/4.1.3
---
/opt/apps/modulefiles/netcdf/4.1.3:

module   load openmpi/1.4.4-intel
prepend-path PATH
/opt/apps/netcdf/4.1.3/bin:/opt/apps/netcdf/4.1.3/deps/hdf5/1.8.7/bin
prepend-path LD_LIBRARY_PATH
/opt/apps/netcdf/4.1.3/lib:/opt/apps/netcdf/4.1.3/deps/hdf5/1.8.7/lib:/opt/apps/netcdf/4.1.3/deps/szip/2.1/lib

prepend-path MANPATH /opt/apps/netcdf/4.1.3/share/man
---



On Wed, May 21, 2014 at 1:34 PM, Douglas L Reeder > wrote:

Ben,

The netcdf/4.1.3 module may be loading the openmpi/1.4.4 module. Can
you do a 'module show' on the netcdf module file to see if there is a
'module load openmpi' command?

Doug Reeder

On May 21, 2014, at 12:23 PM, Ben Lash > wrote:


I just wanted to follow up for anyone else who got a similar
problem - module load netcdf/4.1.3 *also* loaded openmpi/1.4.4.
 Don't ask me why. My code doesn't seem to fail as
gracefully but otherwise works now. Thanks.


On Sat, May 17, 2014 at 6:02 AM, Jeff Squyres (jsquyres)
> wrote:

Ditto -- Lmod looks pretty cool.  Thanks for the heads up.


On May 16, 2014, at 6:23 PM, Douglas L Reeder
> wrote:

> Maxime,
>
> I was unaware of Lmod. Thanks for bringing it to my attention.
>
> Doug
> On May 16, 2014, at 4:07 PM, Maxime Boissonneault
> wrote:
>
>> Instead of using the outdated and not maintained Module
environment, why not use Lmod :
https://www.tacc.utexas.edu/tacc-projects/lmod
>>
>> It is a drop-in replacement for Module environment that
supports all of their features and much, much more, such as :
>> - module hierarchies
>> - module properties and color highlighting (we use it to
highlight bioinformatic modules or tools for example)
>> - module caching (very useful for a parallel filesystem
with tons of modules)
>> - path priorities (useful to make sure personal modules
take precedence over system modules)
>> - export module tree to json
>>
>> It works like a charm, understands both TCL and Lua modules
and is actively developed and debugged. There are literally
new features every month or so. If it does not do what you
want, odds are that the developer will add it shortly (I've
had it happen).
>>
>> Maxime
>>
>> Le 2014-05-16 17:58, Douglas L Reeder a écrit :
>>> Ben,
>>>
>>> You might want to use module (source forge) to manage
paths to different mpi implementations. It is fairly easy to
set up and very robust for this type of problem. You would
remove contentious application paths from your standard PATH
and then use module to switch them in and out as needed.
>>>
>>> Doug Reeder
>>> On May 16, 2014, at 3:39 PM, Ben Lash > wrote:
>>>
 My cluster has just upgraded to a new version of MPI, and
I'm using an old one. It seems that I'm having trouble
compiling due to the compiler wrapper file moving (full error
here: http://pastebin.com/EmwRvCd9)
 "Cannot open 

Re: [OMPI users] openmpi configuration error?

2014-05-21 Thread Ben Lash
Yep, there it is.

[bl10@login2 USlogsminus10]$ module show netcdf/4.1.3
---
/opt/apps/modulefiles/netcdf/4.1.3:

module   load openmpi/1.4.4-intel
prepend-path PATH
/opt/apps/netcdf/4.1.3/bin:/opt/apps/netcdf/4.1.3/deps/hdf5/1.8.7/bin
prepend-path LD_LIBRARY_PATH
/opt/apps/netcdf/4.1.3/lib:/opt/apps/netcdf/4.1.3/deps/hdf5/1.8.7/lib:/opt/apps/netcdf/4.1.3/deps/szip/2.1/lib
prepend-path MANPATH /opt/apps/netcdf/4.1.3/share/man
---



On Wed, May 21, 2014 at 1:34 PM, Douglas L Reeder wrote:

> Ben,
>
> The netcdf/4.1.3 module may be loading the openmpi/1.4.4 module. Can you do
> a 'module show' on the netcdf module file to see if there is a 'module load
> openmpi' command?
>
> Doug Reeder
>
> On May 21, 2014, at 12:23 PM, Ben Lash  wrote:
>
> I just wanted to follow up for anyone else who got a similar problem -
> module load netcdf/4.1.3 *also* loaded openmpi/1.4.4. Don't ask me why.
> My code doesn't seem to fail as gracefully but otherwise works now. Thanks.
>
>
> On Sat, May 17, 2014 at 6:02 AM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
>
>> Ditto -- Lmod looks pretty cool.  Thanks for the heads up.
>>
>>
>> On May 16, 2014, at 6:23 PM, Douglas L Reeder 
>> wrote:
>>
>> > Maxime,
>> >
>> > I was unaware of Lmod. Thanks for bringing it to my attention.
>> >
>> > Doug
>> > On May 16, 2014, at 4:07 PM, Maxime Boissonneault <
>> maxime.boissonnea...@calculquebec.ca> wrote:
>> >
>> >> Instead of using the outdated and not maintained Module environment,
>> why not use Lmod : https://www.tacc.utexas.edu/tacc-projects/lmod
>> >>
>> >> It is a drop-in replacement for Module environment that supports all
>> of their features and much, much more, such as :
>> >> - module hierarchies
>> >> - module properties and color highlighting (we use it to highlight
>> bioinformatic modules or tools for example)
>> >> - module caching (very useful for a parallel filesystem with tons of
>> modules)
>> >> - path priorities (useful to make sure personal modules take
>> precedence over system modules)
>> >> - export module tree to json
>> >>
>> >> It works like a charm, understands both TCL and Lua modules and is
>> actively developed and debugged. There are literally new features every
>> month or so. If it does not do what you want, odds are that the developer
>> will add it shortly (I've had it happen).
>> >>
>> >> Maxime
>> >>
>> >> Le 2014-05-16 17:58, Douglas L Reeder a écrit :
>> >>> Ben,
>> >>>
>> >>> You might want to use module (source forge) to manage paths to
>> different mpi implementations. It is fairly easy to set up and very robust
>> for this type of problem. You would remove contentious application paths
>> from your standard PATH and then use module to switch them in and out as
>> needed.
>> >>>
>> >>> Doug Reeder
>> >>> On May 16, 2014, at 3:39 PM, Ben Lash  wrote:
>> >>>
>>  My cluster has just upgraded to a new version of MPI, and I'm using
>> an old one. It seems that I'm having trouble compiling due to the compiler
>> wrapper file moving (full error here: http://pastebin.com/EmwRvCd9)
>>  "Cannot open configuration file
>> /opt/apps/openmpi/1.4.4-intel/share/openmpi/mpif90-wrapper-data.txt"
>> 
>>  I've found the file on the cluster at
>>  /opt/apps/openmpi/retired/1.4.4-intel/share/openmpi/mpif90-wrapper-data.txt
>>  How do I tell the old mpi wrapper where this file is?
>>  I've already corrected one link to mpich ->
>> /opt/apps/openmpi/retired/1.4.4-intel/, which is in the lib folder of the
>> software I'm trying to recompile (/home/bl10/CMAQv5.0.1/lib/x86_64/ifort). Thanks
>> for any ideas. I also tried changing $pkgdatadir based on what I read here:
>> 
>> http://www.open-mpi.org/faq/?category=mpi-apps#default-wrapper-compiler-flags
>> 
>>  Thanks.
>> 
>>  --Ben L
>>  ___
>>  users mailing list
>>  us...@open-mpi.org
>>  http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >>>
>> >>>
>> >>>
>> >>> ___
>> >>> users mailing list
>> >>>
>> >>> us...@open-mpi.org
>> >>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >>
>> >>
>> >> --
>> >> -
>> >> Maxime Boissonneault
>> >> Analyste de calcul - Calcul Québec, Université Laval
>> >> Ph. D. en physique
>> >>
>> >> ___
>> >> users mailing list
>> >> us...@open-mpi.org
>> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >
>> > ___
>> > users mailing list
>> > us...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> 

Re: [OMPI users] openmpi configuration error?

2014-05-21 Thread Douglas L Reeder
Ben,

The netcdf/4.1.3 module may be loading the openmpi/1.4.4 module. Can you do
a 'module show' on the netcdf module file to see if there is a 'module load
openmpi' command?

Doug Reeder 
On May 21, 2014, at 12:23 PM, Ben Lash  wrote:

> I just wanted to follow up for anyone else who got a similar problem - module 
> load netcdf/4.1.3 *also* loaded openmpi/1.4.4. Don't ask me why. My code 
> doesn't seem to fail as gracefully but otherwise works now. Thanks. 
> 
> 
> On Sat, May 17, 2014 at 6:02 AM, Jeff Squyres (jsquyres)  
> wrote:
> Ditto -- Lmod looks pretty cool.  Thanks for the heads up.
> 
> 
> On May 16, 2014, at 6:23 PM, Douglas L Reeder  wrote:
> 
> > Maxime,
> >
> > I was unaware of Lmod. Thanks for bringing it to my attention.
> >
> > Doug
> > On May 16, 2014, at 4:07 PM, Maxime Boissonneault 
> >  wrote:
> >
> >> Instead of using the outdated and not maintained Module environment, why 
> >> not use Lmod : https://www.tacc.utexas.edu/tacc-projects/lmod
> >>
> >> It is a drop-in replacement for Module environment that supports all of 
> >> their features and much, much more, such as :
> >> - module hierarchies
> >> - module properties and color highlighting (we use it to highlight 
> >> bioinformatic modules or tools for example)
> >> - module caching (very useful for a parallel filesystem with tons of 
> >> modules)
> >> - path priorities (useful to make sure personal modules take precedence 
> >> over system modules)
> >> - export module tree to json
> >>
> >> It works like a charm, understands both TCL and Lua modules and is actively 
> >> developed and debugged. There are literally new features every month or 
> >> so. If it does not do what you want, odds are that the developer will add 
> >> it shortly (I've had it happen).
> >>
> >> Maxime
> >>
> >> Le 2014-05-16 17:58, Douglas L Reeder a écrit :
> >>> Ben,
> >>>
> >>> You might want to use module (source forge) to manage paths to different 
> >>> mpi implementations. It is fairly easy to set up and very robust for this 
> >>> type of problem. You would remove contentious application paths from your 
> >>> standard PATH and then use module to switch them in and out as needed.
> >>>
> >>> Doug Reeder
> >>> On May 16, 2014, at 3:39 PM, Ben Lash  wrote:
> >>>
>  My cluster has just upgraded to a new version of MPI, and I'm using an 
>  old one. It seems that I'm having trouble compiling due to the compiler 
>  wrapper file moving (full error here: http://pastebin.com/EmwRvCd9)
>  "Cannot open configuration file 
>  /opt/apps/openmpi/1.4.4-intel/share/openmpi/mpif90-wrapper-data.txt"
> 
>  I've found the file on the cluster at  
>  /opt/apps/openmpi/retired/1.4.4-intel/share/openmpi/mpif90-wrapper-data.txt
>  How do I tell the old mpi wrapper where this file is?
>  I've already corrected one link to mpich -> 
>  /opt/apps/openmpi/retired/1.4.4-intel/, which is in the lib folder 
>  of the software I'm trying to recompile 
>  (/home/bl10/CMAQv5.0.1/lib/x86_64/ifort). Thanks for any ideas. I also 
>  tried changing $pkgdatadir based on what I read here:
>  http://www.open-mpi.org/faq/?category=mpi-apps#default-wrapper-compiler-flags
> 
>  Thanks.
> 
>  --Ben L
>  ___
>  users mailing list
>  us...@open-mpi.org
>  http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>>
> >>>
> >>>
> >>> ___
> >>> users mailing list
> >>>
> >>> us...@open-mpi.org
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>
> >>
> >> --
> >> -
> >> Maxime Boissonneault
> >> Analyste de calcul - Calcul Québec, Université Laval
> >> Ph. D. en physique
> >>
> >> ___
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> 
> 
> -- 
> --Ben L
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] openmpi configuration error?

2014-05-21 Thread Ben Lash
I just wanted to follow up for anyone else who got a similar problem -
module load netcdf/4.1.3 *also* loaded openmpi/1.4.4. Don't ask me why. My
code doesn't seem to fail as gracefully but otherwise works now. Thanks.
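
(For the archives: one possible workaround, assuming the module names
above, is to let the netcdf modulefile pull in the old MPI and then swap
it out:

   module load netcdf/4.1.3
   module swap openmpi/1.4.4-intel openmpi/1.6.5-intel

whether the swap sticks depends on how the modulefile is written.)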


On Sat, May 17, 2014 at 6:02 AM, Jeff Squyres (jsquyres)  wrote:

> Ditto -- Lmod looks pretty cool.  Thanks for the heads up.
>
>
> On May 16, 2014, at 6:23 PM, Douglas L Reeder 
> wrote:
>
> > Maxime,
> >
> > I was unaware of Lmod. Thanks for bringing it to my attention.
> >
> > Doug
> > On May 16, 2014, at 4:07 PM, Maxime Boissonneault <
> maxime.boissonnea...@calculquebec.ca> wrote:
> >
> >> Instead of using the outdated and not maintained Module environment,
> why not use Lmod : https://www.tacc.utexas.edu/tacc-projects/lmod
> >>
> >> It is a drop-in replacement for Module environment that supports all of
> their features and much, much more, such as :
> >> - module hierarchies
> >> - module properties and color highlighting (we use it to highlight
> bioinformatic modules or tools for example)
> >> - module caching (very useful for a parallel filesystem with tons of
> modules)
> >> - path priorities (useful to make sure personal modules take
> precedence over system modules)
> >> - export module tree to json
> >>
> >> It works like a charm, understands both TCL and Lua modules and is
> actively developed and debugged. There are literally new features every
> month or so. If it does not do what you want, odds are that the developer
> will add it shortly (I've had it happen).
> >>
> >> Maxime
> >>
> >> Le 2014-05-16 17:58, Douglas L Reeder a écrit :
> >>> Ben,
> >>>
> >>> You might want to use module (source forge) to manage paths to
> different mpi implementations. It is fairly easy to set up and very robust
> for this type of problem. You would remove contentious application paths
> from your standard PATH and then use module to switch them in and out as
> needed.
> >>>
> >>> Doug Reeder
> >>> On May 16, 2014, at 3:39 PM, Ben Lash  wrote:
> >>>
>  My cluster has just upgraded to a new version of MPI, and I'm using
> an old one. It seems that I'm having trouble compiling due to the compiler
> wrapper file moving (full error here: http://pastebin.com/EmwRvCd9)
>  "Cannot open configuration file
> /opt/apps/openmpi/1.4.4-intel/share/openmpi/mpif90-wrapper-data.txt"
> 
>  I've found the file on the cluster at
>  /opt/apps/openmpi/retired/1.4.4-intel/share/openmpi/mpif90-wrapper-data.txt
>  How do I tell the old mpi wrapper where this file is?
>  I've already corrected one link to mpich ->
> /opt/apps/openmpi/retired/1.4.4-intel/, which is in the lib folder of the
> software I'm trying to recompile (/home/bl10/CMAQv5.0.1/lib/x86_64/ifort). Thanks
> for any ideas. I also tried changing $pkgdatadir based on what I read here:
> 
> http://www.open-mpi.org/faq/?category=mpi-apps#default-wrapper-compiler-flags
> 
>  Thanks.
> 
>  --Ben L
>  ___
>  users mailing list
>  us...@open-mpi.org
>  http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>>
> >>>
> >>>
> >>> ___
> >>> users mailing list
> >>>
> >>> us...@open-mpi.org
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>
> >>
> >> --
> >> -
> >> Maxime Boissonneault
> >> Analyste de calcul - Calcul Québec, Université Laval
> >> Ph. D. en physique
> >>
> >> ___
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>


-- 
--Ben L


Re: [OMPI users] openmpi configuration error?

2014-05-17 Thread Jeff Squyres (jsquyres)
Ditto -- Lmod looks pretty cool.  Thanks for the heads up.


On May 16, 2014, at 6:23 PM, Douglas L Reeder  wrote:

> Maxime,
> 
> I was unaware of Lmod. Thanks for bringing it to my attention.
> 
> Doug
> On May 16, 2014, at 4:07 PM, Maxime Boissonneault 
>  wrote:
> 
>> Instead of using the outdated and not maintained Module environment, why not 
>> use Lmod : https://www.tacc.utexas.edu/tacc-projects/lmod
>> 
>> It is a drop-in replacement for Module environment that supports all of 
>> their features and much, much more, such as : 
>> - module hierarchies
>> - module properties and color highlighting (we use it to highlight 
>> bioinformatic modules or tools for example)
>> - module caching (very useful for a parallel filesystem with tons of modules)
>> - path priorities (useful to make sure personal modules take precedence 
>> over system modules)
>> - export module tree to json
>> 
>> It works like a charm, understands both TCL and Lua modules and is actively 
>> developed and debugged. There are literally new features every month or so. 
>> If it does not do what you want, odds are that the developer will add it 
>> shortly (I've had it happen). 
>> 
>> Maxime
>> 
>> Le 2014-05-16 17:58, Douglas L Reeder a écrit :
>>> Ben,
>>> 
>>> You might want to use module (source forge) to manage paths to different 
>>> mpi implementations. It is fairly easy to set up and very robust for this 
>>> type of problem. You would remove contentious application paths from your 
>>> standard PATH and then use module to switch them in and out as needed.
>>> 
>>> Doug Reeder
>>> On May 16, 2014, at 3:39 PM, Ben Lash  wrote:
>>> 
 My cluster has just upgraded to a new version of MPI, and I'm using an old 
 one. It seems that I'm having trouble compiling due to the compiler 
 wrapper file moving (full error here: http://pastebin.com/EmwRvCd9)
 "Cannot open configuration file 
 /opt/apps/openmpi/1.4.4-intel/share/openmpi/mpif90-wrapper-data.txt"
 
 I've found the file on the cluster at  
 /opt/apps/openmpi/retired/1.4.4-intel/share/openmpi/mpif90-wrapper-data.txt
 How do I tell the old mpi wrapper where this file is?
 I've already corrected one link to mpich -> 
>  /opt/apps/openmpi/retired/1.4.4-intel/, which is in the lib folder of 
>  the software I'm trying to recompile (/home/bl10/CMAQv5.0.1/lib/x86_64/ifort). 
 Thanks for any ideas. I also tried changing $pkgdatadir based on what I 
 read here: 
 http://www.open-mpi.org/faq/?category=mpi-apps#default-wrapper-compiler-flags
 
 Thanks. 
 
 --Ben L
 ___
 users mailing list
 us...@open-mpi.org
 http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> 
>>> 
>>> 
>>> ___
>>> users mailing list
>>> 
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> 
>> -- 
>> -
>> Maxime Boissonneault
>> Analyste de calcul - Calcul Québec, Université Laval
>> Ph. D. en physique
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] openmpi configuration error?

2014-05-16 Thread Gus Correa

On 05/16/2014 07:09 PM, Ben Lash wrote:

The $PATH and $LD_LIBRARY_PATH seem to be correct, as does module list.
I will try to hear back from our particular cluster people, otherwise I
will try using the latest version. This is old government software;
significant parts are written in fortran77, for example, and typically
upgrading to a new version breaks it. It was looking for mpich, hence
the link, but a long time ago I gave it openmpi instead as recommended
and that worked, so I suppose it's less persnickety about the mpi
version than some other things. The most current version installed
is openmpi/1.6.5-intel(default). Thanks again.



We have code here that has been recompiled (some with modifications, 
some not) with OpenMPI since 1.2.8 with no problems.
MPI is a standard; both OpenMPI and MPICH follow it (except 
perhaps for very dusty corners or the latest and greatest MPI 3 features).

If your code compiled with 1.4.4, it should (better) do with 1.6.5.
Fortran77 shouldn't be an issue.

I agree, the PATH and LD_LIBRARY_PATH point to the "retired" directory.
Many things may have happened, though: say, the "retired" directory may 
not be complete, or may not have been installed on all cluster nodes,
or (if not really re-installed) probably points to the original 
(pre-retirement) directories that no longer exist.

Rather than sorting this out, I think you have a better shot using
Open MPI 1.6.5.
Just load the module and try to recompile the code.
(Probably just
module swap openmpi/1.4.4-intel openmpi/1.6.5-intel)

You may need to tweak the Makefile if it hardwires
the MPI wrappers/binary location, or the library and include paths.
Some do, some don't.
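
A sketch of the kind of tweak meant here (the Makefile variable names are
illustrative, not taken from the CCTM build):

   # before: wrapper location hardwired to the retired tree
   # FC = /opt/apps/openmpi/1.4.4-intel/bin/mpif90
   # after: rely on whatever wrapper the loaded module puts in $PATH
   FC = mpif90
   # if explicit paths are unavoidable, the wrapper can report its own:
   #   mpif90 --showme:compile
   #   mpif90 --showme:link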

Gus Correa



[bl10@login2 ~]$ echo $PATH
/home/bl10/rlib/deps/bin:/opt/apps/netcdf/4.1.3/bin:/opt/apps/netcdf/4.1.3/deps/hdf5/1.8.7/bin:/opt/apps/openmpi/retired/1.4.4-intel/bin:/opt/apps/pgi/11.7/linux86-64/11.7/bin:/opt/apps/python3/3.2.1/bin:/opt/apps/intel/2013.1.039/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/ibutils/bin:/opt/apps/moab/current/bin:/projects/dsc1/apps/cmaq/deps/ioapi-kiran/3.1/bin:/home/bl10/bin

[bl10@login2 ~]$ echo $LD_LIBRARY_PATH
/home/bl10/rlib/deps/lib:/projects/dsc1/apps/cmaq/deps/netcdf/4.1.3-intel/lib:/opt/apps/netcdf/4.1.3/lib:/opt/apps/netcdf/4.1.3/deps/hdf5/1.8.7/lib:/opt/apps/netcdf/4.1.3/deps/szip/2.1/lib:/opt/apps/openmpi/retired/1.4.4-intel/lib:/opt/apps/intel/2011.0.013/mkl/lib/intel64:/opt/apps/intel/2013.1.039/mkl/lib/intel64:/opt/apps/intel/2013.1.039/lib/intel64

[bl10@login2 ~]$ module list
Currently Loaded Modulefiles:
   1) intel/2013.1.039  2) python3/3.2.1 3) pgi/11.7
  4) openmpi/1.4.4-intel   5) netcdf/4.1.3
[bl10@login2 ~]$




On Fri, May 16, 2014 at 5:46 PM, Gus Correa > wrote:

On 05/16/2014 06:26 PM, Ben Lash wrote:

I'm not sure I have the ability to implement a different module
management system; I am using a university cluster. We have a module
system, and I am beginning to suspect that maybe it wasn't updated
during the upgrade. I have
module list
..other modules    openmpi/1.4.4
Perhaps this is still trying to go to the old source location?
How would
I check? Is there an easy way around it if it is wrong? Thanks
again!


Most likely the module openmpi/1.4.4 is out of date.
You can check it with
echo $PATH
If it doesn't point to the "retired" directory, then it is probably
out of date.

Why don't you try to recompile the code
with the current Open MPI installed in the cluster?

module avail
will show everything, and you can pick the latest, load it,
and try to recompile the program with that.

Gus Correa


On Fri, May 16, 2014 at 5:07 PM, Maxime Boissonneault

>> wrote:

 Instead of using the outdated and not maintained Module
environment,
 why not use Lmod :
https://www.tacc.utexas.edu/tacc-projects/lmod


 It is a drop-in replacement for Module environment that
supports all
 of their features and much, much more, such as :
 - module hierarchies
 - module properties and color highlighting (we use it to
highlight
 bioinformatic modules or tools for example)
 - module caching (very useful for a parallel filesystem
with tons of
 modules)
 - path priorities (useful to make sure personal modules take
precedence over system modules)
 - export module tree to json

It works like a charm, understands both TCL and 

Re: [OMPI users] openmpi configuration error?

2014-05-16 Thread Ben Lash
The $PATH and $LD_LIBRARY_PATH seem to be correct, as does module list. I
will try to hear back from our particular cluster people, otherwise I will
try using the latest version. This is old government software; significant
parts are written in fortran77, for example, and typically upgrading to a new
version breaks it. It was looking for mpich, hence the link, but a long
time ago I gave it openmpi instead as recommended and that worked, so I
suppose it's less persnickety about the mpi version than some other things.
The most current version installed is openmpi/1.6.5-intel(default). Thanks
again.

[bl10@login2 ~]$ echo $PATH
/home/bl10/rlib/deps/bin:/opt/apps/netcdf/4.1.3/bin:/opt/apps/netcdf/4.1.3/deps/hdf5/1.8.7/bin:/opt/apps/openmpi/retired/1.4.4-intel/bin:/opt/apps/pgi/11.7/linux86-64/11.7/bin:/opt/apps/python3/3.2.1/bin:/opt/apps/intel/2013.1.039/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/ibutils/bin:/opt/apps/moab/current/bin:/projects/dsc1/apps/cmaq/deps/ioapi-kiran/3.1/bin:/home/bl10/bin

[bl10@login2 ~]$ echo $LD_LIBRARY_PATH
/home/bl10/rlib/deps/lib:/projects/dsc1/apps/cmaq/deps/netcdf/4.1.3-intel/lib:/opt/apps/netcdf/4.1.3/lib:/opt/apps/netcdf/4.1.3/deps/hdf5/1.8.7/lib:/opt/apps/netcdf/4.1.3/deps/szip/2.1/lib:/opt/apps/openmpi/retired/1.4.4-intel/lib:/opt/apps/intel/2011.0.013/mkl/lib/intel64:/opt/apps/intel/2013.1.039/mkl/lib/intel64:/opt/apps/intel/2013.1.039/lib/intel64

[bl10@login2 ~]$ module list
Currently Loaded Modulefiles:
  1) intel/2013.1.039  2) python3/3.2.1 3) pgi/11.7
 4) openmpi/1.4.4-intel   5) netcdf/4.1.3
[bl10@login2 ~]$




On Fri, May 16, 2014 at 5:46 PM, Gus Correa  wrote:

> On 05/16/2014 06:26 PM, Ben Lash wrote:
>
>> I'm not sure I have the ability to implement a different module
>> management system; I am using a university cluster. We have a module
>> system, and I am beginning to suspect that maybe it wasn't updated
>> during the upgrade. I have
>> module list
>> ..other modules    openmpi/1.4.4
>> Perhaps this is still trying to go to the old source location? How would
>> I check? Is there an easy way around it if it is wrong? Thanks again!
>>
>>
> Most likely the module openmpi/1.4.4 is out of date.
> You can check it with
> echo $PATH
> If it doesn't point to the "retired" directory, then it is probably out of
> date.
>
> Why don't you try to recompile the code
> with the current Open MPI installed in the cluster?
>
> module avail
> will show everything, and you can pick the latest, load it,
> and try to recompile the program with that.
>
> Gus Correa
>
>
>> On Fri, May 16, 2014 at 5:07 PM, Maxime Boissonneault
>> > > wrote:
>>
>> Instead of using the outdated and not maintained Module environment,
>> why not use Lmod : https://www.tacc.utexas.edu/tacc-projects/lmod
>>
>> It is a drop-in replacement for Module environment that supports all
>> of their features and much, much more, such as :
>> - module hierarchies
>> >> - module properties and color highlighting (we use it to highlight
>> bioinformatic modules or tools for example)
>> - module caching (very useful for a parallel filesystem with tons of
>> modules)
>> - path priorities (useful to make sure personal modules take
>> precedence over system modules)
>> - export module tree to json
>>
>> It works like a charm, understands both TCL and Lua modules and is
>> actively developed and debugged. There are literally new features
>> every month or so. If it does not do what you want, odds are that
>> the developer will add it shortly (I've had it happen).
>>
>> Maxime
>>
>> Le 2014-05-16 17:58, Douglas L Reeder a écrit :
>>
>>> Ben,
>>>
>>> You might want to use module (source forge) to manage paths to
>>> different mpi implementations. It is fairly easy to set up and
>>> very robust for this type of problem. You would remove contentious
>>> application paths from your standard PATH and then use module to
>>> switch them in and out as needed.
>>>
>>> Doug Reeder
>>> On May 16, 2014, at 3:39 PM, Ben Lash >> > wrote:
>>>
>>>  My cluster has just upgraded to a new version of MPI, and I'm
 using an old one. It seems that I'm having trouble compiling due
 to the compiler wrapper file moving (full error here:
 http://pastebin.com/EmwRvCd9)
 "Cannot open configuration file
 /opt/apps/openmpi/1.4.4-intel/share/openmpi/mpif90-wrapper-data.txt"

 I've found the file on the cluster at
 /opt/apps/openmpi/retired/1.4.4-intel/share/openmpi/mpif90-wrapper-data.txt
 How do I tell the old mpi wrapper where this file is?
 I've already corrected one link to mpich ->
 /opt/apps/openmpi/retired/1.4.4-intel/, which is in the software
 I'm trying 

Re: [OMPI users] openmpi configuration error?

2014-05-16 Thread Gus Correa

On 05/16/2014 06:26 PM, Ben Lash wrote:

I'm not sure I have the ability to implement a different module
management system; I am using a university cluster. We have a module
system, and I am beginning to suspect that maybe it wasn't updated
during the upgrade. I have
module list
..other modules    openmpi/1.4.4
Perhaps this is still trying to go to the old source location? How would
I check? Is there an easy way around it if it is wrong? Thanks again!



Most likely the module openmpi/1.4.4 is out of date.
You can check it with
echo $PATH
If it doesn't point to the "retired" directory, then it is probably out 
of date.


Why don't you try to recompile the code
with the current Open MPI installed in the cluster?

module avail
will show everything, and you can pick the latest, load it,
and try to recompile the program with that.
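
For example (module names illustrative; the thread above suggests
openmpi/1.6.5-intel is the current default):

   $ module avail openmpi
   openmpi/1.4.4-intel   openmpi/1.6.5-intel(default)
   $ module load openmpi/1.6.5-intel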

Gus Correa



On Fri, May 16, 2014 at 5:07 PM, Maxime Boissonneault
> wrote:

Instead of using the outdated and not maintained Module environment,
why not use Lmod : https://www.tacc.utexas.edu/tacc-projects/lmod

It is a drop-in replacement for Module environment that supports all
of their features and much, much more, such as :
- module hierarchies
- module properties and color highlighting (we use it to highlight
bioinformatic modules or tools for example)
- module caching (very useful for a parallel filesystem with tons of
modules)
- path priorities (useful to make sure personal modules take
precedence over system modules)
- export module tree to json

It works like a charm, understands both TCL and Lua modules and is
actively developed and debugged. There are literally new features
every month or so. If it does not do what you want, odds are that
the developer will add it shortly (I've had it happen).

Maxime

Le 2014-05-16 17:58, Douglas L Reeder a écrit :

Ben,

You might want to use module (source forge) to manage paths to
different mpi implementations. It is fairly easy to set up and
very robust for this type of problem. You would remove contentious
application paths from your standard PATH and then use module to
switch them in and out as needed.

Doug Reeder
On May 16, 2014, at 3:39 PM, Ben Lash > wrote:


My cluster has just upgraded to a new version of MPI, and I'm
using an old one. It seems that I'm having trouble compiling due
to the compiler wrapper file moving (full error here:
http://pastebin.com/EmwRvCd9)
"Cannot open configuration file
/opt/apps/openmpi/1.4.4-intel/share/openmpi/mpif90-wrapper-data.txt"

I've found the file on the cluster at
 /opt/apps/openmpi/retired/1.4.4-intel/share/openmpi/mpif90-wrapper-data.txt
How do I tell the old mpi wrapper where this file is?
I've already corrected one link to mpich ->
/opt/apps/openmpi/retired/1.4.4-intel/, which is in the lib folder
of the software I'm trying to recompile
(/home/bl10/CMAQv5.0.1/lib/x86_64/ifort). Thanks for any ideas. I
also tried changing $pkgdatadir based on what I read here:

http://www.open-mpi.org/faq/?category=mpi-apps#default-wrapper-compiler-flags


Thanks.

--Ben L
___
users mailing list
us...@open-mpi.org 
http://www.open-mpi.org/mailman/listinfo.cgi/users




___
users mailing list
us...@open-mpi.org  
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
-
Maxime Boissonneault
Analyste de calcul - Calcul Québec, Université Laval
Ph. D. en physique


___
users mailing list
us...@open-mpi.org 
http://www.open-mpi.org/mailman/listinfo.cgi/users




--
--Ben L


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users





Re: [OMPI users] openmpi configuration error?

2014-05-16 Thread Gus Correa

Hi Ben

I guess you are not particularly interested in Modules or Lmod either.
You probably don't administer this cluster,
but are just trying to have MPI working, right?


Are you trying to use Open MPI or MPICH?
Your email mentions both.
Let's assume it is Open MPI.

Is recompiling your code with the new version of Open MPI installed
in the cluster an option for you, or not?
If it is, that may be the simplest and best solution.

If the cluster has a system administrator, you may ask him about
the best way to set up your environment variables.

In any case, for your Open MPI to work you need your PATH and 
LD_LIBRARY_PATH to be correctly set.
If the system administrator migrated OpenMPI 1.4.4 to a "retired" 
directory, you may need to adjust PATH and LD_LIBRARY_PATH

to point to the "retired" directories (if they are still functional).

echo $PATH
echo $LD_LIBRARY_PATH

may show what you have.

Does the cluster use modules, perhaps?
What do you get from

module list

?

As a workaround, if you're using bash, you could add these to .bashrc:

export PATH=/opt/apps/openmpi/retired/1.4.4-intel/bin:$PATH
export LD_LIBRARY_PATH=/opt/apps/openmpi/retired/1.4.4-intel/lib:$LD_LIBRARY_PATH


if you use csh, add these to .cshrc:

setenv PATH /opt/apps/openmpi/retired/1.4.4-intel/bin:$PATH
setenv LD_LIBRARY_PATH /opt/apps/openmpi/retired/1.4.4-intel/lib:$LD_LIBRARY_PATH
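
Another thing that *may* work, if the "retired" tree is complete:
Open MPI installations can be relocated via the OPAL_PREFIX environment
variable (see the FAQ entry on relocating an installation), which should
also redirect the wrapper-data.txt lookup that is failing for you:

export OPAL_PREFIX=/opt/apps/openmpi/retired/1.4.4-intel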


I hope this helps,
Gus Correa



On 05/16/2014 05:39 PM, Ben Lash wrote:

My cluster has just upgraded to a new version of MPI, and I'm using an
old one. It seems that I'm having trouble compiling due to the compiler
wrapper file moving (full error here: http://pastebin.com/EmwRvCd9)
"Cannot open configuration file
/opt/apps/openmpi/1.4.4-intel/share/openmpi/mpif90-wrapper-data.txt"

I've found the file on the cluster at
  /opt/apps/openmpi/retired/1.4.4-intel/share/openmpi/mpif90-wrapper-data.txt
How do I tell the old mpi wrapper where this file is?
I've already corrected one link to mpich ->
/opt/apps/openmpi/retired/1.4.4-intel/, which is in the lib folder
of the software I'm trying to recompile
(/home/bl10/CMAQv5.0.1/lib/x86_64/ifort). Thanks for any ideas. I also
tried changing $pkgdatadir based on what I read here:
http://www.open-mpi.org/faq/?category=mpi-apps#default-wrapper-compiler-flags

Thanks.

--Ben L


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users





Re: [OMPI users] openmpi configuration error?

2014-05-16 Thread Ben Lash
I'm not sure I have the ability to implement a different module management
system; I am using a university cluster. We have a module system, and I am
beginning to suspect that maybe it wasn't updated during the upgrade. I have
module list
..other modules    openmpi/1.4.4
Perhaps this is still trying to go to the old source location? How would I
check? Is there an easy way around it if it is wrong? Thanks again!


On Fri, May 16, 2014 at 5:07 PM, Maxime Boissonneault <
maxime.boissonnea...@calculquebec.ca> wrote:

>  Instead of using the outdated and not maintained Module environment, why
> not use Lmod : https://www.tacc.utexas.edu/tacc-projects/lmod
>
> It is a drop-in replacement for Module environment that supports all of
> their features and much, much more, such as :
> - module hierarchies
> - module properties and color highlighting (we use it to highlight
> bioinformatic modules or tools for example)
> - module caching (very useful for a parallel filesystem with tons of
> modules)
> - path priorities (useful to make sure personal modules take precedence
> over system modules)
> - export module tree to json
>
> It works like a charm, understands both TCL and Lua modules and is actively
> developed and debugged. There are literally new features every month or
> so. If it does not do what you want, odds are that the developer will add
> it shortly (I've had it happen).
>
> Maxime
>
> Le 2014-05-16 17:58, Douglas L Reeder a écrit :
>
> Ben,
>
>  You might want to use module (source forge) to manage paths to different
> mpi implementations. It is fairly easy to set up and very robust for this
> type of problem. You would remove contentious application paths from your
> standard PATH and then use module to switch them in and out as needed.
>
>  Doug Reeder
>  On May 16, 2014, at 3:39 PM, Ben Lash  wrote:
>
>  My cluster has just upgraded to a new version of MPI, and I'm using an
> old one. It seems that I'm having trouble compiling due to the compiler
> wrapper file moving (full error here: http://pastebin.com/EmwRvCd9)
> "Cannot open configuration file
> /opt/apps/openmpi/1.4.4-intel/share/openmpi/mpif90-wrapper-data.txt"
>
>  I've found the file on the cluster at
>  /opt/apps/openmpi/retired/1.4.4-intel/share/openmpi/mpif90-wrapper-data.txt
> How do I tell the old mpi wrapper where this file is?
> I've already corrected one link to mpich -> 
> /opt/apps/openmpi/retired/1.4.4-intel/,
> which is in the lib folder of the software I'm trying to recompile
> (/home/bl10/CMAQv5.0.1/lib/x86_64/ifort). Thanks for any ideas. I also
> tried changing $pkgdatadir based on what I read here:
>
> http://www.open-mpi.org/faq/?category=mpi-apps#default-wrapper-compiler-flags
>
>  Thanks.
>
>  --Ben L
>  ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
> --
> -
> Maxime Boissonneault
> Analyste de calcul - Calcul Québec, Université Laval
> Ph. D. en physique
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
--Ben L


Re: [OMPI users] openmpi configuration error?

2014-05-16 Thread Douglas L Reeder
Maxime,

I was unaware of Lmod. Thanks for bringing it to my attention.

Doug
On May 16, 2014, at 4:07 PM, Maxime Boissonneault 
 wrote:

> Instead of using the outdated and not maintained Module environment, why not 
> use Lmod : https://www.tacc.utexas.edu/tacc-projects/lmod
> 
> It is a drop-in replacement for Module environment that supports all of their 
> features and much, much more, such as : 
> - module hierarchies
> - module properties and color highlighting (we use it to highlight 
> bioinformatic modules or tools for example)
> - module caching (very useful for a parallel filesystem with tons of modules)
> - path priorities (useful to make sure personal modules take precedence over 
> system modules)
> - export module tree to json
> 
> It works like a charm, understands both TCL and Lua modules and is actively 
> developed and debugged. There are literally new features every month or so. 
> If it does not do what you want, odds are that the developer will add it 
> shortly (I've had it happen). 
> 
> Maxime
> 
> Le 2014-05-16 17:58, Douglas L Reeder a écrit :
>> Ben,
>> 
>> You might want to use module (source forge) to manage paths to different mpi 
>> implementations. It is fairly easy to set up and very robust for this type 
>> of problem. You would remove contentious application paths from your standard 
>> PATH and then use module to switch them in and out as needed.
>> 
>> Doug Reeder
>> On May 16, 2014, at 3:39 PM, Ben Lash  wrote:
>> 
>>> My cluster has just upgraded to a new version of MPI, and I'm using an old 
>>> one. It seems that I'm having trouble compiling due to the compiler wrapper 
>>> file moving (full error here: http://pastebin.com/EmwRvCd9)
>>> "Cannot open configuration file 
>>> /opt/apps/openmpi/1.4.4-intel/share/openmpi/mpif90-wrapper-data.txt"
>>> 
>>> I've found the file on the cluster at  
>>> /opt/apps/openmpi/retired/1.4.4-intel/share/openmpi/mpif90-wrapper-data.txt
>>> How do I tell the old mpi wrapper where this file is?
>>> I've already corrected one link to mpich -> 
>>> /opt/apps/openmpi/retired/1.4.4-intel/, which is in the lib folder of the 
>>> software I'm trying to recompile (/home/bl10/CMAQv5.0.1/lib/x86_64/ifort). Thanks 
>>> for any ideas. I also tried changing $pkgdatadir based on what I read here: 
>>> http://www.open-mpi.org/faq/?category=mpi-apps#default-wrapper-compiler-flags
>>> 
>>> Thanks. 
>>> 
>>> --Ben L
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> -- 
> -
> Maxime Boissonneault
> Analyste de calcul - Calcul Québec, Université Laval
> Ph. D. en physique
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] openmpi configuration error?

2014-05-16 Thread Maxime Boissonneault
Instead of using the outdated and not maintained Module environment, why 
not use Lmod : https://www.tacc.utexas.edu/tacc-projects/lmod


It is a drop-in replacement for Module environment that supports all of 
their features and much, much more, such as :

- module hierarchies
- module properties and color highlighting (we use it to highlight 
bioinformatic modules or tools for example)
- module caching (very useful for a parallel filesystem with tons of 
modules)
- path priorities (useful to make sure personal modules take precedence 
over system modules)

- export module tree to json

It works like a charm, understands both TCL and Lua modules and is 
actively developed and debugged. There are literally new features every 
month or so. If it does not do what you want, odds are that the 
developer will add it shortly (I've had it happen).
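
For a taste, two Lmod-specific commands (module names illustrative):

   $ module spider netcdf    # search the entire module tree, across hierarchies
   $ ml netcdf/4.1.3         # 'ml' is Lmod shorthand for 'module load'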


Maxime

Le 2014-05-16 17:58, Douglas L Reeder a écrit :

Ben,

You might want to use module (source forge) to manage paths to 
different mpi implementations. It is fairly easy to set up and very 
robust for this type of problem. You would remove contentious 
application paths from your standard PATH and then use module to switch 
them in and out as needed.


Doug Reeder
On May 16, 2014, at 3:39 PM, Ben Lash > wrote:


My cluster has just upgraded to a new version of MPI, and I'm using 
an old one. It seems that I'm having trouble compiling due to the 
compiler wrapper file moving (full error here: 
http://pastebin.com/EmwRvCd9)
"Cannot open configuration file 
/opt/apps/openmpi/1.4.4-intel/share/openmpi/mpif90-wrapper-data.txt"


I've found the file on the cluster at 
 /opt/apps/openmpi/retired/1.4.4-intel/share/openmpi/mpif90-wrapper-data.txt

How do I tell the old mpi wrapper where this file is?
I've already corrected one link to mpich -> 
/opt/apps/openmpi/retired/1.4.4-intel/, which is in the lib folder 
of the software I'm trying to recompile 
(/home/bl10/CMAQv5.0.1/lib/x86_64/ifort). Thanks for any ideas. I 
also tried changing $pkgdatadir based on what I read here:
http://www.open-mpi.org/faq/?category=mpi-apps#default-wrapper-compiler-flags 



Thanks.

--Ben L
___
users mailing list
us...@open-mpi.org 
http://www.open-mpi.org/mailman/listinfo.cgi/users




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
-
Maxime Boissonneault
Analyste de calcul - Calcul Québec, Université Laval
Ph. D. en physique



Re: [OMPI users] openmpi configuration error?

2014-05-16 Thread Douglas L Reeder
Ben,

You might want to use module (source forge) to manage paths to different mpi 
implementations. It is fairly easy to set up and very robust for this type of 
problem. You would remove contentious application paths from your standard PATH 
and then use module to switch them in and out as needed.

Doug Reeder
On May 16, 2014, at 3:39 PM, Ben Lash  wrote:

> My cluster has just upgraded to a new version of MPI, and I'm using an old 
> one. It seems that I'm having trouble compiling due to the compiler wrapper 
> file moving (full error here: http://pastebin.com/EmwRvCd9)
> "Cannot open configuration file 
> /opt/apps/openmpi/1.4.4-intel/share/openmpi/mpif90-wrapper-data.txt"
> 
> I've found the file on the cluster at  
> /opt/apps/openmpi/retired/1.4.4-intel/share/openmpi/mpif90-wrapper-data.txt
> How do I tell the old mpi wrapper where this file is?
> I've already corrected one link to mpich -> 
> /opt/apps/openmpi/retired/1.4.4-intel/, which is in the lib folder of the 
> software I'm trying to recompile (/home/bl10/CMAQv5.0.1/lib/x86_64/ifort). Thanks 
> for any ideas. I also tried changing $pkgdatadir based on what I read here: 
> http://www.open-mpi.org/faq/?category=mpi-apps#default-wrapper-compiler-flags
> 
> Thanks. 
> 
> --Ben L
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users