[OMPI users] Exit code 65

2018-08-03 Thread William Lasher
Hi:

 

I am running an application (OpenFOAM) with 20 processors using Ubuntu
16.04 and occasionally mpirun exits with an exit code of 65.  I looked at
the documentation and it says:

 

MPI_T_ERR_PVAR_NO_STARTSTOP

  65  Variable cannot be started or stopped.

 

The mpi.h file on my machine has the same error code listed.  I have no
idea what this means.
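
(As an aside, the definition the documentation refers to can be checked directly
in the installed header; a minimal sketch, assuming a system-wide Open MPI install
whose header lives under /usr/include:)

   # look up the numeric value assigned to this MPI_T error code
   grep -n MPI_T_ERR_PVAR_NO_STARTSTOP /usr/include/mpi.h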

 

This does not happen all the time, and if I restart the job it usually
runs fine to the end.

 

This is a new (to me) rebuilt computer, and I am wondering whether it indicates a
hardware problem.

 

Thanks,

 

Bill

 

-

William C. Lasher

Professor Emeritus of Mechanical Engineering

Penn State Behrend

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] openMPI and ifort debuggin flags, is it possible?

2018-08-03 Thread Gilles Gouaillardet
Diego,

Yes, that would clearly be an issue.

Cheers,

Gilles
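
(If Open MPI was not built that way, a minimal sketch of a matching rebuild from
source; the FCFLAGS value mirrors the application's flag, and the install prefix
is purely illustrative:)

   ./configure FCFLAGS=-r8 --prefix=$HOME/openmpi-r8
   make all install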

On Friday, August 3, 2018, Diego Avesani  wrote:

> Dear Gilles, dear all,
>
> I do not remember.
> I use -r8 when I compile.
>
> What do you think?
> Could it be a problem?
>
>
> Thanks a lot
>
> Diego
>
>
> On 27 July 2018 at 16:05, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
>
>> Diego,
>>
>> Did you build OpenMPI with FCFLAGS=-r8 ?
>>
>> Cheers,
>>
>> Gilles
>>
>>
>> On Friday, July 27, 2018, Diego Avesani  wrote:
>>
>>> Dear all,
>>>
>>> I am developing a code for hydrological applications. It is written in
>>> FORTRAN and I am using ifort combined with openMPI.
>>>
>>> At the moment, I am debugging my code because I have some NaN errors. As a
>>> consequence, I have introduced some flags for the ifort compiler in my
>>> Makefile. In particular:
>>>
>>> -c -r8 -align *-CB -traceback -check all -check uninit -ftrapuv -debug
>>> all* -fpp
>>>
>>> However, this produces some unexpected errors/warnings with mpirun.  This
>>> is the error/warning:
>>>
>>> Image          PC            Routine            Line     Source
>>> MPIHyperStrem  005AA3F0      Unknown            Unknown  Unknown
>>> MPIHyperStrem  00591A5C      mod_lathyp_mp_lat  219      LATHYP.f90
>>> MPIHyperStrem  005A0C2A      mod_optimizer_mp_  279      OPTIMIZER.f90
>>> MPIHyperStrem  005986F2      mod_optimizer_mp_  34       OPTIMIZER.f90
>>> MPIHyperStrem  005A1F84      MAIN__             114      MAIN.f90
>>> MPIHyperStrem  0040A46E      Unknown            Unknown  Unknown
>>> libc-2.23.so   7FEA758B8830  __libc_start_main  Unknown  Unknown
>>> MPIHyperStrem  0040A369      Unknown            Unknown  Unknown
>>> forrtl: warning (406): fort: (1): In call to SHUFFLE, an array temporary
>>> was created for argument #1
>>>
>>>
>>>
>>> My question is:
>>> Is it possible to use ifort debugging flags with openMPI?
>>>
>>> thanks a lot
>>>
>>> Diego
>>>
>>>
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] local communicator and crash of the code

2018-08-03 Thread Nathan Hjelm via users
If you are trying to create a communicator containing all node-local processes, 
then use MPI_Comm_split_type.
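
(A minimal Fortran sketch of that approach, reusing the MPIlocal structure and
communicator name from the code quoted below; those names are assumptions, not
part of Nathan's reply:)

   ! split MPI_COMM_WORLD into one communicator per node, no hand-computed colors needed
   CALL MPI_COMM_SPLIT_TYPE(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, &
                            MPI_INFO_NULL, MPI_LOCAL_COMM, MPIlocal%iErr)
   CALL MPI_COMM_RANK(MPI_LOCAL_COMM, MPIlocal%rank, MPIlocal%iErr)
   CALL MPI_COMM_SIZE(MPI_LOCAL_COMM, MPIlocal%nCPU, MPIlocal%iErr)

(The key argument, 0 here, only influences rank ordering inside each per-node
communicator.)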

> On Aug 3, 2018, at 12:24 PM, Diego Avesani  wrote:
> 
> Dear all,
> probably I have found the error.
> Let me check. Probably I have not set up the colors properly.
> 
> Thanks a lot,
> I hope that you have not lost too much time on me,
> I will let you know if that was the problem.
> 
> Thanks again
> 
> Diego
> 
> 
>> On 3 August 2018 at 19:57, Diego Avesani  wrote:
>> Dear R, Dear all,
>> 
>> I do not know. 
>> I have isolated the issue. It seems that I have some problem with:
>>   CALL MPI_COMM_SPLIT(MPI_COMM_WORLD,colorl,MPIworld%rank,MPI_LOCAL_COMM,MPIworld%iErr)
>>   CALL MPI_COMM_RANK(MPI_LOCAL_COMM, MPIlocal%rank,MPIlocal%iErr)
>>   CALL MPI_COMM_SIZE(MPI_LOCAL_COMM, MPIlocal%nCPU,MPIlocal%iErr) 
>> 
>> openMPI does not seem to be able to create MPIlocal%rank properly.
>> 
>> What could it be? A bug?
>> 
>> thanks again
>> 
>> Diego
>> 
>> 
>>> On 3 August 2018 at 19:47, Ralph H Castain  wrote:
>>> Those two command lines look exactly the same to me - what am I missing?
>>> 
>>> 
 On Aug 3, 2018, at 10:23 AM, Diego Avesani  wrote:
 
 Dear all,
 
 I am experiencing a strange error.
 
 In my code I use three communicators:
 MPI_COMM_WORLD
 MPI_MASTERS_COMM
 LOCAL_COMM
 
 which have some CPUs in common.
 
 when I run my code as 
  mpirun -np 4 --oversubscribe ./MPIHyperStrem
 
 I have no problem, while when I run it as
  
  mpirun -np 4 --oversubscribe ./MPIHyperStrem
 
 sometimes it crashes and sometimes it does not.
 
 It seems that it is all linked to 
 CALL MPI_REDUCE(QTS(tstep,:), QTS(tstep,:), nNode, MPI_DOUBLE_PRECISION, MPI_SUM, 0, MPI_LOCAL_COMM, iErr)
 
 which works within the local communicator.
 
 What do you think? Can you please suggest some debugging tests?
 Is it a problem related to local communicators?
 
 Thanks
 
 
 
 Diego
 
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] local communicator and crash of the code

2018-08-03 Thread Diego Avesani
Dear all,
probably I have found the error.
Let me check. Probably I have not set up the colors properly.

Thanks a lot,
I hope that you have not lost too much time on me,
I will let you know if that was the problem.

Thanks again

Diego


On 3 August 2018 at 19:57, Diego Avesani  wrote:

> Dear R, Dear all,
>
> I do not know.
> I have isolated the issue. It seems that I have some problem with:
>   CALL MPI_COMM_SPLIT(MPI_COMM_WORLD,colorl,MPIworld%rank,MPI_LOCAL_COMM,MPIworld%iErr)
>   CALL MPI_COMM_RANK(MPI_LOCAL_COMM, MPIlocal%rank,MPIlocal%iErr)
>   CALL MPI_COMM_SIZE(MPI_LOCAL_COMM, MPIlocal%nCPU,MPIlocal%iErr)
>
> openMPI does not seem to be able to create MPIlocal%rank properly.
>
> What could it be? A bug?
>
> thanks again
>
> Diego
>
>
> On 3 August 2018 at 19:47, Ralph H Castain  wrote:
>
>> Those two command lines look exactly the same to me - what am I missing?
>>
>>
>> On Aug 3, 2018, at 10:23 AM, Diego Avesani 
>> wrote:
>>
>> Dear all,
>>
>> I am experiencing a strange error.
>>
>> In my code I use three communicators:
>> MPI_COMM_WORLD
>> MPI_MASTERS_COMM
>> LOCAL_COMM
>>
>> which have some CPUs in common.
>>
>> when I run my code as
>>  mpirun -np 4 --oversubscribe ./MPIHyperStrem
>>
>> I have no problem, while when I run it as
>>
>>  mpirun -np 4 --oversubscribe ./MPIHyperStrem
>>
>> sometimes it crashes and sometimes it does not.
>>
>> It seems that it is all linked to
>> CALL MPI_REDUCE(QTS(tstep,:), QTS(tstep,:), nNode, MPI_DOUBLE_PRECISION, MPI_SUM, 0, MPI_LOCAL_COMM, iErr)
>>
>> which works within the local communicator.
>>
>> What do you think? Can you please suggest some debugging tests?
>> Is it a problem related to local communicators?
>>
>> Thanks
>>
>>
>>
>> Diego
>>
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] local communicator and crash of the code

2018-08-03 Thread Diego Avesani
Dear R, Dear all,

I do not know.
I have isolated the issue. It seems that I have some problem with:
  CALL MPI_COMM_SPLIT(MPI_COMM_WORLD,colorl,MPIworld%rank,MPI_LOCAL_COMM,MPIworld%iErr)
  CALL MPI_COMM_RANK(MPI_LOCAL_COMM, MPIlocal%rank,MPIlocal%iErr)
  CALL MPI_COMM_SIZE(MPI_LOCAL_COMM, MPIlocal%nCPU,MPIlocal%iErr)

openMPI does not seem to be able to create MPIlocal%rank properly.

What could it be? A bug?

thanks again

Diego


On 3 August 2018 at 19:47, Ralph H Castain  wrote:

> Those two command lines look exactly the same to me - what am I missing?
>
>
> On Aug 3, 2018, at 10:23 AM, Diego Avesani 
> wrote:
>
> Dear all,
>
> I am experiencing a strange error.
>
> In my code I use three communicators:
> MPI_COMM_WORLD
> MPI_MASTERS_COMM
> LOCAL_COMM
>
> which have some CPUs in common.
>
> when I run my code as
>  mpirun -np 4 --oversubscribe ./MPIHyperStrem
>
> I have no problem, while when I run it as
>
>  mpirun -np 4 --oversubscribe ./MPIHyperStrem
>
> sometimes it crashes and sometimes it does not.
>
> It seems that it is all linked to
> CALL MPI_REDUCE(QTS(tstep,:), QTS(tstep,:), nNode, MPI_DOUBLE_PRECISION, MPI_SUM, 0, MPI_LOCAL_COMM, iErr)
>
> which works within the local communicator.
>
> What do you think? Can you please suggest some debugging tests?
> Is it a problem related to local communicators?
>
> Thanks
>
>
>
> Diego
>
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] local communicator and crash of the code

2018-08-03 Thread Ralph H Castain
Those two command lines look exactly the same to me - what am I missing?


> On Aug 3, 2018, at 10:23 AM, Diego Avesani  wrote:
> 
> Dear all,
> 
> I am experiencing a strange error.
> 
> In my code I use three communicators:
> MPI_COMM_WORLD
> MPI_MASTERS_COMM
> LOCAL_COMM
> 
> which have some CPUs in common.
> 
> when I run my code as 
>  mpirun -np 4 --oversubscribe ./MPIHyperStrem
> 
> I have no problem, while when I run it as
>  
>  mpirun -np 4 --oversubscribe ./MPIHyperStrem
> 
> sometimes it crashes and sometimes it does not.
> 
> It seems that it is all linked to 
> CALL MPI_REDUCE(QTS(tstep,:), QTS(tstep,:), nNode, MPI_DOUBLE_PRECISION, MPI_SUM, 0, MPI_LOCAL_COMM, iErr)
> 
> which works within the local communicator.
> 
> What do you think? Can you please suggest some debugging tests?
> Is it a problem related to local communicators?
> 
> Thanks
> 
> 
> 
> Diego
> 

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

[OMPI users] local communicator and crash of the code

2018-08-03 Thread Diego Avesani
Dear all,

I am experiencing a strange error.

In my code I use three communicators:
MPI_COMM_WORLD
MPI_MASTERS_COMM
LOCAL_COMM

which have some CPUs in common.

when I run my code as
 mpirun -np 4 --oversubscribe ./MPIHyperStrem

I have no problem, while when I run it as

 mpirun -np 4 --oversubscribe ./MPIHyperStrem

sometimes it crashes and sometimes it does not.

It seems that it is all linked to
CALL MPI_REDUCE(QTS(tstep,:), QTS(tstep,:), nNode, MPI_DOUBLE_PRECISION, MPI_SUM, 0, MPI_LOCAL_COMM, iErr)

which works within the local communicator.

What do you think? Can you please suggest some debugging tests?
Is it a problem related to local communicators?

Thanks



Diego
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] openMPI and ifort debuggin flags, is it possible?

2018-08-03 Thread Diego Avesani
Dear Gilles, dear all,

I do not remember.
I use -r8 when I compile.

What do you think?
Could it be a problem?


Thanks a lot

Diego


On 27 July 2018 at 16:05, Gilles Gouaillardet  wrote:

> Diego,
>
> Did you build OpenMPI with FCFLAGS=-r8 ?
>
> Cheers,
>
> Gilles
>
>
> On Friday, July 27, 2018, Diego Avesani  wrote:
>
>> Dear all,
>>
>> I am developing a code for hydrological applications. It is written in
>> FORTRAN and I am using ifort combined with openMPI.
>>
>> At the moment, I am debugging my code because I have some NaN errors. As a
>> consequence, I have introduced some flags for the ifort compiler in my
>> Makefile. In particular:
>>
>> -c -r8 -align *-CB -traceback -check all -check uninit -ftrapuv -debug
>> all* -fpp
>>
>> However, this produces some unexpected errors/warnings with mpirun.  This
>> is the error/warning:
>>
>> Image          PC            Routine            Line     Source
>> MPIHyperStrem  005AA3F0      Unknown            Unknown  Unknown
>> MPIHyperStrem  00591A5C      mod_lathyp_mp_lat  219      LATHYP.f90
>> MPIHyperStrem  005A0C2A      mod_optimizer_mp_  279      OPTIMIZER.f90
>> MPIHyperStrem  005986F2      mod_optimizer_mp_  34       OPTIMIZER.f90
>> MPIHyperStrem  005A1F84      MAIN__             114      MAIN.f90
>> MPIHyperStrem  0040A46E      Unknown            Unknown  Unknown
>> libc-2.23.so   7FEA758B8830  __libc_start_main  Unknown  Unknown
>> MPIHyperStrem  0040A369      Unknown            Unknown  Unknown
>> forrtl: warning (406): fort: (1): In call to SHUFFLE, an array temporary
>> was created for argument #1
>>
>>
>>
>> My question is:
>> Is it possible to use ifort debugging flags with openMPI?
>>
>> thanks a lot
>>
>> Diego
>>
>>
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] Settings oversubscribe as default?

2018-08-03 Thread Ralph H Castain
The equivalent MCA param is rmaps_base_oversubscribe=1. You can add 
OMPI_MCA_rmaps_base_oversubscribe to your environment, or set 
rmaps_base_oversubscribe in your default MCA param file.
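
(A short sketch of both options; the file path assumes the per-user MCA param
file, and a system-wide file under the installation's etc/ directory works as well:)

   # environment variable picked up by mpirun
   export OMPI_MCA_rmaps_base_oversubscribe=1

   # or a line in $HOME/.openmpi/mca-params.conf
   rmaps_base_oversubscribe = 1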


> On Aug 3, 2018, at 1:24 AM, Florian Lindner  wrote:
> 
> Hello,
> 
> I can use --oversubscribe to enable oversubscribing. What is the OpenMPI way to 
> set this as a default, e.g. through a config file option or an environment 
> variable?
> 
> Thanks,
> Florian

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] Comm_connect: Data unpack would read past end of buffer

2018-08-03 Thread Ralph H Castain
The buffer being overrun isn’t anything to do with you - it’s an internal 
buffer used as part of creating the connections. It indicates a problem in OMPI.

The 1.10 series is out of the support window, but if you want to stick with it 
you should at least update to the last release in that series - I believe that is 
1.10.7.

The OMPI v2.x series had problems supporting dynamics, so you should 
skip that one. If you want to come all the way forward, you should take the 
OMPI v3.x series.
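
(To confirm which release an environment actually picks up, either of these
quick checks should work across the series mentioned above:)

   ompi_info | grep "Open MPI:"
   mpirun --version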

Ralph


> On Aug 3, 2018, at 3:40 AM, Florian Lindner  wrote:
> 
> Hello,
> 
> I have this piece of code:
> 
> MPI_Comm icomm;
> INFO << "Accepting connection on " << portName;
> MPI_Comm_accept(portName.c_str(), MPI_INFO_NULL, 0, MPI_COMM_SELF, &icomm);
> 
> and sometimes (like in 1 of 5 runs), I get:
> 
> [helium:33883] [[32673,1],0] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file dpm_orte.c at line 406
> [helium:33883] *** An error occurred in MPI_Comm_accept
> [helium:33883] *** reported by process [2141257729,0]
> [helium:33883] *** on communicator MPI_COMM_SELF
> [helium:33883] *** MPI_ERR_UNKNOWN: unknown error
> [helium:33883] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will 
> now abort,
> [helium:33883] ***and potentially your MPI job)
> [helium:33883] [0] 
> func:/usr/lib/libopen-pal.so.13(opal_backtrace_buffer+0x33) [0x7fc1ad0ac6e3]
> [helium:33883] [1] func:/usr/lib/libmpi.so.12(ompi_mpi_abort+0x365) 
> [0x7fc1af4955e5]
> [helium:33883] [2] 
> func:/usr/lib/libmpi.so.12(ompi_mpi_errors_are_fatal_comm_handler+0xe2) 
> [0x7fc1af487e72]
> [helium:33883] [3] func:/usr/lib/libmpi.so.12(ompi_errhandler_invoke+0x145) 
> [0x7fc1af4874b5]
> [helium:33883] [4] func:/usr/lib/libmpi.so.12(MPI_Comm_accept+0x262) 
> [0x7fc1af4a90e2]
> [helium:33883] [5] func:./mpiports() [0x41e43d]
> [helium:33883] [6] 
> func:/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fc1ad7a1830]
> [helium:33883] [7] func:./mpiports() [0x41b249]
> 
> 
> Before that, I check the length of portName
> 
>  DEBUG << "COMM ACCEPT portName.size() = " << portName.size();
>  DEBUG << "MPI_MAX_PORT_NAME = " << MPI_MAX_PORT_NAME;
> 
> which both return 1024.
> 
> I am completely puzzled as to how I can get a buffer issue, unless something 
> is faulty with std::string portName.
> 
> Any clues?
> 
> Launch command: mpirun -n 4 -mca opal_abort_print_stack 1 
> OpenMPI 1.10.2 @ Ubuntu 16.
> 
> Thanks,
> Florian

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

[OMPI users] Comm_connect: Data unpack would read past end of buffer

2018-08-03 Thread Florian Lindner
Hello,

I have this piece of code:

MPI_Comm icomm;
INFO << "Accepting connection on " << portName;
MPI_Comm_accept(portName.c_str(), MPI_INFO_NULL, 0, MPI_COMM_SELF, &icomm);

and sometimes (like in 1 of 5 runs), I get:

[helium:33883] [[32673,1],0] ORTE_ERROR_LOG: Data unpack would read past end of 
buffer in file dpm_orte.c at line 406
[helium:33883] *** An error occurred in MPI_Comm_accept
[helium:33883] *** reported by process [2141257729,0]
[helium:33883] *** on communicator MPI_COMM_SELF
[helium:33883] *** MPI_ERR_UNKNOWN: unknown error
[helium:33883] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will 
now abort,
[helium:33883] ***and potentially your MPI job)
[helium:33883] [0] func:/usr/lib/libopen-pal.so.13(opal_backtrace_buffer+0x33) 
[0x7fc1ad0ac6e3]
[helium:33883] [1] func:/usr/lib/libmpi.so.12(ompi_mpi_abort+0x365) 
[0x7fc1af4955e5]
[helium:33883] [2] 
func:/usr/lib/libmpi.so.12(ompi_mpi_errors_are_fatal_comm_handler+0xe2) 
[0x7fc1af487e72]
[helium:33883] [3] func:/usr/lib/libmpi.so.12(ompi_errhandler_invoke+0x145) 
[0x7fc1af4874b5]
[helium:33883] [4] func:/usr/lib/libmpi.so.12(MPI_Comm_accept+0x262) 
[0x7fc1af4a90e2]
[helium:33883] [5] func:./mpiports() [0x41e43d]
[helium:33883] [6] func:/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) 
[0x7fc1ad7a1830]
[helium:33883] [7] func:./mpiports() [0x41b249]


Before that, I check the length of portName

  DEBUG << "COMM ACCEPT portName.size() = " << portName.size();
  DEBUG << "MPI_MAX_PORT_NAME = " << MPI_MAX_PORT_NAME;

which both return 1024.

I am completely puzzled as to how I can get a buffer issue, unless something is 
faulty with std::string portName.

Any clues?

Launch command: mpirun -n 4 -mca opal_abort_print_stack 1 
OpenMPI 1.10.2 @ Ubuntu 16.

Thanks,
Florian
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


[OMPI users] Settings oversubscribe as default?

2018-08-03 Thread Florian Lindner
Hello,

I can use --oversubscribe to enable oversubscribing. What is the OpenMPI way to set 
this as a default, e.g. through a config file option or an environment variable?

Thanks,
Florian
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users