Re: [OMPI users] Issues compiling HPL with OMPIv4.0.0

2019-04-03 Thread Nathan Hjelm via users
Gilles is correct. If mpicc is showing errors like those in your original
email, then it is not invoking a C compiler. C has no concept of try or
catch. No modern C compiler will complain about a variable named “try”,
because it is not a reserved keyword in the C language.

Example:

foo.c:

int try = 0;

gcc --std=c11 -c foo.c

No error


g++ -c foo.c  
foo.c:3:5: error: expected unqualified-id
int try = 0;
^
1 error generated.
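
The same point can be made with the expression from the failing HPL line. This is only a sketch: the function names double_exponent and double_exponent_portable are made up for illustration and do not appear in HPL. The first function compiles as C but not as C++; the second uses the "xtry" rename suggested elsewhere in this thread and is accepted by both.

```c
/* Valid C: "try" is an ordinary identifier here. A C++ compiler rejects
 * this function because "try" is a keyword in C++. */
int double_exponent(int lexp)
{
    int try = (int)((unsigned int)lexp << 1);
    return try;
}

/* Same computation with the variable renamed, so the code is valid in
 * both C and C++. */
int double_exponent_portable(int lexp)
{
    int xtry = (int)((unsigned int)lexp << 1);
    return xtry;
}
```

If a file containing the first function fails to build, that is a strong hint that the wrapper ended up calling a C++ compiler.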

-Nathan

> On Apr 3, 2019, at 6:09 PM, Gilles Gouaillardet  wrote:
> 
> Do not get fooled by the symlinks to opal_wrapper !
> 
> opal_wrapper checks how it is invoked (e.g. check argv[0] in main()) and the 
> behavior is different
> 
> if it is invoked as mpicc, mpiCC, mpifort and other
> 
> 
> If the error persists with mpicc, you can manually extract the mpicc command 
> line, and manually run it with the -showme parameter,
> 
> it will show you the full command line (and who knows, mpicc might invoke a 
> C++ compiler after all, and that would be a config issue)
> 
> 
> Cheers,
> 
> 
> Gilles
> 
> On 4/4/2019 7:48 AM, afernan...@odyhpc.com wrote:
>> 
>> Sam and Jeff,
>> 
>> Thank you for your answers. My first attempts actually used mpicc rather 
>> than mpiCC, switching to mpiCC was simply to check out if the problem 
>> persisted. I noticed that both mpicc and mpiCC are linked to the same file 
>> (opal_wrapper) and didn't bother switching it back. I'm not sure if the 
>> wrapper figures out what compiler you call because I was getting the same 
>> error message. Jeff is right pointing out that 'try' is reserved but the 
>> original file seems to be really old (think 1970). Apparently, the new 
>> compiler (shipped with OMPIv4) is more sensitive and beeps when the older 
>> didn't.
>> 
>> Thanks again,
>> 
>> AFernandez
>> 
>> Indeed, you cannot use "try" as a variable name in C++ because it is a 
>> https://en.cppreference.com/w/cpp/keyword.
>> 
>> As already suggested, use a C compiler, or you can replace "try" with "xtry" 
>> or any other non-reserved word.
>> 
>> Jeff
>> 
>> On Wed, Apr 3, 2019 at 1:41 PM Gutierrez, Samuel K. via users
>> <users@lists.open-mpi.org> wrote:
>> 
>>Hi,
>> 
>>It looks like you are using the C++ wrapper compiler (mpiCC)
>>instead of the C wrapper compiler (mpicc). Perhaps using mpicc
>>instead of mpiCC will resolve your issue.
>> 
>>Best,
>> 
>>Sam
>> 
>> 
>> 
>>On Apr 3, 2019, at 12:38 PM, afernan...@odyhpc.com
>> wrote:
>> 
>>Hello,
>> 
>>I'm trying to compile HPL(v2.3) with OpenBLAS and OMPI. The
>>compilation succeeds when using the old OMPI (v1.10.8) but
>>fails with OMPI v4.0.0 (I'm still not using v4.0.1). The error
>>is for an old subroutine that determines machine-specific
>>arithmetic constants:
>> 
>>mpiCC -o HPL_dlamch.o -c
>>-I/home/centos/benchmarks/hpl-2.2/include
>>-I/home/centos/benchmarks/hpl-2.2/include/impetus03
>>-I/opt/openmpi/include  ../HPL_dlamch.c
>> 
>>../HPL_dlamch.c: In function ‘void HPL_dlamc5(int, int, int,
>>int, int*, double*)’:
>> 
>>../HPL_dlamch.c:749:67: error: expected unqualified-id before
>>‘try’
>> 
>>intexbits=1, expsum, i, lexp=1, nbits,
>>try,
>> 
>>^
>> 
>>../HPL_dlamch.c:761:8: error: expected ‘{’ before ‘=’ token
>> 
>>try = (int)( (unsigned int)(lexp) << 1 );
>> 
>>^
>> 
>>../HPL_dlamch.c:761:8: error: expected ‘catch’ before ‘=’ token
>> 
>>../HPL_dlamch.c:761:8: error: expected ‘(’ before ‘=’ token
>> 
>>../HPL_dlamch.c:761:8: error: expected type-specifier before
>>‘=’ token
>> 
>>../HPL_dlamch.c:761:8: error: expected ‘)’ before ‘=’ token
>> 
>>../HPL_dlamch.c:761:8: error: expected ‘{’ before ‘=’ token
>> 
>>../HPL_dlamch.c:761:8: error: expected primary-expression
>>before ‘=’ token
>> 
>>../HPL_dlamch.c:762:8: error: expected primary-expression
>>before ‘try’
>> 
>>if( try <= ( -EMIN ) ) { lexp = try; exbits++; goto l_10; }
>> 
>>^
>> 
>>../HPL_dlamch.c:762:8: error: expected ‘)’ before ‘try’
>> 
>>../HPL_dlamch.c:762:36: error: expected primary-expression
>>before ‘try’
>> 
>>if( try <= ( -EMIN ) ) { lexp = try; exbits++; goto l_10; }
>> 
>>^
>> 
>>../HPL_dlamch.c:762:36: error: expected ‘;’ before ‘try’
>> 
>>../HPL_dlamch.c:764:26: error: ‘uexp’ was not declared in this
>>scope
>> 
>>if( lexp == -EMIN ) { uexp = lexp; } else { uexp = try;
>>exbits++; }
>> 
>>^
>> 
>>../HPL_dlamch.c:764:48: error: ‘uexp’ was not declared in this
>>scope
>> 
>>if( lexp == -EMIN ) { uexp = lexp; } else { uexp = try;
>>exbits++; }
>> 
>>^
>> 
>>../HPL_dlamch.c:764:55: 

Re: [OMPI users] Issues compiling HPL with OMPIv4.0.0

2019-04-03 Thread Gilles Gouaillardet

Do not be fooled by the symlinks to opal_wrapper!

opal_wrapper checks how it was invoked (e.g. it inspects argv[0] in
main()), and it behaves differently depending on whether it was invoked
as mpicc, mpiCC, mpifort, and so on.

If the error persists with mpicc, you can extract the mpicc command line
from the build output and run it by hand with the -showme parameter.
It will show you the full underlying command line (and who knows, mpicc
might invoke a C++ compiler after all, which would be a config issue).
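
As a rough illustration of the argv[0] technique (this is not opal_wrapper's actual source; base_name, language_for, and the returned strings are hypothetical), a single binary reached through several symlinks can pick its behavior from the name it was called by:

```c
#include <string.h>

/* Return the final path component of an invocation name. */
static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Hypothetical mapping from wrapper name to language. The comparison is
 * case-sensitive, so "mpicc" and "mpiCC" select different languages even
 * though both names may be symlinks to the same executable. */
const char *language_for(const char *argv0)
{
    const char *name = base_name(argv0);
    if (strcmp(name, "mpicc") == 0)   return "C";
    if (strcmp(name, "mpiCC") == 0)   return "C++";
    if (strcmp(name, "mpifort") == 0) return "Fortran";
    return "unknown";
}
```

A real main() would call language_for(argv[0]) and then build the underlying compiler command line accordingly.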



Cheers,


Gilles

On 4/4/2019 7:48 AM, afernan...@odyhpc.com wrote:


Sam and Jeff,

Thank you for your answers. My first attempts actually used mpicc 
rather than mpiCC, switching to mpiCC was simply to check out if the 
problem persisted. I noticed that both mpicc and mpiCC are linked to 
the same file (opal_wrapper) and didn't bother switching it back. I'm 
not sure if the wrapper figures out what compiler you call because I 
was getting the same error message. Jeff is right pointing out that 
'try' is reserved but the original file seems to be really old (think 
1970). Apparently, the new compiler (shipped with OMPIv4) is more 
sensitive and beeps when the older didn't.


Thanks again,

AFernandez

Indeed, you cannot use "try" as a variable name in C++ because it is a 
https://en.cppreference.com/w/cpp/keyword.


As already suggested, use a C compiler, or you can replace "try" with 
"xtry" or any other non-reserved word.


Jeff

On Wed, Apr 3, 2019 at 1:41 PM Gutierrez, Samuel K. via users
<users@lists.open-mpi.org> wrote:


Hi,

It looks like you are using the C++ wrapper compiler (mpiCC)
instead of the C wrapper compiler (mpicc). Perhaps using mpicc
instead of mpiCC will resolve your issue.

Best,

Sam



On Apr 3, 2019, at 12:38 PM, afernan...@odyhpc.com
 wrote:

Hello,

I'm trying to compile HPL(v2.3) with OpenBLAS and OMPI. The
compilation succeeds when using the old OMPI (v1.10.8) but
fails with OMPI v4.0.0 (I'm still not using v4.0.1). The error
is for an old subroutine that determines machine-specific
arithmetic constants:

mpiCC -o HPL_dlamch.o -c
-I/home/centos/benchmarks/hpl-2.2/include
-I/home/centos/benchmarks/hpl-2.2/include/impetus03
-I/opt/openmpi/include  ../HPL_dlamch.c

../HPL_dlamch.c: In function ‘void HPL_dlamc5(int, int, int,
int, int*, double*)’:

../HPL_dlamch.c:749:67: error: expected unqualified-id before
‘try’

int    exbits=1, expsum, i, lexp=1, nbits,
try,

^

../HPL_dlamch.c:761:8: error: expected ‘{’ before ‘=’ token

    try = (int)( (unsigned int)(lexp) << 1 );

    ^

../HPL_dlamch.c:761:8: error: expected ‘catch’ before ‘=’ token

../HPL_dlamch.c:761:8: error: expected ‘(’ before ‘=’ token

../HPL_dlamch.c:761:8: error: expected type-specifier before
‘=’ token

../HPL_dlamch.c:761:8: error: expected ‘)’ before ‘=’ token

../HPL_dlamch.c:761:8: error: expected ‘{’ before ‘=’ token

../HPL_dlamch.c:761:8: error: expected primary-expression
before ‘=’ token

../HPL_dlamch.c:762:8: error: expected primary-expression
before ‘try’

    if( try <= ( -EMIN ) ) { lexp = try; exbits++; goto l_10; }

    ^

../HPL_dlamch.c:762:8: error: expected ‘)’ before ‘try’

../HPL_dlamch.c:762:36: error: expected primary-expression
before ‘try’

    if( try <= ( -EMIN ) ) { lexp = try; exbits++; goto l_10; }

^

../HPL_dlamch.c:762:36: error: expected ‘;’ before ‘try’

../HPL_dlamch.c:764:26: error: ‘uexp’ was not declared in this
scope

    if( lexp == -EMIN ) { uexp = lexp; } else { uexp = try;
exbits++; }

^

../HPL_dlamch.c:764:48: error: ‘uexp’ was not declared in this
scope

    if( lexp == -EMIN ) { uexp = lexp; } else { uexp = try;
exbits++; }

^

../HPL_dlamch.c:764:55: error: expected primary-expression
before ‘try’

    if( lexp == -EMIN ) { uexp = lexp; } else { uexp = try;
exbits++; }

^

../HPL_dlamch.c:764:55: error: expected ‘;’ before ‘try’

../HPL_dlamch.c:770:10: error: ‘uexp’ was not declared in this
scope

    if( ( uexp+EMIN ) > ( -lexp-EMIN ) )

  ^

make[2]: *** [HPL_dlamch.o] Error 1

make[2]: Leaving directory
`/home/centos/hpl-2.3/src/auxil/impetus03'

make[1]: *** [build_src] Error 2

make[1]: Leaving directory `/home/centos/hpl-2.3'

make: *** [build] Error 2

I don't understand the nature of the problem or why it works
with the old OMPI version and not with the new. Any help or
pointer would be appreciated.

Thanks.

AFernandez


Re: [OMPI users] Issues compiling HPL with OMPIv4.0.0

2019-04-03 Thread afernandez
Sam and Jeff,

Thank you for your answers. My first attempts actually used mpicc rather than
mpiCC; switching to mpiCC was simply to check whether the problem persisted. I
noticed that both mpicc and mpiCC are linked to the same file (opal_wrapper)
and didn't bother switching back. I'm not sure whether the wrapper figures out
which compiler you call, because I was getting the same error message. Jeff is
right to point out that 'try' is reserved, but the original file seems to be
really old (think 1970). Apparently, the new compiler (shipped with OMPIv4) is
more sensitive and complains where the older one didn't.

Thanks again,

AFernandez

 

Indeed, you cannot use "try" as a variable name in C++ because it is a 
https://en.cppreference.com/w/cpp/keyword.

 

As already suggested, use a C compiler, or you can replace "try" with "xtry" or 
any other non-reserved word.

 

Jeff

 

On Wed, Apr 3, 2019 at 1:41 PM Gutierrez, Samuel K. via users
<users@lists.open-mpi.org> wrote:

Hi, 

 

It looks like you are using the C++ wrapper compiler (mpiCC) instead of the C 
wrapper compiler (mpicc). Perhaps using mpicc instead of mpiCC will resolve 
your issue.

 

Best,

 

Sam





On Apr 3, 2019, at 12:38 PM, afernan...@odyhpc.com 
  wrote:

 

Hello,

I'm trying to compile HPL(v2.3) with OpenBLAS and OMPI. The compilation 
succeeds when using the old OMPI (v1.10.8) but fails with OMPI v4.0.0 (I'm 
still not using v4.0.1). The error is for an old subroutine that determines 
machine-specific arithmetic constants:

 

mpiCC -o HPL_dlamch.o -c   -I/home/centos/benchmarks/hpl-2.2/include 
-I/home/centos/benchmarks/hpl-2.2/include/impetus03  -I/opt/openmpi/include  
../HPL_dlamch.c

../HPL_dlamch.c: In function ‘void HPL_dlamc5(int, int, int, int, int*, 
double*)’:

../HPL_dlamch.c:749:67: error: expected unqualified-id before ‘try’

intexbits=1, expsum, i, lexp=1, nbits, try,

   ^

../HPL_dlamch.c:761:8: error: expected ‘{’ before ‘=’ token

try = (int)( (unsigned int)(lexp) << 1 );

^

../HPL_dlamch.c:761:8: error: expected ‘catch’ before ‘=’ token

../HPL_dlamch.c:761:8: error: expected ‘(’ before ‘=’ token

../HPL_dlamch.c:761:8: error: expected type-specifier before ‘=’ token

../HPL_dlamch.c:761:8: error: expected ‘)’ before ‘=’ token

../HPL_dlamch.c:761:8: error: expected ‘{’ before ‘=’ token

../HPL_dlamch.c:761:8: error: expected primary-expression before ‘=’ token

../HPL_dlamch.c:762:8: error: expected primary-expression before ‘try’

if( try <= ( -EMIN ) ) { lexp = try; exbits++; goto l_10; }

^

../HPL_dlamch.c:762:8: error: expected ‘)’ before ‘try’

../HPL_dlamch.c:762:36: error: expected primary-expression before ‘try’

if( try <= ( -EMIN ) ) { lexp = try; exbits++; goto l_10; }

^

../HPL_dlamch.c:762:36: error: expected ‘;’ before ‘try’

../HPL_dlamch.c:764:26: error: ‘uexp’ was not declared in this scope

if( lexp == -EMIN ) { uexp = lexp; } else { uexp = try; exbits++; }

  ^

../HPL_dlamch.c:764:48: error: ‘uexp’ was not declared in this scope

if( lexp == -EMIN ) { uexp = lexp; } else { uexp = try; exbits++; }

^

../HPL_dlamch.c:764:55: error: expected primary-expression before ‘try’

if( lexp == -EMIN ) { uexp = lexp; } else { uexp = try; exbits++; }

   ^

../HPL_dlamch.c:764:55: error: expected ‘;’ before ‘try’

../HPL_dlamch.c:770:10: error: ‘uexp’ was not declared in this scope

if( ( uexp+EMIN ) > ( -lexp-EMIN ) )

  ^

make[2]: *** [HPL_dlamch.o] Error 1

make[2]: Leaving directory `/home/centos/hpl-2.3/src/auxil/impetus03'

make[1]: *** [build_src] Error 2

make[1]: Leaving directory `/home/centos/hpl-2.3'

make: *** [build] Error 2

 

I don't understand the nature of the problem or why it works with the old OMPI 
version and not with the new. Any help or pointer would be appreciated.

Thanks.

AFernandez

 

 

___
users mailing list
  users@lists.open-mpi.org
  
https://lists.open-mpi.org/mailman/listinfo/users

 





 

-- 

Jeff Hammond
jeff.scie...@gmail.com  
http://jeffhammond.github.io/


Re: [OMPI users] Issues compiling HPL with OMPIv4.0.0

2019-04-03 Thread Jeff Hammond
Indeed, you cannot use "try" as a variable name in C++ because it is a
reserved keyword (https://en.cppreference.com/w/cpp/keyword).

As already suggested, use a C compiler, or you can replace "try" with
"xtry" or any other non-reserved word.

Jeff

On Wed, Apr 3, 2019 at 1:41 PM Gutierrez, Samuel K. via users <
users@lists.open-mpi.org> wrote:

> Hi,
>
> It looks like you are using the C++ wrapper compiler (mpiCC) instead of
> the C wrapper compiler (mpicc). Perhaps using mpicc instead of mpiCC will
> resolve your issue.
>
> Best,
>
> Sam
>
> On Apr 3, 2019, at 12:38 PM, afernan...@odyhpc.com wrote:
>
> Hello,
> I'm trying to compile HPL(v2.3) with OpenBLAS and OMPI. The compilation
> succeeds when using the old OMPI (v1.10.8) but fails with OMPI v4.0.0 (I'm
> still not using v4.0.1). The error is for an old subroutine that determines
> machine-specific arithmetic constants:
>
> mpiCC -o HPL_dlamch.o -c   -I/home/centos/benchmarks/hpl-2.2/include
> -I/home/centos/benchmarks/hpl-2.2/include/impetus03
> -I/opt/openmpi/include  ../HPL_dlamch.c
> ../HPL_dlamch.c: In function ‘void HPL_dlamc5(int, int, int, int, int*,
> double*)’:
> ../HPL_dlamch.c:749:67: error: expected unqualified-id before ‘try’
> intexbits=1, expsum, i, lexp=1, nbits, try,
>^
> ../HPL_dlamch.c:761:8: error: expected ‘{’ before ‘=’ token
> try = (int)( (unsigned int)(lexp) << 1 );
> ^
> ../HPL_dlamch.c:761:8: error: expected ‘catch’ before ‘=’ token
> ../HPL_dlamch.c:761:8: error: expected ‘(’ before ‘=’ token
> ../HPL_dlamch.c:761:8: error: expected type-specifier before ‘=’ token
> ../HPL_dlamch.c:761:8: error: expected ‘)’ before ‘=’ token
> ../HPL_dlamch.c:761:8: error: expected ‘{’ before ‘=’ token
> ../HPL_dlamch.c:761:8: error: expected primary-expression before ‘=’ token
> ../HPL_dlamch.c:762:8: error: expected primary-expression before ‘try’
> if( try <= ( -EMIN ) ) { lexp = try; exbits++; goto l_10; }
> ^
> ../HPL_dlamch.c:762:8: error: expected ‘)’ before ‘try’
> ../HPL_dlamch.c:762:36: error: expected primary-expression before ‘try’
> if( try <= ( -EMIN ) ) { lexp = try; exbits++; goto l_10; }
> ^
> ../HPL_dlamch.c:762:36: error: expected ‘;’ before ‘try’
> ../HPL_dlamch.c:764:26: error: ‘uexp’ was not declared in this scope
> if( lexp == -EMIN ) { uexp = lexp; } else { uexp = try; exbits++; }
>   ^
> ../HPL_dlamch.c:764:48: error: ‘uexp’ was not declared in this scope
> if( lexp == -EMIN ) { uexp = lexp; } else { uexp = try; exbits++; }
> ^
> ../HPL_dlamch.c:764:55: error: expected primary-expression before ‘try’
> if( lexp == -EMIN ) { uexp = lexp; } else { uexp = try; exbits++; }
>^
> ../HPL_dlamch.c:764:55: error: expected ‘;’ before ‘try’
> ../HPL_dlamch.c:770:10: error: ‘uexp’ was not declared in this scope
> if( ( uexp+EMIN ) > ( -lexp-EMIN ) )
>   ^
> make[2]: *** [HPL_dlamch.o] Error 1
> make[2]: Leaving directory `/home/centos/hpl-2.3/src/auxil/impetus03'
> make[1]: *** [build_src] Error 2
> make[1]: Leaving directory `/home/centos/hpl-2.3'
> make: *** [build] Error 2
>
> I don't understand the nature of the problem or why it works with the old
> OMPI version and not with the new. Any help or pointer would be appreciated.
> Thanks.
> AFernandez
>
>



-- 
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/

Re: [OMPI users] Issues compiling HPL with OMPIv4.0.0

2019-04-03 Thread Gutierrez, Samuel K. via users
Hi,

It looks like you are using the C++ wrapper compiler (mpiCC) instead of the C 
wrapper compiler (mpicc). Perhaps using mpicc instead of mpiCC will resolve 
your issue.

Best,

Sam

On Apr 3, 2019, at 12:38 PM, 
afernan...@odyhpc.com wrote:

Hello,
I'm trying to compile HPL(v2.3) with OpenBLAS and OMPI. The compilation 
succeeds when using the old OMPI (v1.10.8) but fails with OMPI v4.0.0 (I'm 
still not using v4.0.1). The error is for an old subroutine that determines 
machine-specific arithmetic constants:

mpiCC -o HPL_dlamch.o -c   -I/home/centos/benchmarks/hpl-2.2/include 
-I/home/centos/benchmarks/hpl-2.2/include/impetus03  -I/opt/openmpi/include  
../HPL_dlamch.c
../HPL_dlamch.c: In function ‘void HPL_dlamc5(int, int, int, int, int*, 
double*)’:
../HPL_dlamch.c:749:67: error: expected unqualified-id before ‘try’
intexbits=1, expsum, i, lexp=1, nbits, try,
   ^
../HPL_dlamch.c:761:8: error: expected ‘{’ before ‘=’ token
try = (int)( (unsigned int)(lexp) << 1 );
^
../HPL_dlamch.c:761:8: error: expected ‘catch’ before ‘=’ token
../HPL_dlamch.c:761:8: error: expected ‘(’ before ‘=’ token
../HPL_dlamch.c:761:8: error: expected type-specifier before ‘=’ token
../HPL_dlamch.c:761:8: error: expected ‘)’ before ‘=’ token
../HPL_dlamch.c:761:8: error: expected ‘{’ before ‘=’ token
../HPL_dlamch.c:761:8: error: expected primary-expression before ‘=’ token
../HPL_dlamch.c:762:8: error: expected primary-expression before ‘try’
if( try <= ( -EMIN ) ) { lexp = try; exbits++; goto l_10; }
^
../HPL_dlamch.c:762:8: error: expected ‘)’ before ‘try’
../HPL_dlamch.c:762:36: error: expected primary-expression before ‘try’
if( try <= ( -EMIN ) ) { lexp = try; exbits++; goto l_10; }
^
../HPL_dlamch.c:762:36: error: expected ‘;’ before ‘try’
../HPL_dlamch.c:764:26: error: ‘uexp’ was not declared in this scope
if( lexp == -EMIN ) { uexp = lexp; } else { uexp = try; exbits++; }
  ^
../HPL_dlamch.c:764:48: error: ‘uexp’ was not declared in this scope
if( lexp == -EMIN ) { uexp = lexp; } else { uexp = try; exbits++; }
^
../HPL_dlamch.c:764:55: error: expected primary-expression before ‘try’
if( lexp == -EMIN ) { uexp = lexp; } else { uexp = try; exbits++; }
   ^
../HPL_dlamch.c:764:55: error: expected ‘;’ before ‘try’
../HPL_dlamch.c:770:10: error: ‘uexp’ was not declared in this scope
if( ( uexp+EMIN ) > ( -lexp-EMIN ) )
  ^
make[2]: *** [HPL_dlamch.o] Error 1
make[2]: Leaving directory `/home/centos/hpl-2.3/src/auxil/impetus03'
make[1]: *** [build_src] Error 2
make[1]: Leaving directory `/home/centos/hpl-2.3'
make: *** [build] Error 2

I don't understand the nature of the problem or why it works with the old OMPI 
version and not with the new. Any help or pointer would be appreciated.
Thanks.
AFernandez



Re: [OMPI users] Cannot catch std::bad_alloc?

2019-04-03 Thread Zhen Wang
OK. After help from several forums, I think I understand the cause of the
problem. As Jeff said, it has nothing to do with MPI.

Linux allows memory overcommit (thanks, Joseph). In my case, say the
machine has 32GB of RAM, 2GB is already in use, and each of the two MPI
processes tries to allocate 8GB at a time. Then this happens:

Initial memory usage: 2GB
After the first memory allocation: 18GB (2GB + 2 x 8GB)
In the second memory allocation, each MPI process believes it has room
for another 8GB because of overcommit, and writes to it. But that
requires 34GB (18GB + 2 x 8GB), which exceeds the RAM size. So the Linux
out-of-memory killer kills one of the MPI processes (it sends a SIGKILL
signal), and the other MPI process receives a SIGTERM signal.
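
The accounting above can be written out as a small sketch. The helper names touched_gb and exceeds_ram are made up for illustration, and the model is deliberately simplified: it ignores swap and any memory the kernel could reclaim.

```c
/* Memory actually written to after `rounds` allocation rounds, in GB:
 * the baseline usage plus one block per process per round. */
long touched_gb(long used_gb, int nprocs, long alloc_gb, int rounds)
{
    return used_gb + (long)nprocs * alloc_gb * rounds;
}

/* Returns 1 if the touched memory no longer fits in physical RAM,
 * which is the situation where the OOM killer may step in. */
int exceeds_ram(long ram_gb, long used_gb, int nprocs, long alloc_gb,
                int rounds)
{
    return touched_gb(used_gb, nprocs, alloc_gb, rounds) > ram_gb;
}
```

With 32GB of RAM, 2GB in use, two processes, and 8GB per allocation, one round gives 18GB and still fits, while two rounds give 34GB and do not.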

This also explains my questions above.

This problem doesn't happen on Windows because Windows doesn't allow
overcommit.

Thanks again everyone.

Best regards,
Zhen


On Wed, Apr 3, 2019 at 12:57 PM Jeff Hammond  wrote:

> This is not an MPI problem.  You will likely find StackOverflow to be a
> more effective way to get support on C++ issues.
>
> Jeff
>
> On Wed, Apr 3, 2019 at 8:47 AM Zhen Wang  wrote:
>
>> Joseph,
>>
>> Thanks for your response. I'm no expert on Linux so please bear with me.
>> If I understand correctly, using malloc instead of resize should allow me
>> to handle out of memory error properly, but I still see abnormal
>> termination (code is attached).
>>
>> I have more questions.
>>
>> 1. If the problem is overcommit, (meaning not related to MP I suppose)I,
>> why don't I see it if only MPI 0 calls resize? MPI 0 still overcommits and
>> bac_alloc is caught.
>>
>> 2. In resize, if the returned pointer is null, should it throw some kind
>> of error so the caller could catch and handle that? I don't know the
>> implementation but simply exiting doesn't seem a good idea.
>>
>> Thanks.
>>
>> Best regards,
>> Zhen
>>
>>
>> On Wed, Apr 3, 2019 at 10:02 AM Joseph Schuchart 
>> wrote:
>>
>>> Zhen,
>>>
>>> The "problem" you're running into is memory overcommit [1]. The system
>>> will happily hand you a pointer to memory upon calling malloc without
>>> actually allocating the pages (that's the first step in
>>> std::vector::resize) and then terminate your application as soon as it
>>> tries to actually allocate them if the system runs out of memory. This
>>> happens in std::vector::resize too, which sets each entry in the vector
>>> to it's initial value. There is no way you can catch that. You might
>>> want to try to disable overcommit in the kernel and see if
>>> std::vector::resize throws an exception because malloc fails.
>>>
>>> HTH,
>>> Joseph
>>>
>>> [1] https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
>>>
>>> On 4/3/19 3:26 PM, Zhen Wang wrote:
>>> > Hi,
>>> >
>>> > I have difficulty catching std::bac_alloc in an MPI environment. The
>>> > code is attached. I'm uisng gcc 6.3 on SUSE Linux Enterprise Server 11
>>> > (x86_64). OpenMPI is built from source. The commands are as follows:
>>> >
>>> > *Build*
>>> > g++ -I -L -lmpi
>>> memory.cpp
>>> >
>>> > *Run*
>>> >  -n 2 a.out
>>> >
>>> > *Output*
>>> > 0
>>> > 0
>>> > 1
>>> > 1
>>> >
>>> --
>>> > Primary job  terminated normally, but 1 process returned
>>> > a non-zero exit code. Per user-direction, the job has been aborted.
>>> >
>>> --
>>> >
>>> --
>>> > mpiexec noticed that process rank 0 with PID 0 on node cdcebus114qa05
>>> > exited on signal 9 (Killed).
>>> >
>>> --
>>> >
>>> >
>>> > If I uncomment the line //if (rank == 0), i.e., only rank 0 allocates
>>> > memory, I'm able to catch bad_alloc as I expected. It seems that I am
>>> > misunderstanding something. Could you please help? Thanks a lot.
>>> >
>>> >
>>> >
>>> > Best regards,
>>> > Zhen
>>> >
>
>
>
> --
> Jeff Hammond
> jeff.scie...@gmail.com
> http://jeffhammond.github.io/

[OMPI users] Issues compiling HPL with OMPIv4.0.0

2019-04-03 Thread afernandez
Hello,

I'm trying to compile HPL(v2.3) with OpenBLAS and OMPI. The compilation
succeeds when using the old OMPI (v1.10.8) but fails with OMPI v4.0.0 (I'm
still not using v4.0.1). The error is for an old subroutine that determines
machine-specific arithmetic constants:

 

mpiCC -o HPL_dlamch.o -c   -I/home/centos/benchmarks/hpl-2.2/include
-I/home/centos/benchmarks/hpl-2.2/include/impetus03  -I/opt/openmpi/include
../HPL_dlamch.c
../HPL_dlamch.c: In function 'void HPL_dlamc5(int, int, int, int, int*, double*)':
../HPL_dlamch.c:749:67: error: expected unqualified-id before 'try'
   int    exbits=1, expsum, i, lexp=1, nbits, try,
                                              ^
../HPL_dlamch.c:761:8: error: expected '{' before '=' token
   try = (int)( (unsigned int)(lexp) << 1 );
       ^
../HPL_dlamch.c:761:8: error: expected 'catch' before '=' token
../HPL_dlamch.c:761:8: error: expected '(' before '=' token
../HPL_dlamch.c:761:8: error: expected type-specifier before '=' token
../HPL_dlamch.c:761:8: error: expected ')' before '=' token
../HPL_dlamch.c:761:8: error: expected '{' before '=' token
../HPL_dlamch.c:761:8: error: expected primary-expression before '=' token
../HPL_dlamch.c:762:8: error: expected primary-expression before 'try'
   if( try <= ( -EMIN ) ) { lexp = try; exbits++; goto l_10; }
       ^
../HPL_dlamch.c:762:8: error: expected ')' before 'try'
../HPL_dlamch.c:762:36: error: expected primary-expression before 'try'
   if( try <= ( -EMIN ) ) { lexp = try; exbits++; goto l_10; }
                                   ^
../HPL_dlamch.c:762:36: error: expected ';' before 'try'
../HPL_dlamch.c:764:26: error: 'uexp' was not declared in this scope
   if( lexp == -EMIN ) { uexp = lexp; } else { uexp = try; exbits++; }
                         ^
../HPL_dlamch.c:764:48: error: 'uexp' was not declared in this scope
   if( lexp == -EMIN ) { uexp = lexp; } else { uexp = try; exbits++; }
                                               ^
../HPL_dlamch.c:764:55: error: expected primary-expression before 'try'
   if( lexp == -EMIN ) { uexp = lexp; } else { uexp = try; exbits++; }
                                                      ^
../HPL_dlamch.c:764:55: error: expected ';' before 'try'
../HPL_dlamch.c:770:10: error: 'uexp' was not declared in this scope
   if( ( uexp+EMIN ) > ( -lexp-EMIN ) )
         ^
make[2]: *** [HPL_dlamch.o] Error 1
make[2]: Leaving directory `/home/centos/hpl-2.3/src/auxil/impetus03'
make[1]: *** [build_src] Error 2
make[1]: Leaving directory `/home/centos/hpl-2.3'
make: *** [build] Error 2

 

I don't understand the nature of the problem or why it works with the old
OMPI version and not with the new. Any help or pointer would be appreciated.

Thanks.

AFernandez

 

 


Re: [OMPI users] Cannot catch std::bad_alloc?

2019-04-03 Thread Jeff Hammond
This is not an MPI problem.  You will likely find StackOverflow to be a
more effective way to get support on C++ issues.

Jeff

On Wed, Apr 3, 2019 at 8:47 AM Zhen Wang  wrote:

> Joseph,
>
> Thanks for your response. I'm no expert on Linux so please bear with me.
> If I understand correctly, using malloc instead of resize should allow me
> to handle out of memory error properly, but I still see abnormal
> termination (code is attached).
>
> I have more questions.
>
> 1. If the problem is overcommit, (meaning not related to MP I suppose)I,
> why don't I see it if only MPI 0 calls resize? MPI 0 still overcommits and
> bac_alloc is caught.
>
> 2. In resize, if the returned pointer is null, should it throw some kind
> of error so the caller could catch and handle that? I don't know the
> implementation but simply exiting doesn't seem a good idea.
>
> Thanks.
>
> Best regards,
> Zhen
>
>
> On Wed, Apr 3, 2019 at 10:02 AM Joseph Schuchart 
> wrote:
>
>> Zhen,
>>
>> The "problem" you're running into is memory overcommit [1]. The system
>> will happily hand you a pointer to memory upon calling malloc without
>> actually allocating the pages (that's the first step in
>> std::vector::resize) and then terminate your application as soon as it
>> tries to actually allocate them if the system runs out of memory. This
>> happens in std::vector::resize too, which sets each entry in the vector
>> to it's initial value. There is no way you can catch that. You might
>> want to try to disable overcommit in the kernel and see if
>> std::vector::resize throws an exception because malloc fails.
>>
>> HTH,
>> Joseph
>>
>> [1] https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
>>
>> On 4/3/19 3:26 PM, Zhen Wang wrote:
>> > Hi,
>> >
>> > I have difficulty catching std::bac_alloc in an MPI environment. The
>> > code is attached. I'm uisng gcc 6.3 on SUSE Linux Enterprise Server 11
>> > (x86_64). OpenMPI is built from source. The commands are as follows:
>> >
>> > *Build*
>> > g++ -I -L -lmpi
>> memory.cpp
>> >
>> > *Run*
>> >  -n 2 a.out
>> >
>> > *Output*
>> > 0
>> > 0
>> > 1
>> > 1
>> >
>> --
>> > Primary job  terminated normally, but 1 process returned
>> > a non-zero exit code. Per user-direction, the job has been aborted.
>> >
>> --
>> >
>> --
>> > mpiexec noticed that process rank 0 with PID 0 on node cdcebus114qa05
>> > exited on signal 9 (Killed).
>> >
>> --
>> >
>> >
>> > If I uncomment the line //if (rank == 0), i.e., only rank 0 allocates
>> > memory, I'm able to catch bad_alloc as I expected. It seems that I am
>> > misunderstanding something. Could you please help? Thanks a lot.
>> >
>> >
>> >
>> > Best regards,
>> > Zhen
>> >



-- 
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/

Re: [OMPI users] Cannot catch std::bad_alloc?

2019-04-03 Thread Zhen Wang
Joseph,

Thanks for your response. I'm no expert on Linux, so please bear with me. If
I understand correctly, using malloc instead of resize should allow me to
handle the out-of-memory error properly, but I still see abnormal termination
(code is attached).

I have more questions.

1. If the problem is overcommit (meaning it is not related to MPI, I suppose),
why don't I see it if only MPI rank 0 calls resize? Rank 0 still overcommits,
and bad_alloc is caught.

2. In resize, if the returned pointer is null, should it throw some kind of
error so the caller can catch and handle it? I don't know the
implementation, but simply exiting doesn't seem like a good idea.

Thanks.

Best regards,
Zhen


On Wed, Apr 3, 2019 at 10:02 AM Joseph Schuchart  wrote:

> Zhen,
>
> The "problem" you're running into is memory overcommit [1]. The system
> will happily hand you a pointer to memory upon calling malloc without
> actually allocating the pages (that's the first step in
> std::vector::resize) and then terminate your application as soon as it
> tries to actually allocate them if the system runs out of memory. This
> happens in std::vector::resize too, which sets each entry in the vector
> to it's initial value. There is no way you can catch that. You might
> want to try to disable overcommit in the kernel and see if
> std::vector::resize throws an exception because malloc fails.
>
> HTH,
> Joseph
>
> [1] https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
>
> On 4/3/19 3:26 PM, Zhen Wang wrote:
> > Hi,
> >
> > I have difficulty catching std::bad_alloc in an MPI environment. The
> > code is attached. I'm using gcc 6.3 on SUSE Linux Enterprise Server 11
> > (x86_64). OpenMPI is built from source. The commands are as follows:
> >
> > *Build*
> > g++ -I -L -lmpi memory.cpp
> >
> > *Run*
> >  -n 2 a.out
> >
> > *Output*
> > 0
> > 0
> > 1
> > 1
> >
> > --
> > Primary job  terminated normally, but 1 process returned
> > a non-zero exit code. Per user-direction, the job has been aborted.
> > --
> > --
> > mpiexec noticed that process rank 0 with PID 0 on node cdcebus114qa05
> > exited on signal 9 (Killed).
> > --
> >
> >
> > If I uncomment the line //if (rank == 0), i.e., only rank 0 allocates
> > memory, I'm able to catch bad_alloc as I expected. It seems that I am
> > misunderstanding something. Could you please help? Thanks a lot.
> >
> >
> >
> > Best regards,
> > Zhen
> >
#include "mpi.h"
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <unistd.h>

int main( int argc, char *argv[] )
{
  MPI_Init( &argc, &argv );

  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0)
  {
    double * a[100];
    for (long long i = 0; i < 100; i++)
    {
      std::cout << i << std::endl;
      // malloc reports failure by returning a null pointer, not by throwing.
      a[i] = (double *)malloc(1*sizeof(double));
      if (!a[i])
      {
        std::cout << "out" << std::endl;
        continue;
      }
      // Writing to the block forces the kernel to actually back the pages.
      memset(a[i], 0, 1*sizeof(double));
      usleep(100);
    }
  }

  MPI_Finalize();
  return 0;
}

Re: [OMPI users] Cannot catch std::bad_alloc?

2019-04-03 Thread Joseph Schuchart

Zhen,

The "problem" you're running into is memory overcommit [1]. The system
will happily hand you a pointer to memory upon calling malloc (that's the
first step in std::vector::resize) without actually allocating the pages,
and then terminate your application as soon as it tries to actually
allocate them if the system runs out of memory. This happens in
std::vector::resize too, which sets each entry in the vector to its
initial value. There is no way you can catch that. You might want to try
to disable overcommit in the kernel and see if std::vector::resize throws
an exception because malloc fails.


HTH,
Joseph

[1] https://www.kernel.org/doc/Documentation/vm/overcommit-accounting

On 4/3/19 3:26 PM, Zhen Wang wrote:

Hi,

I have difficulty catching std::bad_alloc in an MPI environment. The
code is attached. I'm using gcc 6.3 on SUSE Linux Enterprise Server 11
(x86_64). OpenMPI is built from source. The commands are as follows:


*Build*
g++ -I -L -lmpi memory.cpp

*Run*
 -n 2 a.out

*Output*
0
0
1
1
--
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--
--
mpiexec noticed that process rank 0 with PID 0 on node cdcebus114qa05 
exited on signal 9 (Killed).

--


If I uncomment the line //if (rank == 0), i.e., only rank 0 allocates 
memory, I'm able to catch bad_alloc as I expected. It seems that I am 
misunderstanding something. Could you please help? Thanks a lot.




Best regards,
Zhen


[OMPI users] Cannot catch std::bad_alloc?

2019-04-03 Thread Zhen Wang
Hi,

I have difficulty catching std::bad_alloc in an MPI environment. The code
is attached. I'm using gcc 6.3 on SUSE Linux Enterprise Server 11 (x86_64).
OpenMPI is built from source. The commands are as follows:

*Build*
g++ -I -L -lmpi memory.cpp

*Run*
 -n 2 a.out

*Output*
0
0
1
1
--
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--
--
mpiexec noticed that process rank 0 with PID 0 on node cdcebus114qa05
exited on signal 9 (Killed).
--


If I uncomment the line //if (rank == 0), i.e., only rank 0 allocates
memory, I'm able to catch bad_alloc as I expected. It seems that I am
misunderstanding something. Could you please help? Thanks a lot.



Best regards,
Zhen
#include "mpi.h"
#include <iostream>
#include <new>
#include <vector>
#include <unistd.h>

int main( int argc, char *argv[] )
{
  MPI_Init( &argc, &argv );

  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  //if (rank == 0)
  {
    std::vector<std::vector<double> > a(100);
    for (long long i = 0; i < 100; i++)
    {
      std::cout << i << std::endl;
      try
      {
        // resize allocates the storage and value-initializes every element.
        a[i].resize(10);
      }
      catch (const std::bad_alloc &b)
      {
        std::cout << "out" << std::endl;
        continue;
      }
      usleep(100);
    }
  }

  MPI_Finalize();
  return 0;
}