Re: [OMPI users] How to configure Intel Visual Fortran to work with OpenMPI

2012-01-18 Thread Shiqing Fan

Hi Robert,

This is a known issue. The released binary was built on Windows Server 
2008, which pulls in newer Windows system dependencies. We have fixed this 
problem, and the fix will be included in the next release. If you don't 
want to switch to another Windows version, I can build a working binary 
package and send it to you off-list; alternatively, you can wait for 
the upcoming release.



Best Regards,
Shiqing


On 2012-01-18 6:21 PM, Robert garcia wrote:

hi all,

I'm having difficulty running a Fortran code that uses parallel 
processing. I started by downloading the "OpenMPI_v1.5.4-1_win32.exe" 
installer. I've configured everything in Intel Visual Fortran to point to 
the correct static libraries and search paths, and I also added the 
directory where the OpenMPI DLLs (e.g. libmpi.dll, libmpid.dll, etc.) 
reside to the Path environment variable. The program compiles and links 
properly, but at runtime a message pops up: "The procedure entry point 
InterlockedCompareExchange64 could not be located in the dynamic link 
library KERNEL32.dll"

What can be the problem ?
Regards,






--
---
Shiqing Fan
High Performance Computing Center Stuttgart (HLRS)
Tel: ++49(0)711-685-87234  Nobelstrasse 19
Fax: ++49(0)711-685-65832  70569 Stuttgart
http://www.hlrs.de/organization/people/shiqing-fan/
email: f...@hlrs.de



Re: [OMPI users] Possible bug in finalize, OpenMPI v1.5, head revision

2012-01-18 Thread Andrew Senin
No, nothing specific. Only basic settings (--mca btl openib,self
--npernode 1, etc).

Actually, I'm very confused by this error because today it just
disappeared. I had two separate folders where it was reproduced in 100%
of test runs. Today I recompiled the source and it is gone in both
folders, yet yesterday I tried recompiling multiple times with no
effect. So I believe this must somehow be related to some unknown
settings in the lab that have changed. Trying to reproduce the
crash now...

Regards,
Andrew Senin.

On Thu, Jan 19, 2012 at 12:05 AM, Jeff Squyres wrote:
> Jumping in pretty late in this thread here...
>
> I see that it's failing in opal_hwloc_base_close().  That's a little
> worrisome.
>
> I do see an odd path through the hwloc initialization that *could* cause an 
> error during finalization -- but it would involve you setting an invalid 
> value for an MCA parameter.  Are you setting 
> hwloc_base_mem_bind_failure_action or
> hwloc_base_mem_alloc_policy, perchance?
>
>
> On Jan 16, 2012, at 1:56 PM, Andrew Senin wrote:
>
>> Hi,
>>
>> I think I've found a bug in the head revision of the OpenMPI 1.5
>> branch. If it is configured with --disable-debug, it crashes in
>> finalize on the hello_c.c example. Did I miss something?
>>
>> Configure options:
>> ./configure --with-pmi=/usr/ --with-slurm=/usr/ --without-psm
>> --disable-debug --enable-mpirun-prefix-by-default
>> --prefix=/hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install
>>
>> Runtime command and output:
>> LD_LIBRARY_PATH=$LD_LIBRARY_PATH:../lib ./mpirun --mca btl openib,self
>> --npernode 1 --host mir1,mir2 ./hello
>>
>> Hello, world, I am 0 of 2
>> Hello, world, I am 1 of 2
>> [mir1:05542] *** Process received signal ***
>> [mir1:05542] Signal: Segmentation fault (11)
>> [mir1:05542] Signal code: Address not mapped (1)
>> [mir1:05542] Failing at address: 0xe8
>> [mir2:10218] *** Process received signal ***
>> [mir2:10218] Signal: Segmentation fault (11)
>> [mir2:10218] Signal code: Address not mapped (1)
>> [mir2:10218] Failing at address: 0xe8
>> [mir1:05542] [ 0] /lib64/libpthread.so.0() [0x390d20f4c0]
>> [mir1:05542] [ 1]
>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(+0x1346a8)
>> [0x7f4588cee6a8]
>> [mir1:05542] [ 2]
>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(opal_hwloc_base_close+0x32)
>> [0x7f4588cee700]
>> [mir1:05542] [ 3]
>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(opal_finalize+0x73)
>> [0x7f4588d1beb2]
>> [mir1:05542] [ 4]
>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(orte_finalize+0xfe)
>> [0x7f4588c81eb5]
>> [mir1:05542] [ 5]
>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(ompi_mpi_finalize+0x67a)
>> [0x7f4588c217c3]
>> [mir1:05542] [ 6]
>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(PMPI_Finalize+0x59)
>> [0x7f4588c39959]
>> [mir1:05542] [ 7] ./hello(main+0x69) [0x4008fd]
>> [mir1:05542] [ 8] /lib64/libc.so.6(__libc_start_main+0xfd) [0x390ca1ec5d]
>> [mir1:05542] [ 9] ./hello() [0x4007d9]
>> [mir1:05542] *** End of error message ***
>> [mir2:10218] [ 0] /lib64/libpthread.so.0() [0x3a6dc0f4c0]
>> [mir2:10218] [ 1]
>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(+0x1346a8)
>> [0x7f409f31d6a8]
>> [mir2:10218] [ 2]
>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(opal_hwloc_base_close+0x32)
>> [0x7f409f31d700]
>> [mir2:10218] [ 3]
>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(opal_finalize+0x73)
>> [0x7f409f34aeb2]
>> [mir2:10218] [ 4]
>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(orte_finalize+0xfe)
>> [0x7f409f2b0eb5]
>> [mir2:10218] [ 5]
>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(ompi_mpi_finalize+0x67a)
>> [0x7f409f2507c3]
>> [mir2:10218] [ 6]
>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(PMPI_Finalize+0x59)
>> [0x7f409f268959]
>> [mir2:10218] [ 7] ./hello(main+0x69) [0x4008fd]
>> [mir2:10218] [ 8] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3a6d41ec5d]
>> [mir2:10218] [ 9] ./hello() [0x4007d9]
>> [mir2:10218] *** End of error message ***
>> --
>> mpirun noticed that process rank 0 with PID 5542 on node mir1 exited
>> on signal 11 (Segmentation fault).
>> -
>>
>> Thanks,
>> Andrew Senin
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>

[OMPI users] Bug Report for MPI_Alltoall

2012-01-18 Thread David Race
One of our users makes use of the MPI_IN_PLACE option, but there appears to be 
a bug in MPI_Alltoall. According to the specification:


The “in place” option for intracommunicators is specified by passing 
MPI_IN_PLACE to the argument sendbuf at all processes. In such a case, 
sendcount and sendtype are ignored. The data to be sent is taken from the 
recvbuf and replaced by the received data. Data sent and received must have 
the same type map as specified by recvcount and recvtype.

The application fails with 


[prod-0002:12156] *** An error occurred in MPI_Alltoall
[prod-0002:12156] *** on communicator MPI_COMM_WORLD
[prod-0002:12156] *** MPI_ERR_ARG: invalid argument of some other kind
[prod-0002:12156] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)

The file below shows the potential bug:



//
//  test program for the potential invalid argument bug
//
//  David Race
//  18 Jan 2012
//

// (the original header names were stripped by the archive; standard C headers assumed)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
//
//  mpi
//
#include "mpi.h"
#define  MAX_SIZE 32
//
//
//
int main ( int argc, char *argv[] )
{
  //
  //  definitions
  //
  int mpierror, isize, myRank;
  int typeSize;
  int valA[MAX_SIZE], valB[MAX_SIZE];
  int i, j;
  int commRoot;
  //
  //  start processing
  //
  printf("Start of program\n");
  printf("SIZE OF VALA %ld\n", sizeof(valA));

  mpierror = MPI_Init ( &argc, &argv );
  mpierror = MPI_Comm_rank ( MPI_COMM_WORLD, &myRank );
  mpierror = MPI_Comm_size ( MPI_COMM_WORLD, &isize );
  MPI_Barrier(MPI_COMM_WORLD);
  //
  //  test the mpi_type_size using MPI_Alltoall
  //
  if (myRank == 0) {
    printf("=\n");
    printf("   Alltoall : Should work\n");
    printf("=\n");
  }
  fflush(stdout);
  for(i=0;i
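
[The rest of the listing was truncated by the archive. For reference, here is a
minimal C++ sketch of the same MPI_IN_PLACE pattern described in the
specification text quoted above; it is an assumed reconstruction of the idea,
not the original test program.]

// Minimal sketch (not the original attachment): MPI_Alltoall with MPI_IN_PLACE,
// where the data to be sent is taken from recvbuf.
#include <mpi.h>
#include <vector>
#include <cstdio>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    int rank = 0, nprocs = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    // One int destined for each rank; slot i initially holds rank*100 + i.
    std::vector<int> recvbuf(nprocs);
    for (int i = 0; i < nprocs; ++i)
        recvbuf[i] = rank * 100 + i;

    // With MPI_IN_PLACE as sendbuf, sendcount and sendtype are ignored.
    MPI_Alltoall(MPI_IN_PLACE, 1, MPI_INT,
                 &recvbuf[0], 1, MPI_INT, MPI_COMM_WORLD);

    // After the exchange, slot i holds the value rank i placed for this rank.
    for (int i = 0; i < nprocs; ++i)
        std::printf("rank %d received %d from rank %d\n", rank, recvbuf[i], i);

    MPI_Finalize();
    return 0;
}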

[OMPI users] Checkpoint an MPI process

2012-01-18 Thread Rodrigo Oliveira
Hi,

I'd like to know if there is a way to checkpoint a specific process running
under an mpirun call. In other words, is there a function CHECKPOINT(rank)
in which I can pass the rank of the process I want to checkpoint? I do not
want to checkpoint the entire application, but just one of its processes.

Thanks


Re: [OMPI users] Possible bug in finalize, OpenMPI v1.5, head revision

2012-01-18 Thread Jeff Squyres
Jumping in pretty late in this thread here...

I see that it's failing in opal_hwloc_base_close().  That's a little worrisome.

I do see an odd path through the hwloc initialization that *could* cause an 
error during finalization -- but it would involve you setting an invalid value 
for an MCA parameter.  Are you setting hwloc_base_mem_bind_failure_action or 
hwloc_base_mem_alloc_policy, perchance?


On Jan 16, 2012, at 1:56 PM, Andrew Senin wrote:

> Hi,
> 
> I think I've found a bug in the head revision of the OpenMPI 1.5
> branch. If it is configured with --disable-debug, it crashes in
> finalize on the hello_c.c example. Did I miss something?
> 
> Configure options:
> ./configure --with-pmi=/usr/ --with-slurm=/usr/ --without-psm
> --disable-debug --enable-mpirun-prefix-by-default
> --prefix=/hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install
> 
> Runtime command and output:
> LD_LIBRARY_PATH=$LD_LIBRARY_PATH:../lib ./mpirun --mca btl openib,self
> --npernode 1 --host mir1,mir2 ./hello
> 
> Hello, world, I am 0 of 2
> Hello, world, I am 1 of 2
> [mir1:05542] *** Process received signal ***
> [mir1:05542] Signal: Segmentation fault (11)
> [mir1:05542] Signal code: Address not mapped (1)
> [mir1:05542] Failing at address: 0xe8
> [mir2:10218] *** Process received signal ***
> [mir2:10218] Signal: Segmentation fault (11)
> [mir2:10218] Signal code: Address not mapped (1)
> [mir2:10218] Failing at address: 0xe8
> [mir1:05542] [ 0] /lib64/libpthread.so.0() [0x390d20f4c0]
> [mir1:05542] [ 1]
> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(+0x1346a8)
> [0x7f4588cee6a8]
> [mir1:05542] [ 2]
> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(opal_hwloc_base_close+0x32)
> [0x7f4588cee700]
> [mir1:05542] [ 3]
> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(opal_finalize+0x73)
> [0x7f4588d1beb2]
> [mir1:05542] [ 4]
> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(orte_finalize+0xfe)
> [0x7f4588c81eb5]
> [mir1:05542] [ 5]
> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(ompi_mpi_finalize+0x67a)
> [0x7f4588c217c3]
> [mir1:05542] [ 6]
> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(PMPI_Finalize+0x59)
> [0x7f4588c39959]
> [mir1:05542] [ 7] ./hello(main+0x69) [0x4008fd]
> [mir1:05542] [ 8] /lib64/libc.so.6(__libc_start_main+0xfd) [0x390ca1ec5d]
> [mir1:05542] [ 9] ./hello() [0x4007d9]
> [mir1:05542] *** End of error message ***
> [mir2:10218] [ 0] /lib64/libpthread.so.0() [0x3a6dc0f4c0]
> [mir2:10218] [ 1]
> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(+0x1346a8)
> [0x7f409f31d6a8]
> [mir2:10218] [ 2]
> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(opal_hwloc_base_close+0x32)
> [0x7f409f31d700]
> [mir2:10218] [ 3]
> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(opal_finalize+0x73)
> [0x7f409f34aeb2]
> [mir2:10218] [ 4]
> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(orte_finalize+0xfe)
> [0x7f409f2b0eb5]
> [mir2:10218] [ 5]
> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(ompi_mpi_finalize+0x67a)
> [0x7f409f2507c3]
> [mir2:10218] [ 6]
> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(PMPI_Finalize+0x59)
> [0x7f409f268959]
> [mir2:10218] [ 7] ./hello(main+0x69) [0x4008fd]
> [mir2:10218] [ 8] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3a6d41ec5d]
> [mir2:10218] [ 9] ./hello() [0x4007d9]
> [mir2:10218] *** End of error message ***
> --
> mpirun noticed that process rank 0 with PID 5542 on node mir1 exited
> on signal 11 (Segmentation fault).
> -
> 
> Thanks,
> Andrew Senin


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] mpirun hangs when used on more than 2 CPUs ( mpirun compiled without thread support )

2012-01-18 Thread Theiner, Andre
Thanks, Jeff and Ralph, for your help.
I do not know yet whether OpenFOAM uses threads with OpenMPI, but I will find 
out.

I ran "ompi_info" and it output the lines below.
The important line is "Thread support: posix (mpi: no, progress: no)".
At first sight that line made me think I had found the cause of the problem, 
but I compared the output with that of the same command on another machine 
where OpenFOAM runs fine. The OpenMPI version on that machine is 1.3.2-1.1 and 
it also does not have thread support.
The difference, though, is that that machine's OpenFOAM version is 1.7.1 rather 
than 2.0.1, and its OS is SUSE Linux Enterprise Desktop 11 SP1 rather than 
openSUSE 11.3.
So I am at the beginning of the search for the cause of the problem.

Package: Open MPI abuild@build30 Distribution
Open MPI: 1.3.2
Open MPI SVN revision: r21054
Open MPI release date: Apr 21, 2009
Open RTE: 1.3.2
Open RTE SVN revision: r21054
Open RTE release date: Apr 21, 2009
OPAL: 1.3.2
OPAL SVN revision: r21054
OPAL release date: Apr 21, 2009
Ident string: 1.3.2
Prefix: /usr/lib64/mpi/gcc/openmpi
Configured architecture: x86_64-unknown-linux-gnu
Configure host: build30
Configured by: abuild
Configured on: Fri Sep 23 05:58:54 UTC 2011
Configure host: build30
Built by: abuild
Built on: Fri Sep 23 06:11:31 UTC 2011
Built host: build30
C bindings: yes
C++ bindings: yes
Fortran77 bindings: yes (all)
Fortran90 bindings: yes
Fortran90 bindings size: small
C compiler: gcc
C compiler absolute: /usr/bin/gcc
C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
Fortran77 compiler: gfortran
Fortran77 compiler abs: /usr/bin/gfortran
Fortran90 compiler: gfortran
Fortran90 compiler abs: /usr/bin/gfortran
C profiling: yes
C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: yes
C++ exceptions: no
Thread support: posix (mpi: no, progress: no)
Sparse Groups: no
Internal debug support: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
Heterogeneous support: no
mpirun default --prefix: no
MPI I/O support: yes
MPI_WTIME support: gettimeofday
Symbol visibility support: yes
FT Checkpoint support: no (checkpoint thread: no)
MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.3.2)
MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.3.2)
MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.3.2)
MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.3.2)
MCA carto: file (MCA v2.0, API v2.0, Component v1.3.2)
MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.3.2)
MCA timer: linux (MCA v2.0, API v2.0, Component v1.3.2)
MCA installdirs: env (MCA v2.0, API v2.0, Component v1.3.2)
MCA installdirs: config (MCA v2.0, API v2.0, Component v1.3.2)
MCA dpm: orte (MCA v2.0, API v2.0, Component v1.3.2)
MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.3.2)
MCA allocator: basic (MCA v2.0, API v2.0, Component v1.3.2)
MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: basic (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: inter (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: self (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: sm (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: sync (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: tuned (MCA v2.0, API v2.0, Component v1.3.2)
MCA io: romio (MCA v2.0, API v2.0, Component v1.3.2)
MCA mpool: fake (MCA v2.0, API v2.0, Component v1.3.2)
MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.3.2)
MCA mpool: sm (MCA v2.0, API v2.0, Component v1.3.2)
MCA pml: cm (MCA v2.0, API v2.0, Component v1.3.2)
MCA pml: csum (MCA v2.0, API v2.0, Component v1.3.2)
MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.3.2)
MCA pml: v (MCA v2.0, API v2.0, Component v1.3.2)
MCA bml: r2 (MCA v2.0, API v2.0, Component v1.3.2)
MCA rcache: vma (MCA v2.0, API v2.0, Component v1.3.2)
MCA btl: self (MCA v2.0, API v2.0, Component v1.3.2)
MCA btl: sm (MCA v2.0, API v2.0, Component v1.3.2)
MCA btl: tcp (MCA v2.0, API v2.0, Component v1.3.2)
MCA topo: unity (MCA v2.0, API 
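
[The ompi_info output above reflects the library's build-time configuration.
Independent of that, a small runtime check with MPI_Init_thread shows what
thread level the installed library actually provides; the following is a
minimal sketch, not from the original thread.]

// Minimal sketch: request the highest thread level and report what the
// library grants.
#include <mpi.h>
#include <cstdio>

int main(int argc, char *argv[])
{
    int provided = MPI_THREAD_SINGLE;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        const char *level =
            (provided == MPI_THREAD_MULTIPLE)   ? "MPI_THREAD_MULTIPLE" :
            (provided == MPI_THREAD_SERIALIZED) ? "MPI_THREAD_SERIALIZED" :
            (provided == MPI_THREAD_FUNNELED)   ? "MPI_THREAD_FUNNELED" :
                                                  "MPI_THREAD_SINGLE";
        std::printf("requested MPI_THREAD_MULTIPLE, library provided %s\n", level);
    }

    MPI_Finalize();
    return 0;
}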

[OMPI users] MPI_Type_struct for template class with dynamic arrays and objs. instantiated from other classes

2012-01-18 Thread Victor Pomponiu
Hi,

For several days I have been trying to create an MPI derived datatype in order
to send/receive a user-defined object. I'm trying to implement it using
MPI::Datatype::Create_struct.
I have consulted several threads from the archive

http://www.open-mpi.org/community/lists/users/2012/01/18093.php
http://www.open-mpi.org/community/lists/users/2005/08/0123.php
http://www.open-mpi.org/community/lists/users/2008/08/6302.php

but I'm still having difficulty solving this issue.
There are some particular features that make the task more difficult. Let me
explain: the object I want to transmit is instantiated from a class called
MemberBlock. This class is a template class and contains dynamically allocated
arrays and objects instantiated from other classes. Below are the class
declarations.

So how can I construct an MPI derived datatype in this situation?
Any suggestions are highly appreciated.

Thank you,
Victor Pomponiu

-
/**
 * VecData.h: Interface class for data appearing in vector format.
 */
#include "DistData.h"   // Interface class for data having a pairwise distance measure.

class VecData: public DistData {

public:
  // no properties, only public/private methods
  ...
};

/**
 * VecDataBlock.h: Base class for storable data having a pairwise
 * distance measure.
 */

class VecDataBlock {

public:
  VecData** dataList;     // Array of data items for this block.
  int numItems;           // Number of data items assigned to the block.
  int blockID;            // Integer identifier for this block.
  int sampleID;           // The sample identifier for this block.
  int globalOffset;       // Index of the first block item relative to the full data set.
  char* fileNamePrefix;   // The file name prefix used for saving data to disk.
  char commentChar;       // The character denoting input comment lines.

  // methods ...
};


/**
 * MemberBlock.h: Class storing and managing member lists for a given
 *                block of data objects.
 */

class MemberBlock_base {
public:
  virtual ~MemberBlock_base () {};
};

template <typename ScoreType>
class MemberBlock: public MemberBlock_base {

public:
  char* fileNamePrefix;          // The file name prefix for the block save file.
  ofstream* saveFile;            // Refers to an open file currently being used for accumulating ...
  VecDataBlock* dataBlock;       // The block of data items upon which ...
  int globalOffset;              // The position of this block with respect to the global ordering.
  int numItems;                  // The number of data items assigned to the block.
  int sampleLevel;               // The sample level from which ...
  ScoreType** memberScoreLList;  // The scores of members associated with each data item.
  int** memberIndexLList;        // For each data item, a list of global indices of its members.
  int* memberSizeList;           // The number of list members.

  int memberListBufferSize;      // Buffer size for storing an individual member list.
  int saveCount;                 // Keeps track of the number of member lists saved.
  float* tempDistBuffer;         // A temporary buffer for storing distances, used for breaking ...

  // methods
};
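
[Not an answer from the original thread, but a common pattern for classes like
MemberBlock: an MPI datatype can only describe a fixed memory layout, so
pointer-based members (dataList, saveFile, memberScoreLList, etc.) cannot be
folded into a single derived type. One usual approach is to describe the
fixed-size scalar members with MPI_Type_create_struct (the C API; the C++
bindings were deprecated in MPI-2.2) and send or pack the dynamically
allocated arrays separately. Below is a minimal sketch on a simplified,
hypothetical BlockHeader struct, assuming at least two ranks.]

#include <mpi.h>
#include <cstdio>

// Hypothetical, simplified stand-in for the fixed-size part of a block.
// The dynamically allocated lists would be transferred in follow-up
// messages sized by these fields.
struct BlockHeader {
    int numItems;
    int blockID;
    int sampleID;
    int globalOffset;
};

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Describe one contiguous block of 4 ints, with its displacement
    // measured from the start of the struct.
    BlockHeader  hdr = { 0, 0, 0, 0 };
    int          blocklens[1] = { 4 };
    MPI_Datatype types[1]     = { MPI_INT };
    MPI_Aint     base, disps[1];
    MPI_Get_address(&hdr, &base);
    MPI_Get_address(&hdr.numItems, &disps[0]);
    disps[0] -= base;

    MPI_Datatype headerType;
    MPI_Type_create_struct(1, blocklens, disps, types, &headerType);
    MPI_Type_commit(&headerType);

    if (rank == 0) {
        hdr.numItems = 42;
        hdr.blockID  = 7;
        MPI_Send(&hdr, 1, headerType, 1, 0, MPI_COMM_WORLD);
        // ...followed by ordinary sends of the hdr.numItems-sized arrays.
    } else if (rank == 1) {
        MPI_Recv(&hdr, 1, headerType, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::printf("rank 1: blockID %d, numItems %d\n", hdr.blockID, hdr.numItems);
    }

    MPI_Type_free(&headerType);
    MPI_Finalize();
    return 0;
}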