Re: [OMPI users] MPI_Comm_set_errhandler: error in Fortran90 Interface mpi.mod

2010-05-03 Thread Jeff Squyres
Paul --

Most excellent; thanks for the diagnosis and the reproducer.  You are 
absolutely correct that we have a bug in the F90 interface for 
MPI_COMM_SET_ERRHANDLER and MPI_WIN_SET_ERRHANDLER.  The INTENT of the 
communicator parameter was mistakenly set to INOUT instead of just IN, meaning 
that a constant such as MPI_COMM_WORLD cannot be passed.
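
For reference, this is roughly what the corrected explicit interface in the 
f90 module should look like (a sketch only, with illustrative argument names, 
not the actual Open MPI source):

   interface
      subroutine MPI_Comm_set_errhandler(comm, errhandler, ierr)
         integer, intent(in)  :: comm        ! was mistakenly INTENT(INOUT)
         integer, intent(in)  :: errhandler
         integer, intent(out) :: ierr
      end subroutine MPI_Comm_set_errhandler
   end interface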

A workaround is to assign MPI_COMM_WORLD to a temporary integer variable and 
use that instead.
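
In code, the workaround looks something like this (a minimal sketch against 
Paul's reproducer; the variable name my_comm is just illustrative, and 
errhandler/ierr come from the surrounding program):

   integer :: my_comm
   my_comm = MPI_COMM_WORLD                                  ! copy the constant into a plain variable...
   call MPI_Comm_set_errhandler(my_comm, errhandler, ierr)   ! ...which the f90 binding will accept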

The fix for OMPI is very easy, but I need to double-check with some Fortran 
experts tomorrow about the ABI implications for libmpif90.so.


On May 3, 2010, at 10:44 AM, Paul Kapinos wrote:

> Hello OpenMPI / Sun/Oracle MPI folks,
> 
> we believe that OpenMPI and Sun MPI (Cluster Tools) have an error in
> the Fortran 90 (f90) bindings of the MPI_Comm_set_errhandler routine.
> 
> Tested MPI versions: OpenMPI/1.3.3 and Cluster Tools 8.2.1
> 
> Consider the attached example. This file uses "USE MPI" to bind the
> MPI routines f90-style; the f77-style "include 'mpif.h'" is commented out.
> 
> With Intel MPI the attached example runs error-free (with both
> bindings).
> 
> When compiling with OpenMPI and the f90 bindings, every compiler
> tested (Intel/11.1, Sun Studio/12.1, gcc/4.1) reports that the code cannot be
> built because a constant (MPI_COMM_WORLD) is used as input.
> 
> For example, the output of the Intel compiler:
> -
> MPI_Comm_set_errhandler.f90(12): error #6638: An actual argument is an
> expression or constant; this is not valid since the associated dummy
> argument has the explicit INTENT(OUT) or INTENT(INOUT) attribute.   [0]
> call MPI_Comm_set_errhandler (MPI_COMM_WORLD, errhandler, ierr)  ! MPI_COMM_WORLD in MPI_Comm_set_errhandler is the problem...
> --^
> compilation aborted for MPI_Comm_set_errhandler.f90 (code 1)
> -
> With the f77 bindings, the attached program compiles and runs fine.
> 
> The older (deprecated) routine MPI_Errhandler_set, which is defined to
> have the same functionality, works fine with both bindings and all MPIs.
> 
> So, we believe the OpenMPI implementation of the MPI standard erroneously
> sets the INTENT(OUT) or INTENT(INOUT) attribute on the communicator
> argument. Setting an error handler on MPI_COMM_WORLD should be possible,
> but currently it is not.
> 
> Best wishes,
> Paul Kapinos
> 
> 
> 
> 
> 
> --
> Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
> RWTH Aachen University, Center for Computing and Communication
> Seffenter Weg 23,  D 52074  Aachen (Germany)
> Tel: +49 241/80-24915
> 
> PROGRAM sunerr
> USE MPI   ! f90: Error on MPI_Comm_set_errhandler if using this with OpenMPI / Sun MPI
> !include 'mpif.h'  ! f77: Works fine with all MPI's tested
> IMPLICIT NONE
> !
> integer :: data = 1, errhandler, ierr
> external AbortWithMessage
> !
> call MPI_Init(ierr)
> call MPI_Comm_create_errhandler (AbortWithMessage, errhandler, ierr)  ! Creating a handle: no problem
> 
> call MPI_Comm_set_errhandler (MPI_COMM_WORLD, errhandler, ierr)  ! MPI_COMM_WORLD in MPI_Comm_set_errhandler is the problem... in f90
> !call MPI_Errhandler_set (MPI_COMM_WORLD, errhandler, ierr)    ! and this deprecated function works fine for both f77 and f90
> 
> 
> ! ... an erroneous MPI routine ... 
> call MPI_Send (data, 1, MPI_INTEGER, 1, -12, MPI_COMM_WORLD, ierr)
> call MPI_Finalize( ierr )
> 
> END PROGRAM sunerr
> 
> 
> 
> subroutine AbortWithMessage (comm, errorcode)
>  use mpi
>  implicit none
>  integer :: comm, errorcode
>  character(LEN=MPI_MAX_ERROR_STRING) :: errstr
>  integer :: stringlength, ierr
>  call MPI_Error_string (errorcode, errstr, stringlength, ierr)
>  write (*,*) 'Error:  =+=>  ', errstr, ' =+=> Aborting'
>  call MPI_Abort (comm, errorcode, ierr)
> end subroutine AbortWithMessage
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




[OMPI users] MPIError:MPI_Recv: MPI_ERR_TRUNCATE:

2010-05-03 Thread Pooja Varshneya

Hi All,

I have written a program in which the MPI master sends and receives large 
amounts of data, i.e. from 1 KB to 1 MB per message.

The amount of data to be sent with each call is different.

The program runs well with 5 slaves, but when I try to run the same program 
with 9 slaves, it fails with an MPI_Recv: MPI_ERR_TRUNCATE: message truncated 
error.

I am using the Boost.MPI and Boost.Serialization libraries for sending data.
I understand that the internal buffers on the master are overrun in this 
case. Is there a way I can increase the buffer sizes?


Here is the output:
-bash-3.2$ mpirun -np 9 --hostfile hostfile2 --rankfile rankfile2 $BENCHMARKS_ROOT/bin/boost_binomial_LB 10 5000_steps.txt 5000_homo_bytes.txt
Master: Starting Binomial Option Price calculations for American call option

Master: Current stock price: 110
Master: Strike price: 100
Master: Risk-free rate: 1.05
Master: Volatility (annualized): 0.15
Master: Time (years): 1
Master: Number of calculations: 10

Slave 1:Going to Received Skeleton: 1
Slave 1:Received Skeleton: 1
Slave 1:Gpoing to Received Payload: 1
Slave 1:Received Payload: 1
Master: Sent initial message
Master: Sent initial message
Master: Sent initial message
Slave 2:Going to Received Skeleton: 2
Slave 2:Received Skeleton: 2
Slave 2:Gpoing to Received Payload: 2
Slave 2:Received Payload: 2
Slave 3:Going to Received Skeleton: 3
Slave 3:Received Skeleton: 3
Slave 3:Gpoing to Received Payload: 3
Slave 3:Received Payload: 3
Slave 4:Going to Received Skeleton: 4
Slave 4:Received Skeleton: 4
Slave 4:Gpoing to Received Payload: 4
Slave 1: Sent Response Skeleton: 1
Master: Sent initial message
Slave 4:Received Payload: 4
Slave 5:Going to Received Skeleton: 5
terminate called after throwing an instance of 'boost::exception_detail::clone_impl'

  what():  MPI_Recv: MPI_ERR_TRUNCATE: message truncated
[rh5x64-u12:26987] *** Process received signal ***
[rh5x64-u12:26987] Signal: Aborted (6)
[rh5x64-u12:26987] Signal code:  (-6)
[rh5x64-u12:26987] [ 0] /lib64/libpthread.so.0 [0x3ba680e7c0]
[rh5x64-u12:26987] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3ba5c30265]
[rh5x64-u12:26987] [ 2] /lib64/libc.so.6(abort+0x110) [0x3ba5c31d10]
[rh5x64-u12:26987] [ 3] /usr/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x114) [0x3bb7abec44]
[rh5x64-u12:26987] [ 4] /usr/lib64/libstdc++.so.6 [0x3bb7abcdb6]
[rh5x64-u12:26987] [ 5] /usr/lib64/libstdc++.so.6 [0x3bb7abcde3]
[rh5x64-u12:26987] [ 6] /usr/lib64/libstdc++.so.6 [0x3bb7abceca]
[rh5x64-u12:26987] [ 7] /userdata/testing/benchmark_binaries/bin/boost_binomial_LB(_ZN5boost15throw_exceptionINS_3mpi9exceptionEEEvRKT_+0x172) [0x4216a2]
[rh5x64-u12:26987] [ 8] /usr/local/lib/libboost_mpi.so.1.42.0(_ZN5boost3mpi6detail19packed_archive_recvEP19ompi_communicator_tiiRNS0_15packed_iarchiveER20ompi_status_public_t+0x16b) [0x2b0317faa6b3]
[rh5x64-u12:26987] [ 9] /usr/local/lib/libboost_mpi.so.1.42.0(_ZNK5boost3mpi12communicator4recvINS0_15packed_iarchiveEEENS0_6statusEiiRT_+0x40) [0x2b0317f9c72a]
[rh5x64-u12:26987] [10] /usr/local/lib/libboost_mpi.so.1.42.0(_ZNK5boost3mpi12communicator4recvINS0_24packed_skeleton_iarchiveEEENS0_6statusEiiRT_+0x38) [0x2b0317f9c76c]
[rh5x64-u12:26987] [11] /userdata/testing/benchmark_binaries/bin/boost_binomial_LB(_ZNK5boost3mpi12communicator4recvI31Binomial_Option_Pricing_RequestEENS0_6statusEiiRKNS0_14skeleton_proxyIT_EE+0x121) [0x4258c1]
[rh5x64-u12:26987] [12] /userdata/testing/benchmark_binaries/bin/boost_binomial_LB(main+0x409) [0x41d369]
[rh5x64-u12:26987] [13] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3ba5c1d994]
[rh5x64-u12:26987] [14] /userdata/testing/benchmark_binaries/bin/boost_binomial_LB(__gxx_personality_v0+0x399) [0x419e69]

[rh5x64-u12:26987] *** End of error message ***
[rh5x64-u11.zlab.local][[47840,1],0][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)

--
mpirun noticed that process rank 5 with PID 26987 on node 172.10.0.112 exited on signal 6 (Aborted).

--

Here is the program code:

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#include 

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#include "ace/OS_NS_sys_time.h"
#include "ace/OS_NS_time.h"
#include "ace/Profile_Timer.h"

using namespace MPI;
using std::scientific;
using namespace std;

namespace mpi = boost::mpi;

#define STOPTAG 0

std::ofstream output_file;

static void master (int & n_calls,
std::string _file_name,
 

Re: [OMPI users] Can compute, but can not output files

2010-05-03 Thread Mohamad Chaarawi
One thing to check is that you specified the CFLAGS/LDFLAGS/LIBS for
pvfs2 when you configured OMPI.

That's what I do to get OMPI to work over pvfs2 on our cluster:

./configure CFLAGS=-I/path-to-pvfs2/include/
LDFLAGS=-L/path-to-pvfs2/lib/ LIBS="-lpvfs2 -lpthread"
--with-wrapper-cflags=-I/path-to-pvfs2/include/
--with-wrapper-ldflags=-L/path-to-pvfs2/lib/
--with-wrapper-libs="-lpvfs2 -lpthread"
--with-io-romio-flags="--with-file-system=pvfs2+ufs+nfs
--with-pvfs2=/path-to-pvfs2/" ...

Thanks
Mohamad

JiangjunZheng wrote:
> Dear All,
>
> I am using Rocks+openmpi+hdf5+pvfs2. The software on the Rocks+pvfs2
> cluster outputs HDF5 files after computing. However, when the
> output starts, it shows errors:
> [root@nanohv pvfs2]# ./hdf5_mpio DH-ey-001400.20.h5
> Testing simple C MPIO program with 1 processes accessing file
> DH-ey-001400.20.h5
> (Filename can be specified via program argument)
> Proc 0: hostname=nanohv.columbia.edu
> Proc 0: MPI_File_open failed (MPI_ERR_IO: input/output error)
> --
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with
> errorcode 1.
>
> If run in a non-shared folder on the main node, the program runs
> well. It shows:
> Proc 0: hostname=nanohv.columbia.edu
> Proc 0: all tests passed
>
> The following are the settings of PATH and LD_LIBRARY_PATH on one of
> the nodes (I don't know whether it is because the hdf5 program cannot
> find something from the openmpi I/O layer. What is needed for it to
> read and write files?):
> [root@compute-0-3 ~]# $PATH
> -bash: /usr/kerberos/sbin:/usr/kerberos/bin:/usr/java/latest/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/rocks/bin:/opt/rocks/sbin:/opt/gm/bin:/opt/hdf5/bin:/opt/meep-mpi/bin:/opt/openmpi/bin:/opt/pvfs2/bin:/root/bin:
>
> [root@compute-0-3 ~]# $LD_LIBRARY_PATH
> -bash: :/opt/gm/lib:/opt/hdf5/lib:/opt/meep-mpi/lib:/opt/openmpi/lib:/opt/pvfs2/lib: No such file or directory
>
> [root@compute-0-3 ~]# mount -t pvfs2
> tcp://nanohv:3334/pvfs2-fs on /mnt/pvfs2 type pvfs2 (rw)
>
> [root@compute-0-3 ~]# ompi_info | grep gm
>  MCA btl: gm (MCA v2.0, API v2.0, Component v1.4.1)
>   
>
> The attached "log.out" is obtained by "./configure --with-gm
> --prefix=/opt/openmpi | tee log.out"
>
> Can anyone suggest what the reason for the input/output error might be?
> MANY THANKS!!!
>
> Best,
> Jiangjun
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



[OMPI users] MPI_Comm_set_errhandler: error in Fortran90 Interface mpi.mod

2010-05-03 Thread Paul Kapinos

Hello OpenMPI / Sun/Oracle MPI folks,

we believe that OpenMPI and Sun MPI (Cluster Tools) have an error in 
the Fortran 90 (f90) bindings of the MPI_Comm_set_errhandler routine.


Tested MPI versions: OpenMPI/1.3.3 and Cluster Tools 8.2.1

Consider the attached example. This file uses "USE MPI" to bind the 
MPI routines f90-style; the f77-style "include 'mpif.h'" is commented out.


With Intel MPI the attached example runs error-free (with both 
bindings).


When compiling with OpenMPI and the f90 bindings, every compiler 
tested (Intel/11.1, Sun Studio/12.1, gcc/4.1) reports that the code cannot be 
built because a constant (MPI_COMM_WORLD) is used as input.


For example, the output of the Intel compiler:
-
MPI_Comm_set_errhandler.f90(12): error #6638: An actual argument is an 
expression or constant; this is not valid since the associated dummy 
argument has the explicit INTENT(OUT) or INTENT(INOUT) attribute.   [0]
call MPI_Comm_set_errhandler (MPI_COMM_WORLD, errhandler, ierr)  ! MPI_COMM_WORLD in MPI_Comm_set_errhandler is the problem...
--^
compilation aborted for MPI_Comm_set_errhandler.f90 (code 1)
-
With the f77 bindings, the attached program compiles and runs fine.

The older (deprecated) routine MPI_Errhandler_set, which is defined to 
have the same functionality, works fine with both bindings and all MPIs.


So, we believe the OpenMPI implementation of the MPI standard erroneously 
sets the INTENT(OUT) or INTENT(INOUT) attribute on the communicator 
argument. Setting an error handler on MPI_COMM_WORLD should be possible, 
but currently it is not.


Best wishes,
Paul Kapinos





--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915
PROGRAM sunerr
USE MPI   ! f90: Error on MPI_Comm_set_errhandler if using this with OpenMPI / Sun MPI
!include 'mpif.h'  ! f77: Works fine with all MPI's tested
IMPLICIT NONE
!
integer :: data = 1, errhandler, ierr
external AbortWithMessage
!
call MPI_Init(ierr)
call MPI_Comm_create_errhandler (AbortWithMessage, errhandler, ierr)  ! Creating a handle: no problem

call MPI_Comm_set_errhandler (MPI_COMM_WORLD, errhandler, ierr)  ! MPI_COMM_WORLD in MPI_Comm_set_errhandler is the problem... in f90
!call MPI_Errhandler_set (MPI_COMM_WORLD, errhandler, ierr)    ! and this deprecated function works fine for both f77 and f90


! ... an erroneous MPI routine ... 
call MPI_Send (data, 1, MPI_INTEGER, 1, -12, MPI_COMM_WORLD, ierr)
call MPI_Finalize( ierr )

END PROGRAM sunerr



subroutine AbortWithMessage (comm, errorcode)
  use mpi
  implicit none
  integer :: comm, errorcode
  character(LEN=MPI_MAX_ERROR_STRING) :: errstr
  integer :: stringlength, ierr
  call MPI_Error_string (errorcode, errstr, stringlength, ierr)
  write (*,*) 'Error:  =+=>  ', errstr, ' =+=> Aborting'
  call MPI_Abort (comm, errorcode, ierr)
end subroutine AbortWithMessage





Re: [OMPI users] Can compute, but can not output files

2010-05-03 Thread Jeff Squyres
On Apr 30, 2010, at 10:36 PM, JiangjunZheng wrote:

> I am using Rocks+openmpi+hdf5+pvfs2. The software on the Rocks+pvfs2 cluster 
> outputs HDF5 files after computing. However, when the output starts, it shows 
> errors:
> [root@nanohv pvfs2]# ./hdf5_mpio DH-ey-001400.20.h5
> Testing simple C MPIO program with 1 processes accessing file 
> DH-ey-001400.20.h5
> (Filename can be specified via program argument)
> Proc 0: hostname=nanohv.columbia.edu
> Proc 0: MPI_File_open failed (MPI_ERR_IO: input/output error)
> --
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 
> 1.
> 
> If run in a non-shared folder on the main node, the program runs well. It 
> shows:
> Proc 0: hostname=nanohv.columbia.edu
> Proc 0: all tests passed

This seems to indicate that the file failed to open for some reason in your 
first test.

Given that this is an HDF5 test program, you might want to ping them for more 
details...?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] openmpi 1.4.1 and xgrid

2010-05-03 Thread Jeff Squyres
On Apr 30, 2010, at 7:12 PM, Ralph Castain wrote:

> I build it on Mac 10.6 every time we do an update to the 1.4 series, without 
> problem. --without-xgrid or --with-xgrid=no should both work just fine (I use 
> the latter myself).

Ditto.  I just downloaded 1.4.1 and tried it on my 10.6 MBP; when using 
--without-xgrid, I see:

--- MCA component plm:xgrid (m4 configuration macro)
checking for MCA component plm:xgrid compile mode... dso
checking if MCA component plm:xgrid can compile... no

and when not using that, I see:

--- MCA component plm:xgrid (m4 configuration macro)
checking for MCA component plm:xgrid compile mode... dso
checking if C and Objective C are link compatible... yes
checking for XgridFoundation Framework... yes
checking if MCA component plm:xgrid can compile... yes

You might want to double check that you're not just installing over an old 
installation that still contains the xgrid plugin.  OMPI's plugins are 
installed as individual files.  So if you install with xgrid support, you've 
installed the xgrid plugin.  If you then re-install in the same installation 
tree *without* xgrid support, then you'll still have xgrid support because the 
plugin will still be there from the prior install.

FWIW, you can remove the xgrid plugin by removing 
ompi_install_tree/lib/openmpi/*xgrid*.  Then ompi_info | grep xgrid should turn 
up nothing.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/