Re: [OMPI users] Stream interactions in CUDA

2012-12-12 Thread Jens Glaser
Hi Justin

from looking at your code it seems you are receiving more bytes from the other 
processes than you send (I assume MAX_RECV_SIZE_PER_PE > send_sizes[p]).
I don't think this is valid: your transfers should have matched sizes on the 
sending and receiving side. To achieve this, either communicate the message 
size before exchanging the actual data (a simple MPI_Isend/MPI_Irecv pair with 
one MPI_INT will do), or use a mechanism the MPI library provides for this 
purpose - I believe MPI_Probe is made for exactly that.
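
For illustration, here is a minimal sketch of the MPI_Probe variant, reusing 
the names from your code (p, MPI_INT_128, recv_buckets, MAX_RECV_SIZE_PER_PE); 
note that a blocking probe per peer serializes the receives, so exchanging the 
sizes up front with nonblocking calls scales better:

  /* sketch only: inside the per-peer receive loop (p = peer rank) */
  MPI_Status status;
  int count;
  MPI_Probe(p, 0, MPI_COMM_WORLD, &status);      /* wait for a message from rank p */
  MPI_Get_count(&status, MPI_INT_128, &count);   /* elements actually sent by rank p */
  assert(count <= MAX_RECV_SIZE_PER_PE);
  MPI_Recv(recv_buckets + MAX_RECV_SIZE_PER_PE * p, count, MPI_INT_128,
           p, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);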

As to why the transfers occur, my wild guess would be: you have set 
MAX_RECV_SIZE_PER_PE to something large, which would explain the size and 
number of the H2D transfers.
I am just guessing, but maybe OMPI divides the data into chunks. Unless you are 
using intra-node peer-to-peer (smcuda), all MPI traffic has to go through the 
host, hence the copies.
I don't know what causes the D2H transfers to be of the same size; the library 
might be doing something strange here, given that you have potentially asked 
it to receive more data than you send - don't do that. Your third loop does 
not actually exchange the data, as you wrote it does; it just performs an 
extra copy of already-received data, which in principle you could avoid by 
sending the message sizes first.
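
A sketch of that size-exchange alternative - NPES and send_sizes are from your 
code, while size_recv, sreq and rreq are hypothetical helpers:

  /* round 0: exchange the element counts on a separate tag */
  std::vector<int> size_recv(NPES);
  std::vector<MPI_Request> sreq(NPES), rreq(NPES);
  for (int p = 0; p < NPES; p++) {
      MPI_Irecv(&size_recv[p], 1, MPI_INT, p, 1, MPI_COMM_WORLD, &rreq[p]);
      MPI_Isend(&send_sizes[p], 1, MPI_INT, p, 1, MPI_COMM_WORLD, &sreq[p]);
  }
  MPI_Waitall(NPES, rreq.data(), MPI_STATUSES_IGNORE);
  MPI_Waitall(NPES, sreq.data(), MPI_STATUSES_IGNORE);
  /* round 1: post receives of exactly size_recv[p] elements of MPI_INT_128 */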

Concerning your question about asynchronous copying: if you are using device 
buffers for MPI (and it seems you do), then you have to rely on the library to 
do the asynchronous copying of the buffers (cudaMemcpyAsync) for you. I don't 
know whether OpenMPI does this - you could check the source; I think MVAPICH2 
does. If you really want control over the streams, you have to do the D2H/H2D 
copying yourself, which is fine unless you are relying on peer-to-peer 
capability - but it seems you are not. If you are copying the data manually, 
you can pass whatever stream you prefer to the cudaMemcpyAsync calls.
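
A rough sketch of what that manual staging could look like - host_buf, 
dev_buf, nbytes, nelems, dest and src are placeholders, and host_buf should be 
pinned (cudaHostAlloc) so the asynchronous copy really is asynchronous:

  cudaStream_t my_stream;
  cudaStreamCreate(&my_stream);          /* any stream except the default */

  /* send side: D2H on my_stream, then plain host MPI */
  cudaMemcpyAsync(host_buf, dev_buf, nbytes, cudaMemcpyDeviceToHost, my_stream);
  cudaStreamSynchronize(my_stream);      /* data must be on the host before MPI sees it */
  MPI_Send(host_buf, nelems, MPI_INT_128, dest, 0, MPI_COMM_WORLD);

  /* receive side: host MPI first, then H2D on my_stream */
  MPI_Recv(host_buf, nelems, MPI_INT_128, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  cudaMemcpyAsync(dev_buf, host_buf, nbytes, cudaMemcpyHostToDevice, my_stream);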

My general experience can be summarized as: achieving true overlap of MPI 
communication with computation is hard when using the CUDA support of the 
library, but very easy if you use only the host routines of MPI. Since your 
kernel calls are already asynchronous with respect to the host, all you have 
to do is copy the data asynchronously between host and device.
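
A condensed sketch of that overlap pattern - all names here (my_kernel, 
grid/block, the streams and the buffers) are hypothetical; the point is only 
that neither the copy nor the MPI call touches the default stream:

  /* kernel keeps running on compute_stream while the host communicates */
  my_kernel<<<grid, block, 0, compute_stream>>>(dev_work);

  cudaMemcpyAsync(send_host, send_dev, nbytes, cudaMemcpyDeviceToHost, copy_stream);
  cudaStreamSynchronize(copy_stream);    /* only the copy stream is synchronized */

  MPI_Request req;
  MPI_Isend(send_host, nelems, MPI_INT_128, dest, 0, MPI_COMM_WORLD, &req);
  /* ... other host work can happen here; the kernel is still running ... */
  MPI_Wait(&req, MPI_STATUS_IGNORE);
  cudaStreamSynchronize(compute_stream); /* join the computation at the end */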

Jens

On Dec 12, 2012, at 6:30 PM, Justin Luitjens wrote:

> Hello,
> 
> I'm working on an application using OpenMPI with CUDA and GPUDirect.  I would 
> like to get the MPI transfers to overlap with computation on the CUDA device. 
>  To do this I need to ensure that no memory transfers go to stream 0.  In 
> this application I have one step that performs an MPI_Alltoall operation.  
> Ideally I would like this Alltoall operation to be asynchronous, so I have 
> implemented my own Alltoall using Isend and Irecv, which can be found at the 
> bottom of this email.
> 
> The profiler shows that this operation has some very odd PCI-E traffic that I 
> was hoping someone could explain and help me eliminate.  In this example 
> NPES=2 and each process has its own M2090 GPU.  I am using CUDA 5.0 and 
> OpenMPI-1.7rc5.  The behavior I am seeing is the following.  Once the Isend 
> loop runs there is a sequence of DtoH followed by HtoD transfers.  These 
> transfers are 256K in size and there are 28 of them.  Each of these transfers 
> is placed in stream0.  After this there are a few more small transfers, also 
> placed in stream0.  Finally, when the 3rd loop occurs there are 
> 2 DtoD transfers (this is the actual data being exchanged).  
> 
> Can anyone explain what all of the traffic ping-ponging back and forth 
> between the host and device is?  Is this traffic necessary? 
> 
> Thanks,
> Justin
> 
> 
> uint64_t scatter_gather( uint128 * input_buffer, uint128 *output_buffer, 
> uint128 *recv_buckets, int* send_sizes, int MAX_RECV_SIZE_PER_PE) {
> 
>  std::vector<MPI_Request> srequest(NPES), rrequest(NPES);
> 
>  //Start receives
>  for(int p=0;p<NPES;p++) {
>    MPI_Irecv(recv_buckets+MAX_RECV_SIZE_PER_PE*p,MAX_RECV_SIZE_PER_PE,MPI_INT_128,p,0,MPI_COMM_WORLD,&rrequest[p]);
>  }
> 
>  //Start sends
>  int send_count=0;
>  for(int p=0;p<NPES;p++) {
>    MPI_Isend(input_buffer+send_count,send_sizes[p],MPI_INT_128,p,0,MPI_COMM_WORLD,&srequest[p]);
>    send_count+=send_sizes[p];
>  }
> 
>  //Process outstanding receives
>  int recv_count=0;
>  for(int p=0;p<NPES;p++) {
>    MPI_Status status;
>    MPI_Wait(&rrequest[p],&status);
>    int count;
>    MPI_Get_count(&status,MPI_INT_128,&count);
>    assert(count<=MAX_RECV_SIZE_PER_PE);
>    cudaMemcpy(output_buffer+recv_count,recv_buckets+MAX_RECV_SIZE_PER_PE*p,count*sizeof(uint128),cudaMemcpyDeviceToDevice);
>    recv_count+=count;
>  }
> 
>  //Wait for outstanding sends
>  for(int p=0;p<NPES;p++) {
>    MPI_Status status;
>    MPI_Wait(&srequest[p],&status);
>  }
>  return recv_count;
> }
> 
> ---
> This email message is for the sole use of the intended recipient(s) and may 
> contain confidential information.  Any unauthorized review, use, disclosure 
> or distribution is prohibited.  If you are not the intended recipient, please 
> contact the sender by reply email and destroy all copies of the original 
> message.
> ---

Re: [OMPI users] Stream interactions in CUDA

2012-12-12 Thread Dmitry N. Mikushin
Hi Justin,

Quick grepping reveals several cuMemcpy calls in OpenMPI. Some of them are
even synchronous, meaning stream 0.

I think the best way to explore this sort of behavior is to run the OpenMPI
runtime (thanks to its open-source nature!) under a debugger. Rebuild OpenMPI
with -g -O0 and add an initial sleep() to your app, long enough for you to
gdb-attach to one of the MPI processes. Once attached, first set a breakpoint
at the beginning of your region of interest and then break on cuMemcpy and
cuMemcpyAsync.
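
A small sketch (not from the original mail; the helper name and the 30-second
delay are arbitrary) of the print-PID-and-sleep trick that makes attaching
easy:

  #include <mpi.h>
  #include <stdio.h>
  #include <unistd.h>

  /* call this right after MPI_Init; attach with `gdb -p <pid>`,
     then: break cuMemcpy / break cuMemcpyAsync */
  static void wait_for_debugger(void)
  {
      int rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      fprintf(stderr, "rank %d has pid %d, attach now\n", rank, (int)getpid());
      sleep(30);                      /* time window for attaching */
      MPI_Barrier(MPI_COMM_WORLD);    /* keep the ranks together afterwards */
  }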

Best,
- D.

2012/12/13 Justin Luitjens 

> Hello,
>
> I'm working on an application using OpenMPI with CUDA and GPUDirect.  I
> would like to get the MPI transfers to overlap with computation on the CUDA
> device.  To do this I need to ensure that no memory transfers go to
> stream 0.  In this application I have one step that performs an
> MPI_Alltoall operation.  Ideally I would like this Alltoall operation to be
> asynchronous, so I have implemented my own Alltoall using Isend and
> Irecv, which can be found at the bottom of this email.
>
> The profiler shows that this operation has some very odd PCI-E traffic
> that I was hoping someone could explain and help me eliminate.  In this
> example NPES=2 and each process has its own M2090 GPU.  I am using CUDA 5.0
> and OpenMPI-1.7rc5.  The behavior I am seeing is the following.  Once the
> Isend loop runs there is a sequence of DtoH followed by HtoD transfers.
>  These transfers are 256K in size and there are 28 of them.  Each of these
> transfers is placed in stream0.  After this there are a few more small
> transfers, also placed in stream0.  Finally, when the 3rd loop
> occurs there are 2 DtoD transfers (this is the actual data being exchanged).
>
> Can anyone explain what all of the traffic ping-ponging back and forth
> between the host and device is?  Is this traffic necessary?
>
> Thanks,
> Justin
>
>
> uint64_t scatter_gather( uint128 * input_buffer, uint128 *output_buffer,
> uint128 *recv_buckets, int* send_sizes, int MAX_RECV_SIZE_PER_PE) {
>
>   std::vector<MPI_Request> srequest(NPES), rrequest(NPES);
>
>   //Start receives
>   for(int p=0;p<NPES;p++) {
>     MPI_Irecv(recv_buckets+MAX_RECV_SIZE_PER_PE*p,MAX_RECV_SIZE_PER_PE,MPI_INT_128,p,0,MPI_COMM_WORLD,&rrequest[p]);
>   }
>
>   //Start sends
>   int send_count=0;
>   for(int p=0;p<NPES;p++) {
>     MPI_Isend(input_buffer+send_count,send_sizes[p],MPI_INT_128,p,0,MPI_COMM_WORLD,&srequest[p]);
>     send_count+=send_sizes[p];
>   }
>
>   //Process outstanding receives
>   int recv_count=0;
>   for(int p=0;p<NPES;p++) {
>     MPI_Status status;
>     MPI_Wait(&rrequest[p],&status);
>     int count;
>     MPI_Get_count(&status,MPI_INT_128,&count);
>     assert(count<=MAX_RECV_SIZE_PER_PE);
>     cudaMemcpy(output_buffer+recv_count,recv_buckets+MAX_RECV_SIZE_PER_PE*p,count*sizeof(uint128),cudaMemcpyDeviceToDevice);
>     recv_count+=count;
>   }
>
>   //Wait for outstanding sends
>   for(int p=0;p<NPES;p++) {
>     MPI_Status status;
>     MPI_Wait(&srequest[p],&status);
>   }
>   return recv_count;
> }
>
>
> ---
> This email message is for the sole use of the intended recipient(s) and
> may contain
> confidential information.  Any unauthorized review, use, disclosure or
> distribution
> is prohibited.  If you are not the intended recipient, please contact the
> sender by
> reply email and destroy all copies of the original message.
>
> ---
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


[OMPI users] Open MPI videos

2012-12-12 Thread Ralph Castain
Hi all

For those of you who are interested in how Open MPI works, Greenplum/EMC 
recorded a set of talks last week covering several relevant topics. These talks 
are intended to help new developers become familiar with the more common parts 
of the code base. Greenplum is happy to make these available to the general 
community, and so we have posted them on the Open MPI web site at:

http://www.open-mpi.org/video/

In addition, last week's talks included one from Moe Jette on the Slurm 
resource manager, and another by yours truly on the porting of Map-Reduce to 
the OMPI code base. The following talks are available online or for download 
(including the slides):


Open MPI: Overview/Architecture by Jeff Squyres

Detailed overview of the Open MPI data transfer system by Brian Barrett

Overview of parallel I/O in Open MPI by Edgar Gabriel

Architecture, configuration, and use of Slurm by Moe Jette

Overview of Greenplum's port of Apache Hadoop's Map-Reduce system to Open MPI 
by Ralph Castain

We hope these prove useful to you. Future videos on ORTE's design and messaging 
system are planned - requests for additional topics are welcome.
Ralph




Re: [OMPI users] problem configuring openmpi-1.6.4a1r27643 on Linux

2012-12-12 Thread Jeff Squyres
Can you send the config.log for the platform where it failed?

I'd like to see the specific compiler error that occurred.


On Dec 12, 2012, at 10:33 AM, Siegmar Gross wrote:

> Hi,
> 
> I tried to build openmpi-1.6.4a1r27643 on several platforms
> (Solaris Sparc, Solaris x86_64, and Linux x86_64) with Solaris
> Studio C (Sun C 5.12) in 32 and 64 bit mode. "configure" broke
> on Linux (openSuSE Linux 12.1) for the 32 bit version with the
> following error:
> 
> ...
> checking if Fortran 77 compiler supports INTEGER*16... yes
> checking size of Fortran 77 INTEGER*16... configure: error:
>  Could not determine size of INTEGER*16
> linpc1 openmpi-1.6.4-Linux.x86_64.32_cc 144 
> 
> 
> I could compile a 32 bit version of openmpi-1.9a1r27668 on the
> machine without problems.
> 
> linpc1 openmpi-1.9-Linux.x86_64.32_cc 148 grep "INTEGER\*16" 
> log.configure.Linux.x86_64.32_cc
> checking if Fortran compiler supports INTEGER*16... no
> linpc1 openmpi-1.9-Linux.x86_64.32_cc 149 
> 
> Does anybody have an idea why "configure" broke for
> openmpi-1.6.4a1r27643 and how I can fix the problem? Thank you
> very much for any help in advance.
> 
> 
> Kind regards
> 
> Siegmar
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] openmpi-1.7rc5 cannot install when build with ./configure --with-ft=cr

2012-12-12 Thread Ifeanyi
thanks Ralph.

On Thu, Dec 13, 2012 at 1:54 AM, Ralph Castain  wrote:

> The checkpoint/restart code in the 1.7 branch is almost certainly broken
> as the developer/maintainer of that code graduated and left for a colder
> climate. We do not yet have someone to take their place, so the future of
> that capability is somewhat in doubt.
>
> Afraid you'll have to stick with the 1.6 series for now.
>
> On Dec 12, 2012, at 12:38 AM, Ifeanyi  wrote:
>
> > Hi all,
> >
> > I am having trouble building openmpi-1.7rc5 with ../configure
> --with-ft=cr
> >
> > openmpi-1.7rc5# ./configure --with-ft=cr
> > openmpi-1.7rc5# make all install
> >
> > error message:
> > base/errmgr_base_fns.c:565:13: warning: ignoring return value of
> 'asprintf', declared with attribute warn_unused_result [-Wunused-result]
> > base/errmgr_base_fns.c: In function 'orte_errmgr_base_migrate_state_str':
> > base/errmgr_base_fns.c:384:17: warning: ignoring return value of
> 'asprintf', declared with attribute warn_unused_result [-Wunused-result]
> > base/errmgr_base_fns.c: In function 'orte_errmgr_base_abort':
> > base/errmgr_base_fns.c:244:18: warning: ignoring return value of
> 'vasprintf', declared with attribute warn_unused_result [-Wunused-result]
> > make[2]: *** [base/errmgr_base_fns.lo] Error 1
> > make[2]: Leaving directory
> `/home/abolap/Downloads/openmpi-1.7rc5/orte/mca/errmgr'
> > make[1]: *** [all-recursive] Error 1
> > make[1]: Leaving directory `/home/abolap/Downloads/openmpi-1.7rc5/orte'
> > make: *** [all-recursive] Error 1
> >
> > It installs successfully when fault tolerance is not enabled in the build.
> >
> > Pls help.
> >
> > Regards - Ifeanyi
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] users Digest, Vol 2435, Issue 1

2012-12-12 Thread Extreme Programming
 thank you for your reply, and I apologize for my English and for my
mistake; in any case, I configured OMPI with the /usr/local prefix.


2012/12/12 

> Send users mailing list submissions to
> us...@open-mpi.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> or, via email, send a message with subject or body 'help' to
> users-requ...@open-mpi.org
>
> You can reach the person managing the list at
> users-ow...@open-mpi.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of users digest..."
>
>
> Today's Topics:
>
>1. Re: Live process migration (Ifeanyi)
>2. Re: Live process migration (Ifeanyi)
>3. Re: Live process migration (Ifeanyi)
>4. openmpi-1.7rc5 cannot install when build with ./configure
>   --with-ft=cr (Ifeanyi)
>5. Re: OpenMPI-1.6.3 MinGW64 buildup on Windows 7 (Ilias Miroslav)
>6. Re: OpenMPI-1.6.3 MinGW64 buildup on Windows 7 (Ilias Miroslav)
>7. Several issues in mpirun on macosx 10.8.2 (Extreme Programming)
>8. Re: OpenMPI-1.6.3 MinGW64 buildup on Windows 7 (Damien Hocking)
>9. Re: Several issues in mpirun on macosx 10.8.2 (Ralph Castain)
>   10. Re: openmpi-1.7rc5 cannot install when build with ./configure
>   --with-ft=cr (Ralph Castain)
>   11. problem configuring openmpi-1.6.4a1r27643 on Linux (Siegmar Gross)
>
>
> --
>
> Message: 1
> Date: Wed, 12 Dec 2012 09:58:45 +1100
> From: Ifeanyi 
> Subject: Re: [OMPI users] Live process migration
> To: Open MPI Users 
> Message-ID:
> <
> camxrty8+7jtgnznj30ossz-btt7sdgpb7erwkjbnbxpb7sh...@mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Thanks Josh.
>
> I will give a go.
>
> Regards - Ifeanyi
>
> On Wed, Dec 12, 2012 at 3:19 AM, Josh Hursey 
> wrote:
>
> > Process migration was implemented in Open MPI and working in the trunk a
> > couple of years ago. It has not been well maintained for a few years
> though
> > (hopefully that will change one day). So you can try it, but your results
> > may vary.
> >
> > Some details are at the link below:
> >   http://osl.iu.edu/research/ft/ompi-cr/tools.php#ompi-migrate
> >
> > On Mon, Dec 10, 2012 at 10:39 PM, Ifeanyi 
> wrote:
> >
> >> Hi all,
> >>
> >> Just wondering if live process migration of processes is supported in
> >> open mpi?
> >>
> >> or any idea of how to do live migration of processes pls.
> >>
> >> Regards,
> >> Ifeanyi
> >>
> >> ___
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>
> >
> >
> >
> > --
> > Joshua Hursey
> > Assistant Professor of Computer Science
> > University of Wisconsin-La Crosse
> > http://cs.uwlax.edu/~jjhursey
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> -- next part --
> HTML attachment scrubbed and removed
>
> --
>
> Message: 2
> Date: Wed, 12 Dec 2012 13:04:59 +1100
> From: Ifeanyi 
> Subject: Re: [OMPI users] Live process migration
> To: Open MPI Users 
> Message-ID:
> <
> camxrty8u0zwyzkjslhvyuuhpwusbjvjbqj3vo8vdmrqbugb...@mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi Josh,
>
> I can checkpoint but cannot migrate.
>
> when I type ~openmpi-1.6# ompi-migrate ...  I got this problem
> bash: ompi-migrate: command not found
>
> Please assist.
>
> Regards - Ifeanyi
>
>
>
> On Wed, Dec 12, 2012 at 3:19 AM, Josh Hursey 
> wrote:
>
> > Process migration was implemented in Open MPI and working in the trunk a
> > couple of years ago. It has not been well maintained for a few years
> though
> > (hopefully that will change one day). So you can try it, but your results
> > may vary.
> >
> > Some details are at the link below:
> >   http://osl.iu.edu/research/ft/ompi-cr/tools.php#ompi-migrate
> >
> > On Mon, Dec 10, 2012 at 10:39 PM, Ifeanyi 
> wrote:
> >
> >> Hi all,
> >>
> >> Just wondering if live process migration of processes is supported in
> >> open mpi?
> >>
> >> or any idea of how to do live migration of processes pls.
> >>
> >> Regards,
> >> Ifeanyi
> >>
> >> ___
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>
> >
> >
> >
> > --
> > Joshua Hursey
> > Assistant Professor of Computer Science
> > University of Wisconsin-La Crosse
> > http://cs.uwlax.edu/~jjhursey
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> 

Re: [OMPI users] Live process migration

2012-12-12 Thread Josh Hursey
ompi-migrate is not in the 1.6 release. It is only available in the Open
MPI trunk.

On Tue, Dec 11, 2012 at 8:04 PM, Ifeanyi  wrote:

> Hi Josh,
>
> I can checkpoint but cannot migrate.
>
> when I type ~openmpi-1.6# ompi-migrate ...  I got this problem
> bash: ompi-migrate: command not found
>
> Please assist.
>
> Regards - Ifeanyi
>
>
>
> On Wed, Dec 12, 2012 at 3:19 AM, Josh Hursey wrote:
>
>> Process migration was implemented in Open MPI and working in the trunk a
>> couple of years ago. It has not been well maintained for a few years though
>> (hopefully that will change one day). So you can try it, but your results
>> may vary.
>>
>> Some details are at the link below:
>>   http://osl.iu.edu/research/ft/ompi-cr/tools.php#ompi-migrate
>>
>>  On Mon, Dec 10, 2012 at 10:39 PM, Ifeanyi wrote:
>>
>>>  Hi all,
>>>
>>> Just wondering if live process migration of processes is supported in
>>> open mpi?
>>>
>>> or any idea of how to do live migration of processes pls.
>>>
>>> Regards,
>>> Ifeanyi
>>>
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>
>>
>>
>> --
>> Joshua Hursey
>> Assistant Professor of Computer Science
>> University of Wisconsin-La Crosse
>> http://cs.uwlax.edu/~jjhursey
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Joshua Hursey
Assistant Professor of Computer Science
University of Wisconsin-La Crosse
http://cs.uwlax.edu/~jjhursey


[OMPI users] problem configuring openmpi-1.6.4a1r27643 on Linux

2012-12-12 Thread Siegmar Gross
Hi,

I tried to build openmpi-1.6.4a1r27643 on several platforms
(Solaris Sparc, Solaris x86_64, and Linux x86_64) with Solaris
Studio C (Sun C 5.12) in 32 and 64 bit mode. "configure" broke
on Linux (openSuSE Linux 12.1) for the 32 bit version with the
following error:

...
checking if Fortran 77 compiler supports INTEGER*16... yes
checking size of Fortran 77 INTEGER*16... configure: error:
  Could not determine size of INTEGER*16
linpc1 openmpi-1.6.4-Linux.x86_64.32_cc 144 


I could compile a 32 bit version of openmpi-1.9a1r27668 on the
machine without problems.

linpc1 openmpi-1.9-Linux.x86_64.32_cc 148 grep "INTEGER\*16" 
log.configure.Linux.x86_64.32_cc
checking if Fortran compiler supports INTEGER*16... no
linpc1 openmpi-1.9-Linux.x86_64.32_cc 149 

Does anybody have an idea why "configure" broke for
openmpi-1.6.4a1r27643 and how I can fix the problem? Thank you
very much for any help in advance.


Kind regards

Siegmar



Re: [OMPI users] openmpi-1.7rc5 cannot install when build with ./configure --with-ft=cr

2012-12-12 Thread Ralph Castain
The checkpoint/restart code in the 1.7 branch is almost certainly broken as the 
developer/maintainer of that code graduated and left for a colder climate. We 
do not yet have someone to take their place, so the future of that capability 
is somewhat in doubt.

Afraid you'll have to stick with the 1.6 series for now.

On Dec 12, 2012, at 12:38 AM, Ifeanyi  wrote:

> Hi all,
> 
> I am having trouble building openmpi-1.7rc5 with ../configure --with-ft=cr
> 
> openmpi-1.7rc5# ./configure --with-ft=cr
> openmpi-1.7rc5# make all install 
> 
> error message:
> base/errmgr_base_fns.c:565:13: warning: ignoring return value of 'asprintf', 
> declared with attribute warn_unused_result [-Wunused-result]
> base/errmgr_base_fns.c: In function 'orte_errmgr_base_migrate_state_str':
> base/errmgr_base_fns.c:384:17: warning: ignoring return value of 'asprintf', 
> declared with attribute warn_unused_result [-Wunused-result]
> base/errmgr_base_fns.c: In function 'orte_errmgr_base_abort':
> base/errmgr_base_fns.c:244:18: warning: ignoring return value of 'vasprintf', 
> declared with attribute warn_unused_result [-Wunused-result]
> make[2]: *** [base/errmgr_base_fns.lo] Error 1
> make[2]: Leaving directory 
> `/home/abolap/Downloads/openmpi-1.7rc5/orte/mca/errmgr'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory `/home/abolap/Downloads/openmpi-1.7rc5/orte'
> make: *** [all-recursive] Error 1
> 
> It installs successfully when fault tolerance is not enabled in the build.
> 
> Pls help.
> 
> Regards - Ifeanyi 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Several issues in mpirun on macosx 10.8.2

2012-12-12 Thread Ralph Castain
Well, this

> LD_LIBRARY_PATH = /usr/local/bin

certainly isn't right - it needs to be /usr/local/lib, based on the output 
below. What prefix did you provide to configure OMPI?


On Dec 12, 2012, at 4:11 AM, Extreme Programming  wrote:

> Hi, I have just installed Open MPI 1.7 on my Mac OS X 10.8.2 because I need 
> the Java binding.
> Installation works fine, and compilation too, but when I execute "mpirun -n 4 
> myfile" I get this error:
> 
> MacBook-Pro:Desktop rainmaker$ mpirun -n 4 a.out 
> [MacBook-Pro.local:18481] mca: base: component_find: unable to open 
> /usr/local/lib/openmpi/mca_ess_slurmd: 
> dlopen(/usr/local/lib/openmpi/mca_ess_slurmd.so, 9): Symbol not found: 
> _orte_jmap_t_class
>   Referenced from: /usr/local/lib/openmpi/mca_ess_slurmd.so
>   Expected in: flat namespace
>  in /usr/local/lib/openmpi/mca_ess_slurmd.so (ignored)
> [MacBook-Pro.local:18481] mca: base: component_find: unable to open 
> /usr/local/lib/openmpi/mca_errmgr_default: 
> dlopen(/usr/local/lib/openmpi/mca_errmgr_default.so, 9): Symbol not found: 
> _orte_errmgr_base_error_abort
>   Referenced from: /usr/local/lib/openmpi/mca_errmgr_default.so
>   Expected in: flat namespace
>  in /usr/local/lib/openmpi/mca_errmgr_default.so (ignored)
> [MacBook-Pro.local:18481] mca: base: component_find: unable to open 
> /usr/local/lib/openmpi/mca_routed_cm: 
> dlopen(/usr/local/lib/openmpi/mca_routed_cm.so, 9): Symbol not found: 
> _orte_message_event_t_class
>   Referenced from: /usr/local/lib/openmpi/mca_routed_cm.so
>   Expected in: flat namespace
>  in /usr/local/lib/openmpi/mca_routed_cm.so (ignored)
> [MacBook-Pro.local:18481] mca: base: component_find: unable to open 
> /usr/local/lib/openmpi/mca_routed_linear: 
> dlopen(/usr/local/lib/openmpi/mca_routed_linear.so, 9): Symbol not found: 
> _orte_message_event_t_class
>   Referenced from: /usr/local/lib/openmpi/mca_routed_linear.so
>   Expected in: flat namespace
>  in /usr/local/lib/openmpi/mca_routed_linear.so (ignored)
> [MacBook-Pro.local:18481] mca: base: component_find: unable to open 
> /usr/local/lib/openmpi/mca_grpcomm_basic: 
> dlopen(/usr/local/lib/openmpi/mca_grpcomm_basic.so, 9): Symbol not found: 
> _opal_profile
>   Referenced from: /usr/local/lib/openmpi/mca_grpcomm_basic.so
>   Expected in: flat namespace
>  in /usr/local/lib/openmpi/mca_grpcomm_basic.so (ignored)
> [MacBook-Pro.local:18481] mca: base: component_find: unable to open 
> /usr/local/lib/openmpi/mca_grpcomm_hier: 
> dlopen(/usr/local/lib/openmpi/mca_grpcomm_hier.so, 9): Symbol not found: 
> _orte_daemon_cmd_processor
>   Referenced from: /usr/local/lib/openmpi/mca_grpcomm_hier.so
>   Expected in: flat namespace
>  in /usr/local/lib/openmpi/mca_grpcomm_hier.so (ignored)
> [MacBook-Pro:18481] *** Process received signal ***
> [MacBook-Pro:18481] Signal: Segmentation fault: 11 (11)
> [MacBook-Pro:18481] Signal code: Address not mapped (1)
> [MacBook-Pro:18481] Failing at address: 0x14
> [MacBook-Pro:18481] [ 0] 2   libsystem_c.dylib   
> 0x7fff820308ea _sigtramp + 26
> [MacBook-Pro:18481] [ 1] 3   ??? 
> 0x7fff59070458 0x0 + 140734687020120
> [MacBook-Pro:18481] [ 2] 4   libopen-rte.5.dylib 
> 0x000106bd7658 orte_rmaps_base_map_job + 984
> [MacBook-Pro:18481] [ 3] 5   libopen-rte.5.dylib 
> 0x000106c1a0a0 opal_libevent2019_event_base_loop + 1888
> [MacBook-Pro:18481] [ 4] 6   mpirun  
> 0x000106b916e1 orterun + 5137
> [MacBook-Pro:18481] [ 5] 7   mpirun  
> 0x000106b90290 main + 32
> [MacBook-Pro:18481] [ 6] 8   libdyld.dylib   
> 0x7fff8ac597e1 start + 0
> [MacBook-Pro:18481] [ 7] 9   ??? 
> 0x0004 0x0 + 4
> [MacBook-Pro:18481] *** End of error message ***
> Segmentation fault: 11
> 
> 
> To resolve it, I exported LD_LIBRARY_PATH = /usr/local/bin and used the same 
> configuration as my previous OpenMPI installation.
> Please help me solve this issue.
> 
> Thank you very much.
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] OpenMPI-1.6.3 MinGW64 buildup on Windows 7

2012-12-12 Thread Damien Hocking
I know 1.6.3 is broken for Windows builds with VS2012 and Intel.  I'm not a 
MinGW expert by any means; I've hardly ever used it.  I'll try to look 
at this on the weekend.  If you can post on Friday to jog my memory, that 
would help.  :-)


Damien

On 12/12/2012 3:31 AM, Ilias Miroslav wrote:

Ad: http://www.open-mpi.org/community/lists/users/2012/12/20865.php

Thanks for your efforts, Damien;

however, in between I realized that this standalone Windows OpenMPI is built 
from ifort.exe + cl.exe, and I have only MinGW suite in disposal...

For that reason I tried to build-up OpenMPI on Windows 7 (both 64 and 32-bits), 
but failed, see:
http://www.open-mpi.org/community/lists/users/2012/12/20921.php

Best, Miro
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




[OMPI users] Several issues in mpirun on macosx 10.8.2

2012-12-12 Thread Extreme Programming
Hi, I have just installed Open MPI 1.7 on my Mac OS X 10.8.2 because I need
the Java binding.
Installation works fine, and compilation too, but when I execute "mpirun -n
4 myfile" I get this error:

MacBook-Pro:Desktop rainmaker$ mpirun -n 4 a.out
[MacBook-Pro.local:18481] mca: base: component_find: unable to open
/usr/local/lib/openmpi/mca_ess_slurmd:
dlopen(/usr/local/lib/openmpi/mca_ess_slurmd.so, 9): Symbol not found:
_orte_jmap_t_class
  Referenced from: /usr/local/lib/openmpi/mca_ess_slurmd.so
  Expected in: flat namespace
 in /usr/local/lib/openmpi/mca_ess_slurmd.so (ignored)
[MacBook-Pro.local:18481] mca: base: component_find: unable to open
/usr/local/lib/openmpi/mca_errmgr_default:
dlopen(/usr/local/lib/openmpi/mca_errmgr_default.so, 9): Symbol not found:
_orte_errmgr_base_error_abort
  Referenced from: /usr/local/lib/openmpi/mca_errmgr_default.so
  Expected in: flat namespace
 in /usr/local/lib/openmpi/mca_errmgr_default.so (ignored)
[MacBook-Pro.local:18481] mca: base: component_find: unable to open
/usr/local/lib/openmpi/mca_routed_cm:
dlopen(/usr/local/lib/openmpi/mca_routed_cm.so, 9): Symbol not found:
_orte_message_event_t_class
  Referenced from: /usr/local/lib/openmpi/mca_routed_cm.so
  Expected in: flat namespace
 in /usr/local/lib/openmpi/mca_routed_cm.so (ignored)
[MacBook-Pro.local:18481] mca: base: component_find: unable to open
/usr/local/lib/openmpi/mca_routed_linear:
dlopen(/usr/local/lib/openmpi/mca_routed_linear.so, 9): Symbol not found:
_orte_message_event_t_class
  Referenced from: /usr/local/lib/openmpi/mca_routed_linear.so
  Expected in: flat namespace
 in /usr/local/lib/openmpi/mca_routed_linear.so (ignored)
[MacBook-Pro.local:18481] mca: base: component_find: unable to open
/usr/local/lib/openmpi/mca_grpcomm_basic:
dlopen(/usr/local/lib/openmpi/mca_grpcomm_basic.so, 9): Symbol not found:
_opal_profile
  Referenced from: /usr/local/lib/openmpi/mca_grpcomm_basic.so
  Expected in: flat namespace
 in /usr/local/lib/openmpi/mca_grpcomm_basic.so (ignored)
[MacBook-Pro.local:18481] mca: base: component_find: unable to open
/usr/local/lib/openmpi/mca_grpcomm_hier:
dlopen(/usr/local/lib/openmpi/mca_grpcomm_hier.so, 9): Symbol not found:
_orte_daemon_cmd_processor
  Referenced from: /usr/local/lib/openmpi/mca_grpcomm_hier.so
  Expected in: flat namespace
 in /usr/local/lib/openmpi/mca_grpcomm_hier.so (ignored)
[MacBook-Pro:18481] *** Process received signal ***
[MacBook-Pro:18481] Signal: Segmentation fault: 11 (11)
[MacBook-Pro:18481] Signal code: Address not mapped (1)
[MacBook-Pro:18481] Failing at address: 0x14
[MacBook-Pro:18481] [ 0] 2   libsystem_c.dylib
0x7fff820308ea _sigtramp + 26
[MacBook-Pro:18481] [ 1] 3   ???
0x7fff59070458 0x0 + 140734687020120
[MacBook-Pro:18481] [ 2] 4   libopen-rte.5.dylib
0x000106bd7658 orte_rmaps_base_map_job + 984
[MacBook-Pro:18481] [ 3] 5   libopen-rte.5.dylib
0x000106c1a0a0 opal_libevent2019_event_base_loop + 1888
[MacBook-Pro:18481] [ 4] 6   mpirun
 0x000106b916e1 orterun + 5137
[MacBook-Pro:18481] [ 5] 7   mpirun
 0x000106b90290 main + 32
[MacBook-Pro:18481] [ 6] 8   libdyld.dylib
0x7fff8ac597e1 start + 0
[MacBook-Pro:18481] [ 7] 9   ???
0x0004 0x0 + 4
[MacBook-Pro:18481] *** End of error message ***
Segmentation fault: 11


To resolve it, I exported LD_LIBRARY_PATH = /usr/local/bin and used the same
configuration as my previous OpenMPI installation.
Please help me solve this issue.

Thank you very much.


Re: [OMPI users] OpenMPI-1.6.3 MinGW64 buildup on Windows 7

2012-12-12 Thread Ilias Miroslav
Ad: http://www.open-mpi.org/community/lists/users/2012/12/20865.php

Thanks for your efforts, Damien;

however, in between I realized that this standalone Windows OpenMPI is built 
from ifort.exe + cl.exe, and I have only MinGW suite in disposal...

For that reason I tried to build-up OpenMPI on Windows 7 (both 64 and 32-bits), 
but failed, see:
http://www.open-mpi.org/community/lists/users/2012/12/20921.php

Best, Miro


Re: [OMPI users] OpenMPI-1.6.3 MinGW64 buildup on Windows 7

2012-12-12 Thread Ilias Miroslav
Hi again,

in addition, I met the same compilation error on my 32-bit Windows 7 PC,  with 
32-bit MinGW compilers:

C:\OpenMPI-1.6.3-MinGW>mingw32-make
-- The Fortran compiler identification is GNU
-- checking for type of MPI_Offset...
-- checking for type of MPI_Offset...long long
-- checking for an MPI datatype for MPI_Offset...
-- checking for an MPI datatype for MPI_Offset...MPI_LONG_LONG
-- Check for working flex...
-- Skipping MPI F77 interface
-- looking for ccp...
-- looking for ccp...not found.
-- looking for ccp...
-- looking for ccp...not found.
-- Configuring done
-- Generating done
-- Build files have been written to: C:/OpenMPI-1.6.3-MinGW
Scanning dependencies of target libopen-pal
[  1%] Building C object opal/CMakeFiles/libopen-pal.dir/datatype/opal_datatype_
pack_checksum.obj
In file included from C:/Users/milias/Downloads/openmpi-1.6.3/opal/include/opal_
config_bottom.h:258:0,
 from C:/OpenMPI-1.6.3-MinGW/opal/include/opal_config.h:1468,
 from C:\OpenMPI-1.6.3-MinGW\opal\datatype\opal_datatype_pack_ch
ecksum.c:21:
C:/Users/milias/Downloads/openmpi-1.6.3/opal/win32/win_compat.h:93:14: error: co
nflicting types for 'ssize_t'
In file included from c:\mingw\bin\../lib/gcc/mingw32/4.7.2/../../../../include/
process.h:17:0,
 from C:/Users/milias/Downloads/openmpi-1.6.3/opal/win32/win_com
pat.h:70,
 from C:/Users/milias/Downloads/openmpi-1.6.3/opal/include/opal_
config_bottom.h:258,
 from C:/OpenMPI-1.6.3-MinGW/opal/include/opal_config.h:1468,
 from C:\OpenMPI-1.6.3-MinGW\opal\datatype\opal_datatype_pack_ch
ecksum.c:21:
c:\mingw\bin\../lib/gcc/mingw32/4.7.2/../../../../include/sys/types.h:118:18: no
te: previous declaration of 'ssize_t' was here
In file included from C:/OpenMPI-1.6.3-MinGW/opal/include/opal_config.h:1468:0,
 from C:\OpenMPI-1.6.3-MinGW\opal\datatype\opal_datatype_pack_ch
ecksum.c:21:
C:/Users/milias/Downloads/openmpi-1.6.3/opal/include/opal_config_bottom.h:559:0:
 warning: "PF_UNSPEC" redefined [enabled by default]
In file included from C:/Users/milias/Downloads/openmpi-1.6.3/opal/win32/win_com
pat.h:68:0,
 from C:/Users/milias/Downloads/openmpi-1.6.3/opal/include/opal_
config_bottom.h:258,
 from C:/OpenMPI-1.6.3-MinGW/opal/include/opal_config.h:1468,
 from C:\OpenMPI-1.6.3-MinGW\opal\datatype\opal_datatype_pack_ch
ecksum.c:21:
c:\mingw\bin\../lib/gcc/mingw32/4.7.2/../../../../include/winsock2.h:368:0: note
: this is the location of the previous definition
In file included from C:/OpenMPI-1.6.3-MinGW/opal/include/opal_config.h:1468:0,
 from C:\OpenMPI-1.6.3-MinGW\opal\datatype\opal_datatype_pack_ch
ecksum.c:21:
C:/Users/milias/Downloads/openmpi-1.6.3/opal/include/opal_config_bottom.h:562:0:
 warning: "AF_INET6" redefined [enabled by default]
In file included from C:/Users/milias/Downloads/openmpi-1.6.3/opal/win32/win_com
pat.h:68:0,
 from C:/Users/milias/Downloads/openmpi-1.6.3/opal/include/opal_
config_bottom.h:258,
 from C:/OpenMPI-1.6.3-MinGW/opal/include/opal_config.h:1468,
 from C:\OpenMPI-1.6.3-MinGW\opal\datatype\opal_datatype_pack_ch
ecksum.c:21:
c:\mingw\bin\../lib/gcc/mingw32/4.7.2/../../../../include/winsock2.h:329:0: note
: this is the location of the previous definition
In file included from C:/OpenMPI-1.6.3-MinGW/opal/include/opal_config.h:1468:0,
 from C:\OpenMPI-1.6.3-MinGW\opal\datatype\opal_datatype_pack_ch
ecksum.c:21:
C:/Users/milias/Downloads/openmpi-1.6.3/opal/include/opal_config_bottom.h:565:0:
 warning: "PF_INET6" redefined [enabled by default]
In file included from C:/Users/milias/Downloads/openmpi-1.6.3/opal/win32/win_com
pat.h:68:0,
 from C:/Users/milias/Downloads/openmpi-1.6.3/opal/include/opal_
config_bottom.h:258,
 from C:/OpenMPI-1.6.3-MinGW/opal/include/opal_config.h:1468,
 from C:\OpenMPI-1.6.3-MinGW\opal\datatype\opal_datatype_pack_ch
ecksum.c:21:
c:\mingw\bin\../lib/gcc/mingw32/4.7.2/../../../../include/winsock2.h:392:0: note
: this is the location of the previous definition
opal\CMakeFiles\libopen-pal.dir\build.make:57: recipe for target 'opal/CMakeFile
s/libopen-pal.dir/datatype/opal_datatype_pack_checksum.obj' failed
mingw32-make[2]: *** [opal/CMakeFiles/libopen-pal.dir/datatype/opal_datatype_pac
k_checksum.obj] Error 1
CMakeFiles\Makefile2:82: recipe for target 'opal/CMakeFiles/libopen-pal.dir/all'
 failed
mingw32-make[1]: *** [opal/CMakeFiles/libopen-pal.dir/all] Error 2
Makefile:135: recipe for target 'all' failed
mingw32-make: *** [all] Error 2

C:\OpenMPI-1.6.3-MinGW>



From: Ilias Miroslav
Sent: Sunday, December 09, 2012 8:52 PM
To: us...@open-mpi.org
Subject: OpenMPI-1.6.3 MinGW64 buildup on Windows 7

Dear experts,

following README.WINDOWS.txt I am trying to install OpenMPI-1.6.3 using MinGW64 
package on my 

[OMPI users] openmpi-1.7rc5 cannot install when build with ./configure --with-ft=cr

2012-12-12 Thread Ifeanyi
Hi all,

I am having trouble building openmpi-1.7rc5 with ../configure --with-ft=cr

openmpi-1.7rc5# ./configure --with-ft=cr
openmpi-1.7rc5# make all install

error message:
base/errmgr_base_fns.c:565:13: warning: ignoring return value of
'asprintf', declared with attribute warn_unused_result [-Wunused-result]
base/errmgr_base_fns.c: In function 'orte_errmgr_base_migrate_state_str':
base/errmgr_base_fns.c:384:17: warning: ignoring return value of
'asprintf', declared with attribute warn_unused_result [-Wunused-result]
base/errmgr_base_fns.c: In function 'orte_errmgr_base_abort':
base/errmgr_base_fns.c:244:18: warning: ignoring return value of
'vasprintf', declared with attribute warn_unused_result [-Wunused-result]
make[2]: *** [base/errmgr_base_fns.lo] Error 1
make[2]: Leaving directory
`/home/abolap/Downloads/openmpi-1.7rc5/orte/mca/errmgr'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/abolap/Downloads/openmpi-1.7rc5/orte'
make: *** [all-recursive] Error 1

It installs successfully when fault tolerance is not enabled in the build.

Pls help.

Regards - Ifeanyi


Re: [OMPI users] Live process migration

2012-12-12 Thread Ifeanyi
This is what I did after installing DMTCP:
on one terminal:      # ./dmtcp_coordinator
on another terminal:  # ./dmtcp_checkpoint mpirun ./icpi

When I ran the command # ./dmtcp_command --checkpoint,
it terminated with these messages:

[8147] WARNING at connectionmanager.cpp:263 in fdToDevice;
REASON='JWARNING(false) failed'
Message: PTS Device not found
[8147] ERROR at connectionmanager.cpp:277 in fdToDevice;
REASON='JASSERT(false) failed'
 fd = 1
 device = /dev/pts/2 (deleted)
Message: PTS Device not found in connection list
icpi (8147): Terminating...

Please, what is the correct way to migrate a process with DMTCP?

Please assist.
Regards - Ifeanyi

On Tue, Dec 11, 2012 at 5:10 PM, Jaroslaw Slawinski  wrote:

> true, looks like the entire sourceforge is down
> best
> js
>
> On Tue, Dec 11, 2012 at 12:57 AM, Ifeanyi  wrote:
> > Thanks Jaroslaw,
> >
> > I will try it asap, it appears that DMTCP at sourceforge.net site is
> down at
> > the moment.
> >
> > Regards - ifeanyi
> >
> >
> > On Tue, Dec 11, 2012 at 4:11 PM, Jaroslaw Slawinski
> >  wrote:
> >>
> >> check DMTCP - it worked for me
> >> js
> >>
> >> On Mon, Dec 10, 2012 at 11:39 PM, Ifeanyi 
> wrote:
> >> > Hi all,
> >> >
> >> > Just wondering if live process migration of processes is supported in
> >> > open
> >> > mpi?
> >> >
> >> > or any idea of how to do live migration of processes pls.
> >> >
> >> > Regards,
> >> > Ifeanyi
> >> >
> >> > ___
> >> > users mailing list
> >> > us...@open-mpi.org
> >> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >> ___
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>