[OMPI users] 1.2.4 cross-compilation problem

2007-10-15 Thread Jorge Parra

Hi,

I am trying to cross-compile Open MPI 1.2.4 for an embedded system. 
The development system is an i686 Linux machine and the target system is 
PowerPC 405 based. When trying "make all" I get the following error:


/bin/sh ../../../libtool --tag=CC   --mode=link 
/opt/powerpc-405-linux/bin/powerpc-405-linux-gnu-gcc  -O3 -DNDEBUG 
-finline-functions -fno-strict-aliasing -pthread  -export-dynamic   -o 
opal_wrapper opal_wrapper.o ../../../opal/libopen-pal.la -lnsl -lutil  -lm
libtool: link: /opt/powerpc-405-linux/bin/powerpc-405-linux-gnu-gcc -O3 
-DNDEBUG -finline-functions -fno-strict-aliasing -pthread -o opal_wrapper 
opal_wrapper.o -Wl,--export-dynamic  ../../../opal/.libs/libopen-pal.a 
-ldl -lnsl -lutil -lm -pthread
../../../opal/.libs/libopen-pal.a(libltdlc_la-ltdl.o)(.text+0xbe): In 
function `lt_dlinit':

: undefined reference to `lt_libltdlc_LTX_preloaded_symbols'
../../../opal/.libs/libopen-pal.a(libltdlc_la-ltdl.o)(.text+0xc2): In 
function `lt_dlinit':

: undefined reference to `lt_libltdlc_LTX_preloaded_symbols'
collect2: ld returned 1 exit status
make[2]: *** [opal_wrapper] Error 1
make[2]: Leaving directory `/opt/openmpi-1.2.4/opal/tools/wrappers'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/opt/openmpi-1.2.4/opal'
make: *** [all-recursive] Error 1

Older versions of Open MPI have been successfully compiled on the same 
development system. I am attaching to this email all of the output and the 
configuration information.


Any help will be greatly appreciated.

Thank you,

Jorge

ompi-output.tar.gz
Description: GNU Zip compressed data


Re: [OMPI users] Performance of MPI_Isend() worse than MPI_Send() and even MPI_Ssend()

2007-10-15 Thread Christian Bell
  By using non-blocking communications, you choose to expose separate
  initiation and synchronization MPI calls such that an MPI
  implementation is free to schedule the communication in any way it
  wants between these two points (while retaining MPI semantics).  To
  your advantage, this may mean that the MPI implementation can
  co-opt specialized hardware/firmware in offloading parts (or all)
  of the communication while you get back to computation (or
  communication to another processor).  If there's nothing to overlap
  in your application or if the MPI application has no way to offload
  any parts of the communication, all you see is the added cost of
  turning one call into two calls.  However, with any decent MPI
  implementation, this added cost should be a matter of microseconds
  (if not nanoseconds), so nothing to worry about for any
  non-microbenchmark and at any reasonable scale.
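
  In code, those two points look roughly like the sketch below (a minimal
  C illustration only; the peer rank, the message size and
  compute_on_local_data() are placeholders, not something taken from this
  thread):

#include <mpi.h>

void compute_on_local_data(void);   /* placeholder for independent work */

void exchange_and_compute(double *sendbuf, double *recvbuf, int n, int peer)
{
    MPI_Request reqs[2];

    /* Initiation: post the communication and return immediately. */
    MPI_Isend(sendbuf, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(recvbuf, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[1]);

    /* Anything that does not touch the buffers can run here; whether the
       transfer really progresses in the background is up to the MPI
       implementation and the hardware. */
    compute_on_local_data();

    /* Synchronization: the buffers may only be reused after completion. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
}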

  You should also know that since MPI requires local completion only
  in that the send buffer be reusable once an MPI send is "complete",
  it's perfectly valid for an implementation to simply copy the send
  data and claim completion while it defers but guarantees delivery
  of the message at a later time.  When comparing blocking against
  non-blocking, there's probably nothing wrong with the
  implementation unless there's a dramatic difference.  MPI
  implementations play different games on different types of messages
  and at different sizes and thresholds.  Microbenchmarks just muddy
  these issues up even further.  All uninteresting stuff, really.
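
  If you want to see that local-completion rule for yourself, a small
  two-rank probe along the lines below will, on most implementations, show
  a short MPI_Send returning before the receiver has even posted its
  receive (eager buffering), while MPI_Ssend waits for it.  This is only a
  sketch: the 2-second delay and the 64-byte size are arbitrary, and eager
  thresholds differ between implementations and interconnects.

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char buf[64] = {0};
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) { MPI_Finalize(); return 1; }   /* needs two ranks */

    if (rank == 0) {
        double t0 = MPI_Wtime();
        MPI_Send(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        printf("MPI_Send  returned after %.3f s\n", MPI_Wtime() - t0);

        t0 = MPI_Wtime();
        MPI_Ssend(buf, sizeof(buf), MPI_CHAR, 1, 1, MPI_COMM_WORLD);
        printf("MPI_Ssend returned after %.3f s\n", MPI_Wtime() - t0);
    } else if (rank == 1) {
        sleep(2);   /* delay posting the receives */
        MPI_Recv(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Recv(buf, sizeof(buf), MPI_CHAR, 0, 1, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}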

  For what you call the MPI broker, it turns out that it's fairly
  difficult to completely and cheaply offload an MPI message since
  MPI messages involve both data transfer and an implied point-to-point
  synchronization (the tags, ranks and communicators business).  A
  lot of hardware will give you fast primitives for pure data
  transfer but none can completely eliminate the cost of brokering or
  synchronizing the message from the main application.  As a separate
  software thread, the broker is tied to scheduling policies and may
  positively or negatively compete with the main application thread.
  When buried as a helper thread in firmware, the broker has to
  negotiate synchronizations at a fraction of the speed of the main
  host processor and may not provide sufficient concurrency for your
  many-core machine.  


. . christian

On Mon, 15 Oct 2007, Eric Thibodeau wrote:

> George,
> 
>   For completeness' sake, from what I understand here, the
>   only way to get "true" communications and computation overlap
>   is to have an "MPI broker" thread which would take care of
>   all communications in the form of sync MPI calls. It is that
>   thread which you call asynchronously and then let it manage
>   the communications in the background... correct?
> 
> Eric
> 
> On October 15, 2007, George Bosilca wrote:
> > Eric,
> > 
> > No, there is no documentation about this in Open MPI. However, what I
> > described here is not related to Open MPI; it's a general problem
> > with most/all MPI libraries. There are multiple scenarios where
> > non-blocking communications can improve the overall performance of a
> > parallel application. But, in general, the reason is related to
> > overlapping communications with computations, or communications with
> > communications.
> > 
> > The problem is that using non-blocking will increase the critical
> > path compared with blocking, which usually never helps to improve
> > performance. Now I'll explain the real reason behind that. The REAL
> > problem is that usually an MPI library cannot make progress while the
> > application is not in an MPI call. Therefore, as soon as the MPI
> > library returns after posting the non-blocking send, no progress is
> > possible on that send until the user goes back into the MPI library. If
> > you compare this with the case of a blocking send, there the library
> > does not return until the data is pushed onto the network buffers, i.e.
> > the library is the one in control until the send is completed.
> > 
> >Thanks,
> >  george.
> > 
> > On Oct 15, 2007, at 2:23 PM, Eric Thibodeau wrote:
> > 
> > > Hello George,
> > >
> > >   What you're saying here is very interesting. I am presently
> > > profiling communication patterns for Parallel Genetic Algorithms
> > > and could not figure out why the async versions tended to be worse
> > > than the sync counterpart (imho, that was counter-intuitive). What
> > > you're basically saying here is that the async communications
> > > actually add some synchronization overhead that can only be
> > > compensated for if the application overlaps computation with the async
> > > communications? Is there some "official" reference/documentation for
> > > this behaviour from Open MPI (I know the MPI standard doesn't define
> > > the actual implementation of the communications and therefore lets
> > > the implementer do as he pleases).

Re: [OMPI users] Performance of MPI_Isend() worse than MPI_Send() and even MPI_Ssend()

2007-10-15 Thread George Bosilca
That's one possible way of achieving the overlap. However, it's not a
portable solution: as of right now, of all the open source libraries, only
Open MPI offers this "helper" thread (as far as I know).


Another way of achieving the same goal is to have a truly thread-safe
MPI library, with the user keeping a thread blocked in an MPI_Recv that
will only complete at the end of the application. This approach seems
more user friendly, as the user is in control of when the overlap will
occur.
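
A rough sketch of that second approach, assuming the library really does
provide MPI_THREAD_MULTIPLE (the shutdown tag, the self-addressed message
and do_computation() below are made-up placeholders, not Open MPI
specifics):

#include <mpi.h>
#include <pthread.h>

#define SHUTDOWN_TAG 99   /* arbitrary tag reserved for the blocked receive */

static void *blocked_receiver(void *arg)
{
    int dummy;
    /* Stays blocked inside the library until the matching send is posted
       at the very end of the run; while it is blocked, a thread-safe MPI
       can keep progressing other outstanding communication. */
    MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, SHUTDOWN_TAG,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    return NULL;
}

void do_computation(void) { /* application work (MPI_Isend/Irecv etc.) */ }

int main(int argc, char **argv)
{
    int provided, rank, dummy = 0;
    pthread_t tid;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    /* A real program should check provided == MPI_THREAD_MULTIPLE here. */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    pthread_create(&tid, NULL, blocked_receiver, NULL);

    do_computation();

    /* Satisfy the blocked receive (a message to ourselves) so the helper
       thread can finish before MPI_Finalize. */
    MPI_Send(&dummy, 1, MPI_INT, rank, SHUTDOWN_TAG, MPI_COMM_WORLD);
    pthread_join(tid, NULL);

    MPI_Finalize();
    return 0;
}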


  george.

On Oct 15, 2007, at 2:56 PM, Eric Thibodeau wrote:


George,

	For completeness' sake, from what I understand here, the only
way to get "true" communications and computation overlap is to have
an "MPI broker" thread which would take care of all communications
in the form of sync MPI calls. It is that thread which you call
asynchronously and then let it manage the communications in the
background... correct?


Eric

On October 15, 2007, George Bosilca wrote:

Eric,

No, there is no documentation about this in Open MPI. However, what I
described here is not related to Open MPI; it's a general problem
with most/all MPI libraries. There are multiple scenarios where
non-blocking communications can improve the overall performance of a
parallel application. But, in general, the reason is related to
overlapping communications with computations, or communications with
communications.

The problem is that using non-blocking will increase the critical
path compared with blocking, which usually never helps to improve
performance. Now I'll explain the real reason behind that. The REAL
problem is that usually an MPI library cannot make progress while the
application is not in an MPI call. Therefore, as soon as the MPI
library returns after posting the non-blocking send, no progress is
possible on that send until the user goes back into the MPI library. If
you compare this with the case of a blocking send, there the library
does not return until the data is pushed onto the network buffers, i.e.
the library is the one in control until the send is completed.

   Thanks,
 george.

On Oct 15, 2007, at 2:23 PM, Eric Thibodeau wrote:


Hello George,

What you're saying here is very interesting. I am presently
profiling communication patterns for Parallel Genetic Algorithms
and could not figure out why the async versions tended to be worse
than the sync counterpart (imho, that was counter-intuitive). What
you're basically saying here is that the async communications
actually add some synchronization overhead that can only be
compensated for if the application overlaps computation with the async
communications? Is there some "official" reference/documentation for
this behaviour from Open MPI (I know the MPI standard doesn't define
the actual implementation of the communications and therefore lets
the implementer do as he pleases).

Thanks,

Eric

On October 15, 2007, George Bosilca wrote:
Your conclusion is not necessarily/always true. The MPI_Isend is just
the non-blocking version of the send operation. As one can imagine, an
MPI_Isend + MPI_Wait increases the execution path [inside the MPI
library] compared with any blocking point-to-point communication,
leading to worse performance. The main interest of the MPI_Isend
operation is the possible overlap of computation with communications,
or the possible overlap between multiple communications.

However, depending on the size of the message this might not be true.
For large messages, in order to keep the memory usage on the receiver
at a reasonable level, a rendezvous protocol is used. The sender
[after sending a small packet] waits until the receiver confirms the
message exchange (i.e. the corresponding receive operation has been
posted) before sending the large data. Using MPI_Isend can lead to longer
execution times, as the real transfer will be delayed until the
program enters the next MPI call.

In general, using non-blocking operations can improve the performance
of the application, if and only if the application is carefully
crafted.

   george.

On Oct 14, 2007, at 2:38 PM, Jeremias Spiegel wrote:


Hi,
I'm working with Open MPI on an InfiniBand cluster and am seeing some
strange effects when using MPI_Isend(). To my understanding this should
always be quicker than MPI_Send() and MPI_Ssend(), yet in my program both
MPI_Send() and MPI_Ssend() reproducibly perform quicker than MPI_Isend().
Is there something obvious I'm missing?

Regards,
Jeremias







--
Eric Thibodeau
Neural Bucket Solutions Inc.
T. (514) 736-1436
C. (514) 710-0517













Re: [OMPI users] Performance of MPI_Isend() worse than MPI_Send() and even MPI_Ssend()

2007-10-15 Thread Eric Thibodeau
George,

For completeness' sake, from what I understand here, the only way to 
get "true" communications and computation overlap is to have an "MPI broker" 
thread which would take care of all communications in the form of sync MPI 
calls. It is that thread which you call asynchronously and then let it manage 
the communications in the background... correct?

Eric

On October 15, 2007, George Bosilca wrote:
> Eric,
> 
> No, there is no documentation about this in Open MPI. However, what I
> described here is not related to Open MPI; it's a general problem
> with most/all MPI libraries. There are multiple scenarios where
> non-blocking communications can improve the overall performance of a
> parallel application. But, in general, the reason is related to
> overlapping communications with computations, or communications with
> communications.
> 
> The problem is that using non-blocking will increase the critical
> path compared with blocking, which usually never helps to improve
> performance. Now I'll explain the real reason behind that. The REAL
> problem is that usually an MPI library cannot make progress while the
> application is not in an MPI call. Therefore, as soon as the MPI
> library returns after posting the non-blocking send, no progress is
> possible on that send until the user goes back into the MPI library. If
> you compare this with the case of a blocking send, there the library
> does not return until the data is pushed onto the network buffers, i.e.
> the library is the one in control until the send is completed.
> 
>Thanks,
>  george.
> 
> On Oct 15, 2007, at 2:23 PM, Eric Thibodeau wrote:
> 
> > Hello George,
> >
> > What you're saying here is very interesting. I am presently
> > profiling communication patterns for Parallel Genetic Algorithms
> > and could not figure out why the async versions tended to be worse
> > than the sync counterpart (imho, that was counter-intuitive). What
> > you're basically saying here is that the async communications
> > actually add some synchronization overhead that can only be
> > compensated for if the application overlaps computation with the async
> > communications? Is there some "official" reference/documentation for
> > this behaviour from Open MPI (I know the MPI standard doesn't define
> > the actual implementation of the communications and therefore lets
> > the implementer do as he pleases).
> >
> > Thanks,
> >
> > Eric
> >
> > On October 15, 2007, George Bosilca wrote:
> >> Your conclusion is not necessarily/always true. The MPI_Isend is just
> >> the non-blocking version of the send operation. As one can imagine, an
> >> MPI_Isend + MPI_Wait increases the execution path [inside the MPI
> >> library] compared with any blocking point-to-point communication,
> >> leading to worse performance. The main interest of the MPI_Isend
> >> operation is the possible overlap of computation with communications,
> >> or the possible overlap between multiple communications.
> >>
> >> However, depending on the size of the message this might not be true.
> >> For large messages, in order to keep the memory usage on the receiver
> >> at a reasonable level, a rendezvous protocol is used. The sender
> >> [after sending a small packet] waits until the receiver confirms the
> >> message exchange (i.e. the corresponding receive operation has been
> >> posted) before sending the large data. Using MPI_Isend can lead to longer
> >> execution times, as the real transfer will be delayed until the
> >> program enters the next MPI call.
> >>
> >> In general, using non-blocking operations can improve the performance
> >> of the application, if and only if the application is carefully
> >> crafted.
> >>
> >>george.
> >>
> >> On Oct 14, 2007, at 2:38 PM, Jeremias Spiegel wrote:
> >>
> >>> Hi,
> >>> I'm working with Open MPI on an InfiniBand cluster and am seeing some
> >>> strange effects when using MPI_Isend(). To my understanding this should
> >>> always be quicker than MPI_Send() and MPI_Ssend(), yet in my program
> >>> both MPI_Send() and MPI_Ssend() reproducibly perform quicker than
> >>> MPI_Isend(). Is there something obvious I'm missing?
> >>>
> >>> Regards,
> >>> Jeremias
> >>
> >>
> >
> >
> >
> > -- 
> > Eric Thibodeau
> > Neural Bucket Solutions Inc.
> > T. (514) 736-1436
> > C. (514) 710-0517
> 
> 



-- 
Eric Thibodeau
Neural Bucket Solutions Inc.
T. (514) 736-1436
C. (514) 710-0517



Re: [OMPI users] Performance of MPI_Isend() worse than MPI_Send() and even MPI_Ssend()

2007-10-15 Thread George Bosilca

Eric,

No, there is no documentation about this in Open MPI. However, what I
described here is not related to Open MPI; it's a general problem
with most/all MPI libraries. There are multiple scenarios where
non-blocking communications can improve the overall performance of a
parallel application. But, in general, the reason is related to
overlapping communications with computations, or communications with
communications.


The problem is that using non-blocking will increase the critical
path compared with blocking, which usually never helps to improve
performance. Now I'll explain the real reason behind that. The REAL
problem is that usually an MPI library cannot make progress while the
application is not in an MPI call. Therefore, as soon as the MPI
library returns after posting the non-blocking send, no progress is
possible on that send until the user goes back into the MPI library. If
you compare this with the case of a blocking send, there the library
does not return until the data is pushed onto the network buffers, i.e.
the library is the one in control until the send is completed.
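
One common way to work around this, sketched below, is to re-enter the
library periodically with MPI_Test from inside the computation loop, so
that a large (rendezvous) MPI_Isend can make progress before the final
wait.  The chunked loop and work_on_chunk() are illustrative placeholders
only:

#include <mpi.h>

void work_on_chunk(int chunk);   /* placeholder for a slice of computation */

void send_while_computing(double *buf, int n, int peer, int nchunks)
{
    MPI_Request req;
    int flag = 0;

    MPI_Isend(buf, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &req);

    for (int chunk = 0; chunk < nchunks; chunk++) {
        work_on_chunk(chunk);
        if (!flag)                 /* re-enter the library so the pending */
            MPI_Test(&req, &flag, MPI_STATUS_IGNORE);  /* send can progress */
    }

    if (!flag)
        MPI_Wait(&req, MPI_STATUS_IGNORE);
}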


  Thanks,
george.

On Oct 15, 2007, at 2:23 PM, Eric Thibodeau wrote:


Hello George,

	What you're saying here is very interesting. I am presently
profiling communication patterns for Parallel Genetic Algorithms
and could not figure out why the async versions tended to be worse
than the sync counterpart (imho, that was counter-intuitive). What
you're basically saying here is that the async communications
actually add some synchronization overhead that can only be
compensated for if the application overlaps computation with the async
communications? Is there some "official" reference/documentation for
this behaviour from Open MPI (I know the MPI standard doesn't define
the actual implementation of the communications and therefore lets
the implementer do as he pleases).


Thanks,

Eric

On October 15, 2007, George Bosilca wrote:

Your conclusion is not necessarily/always true. The MPI_Isend is just
the non-blocking version of the send operation. As one can imagine, an
MPI_Isend + MPI_Wait increases the execution path [inside the MPI
library] compared with any blocking point-to-point communication,
leading to worse performance. The main interest of the MPI_Isend
operation is the possible overlap of computation with communications,
or the possible overlap between multiple communications.

However, depending on the size of the message this might not be true.
For large messages, in order to keep the memory usage on the receiver
at a reasonable level, a rendezvous protocol is used. The sender
[after sending a small packet] waits until the receiver confirms the
message exchange (i.e. the corresponding receive operation has been
posted) before sending the large data. Using MPI_Isend can lead to longer
execution times, as the real transfer will be delayed until the
program enters the next MPI call.

In general, using non-blocking operations can improve the performance
of the application, if and only if the application is carefully
crafted.


   george.

On Oct 14, 2007, at 2:38 PM, Jeremias Spiegel wrote:


Hi,
I'm working with Open MPI on an InfiniBand cluster and am seeing some
strange effects when using MPI_Isend(). To my understanding this should
always be quicker than MPI_Send() and MPI_Ssend(), yet in my program both
MPI_Send() and MPI_Ssend() reproducibly perform quicker than MPI_Isend().
Is there something obvious I'm missing?

Regards,
Jeremias







--
Eric Thibodeau
Neural Bucket Solutions Inc.
T. (514) 736-1436
C. (514) 710-0517



