Re: [OMPI users] Does current Intel dual processor support MPI?

2006-09-07 Thread Miguel Figueiredo Mascarenhas Sousa Filipe

Hi all,

Well, I just wanted to say that, from a software engineering (and also
computer science) point of view,

OMP, MPI and threads are completely different models for parallel
computation/concurrent programming.

I do not believe that any capable engineer (or good programmer, for all
I know) can know in advance which is the "best"
one to use without knowing the problem space, the design requirements, etc.

It's not just about "portability" or code "readability/maintainability".

Deciding which one to use will depend on (and therefore influence) the
application architecture.

Should I use OMP on a web server such as Apache or Tomcat, gaining
better portability and code readability that way?

Should I use OMP or threading for a massively parallel system such
as Blue Gene/L? What about an SGI Altix 3000?

Should I use threading on a 2-CPU, shared-memory system for a
sequential application where I just need to speed up
some highly vectorizable loops?

For instance, my thesis dealt with parallelizing a seismic simulation
application; I did a threaded version and an MPI version.
The threaded version, since "tasks" could share massive amounts of
data with very little lock contention, could work on bigger data sets
than the MPI version (given the same total amount of RAM). But the MPI
version could run on clusters, while with threading I needed a single
system image.
OpenMP was inadequate since it would have had a much larger sequential
execution time, providing inadequate speedup for an algorithm that
was very parallel.
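
(To make the RAM point concrete, here is a rough, hypothetical sketch;
not my actual thesis code, and the array name and size are made up.
With threads there is a single shared copy of the data set, while an
MPI version would typically allocate one copy, or at least a
halo-extended partition, per rank, so the same total RAM holds a
smaller problem.)

/* Shared-memory sketch: one copy of the data set, updated in place by
 * both worker threads. An MPI version would allocate per rank. */
#include <pthread.h>
#include <stdlib.h>

#define NPOINTS (64 * 1024 * 1024)      /* illustrative grid size */
static float *field;                    /* single shared copy */

static void *worker(void *arg)
{
    long tid = (long)arg;
    /* each thread works on its own stride of the shared array */
    for (long i = tid; i < NPOINTS; i += 2)
        field[i] *= 0.5f;
    return NULL;
}

int main(void)
{
    field = calloc(NPOINTS, sizeof *field);   /* allocated once */
    pthread_t t[2];
    for (long i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (long i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    free(field);
    return 0;
}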

Speedups measured in the threaded and MPI versions were about
1.99 in 2-CPU mode (<1% sequential computation). In MPI, with 16
CPUs (a 1-gigabit link for 8 x 2-CPU nodes), the measured speedup was
14.8.
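
(Those numbers are consistent with Amdahl's law. Assuming, as a guess
rather than a measured figure, a sequential fraction f of about 0.5%:

  speedup(N)  = 1 / (f + (1 - f)/N)
  speedup(2)  = 1 / (0.005 + 0.995/2)  ~= 1.99
  speedup(16) = 1 / (0.005 + 0.995/16) ~= 14.9

which matches the measured 1.99 and 14.8 quite closely.)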

My threaded version would never achieve a 14.8 speedup, even on an
"SSI" 8-node cluster.
The effort required to make the MPI version so scalable was _much_
bigger than for the threaded one (designing a new concurrent,
distributed algorithm to replace one that was sequential and that, in
the sequential application, amounted to 1% of the computation time).
The MPI version uses more RAM per process, but it can scale up to
64/128 nodes, depending on the problem size, and it doesn't require a
shared-memory system.
My threaded version, on a shared-memory system with lots of CPUs,
will scale quite well, but the aggregate memory bandwidth will probably
be inferior to that of a cluster with the same number of CPUs and the
same amount of RAM (big SMP or NUMA systems normally have higher RAM
latency and bandwidth that does not scale proportionally).
Basically, I can't predict which performs better.


So, I hope it's understandable that choosing the right parallel
computing model isn't just a matter of "taste".




On 9/6/06, George Bosilca  wrote:

From my perspective, some (let's say #1 and #2) of the most important
features of an application that has to last for a while are
readability and portability. And OMP code is far more readable than
pthread code. The loops look like loops, the critical sections are
obvious, and the sequential meaning of the program is preserved.

On Sep 5, 2006, at 7:52 PM, Durga Choudhury wrote:

> My opinion would be to use pthreads, for a couple of reasons:
>
> 1. You don't need an OMP aware compiler; any old compiler would do.

Compilers can be downloaded for free these days. And most of them
now have OMP support, on all operating systems (e.g. even the
free Microsoft compiler now has OMP support, and Windows was
definitely not the platform I expected to use for my OMP tasks).

> 2. The pthread library is better adapted, and hence might be more
> optimized, than the code emitted by an OMP compiler.

The pthread library adds a huge overhead to all operations. At this
level of granularity you quite often need atomic locks and operations,
not critical sections protected by mutexes. Unfortunately, there is
no portable library that gives you a common interface to atomic
operations (there was a BSD one at one point). Moreover, using
threads instead of OMP directives moves the burden onto the programmer.
Most people just cannot afford a one-year student who has to
first understand the code and then add the correct pthread calls inside it.
And for what result ... you don't even know that you will get the
fastest version. On the other hand, OMP compilers are getting smarter
and smarter every day. Today the results are quite impressive; just
imagine what will happen in a few years.

>
> If your operating system is Linux, you may use the clone() system
> call directly; this would add further optimization at the expense
> of portability.

It's always a trade-off between performance and portability. What do
you want to lose in order to get the 1% performance gain ... And in
this case the only performance gain you will get is when you start
the threads; otherwise you will not improve anything. Generally,
people prefer to use thread pools in order to avoid the overhead of
creating and destroying threads all the time.

   george.

>
> Durga
>
>
> On 9/5/06, George Bosilca  wrote:
> On Sep 5, 2006,

Re: [OMPI users] Does current Intel dual processor support MPI?

2006-09-06 Thread George Bosilca
From my perspective, some (let's say #1 and #2) of the most important
features of an application that has to last for a while are
readability and portability. And OMP code is far more readable than
pthread code. The loops look like loops, the critical sections are
obvious, and the sequential meaning of the program is preserved.
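
(A minimal illustration of that point; the function and array names are
made up, not from any particular code. The OMP version is just the
sequential loop plus one pragma, whereas a pthread version would have
to split the iteration space and manage the threads by hand.)

/* Remove the pragma and this is still the same correct serial loop.
 * Compile with an OpenMP flag, e.g. -fopenmp for GCC. */
void scale(int n, double *a, const double *b)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] = 2.0 * b[i];
}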


On Sep 5, 2006, at 7:52 PM, Durga Choudhury wrote:


My opinion would be to use pthreads, for a couple of reasons:

1. You don't need an OMP aware compiler; any old compiler would do.


Compilers can be downloaded for free these days. And most of them
now have OMP support, on all operating systems (e.g. even the
free Microsoft compiler now has OMP support, and Windows was
definitely not the platform I expected to use for my OMP tasks).


2. The pthread library is better adapted, and hence might be more
optimized, than the code emitted by an OMP compiler.


The pthread library adds a huge overhead to all operations. At this
level of granularity you quite often need atomic locks and operations,
not critical sections protected by mutexes. Unfortunately, there is
no portable library that gives you a common interface to atomic
operations (there was a BSD one at one point). Moreover, using
threads instead of OMP directives moves the burden onto the programmer.
Most people just cannot afford a one-year student who has to
first understand the code and then add the correct pthread calls inside it.
And for what result ... you don't even know that you will get the
fastest version. On the other hand, OMP compilers are getting smarter
and smarter every day. Today the results are quite impressive; just
imagine what will happen in a few years.
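
(To put the atomic-vs-mutex point in OMP terms, a hedged sketch and
nothing more; the names are illustrative. For a single scalar update, an
atomic operation is usually much cheaper than a critical section, which
is essentially a mutex-protected region.)

/* Hypothetical reduction; in real code a reduction(+:sum) clause
 * would avoid both forms of synchronization entirely. */
double sum_atomic(int n, const double *x)
{
    double sum = 0.0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp atomic            /* cheap hardware atomic update */
        sum += x[i];
        /* the heavier alternative would be:
         *   #pragma omp critical
         *   { sum += x[i]; }
         */
    }
    return sum;
}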




If your operating system is Linux, you may use the clone() system  
call directly; this would add further optimization at the expense  
of portability.


It's always a trade-off between performance and portability. What do
you want to lose in order to get the 1% performance gain ... And in
this case the only performance gain you will get is when you start
the threads; otherwise you will not improve anything. Generally,
people prefer to use thread pools in order to avoid the overhead of
creating and destroying threads all the time.
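
(For the thread-pool remark, a bare-bones sketch assuming POSIX threads;
the "queue" is just a shared counter and the task body is a placeholder,
so this is only meant to show the create-once/reuse pattern.)

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 2
#define NTASKS   8

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int next_task = 0;                 /* shared work counter */

/* Each pool thread is started once and keeps pulling tasks until
 * the work is exhausted, instead of one create/destroy per task. */
static void *pool_worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        int task = (next_task < NTASKS) ? next_task++ : -1;
        pthread_mutex_unlock(&lock);
        if (task < 0)
            break;                        /* no more work */
        printf("thread %lu runs task %d\n",
               (unsigned long)pthread_self(), task);
    }
    return NULL;
}

int main(void)
{
    pthread_t pool[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)    /* threads created once ...   */
        pthread_create(&pool[i], NULL, pool_worker, NULL);
    for (int i = 0; i < NTHREADS; i++)    /* ... and joined once at end */
        pthread_join(pool[i], NULL);
    return 0;
}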


  george.



Durga


On 9/5/06, George Bosilca  wrote:
On Sep 5, 2006, at 3:19 AM, Aidaros Dev wrote:

> Nowadays we hear about the Intel dual-core processor. An Intel dual-core
> processor consists of two complete execution cores in one physical
> processor, both running at the same frequency. Both cores share the
> same packaging and the same interface with the chipset/memory.
> Can I use an MPI library to communicate between these processors? Can
> we consider them as separate?


Yes and yes. However, these architectures fit a different programming
model better. If you want to get the maximum performance out of
them, an OMP approach (instead of MPI) is more suitable. Using
processes on such an architecture is just a waste of performance. One
should use a thread model, with locking to ensure the coordination
of memory accesses. Or let the underlying libraries do their
magic for you. As an example, most of the mathematical codes based on
BLAS can use GOTO BLAS (developed at TACC) to get multi-core (and
multi-CPU) support for free, as this library will do all BLAS
operations in parallel using multiple threads.

  george.




--
Devil wanted omnipresence;
He therefore created communists.


"Half of what I say is meaningless; but I say it so that the other  
half may reach you"

  Kahlil Gibran




Re: [OMPI users] Does current Intel dual processor support MPI?

2006-09-05 Thread Durga Choudhury

My opinion would be to use pthreads, for a couple of reasons:

1. You don't need an OMP aware compiler; any old compiler would do.
2. The pthread library is better adapted, and hence might be more
optimized, than the code emitted by an OMP compiler.

If your operating system is Linux, you may use the clone() system call
directly; this would add further optimization at the expense of portability.

Durga


On 9/5/06, George Bosilca  wrote:



On Sep 5, 2006, at 3:19 AM, Aidaros Dev wrote:

> Nowadays we hear about the Intel dual-core processor. An Intel dual-core
> processor consists of two complete execution cores in one physical
> processor, both running at the same frequency. Both cores share the
> same packaging and the same interface with the chipset/memory.
> Can I use an MPI library to communicate between these processors? Can
> we consider them as separate?


Yes and yes. However, these architectures fit a different programming
model better. If you want to get the maximum performance out of
them, an OMP approach (instead of MPI) is more suitable. Using
processes on such an architecture is just a waste of performance. One
should use a thread model, with locking to ensure the coordination
of memory accesses. Or let the underlying libraries do their
magic for you. As an example, most of the mathematical codes based on
BLAS can use GOTO BLAS (developed at TACC) to get multi-core (and
multi-CPU) support for free, as this library will do all BLAS
operations in parallel using multiple threads.

  george.






--
Devil wanted omnipresence;
He therefore created communists.


Re: [OMPI users] Does current Intel dual processor support MPI?

2006-09-05 Thread Greg Lindahl
On Tue, Sep 05, 2006 at 11:50:54AM -0400, George Bosilca wrote:

> Yes and yes. However, these architectures fit a different programming
> model better. If you want to get the maximum performance out of
> them, an OMP approach (instead of MPI) is more suitable.

Eh? You mean all of those examples of codes with OMP and MPI versions
that run faster with MPI are a figment of my imagination?

-- greg



Re: [OMPI users] Does current Intel dual processor support MPI?

2006-09-05 Thread George Bosilca


On Sep 5, 2006, at 3:19 AM, Aidaros Dev wrote:

Nowadays we hear about the Intel dual-core processor. An Intel dual-core
processor consists of two complete execution cores in one physical
processor, both running at the same frequency. Both cores share the
same packaging and the same interface with the chipset/memory.
Can I use an MPI library to communicate between these processors? Can
we consider them as separate?



Yes and yes. However, these architectures fit a different programming
model better. If you want to get the maximum performance out of
them, an OMP approach (instead of MPI) is more suitable. Using
processes on such an architecture is just a waste of performance. One
should use a thread model, with locking to ensure the coordination
of memory accesses. Or let the underlying libraries do their
magic for you. As an example, most of the mathematical codes based on
BLAS can use GOTO BLAS (developed at TACC) to get multi-core (and
multi-CPU) support for free, as this library will do all BLAS
operations in parallel using multiple threads.
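
(As a hedged illustration of that "for free" parallelism: the calling
code below is plain sequential CBLAS. The matrix size, the launch line,
and the GOTO_NUM_THREADS setting are illustrative assumptions; the
threading comes entirely from the BLAS library linked in.)

#include <stdlib.h>
#include <cblas.h>

int main(void)
{
    int n = 2048;                              /* illustrative size */
    double *a = calloc((size_t)n * n, sizeof *a);
    double *b = calloc((size_t)n * n, sizeof *b);
    double *c = calloc((size_t)n * n, sizeof *c);

    /* C = A * B; with a threaded BLAS (e.g. GotoBLAS) this runs on
     * both cores even though the program never creates a thread.
     * Run with e.g.:  GOTO_NUM_THREADS=2 ./dgemm_test               */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, a, n, b, n, 0.0, c, n);

    free(a); free(b); free(c);
    return 0;
}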


  george.



Re: [OMPI users] Does current Intel dual processor support MPI?

2006-09-05 Thread Peter Kjellström
On Tuesday 05 September 2006 09:19, Aidaros Dev wrote:
> Nowadays we hear about the Intel dual-core processor. An Intel dual-core
> processor consists of two complete execution cores in one physical
> processor, both running at the same frequency. Both cores share the same
> packaging and the same interface with the chipset/memory.
> Can I use an MPI library to communicate between these processors? Can
> we consider them as separate?

You can treat one dual-core processor as if it were two normal single-core
processors. As such, MPI works fine, just as it does on any SMP.
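
(A minimal sketch of that: the usual MPI hello world, launched with one
rank per core; the launch command is just an example.)

/* Each core simply runs its own MPI rank, exactly as on a 2-way SMP.
 * Launch with e.g.:  mpirun -np 2 ./hello                            */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}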

/Peter




[OMPI users] Does current Intel dual processor support MPI?

2006-09-05 Thread Aidaros Dev

Nowadays we hear about the Intel dual-core processor. An Intel dual-core
processor consists of two complete execution cores in one physical
processor, both running at the same frequency. Both cores share the same
packaging and the same interface with the chipset/memory.
Can I use an MPI library to communicate between these processors? Can
we consider them as separate?

--
A friend in need Is a friend indeed