Re: [OMPI users] arch question: long running app

2007-12-06 Thread doktora v
Jeff,
Thanks for the detailed discussion. It certainly makes things a lot clearer,
just as I was giving up my hopes for a reply.

The app is fairly heavy on communication (~10k messages per minute) and is
also embarrassingly parallel. Taking this into account, I think I'll
readjust my resilience expectations and go with MPI as it will make
communications a breeze to deal with.
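
As a rough way to put numbers on those resilience expectations, here is a
minimal sketch of the chance that a job survives a given run time when losing
any one process kills the whole job. It assumes independent, exponentially
distributed node failures, which is an illustrative model and not a
measurement; job_survival and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Probability that a job on `nodes` machines runs for `run_hours` with no
// node failure. ASSUMPTION: node failures are independent and exponentially
// distributed with mean `node_mtbf_hours` (illustrative model only).
double job_survival(int nodes, double node_mtbf_hours, double run_hours) {
    assert(nodes > 0 && node_mtbf_hours > 0.0 && run_hours >= 0.0);
    // N independent nodes fail at N times the single-node rate, so the
    // job-level MTBF is node_mtbf_hours / nodes.
    return std::exp(-static_cast<double>(nodes) * run_hours / node_mtbf_hours);
}
```

With a made-up node MTBF of 50,000 hours, job_survival(2, 50000.0, 8760.0)
comes out at roughly 0.70 for a one-year run on two machines, and the figure
drops quickly as nodes are added.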

It does make sense to have the ability to add/remove processes on the go. On
multi-core hardware, a scheduler could add more processes to an app as
cores are freed up from other tasks. Of course that would be a
problem for apps that require some type of data synchronisation (tightly
coupled, as you say). It would be nice to have an option such as "mpirun -min 4
-max 16" and let the scheduler optimise based on availability.

I'm currently running a test case on two machines with two cores each and,
after one day, so far so good. We'll see how it goes.

Thanks again
dok

On Dec 6, 2007 2:06 PM, Jeff Squyres <jsquy...@cisco.com> wrote:

> It certainly does make sense to use MPI for such a setup.  But there
> are some important things to consider:
>
> 1. MPI, at its heart, is a communications system.  There's lots of
> other bells and whistles (e.g., starting up a whole bunch of processes
> in tandem), but at the core: it's all about passing messages.
>
> 2. MPI tends to lend itself to fairly tightly coupled systems.  The
> usual model is that you start all of your parallel processes at the
> same time (e.g., "mpirun -np 32 my_application").  The current state
> of technology is *not* good in terms of fault tolerance -- most MPI
> implementations (Open MPI included) will kill the entire job if any one
> of those processes dies.  This is an important factor for running for
> weeks, months, or years.
>
> (lots of good research is ongoing about fault tolerance and MPI, but
> the existing solutions still emphasize tightly-coupled
> applications or require a bunch of involvement from the application)
>
> 3. MPI also emphasizes performance: low latency, high bandwidth, good
> concurrency, etc.
>
> If you don't need these things, for example, if your communication
> between manager and worker is infrequent, and/or the overall
> application time is not dominated by communication time, you might be
> better served for [extremely] long-running applications by using a
> simple (but resilient) sockets-based communication layer and not using
> MPI.  I say this mainly because of the fault tolerance issues involved
> and the natural hardware MTBF values that we see on today's hardware.
>
> Hope that helps.
>
>
> On Dec 4, 2007, at 1:15 PM, doktora v wrote:
>
> > Hi, although I did my due diligence on searching for this question,
> > I apologise if this is a repeat.
> >
> > From an architectural point of view does it make sense to use MPI in
> > the following scenario (for the purposes of resilience as much as
> > parallelization):
> >
> > Each process is a long-running process (runs non-interrupted for
> > weeks, months or even years) that collects and crunches some
> > streaming data, for example temperature readings, and the data is
> > replicated to R nodes.
> >
> > Because this is a departure from the normal modus operandi (i.e. all
> > data is immediately available), are there any obvious MPI issues that
> > I am not considering in designing such an application?
> >
> > Here is a more detailed description of the app:
> >
> > A master receives the data and dispatches it according to some
> > function such that each tuple is replicated R times to R of the N
> > nodes (with R<=N). Suppose that there are K regions from which
> > temperature readings stream in, in the form of <K,T> where K is the
> > region id and T is the temperature reading. The master sends <K,T>
> > to R of the N nodes. These nodes maintain a long-term state of, say,
> > the min/max readings. If R=N=2, the system is basically duplicated
> > and if one of the two nodes dies inadvertently, the other one still
> > has accounted for all the data.
> >
> > Here is some pseudo-code:
> >
> > int main(argc, argv)
> >
> > int N=10, R=3, K=200;
> >
> > Init(argc,argv);
> > int rank=COMM_WORLD.Get_rank();
> > if(rank==0) {
> >  int lastnode = 1;
> >  while(read <k,T> from socket)
> >    for(i in 0:R) COMM_WORLD.Send(<k,T>,1,tuple,++lastnode%N,tag);
> > } else {
> >  COMM_WORLD.Recv(<k,T>,1,tuple,any,tag,Info);
> >  process_message(<k,T>);
> > }
> >
> > Many thanks for your time!
> > Regards
> > Dok
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> Cisco Systems


[OMPI users] arch question: long running app

2007-12-04 Thread doktora v
Hi, although I did my due diligence on searching for this question,
I apologise if this is a repeat.
From an architectural point of view does it make sense to use MPI in the
following scenario (for the purposes of resilience as much as
parallelization):

Each process is a long-running process (runs non-interrupted for weeks,
months or even years) that collects and crunches some streaming data, for
example temperature readings, and the data is replicated to R nodes.

Because this is a departure from the normal modus operandi (i.e. all data is
immediately available), are there any obvious MPI issues that I am not
considering in designing such an application?

Here is a more detailed description of the app:

A master receives the data and dispatches it according to some function such
that each tuple is replicated R times to R of the N nodes (with R<=N).
Suppose that there are K regions from which temperature readings stream in,
in the form of <K,T> where K is the region id and T is the temperature
reading. The master sends <K,T> to R of the N nodes. These nodes maintain a
long-term state of, say, the min/max readings. If R=N=2, the system is
basically duplicated and if one of the two nodes dies inadvertently, the
other one still has accounted for all the data.
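
The long-term state each node keeps could be sketched as below: a running
min/max per region id, updated once for every <K,T> tuple received. This is
only an illustration of the description above; RegionState, MinMax and their
members are hypothetical names, and a plain std::map stands in for whatever
storage the real app would use.

```cpp
#include <algorithm>
#include <cassert>
#include <map>

// Running extremes for one region.
struct MinMax {
    double min;
    double max;
};

// Per-node long-term state: min/max temperature seen so far for each region.
class RegionState {
public:
    // Fold one <region, temp> reading into the state.
    void update(int region, double temp) {
        auto it = state_.find(region);
        if (it == state_.end()) {
            state_[region] = MinMax{temp, temp};  // first reading for region
        } else {
            it->second.min = std::min(it->second.min, temp);
            it->second.max = std::max(it->second.max, temp);
        }
    }

    // True once at least one reading for `region` has been seen.
    bool has(int region) const { return state_.count(region) != 0; }

    // Extremes for a region that has already been seen.
    MinMax get(int region) const {
        assert(has(region));
        return state_.at(region);
    }

private:
    std::map<int, MinMax> state_;
};
```

Because every replica applies the same update to the same tuples, any of the
R nodes holding a region can answer for it if another one is lost.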

Here is some pseudo-code:

int main(argc, argv)

int N=10, R=3, K=200;


Init(argc,argv);

int rank=COMM_WORLD.Get_rank();
if(rank==0) {
 int lastnode = 1;
 while(read <k,T> from socket)
   for(i in 0:R) COMM_WORLD.Send(<k,T>,1,tuple,++lastnode%N,tag);
} else {
 COMM_WORLD.Recv(<k,T>,1,tuple,any,tag,Info);
 process_message(<k,T>);
}
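
One detail in the dispatch step is worth spelling out: with ++lastnode%N the
destination can come out as 0, which is the master's own rank. A minimal
sketch of one way to pick the R destinations follows, assuming rank 0 is the
master, the workers are ranks 1..N-1, and plain round-robin is an acceptable
dispatch function. next_destinations and its parameters are hypothetical
names, and no MPI calls are made here; each returned rank would be the
destination of one Send.

```cpp
#include <cassert>
#include <vector>

// Choose `replicas` distinct worker ranks in round-robin order.
// ASSUMPTIONS: `n_ranks` is the communicator size, rank 0 is the master,
// and workers are ranks 1..n_ranks-1. `cursor` is a 0-based position in
// the worker list that the caller keeps between calls.
std::vector<int> next_destinations(int n_ranks, int replicas, int& cursor) {
    const int workers = n_ranks - 1;
    assert(workers >= 1 && replicas >= 1 && replicas <= workers);
    assert(cursor >= 0 && cursor < workers);

    std::vector<int> dests;
    dests.reserve(replicas);
    for (int i = 0; i < replicas; ++i) {
        dests.push_back(1 + cursor);      // worker ranks start at 1
        cursor = (cursor + 1) % workers;  // wrap around, never reaching rank 0
    }
    return dests;
}
```

Since replicas <= workers, the ranks returned by a single call are always
distinct, so each tuple really does land on R different nodes.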

Many thanks for your time!
Regards
Dok