Jeff,

Thanks for the detailed discussion. It certainly makes things a lot clearer, and it arrived just as I was giving up hope of a reply.

The app is fairly heavy on communication (~10k messages per minute) and is also embarrassingly parallel. Taking this into account, I think I'll readjust my resilience expectations and go with MPI, as it will make communication a breeze to deal with.

It does make sense to have the ability to add and remove processes on the fly. On multi-core hardware, a scheduler could add more processes to an app as cores are freed up from other tasks. Of course, that would be a problem for apps that require some type of data synchronisation (tightly coupled, as you say). It would be nice to have an option along the lines of "mpirun -min 4 -max 16" and let the scheduler optimise based on availability.

I'm currently running a test case on two machines with two cores each and, after one day, so far so good. We'll see how it goes.

Thanks again,
dok

On Dec 6, 2007 2:06 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
> It certainly does make sense to use MPI for such a setup. But there
> are some important things to consider:
>
> 1. MPI, at its heart, is a communications system. There's lots of
> other bells and whistles (e.g., starting up a whole bunch of processes
> in tandem), but at the core: it's all about passing messages.
>
> 2. MPI tends to lend itself to fairly tightly coupled systems. The
> usual model is that you start all of your parallel processes at the
> same time (e.g., "mpirun -np 32 my_application"). The current state
> of technology is *not* good in terms of fault tolerance -- most MPIs
> (Open MPI included) will kill the entire job if any one of those
> processes dies. This is an important factor for running for weeks,
> months, or years.
>
> (lots of good research is ongoing about fault tolerance and MPI, but
> the existing solutions still emphasize tightly coupled
> applications or require a bunch of involvement from the application)
>
> 3. MPI also emphasizes performance: low latency, high bandwidth, good
> concurrency, etc.
>
> If you don't need these things, for example, if your communication
> between manager and worker is infrequent, and/or the overall
> application time is not dominated by communication time, you might be
> better served for [extremely] long-running applications by using a
> simple (but resilient) sockets-based communication layer and not using
> MPI. I say this mainly because of the fault tolerance issues involved
> and the natural hardware MTBF values that we see on today's hardware.
>
> Hope that helps.
>
>
> On Dec 4, 2007, at 1:15 PM, doktora v wrote:
>
> > Hi, although I did my due diligence on searching for this question,
> > I apologise if this is a repeat.
> >
> > From an architectural point of view, does it make sense to use MPI in
> > the following scenario (for the purposes of resilience as much as
> > parallelization):
> >
> > Each process is a long-running process (runs uninterrupted for
> > weeks, months or even years) that collects and crunches some
> > streaming data, for example temperature readings, and the data is
> > replicated to R nodes.
> >
> > Because this is a departure from the normal modus operandi (i.e. all
> > data is immediately available), are there any obvious MPI issues that
> > I am not considering in designing such an application?
> >
> > Here is a more detailed description of the app:
> >
> > A master receives the data and dispatches it according to some
> > function such that each tuple is replicated R times to R of the N
> > nodes (with R<=N). Suppose that there are K regions from which
> > temperature readings stream in, in the form of <k,T> where k is the
> > region id and T is the temperature reading. The master sends <k,T>
> > to R of the N nodes. These nodes maintain a long-term state of, say,
> > the min/max readings. If R=N=2, the system is basically duplicated
> > and if one of the two nodes dies unexpectedly, the other one has
> > still accounted for all the data.
> >
> > Here is some pseudo-code:
> >
> > int main(argc, argv)
> > {
> >    int N=10, R=3, K=200;
> >
> >    Init(argc,argv);
> >    int rank=COMM_WORLD.Get_rank();
> >    if(rank==0) {
> >       // master: send each tuple to R of the worker ranks 1..N-1
> >       int lastnode = 0;
> >       while(read <k,T> from socket)
> >          for(i in 0:R) COMM_WORLD.Send(<k,T>,1,tuple,1+(lastnode++%(N-1)),tag);
> >    } else {
> >       // worker: keep receiving for as long as the process runs
> >       while(true) {
> >          COMM_WORLD.Recv(<k,T>,1,tuple,any,tag,Info);
> >          process_message(<k,T>);
> >       }
> >    }
> > }
> >
> > Many thanks for your time!
> > Regards
> > Dok
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> Cisco Systems
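
For reference, the dispatch step in the quoted description (each tuple goes to R of the N nodes, with R<=N) could be sketched roughly as below. This is only an illustration under stated assumptions, not code from the thread: replica_targets and its cursor argument are hypothetical names, no MPI calls are made, and the sketch assumes rank 0 is the master and the workers are ranks 1..num_workers, so the master never sends a tuple to itself.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of MPI): pick the R worker ranks that
// should receive the next tuple, round-robin over the workers.
// Assumption: rank 0 is the master and the workers are ranks 1..num_workers.
std::vector<int> replica_targets(unsigned long& cursor, int R, int num_workers) {
    assert(R >= 1 && R <= num_workers);  // R <= N, as in the description
    std::vector<int> targets;
    targets.reserve(R);
    for (int i = 0; i < R; ++i) {
        // Consecutive cursor values map to distinct ranks as long as
        // R <= num_workers, so one tuple never goes to the same worker twice.
        targets.push_back(1 + static_cast<int>(cursor % num_workers));
        ++cursor;  // the next tuple continues where this one stopped
    }
    return targets;
}
```

With R=3 and four workers, successive calls starting from a cursor of 0 return ranks {1, 2, 3} and then {4, 1, 2}. The master would issue one send per returned rank. With R equal to the number of workers, every worker receives every tuple, which is the fully duplicated case described in the quoted message.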