Dear all,
May I help in this context? I can't promise big contributions or high
availability, because I may get busier at work.
I am also not sure whether my
company will allow this. Anyway, I may do this in my spare time.
Thanks & Regards,
On 12/23/09, Ralph
That's just OMPI's default behavior - as Josh said, we are working towards
allowing other behaviors, but for now, this is what we have.
On Dec 23, 2009, at 5:40 AM, vipin kumar wrote:
> Thank you Ralph,
>
> I did as you said. Programs are running fine, But still killing one process
> leads
Thank you Ralph,
I did as you said. The programs run fine, but killing one process still
terminates all processes. Am I missing something? Does anything else need to
be called along with MPI::Comm::Disconnect()?
Thanks & Regards,
On Mon, Dec 21, 2009 at 8:00 PM, Ralph Castain
Disconnect is a -collective- operation. Both parent and child have to call it.
Your child process is "hanging" while it waits for the parent.
On Dec 21, 2009, at 1:37 AM, vipin kumar wrote:
> Hello folks,
>
> As I explained my problem earlier, I am looking for Fault Tolerance in MPI
>
Hello folks,
As I explained my problem earlier, I am looking for Fault Tolerance in MPI
Programs. I read in the MPI 2.1 standard document that two DISCONNECTED
processes do not affect each other, i.e. they can die or be killed
without affecting other processes.
So, I was trying
Unfortunately I cannot provide a precise time frame for availability
at this point, but we are targeting the v1.5 release series. There is
a handful of core developers working on this issue at the moment.
Pieces of this work have already made it into the Open MPI
development trunk. If you
Task-farm or manager/worker recovery models typically depend on
intercommunicators (i.e., from MPI_Comm_spawn) and a resilient MPI
implementation. William Gropp and Ewing Lusk have a paper entitled
"Fault Tolerance in MPI Programs" that outlines how an application
might take advantage of
Is that kind of approach possible within an MPI framework? Perhaps a
grid approach would be better. More experienced people, speak up,
please?
(The reason I say that is that I too am interested in the solution of
that kind of problem, where an individual blade of a blade server
fails and
Hi
I guess "task-farming" could give you a certain amount of the kind of
fault-tolerance you want.
(i.e. a master process distributes tasks to idle slave processors;
however, this will only work if the slave processes don't need to
communicate with each other)
Jody
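
As a rough illustration of the task-farming idea described above, here is a
minimal sketch in plain C++. It is not code from this thread and it does not
use MPI: workers are simulated inside one process, and the names
run_task_farm, process_task and the `dead` set are hypothetical. It only
shows why independent tasks make recovery easy: when a worker fails, the
master drops it and hands the same task to another worker.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <vector>

// Hypothetical stand-in for the real work a slave would do on one task.
int process_task(int task) { return task * task; }

// Simulates a master handing independent tasks to workers round-robin.
// Workers listed in `dead` "crash" when given a task; the master then
// drops that worker and puts the task back at the front of the queue.
// Returns results indexed by task position. Every task completes as long
// as at least one worker survives; otherwise unfinished slots stay 0.
std::vector<int> run_task_farm(const std::vector<int>& tasks,
                               int num_workers,
                               const std::set<int>& dead) {
    assert(num_workers > 0);
    std::vector<int> results(tasks.size(), 0);

    std::deque<std::size_t> pending;  // indices of unfinished tasks
    for (std::size_t i = 0; i < tasks.size(); ++i) pending.push_back(i);

    std::vector<int> alive;           // worker ids still usable
    for (int w = 0; w < num_workers; ++w) alive.push_back(w);

    std::size_t next = 0;             // round-robin cursor into `alive`
    while (!pending.empty() && !alive.empty()) {
        std::size_t idx = pending.front();
        pending.pop_front();
        next %= alive.size();
        int worker = alive[next];
        if (dead.count(worker) != 0) {
            // Worker failed: forget it and requeue the task.
            alive.erase(alive.begin() + static_cast<std::ptrdiff_t>(next));
            pending.push_front(idx);
            continue;
        }
        results[idx] = process_task(tasks[idx]);
        ++next;
    }
    return results;
}
```

For example, run_task_farm({1, 2, 3, 4}, 3, {1}) still fills in every result
even though worker 1 never completes a task. In a real MPI program the hard
part is the one discussed elsewhere in this thread: the MPI implementation
itself has to survive the loss of a process.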
On Mon, Aug 3, 2009 at 1:24
Hi all,
Thanks Durga for your reply.
Jeff, you once wrote code for the Mandelbrot set to demonstrate fault tolerance
in LAM/MPI, i.e. killing any slave process doesn't
affect the others. That is exactly the behaviour I am looking for in Open MPI.
I attempted it, but had no luck. Can you please tell me how to write such
Although I have perhaps the least experience on the topic in this
list, I will take a shot; more experienced people, please correct me:
The MPI standards specify communication mechanisms, not fault tolerance at
any level. You may achieve network tolerance at the IP level by
implementing 'equal cost