Eitan Zahavi wrote:
Leonid just sent an example for a race that might happen if the SM is to
be the maintainer of the data.

The race Leonid mentioned is a client sending a request when the SM is down. That request will fail, so there's no data for the SM to maintain for that node. That's a retry condition that the client must deal with.
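A minimal sketch of that client-side retry, in Python for brevity. Everything here is hypothetical: `send_request` stands in for whatever call the client uses to reach the SM, `SmUnavailable` for however a timeout or dead SM is reported, and `FakeSm` is only a stub for exercising the loop. Delays double after each failure.

```python
import time
from typing import Callable


class SmUnavailable(Exception):
    """Hypothetical error: the SM did not answer (down, failing over, etc.)."""


def register_with_retry(
    send_request: Callable[[], str],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Resend a registration request until the SM answers or attempts run out.

    A request that failed while the SM was down left no state behind on the
    SM, so the client simply sends it again, doubling the delay each time.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return send_request()
        except SmUnavailable:
            if attempt == max_attempts:
                raise  # give up; the caller decides what to do next
            sleep(delay)
            delay *= 2
    raise AssertionError("unreachable")


class FakeSm:
    """Hypothetical stub SM that is 'down' for the first `failures` requests."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def handle(self) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise SmUnavailable()
        return "registered"


if __name__ == "__main__":
    sm = FakeSm(failures=2)
    print(register_with_retry(sm.handle, sleep=lambda s: None), sm.calls)
```

Because the failed request left nothing on the SM, the retry needs no reconciliation on the SM side; all of the recovery logic stays with the client that owns the registration.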

[EZ] The SM is a single entity that has to respond to all requests from
the entire cluster (even redirection requests). When you require that
SM to also provide transaction-safe storage, or, even worse than that,
consistency across multiple standby SMs, you worsen the problem. The
clients, on their side, only need to maintain their own registrations.

I don't believe that there's any requirement that the SM be a single system. But I do believe that the SM should be able to recover from all SM problems without interrupting any existing communication that is occurring on the fabric. SM failover or failure/restart should be as transparent to the clients (i.e., the non-SM nodes in the fabric) as possible. (BTW, I also believe that the SM should run on top of a real DBMS and support SQL-style queries...)

You don't want to push this problem to every application running in the fabric, so why even push it to every node in the fabric?

- Sean
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general