Jack,

   I have implemented the Paxos protocol (with several of its extensions)
and found it to be valuable in a transaction processing system.  The
throughput tends to be limited by the system's ability to write logs to
disk, and by the application's misuse of the protocol (for instance, sending
large data items as values through the Paxos layer - not good!).

   The system required about 5x as much disk bandwidth as network
bandwidth - roughly corresponding to the number of write points in the
Paxos protocol.  This could be relaxed for an in-memory system, or an
application with less-than-ACID requirements.
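
As a back-of-envelope illustration of the 5x ratio above, here is a minimal
sketch. It assumes each of the five write points logs about as many bytes as
cross the network; the function name and the 100 Mbps input are illustrative,
not measurements from the system described.

```python
def disk_bandwidth_needed(network_mbps, write_points=5):
    """Estimate disk write bandwidth (Mbps) for a given Paxos network load.

    Assumption: every write point logs roughly the same volume of data
    that crosses the network, so the cost scales linearly.
    """
    return network_mbps * write_points


# Hypothetical input: 100 Mbps of Paxos protocol traffic.
needed_mbps = disk_bandwidth_needed(100)
needed_mbytes_per_s = needed_mbps / 8  # convert megabits to megabytes

print(needed_mbps, "Mbps of disk writes, or", needed_mbytes_per_s, "MB/s")
```

An in-memory system would effectively set write_points to zero, which is the
relaxation mentioned above.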

  Much of the literature discusses message costs, but I tend to find these
'over-conservative'.  For instance, multiple Paxos messages are typically
coalesced into a single packet when sending to another server.  While this
does not reduce the minimum of two message delays, it does multiply the
throughput.
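
The coalescing idea can be sketched roughly as follows. This is only an
illustration under stated assumptions: the Coalescer class, the send_packet
callback, and the message tuples are hypothetical names, not part of any
particular Paxos implementation.

```python
from collections import defaultdict


class Coalescer:
    """Buffers outgoing protocol messages per destination (hypothetical helper)."""

    def __init__(self, send_packet):
        # send_packet(dest, payload) is an assumed transport callback.
        self._send_packet = send_packet
        self._pending = defaultdict(list)

    def queue(self, dest, message):
        # Buffer the message instead of sending it immediately.
        self._pending[dest].append(message)

    def flush(self):
        # Send one packet per destination carrying every buffered message,
        # then return the number of packets sent.
        packets = 0
        for dest, messages in self._pending.items():
            self._send_packet(dest, list(messages))
            packets += 1
        self._pending.clear()
        return packets


# Usage: four messages for two servers leave as two packets.
sent = []
c = Coalescer(lambda dest, payload: sent.append((dest, payload)))
for instance in range(3):
    c.queue("server-b", ("accept", instance))
c.queue("server-c", ("accept", 0))
packets_sent = c.flush()
```

Each message still takes the same number of network hops, so latency per
instance is unchanged; only the per-packet overhead is shared across
instances.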

  Definitely take a look at "Paxos Made Live" [1], a good discussion of the
engineering challenges associated with fault-tolerant programming.  I also
found "Paxos Revisited" [2] to be invaluable in constructing the
implementation.  Further, there are some clever failure detectors [3] and
leader-electors [4] to tie into your implementation.  Obviously, read all of
Lamport's work on the subject, especially the generalizations [5,6,7,8].

> I'm guessing that a simple Paxos can handle a few thousand elections per
> second on a fast LAN

  I can confirm that - low thousands of instances per second per 100 Mbps.
We hit CPU and disk limits long before network limits (protocol only, not
including the application traffic, recovery stream, etc).

Let me know if you have other questions,
--Bryan
[EMAIL PROTECTED]

[1] "Paxos Made Live"
      http://www.chandrakin.com/paper2.pdf
[2] "Paxos Revisited"
      http://groups.csail.mit.edu/tds/paxos.html
[3] "Finally the Weakest Failure Detector for Non-Blocking Atomic Commit"
      http://citeseer.ist.psu.edu/635354.html
      (..see the chain of references for a full discussion..)
[4] ... can't seem to find the link.. sorry!  search for "optimal leader
election", should be in there.
[5] "Cheap Paxos"
[6] "Generalized Consensus and Paxos"
[7] "Fast Paxos"
[8] "Consensus on Transaction Commit" (and references)
      5..8: http://research.microsoft.com/users/lamport/pubs/pubs.html


On 10/1/07, Jack Lloyd <[EMAIL PROTECTED]> wrote:
>
> Hi Alen,
>
> It is definitely interesting from a view of what has been proven
> practical in a real world Paxos deployment, though Chubby was
> explicitly not built for speed. That Paxos is only used for server
> replication and not by the clients is an important point. I had read
> the paper assuming all Chubby data was actually stored within the
> Paxos state which doesn't seem to be the case at all; they communicate
> via an RPC protocol that seemingly is more like NFS than Paxos.
>
> -Jack
>
>
_______________________________________________
p2p-hackers mailing list
[email protected]
http://lists.zooko.com/mailman/listinfo/p2p-hackers
