ZAB kick Paxos butt?
Hello,

Is anyone here familiar with the Paxos protocol? I was comparing ZAB with Paxos. First of all, ZAB's FIFO-based protocol is really cool!

http://wiki.apache.org/hadoop/ZooKeeper/PaxosRun describes an inconsistency case for Paxos (state change B depends on A, but A was not committed). In the "Paxos Made Simple" paper, the author suggests filling the gap (the lost state machine changes) with no-op operations. Now I have serious doubts about how Paxos could be useful in the real world. Yes, you do reach consensus, but the content is inconsistent or corrupted!?

E.g. on the wiki page, why does the Paxos state machine allow instances 27 and 28 to be fired off concurrently when there is actually a dependency between them? Shouldn't you wait for instance 27 to be committed before starting 28? Did I miss something?

Thanks for the enlightenment!

Cheers
Qing
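To make the scenario in question concrete, here is a minimal sketch of what happens when slot 27 is filled with a no-op while slot 28 is still chosen. The toy path-tree state machine, the `apply_log` helper, and the two `create` commands are assumptions made up for illustration; they are not taken from the wiki page or from any real Paxos implementation.

```python
# Hypothetical toy state machine: a set of paths, where creating a path
# requires its parent to exist. Not a real Paxos or ZooKeeper API.
NOOP = ("noop",)

def apply_log(log):
    """Apply the chosen commands in slot order; collect commands that fail."""
    paths = {"/"}
    errors = []
    for slot in sorted(log):
        cmd = log[slot]
        if cmd == NOOP:
            continue  # a no-op fills the slot but changes nothing
        op, path = cmd
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in paths:
            errors.append((slot, cmd))  # dependency on an earlier slot is missing
        else:
            paths.add(path)
    return paths, errors

# What the proposer intended: B (slot 28) depends on A (slot 27).
intended = {27: ("create", "/a"), 28: ("create", "/a/b")}
# What may end up chosen if 27 is lost and later filled with a no-op.
with_gap = {27: NOOP, 28: ("create", "/a/b")}

print(apply_log(intended))
print(apply_log(with_gap))
```

In this sketch every replica agrees on the same log in both cases, so consensus holds; the second log simply contains a command whose precondition was never established.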
Re: ZAB kick Paxos butt?
Hi Qing,

I'm glad you like the page and Zab. Yes, we are very familiar with Paxos. That page is meant to show a weakness of Paxos and a design point for Zab. It is not meant to say that Paxos is not useful. Paxos is used in the real world in production systems.

Sometimes there are no order dependencies between messages, so Paxos is fine. In cases where order is important, multiple messages are batched into a single operation and only one operation is outstanding at a time. (I believe that this is what Chubby does, for example.) This is the solution you allude to: wait for 27 to commit before 28 is issued.

For ZooKeeper we do have order dependencies, and we wanted to have multiple operations in progress at various stages of the pipeline so that we could lower latencies as well as increase our bandwidth utilization, which led us to Zab.

ben
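The ordering property behind the pipelined approach can be sketched as follows: several proposals may be in flight at once, but a proposal is only delivered if everything proposed before it has been delivered too. This is a simplified illustration, not Zab's actual implementation; the `deliverable_prefix` function and the use of plain integers as proposal ids are assumptions.

```python
def deliverable_prefix(proposed, acked):
    """Return the longest gap-free prefix of `proposed` that has been acked.

    `proposed` is the list of proposal ids in the order they were issued;
    `acked` is the set of ids acknowledged by a quorum (hypothetical inputs).
    """
    out = []
    for pid in proposed:
        if pid not in acked:
            break  # stop at the first gap; later proposals must wait
        out.append(pid)
    return out

in_flight = [27, 28, 29]  # three operations pipelined at once

print(deliverable_prefix(in_flight, {28, 29}))  # 27 missing: nothing is delivered
print(deliverable_prefix(in_flight, {27, 28}))  # 27 and 28 are delivered in order
```

Compared with keeping only one operation outstanding, this keeps the pipeline full while still ensuring that 28 is never applied without 27.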
Re: ZAB kick Paxos butt?
Yeah, actually I have no doubts about the Paxos protocol itself, but rather about the state machine implementation part (as described in "Paxos Made Simple", section 3), where there can be multiple Paxos instances. Shouldn't the execution of Paxos instances be serialized in order to make the state machine abstraction useful and friendly for real-world use? If one Paxos instance fails, the application would be notified so that corresponding actions could be taken (retry, rollback, notify the client, etc.), instead of blindly continuing and getting unpredictable results later on.

Actually, in the Google Chubby case, the database changelog is being streamed into the Paxos cluster. How can they afford to lose some of the log entries without breaking database integrity? Did I miss something?

On the other hand, I think adopting the FIFO-based protocol is a very smart engineering decision. It makes the whole thing less complicated and is also more powerful. E.g. it saves you guys the effort of inventing another language/compiler (like the Google people did).

Just curious: given how pervasively the TCP stack is deployed today, why does the research community still need to stick to the asynchronous system assumption? Just because TCP sounds less cool than an asynchronous system on paper? Hehe.

Cheers
Qing