*Bump*

On Tue, Jan 5, 2016 at 2:30 PM, singh.janmejay <[email protected]> wrote:
> Thanks for the replies everyone, most of it was very useful.
>
> @Alexander: The section of the Chubby paper you pointed me to addresses exactly what I was looking for. That is clearly one good way of doing it. I'm also thinking of an alternative way and could use a review or some feedback on it.
>
> @Powell: About x509 auth on intra-cluster communication, I don't have a blocking need for it, as it can be achieved by setting up firewall rules to accept traffic only from desired hosts. It may be a good-to-have thing though, because in cloud-based scenarios where IP addresses are re-used, a recycled IP can still talk to a secure ZK cluster unless the config is changed to remove the old peer IP and replace it with the new one. It's clearly a corner case though.
>
> Here is the approach I'm thinking of:
> - Implement all operations (at least master-triggered operations) on operand machines idempotently.
> - Have the master journal these operations to ZK before issuing the RPC.
> - In case the master fails with some of these operations in flight, the newly elected master will need to read all issued (but not yet retired) operations and issue them again.
> - The existing master (before or after a failure) can retry and retire operations according to whatever the retry policy and success criterion is.
>
> Why am I thinking of this as opposed to going with Chubby-style sequencer passing:
> - I need to implement idempotency regardless, because the recovery path involving master death after successful execution of an operation but before writing the ack to the coordination service requires it. So the idempotent-implementation complexity is here to stay.
> - Sequencer validation would increase the surface area of the architecture exposed to the coordination service, which may bring a verification RPC into the data plane in some cases.
> - The sequencer may expire after verification but before the ack, in which case the new master may not recognize the operation as consistent with its decisions (or previous decision path).
>
> Thoughts? Suggestions?
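
A minimal sketch of the journal-before-RPC step described above, using the standard Java ZooKeeper client. The /ops/pending parent znode, the op- name prefix, and the issueRpc() callback are all hypothetical placeholders, not part of the original proposal; a newly elected master simply lists the pending znodes and re-issues them, which is only safe because the operations are idempotent:

    import java.util.List;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class OpJournal {
        private final ZooKeeper zk;

        public OpJournal(ZooKeeper zk) { this.zk = zk; }

        // Journal the operation durably before the RPC leaves the master.
        public String journal(byte[] opPayload) throws KeeperException, InterruptedException {
            // PERSISTENT_SEQUENTIAL gives every pending operation a unique, ordered name.
            return zk.create("/ops/pending/op-", opPayload,
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
        }

        // Retire the operation once the operand machine has acked it.
        public void retire(String opPath) throws KeeperException, InterruptedException {
            zk.delete(opPath, -1);
        }

        // A newly elected master re-issues everything that was journalled but never retired.
        // Safe only because the operations are idempotent on the operand machines.
        public void replayPending(OpIssuer issuer) throws KeeperException, InterruptedException {
            for (String child : zk.getChildren("/ops/pending", false)) {
                String path = "/ops/pending/" + child;
                byte[] payload = zk.getData(path, false, null);
                issuer.issueRpc(payload);   // assumed to block until the operand acks
                retire(path);
            }
        }

        public interface OpIssuer {
            void issueRpc(byte[] opPayload);
        }
    }

The sequential suffix keeps replay roughly in issue order, and delete(path, -1) retires an entry regardless of its version; the /ops/pending parent is assumed to have been created ahead of time.
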
> On Sun, Jan 3, 2016 at 2:18 PM, Alexander Shraer <[email protected]> wrote:
>> Regarding atomic multi-znode updates -- check out "multi" updates
>> <http://tdunning.blogspot.com/2011/06/tour-of-multi-update-for-zookeeper.html>.
>>
>> On Sat, Jan 2, 2016 at 10:45 PM, Alexander Shraer <[email protected]> wrote:
>>> For 1, see the Chubby paper
>>> <http://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf>, section 2.4.
>>> For 2, I'm not sure I fully understand the question, but essentially ZK guarantees that consistency of updates is preserved even during failures. The user doesn't need to do anything in particular to guarantee this, even during leader failures. In such a case, some suffix of the operations executed by the leader may be lost if they weren't previously acked by a majority. However, none of these operations could have been visible to reads.
>>>
>>> On Fri, Jan 1, 2016 at 12:29 AM, powell molleti <[email protected]> wrote:
>>>> Hi Janmejay,
>>>>
>>>> Regarding question 1: if a node takes a lock and the lock has timed out from the system's perspective, then other nodes are free to take the lock and work on the resource, so the history could be well into the future by the time the previous holder discovers the time-out. Whether rollback is needed in that context depends on the implementation details: is the lock holder updating some common area? If so, there could be corruption, since other nodes are free to write in parallel with the first one. In the usual sense, a time-out of a held lock means the node that held the lock is dead. It is up to the implementation to ensure that this is really the case; with that guarantee, a time-out means the other nodes can be sure that no one else is still working on the resource, and hence can move forward.
>>>>
>>>> Question 2 seems to assume that the leader has significant work to do and that leader changes are quite common, which seems contrary to the common implementation pattern. If the work can be broken down into smaller chunks that need to be serialized separately, then each chunk/work type can have a different leader.
>>>>
>>>> For question 3, ZK does support auth and encryption for client connections but not for inter-ZK-node channels. Do you have a requirement to secure the inter-ZK-node channels? Please let us know what your requirements are so we can implement a solution that fits all needs.
>>>>
>>>> For question 4, the official implementation is C; people tend to wrap it with C++, and there are projects using ZK that do this. You can look them up and see if you can separate that code out and use it.
>>>>
>>>> Hope this helps.
>>>> Powell.
>>>>
>>>> On Thursday, December 31, 2015 8:07 AM, Edward Capriolo <[email protected]> wrote:
>>>>
>>>> Q: What is the best way of handling distributed-lock expiry? The owner of the lock managed to acquire it and may be in the middle of some computation when the session or lock expires.
>>>>
>>>> If you are using Java, one way I can see of doing this is with an ExecutorCompletionService
>>>> https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorCompletionService.html.
>>>> It allows you to keep your workers in a group; you can poll the group and provide cancel semantics as needed. An example of that service is here:
>>>> https://github.com/edwardcapriolo/nibiru/blob/master/src/main/java/io/teknek/nibiru/coordinator/EventualCoordinator.java
>>>> where I am issuing multiple reads and I want to abandon the process if they do not finish within a timeout. Many async/promise frameworks do this by launching two tasks, a ComputationTask and a TimeoutTask that returns in, say, 10 seconds, and then asking the completion service to poll. If the service hands back the TimeoutTask after the timeout, that means the computation did not finish in time.
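
A minimal, self-contained sketch of the ComputationTask/TimeoutTask pattern described above, using only java.util.concurrent; the sleep durations and task bodies are placeholders:

    import java.util.concurrent.CompletionService;
    import java.util.concurrent.ExecutorCompletionService;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class GuardedComputation {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(2);
            CompletionService<String> cs = new ExecutorCompletionService<>(pool);

            // The real work, e.g. the critical section guarded by the distributed lock.
            Future<String> computation = cs.submit(() -> {
                Thread.sleep(2_000);        // placeholder for the actual computation
                return "computation-done";
            });

            // A sentinel that simply returns after the lock/session timeout budget.
            Future<String> timeout = cs.submit(() -> {
                Thread.sleep(10_000);       // placeholder timeout budget
                return "timeout";
            });

            // take() hands back whichever task finished first.
            Future<String> first = cs.take();
            if (first == timeout) {
                computation.cancel(true);   // overran the budget: abort / interrupt it
                System.out.println("computation did not finish in time");
            } else {
                timeout.cancel(true);
                System.out.println("result: " + first.get());
            }
            pool.shutdownNow();
        }
    }

ExecutorCompletionService places the same Future object returned by submit() on its completion queue, so the reference comparison above is enough to tell which task completed first.
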
>>>> Do people generally take action in the middle of the computation (abort it and do it in a clever way such that the effect appears atomic, so the abort is not really visible; if so, what are some of those clever ways)?
>>>>
>>>> The base issue is that Java's synchronized/AtomicReference do not have a rollback. There are a few ways I know of to work around this. Clojure has STM (Software Transactional Memory), so that if an exception is thrown inside a dosync, the changes made inside the critical block never happened. This assumes you are using all-Clojure structures, which you probably are not.
>>>>
>>>> A way coworkers have done this is as follows: move your entire transactional state into a SINGLE big object that you can copy, mutate, and compare-and-swap. You never need to roll back each piece, because you are changing the clone right up until the point you commit it.
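
A minimal sketch of the single-big-object approach described above: all mutable state sits behind one AtomicReference, mutations are applied to a copy, and the copy is published with compareAndSet, so an abandoned attempt leaves nothing to undo. The State fields are placeholders:

    import java.util.concurrent.atomic.AtomicReference;
    import java.util.function.UnaryOperator;

    public class BigStateSwapper {
        // All transactional state lives in one immutable object, so there is never
        // anything to roll back: either the whole new copy is published, or nothing is.
        static final class State {
            final long version;
            final String data;      // placeholder for the real fields
            State(long version, String data) { this.version = version; this.data = data; }
        }

        private final AtomicReference<State> current = new AtomicReference<>(new State(0, ""));

        // Copy, mutate the copy, then compare-and-swap it in; retry if another writer raced us.
        public State update(UnaryOperator<State> mutation) {
            while (true) {
                State before = current.get();
                State after = mutation.apply(before);   // builds a fresh copy, never edits in place
                if (current.compareAndSet(before, after)) {
                    return after;                        // committed atomically
                }
                // CAS failed: someone else won; nothing to undo, just re-read and retry.
            }
        }

        public static void main(String[] args) {
            BigStateSwapper swapper = new BigStateSwapper();
            State s = swapper.update(old -> new State(old.version + 1, old.data + "x"));
            System.out.println(s.version + " " + s.data);
        }
    }

The same shape carries over to ZooKeeper itself: keep the state in a single znode and publish the new copy with a version-checked setData, so a concurrent writer makes the commit fail cleanly instead of corrupting shared state.
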
>>>> Writing reversal code is possible depending on the problem. There are questions to ask, like "what if the reversal itself somehow fails?"
>>>>
>>>> On Thu, Dec 31, 2015 at 3:10 AM, singh.janmejay <[email protected]> wrote:
>>>> > Hi,
>>>> >
>>>> > Was wondering if there are any reference designs or patterns for handling common operations involving distributed coordination.
>>>> >
>>>> > I have a few questions, and I guess they must have been asked before; I am unsure what to search for to surface the right answers. It would be really valuable if someone could provide links to a relevant "best-practices guide" or "suggestions" per question, or share some wisdom or ideas on patterns to do this in the best way.
>>>> >
>>>> > 1. What is the best way of handling distributed-lock expiry? The owner of the lock managed to acquire it and may be in the middle of some computation when the session or lock expires. When it finishes that computation, it can tell that the lock expired, but do people generally take action in the middle of the computation (abort it and do it in a clever way such that the effect appears atomic, so the abort is not really visible; if so, what are some of those clever ways)? Or is the right thing to do to write reversal code, such that operations can be cleanly undone in case the verification at the end of the computation shows that the lock expired? The latter is obviously a lot harder to achieve.
>>>> >
>>>> > 2. Same as above for leader-election scenarios. A leader generally administers operations on data systems that take significant time to complete and have significant resource overhead, and the RPCs used to administer such operations synchronously from leader to data node can't be atomic and can't be made latency-resilient to the degree that issuing an operation across a large set of nodes in a cluster can be guaranteed to finish without a leader change. What do people generally do in such situations? How are timeouts handled for operations issued using a sequential znode as a per-datanode dedicated queue? How well does that scale, and what are some things to watch out for (operation size, encoding, clustering into one znode for atomicity, etc.)? Or how are atomic operations that need to be issued across multiple data nodes managed (do they have to be clobbered into one znode)?
>>>> >
>>>> > 3. How do people secure ZooKeeper-based services? Is client-certificate verification the recommended way? How well does it work with the C client? Is inter-ZK-node communication done with X509 auth too?
>>>> >
>>>> > 4. What other projects, reference implementations, or libraries should I look at for working with the C client?
>>>> >
>>>> > Most of what I have asked revolves around the leader or lock owner having a false failure (where it doesn't know that the coordinator thinks it has failed).
>>>> >
>>>> > --
>>>> > Regards,
>>>> > Janmejay
>>>> > http://codehunk.wordpress.com
>
> --
> Regards,
> Janmejay
> http://codehunk.wordpress.com

--
Regards,
Janmejay
http://codehunk.wordpress.com
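
For the part of question 2 about atomic operations that span multiple data nodes, the "multi" API Alexander links to above lets one logical operation be placed under several per-datanode queue znodes in a single atomic batch. A minimal sketch with the standard Java client; the /queues/<node> parents and the op- prefix are hypothetical and assumed to already exist:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.Op;
    import org.apache.zookeeper.OpResult;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class AtomicFanout {
        // Enqueue the same operation under every datanode's queue znode in one multi:
        // either every queue gets the entry, or none of them does.
        public static List<OpResult> enqueueEverywhere(ZooKeeper zk, List<String> datanodes,
                                                       byte[] opPayload)
                throws KeeperException, InterruptedException {
            List<Op> ops = new ArrayList<>();
            for (String node : datanodes) {
                ops.add(Op.create("/queues/" + node + "/op-", opPayload,
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL));
            }
            return zk.multi(ops);   // all creates commit together, or the whole batch fails
        }
    }

If any single create in the batch fails (for example because a parent is missing), the whole multi fails and no queue receives a partial entry.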
