Regarding atomic multi-znode updates -- check out "multi" updates:
<http://tdunning.blogspot.com/2011/06/tour-of-multi-update-for-zookeeper.html>
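For instance, with the Java client a multi looks roughly like this (a
sketch, not tested code -- the paths, versions, and session parameters
below are made-up placeholders):

    import java.util.Arrays;
    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.Op;
    import org.apache.zookeeper.OpResult;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class MultiUpdateSketch {
        public static void main(String[] args) throws Exception {
            // Connection string and session timeout are placeholders.
            ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> {});

            // Either all four operations commit, or none of them do.
            List<OpResult> results = zk.multi(Arrays.asList(
                Op.check("/app/config", 3),                    // abort unless /app/config is at version 3
                Op.setData("/app/config", "v2".getBytes(), 3), // conditional update
                Op.create("/app/flag", new byte[0],
                          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT),
                Op.delete("/app/old-flag", -1)                 // -1 matches any version
            ));
            zk.close();
        }
    }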
On Sat, Jan 2, 2016 at 10:45 PM, Alexander Shraer <[email protected]> wrote:

> For 1, see the Chubby paper
> <http://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf>,
> section 2.4.
>
> For 2, I'm not sure I fully understand the question, but essentially ZK
> guarantees that consistency of updates is preserved even during failures.
> The user doesn't need to do anything in particular to guarantee this, even
> during leader failures. In such a case, some suffix of the operations
> executed by the leader may be lost if they weren't previously acked by a
> majority. However, none of those operations could have been visible to
> reads.
>
> On Fri, Jan 1, 2016 at 12:29 AM, powell molleti <[email protected]> wrote:
>
>> Hi Janmejay,
>>
>> Regarding question 1: if a node takes a lock and the lock has timed out
>> from the system's perspective, then other nodes are free to take the
>> lock and work on the resource. Hence history could be well into the
>> future by the time the previous holder discovers the timeout. Whether
>> rollback is needed in that context depends on the implementation
>> details: is the lock holder updating some common area? Then there could
>> be corruption, since other nodes are free to write in parallel with the
>> first one. In the usual sense, a timeout on a held lock means the node
>> which held it is dead. It is up to the implementation to ensure this
>> case; given that primitive, a timeout means the other nodes can be sure
>> no one else is working on the resource and hence can move forward.
>>
>> Question 2 seems to assume that the leader has significant work to do
>> and that leader change is quite common, which seems contrary to the
>> common implementation pattern. If the work can be broken down into
>> smaller chunks, each needing serialization separately, then each
>> chunk/work type can have a different leader.
>>
>> For question 3, ZK does support auth and encryption for client
>> connections, but not for inter-ZK-node channels. Do you have a
>> requirement to secure the inter-ZK-node links? Let us know what your
>> requirements are so we can implement a solution that fits all needs.
>>
>> For question 4, the official implementation is in C; people tend to
>> wrap it in C++, and there are projects using ZK that do so. You can
>> look them up and see if you can separate the wrapper out and reuse it.
>>
>> Hope this helps.
>> Powell
>>
>> On Thursday, December 31, 2015 8:07 AM, Edward Capriolo <[email protected]> wrote:
>>
>> Q: What is the best way of handling distributed-lock expiry? The owner
>> of the lock managed to acquire it and may be in the middle of some
>> computation when the session expires or the lock expires.
>>
>> If you are using Java, one way I can see of doing this is with
>> ExecutorCompletionService:
>> https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorCompletionService.html
>> It allows you to keep your workers in a group; you can poll the group
>> and provide cancel semantics as needed. An example of that service is
>> here:
>> https://github.com/edwardcapriolo/nibiru/blob/master/src/main/java/io/teknek/nibiru/coordinator/EventualCoordinator.java
>> where I am issuing multiple reads and I want to abandon the process if
>> they do not finish within a timeout. Many async/promise frameworks do
>> this by launching two tasks, a ComputationTask and a TimeoutTask that
>> returns in, say, 10 seconds, and then asking the completion service to
>> poll. If the service hands back the TimeoutTask first, that means the
>> computation did not finish in time.
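A minimal sketch of that race, for the archives (the task bodies and
timeouts are placeholders):

    import java.util.concurrent.CompletionService;
    import java.util.concurrent.ExecutorCompletionService;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ComputationVsTimeout {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(2);
            CompletionService<String> cs = new ExecutorCompletionService<>(pool);

            // The real work -- stands in for the computation done under the lock.
            Future<String> work = cs.submit(() -> {
                Thread.sleep(15_000); // pretend this is slow
                return "computed";
            });

            // Sentinel task that simply returns once the deadline passes.
            cs.submit(() -> {
                Thread.sleep(10_000);
                return "timeout";
            });

            // take() hands back whichever task finished first.
            if ("timeout".equals(cs.take().get())) {
                work.cancel(true); // interrupt and abandon the computation
            }
            pool.shutdownNow();
        }
    }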
>> Q: Do people generally take action in the middle of the computation
>> (abort it and do it in a clever way such that the effect appears
>> atomic, so the abort is not really visible; if so, what are some of
>> those clever ways)?
>>
>> The base issue is that Java's synchronized / AtomicReference do not
>> have a rollback. There are a few ways I know of to work around this.
>> Clojure has STM (Software Transactional Memory), such that if an
>> exception is thrown inside a dosync, all of the changes inside the
>> critical block never happened. This assumes you're using all Clojure
>> structures, which you probably are not.
>>
>> A way coworkers have done this is as follows: move your entire
>> transactional state into a SINGLE big object that you can
>> copy/mutate/compare-and-swap. You never need to roll back each piece,
>> because you are changing the clone up until the point you commit it.
>>
>> Writing reversal code is possible, depending on the problem. There are
>> questions to ask, like "what if the reversal somehow fails?"
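The single-big-object trick, roughly (State's fields are made up for
illustration):

    import java.util.concurrent.atomic.AtomicReference;

    public class BigObjectCas {
        // All mutable state lives behind one reference to an immutable snapshot.
        static final class State {
            final long version;
            final String owner;
            State(long version, String owner) { this.version = version; this.owner = owner; }
        }

        private final AtomicReference<State> state =
            new AtomicReference<>(new State(0, "nobody"));

        // Mutate a private copy, then publish it atomically; nothing ever needs
        // a rollback, because a failed CAS just discards the clone.
        void takeOwnership(String me) {
            while (true) {
                State current = state.get();
                State next = new State(current.version + 1, me);
                if (state.compareAndSet(current, next)) {
                    return;
                }
                // Someone else committed first; rebuild from their state and retry.
            }
        }
    }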
>> On Thu, Dec 31, 2015 at 3:10 AM, singh.janmejay <[email protected]> wrote:
>>
>> > Hi,
>> >
>> > Was wondering if there are any reference designs or patterns on
>> > handling common operations involving distributed coordination.
>> >
>> > I have a few questions, and I guess they must have been asked before;
>> > I am unsure what to search for to surface the right answers. It'll be
>> > really valuable if someone can provide links to a relevant
>> > "best-practices guide" or "suggestions" per question, or share some
>> > wisdom or ideas on patterns to do this in the best way.
>> >
>> > 1. What is the best way of handling distributed-lock expiry? The
>> > owner of the lock managed to acquire it and may be in the middle of
>> > some computation when the session expires or the lock expires. When
>> > it finishes that computation, it can tell that the lock expired, but
>> > do people generally take action in the middle of the computation
>> > (abort it and do it in a clever way such that the effect appears
>> > atomic, so the abort is not really visible; if so, what are some of
>> > those clever ways)? Or is the right thing to do to write reversal
>> > code, such that operations can be cleanly undone in case the
>> > verification at the end of the computation shows that the lock
>> > expired? The latter is obviously a lot harder to achieve.
>> >
>> > 2. Same as above for leader-election scenarios. The leader generally
>> > administers operations on data systems that take significant time to
>> > complete and have significant resource overhead, and RPC to
>> > administer such operations synchronously from leader to data node
>> > can't be atomic and can't be made latency-resilient to such a degree
>> > that an operation issued across a large set of nodes in a cluster can
>> > be guaranteed to finish without a leader change. What do people
>> > generally do in such situations? How are timeouts handled when
>> > operations are issued using a sequential znode as a per-datanode
>> > dedicated queue? How well does it scale, and what are some things to
>> > watch out for (operation size, encoding, clustering into one znode
>> > for atomicity, etc.)? Or how are atomic operations that need to be
>> > issued across multiple data nodes managed (do they have to be
>> > clobbered into one znode)?
>> >
>> > 3. How do people secure ZooKeeper-based services? Is
>> > client-certificate verification the recommended way? How well does
>> > this work with the C client? Is inter-ZK-node communication done with
>> > X509 auth too?
>> >
>> > 4. What other projects, reference implementations, or libraries
>> > should I look at for working with the C client?
>> >
>> > Most of what I have asked revolves around the leader or lock owner
>> > having a false failure (where it doesn't know that the coordinator
>> > thinks it has failed).
>> >
>> > --
>> > Regards,
>> > Janmejay
>> > http://codehunk.wordpress.com
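And circling back to question 1: the Chubby section Alexander pointed to
(2.4) describes "sequencers", i.e. fencing tokens -- the resource being
protected rejects writes from a holder whose lock has since expired. A
rough sketch of that idea on top of ZooKeeper (the Resource class is
hypothetical, and this assumes each acquisition creates a fresh lock
znode, so its czxid grows monotonically across successive holders):

    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class FencedWrites {
        // Hypothetical resource: remembers the highest token seen and
        // rejects writes carrying an older one (Chubby's sequencer check).
        static final class Resource {
            private long highestToken = -1;
            synchronized boolean write(long token, byte[] data) {
                if (token < highestToken) {
                    return false; // stale lock holder -- its lock expired
                }
                highestToken = token;
                // ... apply data here ...
                return true;
            }
        }

        // The creation zxid of the lock znode increases monotonically across
        // successive holders, so it can serve as the fencing token. (A sketch:
        // a real version must handle the znode disappearing, i.e. a null Stat.)
        static long fencingToken(ZooKeeper zk, String lockPath) throws Exception {
            Stat stat = zk.exists(lockPath, false);
            return stat.getCzxid();
        }
    }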
