Yes, that will depend on how idempotency is implemented. The way I plan to implement it is with a monotonically increasing operation-id: the external system remembers the highest op-id it has executed, and any operation whose id is lower than (or equal to, i.e. a re-delivered duplicate of) the last executed op-id is identified as stale and is not executed. Because only one op is executed at a time, and ops are executed in absolute order, staleness detection at the op-id level is sufficient.
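To make that concrete, here is a minimal sketch (in Java) of that staleness check on the external-system side; the class and field names are hypothetical, not part of the actual implementation:

    // Hypothetical sketch of op-id based staleness detection on the
    // external-system side. Assumes op-ids are assigned by the master in
    // strictly increasing order (e.g. sequential-znode sequence numbers).
    public class IdempotentExecutor {
        // Highest op-id executed so far; a real system would persist this
        // durably so the check survives restarts.
        private long lastExecutedOpId = -1;

        // Execute op only if it is newer than everything seen so far.
        // Re-delivery of an already-executed op (same or lower id) is a
        // no-op, which is what makes replay by a new master safe.
        public synchronized boolean execute(long opId, Runnable op) {
            if (opId <= lastExecutedOpId) {
                return false; // stale or duplicate: already acted upon
            }
            op.run();
            lastExecutedOpId = opId;
            return true;
        }
    }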
--
Regards,
Janmejay

PS: Please blame the typos in this mail on my phone's uncivilized soft keyboard sporting its not-so-smart-assist technology.

On Jan 13, 2016 11:49 PM, "Alexander Shraer" <[email protected]> wrote:
> I may be wrong, but I don't think that being idempotent gives you what you
> said. Just because f(f(x)) = f(x) doesn't mean that f(g(f(x))) = g(f(x)) --
> this was my example. But if your system can detect that X was already
> executed (or if the operations are conditional on state), my scenario indeed
> can't happen.
>
> On Wed, Jan 13, 2016 at 2:08 AM, singh.janmejay <[email protected]> wrote:
> >
> > @Alexander: In that scenario, the write of X will be attempted by A, but
> > the external system will not act upon write-X, because that operation has
> > already been acted upon in the past. This is guaranteed by the
> > idempotent-operations invariant. But it does point out another problem,
> > which I hadn't handled in my original algorithm. Problem: if X and Y have
> > both not been issued yet, and Y is issued before X towards the external
> > system, because neither operation has executed yet, it'll overwrite
> > Y with X. I need another constraint: the master should only issue one
> > operation on a given external system at a time, and must issue
> > operations in the order of operation-id (sequential-znode sequence
> > number). So we need the following invariants:
> > - the order of issuing operations is fixed (matching the order of
> >   creation of operations)
> > - the concurrency of operations is fixed to 1
> > - execution on the external-system side is idempotent
> >
> > @Powell: I'm kind of doing the same thing, except the loop doesn't run
> > on the consumer; instead it runs on the master, which is assigning work
> > to consumers. So the triggerWork function is basically changed to
> > issueWork, which is RPC + triggerWork. The replay of history is basically
> > just a replay of one operation per operand-node (in this thread we are
> > calling it the external-system), so it's as if triggerWork failed, in
> > which case we need to re-execute triggerWork. Idempotency also follows
> > from that requirement: if triggerWork fails in the last step, and all the
> > desired effect that was necessary has happened, we will still need to
> > run triggerWork again, but we need awareness that the actual work has
> > been done, which is why idempotency is necessary.
> >
> > Btw, thanks for continuing to spare time for this; I really appreciate
> > the feedback/validation.
> >
> > Thoughts?
> >
> > On Wed, Jan 13, 2016 at 3:47 AM, powell molleti <[email protected]> wrote:
> > > Wouldn't a distributed-queue recipe work for the consumer? One needs
> > > to add extra logic, something like this:
> > >
> > > with lock() {
> > >   p = queue.peek()
> > >   if triggerWork(p) is Done:
> > >     queue.pop()
> > > }
> > >
> > > With this, a consumer that worked on an item but crashed before popping
> > > the queue would result in another consumer processing the same work.
> > >
> > > I am not sure of the details of where you are getting the work from
> > > and what the scale of it is, but producers (the leader) could write to
> > > this queue. Since there is a guarantee of read-after-write, the producer
> > > could delete from its local queue the work that was successfully queued.
> > > Hence, again, a new producer could re-send the last entry of work, so
> > > one has to handle that. Without more details on the origin of the work
> > > etc., it's hard to design end to end.
> > >
> > > I do not see a need to do a total replay of past history when using a
> > > ZK-like system, because ZK is built on the idea of a serialized,
> > > replicated log; hence if you are using ZK, your design should be much
> > > simpler, i.e. fail and re-start from the last known transaction.
> > >
> > > Powell.
> > >
> > > On Tuesday, January 12, 2016 11:51 AM, Alexander Shraer <[email protected]> wrote:
> > > Hi,
> > >
> > > With your suggestion, the following scenario seems possible: master A is
> > > about to write value X to an external system, so it logs it to ZK, then
> > > freezes for some time; ZK suspects it has failed, another master B is
> > > elected, writes X (completing what A wanted to do),
> > > then starts doing something else and writes Y. Then leader A "wakes up"
> > > and re-executes the old operation, writing X, which is now stale.
> > >
> > > Perhaps if your external system supports conditional updates this can be
> > > avoided - a write of X only works if the current state is as expected.
> > >
> > > Alex
> > >
> > > On Tue, Jan 5, 2016 at 1:00 AM, singh.janmejay <[email protected]> wrote:
> > >
> > >> Thanks for the replies everyone; most of it was very useful.
> > >>
> > >> @Alexander: The section of the Chubby paper you pointed me to addresses
> > >> exactly what I was looking for. That clearly is one good way
> > >> of doing it. I'm also thinking of an alternative way and could use a
> > >> review or some feedback on that.
> > >>
> > >> @Powell: About x509 auth on intra-cluster communication, I don't have a
> > >> blocking need for it, as it can be achieved by setting up firewall
> > >> rules to accept connections only from desired hosts. It may be a
> > >> good-to-have thing though, because in cloud-based scenarios where IP
> > >> addresses are re-used, a recycled IP can still talk to a secure
> > >> zk-cluster unless the config is changed to remove the old peer IP and
> > >> replace it with the new one. It's clearly a corner case though.
> > >>
> > >> Here is the approach I'm thinking of:
> > >> - Implement all operations (at least master-triggered operations) on
> > >>   operand machines idempotently.
> > >> - Have the master journal these operations to ZK before issuing the RPC.
> > >> - In case the master fails with some of these operations in flight, the
> > >>   newly elected master will need to read all issued (but not yet
> > >>   retired) operations and issue them again.
> > >> - The existing master (before or after failure) can retry and retire
> > >>   operations according to whatever the retry policy and
> > >>   success criterion is.
> > >>
> > >> Why am I thinking of this as opposed to going with Chubby-style
> > >> sequencer passing:
> > >> - I need to implement idempotency regardless, because the recovery path
> > >>   involving master death after successful execution of an operation but
> > >>   before writing the ack to the coordination service requires it. So
> > >>   the idempotent-implementation complexity is here to stay.
> > >> - I'd need to increase the surface area of the architecture that is
> > >>   exposed to the coordination service for sequencer validation, which
> > >>   may bring a verification RPC into the data plane in some cases.
> > >> - The sequencer may expire after verification but before the ack, in
> > >>   which case the new master may not recognize the operation as
> > >>   consistent with its decisions (or previous decision path).
> > >>
> > >> Thoughts? Suggestions?
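For reference, the journaling step in the quoted approach above might look roughly like this with the official ZooKeeper Java client; the znode path, payload handling and class name are illustrative assumptions, not the actual implementation:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Hypothetical sketch: the master journals an operation as a persistent
    // sequential znode before issuing the RPC. The sequence number ZooKeeper
    // appends to the znode name doubles as the monotonically increasing op-id.
    public class OpJournal {
        private final ZooKeeper zk;
        private final String opsRoot; // e.g. "/myapp/ops" (illustrative path)

        public OpJournal(ZooKeeper zk, String opsRoot) {
            this.zk = zk;
            this.opsRoot = opsRoot;
        }

        // Journal first, then issue: if the master dies after this create but
        // before the RPC, the newly elected master finds the un-retired znode
        // and re-issues the operation. Idempotency on the operand machine
        // makes the re-issue safe.
        public String journal(byte[] opPayload)
                throws KeeperException, InterruptedException {
            return zk.create(opsRoot + "/op-", opPayload,
                    ZooDefs.Ids.OPEN_ACL_UNSAFE,
                    CreateMode.PERSISTENT_SEQUENTIAL);
        }

        // Retire the operation once the operand machine acks success.
        public void retire(String opPath)
                throws KeeperException, InterruptedException {
            zk.delete(opPath, -1); // -1 matches any znode version
        }
    }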
> > >>
> > >> On Sun, Jan 3, 2016 at 2:18 PM, Alexander Shraer <[email protected]> wrote:
> > >> > regarding atomic multi-znode updates -- check out "multi" updates
> > >> > <http://tdunning.blogspot.com/2011/06/tour-of-multi-update-for-zookeeper.html>.
> > >> >
> > >> > On Sat, Jan 2, 2016 at 10:45 PM, Alexander Shraer <[email protected]> wrote:
> > >> >
> > >> >> for 1, see the chubby paper
> > >> >> <http://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf>,
> > >> >> section 2.4.
> > >> >> for 2, I'm not sure I fully understand the question, but essentially,
> > >> >> ZK guarantees that consistency of updates is preserved even during
> > >> >> failures. The user doesn't need to do anything in particular to
> > >> >> guarantee this, even during leader failures. In such a case, some
> > >> >> suffix of the operations executed by the leader may be lost if they
> > >> >> weren't previously acked by a majority. However, none of these
> > >> >> operations could have been visible to reads.
> > >> >>
> > >> >> On Fri, Jan 1, 2016 at 12:29 AM, powell molleti <[email protected]> wrote:
> > >> >>
> > >> >>> Hi Janmejay,
> > >> >>>
> > >> >>> Regarding question 1: if a node takes a lock and the lock has timed
> > >> >>> out from the system's perspective, then other nodes are free to take
> > >> >>> the lock and work on the resource. Hence the history could be well
> > >> >>> into the future by the time the previous node discovers the
> > >> >>> time-out. The question of rollback in this specific context depends
> > >> >>> on the implementation details: is the lock holder updating some
> > >> >>> common area? Then there could be corruption, since other nodes are
> > >> >>> free to write in parallel with the first one. In the usual sense, a
> > >> >>> time-out of a held lock means the node which held the lock is dead.
> > >> >>> It is up to the implementation to ensure this; using this primitive,
> > >> >>> a timeout means other nodes can be sure that no one else is working
> > >> >>> on the resource and hence can move forward.
> > >> >>>
> > >> >>> Question 2 seems to imply the assumption that the leader has
> > >> >>> significant work to do and leader change is quite common, which
> > >> >>> seems contrary to the common implementation pattern. If the work can
> > >> >>> be broken down into smaller chunks which need to be serialized
> > >> >>> separately, then each chunk/work type can have a different leader.
> > >> >>>
> > >> >>> For question 3, ZK does support auth and encryption for client
> > >> >>> connections, but not for inter-ZK-node channels. Do you have a
> > >> >>> requirement to secure inter-ZK-node channels? Can you let us know
> > >> >>> what your requirements are, so we can implement a solution to fit
> > >> >>> all needs?
> > >> >>>
> > >> >>> For question 4, the official implementation is in C; people tend to
> > >> >>> wrap that with C++, and there should be projects that use ZK doing
> > >> >>> that. You can look them up and see if you can separate it out and
> > >> >>> use them.
> > >> >>>
> > >> >>> Hope this helps.
> > >> >>> Powell.
> > >> >>>
> > >> >>> On Thursday, December 31, 2015 8:07 AM, Edward Capriolo <[email protected]> wrote:
> > >> >>>
> > >> >>> Q: What is the best way of handling distributed-lock expiry?
> > >> >>> The owner of the lock managed to acquire it and may be in the
> > >> >>> middle of some computation when the session expires or the lock
> > >> >>> expires.
> > >> >>>
> > >> >>> If you are using Java, a way I can see of doing this is by using the
> > >> >>> ExecutorCompletionService
> > >> >>> <https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorCompletionService.html>.
> > >> >>> It allows you to keep your workers in a group; you can poll the
> > >> >>> group and provide cancel semantics as needed.
> > >> >>> An example of that service is here:
> > >> >>> <https://github.com/edwardcapriolo/nibiru/blob/master/src/main/java/io/teknek/nibiru/coordinator/EventualCoordinator.java>
> > >> >>> where I am issuing multiple reads and I want to abandon the process
> > >> >>> if they do not finish within a timeout. Many async/promise
> > >> >>> frameworks do this by launching two tasks, a ComputationTask and a
> > >> >>> TimeoutTask that returns in 10 seconds, and then asking the
> > >> >>> completion service to poll. If the poll yields the TimeoutTask, that
> > >> >>> means the computation did not finish in time.
> > >> >>>
> > >> >>> Q: Do people generally take action in the middle of the computation
> > >> >>> (abort it and do it in a clever way such that the effect appears
> > >> >>> atomic, so the abort is not really visible; if so, what are some of
> > >> >>> those clever ways)?
> > >> >>>
> > >> >>> The base issue is that Java's synchronized/AtomicReference do not
> > >> >>> have a rollback.
> > >> >>>
> > >> >>> There are a few ways I know of to work around this. Clojure has STM
> > >> >>> (Software Transactional Memory), such that if an exception is thrown
> > >> >>> inside a dosync, all of the changes inside the critical block never
> > >> >>> happened. This assumes you're using all Clojure structures, which
> > >> >>> you are probably not.
> > >> >>> A way co-workers have done this is as follows: move your entire
> > >> >>> transactional state into a SINGLE big object that you can
> > >> >>> copy/mutate/compare-and-swap. You never need to roll back each
> > >> >>> piece, because you're changing the clone up until the point you
> > >> >>> commit it.
> > >> >>> Writing reversal code is possible depending on the problem. There
> > >> >>> are questions to ask like "what if the reversal somehow fails?"
> > >> >>>
> > >> >>> On Thu, Dec 31, 2015 at 3:10 AM, singh.janmejay <[email protected]> wrote:
> > >> >>>
> > >> >>> > Hi,
> > >> >>> >
> > >> >>> > Was wondering if there are any reference designs or patterns for
> > >> >>> > handling common operations involving distributed coordination.
> > >> >>> >
> > >> >>> > I have a few questions, and I guess they must have been asked
> > >> >>> > before; I am unsure what to search for to surface the right
> > >> >>> > answers. It'll be really valuable if someone can provide links to
> > >> >>> > a relevant "best-practices guide" or "suggestions" per question,
> > >> >>> > or share some wisdom or ideas on patterns to do this in the best
> > >> >>> > way.
> > >> >>> >
> > >> >>> > 1. What is the best way of handling distributed-lock expiry? The
> > >> >>> > owner of the lock managed to acquire it and may be in the middle
> > >> >>> > of some computation when the session expires or the lock expires.
> > >> >>> > When it finishes that computation, it can tell that the lock
> > >> >>> > expired, but do people generally take action in the middle of the
> > >> >>> > computation (abort it and do it in a clever way such that the
> > >> >>> > effect appears atomic, so the abort is not really visible; if so,
> > >> >>> > what are some of those clever ways)? Or is the right thing to do
> > >> >>> > to write reversal code, such that operations can be cleanly undone
> > >> >>> > in case the verification at the end of the computation shows that
> > >> >>> > the lock expired? The latter obviously is a lot harder to achieve.
> > >> >>> >
> > >> >>> > 2. Same as above for leader-election scenarios. A leader generally
> > >> >>> > administers operations on data-systems that take significant time
> > >> >>> > to complete and have significant resource overhead, and RPC to
> > >> >>> > administer such operations synchronously from leader to data-node
> > >> >>> > can't be atomic and can't be made latency-resilient to such a
> > >> >>> > degree that issuing an operation across a large set of nodes in a
> > >> >>> > cluster can be guaranteed to finish without a leader change. What
> > >> >>> > do people generally do in such situations? How are timeouts
> > >> >>> > handled for operations issued using a sequential-znode as a
> > >> >>> > per-datanode dedicated queue? How well does it scale, and what are
> > >> >>> > some things to watch out for (operation size, encoding, clustering
> > >> >>> > into one znode for atomicity, etc.)? Or how are atomic operations
> > >> >>> > that need to be issued across multiple data-nodes managed (do they
> > >> >>> > have to be clobbered into one znode)?
> > >> >>> >
> > >> >>> > 3. How do people secure ZooKeeper-based services? Is
> > >> >>> > client-certificate verification the recommended way? How well does
> > >> >>> > this work with the C client? Is inter-zk-node communication done
> > >> >>> > with X509 auth too?
> > >> >>> >
> > >> >>> > 4. What other projects, reference implementations, or libraries
> > >> >>> > should I look at for working with the C client?
> > >> >>> >
> > >> >>> > Most of what I have asked revolves around the leader or lock-owner
> > >> >>> > having a false failure (where it doesn't know that the coordinator
> > >> >>> > thinks it has failed).
> > >> >>> >
> > >> >>> > --
> > >> >>> > Regards,
> > >> >>> > Janmejay
> > >> >>> > http://codehunk.wordpress.com
> > >>
> > >> --
> > >> Regards,
> > >> Janmejay
> > >> http://codehunk.wordpress.com
> >
> > --
> > Regards,
> > Janmejay
> > http://codehunk.wordpress.com
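For reference, the two-task ComputationTask/TimeoutTask pattern Edward describes in the quoted thread might be sketched like this with java.util.concurrent; the sleep durations and task bodies are placeholders, not his actual code:

    import java.util.concurrent.CompletionService;
    import java.util.concurrent.ExecutorCompletionService;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Sketch of the two-task timeout pattern: submit the real work plus a
    // timer task, then take whichever completes first.
    public class TimeoutPoll {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(2);
            CompletionService<String> cs = new ExecutorCompletionService<>(pool);

            Future<String> computation = cs.submit(() -> {
                Thread.sleep(15_000); // placeholder for the real computation
                return "done";
            });
            cs.submit(() -> {
                Thread.sleep(10_000); // the timeout budget
                return "timeout";
            });

            // take() returns whichever task finished first.
            if ("timeout".equals(cs.take().get())) {
                // The computation overran its budget: cancel it, and treat
                // the lock/session as expired per the discussion above.
                computation.cancel(true);
            }
            pool.shutdownNow();
        }
    }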
