Hi Otis, Would it be possible to change your app to use producer-consumer queue? That way, no document will go unprocessed when an instance goes down.
http://zookeeper.apache.org/doc/r3.3.2/zookeeperTutorial.html#sc_producerConsumerQueues --Michi On Mar 18, 2011, at 9:41 PM, Otis Gospodnetic wrote: > Hi, > > Thanks Ben. Let me describe the context a bit more - once you know what I'm > trying to do you may have suggestions for how to solve the problem with or > without ZK. > > I have a continuous "stream" of documents that I need to process. The stream > is > pretty fast (don't know the exact number, but it's many docs a second). Docs > live only in-memory in that stream and cannot be saved to disk at any point > up > the stream. > > My app listens to this stream. Because of the high document ingestion rate I > need N instances of my app to listen to this stream. So all N apps listen > and > they all "get" the same documents, but only 1 app actually processes each > document -- "if (docID mod N == appID) then process doc" -- the usual > consistent > hashing stuff. I'd like to be able to add and remove apps dynamically and > have > the remaining apps realize that "N" has changed. Similarly, when some app > instance dies and thus "N" changes, I'd like all the remaining instances to > know > about it. > > If my apps don't know the correct "N" then 1/Nth of docs will go unprocessed > (if > the app died or was removed) until the remaining apps adjust their local > value > of "N". > >> to deal with this applications can use views, which allow clients to >> reconcile differences. for example, if two processes communicate and > > Hm, this requires apps to communicate with each other. If each app was aware > of > other apps, then I could get the membership count directly using that > mechanism, > although I still wouldn't be able to immediately detect when some app died, > at > least I'm not sure how I could do that. > >> one has a different list of members than the other then they can both >> consult zookeeper to reconcile or use the membership list with the >> highest zxid. the other option is to count on eventually everyone >> converging. > > Right, if I could live with eventually having the right "N", then I could use > ZK > as described on > http://eng.wealthfront.com/2010/01/actually-implementing-group-management.html > > But say I'm OK with "eventually everyone converging". Can I use ZK then? > And > if so, how "eventually" is this "eventually"? That is, if an app dies, how > quickly can ZK notify all znode watchers that znode change? A few > milliseconds > or more? > > In general, how does one deal with situations like the one I described above, > where each app is responsible for 1/Nth of work and where N can > uncontrollably > and unexpectedly change? > > Thanks! > Otis > > > > > > ----- Original Message ---- >> From: Benjamin Reed <[email protected]> >> To: [email protected] >> Sent: Fri, March 18, 2011 5:59:43 PM >> Subject: Re: Using ZK for real-time group membership notification >> >> in a distributed setting such an answer is impossible. especially >> given the theory of relativity and the speed of light. a machine may >> fail right after sending a heart beat or another may come online right >> after sending a report. even if zookeeper could provide this you would >> still have thread scheduling issues on a local machine that means that >> you are operating on old information. >> >> to deal with this applications can use views, which allow clients to >> reconcile differences. for example, if two processes communicate and >> one has a different list of members than the other then they can both >> consult zookeeper to reconcile or use the membership list with the >> highest zxid. the other option is to count on eventually everyone >> converging. >> >> i would not develop a distributed system with the assumption that "all >> group members know *the exact number of members at all times*". >> >> ben >> >> On Fri, Mar 18, 2011 at 2:02 PM, Otis Gospodnetic >> <[email protected]> wrote: >>> Hi, >>> >>> Short version: >>> How can ZK be used to make sure that all group members know *the exact >> number of >>> members at all times*? >>> >>> I have an app that can be run on 1 or more servers. New instances of the >> app >>> come and go, may die, etc. -- the number of the app instances is completely >>> dynamic. At any one time, as these apps come and go, each live instance >>> of >> the >>> app needs to know how many instances are there total. If a new instance >>> of >> the >>> app is started, all instances need to know the new total number of >> instances. >>> If an app is stopped or if it dies, the remaining apps need to know the new >>> number of app instances. >>> >>> Also, and this is critical, they need to know about these >>> additions/removals >> of >>> apps right away and they all need to find out them at the same time. >> Basically, >>> all members of some group need to know *the exact number of members at all >>> times*. >>> >>> This sounds almost like we need to watch a "parent group znode" and monitor >>> >> the >>> number of its ephemeral children, which represent each app instance that is >>> watching the "parent groups znode". Is that right? If so, then all I'd >> need to >>> know is the answer to "How many watchers are watching this znode?" of "How >> many >>> kids does this znode have?". And I'd need ZK to notify all watchers >>> whenever >> the >>> answer to this question changes. Ideally it would send/push the answer > (the >>> number of watchers) to all watchers, but if not, I assume any watcher that >> is >>> notified about the change would go poll ZK to get the number of ephemeral >> kids. >>> >>> I think the above is essentially what's described on >>> >> http://eng.wealthfront.com/2010/01/actually-implementing-group-management.html >> , >>> but doesn't answer the part that's critical for me (the very first Q up >> above). >>> >>> Thanks, >>> Otis >>> ---- >>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch >>> Lucene ecosystem search :: http://search-lucene.com/ >>> >>> >>
