well, not transaction rate but transaction count.. you can get the rate out of that :-D
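If you can't pull in the ZOOKEEPER-1804 patch below right away, a quick-and-dirty alternative is to sample the zxid: its low 32 bits are a per-epoch transaction counter, so two samples over the 'srvr' four-letter command give an approximate txn/sec figure. Rough, untested sketch (host, port and interval are placeholders, and the counter resets on a leader election):

#!/usr/bin/env python
# Rough txn/sec estimate from the zxid: the low 32 bits are a per-epoch
# transaction counter, so two samples over the 'srvr' four-letter command
# give an approximate rate. The counter resets on a leader election, so a
# negative delta just means "epoch changed, sample again".
import socket
import time

def zxid(host="localhost", port=2181):
    # Send 'srvr' and parse the "Zxid: 0x..." line out of the reply.
    with socket.create_connection((host, port), timeout=5) as s:
        s.sendall(b"srvr")
        data = b""
        while True:
            chunk = s.recv(4096)
            if not chunk:
                break
            data += chunk
    for line in data.decode().splitlines():
        if line.startswith("Zxid:"):
            return int(line.split(":", 1)[1].strip(), 16)
    raise RuntimeError("no Zxid line in srvr output")

def txn_rate(interval=60, host="localhost", port=2181):
    # Approximate transactions/sec over 'interval' seconds.
    first = zxid(host, port) & 0xFFFFFFFF
    time.sleep(interval)
    second = zxid(host, port) & 0xFFFFFFFF
    delta = second - first
    return delta / float(interval) if delta >= 0 else None

if __name__ == "__main__":
    print("approx txn/sec:", txn_rate())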
-rgs

On 7 July 2014 09:50, Raúl Gutiérrez Segalés <[email protected]> wrote:
> On 7 July 2014 09:39, Aaron Zimmerman <[email protected]> wrote:
>
>> What I don't understand is how the entire cluster could die in such a
>> situation. I was able to load zookeeper locally using the snapshot and
>> 10g log file without apparent issue.
>
> Sure, but it's syncing up with other learners that becomes challenging
> when having either big snapshots or too many txnlogs, right?
>
>> I can see how large amounts of data could cause latency issues in
>> syncing causing a single worker to die, but how would that explain the
>> node's inability to restart? When the server replays the log file, does
>> it have to sync the transactions to other nodes while it does so?
>
> Given that your txn churn is so big, by the time it finishes reading
> from disk it'll need to catch up with the quorum.. how many txns have
> happened by that point? By the way, we use this patch:
>
> https://issues.apache.org/jira/browse/ZOOKEEPER-1804
>
> to measure transaction rate, do you have any approximation of what your
> transaction rate might be?
>
>> I can alter the settings as has been discussed, but I worry that I'm
>> just delaying the same thing from happening again, if I deploy another
>> storm topology or something. How can I get the cluster in a state where
>> I can be confident that it won't crash in a similar way as load
>> increases, or at least set up some kind of monitoring that will let me
>> know something is unhealthy?
>
> I think it depends on what your txn rate is, let's measure that first I
> guess.
>
> -rgs
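On the monitoring question quoted above: a minimal poll of the 'mntr' four-letter word (available in 3.4+) will at least surface latency creeping up, requests backing up, or followers falling out of sync before the whole ensemble falls over. Sketch below, with placeholder hostnames and thresholds:

#!/usr/bin/env python
# Minimal health poll over the 'mntr' four-letter command (ZooKeeper 3.4+).
# ENSEMBLE and the thresholds are placeholders -- tune them for your setup.
import socket

ENSEMBLE = ["zk1:2181", "zk2:2181", "zk3:2181"]   # placeholder hostnames
MAX_AVG_LATENCY_MS = 50                           # placeholder threshold
MAX_OUTSTANDING = 100                             # placeholder threshold

def mntr(hostport):
    # Return the tab-separated 'mntr' output as a dict of strings.
    host, port = hostport.split(":")
    with socket.create_connection((host, int(port)), timeout=5) as s:
        s.sendall(b"mntr")
        data = b""
        while True:
            chunk = s.recv(4096)
            if not chunk:
                break
            data += chunk
    return dict(line.split("\t", 1)
                for line in data.decode().splitlines() if "\t" in line)

def check(hostport):
    # Print anything that looks unhealthy for one server.
    try:
        stats = mntr(hostport)
    except OSError as exc:
        print("%s: UNREACHABLE (%s)" % (hostport, exc))
        return
    if int(stats.get("zk_avg_latency", 0)) > MAX_AVG_LATENCY_MS:
        print("%s: high avg latency %s ms" % (hostport, stats["zk_avg_latency"]))
    if int(stats.get("zk_outstanding_requests", 0)) > MAX_OUTSTANDING:
        print("%s: %s outstanding requests" % (hostport, stats["zk_outstanding_requests"]))
    if stats.get("zk_server_state") == "leader":
        followers = int(stats.get("zk_followers", 0))
        synced = int(stats.get("zk_synced_followers", 0))
        if synced < followers:
            print("%s: only %d/%d followers synced" % (hostport, synced, followers))

if __name__ == "__main__":
    for hp in ENSEMBLE:
        check(hp)

Wire that into cron or whatever alerting you already have; the thresholds are guesses and worth tuning against a healthy baseline.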
