Ben,

The log is binary. Is there a log reader? Also, can I just look at the log on any ZooKeeper server?

Thanks,

Jun

On Fri, Jun 3, 2011 at 10:18 AM, Benjamin Reed <[email protected]> wrote:

> actually, i think the transaction log could help a lot, and that will
> always be there. two scenarios i can think of are:
> 1) the change happened before the watch was set
> 2) the change never got there
> you could get an answer to both of those questions by looking at the
> transaction log.
>
> ben
>
> On Fri, Jun 3, 2011 at 9:59 AM, Jun Rao <[email protected]> wrote:
> > I don't expect that we can discover the problem right now. However, what
> > can I do to collect enough tracing should the problem occur again in the
> > future (e.g., is INFO level logging enough)?
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Jun 3, 2011 at 9:56 AM, Jun Rao <[email protected]> wrote:
> >
> >> The log doesn't have any state changing entries around the time the
> >> watcher is triggered, in all clients.
> >>
> >> Jun
> >>
> >>
> >> On Fri, Jun 3, 2011 at 9:32 AM, Fournier, Camille F. [Tech] <
> >> [email protected]> wrote:
> >>
> >>> Any state changes for the problem client between setting the watch and
> >>> when you expected it to get called? Do you have logs for that client vs
> >>> the others that show anything?
> >>>
> >>> -----Original Message-----
> >>> From: Jun Rao [mailto:[email protected]]
> >>> Sent: Friday, June 03, 2011 4:40 AM
> >>> To: [email protected]
> >>> Subject: Re: lost ZK events across datacenters
> >>>
> >>> Ben,
> >>>
> >>> Some details below.
> >>>
> >>> The call that sets the watcher simply calls getChildren with the watcher
> >>> flag set to true. The triggering change is that one of the child nodes
> >>> (which is ephemeral) is deleted because the creating client is gone.
> >>>
> >>> Thanks,
> >>>
> >>> Jun
> >>>
> >>> On Thu, Jun 2, 2011 at 10:49 AM, Benjamin Reed <[email protected]> wrote:
> >>>
> >>> > can you tell us a bit more about the scenario? what was the call that
> >>> > set the watch event? and what were the changes that caused the event?
> >>> >
> >>> > thanx
> >>> > ben
> >>> >
> >>> > On Wed, Jun 1, 2011 at 3:14 PM, Jun Rao <[email protected]> wrote:
> >>> > > All my clients were on different machines. 2 of them got the watcher
> >>> > > fired about the same time. The third one never got the watcher
> >>> > > triggered.
> >>> > >
> >>> > > Thanks,
> >>> > >
> >>> > > Jun
> >>> > >
> >>> > > On Wed, Jun 1, 2011 at 2:18 PM, Fournier, Camille F. [Tech] <
> >>> > > [email protected]> wrote:
> >>> > >
> >>> > >> All clients are in different processes?
> >>> > >> I've used zkclient and haven't seen any problems, but I haven't
> >>> > >> hammered it too hard yet. I took a long look at the code and didn't
> >>> > >> see any errors but there could always be something very subtle.
> >>> > >>
> >>> > >> -----Original Message-----
> >>> > >> From: Jun Rao [mailto:[email protected]]
> >>> > >> Sent: Wednesday, June 01, 2011 4:09 PM
> >>> > >> To: [email protected]
> >>> > >> Subject: Re: lost ZK events across datacenters
> >>> > >>
> >>> > >> I am using the zkclient package (
> >>> > >> https://github.com/sgroschupf/zkclient.git).
> >>> > >> The watcher code seems reasonable. Basically, each watcher event is
> >>> > >> first added to a queue. A separate event thread dequeues each event
> >>> > >> and reads the children of a path (which re-registers the watcher)
> >>> > >> and invokes the registered listener.
> >>> > >>
> >>> > >> Does anybody know of any issues in zkclient?
> >>> > >>
> >>> > >> Thanks,
> >>> > >>
> >>> > >> Jun
> >>> > >>
> >>> > >> On Wed, Jun 1, 2011 at 12:04 PM, Ted Dunning <[email protected]>
> >>> > >> wrote:
> >>> > >>
> >>> > >> > This is most commonly due, in my own history of programming
> >>> > >> > errors, to writing code that has a race window in it. It is
> >>> > >> > conceivable that cross data-center operation would make such a
> >>> > >> > race more of a problem.
> >>> > >> >
> >>> > >> > Can you say a bit about your code? Did you make sure to use
> >>> > >> > standard idioms as opposed to setting the watch in a different
> >>> > >> > call from reading the data?
> >>> > >> >
> >>> > >> > On Wed, Jun 1, 2011 at 11:40 AM, Jun Rao <[email protected]> wrote:
> >>> > >> >
> >>> > >> > > Hi,
> >>> > >> > >
> >>> > >> > > I have a setup where multiple ZK clients are sitting in a
> >>> > >> > > different datacenter from the ZK server. All clients registered
> >>> > >> > > the same child watcher on a path. However, when the children of
> >>> > >> > > the path changed, the watcher on 1 of the clients didn't fire.
> >>> > >> > > This seems to have happened a couple of times to me. I am using
> >>> > >> > > ZK 3.3.3. Has anyone used ZK in a cross datacenter setup and
> >>> > >> > > seen problems like that before?
> >>> > >> > >
> >>> > >> > > Thanks,
> >>> > >> > >
> >>> > >> > > Jun
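
For reference, here is a rough sketch of the race window Ted mentions in the quoted thread (setting the watch in a different call from reading the data), which is also Ben's scenario 1 (the change happened before the watch was set). It is a toy in-memory model, not the ZooKeeper or zkclient API: ToyZNode, set_watch_only, racy_client and atomic_client are all hypothetical names made up for illustration. The only ZooKeeper behaviour it assumes is that a child watch is one-shot and fires on the next change after it is registered.

```python
class ToyZNode:
    """In-memory stand-in for one parent znode (hypothetical, not the real client API)."""

    def __init__(self, children):
        self._children = set(children)
        self._watchers = []  # one-shot child watchers waiting for the next change

    def get_children(self, watcher=None):
        # Read the children and, if asked, register the watch in the same step.
        if watcher is not None:
            self._watchers.append(watcher)
        return sorted(self._children)

    def set_watch_only(self, watcher):
        # Hypothetical "watch without read" call, used only to model the racy pattern.
        self._watchers.append(watcher)

    def delete_child(self, name):
        self._children.discard(name)
        # One-shot semantics: every registered watcher fires once and is then cleared.
        fired, self._watchers = self._watchers, []
        for w in fired:
            w("NodeChildrenChanged")


def racy_client(node, change):
    """Read first, set the watch later: a change in between is never reported."""
    events = []
    seen = node.get_children()          # read without a watch
    change()                            # the change lands inside the window
    node.set_watch_only(events.append)  # watch registered too late
    return seen, events


def atomic_client(node, change):
    """Read and set the watch in one call: the same change is reported."""
    events = []
    seen = node.get_children(watcher=events.append)
    change()
    return seen, events


node_a = ToyZNode(["a", "b"])
racy_seen, racy_events = racy_client(node_a, lambda: node_a.delete_child("b"))

node_b = ToyZNode(["a", "b"])
atomic_seen, atomic_events = atomic_client(node_b, lambda: node_b.delete_child("b"))

print(racy_seen, racy_events)
print(atomic_seen, atomic_events)
```

In the racy case the client is left holding a stale child list and never gets an event; in the atomic case it holds the same list but is told that it changed. Whether this is what happened here is exactly what the transaction log should help answer.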
