Yes, but there's a reason the master was expecting the recovery nodes to exist, and I don't think that reason has been uncovered.
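
To make the concern concrete, here is a minimal sketch of what I have in mind. It is not the actual Master.java code. `ChildSource`, the local `NoNodeException`, and `recoveryChildren` are hypothetical stand-ins I made up so the example runs without ZooKeeper on the classpath. The idea is only that a missing recovery node gets reported loudly instead of being skipped silently:

```java
import java.util.Collections;
import java.util.List;
import java.util.logging.Logger;

class RecoveryCheck {
    // Hypothetical stand-in for ZooKeeper's KeeperException.NoNodeException.
    static class NoNodeException extends Exception {
        NoNodeException(String path) {
            super("NoNode for " + path);
        }
    }

    // Hypothetical stand-in for the getChildren call made in Master.java.
    interface ChildSource {
        List<String> getChildren(String path) throws NoNodeException;
    }

    private static final Logger log = Logger.getLogger("RecoveryCheck");

    // Returns the children of the recovery path. If the node is missing,
    // logs a warning (so the odd state stays visible) and returns an empty list.
    static List<String> recoveryChildren(ChildSource zk, String recoveryPath) {
        try {
            return zk.getChildren(recoveryPath);
        } catch (NoNodeException e) {
            log.warning("Expected recovery node is missing: " + recoveryPath
                    + " (" + e.getMessage() + ")");
            return Collections.emptyList();
        }
    }
}
```

Whether the master should continue at all in that case is a separate question; the sketch only shows keeping the condition visible in the logs.
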

On Tue, Jun 12, 2012 at 10:46 PM, David Medinets <[email protected]> wrote:
> This code does not avoid the recovery entries, it just checks that the
> entries exist before looping over them.
>
> On Tue, Jun 12, 2012 at 10:42 PM, William Slacum <[email protected]> wrote:
>> Does this just address your symptom? I'd be concerned that there was a
>> recovery issue that put the Accumulo instance in this state and with
>> the change in effect nobody would know about it.
>>
>> On Tue, Jun 12, 2012 at 10:25 PM, David Medinets
>> <[email protected]> wrote:
>>> I am grepping source left and right but am not sure what to make of
>>> this error. Here is the code from Master.java:
>>>
>>>   ZooReaderWriter.getInstance().getChildren(zroot +
>>>       Constants.ZRECOVERY, new Watcher() {
>>>     @Override
>>>     public void process(WatchedEvent event) {
>>>       nextEvent.event("Noticed recovery changes", event.getType());
>>>     }
>>>   });
>>>
>>> I suggest replacing the above code with this:
>>>
>>>   final String recoveryPath = zroot + Constants.ZRECOVERY;
>>>   Stat stat =
>>>       ZooReaderWriter.getInstance().getZooKeeper().exists(recoveryPath,
>>>       null);
>>>   if (stat != null && stat.getNumChildren() > 0) {
>>>     ZooReaderWriter.getInstance().getChildren(recoveryPath, new Watcher() {
>>>       @Override
>>>       public void process(WatchedEvent event) {
>>>         nextEvent.event("Noticed recovery changes", event.getType());
>>>       }
>>>     });
>>>   }
>>>
>>> I have changed my local Accumulo and this change seems to be OK.
>>> However, since this is a change to Accumulo itself, I would like
>>> someone to code review before I commit this change. Does this change
>>> make sense?
>>>
>>> On Mon, Jun 11, 2012 at 9:54 PM, David Medinets
>>> <[email protected]> wrote:
>>>> I am slowly working my way through whatever went wrong on my system.
>>>> This is the latest. I've deleted the logs and started the master by
>>>> hand:
>>>>
>>>>   accumulo org.apache.accumulo.server.master.state.SetGoalState NORMAL
>>>>   start-server.sh localhost master
>>>>
>>>> Then checked the log files where I saw this message:
>>>>
>>>> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /accumulo/b519799c-3a51-4c9b-af21-96d577e2c11f/recovery
>>>>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>>>>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>>>   at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1448)
>>>>   at org.apache.accumulo.core.zookeeper.ZooReader.getChildren(ZooReader.java:62)
>>>>   at org.apache.accumulo.server.master.Master.run(Master.java:2071)
>>>>   at org.apache.accumulo.server.master.Master.main(Master.java:2173)
>>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>   at java.lang.reflect.Method.invoke(Method.java:601)
>>>>
>>>> I've run out of time for debugging today. I'll dig into the source
>>>> code more tomorrow ... until someone can point me in the right
>>>> direction to resolve this?
