What does the other /etc/hosts file look like?

On Dec 31, 2013 9:14 PM, "Arshak Navruzyan" <[email protected]> wrote:
> Josh,
>
> Yea Zookeeper is running on the master and I can connect to it using zkCli
> from the slave.
>
> /etc/hosts looks fine
>
> 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
> ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
> 10.240.203.36 shoki.c.accumulo-test.internal shoki # Added by Google
>
> Hmm, completely baffled!
>
> Arshak
>
>
> On Tue, Dec 31, 2013 at 6:35 PM, Josh Elser <[email protected]> wrote:
>
>> On 12/31/13, 6:37 PM, Arshak Navruzyan wrote:
>>
>>> Here is my route -n
>>>
>>> Kernel IP routing table
>>> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
>>> 10.240.0.1      0.0.0.0         255.255.255.255 UH    0      0        0 eth0
>>> 169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 eth0
>>> 0.0.0.0         10.240.0.1      0.0.0.0         UG    0      0        0 eth0
>>>
>>>
>>> "slave tserver" is another physical machine (well google compute engine
>>> instance). Yes one gce instance is running master (and slave) and the
>>> other is running just slave.
>>>
>>> here is my config:
>>>
>>> masters:
>>> 10.240.165.43
>>>
>>> slaves:
>>> 10.240.165.43
>>> 10.240.203.36
>>>
>>> BTW when I run bin/check-slaves conf/slaves
>>> # WRITABLE value not configured, not checking partitions
>>> 10.240.165.43
>>> 10.240.203.36
>>>
>>> Is the master supposed to be listed in the slaves files too?
>>>
>>
>> No, your configuration files look correct.
>>
>> I'm not sure why but for whatever reason, your slave (10.240.203.36)
>> can't talk back to the master (10.240.165.43), but at least that's where
>> you want to look at things. You know that the master can talk to the slave
>> (otherwise the slave tserver would have never started) and that the slave
>> tserver can talk to ZooKeeper (that it had and then lost a lock in ZK). Are
>> you running ZooKeeper on the master (that would further isolate it in
>> debugging this).
>>
>> It may be worthwhile to double check your /etc/hosts entries just to be
>> safe.
>> Aside from that, I can't think of anything else at the moment.
>>
>>
>>> On Tue, Dec 31, 2013 at 3:32 PM, Josh Elser <[email protected]> wrote:
>>>
>>> Maybe check the output of `route -n` on the master? It might be
>>> something weird with DNS as well.
>>>
>>> When you say "slave tserver", are you talking about a separate
>>> physical machine? You have one node running the Accumulo master and
>>> another running a tserver?
>>>
>>>
>>> On 12/31/13, 6:02 PM, Arshak Navruzyan wrote:
>>>
>>> I configured a new instance with a master and a slave tserver. When I
>>> do start-all on the master, the slave doesn't come up. I am wondering
>>> if it's because I left the instance secret as the default. (I get an
>>> exception when I try to change that).
>>>
>>> This is what I see in the master's monitor regarding the slave
>>>
>>> Non-Functioning Tablet Servers
>>> The following tablet servers reported a status other than Online
>>>
>>> 10.240.203.36:9997 UNRESPONSIVE
>>>
>>>
>>> In the master log I see the following
>>>
>>> 2013-12-31 22:56:13,665 [master.Master] ERROR: unable to get tablet server status 10.240.203.36:9997[1434a79d34404a2] org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
>>> 2013-12-31 22:56:13,712 [master.Master] ERROR: unable to get tablet server status 10.240.203.36:9997[1434a79d34404a2] org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
>>> 2013-12-31 22:56:13,802 [balancer.TableLoadBalancer] INFO : Loaded class org.apache.accumulo.server.master.balancer.DefaultLoadBalancer for table !0
>>> 2013-12-31 22:56:13,803 [master.Master] INFO : Assigning 1 tablets
>>> 2013-12-31 22:56:13,812 [master.Master] ERROR: Error processing table state for store Root Tablet
>>> org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
>>>     at org.apache.accumulo.core.client.impl.ThriftTransportPool.createNewTransport(ThriftTransportPool.java:475)
>>>     at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:464)
>>>     at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:441)
>>>     at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransportWithDefaultTimeout(ThriftTransportPool.java:366)
>>>
>>>
>>> In the slave's tserver.log all I see is
>>>
>>> 2013-12-31 22:56:34,731 [tabletserver.TabletServer] FATAL: Lost tablet server lock (reason = LOCK_DELETED), exiting.
>>>
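
The NoRouteToHostException in the master log suggests that the master could not open a TCP connection to the tserver at 10.240.203.36:9997. One way to narrow this down is to test plain TCP reachability between the two machines, independent of Accumulo. The following is only a minimal sketch in Python and is not part of Accumulo: the helper names `can_connect`, `parse_target`, and `report` are hypothetical, and port 2181 in the usage note assumes ZooKeeper's default client port, which the thread does not state.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Try to open a TCP connection; return (ok, error)."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # Covers refused connections, timeouts, and "No route to host".
        return False, exc


def parse_target(text):
    """Split a 'host:port' string into (host, port)."""
    host, _, port = text.rpartition(":")
    return host, int(port)


def report(targets, timeout=3.0):
    """Check each 'host:port' target and print one line per result."""
    results = []
    for target in targets:
        host, port = parse_target(target)
        ok, err = can_connect(host, port, timeout=timeout)
        status = "OK" if ok else "FAILED ({})".format(err)
        print("{}:{} {}".format(host, port, status))
        results.append((host, port, ok))
    return results
```

On the master, `report(["10.240.203.36:9997"])` would test the tserver port from the monitor output above; on the slave, `report(["10.240.165.43:2181"])` would test ZooKeeper under the assumed default port. If a port fails with "No route to host" while the machine is otherwise reachable, a host firewall rejecting that port is one common cause, though not the only possible one.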
