On 12/31/13, 6:37 PM, Arshak Navruzyan wrote:
Here is my route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
10.240.0.1 0.0.0.0 255.255.255.255 UH 0 0 0 eth0
169.254.0.0 0.0.0.0 255.255.0.0 U 1002 0 0 eth0
0.0.0.0 10.240.0.1 0.0.0.0 UG 0 0 0 eth0
"slave tserver" is another physical machine (well, a Google Compute Engine
instance). Yes, one GCE instance is running the master (and a slave) and the
other is running just a slave.
Here is my config:
masters:
10.240.165.43
slaves:
10.240.165.43
10.240.203.36
BTW when I run bin/check-slaves conf/slaves
# WRITABLE value not configured, not checking partitions
10.240.165.43
10.240.203.36
Is the master supposed to be listed in the slaves files too?
No, your configuration files look correct.
I'm not sure why, but your slave (10.240.203.36) can't talk back to the
master (10.240.165.43), so that's where you want to look. You know that
the master can talk to the slave (otherwise the slave tserver would never
have started) and that the slave tserver can talk to ZooKeeper (it
acquired and then lost a lock in ZK). Are you running ZooKeeper on the
master? Knowing that would help isolate the problem further.
It may be worthwhile to double check your /etc/hosts entries just to be
safe. Aside from that, I can't think of anything else at the moment.
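If it helps, a plain TCP connect in each direction shows which hop is actually failing. This is only a rough sketch, not anything that ships with Accumulo: `check_port` and `report` are names made up here, 9997 is the tserver port from your monitor output, and 9999 (master client port) and 2181 (ZooKeeper client port) are assumed defaults that may not match your accumulo-site.xml or zoo.cfg.

```python
import socket


def check_port(host, port, timeout=3.0):
    """Attempt a TCP connection and return (ok, detail).

    detail is "connected" on success, otherwise the exception type and
    message (for example a refused connection or "No route to host").
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        # Covers refused connections, timeouts, unreachable hosts and
        # name-resolution failures, all of which subclass OSError.
        return False, f"{type(exc).__name__}: {exc}"


def report(targets, timeout=3.0):
    """Print one line per (label, host, port) target."""
    for label, host, port in targets:
        ok, detail = check_port(host, port, timeout)
        status = "OK  " if ok else "FAIL"
        print(f"{status} {label:<22} {host}:{port}  {detail}")


# Hypothetical target lists; the ports other than 9997 are assumed defaults.
FROM_SLAVE = [
    ("master client port", "10.240.165.43", 9999),
    ("zookeeper", "10.240.165.43", 2181),
]
FROM_MASTER = [
    ("slave tserver", "10.240.203.36", 9997),
]
```

Run `report(FROM_SLAVE)` on 10.240.203.36 and `report(FROM_MASTER)` on 10.240.165.43. A FAIL line with "No route to host" in one direction but not the other would point at that host's network or firewall setup rather than at the Accumulo configuration.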
On Tue, Dec 31, 2013 at 3:32 PM, Josh Elser <[email protected]> wrote:
Maybe check the output of `route -n` on the master? It might be
something weird with DNS as well.
When you say "slave tserver", are you talking about a separate
physical machine? You have one node running the Accumulo master and
another running a tserver?
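To rule out the DNS side, something like the following can be run on each node to see what a name or address resolves to in both directions. It is only a sketch: `check_resolution` is a made-up helper, and it uses nothing but the standard resolver calls, so it reflects whatever /etc/hosts and DNS are configured on that machine.

```python
import socket


def check_resolution(name_or_ip):
    """Resolve a hostname or IP forward, then reverse-resolve the result.

    Returns a dict with the input, the forward (IPv4) address, the reverse
    name, and an error message for whichever step failed, if any.
    """
    result = {"input": name_or_ip, "forward": None, "reverse": None}
    try:
        result["forward"] = socket.gethostbyname(name_or_ip)
    except OSError as exc:
        result["forward_error"] = str(exc)
        return result
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist).
        result["reverse"] = socket.gethostbyaddr(result["forward"])[0]
    except OSError as exc:
        result["reverse_error"] = str(exc)
    return result
```

For example, calling `check_resolution(socket.gethostname())` on both machines, plus `check_resolution("10.240.165.43")` and `check_resolution("10.240.203.36")`, should show whether each node's own hostname maps to the address listed in the masters and slaves files.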
On 12/31/13, 6:02 PM, Arshak Navruzyan wrote:
I configured a new instance with a master and a slave tserver. When I
do start-all on the master, the slave doesn't come up. I am wondering
if it's because I left the instance secret as the default. (I get an
exception when I try to change that).
This is what I see in the master's monitor regarding the slave:
Non-Functioning Tablet Servers
The following tablet servers reported a status other than Online
10.240.203.36:9997 UNRESPONSIVE
In the master log I see the following:
2013-12-31 22:56:13,665 [master.Master] ERROR: unable to get tablet server status 10.240.203.36:9997[1434a79d34404a2] org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
2013-12-31 22:56:13,712 [master.Master] ERROR: unable to get tablet server status 10.240.203.36:9997[1434a79d34404a2] org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
2013-12-31 22:56:13,802 [balancer.TableLoadBalancer] INFO : Loaded class org.apache.accumulo.server.master.balancer.DefaultLoadBalancer for table !0
2013-12-31 22:56:13,803 [master.Master] INFO : Assigning 1 tablets
2013-12-31 22:56:13,812 [master.Master] ERROR: Error processing table state for store Root Tablet
org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
at org.apache.accumulo.core.client.impl.ThriftTransportPool.createNewTransport(ThriftTransportPool.java:475)
at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:464)
at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:441)
at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransportWithDefaultTimeout(ThriftTransportPool.java:366)
In the slave's tserver.log all I see is:
2013-12-31 22:56:34,731 [tabletserver.TabletServer] FATAL: Lost tablet server lock (reason = LOCK_DELETED), exiting.