On 12/31/13, 6:37 PM, Arshak Navruzyan wrote:
Here is my route -n

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.240.0.1      0.0.0.0         255.255.255.255 UH    0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 eth0
0.0.0.0         10.240.0.1      0.0.0.0         UG    0      0        0 eth0


"slave tserver" is another physical machine (well, a Google Compute Engine
instance). Yes, one GCE instance is running the master (and a slave) and the
other is running just a slave.

here is my config:

masters:
10.240.165.43

slaves:
10.240.165.43
10.240.203.36

BTW when I run bin/check-slaves conf/slaves
# WRITABLE value not configured, not checking partitions
10.240.165.43
10.240.203.36

Is the master supposed to be listed in the slaves file too?

No, your configuration files look correct.

I'm not sure why, but your slave (10.240.203.36) can't talk back to the master (10.240.165.43), so that is where you want to look. You know that the master can talk to the slave (otherwise the slave tserver would never have started) and that the slave tserver can talk to ZooKeeper (it had, and then lost, a lock in ZK). Are you running ZooKeeper on the master? That would further narrow things down when debugging this.
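
One way to narrow this down is to test plain TCP reachability in each direction, running the same check once on the master and once on the slave. Below is a minimal sketch in Python (not part of Accumulo). The helper name `can_connect` is hypothetical, and the only target taken from this thread is the tserver address 10.240.203.36:9997; any other host/port pairs you add are assumptions about your own setup.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Try to open a TCP connection; return (True, None) or (False, error text)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # Covers "No route to host", "Connection refused", timeouts, etc.
        return False, str(exc)


if __name__ == "__main__":
    # 10.240.203.36:9997 is the tserver address from the monitor output.
    # Other targets (for example ZooKeeper on its usual port 2181) are
    # assumptions; add whatever your configuration actually uses.
    targets = [("10.240.203.36", 9997)]
    for host, port in targets:
        ok, err = can_connect(host, port)
        print(f"{host}:{port} -> {'reachable' if ok else 'FAILED: ' + err}")
```

If the error text is "No route to host" rather than "Connection refused", that tends to point at routing or a firewall rule between the instances rather than at the Accumulo process itself.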

It may be worthwhile to double check your /etc/hosts entries just to be safe. Aside from that, I can't think of anything else at the moment.
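
For the /etc/hosts check, a rough sketch like the following (Python, run on each node) prints what the machine's own hostname resolves to and flags loopback addresses. The function names are hypothetical, and treating a loopback mapping as suspicious is my assumption about a common misconfiguration, not something confirmed for this cluster.

```python
import ipaddress
import socket


def resolved_addresses(name):
    """Return the sorted, de-duplicated IPv4 addresses the resolver gives for name."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    return sorted({info[4][0] for info in infos})


def loopback_addresses(addresses):
    """Return only the addresses that fall in the loopback range (127.0.0.0/8)."""
    return [a for a in addresses if ipaddress.ip_address(a).is_loopback]


if __name__ == "__main__":
    name = socket.gethostname()
    try:
        addrs = resolved_addresses(name)
    except socket.gaierror as exc:
        print(f"{name} does not resolve: {exc}")
    else:
        print(f"{name} resolves to {addrs}")
        bad = loopback_addresses(addrs)
        if bad:
            # A hostname that maps to loopback can make a server advertise
            # an address that other machines cannot reach.
            print(f"warning: {name} maps to loopback address(es) {bad}")
```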


On Tue, Dec 31, 2013 at 3:32 PM, Josh Elser wrote:

    Maybe check the output of `route -n` on the master? It might be
    something weird with DNS as well.

    When you say "slave tserver", are you talking about a separate
    physical machine? You have one node running the Accumulo master and
    another running a tserver?


    On 12/31/13, 6:02 PM, Arshak Navruzyan wrote:

        I configured a new instance with a master and a slave tserver. When I
        do start-all on the master, the slave doesn't come up. I am wondering
        if it's because I left the instance secret as the default. (I get an
        exception when I try to change that.)

        This is what I see in the master's monitor regarding the slave

             Non-Functioning Tablet Servers
             The following tablet servers reported a status other than Online

             10.240.203.36:9997  UNRESPONSIVE



        In the master log I see the following

             2013-12-31 22:56:13,665 [master.Master] ERROR: unable to get tablet server status 10.240.203.36:9997[1434a79d34404a2]
             org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
             2013-12-31 22:56:13,712 [master.Master] ERROR: unable to get tablet server status 10.240.203.36:9997[1434a79d34404a2]
             org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
             2013-12-31 22:56:13,802 [balancer.TableLoadBalancer] INFO : Loaded class org.apache.accumulo.server.master.balancer.DefaultLoadBalancer for table !0
             2013-12-31 22:56:13,803 [master.Master] INFO : Assigning 1 tablets
             2013-12-31 22:56:13,812 [master.Master] ERROR: Error processing table state for store Root Tablet
             org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
                     at org.apache.accumulo.core.client.impl.ThriftTransportPool.createNewTransport(ThriftTransportPool.java:475)
                     at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:464)
                     at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:441)
                     at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransportWithDefaultTimeout(ThriftTransportPool.java:366)



        In the slave's tserver.log all I see is

             2013-12-31 22:56:34,731 [tabletserver.TabletServer] FATAL: Lost tablet server lock (reason = LOCK_DELETED), exiting.

