Sure -- you have my address already.
Also, nc not working while the tabletserver is dead makes sense (that
process is what's listening on that port). Once the process dies,
there's nothing else listening.
On 1/1/2014 1:31 PM, Arshak Navruzyan wrote:
If anyone wants to look at my live environment please let me know (your
gmail) and I will add you to the Google Compute Engine. Thanks!
On Wed, Jan 1, 2014 at 7:58 AM, Arshak Navruzyan wrote:
Sean
Thanks for looking into the log files.
These are two Google Compute Engine instances under the same project
so there shouldn't be any firewall between them.
For the brief moment that the slave runs during startup, I can nc
into port 9997 from the master to the slave. But after it crashes,
I can't. Seems like somehow the problem is on the slave.
Arshak
On Dec 31, 2013 11:58 PM, "Sean Busbey" wrote:
Well, I can tell you the proximal cause. The tserver log shows
that it starts normally, then exits because it's told to (via
the zookeeper lock being removed).
If you look at the master debug logs, this happens because the
master fails in three attempts to talk to the tserver, all with
the same error:
2014-01-01 06:17:20,231 [master.Master] ERROR: unable to get tablet server status 10.240.203.36:9997[1434c70ed30001b]
org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
Unfortunately, this is the same error you noticed in your first
email. After 3 of those, the master deletes the zk lock so that
the tserver will shut down.
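To make that sequence concrete, here is a toy model of it in Python. It is not Accumulo's code: `poll_until_evicted`, `get_status`, and `delete_lock` are hypothetical names, and the only detail taken from the logs is the threshold of three failed attempts.

```python
MAX_FAILURES = 3  # threshold as described in this email; everything else is illustrative


def poll_until_evicted(get_status, delete_lock, max_failures=MAX_FAILURES):
    """Call get_status() until it succeeds or has failed max_failures times.

    Returns True if the server answered, False if its lock was deleted.
    Both callables are stand-ins supplied by the caller (hypothetical).
    """
    for attempt in range(1, max_failures + 1):
        try:
            get_status()
            return True
        except OSError as exc:
            # In the real logs this is the "No route to host" error.
            print("attempt %d failed: %s" % (attempt, exc))
    # Every attempt failed: remove the lock so the server exits on its own.
    delete_lock()
    return False
```

In this model the server is healthy and still gets shut down, which matches what the tserver log shows: the failure is in reaching it, not in the process itself.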
Could there be another firewall blocking access to port 9997 on
the worker machine from the master machine?
Check from the master (you'll need netcat):
$ nc -z 10.240.203.36 9997
$ echo $?
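If netcat isn't installed, a rough Python equivalent of the check is sketched below. This is only an illustration: `probe_port` and its return strings are names made up for this example, not part of Accumulo or nc. It attempts a plain TCP connect and reports whether the port was open, refused (host reachable, nothing listening), or unreachable (the OS-level condition that Java usually surfaces as NoRouteToHostException).

```python
import errno
import socket


def probe_port(host, port, timeout=3.0):
    """Try one TCP connect, roughly like `nc -z host port`, and name the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all within the timeout (packets silently dropped).
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but no process is listening on that port.
        return "refused"
    except OSError as exc:
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            # Routing problem, or a firewall rejecting the connection.
            return "no route"
        return "error: %s" % exc


# Example call from the master (address taken from the logs in this thread):
#   print(probe_port("10.240.203.36", 9997))
```

A result of "refused" would suggest the tserver process is simply not running, while "no route" would point at the network path or a host firewall rather than at Accumulo.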
On Wed, Jan 1, 2014 at 12:33 AM, Arshak Navruzyan wrote:
I am probably missing something really basic so I posted
both the master and the slave log files:
https://www.dropbox.com/sh/liv1mzuohyiv6uu/X5kx7AZJ6i
Thanks again to everyone for the help!
On Tue, Dec 31, 2013 at 10:20 PM, Arshak Navruzyan wrote:
Disabled SELinux (iptables already off) on both master
and slave, but unfortunately it didn't make a difference.
On Tue, Dec 31, 2013 at 9:25 PM, Kurt Christensen wrote:
SELINUX disabled? IPTABLES configured? I have
nothing else.
Kurt
------
On 12/31/13 6:02 PM, Arshak Navruzyan wrote:
I configured a new instance with a master and a
slave tserver. When I do start-all on the
master, the slave doesn't come up. I am
wondering if it's because I left the instance
secret as the default. (I get an exception when
I try to change that).
This is what I see in the master's monitor
regarding the slave
Non-Functioning Tablet Servers
The following tablet servers reported a
status other than Online
10.240.203.36:9997 UNRESPONSIVE
In the master log I see the following
2013-12-31 22:56:13,665 [master.Master] ERROR: unable to get tablet server status 10.240.203.36:9997[1434a79d34404a2]
org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
2013-12-31 22:56:13,712 [master.Master] ERROR: unable to get tablet server status 10.240.203.36:9997[1434a79d34404a2]
org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
2013-12-31 22:56:13,802 [balancer.TableLoadBalancer] INFO : Loaded class org.apache.accumulo.server.master.balancer.DefaultLoadBalancer for table !0
2013-12-31 22:56:13,803 [master.Master] INFO : Assigning 1 tablets
2013-12-31 22:56:13,812 [master.Master] ERROR: Error processing table state for store Root Tablet
org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
    at org.apache.accumulo.core.client.impl.ThriftTransportPool.createNewTransport(ThriftTransportPool.java:475)
    at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:464)
    at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:441)
    at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransportWithDefaultTimeout(ThriftTransportPool.java:366)
In the slave's tserver.log all I see is
2013-12-31 22:56:34,731 [tabletserver.TabletServer] FATAL: Lost tablet server lock (reason = LOCK_DELETED), exiting.
--
Kurt Christensen
P.O. Box 811
Westminster, MD 21158-0811
------------------------------------------------------------------------
If you can't explain it simply, you don't understand
it well enough. -- Albert Einstein
--
Sean