If anyone wants to look at my live environment, please send me your Gmail address and I will add you to the Google Compute Engine project. Thanks!
On Wed, Jan 1, 2014 at 7:58 AM, Arshak Navruzyan <[email protected]> wrote:
> Sean
>
> Thanks for looking into the log files.
>
> These are two Google Compute Engine instances under the same project, so
> there shouldn't be any firewall between them.
>
> For the brief moment that the slave runs during startup, I can nc into
> port 9997 from the master to the slave. But after it crashes, I can't.
> Seems like somehow the problem is on the slave.
>
> Arshak
>
> On Dec 31, 2013 11:58 PM, "Sean Busbey" <[email protected]> wrote:
>
>> Well, I can tell you the proximal cause. The tserver log shows that it
>> starts normally, then exits because it's told to (via the zookeeper lock
>> being removed).
>>
>> If you look at the master debug logs, this happens because the master
>> fails in three attempts to talk to the tserver, all with the same error:
>>
>> 2014-01-01 06:17:20,231 [master.Master] ERROR: unable to get tablet server status 10.240.203.36:9997[1434c70ed30001b]
>> org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
>>
>> Unfortunately, this is the same error you noticed in your first email.
>> After 3 of those, the master deletes the zk lock so that the tserver will
>> shut down.
>>
>> Could there be another firewall blocking access to port 9997 on the
>> worker machine from the master machine?
>>
>> Check from the master (you'll need netcat):
>>
>> $ nc -z 10.240.203.36 9997
>> $ echo $?
>>
>> On Wed, Jan 1, 2014 at 12:33 AM, Arshak Navruzyan <[email protected]> wrote:
>>
>>> I am probably missing something really basic, so I posted both the master
>>> and the slave log files:
>>>
>>> https://www.dropbox.com/sh/liv1mzuohyiv6uu/X5kx7AZJ6i
>>>
>>> Thanks again to everyone for the help!
>>>
>>> On Tue, Dec 31, 2013 at 10:20 PM, Arshak Navruzyan <[email protected]> wrote:
>>>
>>>> Disabled selinux (iptables already off) on both master and slave, but
>>>> it didn't make a difference, unfortunately.
>>>>
>>>> On Tue, Dec 31, 2013 at 9:25 PM, Kurt Christensen <[email protected]> wrote:
>>>>
>>>>> SELINUX disabled? IPTABLES configured? I have nothing else.
>>>>>
>>>>> Kurt
>>>>>
>>>>> ------
>>>>>
>>>>> On 12/31/13 6:02 PM, Arshak Navruzyan wrote:
>>>>>
>>>>>> I configured a new instance with a master and a slave tserver. When
>>>>>> I do start-all on the master, the slave doesn't come up. I am wondering
>>>>>> if it's because I left the instance secret as the default. (I get an
>>>>>> exception when I try to change that.)
>>>>>>
>>>>>> This is what I see in the master's monitor regarding the slave:
>>>>>>
>>>>>> Non-Functioning Tablet Servers
>>>>>> The following tablet servers reported a status other than Online
>>>>>>
>>>>>> 10.240.203.36:9997 UNRESPONSIVE
>>>>>>
>>>>>> In the master log I see the following:
>>>>>>
>>>>>> 2013-12-31 22:56:13,665 [master.Master] ERROR: unable to get tablet server status 10.240.203.36:9997[1434a79d34404a2]
>>>>>> org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
>>>>>> 2013-12-31 22:56:13,712 [master.Master] ERROR: unable to get tablet server status 10.240.203.36:9997[1434a79d34404a2]
>>>>>> org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
>>>>>> 2013-12-31 22:56:13,802 [balancer.TableLoadBalancer] INFO : Loaded class org.apache.accumulo.server.master.balancer.DefaultLoadBalancer for table !0
>>>>>> 2013-12-31 22:56:13,803 [master.Master] INFO : Assigning 1 tablets
>>>>>> 2013-12-31 22:56:13,812 [master.Master] ERROR: Error processing table state for store Root Tablet
>>>>>> org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
>>>>>>     at org.apache.accumulo.core.client.impl.ThriftTransportPool.createNewTransport(ThriftTransportPool.java:475)
>>>>>>     at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:464)
>>>>>>     at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:441)
>>>>>>     at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransportWithDefaultTimeout(ThriftTransportPool.java:366)
>>>>>>
>>>>>> In the slave's tserver.log all I see is:
>>>>>>
>>>>>> 2013-12-31 22:56:34,731 [tabletserver.TabletServer] FATAL: Lost tablet server lock (reason = LOCK_DELETED), exiting.
>>>>>>
>>>>> --
>>>>>
>>>>> Kurt Christensen
>>>>> P.O. Box 811
>>>>> Westminster, MD 21158-0811
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>> If you can't explain it simply, you don't understand it well enough.
>>>>> -- Albert Einstein
>>
>> --
>> Sean
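
Since the port is reachable only briefly during tserver startup, a single `nc -z` check may be taken at the wrong moment. The following is a minimal Python sketch, assuming only the standard library, that repeats the same TCP reachability test at an interval from the master. The function names `port_open` and `watch_port` are hypothetical helpers made up for this illustration, not part of Accumulo or netcat; the address and port are the ones quoted in the logs above.

```python
import socket
import time


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and "No route to host".
        return False


def watch_port(host, port, attempts=10, interval=1.0):
    """Probe host:port repeatedly; return a list of (timestamp, reachable)."""
    results = []
    for i in range(attempts):
        results.append((time.strftime("%H:%M:%S"), port_open(host, port)))
        if i < attempts - 1:
            time.sleep(interval)
    return results


if __name__ == "__main__":
    # Tablet server address and port taken from the master log in this thread.
    for stamp, reachable in watch_port("10.240.203.36", 9997):
        print(stamp, "open" if reachable else "unreachable")
```

Started on the master just before start-all, this would show roughly when the port stops answering, which can then be lined up against the timestamps in the master and tserver logs.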
