Thank you for the quick response.

Where should I post the logs so you can access them? I’m relatively new to
mailing lists, so I’m not sure what the standard procedure is.

I’m running 3 ZooKeeper servers. iptables is off on all nodes. I ran the netcat
test both locally and remotely against each ZooKeeper node. This cluster is
virtualized on Xen; the old one was on KVM.

Thanks again,

Alex

From: Sean Busbey [mailto:[email protected]]
Sent: Wednesday, March 05, 2014 12:28 PM
To: Accumulo User List
Subject: Re: Tablet server stuck waiting for lock

Hi Alex!

Can you post your logs somewhere?

How many ZooKeeper servers are you running?

Is iptables enabled?

Was your netcat test run locally on the ZooKeeper server or from a remote server?

What virtualization platform is this running on top of?

-Sean

On Wed, Mar 5, 2014 at 11:17 AM, Alex Lee <[email protected]> wrote:
Hello,

I’m trying to create a virtualized Accumulo 1.4.4 cluster with 4 tablet servers
using Hadoop 0.20.2 and ZooKeeper 3.3.5. It didn’t seem to work correctly with
4 tablet servers, so I scaled back to a single tablet server, which seemed to
work fine. When I tried running it with 2 tablet servers, I ran into some
issues.

For context, I double-checked the ZooKeeper and Accumulo configs, and
everything matches. All hostnames resolve correctly, and passwordless SSH for
the accumulo user works between all nodes. Running
“echo stat | nc <zk-server> <zk port>” returns the expected response.
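The netcat check above can be scripted so it runs the same way against every ZooKeeper server in the ensemble. The sketch below is Python using only the standard library; it is an illustration, not an Accumulo or ZooKeeper tool. The hostnames are hypothetical placeholders and 2181 is only ZooKeeper's default client port, so substitute the real values from zoo.cfg. It also reads the `Mode:` line of the `stat` reply, since a healthy three-node ensemble should report exactly one leader and two followers.

```python
import socket

# Hypothetical placeholders: replace with the three ZooKeeper hosts.
# 2181 is ZooKeeper's default client port; check zoo.cfg for the real one.
ZK_SERVERS = ["zk1.example", "zk2.example", "zk3.example"]
ZK_PORT = 2181


def four_letter_word(host, port, cmd="stat", timeout=5.0):
    """Send a ZooKeeper four-letter command, like `echo stat | nc host port`."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(stat_output):
    """Return the value of the 'Mode:' line (leader/follower/standalone), or None."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def ensemble_looks_healthy(modes):
    """True if exactly one server leads and every other server follows."""
    return modes.count("leader") == 1 and all(
        m in ("leader", "follower") for m in modes
    )


if __name__ == "__main__":
    modes = []
    for host in ZK_SERVERS:
        try:
            mode = parse_mode(four_letter_word(host, ZK_PORT))
        except OSError as exc:  # DNS failure, refused connection, timeout
            mode = None
            print(f"{host}: unreachable ({exc})")
        else:
            print(f"{host}: mode={mode}")
        modes.append(mode)
    print("ensemble healthy:", ensemble_looks_healthy(modes))
```

Running it from the master and from each tablet server exercises the same local-versus-remote paths that the netcat question is about.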

Here’s the first error in the Master log:

2014-03-05 11:18:16,626 [master.Master] ERROR: Error processing table state for store Root Tablet
org.apache.thrift.transport.TTransportException: java.io.IOException: Connection reset by peer
        at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:161)
        at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:158)
        at org.apache.accumulo.core.client.impl.ThriftTransportPool$CachedTTransport.flush(ThriftTransportPool.java:299)
        at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.send_loadTablet(TabletClientService.java:653)
        at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.loadTablet(TabletClientService.java:640)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.accumulo.cloudtrace.instrument.thrift.TraceWrap$2.invoke(TraceWrap.java:84)
        at com.sun.proxy.$Proxy4.loadTablet(Unknown Source)
        at org.apache.accumulo.server.master.LiveTServerSet$TServerConnection.assignTablet(LiveTServerSet.java:86)
        at org.apache.accumulo.server.master.Master$TabletGroupWatcher.flushChanges(Master.java:1818)
        at org.apache.accumulo.server.master.Master$TabletGroupWatcher.run(Master.java:1426)
Caused by: java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(Unknown Source)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
        at sun.nio.ch.IOUtil.write(Unknown Source)
        at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
        at java.io.BufferedOutputStream.flush(Unknown Source)
        at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:159)
        ... 13 more

Here are the error logs for Tablet Server #1:

2014-03-05 11:17:15,152 [tabletserver.TabletServer] INFO : Tablet server starting on 172.16.111.3
2014-03-05 11:17:15,187 [util.FileSystemMonitor] INFO : Filesystem monitor started
2014-03-05 11:17:15,194 [tabletserver.NativeMap] INFO : Loaded native map shared library /opt/accumulo/accumulo/lib/native/map/libNativeMap-Linux-amd64-64.so
2014-03-05 11:17:15,499 [tabletserver.TabletServer] INFO : port = 9997
2014-03-05 11:17:15,540 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2014-03-05 11:17:16,633 [tabletserver.TabletServer] WARN : Got loadTablet message from master before lock acquired, ignoring...
2014-03-05 11:17:16,634 [server.TNonblockingServer] ERROR: Unexpected exception while invoking!
java.lang.RuntimeException: Lock not acquired
        at org.apache.accumulo.server.tabletserver.TabletServer$ThriftClientHandler.checkPermission(TabletServer.java:1782)
        at org.apache.accumulo.server.tabletserver.TabletServer$ThriftClientHandler.loadTablet(TabletServer.java:1814)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.accumulo.cloudtrace.instrument.thrift.TraceWrap$1.invoke(TraceWrap.java:59)
        at com.sun.proxy.$Proxy1.loadTablet(Unknown Source)
        at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$loadTablet.process(TabletClientService.java:2510)
        at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor.process(TabletClientService.java:2053)
        at org.apache.accumulo.server.util.TServerUtils$TimedProcessor.process(TServerUtils.java:154)
        at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:631)
        at org.apache.accumulo.server.util.TServerUtils$THsHaServer$Invocation.run(TServerUtils.java:202)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
        at java.lang.Thread.run(Unknown Source)
2014-03-05 11:17:20,564 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2014-03-05 11:17:25,589 [tabletserver.TabletServer] INFO : Waiting for tablet server lock

(This continues until it runs out of retries, and then the tablet server exits.)

Tablet Server #2’s log gets this far and then just stops:

2014-03-05 11:17:14,112 [tabletserver.TabletServer] INFO : Tablet server starting on 172.16.111.3
2014-03-05 11:17:14,149 [util.FileSystemMonitor] INFO : Filesystem monitor started
2014-03-05 11:17:14,157 [tabletserver.NativeMap] INFO : Loaded native map shared library /opt/accumulo/accumulo/lib/native/map/libNativeMap-Linux-amd64-64.so
2014-03-05 11:17:14,481 [tabletserver.TabletServer] INFO : port = 9997

Interestingly, the master logs never show any calls to Tablet Server #2’s IP
address.
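Both tablet server logs above report "Tablet server starting on 172.16.111.3". Unless that is a copy-and-paste slip, it may be worth confirming that each tablet server hostname resolves to a distinct address on every node. The Python sketch below does that with the standard library only. The hostnames are hypothetical placeholders for the entries in Accumulo's conf/slaves file (assumed here to be the tablet server list), and it checks only IPv4 resolution through `socket.gethostbyname`. Because each VM has its own resolver setup (for example /etc/hosts), it should be run on every node, not just one.

```python
import socket
from collections import defaultdict

# Hypothetical placeholders: replace with the tablet server hostnames
# listed in Accumulo's conf/slaves file.
TSERVER_HOSTS = ["tserver1.example", "tserver2.example"]


def resolve_all(hosts):
    """Map each hostname to its IPv4 address, or None if it does not resolve."""
    result = {}
    for host in hosts:
        try:
            result[host] = socket.gethostbyname(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            result[host] = None
    return result


def shared_addresses(resolved):
    """Return {ip: [hosts]} for every address claimed by more than one host."""
    by_ip = defaultdict(list)
    for host, ip in resolved.items():
        if ip is not None:
            by_ip[ip].append(host)
    return {ip: hosts for ip, hosts in by_ip.items() if len(hosts) > 1}


if __name__ == "__main__":
    resolved = resolve_all(TSERVER_HOSTS)
    for host, ip in resolved.items():
        print(f"{host} -> {ip}")
    for ip, hosts in shared_addresses(resolved).items():
        print(f"WARNING: {ip} is shared by {', '.join(hosts)}")
```

An empty warning list on every node would rule out name resolution as the reason the master never contacts the second tablet server's address.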

Any thoughts? We have another cluster that is set up identically in just about
every way (besides hostnames), and it has never had any of these issues. From
what I’ve read, these issues can occur in 1.4.3, which we were using at first;
we switched to 1.4.4 because these types of issues were supposed to be resolved
there. Any help would be greatly appreciated.

Thanks,

Alex Lee
