Hi,
I'm using Flume NG (1.2.0), and having some trouble with timeout exception.
I have 2 servers. A log file is appended at server A. And server B has HBase.
So I installed Flume NG to server A as a source, and to server B as a sink.
flume.conf for server A is:
src_agent.sources = tail
src_agent.channels = memoryChannel
src_agent.sinks = avroSink
src_agent.sources.tail.type = exec
src_agent.sources.tail.command = tail -F /home/www/crash_report/log/crash.log
src_agent.sources.tail.channels = memoryChannel
src_agent.sinks.avroSink.channel = memoryChannel
src_agent.sinks.avroSink.type = avro
src_agent.sinks.avroSink.hostname = serverb.hostname.com
src_agent.sinks.avroSink.port = 10001
src_agent.channels.memoryChannel.type = memory
src_agent.channels.memoryChannel.capacity = 10000
And flume.conf for server B is:
dst_agent.sources = avro
dst_agent.channels = memoryChannel
dst_agent.sinks = hbaseSink
dst_agent.sources.avro.type = avro
dst_agent.sources.avro.bind = 0.0.0.0
dst_agent.sources.avro.port = 10001
dst_agent.sources.avro.channels = memoryChannel
dst_agent.sinks.hbaseSink.channel = memoryChannel
dst_agent.sinks.hbaseSink.type = org.apache.flume.sink.hbase.HBaseSink
dst_agent.sinks.hbaseSink.table = crash_report_flume
dst_agent.sinks.hbaseSink.columnFamily = f
dst_agent.sinks.hbaseSink.column = foo
dst_agent.sinks.hbaseSink.serializer =
org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
dst_agent.sinks.hbaseSink.serializer.payloadColumn = pcol
dst_agent.sinks.hbaseSink.serializer.incrementColumn = icol
dst_agent.channels.memoryChannel.type = memory
dst_agent.channels.memoryChannel.capacity = 10000
Flume startup script for server A is:
$FLUME_HOME/bin/flume-ng agent --conf $FLUME_HOME/conf --conf-file
$FLUME_HOME/conf/flume-hbase-src.conf --name src_agent
-Dflume.root.logger=INFO,console 1>>flume.log 2>>flume.log &
Script for server B:
$FLUME_HOME/bin/flume-ng agent --conf $FLUME_HOME/conf --conf-file
$FLUME_HOME/conf/flume-hbase-dst.conf --name dst_agent
-Dflume.root.logger=INFO,console 1>>flume.log 2>>flume.log &
All worked fine after startup. Logs were collected to HBase as
expected. But after a couple of hours I found something went wrong.
Here's the flume error logs in server A:
2012-10-02 20:28:59,641
(SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR -
org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)]
Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Failed to send events
at org.apache.flume.sink.AvroSink.process(AvroSink.java:325)
at
org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:636)
Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient
{ host: serverb.hostname.com, port: 10001 }: Failed to send batch
at
org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:236)
at org.apache.flume.sink.AvroSink.process(AvroSink.java:309)
... 3 more
Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient
{ host: serverb.hostname.com, port: 10001 }: Handshake timed out after
20000ms
at
org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:280)
at
org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:224)
... 4 more
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:258)
at java.util.concurrent.FutureTask.get(FutureTask.java:119)
at
org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:278)
... 5 more
There were lots of timeout exceptions, and the following exception
occured sparsely:
2012-10-04 09:44:06,408
(SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR -
org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)]
Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Failed to send events
at org.apache.flume.sink.AvroSink.process(AvroSink.java:325)
at
org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:636)
Caused by: org.apache.flume.FlumeException: NettyAvroRpcClient { host:
host: serverb.hostname.com, port: 10001 }: RPC connection error
at
org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:117)
at
org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:93)
at
org.apache.flume.api.NettyAvroRpcClient.configure(NettyAvroRpcClient.java:507)
at
org.apache.flume.api.RpcClientFactory.getInstance(RpcClientFactory.java:88)
at org.apache.flume.sink.AvroSink.createConnection(AvroSink.java:182)
at org.apache.flume.sink.AvroSink.verifyConnection(AvroSink.java:222)
at org.apache.flume.sink.AvroSink.process(AvroSink.java:282)
... 3 more
Caused by: java.io.IOException: Error connecting to host:
serverb.hostname.com/192.168.1.2:10001
at
org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:249)
at
org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:198)
at
org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:147)
at
org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:106)
... 9 more
Caused by: java.nio.channels.ClosedChannelException
at
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$1.operationComplete(NioClientSocketPipelineSink.java:157)
at
org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:381)
at
org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:367)
at
org.jboss.netty.channel.DefaultChannelFuture.setSuccess(DefaultChannelFuture.java:316)
at
org.jboss.netty.channel.AbstractChannel$ChannelCloseFuture.setClosed(AbstractChannel.java:351)
at
org.jboss.netty.channel.AbstractChannel.setClosed(AbstractChannel.java:188)
at
org.jboss.netty.channel.socket.nio.NioSocketChannel.setClosed(NioSocketChannel.java:146)
at
org.jboss.netty.channel.socket.nio.NioWorker.close(NioWorker.java:592)
at
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.close(NioClientSocketPipelineSink.java:415)
at
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processConnectTimeout(NioClientSocketPipelineSink.java:379)
at
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:299)
at
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
... 1 more
I'd like to know what's the root cause for those errors, and how I can
solve them.
FYI here's error logs in server B:
2012-10-02 23:13:41,479
(SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR -
org.apache.flume.sink.hbase.HBaseSink.putEventsAndCommit(HBaseSink.java:221)]
Failed to commit transaction.Transaction rolled back.
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
Failed 100 actions: servers with issues: 192.168.1.3:60020,
at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1424)
at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:624)
at
org.apache.flume.sink.hbase.HBaseSink.putEventsAndCommit(HBaseSink.java:207)
at org.apache.flume.sink.hbase.HBaseSink.process(HBaseSink.java:200)
at
org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:662)
2012-10-02 23:13:41,480
(SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR -
org.apache.flume.sink.hbase.HBaseSink.putEventsAndCommit(HBaseSink.java:228)]
Failed to commit transaction.Transaction rolled back.
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
Failed 100 actions: servers with issues: 192.168.1.3:60020,
at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1424)
at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:624)
at
org.apache.flume.sink.hbase.HBaseSink.putEventsAndCommit(HBaseSink.java:207)
at org.apache.flume.sink.hbase.HBaseSink.process(HBaseSink.java:200)
at
org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:662)
2012-10-02 23:13:41,480
(SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR -
org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)]
Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Failed to commit
transaction.Transaction rolled back.
at
org.apache.flume.sink.hbase.HBaseSink.putEventsAndCommit(HBaseSink.java:230)
at org.apache.flume.sink.hbase.HBaseSink.process(HBaseSink.java:200)
at
org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
Failed 100 actions: servers with issues: 192.168.1.3:60020,
at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1424)
at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:624)
at
org.apache.flume.sink.hbase.HBaseSink.putEventsAndCommit(HBaseSink.java:207)
... 4 more
--
Thanks,
Soo Philip Kim
Email : [email protected]
Phone : +82-10-7222-6078