Hi there,

we are currently trying to use Flume to stream log files from a host into the 
HDFS of a Cloudera cluster.

Since we want a reliable system, we have set up Flume with a failover sink 
group, so that if one of the sinks fails the other one takes over. Each sink 
is configured to connect to a separate namenode.

This is what our config looks like:

# Defining a sinkgroup for failover
agent.sinkgroups = groupOne
agent.sinkgroups.groupOne.sinks = hdfsSink1 hdfsSink2
agent.sinkgroups.groupOne.processor.type = failover
agent.sinkgroups.groupOne.processor.priority.hdfsSink1 = 10
agent.sinkgroups.groupOne.processor.priority.hdfsSink2 = 5

agent.sources = tailSrc
agent.channels = memoryChannel
agent.sinks = hdfsSink1 hdfsSink2

# For each one of the sources, the type is defined
agent.sources.tailSrc.type = exec
agent.sources.tailSrc.command = tail -F /var/log/events.log
agent.sources.tailSrc.channels = memoryChannel

# Definition of first sink
agent.sinks.hdfsSink1.type = hdfs
agent.sinks.hdfsSink1.hdfs.useLocalTimeStamp = true
agent.sinks.hdfsSink1.hdfs.path = hdfs://host1.com:8020/events/%y-%m-%d/%H
agent.sinks.hdfsSink1.hdfs.filePrefix = %M-events
# Specify the channel the sink should use
agent.sinks.hdfsSink1.channel = memoryChannel

# Each sink's type must be defined
agent.sinks.hdfsSink2.type = hdfs
agent.sinks.hdfsSink2.hdfs.useLocalTimeStamp = true
agent.sinks.hdfsSink2.hdfs.path = hdfs://host2.com:8020/events/%y-%m-%d/%H
agent.sinks.hdfsSink2.hdfs.filePrefix = %M-events
# Specify the channel the sink should use
agent.sinks.hdfsSink2.channel = memoryChannel

# Each channel's type is defined.
agent.channels.memoryChannel.type = memory

# Other config values specific to each type of channel (sink or source)
# can be defined as well.
# In this case, it specifies the capacity of the memory channel.
agent.channels.memoryChannel.capacity = 1000
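
For reference, everything else is left at its defaults. Two knobs that seem 
relevant here, with placeholder values (the property names are standard Flume, 
the numbers are only illustrative, not what we run):

# Maximum backoff in ms before the failover processor retries a failed sink
agent.sinkgroups.groupOne.processor.maxpenalty = 10000
# Events per channel transaction (must not exceed the channel capacity)
agent.channels.memoryChannel.transactionCapacity = 100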

In the first run we used Flume 1.4.0 and tested the switch to the backup sink 
by doing a manual failover of the namenode. This didn't work at all; Flume got 
stuck in exceptions. A bit of research turned up a known bug 
(https://issues.apache.org/jira/browse/FLUME-1779), so we patched Flume 1.5.0 
ourselves. At least the failover is working now.

Nevertheless, Flume keeps throwing exceptions for the sink that has been 
disconnected. Does anyone have an idea how to tackle this issue?

Failed to renew lease for [DFSClient_NONMAPREDUCE_-1223354028_45] for 3598 
seconds.  Will retry shortly ...
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): 
Operation category WRITE is not supported in state standby
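
One idea we have not tried yet: drop the two pinned sinks and point a single 
sink at the logical HA nameservice, so the DFS client itself follows the 
active namenode. A rough sketch; "nameservice1" is a made-up name that would 
have to match the dfs.nameservices entry in the hdfs-site.xml visible to the 
Flume agent:

# Hypothetical single-sink variant relying on HDFS client-side HA failover
# (requires dfs.nameservices and dfs.ha.namenodes.* in the client config)
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path = hdfs://nameservice1/events/%y-%m-%d/%H
agent.sinks.hdfsSink.hdfs.filePrefix = %M-events
agent.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
agent.sinks.hdfsSink.channel = memoryChannel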

Best Regards,

Malte
