For now, increase the file channel’s write-timeout parameter to around 30 or so 
(the file channel is timing out while waiting for the log's write lock during 
disk writes). But the basic problem you are seeing is that your EBS volume is 
very slow and IO is taking too long. You either need to increase your EBS IO 
capacity, or reduce the rate of writes.
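
For example, assuming the collector agent is named "collector" and the channel 
is c2 (names are illustrative; match them to your actual config):

    collector.channels.c2.type = file
    # seconds the channel waits for the log's write lock before failing the transaction
    collector.channels.c2.write-timeout = 30

Keep in mind this only papers over the slow IO; the lock is held longer because 
the underlying writes and checkpoints are slow.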


Thanks,
Hari


On Thursday, February 27, 2014 at 10:28 AM, Mangtani, Kushal wrote:

>   
>   
> From: Mangtani, Kushal  
> Sent: Wednesday, February 26, 2014 4:51 PM
> To: '[email protected]'; '[email protected]'
> Cc: Rangnekar, Rohit; '[email protected]'
> Subject: File Channel Exception "Failed to obtain lock for writing to the 
> log. Try increasing the log write timeout value"  
>   
> Hi,
>   
> I'm using the Flume-NG 1.4 cdh4.4 tarball for collecting aggregated logs.
> I am running a two-tier (agent, collector) Flume configuration with custom 
> plugins. There are approximately 20 agent machines (receiving data) and 6 
> collector machines (writing to HDFS), all running independently. However, I 
> have been facing some File Channel exceptions on the collector side. The 
> agents appear to be working fine.
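>   
> Roughly, each collector is wired like this (component names, the port, and 
> the HDFS path are illustrative; the real config also loads our custom plugins):
>   
>   collector.sources = r1
>   collector.channels = c2
>   collector.sinks = k1
>   
>   # Avro source receiving events forwarded by the ~20 agents
>   collector.sources.r1.type = avro
>   collector.sources.r1.bind = 0.0.0.0
>   collector.sources.r1.port = 4545
>   collector.sources.r1.channels = c2
>   
>   # durable file channel backed by the EBS mount (see point 2 below)
>   collector.channels.c2.type = file
>   
>   # HDFS sink writing the aggregated logs
>   collector.sinks.k1.type = hdfs
>   collector.sinks.k1.hdfs.path = hdfs://namenode/flume/events
>   collector.sinks.k1.channel = c2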
>   
>  
>  Error stacktrace:
>    org.apache.flume.ChannelException: Failed to obtain lock for writing to the 
>    log. Try increasing the log write timeout value. [channel=c2]
>        at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:621)
>        at org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)
>        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:421)
>        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>        …
>   
> I keep getting the same error repeatedly.
>   
> P.S.: The same exception is repeated on most of the Flume collector machines, 
> but not at the same time; there is usually a difference of a couple of hours 
> or more between occurrences.
>   
>  
>  
> 1. The HDFS sinks write to HDFS running on Amazon EC2 cloud instances.
>  
> 2. The data dir and checkpoint dir of the file channel on every Flume collector 
> instance are mounted on a separate Hadoop EBS drive, which ensures that no two 
> collectors overlap their log and checkpoint dirs. There is a symbolic link, 
> i.e. /usr/lib/flume-ng/datasource -> /hadoop/ebs/mnt-1 (see the channel config 
> sketch after this list).
>  
> 3. Flume works fine for a couple of days, and all the agents and collectors 
> initialize properly without exceptions.
>  
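> For reference, the file channel directory settings on each collector look 
> roughly like this (the checkpoint/data subdirectories are illustrative):
>   
>   collector.channels.c2.type = file
>   # both paths resolve onto the per-host EBS mount through the symlink above
>   collector.channels.c2.checkpointDir = /usr/lib/flume-ng/datasource/checkpoint
>   collector.channels.c2.dataDirs = /usr/lib/flume-ng/datasource/data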
>   
> Questions:
> 1.       Exception “Failed to obtain lock for writing to the log. Try 
> increasing the log write timeout value. [channel=c2]”: according to the 
> documentation, such an exception occurs only if two processes are accessing 
> the same file/directory. However, each channel is configured separately, so no 
> two channels should access the same dir. Hence, this exception does not seem 
> to indicate anything meaningful. Please correct me if I'm wrong.  
> 2.       Also, hdfs.callTimeout, as I understand it, is the time allowed for 
> HDFS operations such as open and write; if there is no response within that 
> duration, the call times out, and on timeout the sink closes the file. Please 
> correct me if I'm wrong. Also, is there a way to specify the number of retries 
> before it closes the file?
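>   
> For clarity, this is the setting I am asking about (sink name and value are 
> illustrative):
>   
>   collector.sinks.k1.type = hdfs
>   # milliseconds allowed for HDFS open/write/flush/close calls
>   collector.sinks.k1.hdfs.callTimeout = 10000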
>   
> Your inputs/suggestions will be greatly appreciated.  
>   
>   
> Regards
> Kushal Mangtani
> Software Engineer
>   

