Does anyone have comments on using time (such as day/hour) as part of the file 
name? When the clock crosses the boundary of the defined time period, Flume 
creates a new file. What is the expected way of handling the old file (which 
has not yet met any of the rollover conditions)? I would expect Flume to flush 
the data out to disk, close that file, and remove the .tmp suffix. Am I right? 
It does not behave this way right now.

Regards,

Yongcheng

From: Gumnaam Sur [mailto:[email protected]]
Sent: Tuesday, July 31, 2012 2:04 PM
To: [email protected]
Subject: Re: Flume 1.2.0 HDFS Sink Output File Question

Is there a documented way of shutting down Flume?
I just do kill -s TERM <pid>, and I do see Flume shutting down normally.
But at times not all HDFS sink files are closed, even with a proper shutdown.
E.g., I was testing a setup with 5 HDFS sinks, and only the last one defined in 
the conf file was renamed to remove '.tmp'; the other four still had the '.tmp' 
extension.
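
To check which sink outputs were left unfinished after a shutdown like the one described above, a directory listing can be split on the '.tmp' suffix. The following is only a minimal sketch in Python: the helper names are hypothetical, the second file name in the example is made up for illustration, and it works on plain name strings rather than calling any HDFS API.

```python
TMP_SUFFIX = ".tmp"  # suffix the thread reports on files that were not finalized


def split_by_state(names):
    """Return (closed, in_progress) lists based on the .tmp suffix."""
    closed = [n for n in names if not n.endswith(TMP_SUFFIX)]
    in_progress = [n for n in names if n.endswith(TMP_SUFFIX)]
    return closed, in_progress


def final_name(name):
    """Name the file would carry once the .tmp suffix is removed."""
    if name.endswith(TMP_SUFFIX):
        return name[: -len(TMP_SUFFIX)]
    return name


# Hypothetical listing: the first name is quoted in this thread,
# the second is invented for the example.
listing = [
    "07_31_09.events.1343742385766.tmp",
    "07_31_08.events.1343738785001",
]
closed, in_progress = split_by_state(listing)
for name in in_progress:
    print(name, "->", final_name(name))
```

Whether renaming such a file by hand is safe depends on whether it was closed properly, which this sketch does not check.
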
On Tue, Jul 31, 2012 at 1:52 PM, Denny Ye 
<[email protected]<mailto:[email protected]>> wrote:
Hi Yongcheng,
    Flume doesn't recheck the destination left over from the previous agent 
lifecycle. The last temporary file is not reused by the current process. 
Possible causes in this case might be: 1. Was that temporary file closed 
normally? If not, Flume should close that file in an appropriate way, such as 
through the 'recoverLease' interface. 2. Can that file name be reused under the 
latest path pattern?

    In either case, we would hope for consistent behavior across path patterns. 
I agree with what you described. We may need some other people to join the 
discussion.

-Regards
Denny Ye

2012/7/31 Yongcheng Li <[email protected]<mailto:[email protected]>>
Hi,

I am using the Flume 1.2.0 HDFS sink. When Flume crashes (is killed), a file 
with a .tmp suffix is left behind. I believe it contains the data that had 
been flushed to disk when the crash happened. But why does it have a .tmp 
suffix? Shouldn’t Flume just write the data into a regular file (without .tmp)?

I am using month/day/hour as part of my HDFS file name (%m_%d_%H). After the 
hour passes, there is still a file like 07_31_09.events.1343742385766.tmp with 
a size of zero. Shouldn’t Flume just close that file and remove the .tmp 
suffix? When I kill Flume, I can see data written into this file, but it still 
has the .tmp suffix.
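
To illustrate why a new file shows up when the hour changes, here is a minimal Python sketch that expands the %m_%d_%H pattern for two timestamps one hour apart. It rests on several assumptions: Python's strftime stands in for Flume's escape handling, the numeric part of the file name above is treated as epoch milliseconds, and the UTC-4 offset is a guess chosen because it matches the "09" in that file name. The helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the agent host runs at UTC-4; this offset is a guess that
# happens to match the "09" in the file name quoted above.
ASSUMED_TZ = timezone(timedelta(hours=-4))
PATTERN = "%m_%d_%H"  # month_day_hour, as used in the file name


def bucket_prefix(epoch_millis, tz=ASSUMED_TZ):
    """Expand the time pattern for one timestamp (hypothetical helper)."""
    moment = datetime.fromtimestamp(epoch_millis / 1000, tz)
    return moment.strftime(PATTERN)


created = 1343742385766              # numeric part of the quoted file name
one_hour_later = created + 3600 * 1000

print(bucket_prefix(created))         # 07_31_09
print(bucket_prefix(one_hour_later))  # 07_31_10
```

Because the two prefixes differ, data arriving after the hour boundary maps to a different file name, while the 07_31_09 file stays open until something closes it.
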

Thanks!

Yongcheng

