Re: hdfs.idleTimeout, what's it used for?

Juhani Connolly Thu, 17 Jan 2013 22:37:45 -0800

@Mohit:
>> When Flume dies unexpectedly, the .tmp file remains. When it restarts, there is some logic in the HDFS sink to recover it (and continue writing from there). I'm not actually sure of the specifics. You may want to try running kill -9 on a Flume process on a test machine, then starting it up again, looking at the logs, and seeing what happens with the output.
> Does it also work when there is a long delay before Flume gets started? We are bucketing by the hour, so if the restart occurs in the next hour but Flume actually died in the previous hour and left a .tmp file, does it still clean up on restart?

I'm not sure. I think your best bet here is to simulate this on a test server: start Flume, kill -9 the process after a bit, wait until the bucket becomes invalid, and restart.
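
For that test, the restart has to happen only after the hour the killed process was writing into has ended. A minimal sketch of the timing in Python, assuming hourly buckets aligned to the top of the hour on the agent's clock; `bucket_start` and `seconds_until_bucket_expires` are hypothetical helpers, not part of Flume:

```python
from datetime import datetime, timedelta


def bucket_start(ts: datetime) -> datetime:
    """Start of the hourly bucket containing ts (assumes hour-aligned buckets)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def seconds_until_bucket_expires(kill_time: datetime) -> float:
    """Seconds from kill_time until the next hourly bucket begins.

    Restarting after at least this delay means the bucket the agent was
    writing to when it was killed is no longer the current one.
    """
    next_bucket = bucket_start(kill_time) + timedelta(hours=1)
    return (next_bucket - kill_time).total_seconds()


if __name__ == "__main__":
    killed_at = datetime(2013, 1, 17, 22, 37, 45)
    print(seconds_until_bucket_expires(killed_at))  # 1335.0
```

Waiting that long (plus a small margin) after the kill before restarting reproduces the "died in the previous hour" case.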

My gut feeling is that it will recover if events with timestamps belonging to that bucket are still coming in (in your persistent channel, or read in after recovery). If that path doesn't get touched again, though, it will probably remain as a .tmp file. *This could be blatantly wrong, so I suggest you test it.*
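
To check the outcome of such a test, one could list the sink's output directory and flag .tmp files sitting in an hourly bucket older than the current one, i.e. the files that would be left behind if that path is never touched again. A minimal sketch, assuming a hypothetical `<root>/YYYYMMDD/HH/<file>` layout and the .tmp suffix mentioned above; the path list would come from a recursive HDFS listing, which is not parsed here, and the sample paths are made up:

```python
from datetime import datetime
from typing import Iterable, List


def stale_tmp_files(paths: Iterable[str], now: datetime, suffix: str = ".tmp") -> List[str]:
    """Return in-use files whose hourly bucket is earlier than the bucket of `now`.

    Assumes paths look like <root>/YYYYMMDD/HH/<file> (hypothetical layout).
    """
    current = now.replace(minute=0, second=0, microsecond=0)
    stale = []
    for path in paths:
        if not path.endswith(suffix):
            continue  # already closed and renamed by the sink
        parts = path.rstrip("/").split("/")
        if len(parts) < 3:
            continue  # does not match the assumed layout
        day, hour = parts[-3], parts[-2]
        try:
            bucket = datetime.strptime(day + hour, "%Y%m%d%H")
        except ValueError:
            continue  # directory names are not a YYYYMMDD/HH bucket
        if bucket < current:
            stale.append(path)
    return stale


if __name__ == "__main__":
    listing = [
        "/flume/events/20130117/21/events.1.tmp",
        "/flume/events/20130117/21/events.2",
        "/flume/events/20130117/22/events.3.tmp",
    ]
    print(stale_tmp_files(listing, datetime(2013, 1, 17, 22, 40)))
```

This only makes leftover files easy to spot; whether they really stay behind is what the test on a real agent has to show.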
