We have noticed a few times that cleanup did not happen properly, but a restart 
generally forced one.

I would recommend putting the data files back unless you did a hard delete.  
Alternatively, if you delete the data files, make sure you also remove (after 
backing up) the checkpoint files.  That should put Flume back to a fresh state.
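A minimal sketch of that reset, assuming the agent is already stopped. The paths here are hypothetical (a throwaway directory stands in for your channel's real checkpointDir parent); substitute whatever your agent config points at:

```shell
# Throwaway demo layout; in practice CHANNEL_HOME is the parent of your
# agent's configured checkpointDir and dataDirs (hypothetical path).
CHANNEL_HOME="$(mktemp -d)"
mkdir -p "$CHANNEL_HOME/checkpoint" "$CHANNEL_HOME/data"
touch "$CHANNEL_HOME/checkpoint/checkpoint"

# Back up the checkpoint before touching anything (agent must be stopped).
cp -a "$CHANNEL_HOME/checkpoint" "$CHANNEL_HOME/checkpoint.bak"

# With the data files already deleted, removing the checkpoint as well
# lets the channel start from a fresh, empty state on the next restart
# instead of replaying a checkpoint that references missing logs.
rm -rf "$CHANNEL_HOME/checkpoint"
```

On the next start the file channel rebuilds an empty queue rather than failing replay against logs that no longer exist.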

Roy



From: Jeremy Karlson [mailto:[email protected]]
Sent: Thursday, July 18, 2013 10:42 AM
To: [email protected]
Subject: Re: Flume Data Directory Cleanup

Thank you for your suggestion.  I took a careful look at that, and I'm not sure 
it describes my situation.  That refers to the sink, while my problem is with 
the channel.  I'm looking at a dramatic accumulation of log / meta files within 
the channel data directory.

Additionally, I did try a manual cleanup of the channel directory, deleting the 
oldest log / meta files.  (This was my experiment.)  Flume really did not like 
that.  If manual cleanup is required for the channel as well, the cutoff point 
at which files go from in use to unused is not clear to me.
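One rough heuristic for that cutoff, sketched below on a throwaway directory (paths are hypothetical, and this is only a hint: the checkpoint itself, not file mtimes, is what actually records which logs the channel still references):

```shell
# Throwaway demo layout; in practice CHANNEL_HOME is the parent of your
# configured checkpointDir and dataDirs (hypothetical path).
CHANNEL_HOME="$(mktemp -d)"
mkdir -p "$CHANNEL_HOME/data" "$CHANNEL_HOME/checkpoint"
touch "$CHANNEL_HOME/data/log-1"          # written before the checkpoint
touch "$CHANNEL_HOME/checkpoint/checkpoint"
sleep 1
touch "$CHANNEL_HOME/data/log-2"          # written after the checkpoint

# Data files modified after the last checkpoint was written are almost
# certainly still live; older ones *may* be reclaimable, but only the
# checkpoint knows for sure, so never delete based on this alone.
find "$CHANNEL_HOME/data" -name 'log-*' \
     -newer "$CHANNEL_HOME/checkpoint/checkpoint"
```

Here only `log-2` is listed, since it postdates the checkpoint file.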

-- Jeremy

On Thu, Jul 18, 2013 at 10:13 AM, Lenin Raj <[email protected]> wrote:
Hi Jeremy,

Regarding cleanup, it was discussed already once.

http://mail-archives.apache.org/mod_mbox/flume-user/201306.mbox/%[email protected]%3E
You have to do it manually.


Thanks,
Lenin

On Thu, Jul 18, 2013 at 10:36 PM, Jeremy Karlson <[email protected]> wrote:
To follow up:

My Flume agent ran out of disk space last night and appeared to stop 
processing.  I shut it down and, as an experiment (it's a test machine, why 
not?), deleted the oldest 10 data files to see whether Flume actually needed 
them when it restarted.

Flume was not happy with my choices.

It spit out a lot of this:

2013-07-18 00:00:00,013 ERROR [pool-40-thread-1]        o.a.f.s.AvroSource Avro 
source mySource: Unable to process event batch. Exception follows. 
java.lang.IllegalStateException: Channel closed [channel=myFileChannel]. Due to 
java.lang.NullPointerException: null
        at 
org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:353)
        at 
org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:122)
        ...
Caused by: java.lang.NullPointerException
        at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:895)
        at org.apache.flume.channel.file.Log.replay(Log.java:406)
        at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:303)
        ...

So it seems like these files were actually in use, and not just leftover cruft. 
 A worthwhile thing to know, but I'd like to understand why.  My events are 
probably at most 1k of text, so it seems kind of odd to me that they'd consume 
more than 50GB of disk space in the channel.

-- Jeremy


On Wed, Jul 17, 2013 at 3:24 PM, Jeremy Karlson <[email protected]> wrote:
Hi All,

I have a very busy channel with about 100,000 events queued up.  My data 
directory has about 50 data files, each about 1.6 GB.  I don't believe my 100k 
events could be consuming that much space, so I'm jumping to conclusions and 
assuming that most of these files are old and due for cleanup (though I suppose 
it's possible).  I'm not finding much guidance in the user guide on how often 
these files are cleaned up / removed / compacted / etc.

Any thoughts on what's going on here, or what settings I should look for?  
Thanks.
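For reference, the file channel knobs that govern this behavior are documented in the Flume 1.x user guide; a sketch of the relevant fragment is below. The agent and channel names are made up, paths are hypothetical, and the numeric values shown are the documented defaults:

```properties
agent.channels.ch1.type = file
agent.channels.ch1.checkpointDir = /var/lib/flume/ch1/checkpoint
agent.channels.ch1.dataDirs = /var/lib/flume/ch1/data
# Roll to a new data file once the current one reaches this size (bytes).
agent.channels.ch1.maxFileSize = 2146435071
# Refuse new events once free disk space falls below this (bytes).
agent.channels.ch1.minimumRequiredSpace = 524288000
# Maximum number of events the channel will hold.
agent.channels.ch1.capacity = 1000000
# How often a checkpoint is written (ms); old data files are only
# reclaimed at checkpoint time, once all their events have been taken.
agent.channels.ch1.checkpointInterval = 30000
```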

-- Jeremy


