I did a hard delete. (I was out of disk space.) I ended up just deleting the whole channel directory and starting fresh.
I am running a very recent version, so I don't think I'd be affected by the file removal bug... And obviously my files were still in use, for reasons I don't understand yet.

-- Jeremy

On Thu, Jul 18, 2013 at 11:09 AM, Hari Shreedharan <[email protected]> wrote:

> Flume's deletion strategy is quite conservative. We do wait for 2
> checkpoints after all data was removed from a file before the files are
> deleted. In this case, it does look like the data was actually still
> referenced. We had a bug sometime back that caused files to not be deleted
> - but that was fixed quite a while back.
>
> Thanks,
> Hari
>
> On Thursday, July 18, 2013 at 10:56 AM, Camp, Roy wrote:
>
> We have noticed a few times that cleanup did not happen properly but a
> restart generally forced a cleanup.
>
> I would recommend putting the data files back unless you did a hard
> delete. Alternatively, make sure you remove (backup first) the checkpoint
> files if you delete the data files. That should put flume back to a fresh
> state.
>
> Roy
>
> *From:* Jeremy Karlson [mailto:[email protected]<[email protected]>]
> *Sent:* Thursday, July 18, 2013 10:42 AM
> *To:* [email protected]
> *Subject:* Re: Flume Data Directory Cleanup
>
> Thank you for your suggestion. I took a careful look at that, and I'm not
> sure it describes my situation. That refers to the sink, while my problem
> is with the channel. I'm looking at a dramatic accumulation of log / meta
> files within the channel data directory.
>
> Additionally, I did try doing a manual cleanup of the channel directory,
> deleting the oldest log / meta files. (This was my experiment.) Flume
> really did not like that.
> If it is required in the channel as well, the
> cutoff point at which the files go from being used to unused is not clear
> to me.
>
> -- Jeremy
>
> On Thu, Jul 18, 2013 at 10:13 AM, Lenin Raj <[email protected]> wrote:
>
> Hi Jeremy,
>
> Regarding cleanup, it was discussed already once.
>
> http://mail-archives.apache.org/mod_mbox/flume-user/201306.mbox/%[email protected]%3E
>
> You have to do it manually.
>
> Thanks,
> Lenin
>
> On Thu, Jul 18, 2013 at 10:36 PM, Jeremy Karlson <[email protected]> wrote:
>
> To follow up:
>
> My Flume agent ran out of disk space last night and appeared to stop
> processing. I shut it down and as an experiment (it's a test machine, why
> not?) I deleted the oldest 10 data files, to see if Flume actually needed
> these when it restarted.
>
> Flume was not happy with my choices.
>
> It spit out a lot of this:
>
> 2013-07-18 00:00:00,013 ERROR [pool-40-thread-1] o.a.f.s.AvroSource Avro source mySource: Unable to process event batch. Exception follows.
> java.lang.IllegalStateException: Channel closed [channel=myFileChannel]. Due to java.lang.NullPointerException: null
>     at org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:353)
>     at org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:122)
>     ...
> Caused by: java.lang.NullPointerException
>     at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:895)
>     at org.apache.flume.channel.file.Log.replay(Log.java:406)
>     at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:303)
>     ...
>
> So it seems like these files were actually in use, and not just leftover
> cruft. A worthwhile thing to know, but I'd like to understand why.
> My events are probably at most 1k of text, so it seems kind of odd to me that
> they'd consume more than 50GB of disk space in the channel.
>
> -- Jeremy
>
> On Wed, Jul 17, 2013 at 3:24 PM, Jeremy Karlson <[email protected]> wrote:
>
> Hi All,
>
> I have a very busy channel that has about 100,000 events queued up. My
> data directory has about 50 data files, each about 1.6 GB. I don't believe
> my 100k events could be consuming that much space, so I'm jumping to
> conclusions and assuming that most of these files are old and due for
> cleanup (but I suppose it's possible). I'm not finding much guidance in
> the user guide on how often these files are cleaned up / removed /
> compacted / etc.
>
> Any thoughts on what's going on here, or what settings I should look for?
> Thanks.
>
> -- Jeremy
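
For anyone who wants to put numbers on the mismatch described in this thread, here is a minimal sketch that compares what is on disk in a channel data directory with a rough estimate of the queued payload. With the figures quoted above, 100,000 events at about 1 KB each come to roughly 100 MB, while 50 files of about 1.6 GB each come to roughly 80 GB. The function name `channel_usage` is made up for illustration, and the `log-*` file name pattern is an assumption based on the "log / meta files" mentioned above, so adjust it to whatever your data directory actually contains.

```python
from pathlib import Path


def channel_usage(data_dir, queued_events, avg_event_bytes):
    """Return (bytes_on_disk, estimated_payload_bytes, file_count).

    Hypothetical helper: it only reads file sizes and never deletes anything.
    """
    # Assumption: data files and their metadata files share a "log-" prefix.
    files = [p for p in Path(data_dir).glob("log-*") if p.is_file()]
    bytes_on_disk = sum(p.stat().st_size for p in files)
    # Rough estimate only: ignores per-event headers and transaction records.
    estimated_payload = queued_events * avg_event_bytes
    return bytes_on_disk, estimated_payload, len(files)
```

Calling `channel_usage(data_dir, 100_000, 1024)` with your own data directory gives the two byte counts side by side. A large gap does not by itself mean the files are safe to remove; as the replies above point out, files that look stale may still be referenced.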
