It is capped. You can verify this by using the stress source and a null sink. You'll see the disk usage increase to the maximum allowed and then plateau.
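A minimal sketch of such a test agent follows. The agent name (t1), the component names, and the /tmp paths are hypothetical, and the capacity and maxFileSize values are arbitrary picks for a quick test, not recommendations:

```properties
# Hypothetical test agent: stress source -> file channel -> null sink.
t1.sources = s1
t1.channels = c1
t1.sinks = k1

# StressSource generates synthetic events; size is the payload size in bytes.
t1.sources.s1.type = org.apache.flume.source.StressSource
t1.sources.s1.size = 1024
t1.sources.s1.channels = c1

# File channel under test. Directories are placeholders.
t1.channels.c1.type = file
t1.channels.c1.checkpointDir = /tmp/flume-test/checkpoint
t1.channels.c1.dataDirs = /tmp/flume-test/data
t1.channels.c1.capacity = 100000
# maxFileSize is in bytes; a data file is rotated once it reaches this size.
t1.channels.c1.maxFileSize = 268435456

# Null sink discards every event it takes from the channel.
t1.sinks.k1.type = null
t1.sinks.k1.channel = c1
```

Watching the data directory while the agent runs (for example with du -sh /tmp/flume-test/data) should show the growth and then the plateau.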
From: Zhiwen Sun <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wed, 20 Mar 2013 02:20:53 -0700
To: "[email protected]" <[email protected]>
Subject: Re: Why used space of flie channel buffer directory increase?

Thanks for your reply. I just want to confirm whether the space used by the file channel has a limit.

Zhiwen Sun

On Wed, Mar 20, 2013 at 4:06 PM, Hari Shreedharan <[email protected]> wrote:

If you reduce the capacity, the channel will be able to buffer fewer events. If you want to reduce the space used when only a few events remain, set the config param "maxFileSize" to something lower (this is in bytes). I don't advise setting this lower than a few hundred megabytes (in fact, the default value works pretty well; do you really need to save 3GB of space?), or else you will end up with a huge number of small files if there are many events waiting to be taken from the channel.

Hari

On Wed, Mar 20, 2013 at 12:50 AM, Zhiwen Sun <[email protected]> wrote:

Hi Hari:

Does that mean I can reduce the capacity of the file channel to cut down the maximum disk space it uses?

Zhiwen Sun

On Wed, Mar 20, 2013 at 3:23 PM, Hari Shreedharan <[email protected]> wrote:

Hi,

Like I mentioned earlier, we will always keep 2 data files in each data directory (the ".meta" files are metadata associated with the actual data). Once log-8 is created (when log-7 gets rotated on hitting the maximum size) and all of the events in log-6 are taken, log-6 will get deleted, but you will still see log-7 and log-8. So what you are seeing is not unexpected.

Hari

--
Hari Shreedharan

On Tuesday, March 19, 2013 at 6:30 PM, Zhiwen Sun wrote:

Thanks all for your reply.
@Kenison I stopped my tail -F | nc program and no new event files appear in HDFS, so I think no events are arriving. To make sure, I will test again with JMX enabled.

@Alex The latest log follows. I can't see any exception or warning.

13/03/19 15:28:16 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490901.tmp to hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490901
13/03/19 15:28:16 INFO hdfs.BucketWriter: Creating hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902.tmp
13/03/19 15:28:17 INFO file.EventQueueBackingStoreFile: Start checkpoint for /home/zhiwensun/.flume/file-channel/checkpoint/checkpoint, elements to sync = 3
13/03/19 15:28:17 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1363659953997, queueSize: 0, queueHead: 362981
13/03/19 15:28:17 INFO file.LogFileV3: Updating log-7.meta currentPosition = 216278208, logWriteOrderID = 1363659953997
13/03/19 15:28:17 INFO file.Log: Updated checkpoint for file: /home/zhiwensun/.flume/file-channel/data/log-7 position: 216278208 logWriteOrderID: 1363659953997
13/03/19 15:28:26 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902.tmp to hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902
13/03/19 15:28:27 INFO hdfs.BucketWriter: Creating hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903.tmp
13/03/19 15:28:37 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903.tmp to hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903
13/03/19 15:28:37 INFO hdfs.BucketWriter: Creating hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904.tmp
13/03/19 15:28:47 INFO file.EventQueueBackingStoreFile: Start checkpoint for /home/zhiwensun/.flume/file-channel/checkpoint/checkpoint, elements to sync = 2
13/03/19 15:28:47 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1363659954200, queueSize: 0, queueHead: 362981
13/03/19 15:28:47 INFO file.LogFileV3: Updating log-7.meta currentPosition = 216288815, logWriteOrderID = 1363659954200
13/03/19 15:28:47 INFO file.Log: Updated checkpoint for file: /home/zhiwensun/.flume/file-channel/data/log-7 position: 216288815 logWriteOrderID: 1363659954200
13/03/19 15:28:48 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904.tmp to hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904

@Hari Hmm, 12 hours have passed and the size of the file channel directory has not decreased.
Files in file channel directory:

-rw-r--r-- 1 zhiwensun zhiwensun    0 2013-03-19 09:15 in_use.lock
-rw-r--r-- 1 zhiwensun zhiwensun 1.0M 2013-03-19 10:11 log-6
-rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 10:12 log-6.meta
-rw-r--r-- 1 zhiwensun zhiwensun 207M 2013-03-19 15:28 log-7
-rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 15:28 log-7.meta

-rw-r--r-- 1 zhiwensun zhiwensun 207M 2013-03-19 15:28 ./file-channel/data/log-7
-rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 10:12 ./file-channel/data/log-6.meta
-rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 15:28 ./file-channel/data/log-7.meta
-rw-r--r-- 1 zhiwensun zhiwensun    0 2013-03-19 09:15 ./file-channel/data/in_use.lock
-rw-r--r-- 1 zhiwensun zhiwensun 1.0M 2013-03-19 10:11 ./file-channel/data/log-6

Zhiwen Sun

On Wed, Mar 20, 2013 at 2:32 AM, Hari Shreedharan <[email protected]> wrote:

It is possible for the directory size to increase even if no writes are going into the channel. If the channel size is non-zero and the sink is still writing events to HDFS, the takes get written to disk as well (so we know which events in the files were removed when the channel/agent restarts). Eventually the channel will clean up the files that have had all their events taken (though it will keep at least 2 files per data directory, just to be safe).

--
Hari Shreedharan

On Tuesday, March 19, 2013 at 10:32 AM, Alexander Alten-Lorenz wrote:

Hey,

What does debug say? Can you gather logs and attach them?

- Alex

On Mar 19, 2013, at 5:27 PM, "Kenison, Matt" <[email protected]> wrote:

Check the JMX counter first, to make sure you really are not sending new events. If not, is it your checkpoint directory or data directory that is increasing in size?
From: Zhiwen Sun <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tue, 19 Mar 2013 01:19:19 -0700
To: "[email protected]" <[email protected]>
Subject: Why used space of flie channel buffer directory increase?

Hi all:

I am testing flume-ng on my local machine. The data flow is:

tail -F file | nc 127.0.0.01 4444 > flume agent > hdfs

My configuration file is here:

a1.sources = r1
a1.channels = c2
a1.sources.r1.type = netcat
a1.sources.r1.bind = 192.168.201.197
a1.sources.r1.port = 44444
a1.sources.r1.max-line-length = 1000000
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000
a1.channels.c2.type = file
a1.sources.r1.channels = c2
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp
a1.sinks = k2
a1.sinks.k2.type = hdfs
a1.sinks.k2.channel = c2
a1.sinks.k2.hdfs.path = hdfs://127.0.0.1:9000/flume/events/%Y-%m-%d
a1.sinks.k2.hdfs.writeFormat = Text
a1.sinks.k2.hdfs.rollInterval = 10
a1.sinks.k2.hdfs.rollSize = 10000000
a1.sinks.k2.hdfs.rollCount = 0
a1.sinks.k2.hdfs.filePrefix = app
a1.sinks.k2.hdfs.fileType = DataStream

It seems that events are collected correctly.

But there is a problem bothering me: the space used by the file channel (~/.flume) keeps increasing, even when there are no new events.

Is my configuration wrong, or is there some other problem?

Thanks.

Best regards.

Zhiwen Sun

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF
