Hello All, I'm running into an issue when trying to load app server request logs into Hadoop.
I have a Flume agent running with the following config. I get the consolidated file in one directory, but it gets rotated, i.e. one new file every day. My config below does not handle that, because the file name is hard coded in it:

agent.sources = apache
agent.sources.apache.type = exec
agent.sources.apache.command = cat /archive/request.log.2013_06_07

Q: How can I configure this so that it picks up the rotated file? Currently, to load the next day's file I have to kill both the agent and the collector, change the hard coded file name in the config, and then start the collector and the agent again. (A sketch of what I was hoping to write is further below.)

Q: The collector writes the file into HDFS with a .tmp extension, and until I kill the collector it doesn't rename the file to its normal name.

Here is my collector config:

## Write to HDFS
# http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
collector.sinks.HadoopOut.type = hdfs
collector.sinks.HadoopOut.channel = mc2
collector.sinks.HadoopOut.hdfs.path = /user/flume/events/%{log_type}/%{host}/%y-%m-%d
collector.sinks.HadoopOut.hdfs.fileType = DataStream
collector.sinks.HadoopOut.hdfs.writeFormat = Text
collector.sinks.HadoopOut.hdfs.rollSize = 0
collector.sinks.HadoopOut.hdfs.rollCount = 0
collector.sinks.HadoopOut.hdfs.rollInterval = 0

If I play with those last three parameters, the sink creates a lot of small files and it becomes a challenge to query them in Hive. I want one file per day of request logs. (The second sketch below shows the kind of roll settings I was thinking of.)
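For illustration only, this is roughly the kind of source config I was hoping to write. The shell property and the date expansion are just my guesses from the user guide, and I suspect the date would only be expanded once when the agent starts, so it still wouldn't follow the rotation on its own:

agent.sources = apache
agent.sources.apache.type = exec
# run the command through a shell so the date can be expanded (my guess at the syntax)
agent.sources.apache.shell = /bin/sh -c
# pick up today's file by name; probably only evaluated once at agent startup
agent.sources.apache.command = cat /archive/request.log.$(date +%Y_%m_%d)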
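And for the HDFS sink, is something along these lines the right direction for getting one file per day? The rollInterval of 86400 seconds and the idleTimeout are just my guesses from the user guide; I haven't verified that they make the .tmp file get renamed without killing the collector:

collector.sinks.HadoopOut.hdfs.rollSize = 0
collector.sinks.HadoopOut.hdfs.rollCount = 0
# roll once a day instead of never (just a guess at the value)
collector.sinks.HadoopOut.hdfs.rollInterval = 86400
# close idle files so the .tmp file gets renamed (guessing at a reasonable value)
collector.sinks.HadoopOut.hdfs.idleTimeout = 600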
I really appreciate your assistance and time.

Thanks,

--
Sanjeev Sagar

"Separate yourself from everything that separates you from others!" - Nirankari Baba Hardev Singh ji