You may look into interceptors: http://flume.apache.org/releases/content/1.3.0/apidocs/org/apache/flume/interceptor/Interceptor.html
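For reference, Flume 1.3.0 also ships a few built-in interceptors (timestamp, host, static) that can be wired onto a source purely in configuration, without writing Java. A minimal sketch attaching the built-in timestamp interceptor to an exec source — the agent name `agent1` and source name `tail` are assumed to match the config quoted later in this thread:

```properties
# Attach Flume's built-in timestamp interceptor to the source.
# It adds a "timestamp" header (event intake time) to every event,
# which the HDFS sink can then use for time-based path escaping.
agent1.sources.tail.interceptors = ts
agent1.sources.tail.interceptors.ts.type = timestamp
```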
Regards,
Alex

On May 1, 2013, at 5:00 PM, Vikas Kanth <[email protected]> wrote:

> Hi Jeff,
>
> Thanks for the reply. Your suggestion worked.
>
> I've got one more question. The logs generated at the HDFS, using
> exec/spooling, are in the following format:
>
> -rw-r--r-- 3 vikas 66 2013-05-01 05:43 /tmp/FlumeData.1367412193479
> -rw-r--r-- 3 vikas 67 2013-05-01 05:43 /tmp/FlumeData.1367412193480
> -rw-r--r-- 3 vikas 61 2013-05-01 05:43 /tmp/FlumeData.1367412193481
> -rw-r--r-- 3 vikas 44 2013-05-01 05:43 /tmp/FlumeData.1367412193482
>
> Is there any way I can copy the files (e.g. com/org/test/flm/Sample.txt)
> from source to HDFS with the same folder structure/size/name?
> If not, what is the alternative?
>
> Thanks,
> Vikas
>
> From: Jeff Lord <[email protected]>
> To: [email protected]; Vikas Kanth <[email protected]>
> Sent: Wednesday, 1 May 2013 7:05 AM
> Subject: Re: Getting "Checking file:conf/flume.conf for changes" message in loop
>
> Vikas,
>
> This message is normal and harmless.
>
> 2013-04-29 08:26:11,868 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:188)] Checking file:conf/flume.conf for changes
>
> If you change your log settings to INFO level it will also not show up.
>
> Regarding the reason you do not see the contents of your file in HDFS:
> one thing with the exec source and tail is that events are buffered until
> 20 events have been written to the cache. One way to work around this is to
> change the default from 20 -> 1:
>
> batchSize    20    The max number of lines to read and send to the channel at a time
>
> Alternatively, there is a recent patch that adds a batchTimeout to the exec
> source, which lets you flush the cache based on elapsed time. That fix is
> available on the latest version of trunk.
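Jeff's batching workaround, applied to the exec source from the configuration quoted below in this thread, would look roughly like this (the `batchSize` property is documented for the Flume exec source; setting it to 1 trades throughput for immediate delivery of each line):

```properties
# Flush every line to the channel immediately instead of
# buffering the default 20 events before a send.
agent1.sources.tail.batchSize = 1
```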
>
> -Jeff
>
>
> On Mon, Apr 29, 2013 at 8:31 AM, Vikas Kanth <[email protected]> wrote:
> Hi,
>
> I am getting the following message in a loop. The source file hasn't moved
> to the destination.
>
> 2013-04-29 08:24:41,346 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:73)] Component type: CHANNEL, name: Channel-2 started
> 2013-04-29 08:24:41,846 (conf-file-poller-0) [INFO - org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.startAllComponents(DefaultLogicalNodeManager.java:141)] Starting Sink HDFS
> 2013-04-29 08:24:41,847 (conf-file-poller-0) [INFO - org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.startAllComponents(DefaultLogicalNodeManager.java:152)] Starting Source tail
> 2013-04-29 08:24:41,847 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.source.ExecSource.start(ExecSource.java:155)] Exec source starting with command:tail -F /home/vkanth/temp/Sample2.txt
> 2013-04-29 08:24:41,850 (lifecycleSupervisor-1-3) [DEBUG - org.apache.flume.source.ExecSource.start(ExecSource.java:173)] Exec source started
> 2013-04-29 08:24:41,850 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:89)] Monitoried counter group for type: SINK, name: HDFS, registered successfully.
> 2013-04-29 08:24:41,851 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:73)] Component type: SINK, name: HDFS started
> 2013-04-29 08:24:41,852 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:143)] Polling sink runner starting
> 2013-04-29 08:25:11,855 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:188)] Checking file:conf/flume.conf for changes
> 2013-04-29 08:25:41,861 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:188)] Checking file:conf/flume.conf for changes
> 2013-04-29 08:26:11,868 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:188)] Checking file:conf/flume.conf for changes
> .......
> .......
>
> Flume.conf:
>
> agent1.sources = tail
> agent1.channels = Channel-2
> agent1.sinks = HDFS
>
> agent1.sources.tail.type = exec
> agent1.sources.tail.command = tail -F /home/vikas/temp/Sample2.txt
> agent1.sources.tail.channels = Channel-2
>
> agent1.sinks.HDFS.channel = Channel-2
> agent1.sinks.HDFS.type = hdfs
> agent1.sinks.HDFS.hdfs.path = hdfs://dev-pub01.xyz.abc.com:8020/tmp
> agent1.sinks.HDFS.hdfs.file.fileType = DataStream
>
> agent1.channels.Channel-2.type = memory
> agent1.channels.Channel-2.capacity = 1000
> agent1.channels.Channel-2.transactionCapacity = 10
>
> Command:
> bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -Dflume.root.logger=DEBUG,console -n agent1
>
> Please let me know if I am missing something.
>
> Thanks,
> Vikas


--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF
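Pulling the thread's suggestions together, a corrected version of the quoted flume.conf might look like the sketch below. Two points worth noting: per the Flume documentation the HDFS sink property is spelled `hdfs.fileType` (the quoted `hdfs.file.fileType` would be silently ignored), and `batchSize = 1` applies Jeff's flushing workaround. Treat this as an untested sketch for the poster's environment, not a verified configuration.

```properties
agent1.sources = tail
agent1.channels = Channel-2
agent1.sinks = HDFS

agent1.sources.tail.type = exec
agent1.sources.tail.command = tail -F /home/vikas/temp/Sample2.txt
agent1.sources.tail.channels = Channel-2
# Flush each line immediately instead of buffering 20 events
agent1.sources.tail.batchSize = 1

agent1.sinks.HDFS.channel = Channel-2
agent1.sinks.HDFS.type = hdfs
agent1.sinks.HDFS.hdfs.path = hdfs://dev-pub01.xyz.abc.com:8020/tmp
# Property name is hdfs.fileType, not hdfs.file.fileType
agent1.sinks.HDFS.hdfs.fileType = DataStream

agent1.channels.Channel-2.type = memory
agent1.channels.Channel-2.capacity = 1000
agent1.channels.Channel-2.transactionCapacity = 10
```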
