You may look into interceptors:
http://flume.apache.org/releases/content/1.3.0/apidocs/org/apache/flume/interceptor/Interceptor.html
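
A custom interceptor can stamp each event with a header (for example the
originating file name), which the HDFS sink can then reference in its path.
As a rough sketch only — `com.example.FilenameInterceptor` is a hypothetical
class you would write yourself, while the interceptor wiring and `%{header}`
escapes are standard Flume configuration:

```properties
# Attach a hypothetical custom interceptor to the source
agent1.sources.tail.interceptors = i1
agent1.sources.tail.interceptors.i1.type = com.example.FilenameInterceptor$Builder

# Reference the header set by the interceptor in the sink path
agent1.sinks.HDFS.hdfs.path = hdfs://dev-pub01.xyz.abc.com:8020/tmp/%{filename}
```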

Regards,
 Alex

On May 1, 2013, at 5:00 PM, Vikas Kanth <[email protected]> wrote:

> Hi Jeff,
> 
> Thanks for the reply. Your suggestion worked.
> 
> I've got one more question. The files generated in HDFS, using the 
> exec/spooling sources, look like this:
> -rw-r--r--   3 vikas         66 2013-05-01 05:43 /tmp/FlumeData.1367412193479
> -rw-r--r--   3 vikas         67 2013-05-01 05:43 /tmp/FlumeData.1367412193480
> -rw-r--r--   3 vikas         61 2013-05-01 05:43 /tmp/FlumeData.1367412193481
> -rw-r--r--   3 vikas         44 2013-05-01 05:43 /tmp/FlumeData.1367412193482
> 
> Is there any way I can copy files (e.g. com/org/test/flm/Sample.txt) 
> from the source to HDFS while preserving the same folder structure, size, 
> and name? If not, what is the alternative?
> 
> Thanks,
> Vikas
> From: Jeff Lord <[email protected]>
> To: [email protected]; Vikas Kanth <[email protected]> 
> Sent: Wednesday, 1 May 2013 7:05 AM
> Subject: Re: Getting "Checking file:conf/flume.conf for changes" message in 
> loop
> 
> Vikas,
> 
> This message is normal and harmless.
> 
> 2013-04-29 08:26:11,868 (conf-file-poller-0) [DEBUG - 
> org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:188)]
>  Checking file:conf/flume.conf for changes
> 
> If you change your log level to INFO, it will no longer show up.
> 
> Regarding why you do not see the contents of your file in HDFS: 
> with the exec source and tail, events are buffered until 20 of them 
> have been written to the cache. One way to work around this is to 
> change the default from 20 to 1:
> 
> batchSize     20      The max number of lines to read and send to the channel 
> at a time
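> 
> Using the agent and source names from your config below, that would be:
> 
> ```properties
> # Flush every event to the channel immediately instead of buffering 20 lines
> agent1.sources.tail.batchSize = 1
> ```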
> 
> Alternatively, a recent patch adds a batchTimeout for the exec source, 
> which lets you flush the cache based on elapsed time. That fix is 
> available in the latest version of trunk.
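> 
> On a build that includes that patch, the setting might look like this 
> (batchTimeout is in milliseconds; the exact property name assumes the 
> patched exec source):
> 
> ```properties
> # Flush buffered events after 3 seconds even if batchSize isn't reached
> agent1.sources.tail.batchTimeout = 3000
> ```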
> 
> -Jeff
> 
> 
> 
> On Mon, Apr 29, 2013 at 8:31 AM, Vikas Kanth <[email protected]> wrote:
> Hi,
> 
> I am getting the following message in a loop. The source file hasn't been 
> moved to the destination.
> 
> 2013-04-29 08:24:41,346 (lifecycleSupervisor-1-0) [INFO - 
> org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:73)]
>  Component type: CHANNEL, name: Channel-2 started
> 2013-04-29 08:24:41,846 (conf-file-poller-0) [INFO - 
> org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.startAllComponents(DefaultLogicalNodeManager.java:141)]
>  Starting Sink HDFS
> 2013-04-29 08:24:41,847 (conf-file-poller-0) [INFO - 
> org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.startAllComponents(DefaultLogicalNodeManager.java:152)]
>  Starting Source tail
> 2013-04-29 08:24:41,847 (lifecycleSupervisor-1-3) [INFO - 
> org.apache.flume.source.ExecSource.start(ExecSource.java:155)] Exec source 
> starting with command:tail -F /home/vkanth/temp/Sample2.txt
> 2013-04-29 08:24:41,850 (lifecycleSupervisor-1-3) [DEBUG - 
> org.apache.flume.source.ExecSource.start(ExecSource.java:173)] Exec source 
> started
> 2013-04-29 08:24:41,850 (lifecycleSupervisor-1-0) [INFO - 
> org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:89)]
>  Monitoried counter group for type: SINK, name: HDFS, registered successfully.
> 2013-04-29 08:24:41,851 (lifecycleSupervisor-1-0) [INFO - 
> org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:73)]
>  Component type: SINK, name: HDFS started
> 2013-04-29 08:24:41,852 (SinkRunner-PollingRunner-DefaultSinkProcessor) 
> [DEBUG - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:143)] 
> Polling sink runner starting
> 2013-04-29 08:25:11,855 (conf-file-poller-0) [DEBUG - 
> org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:188)]
>  Checking file:conf/flume.conf for changes
> 2013-04-29 08:25:41,861 (conf-file-poller-0) [DEBUG - 
> org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:188)]
>  Checking file:conf/flume.conf for changes
> 2013-04-29 08:26:11,868 (conf-file-poller-0) [DEBUG - 
> org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:188)]
>  Checking file:conf/flume.conf for changes
> .......
> .......
> 
> 
> Flume.conf:
> agent1.sources = tail
> agent1.channels = Channel-2
> agent1.sinks = HDFS
> 
> agent1.sources.tail.type = exec
> agent1.sources.tail.command = tail -F /home/vikas/temp/Sample2.txt
> agent1.sources.tail.channels = Channel-2
> 
> agent1.sinks.HDFS.channel = Channel-2
> agent1.sinks.HDFS.type = hdfs
> agent1.sinks.HDFS.hdfs.path = hdfs://dev-pub01.xyz.abc.com:8020/tmp
> agent1.sinks.HDFS.hdfs.file.fileType = DataStream
> 
> agent1.channels.Channel-2.type = memory
> agent1.channels.Channel-2.capacity = 1000
> agent1.channels.Channel-2.transactionCapacity=10
> 
> Command:
> bin/flume-ng agent --conf ./conf/ -f conf/flume.conf 
> -Dflume.root.logger=DEBUG,console -n agent1
> 
> Please let me know if I am missing something.
> 
> Thanks,
> Vikas
> 
> 
> 
> 

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF
