Sadu, Flume is designed to transfer a continuous stream of events into Hadoop. It appears that in your use case each gzip file is a collection of events that needs to be moved as a unit. The closest thing I can see to Flume supporting your use case is the spooling directory source (https://issues.apache.org/jira/browse/FLUME-1425), which has not yet been released. A rough config sketch follows below.

-roshan
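For illustration only, here is a minimal agent sketch based on the spooldir source as proposed in FLUME-1425. Since that source is unreleased, the property names (spoolDir in particular) are assumptions taken from the JIRA and may change before it ships; also note the proposal reads files as lines of text, so gzipped files would still need to be decompressed before being dropped into the spool directory.

# sketch only: spooldir source (unreleased, per FLUME-1425) -> file channel -> HDFS sink
agent1.sources = spool1
agent1.channels = ch1
agent1.sinks = hdfs1

# watch a local directory for completed files (path is a placeholder)
agent1.sources.spool1.type = spooldir
agent1.sources.spool1.spoolDir = /var/flume/spool
agent1.sources.spool1.channels = ch1

# durable on-disk channel, as in your current setup
agent1.channels.ch1.type = file

# land the events on HDFS (path is a placeholder)
agent1.sinks.hdfs1.type = hdfs
agent1.sinks.hdfs1.channel = ch1
agent1.sinks.hdfs1.hdfs.path = hdfs://namenode/flume/events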
On Mon, Oct 22, 2012 at 11:14 AM, Sadananda Hegde <[email protected]> wrote:

> Hi Harish,
>
> I am still exploring my options, and that's part of my question too: which
> source should I be using?
>
> Currently I have set up my Flume NG configuration to use the exec source
> (exec source, file channel, and HDFS sink), but I can change to a
> different source if it handles compressed files.
>
> Thanks,
> Sadu
>
> On Mon, Oct 22, 2012 at 10:27 AM, Harish Mandala
> <[email protected]> wrote:
>
>> Hi,
>>
>> Which of the Flume sources are you trying to use?
>>
>> Regards,
>> Harish
>>
>> On Mon, Oct 22, 2012 at 11:18 AM, Sadananda Hegde <[email protected]> wrote:
>>
>>> My application servers produce data files in compressed (gzip) format.
>>> I am planning to use Flume NG (1.2.0) to collect those files and
>>> transfer them to the Hadoop cluster (write to HDFS). Is it possible to
>>> read and transfer them without uncompressing first? My sink would be
>>> HDFS, and there are options to compress before writing to HDFS. Those
>>> would work fine if my source were an uncompressed text file that needed
>>> to be stored on HDFS in compressed format. But in my case, the source
>>> itself is compressed. What would be the best option to handle such cases?
>>>
>>> Thanks for your help.
>>>
>>> Sadu
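One hedged workaround for the exec-source setup described above: decompress in the command itself and let the HDFS sink re-compress on write. A sketch, assuming zcat is available on the agent host and that your Flume build supports the HDFS sink's CompressedStream file type; the command and paths are placeholders, and the exec source runs a single command, so rotating input files would need extra handling:

# sketch only: exec source decompresses on the fly; HDFS sink re-compresses
agent1.sources = exec1
agent1.channels = ch1
agent1.sinks = hdfs1

# stream a gzipped file as plain text lines (placeholder path)
agent1.sources.exec1.type = exec
agent1.sources.exec1.command = zcat /var/log/app/events.gz
agent1.sources.exec1.channels = ch1

agent1.channels.ch1.type = file

# write gzip-compressed files on HDFS (path is a placeholder)
agent1.sinks.hdfs1.type = hdfs
agent1.sinks.hdfs1.channel = ch1
agent1.sinks.hdfs1.hdfs.path = hdfs://namenode/flume/events
agent1.sinks.hdfs1.hdfs.fileType = CompressedStream
agent1.sinks.hdfs1.hdfs.codeC = gzip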
