Yes, I second that advice. Regards, Harish
On Mon, Oct 22, 2012 at 2:39 PM, Roshan Naik <[email protected]> wrote:

> Sadu,
>    Flume is designed to transfer a continuous stream of events into
> Hadoop. It appears that in your use case each gzip file is a collection of
> events that needs to be moved. The closest thing I see to Flume supporting
> your use case is the spooling directory source
> https://issues.apache.org/jira/browse/FLUME-1425
> ... which has not yet been released.
> -roshan
>
>
> On Mon, Oct 22, 2012 at 11:14 AM, Sadananda Hegde <[email protected]> wrote:
>
>> Hi Harish,
>>
>> I am still exploring my options, and that's part of my question too -
>> which source should I be using?
>>
>> Currently I have set up my Flume NG configuration to use the exec source
>> (exec source, file channel and HDFS sink), but I can change to a
>> different source if it handles the compressed files.
>>
>> Thanks,
>> Sadu
>>
>> On Mon, Oct 22, 2012 at 10:27 AM, Harish Mandala <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> Which of the Flume sources are you trying to use?
>>>
>>> Regards,
>>> Harish
>>>
>>> On Mon, Oct 22, 2012 at 11:18 AM, Sadananda Hegde <[email protected]> wrote:
>>>
>>>> My application servers produce data files in compressed format (gzip).
>>>> I am planning to use Flume NG (1.2.0) to collect those files and
>>>> transfer them to the Hadoop cluster (write to HDFS). Is it possible to
>>>> read and transfer them without uncompressing them first? My sink would
>>>> be HDFS, and there are options to compress before writing to HDFS. That
>>>> would work fine if my source were an uncompressed text file that needed
>>>> to be stored in HDFS in compressed format, but in my case the source
>>>> itself is already compressed. What would be the best option to handle
>>>> such cases?
>>>>
>>>> Thanks for your help.
>>>>
>>>> Sadu
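
For reference, a minimal sketch of the agent layout Sadu describes above
(exec source -> file channel -> HDFS sink writing compressed output). The
agent name, command, and paths are hypothetical; property names follow the
Flume NG user guide. Note that hdfs.codeC compresses the event stream the
sink writes out - it does not carry an already-gzipped source file through
byte-for-byte.

  # Hypothetical agent "agent1": exec source -> file channel -> HDFS sink
  agent1.sources  = src1
  agent1.channels = ch1
  agent1.sinks    = snk1

  # Exec source reading the application server output (command is illustrative)
  agent1.sources.src1.type     = exec
  agent1.sources.src1.command  = tail -F /var/log/app/events.log
  agent1.sources.src1.channels = ch1

  # Durable file channel
  agent1.channels.ch1.type          = file
  agent1.channels.ch1.checkpointDir = /var/flume/checkpoint
  agent1.channels.ch1.dataDirs      = /var/flume/data

  # HDFS sink, compressing events as they are written to HDFS
  agent1.sinks.snk1.type          = hdfs
  agent1.sinks.snk1.channel       = ch1
  agent1.sinks.snk1.hdfs.path     = hdfs://namenode:8020/flume/events
  agent1.sinks.snk1.hdfs.fileType = CompressedStream
  agent1.sinks.snk1.hdfs.codeC    = gzip

Once the spooling directory source (FLUME-1425) is released, only the source
section would need to change (e.g. src1.type = spooldir plus a spoolDir
pointing at the directory the application servers drop files into); the
channel and sink stay the same.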
