Sadu, Flume is designed to transfer a continuous stream of events into Hadoop. It appears that in your use case each gzip file is a collection of events that needs to be moved as a unit. The closest thing I can see to Flume supporting your use case is the spooling directory source (https://issues.apache.org/jira/browse/FLUME-1425), which has not yet been released. A rough config sketch follows below.

-roshan
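For illustration only, here is a minimal agent sketch based on the spooldir source as proposed in FLUME-1425. Since that source is unreleased, the property names (spoolDir in particular) are assumptions taken from the JIRA and may change before it ships; also note the proposal reads files as lines of text, so gzipped files would still need to be decompressed before being dropped into the spool directory.

# sketch only: spooldir source (unreleased, per FLUME-1425) -> file channel -> HDFS sink
agent1.sources = spool1
agent1.channels = ch1
agent1.sinks = hdfs1

# watch a local directory for completed files (path is a placeholder)
agent1.sources.spool1.type = spooldir
agent1.sources.spool1.spoolDir = /var/flume/spool
agent1.sources.spool1.channels = ch1

# durable on-disk channel, as in your current setup
agent1.channels.ch1.type = file

# land the events on HDFS (path is a placeholder)
agent1.sinks.hdfs1.type = hdfs
agent1.sinks.hdfs1.channel = ch1
agent1.sinks.hdfs1.hdfs.path = hdfs://namenode/flume/events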
On Mon, Oct 22, 2012 at 11:14 AM, Sadananda Hegde <[email protected]> wrote:

> Hi Harish,
>
> I am still exploring my options, and that's part of my question too: which
> source should I be using?
>
> Currently I have set up my Flume NG configuration to use the exec source
> (exec source, file channel, and HDFS sink), but I can change to a
> different source if it handles compressed files.
>
> Thanks,
> Sadu
>
> On Mon, Oct 22, 2012 at 10:27 AM, Harish Mandala
> <[email protected]> wrote:
>
>> Hi,
>>
>> Which of the Flume sources are you trying to use?
>>
>> Regards,
>> Harish
>>
>> On Mon, Oct 22, 2012 at 11:18 AM, Sadananda Hegde <[email protected]> wrote:
>>
>>> My application servers produce data files in compressed (gzip) format.
>>> I am planning to use Flume NG (1.2.0) to collect those files and
>>> transfer them to the Hadoop cluster (write to HDFS). Is it possible to
>>> read and transfer them without uncompressing first? My sink would be
>>> HDFS, and there are options to compress before writing to HDFS. Those
>>> would work fine if my source were an uncompressed text file that needed
>>> to be stored on HDFS in compressed format. But in my case, the source
>>> itself is compressed. What would be the best option to handle such cases?
>>>
>>> Thanks for your help.
>>>
>>> Sadu
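One hedged workaround for the exec-source setup described above: decompress in the command itself and let the HDFS sink re-compress on write. A sketch, assuming zcat is available on the agent host and that your Flume build supports the HDFS sink's CompressedStream file type; the command and paths are placeholders, and the exec source runs a single command, so rotating input files would need extra handling:

# sketch only: exec source decompresses on the fly; HDFS sink re-compresses
agent1.sources = exec1
agent1.channels = ch1
agent1.sinks = hdfs1

# stream a gzipped file as plain text lines (placeholder path)
agent1.sources.exec1.type = exec
agent1.sources.exec1.command = zcat /var/log/app/events.gz
agent1.sources.exec1.channels = ch1

agent1.channels.ch1.type = file

# write gzip-compressed files on HDFS (path is a placeholder)
agent1.sinks.hdfs1.type = hdfs
agent1.sinks.hdfs1.channel = ch1
agent1.sinks.hdfs1.hdfs.path = hdfs://namenode/flume/events
agent1.sinks.hdfs1.hdfs.fileType = CompressedStream
agent1.sinks.hdfs1.hdfs.codeC = gzip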
