Yes, I second that advice. Regards, Harish
On Mon, Oct 22, 2012 at 2:39 PM, Roshan Naik <[email protected]> wrote:

> Sadu,
>    Flume is designed to transfer a continuous stream of events into
> Hadoop. It appears that in your use case each gzip file is a collection of
> events that needs to be moved. The closest thing I see to Flume supporting
> your use case is the spooling directory source
> https://issues.apache.org/jira/browse/FLUME-1425
> ... which has not yet been released.
> -roshan
>
>
> On Mon, Oct 22, 2012 at 11:14 AM, Sadananda Hegde <[email protected]> wrote:
>
>> Hi Harish,
>>
>> I am still exploring my options, and that's part of my question too -
>> which source should I be using?
>>
>> Currently I have set up my Flume NG configuration to use the exec source
>> (exec source, file channel and HDFS sink), but I can change to a
>> different source if it handles the compressed files.
>>
>> Thanks,
>> Sadu
>>
>> On Mon, Oct 22, 2012 at 10:27 AM, Harish Mandala <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> Which of the Flume sources are you trying to use?
>>>
>>> Regards,
>>> Harish
>>>
>>> On Mon, Oct 22, 2012 at 11:18 AM, Sadananda Hegde <[email protected]> wrote:
>>>
>>>> My application servers produce data files in compressed format (gzip).
>>>> I am planning to use Flume NG (1.2.0) to collect those files and
>>>> transfer them to the Hadoop cluster (write to HDFS). Is it possible to
>>>> read and transfer them without uncompressing them first? My sink would
>>>> be HDFS, and there are options to compress before writing to HDFS. That
>>>> would work fine if my source were an uncompressed text file that needed
>>>> to be stored in HDFS in compressed format, but in my case the source
>>>> itself is already compressed. What would be the best option to handle
>>>> such cases?
>>>>
>>>> Thanks for your help.
>>>>
>>>> Sadu
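
For reference, a minimal sketch of the agent layout Sadu describes above
(exec source -> file channel -> HDFS sink writing compressed output). The
agent name, command, and paths are hypothetical; property names follow the
Flume NG user guide. Note that hdfs.codeC compresses the event stream the
sink writes out - it does not carry an already-gzipped source file through
byte-for-byte.

  # Hypothetical agent "agent1": exec source -> file channel -> HDFS sink
  agent1.sources  = src1
  agent1.channels = ch1
  agent1.sinks    = snk1

  # Exec source reading the application server output (command is illustrative)
  agent1.sources.src1.type     = exec
  agent1.sources.src1.command  = tail -F /var/log/app/events.log
  agent1.sources.src1.channels = ch1

  # Durable file channel
  agent1.channels.ch1.type          = file
  agent1.channels.ch1.checkpointDir = /var/flume/checkpoint
  agent1.channels.ch1.dataDirs      = /var/flume/data

  # HDFS sink, compressing events as they are written to HDFS
  agent1.sinks.snk1.type          = hdfs
  agent1.sinks.snk1.channel       = ch1
  agent1.sinks.snk1.hdfs.path     = hdfs://namenode:8020/flume/events
  agent1.sinks.snk1.hdfs.fileType = CompressedStream
  agent1.sinks.snk1.hdfs.codeC    = gzip

Once the spooling directory source (FLUME-1425) is released, only the source
section would need to change (e.g. src1.type = spooldir plus a spoolDir
pointing at the directory the application servers drop files into); the
channel and sink stay the same.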
