Flume is not suited for file transfers as such. That said, please see my
comments below:

>  - support for variable transaction size that could be set by the source
> or interceptor
>

The transactions are already variable sized. The only configuration that
applies on top is the maximum size of a transaction. How is this different
from what you are proposing?
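
To make that point concrete, here is a toy model in Python. It is only a sketch: `ToyChannel` and `transaction_capacity` are hypothetical names used for illustration, not the Flume API. It shows transactions of different sizes committing against a single configured maximum.

```python
class ToyChannel:
    """Toy stand-in for a channel; NOT the Flume API."""

    def __init__(self, transaction_capacity):
        # Hypothetical cap on events per transaction (the "maximum size").
        self.transaction_capacity = transaction_capacity
        self.committed = []

    def commit(self, events):
        """Commit one transaction holding any number of events up to the cap."""
        if len(events) > self.transaction_capacity:
            raise ValueError("transaction exceeds the configured maximum")
        self.committed.extend(events)
        return len(events)


channel = ToyChannel(transaction_capacity=100)
# Three transactions of different sizes, all within the same maximum.
sizes = [channel.commit(["e"] * n) for n in (1, 37, 100)]
```

In this model the size of each transaction is chosen by whoever fills it; the only setting is the upper bound.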



>  - SpoolDir to support creation of one transaction per file
>

If the file is large, you would run out of heap space quickly. Also, how do
you recover from intermittent failures?


>  - File and Memory channels to support spawning a process on successful
> transaction commit. Such a process could be a bash script, but it would be
> implemented in a pluggable class


You may be better off using something like an Oozie action to trigger a job
when the dataset is complete.
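
As a rough sketch of the kind of check such a trigger might run before starting the job, the Python below treats a target directory as complete when it holds at least one file and none of them still carries the in-progress suffix. It assumes the HDFS sink's default in-use suffix of `.tmp`; `dataset_complete` and the file names are hypothetical, and the directory listing is passed in as a plain list rather than read from HDFS. Passing this check is necessary but not sufficient, since events may still be sitting in the channel.

```python
def dataset_complete(filenames, in_use_suffix=".tmp"):
    """Return True if the listing is non-empty and no file is still in use."""
    if not filenames:
        # Nothing has been written yet, so the dataset cannot be complete.
        return False
    # Assumption: files still being written end with the in-use suffix.
    return not any(name.endswith(in_use_suffix) for name in filenames)


# Illustrative listing of one target directory (names are made up).
listing = ["FlumeData.1417900000000", "FlumeData.1417900000001.tmp"]
ready = dataset_complete(listing)
```

A scheduler could poll with a check like this and start the MR job only after it has held true for a quiet period.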

Regards,
Arvind

On Sun, Dec 7, 2014 at 12:55 PM, Ahmed Vila <[email protected]> wrote:

> Hi group,
>
> Manohar's requirements sound valid. I guess there are other cases where
> such a "completion notification" could come in handy.
>
> Thus, I would propose these distinct features that would make this
> possible via configuration:
>  - support for variable transaction size that could be set by the source
> or interceptor
>  - SpoolDir to support creation of one transaction per file
>  - File and Memory channels to support spawning a process on successful
> transaction commit. Such a process could be a bash script, but it would be
> implemented in a pluggable class
>
> The one thing I'm not sure about, until I look at the code, is whether
> HDFSSink will flush its cache to HDFS once it encounters no more events in
> a transaction.
>
> What do you guys think?
>
>
> On Sat, Dec 6, 2014 at 7:31 AM, Manohar CS <[email protected]>
> wrote:
>
>>  Thanks Hari for your response.
>>
>>
>>  My requirement goes like this -
>>
>>
>>  1) A bunch of files arrives at regular intervals (hourly or daily) in my
>> spoolDir.
>>
>> 2) I want them moved into HDFS via the HDFS sink, using a path pattern like
>> /target/%Y-%M%D so that each day's files go to a different HDFS destination.
>>
>> 3) Once Flume finishes copying the files, I want to kick off my MR
>> job.
>>
>>
>>  Thanks,
>>
>> Manohar
>>  ------------------------------
>> *From:* Hari Shreedharan <[email protected]>
>> *Sent:* Saturday, December 6, 2014 7:16 AM
>> *To:* [email protected]
>> *Cc:* [email protected]
>> *Subject:* Re: Notification support from flume?
>>
>>  Looking at .COMPLETED is not an indication that the data has been
>> written out to HDFS. As of now, unfortunately, there is no way to tag an
>> event as coming from a specific file. I can’t think of a fool-proof way to
>> do this off the top of my head. What is your use case? There might be
>> another way to do the same thing.
>>
>> Thanks,
>> Hari
>>
>>
>>  On Fri, Dec 5, 2014 at 4:19 AM, Manohar CS <[email protected]>
>> wrote:
>>
>>>  Hi All,
>>>
>>>
>>>
>>> I wanted to know if there is a notification mechanism, or some other way
>>> of finding out whether Flume has finished transferring a given file from
>>> spoolDir to the HDFS sink. We know that by looking at the .COMPLETED files
>>> in spoolDir we can assume it's completed, but is there a more reliable
>>> callback mechanism?
>>>
>>>
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Manohar.
>>
>>
>>
>>
>>
>> Please consider the environment before printing this e-mail
>>
>>
>> Disclaimer: This  communication  is  for the exclusive use of the intended 
>> recipient(s) and  shall  not attach any liability on the originator or ITC 
>> Infotech India Ltd./its  Holding company/ its Subsidiaries/ its Group 
>> Companies. If you are the addressee, the contents of this e-mail are 
>> intended for your use only and it shall  not be forwarded to any third 
>> party, without first obtaining written authorization from the originator or 
>> ITC Infotech India Ltd./ its Holding company/its  Subsidiaries/ its Group 
>> Companies. It may contain information which is confidential and legally 
>> privileged and the same shall not be used or dealt with  by any  third  
>> party  in  any manner whatsoever without the specific consent  of  ITC  
>> Infotech India Ltd./ its Holding company/ its Subsidiaries/ its Group 
>> Companies.
>>
>
>
>
> --
>
> Best regards,
> Ahmed Vila | Senior software developer
> DevLogic | Sarajevo | Bosnia and Herzegovina
>
> Office : +387 33 942 123
> Mobile: +387 62 139 348
>
> Website: www.devlogic.eu
> E-mail   : [email protected]
> ---------------------------------------------------------------------
> This e-mail and any attachment is for authorised use by the intended
> recipient(s) only. This email contains confidential information. It should
> not be copied, disclosed to, retained or used by, any party other than the
> intended recipient. Any unauthorised distribution, dissemination or copying
> of this E-mail or its attachments, and/or any use of any information
> contained in them, is strictly prohibited and may be illegal. If you are
> not an intended recipient then please promptly delete this e-mail and any
> attachment and all copies and inform the sender directly via email. Any
> emails that you send to us may be monitored by systems or persons other
> than the named communicant for the purposes of ascertaining whether the
> communication complies with the law and company policies.
>
