[jira] [Updated] (FLUME-2886) Optional Channels can cause OOMs

2016-02-22 Thread Hari Shreedharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Shreedharan updated FLUME-2886:

Attachment: FLUME-2886.patch

> Optional Channels can cause OOMs
> 
>
> Key: FLUME-2886
> URL: https://issues.apache.org/jira/browse/FLUME-2886
> Project: Flume
> Issue Type: Bug
> Reporter: Hari Shreedharan
> Assignee: Hari Shreedharan
> Attachments: FLUME-2886.patch
>
>
> If an optional channel is full, the queue backing the executor that is 
> asynchronously submitting the events to the channel can grow indefinitely in 
> size leading to a huge number of events on the heap and causing OOMs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLUME-2886) Optional Channels can cause OOMs

2016-02-22 Thread Hari Shreedharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Shreedharan updated FLUME-2886:

Attachment: (was: FLUME-2886.patch)

> Optional Channels can cause OOMs
> 
>
> Key: FLUME-2886
> URL: https://issues.apache.org/jira/browse/FLUME-2886
> Project: Flume
> Issue Type: Bug
> Reporter: Hari Shreedharan
> Assignee: Hari Shreedharan
> Attachments: FLUME-2886.patch
>
>
> If an optional channel is full, the queue backing the executor that is 
> asynchronously submitting the events to the channel can grow indefinitely in 
> size leading to a huge number of events on the heap and causing OOMs.





[jira] [Updated] (FLUME-2886) Optional Channels can cause OOMs

2016-02-22 Thread Hari Shreedharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Shreedharan updated FLUME-2886:

Attachment: FLUME-2886.patch

This fixes the issue and adds a unit test

> Optional Channels can cause OOMs
> 
>
> Key: FLUME-2886
> URL: https://issues.apache.org/jira/browse/FLUME-2886
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: FLUME-2886.patch
>
>
> If an optional channel is full, the queue backing the executor that is 
> asynchronously submitting the events to the channel can grow indefinitely in 
> size leading to a huge number of events on the heap and causing OOMs.





[jira] [Created] (FLUME-2886) Optional Channels can cause OOMs

2016-02-22 Thread Hari Shreedharan (JIRA)
Hari Shreedharan created FLUME-2886:
---

 Summary: Optional Channels can cause OOMs
 Key: FLUME-2886
 URL: https://issues.apache.org/jira/browse/FLUME-2886
 Project: Flume
 Issue Type: Bug
 Reporter: Hari Shreedharan
 Assignee: Hari Shreedharan


If an optional channel is full, the queue backing the executor that is 
asynchronously submitting the events to the channel can grow indefinitely in 
size leading to a huge number of events on the heap and causing OOMs.
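
A minimal sketch of the failure mode described above. It is not the content
of FLUME-2886.patch, which is not shown in this thread. It assumes the
optional-channel writes are handed to a ThreadPoolExecutor; the class name
BoundedSubmitSketch, the method countRejected, and the capacities are
hypothetical. With an unbounded work queue every pending write stays on the
heap while the channel is full. With a bounded ArrayBlockingQueue, submissions
beyond the capacity are rejected and can be dropped, which is tolerable only
because the channel is optional.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedSubmitSketch {

    /**
     * Submits taskCount writes to a single-threaded executor whose worker is
     * stuck (simulating a full optional channel) and returns how many writes
     * were rejected instead of being queued on the heap.
     */
    static int countRejected(int queueCapacity, int taskCount) {
        // Released only after all submissions, so the "channel" stays full.
        CountDownLatch channelHasRoom = new CountDownLatch(1);
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity), // bounded backlog
                new ThreadPoolExecutor.AbortPolicy());           // reject when full
        int rejected = 0;
        try {
            for (int i = 0; i < taskCount; i++) {
                try {
                    executor.execute(() -> {
                        try {
                            channelHasRoom.await(); // blocks like a put on a full channel
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // optional channel: drop the event rather than retain it
                }
            }
        } finally {
            channelHasRoom.countDown();
            executor.shutdown();
            try {
                executor.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("rejected = " + countRejected(10, 100));
    }
}
```

In this sketch one write occupies the worker and at most queueCapacity more
wait in the queue, so the number of retained events is bounded regardless of
how long the channel stays full.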





Re: Support for Hive Mutation Streaming

2016-02-22 Thread Roshan Naik
I won't say it's of no use; I am sure someone will figure out a reasonable
use case for it with Flume. But there appears to be some impedance
mismatch.


Flume, being a streaming event-movement product, is basically continuously
inserting data into Hive, HDFS, etc.
The Hive Streaming APIs serve that purpose and were even designed with
Flume-like products in mind.

As per the HCat Mutation API docs, they seem to be designed around use
cases such as:
  - modifying existing data
  - keeping a replica in sync with updates on a master copy
  - infrequently applying large sets of mutations to a data set in an
atomic fashion

As opposed to the Streaming APIs, which:
  - focus on surfacing a continuous stream of new data into a Hive table,
and do so by batching small sets of writes into multiple short-lived
transactions


The notion of a stream of relatively small batches (short-lived
transactions) fits nicely with Flume's transactions. In contrast, the
'infrequently apply large sets' use case does not seem to fit Flume very
well. That model seems to fit Sqoop better.
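
To make the small-batch, short-lived-transaction pattern above concrete, here
is a minimal sketch. TxnSink and RecordingSink are hypothetical stand-ins
invented for illustration; they are not Hive Streaming, HCatalog, or Flume
APIs, and the batch size is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

final class SmallBatchSketch {

    /** Hypothetical transactional destination; not a real Hive or Flume interface. */
    interface TxnSink {
        void begin();
        void write(String record);
        void commit();
    }

    /** Test double that records how many writes each committed transaction held. */
    static final class RecordingSink implements TxnSink {
        final List<Integer> committedBatchSizes = new ArrayList<>();
        private int pending;

        @Override public void begin() { pending = 0; }
        @Override public void write(String record) { pending++; }
        @Override public void commit() { committedBatchSizes.add(pending); }
    }

    /**
     * Writes the records as a sequence of short transactions of at most
     * batchSize records each and returns the number of transactions committed.
     */
    static int streamInBatches(List<String> records, int batchSize, TxnSink sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int transactions = 0;
        for (int start = 0; start < records.size(); start += batchSize) {
            int end = Math.min(start + batchSize, records.size());
            sink.begin();
            for (String record : records.subList(start, end)) {
                sink.write(record);
            }
            sink.commit(); // each small batch becomes visible on its own
            transactions++;
        }
        return transactions;
    }
}
```

Each commit covers only one small batch, which mirrors how a Flume sink drains
a bounded number of events per channel transaction. A single large atomic
mutation set would instead need one long transaction spanning all records.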


Do you have thoughts on some good Flume use cases that would require the
Mutation APIs over the Streaming APIs?

-roshan



On 2/20/16, 5:20 AM, "Amit Jain" wrote:

>Hi Roshan,
>
>Could you please help me learn why Hive Mutation Streaming APIs would not
>be a good value addition to Flume?
>
>
>--
>Thanks,
>Amit
>
>On Sat, Feb 20, 2016 at 1:55 AM, Roshan Naik wrote:
>
>> For the Flume kind of streaming ingest, the Hive Streaming APIs should be
>> more appropriate ... which is already supported.
>> -roshan
>>
>>
>> On 2/19/16, 5:30 AM, "Amit Jain" wrote:
>>
>> >Hi All,
>> >
>> >We have a use case where we want to make use of the Hive mutation
>> >streaming feature. Do we have support for this in upcoming releases?
>> >
>> >
>> >https://cwiki.apache.org/confluence/display/Hive/HCatalog+Streaming+Mutation+API
>> >
>> >--
>> >Thanks,
>> >Amit
>>
>>