[jira] [Commented] (FLINK-7223) Increase DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS for Flink-kinesis-connector

2017-09-01 Thread Bowen Li (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150154#comment-16150154
 ] 

Bowen Li commented on FLINK-7223:
-

[~StephanEwen] We've already filed such a request to AWS through our company. 
Well, it's not easy and very practical to ask Amazon make such internal changes 
- I guess such changes will require AWS to re-design a lot of stuff. Just look 
at how slow they are responding to a simple KinesisProducer issues on 
https://github.com/awslabs/amazon-kinesis-producer/issues ...

 

> Increase DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS for Flink-kinesis-connector
> 
>
> Key: FLINK-7223
> URL: https://issues.apache.org/jira/browse/FLINK-7223
> Project: Flink
>  Issue Type: Improvement
>  Components: Kinesis Connector
>Affects Versions: 1.3.0
>Reporter: Bowen Li
>Assignee: Bowen Li
>Priority: Minor
> Fix For: 1.4.0
>
>
> Background: {{DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS}} in 
> {{org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants}}
>  is the default value for Flink to call Kinesis's {{describeStream()}} API.
> Problem: Right now, its value is 10,000millis (10sec), which is too short. We 
> ran into problems that Flink-kinesis-connector's call of {{describeStream()}} 
> exceeds Kinesis rate limit, and broken Flink taskmanager.
> According to 
> http://docs.aws.amazon.com/kinesis/latest/APIReference/API_DescribeStream.html,
>  
> "This operation has a limit of 10 transactions per second per account.". What 
> it means is that the 10transaction/account is a limit on a single 
> organization's AWS account..:(  We contacted AWS Support, and confirmed 
> this. If you have more applications (either other Flink apps or non-Flink 
> apps) competing aggressively with your Flink app on this API, your Flink app 
> breaks. 
> I propose increasing the value DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS from 
> 10,000millis(10sec) to preferably 300,000 (5min). Or at least 60,000 (1min) 
> if anyone has a solid reason arguing that 5min is too long, 
> This is also related to https://issues.apache.org/jira/browse/FLINK-6365



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7223) Increase DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS for Flink-kinesis-connector

2017-08-14 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125845#comment-16125845
 ] 

Stephan Ewen commented on FLINK-7223:
-

[~phoenixjiangnan] Do you think it is worthwhile filing an issue at Amazon for 
that?
It seems crazy that the number of describe requests is so low and across all 
apps of one account.


> Increase DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS for Flink-kinesis-connector
> 
>
> Key: FLINK-7223
> URL: https://issues.apache.org/jira/browse/FLINK-7223
> Project: Flink
>  Issue Type: Improvement
>  Components: Kinesis Connector
>Affects Versions: 1.3.0
>Reporter: Bowen Li
>Assignee: Bowen Li
>Priority: Minor
> Fix For: 1.4.0
>
>
> Background: {{DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS}} in 
> {{org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants}}
>  is the default value for Flink to call Kinesis's {{describeStream()}} API.
> Problem: Right now, its value is 10,000millis (10sec), which is too short. We 
> ran into problems that Flink-kinesis-connector's call of {{describeStream()}} 
> exceeds Kinesis rate limit, and broken Flink taskmanager.
> According to 
> http://docs.aws.amazon.com/kinesis/latest/APIReference/API_DescribeStream.html,
>  
> "This operation has a limit of 10 transactions per second per account.". What 
> it means is that the 10transaction/account is a limit on a single 
> organization's AWS account..:(  We contacted AWS Support, and confirmed 
> this. If you have more applications (either other Flink apps or non-Flink 
> apps) competing aggressively with your Flink app on this API, your Flink app 
> breaks. 
> I propose increasing the value DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS from 
> 10,000millis(10sec) to preferably 300,000 (5min). Or at least 60,000 (1min) 
> if anyone has a solid reason arguing that 5min is too long, 
> This is also related to https://issues.apache.org/jira/browse/FLINK-6365



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7223) Increase DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS for Flink-kinesis-connector

2017-08-08 Thread Bowen Li (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119222#comment-16119222
 ] 

Bowen Li commented on FLINK-7223:
-

I'm actually fine with the default value, since we are already aware of this 
issue, and we override the default value ourself. But from a enterprise user's 
point of view, the assumption of 'this means running up to 50 Flink jobs per 
account' is not practical at all for a big AWS enterprise customer. Here's why:

In an enterprise AWS account, lots of other services are contributing to 
saturating the 10requests/sec throughput. In our (OfferUp, inc) prod AWS 
account, we are hitting that cap even before having Flink. Having more and more 
Flink jobs makes things worse, and breaks other services. Our Flink jobs are 
not more important than other services, so we can't either allocate resources 
solely for Flink or compete with other services. Thus we set Flink's discovery 
interval to be {{1 hour}}. 

It's probably not a topic worths a long discussion :) If you guys feel the 
default value is ok, I'll good with it

> Increase DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS for Flink-kinesis-connector
> 
>
> Key: FLINK-7223
> URL: https://issues.apache.org/jira/browse/FLINK-7223
> Project: Flink
>  Issue Type: Improvement
>  Components: Kinesis Connector
>Affects Versions: 1.3.0
>Reporter: Bowen Li
>Assignee: Bowen Li
>Priority: Minor
> Fix For: 1.4.0
>
>
> Background: {{DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS}} in 
> {{org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants}}
>  is the default value for Flink to call Kinesis's {{describeStream()}} API.
> Problem: Right now, its value is 10,000millis (10sec), which is too short. We 
> ran into problems that Flink-kinesis-connector's call of {{describeStream()}} 
> exceeds Kinesis rate limit, and broken Flink taskmanager.
> According to 
> http://docs.aws.amazon.com/kinesis/latest/APIReference/API_DescribeStream.html,
>  
> "This operation has a limit of 10 transactions per second per account.". What 
> it means is that the 10transaction/account is a limit on a single 
> organization's AWS account..:(  We contacted AWS Support, and confirmed 
> this. If you have more applications (either other Flink apps or non-Flink 
> apps) competing aggressively with your Flink app on this API, your Flink app 
> breaks. 
> I propose increasing the value DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS from 
> 10,000millis(10sec) to preferably 300,000 (5min). Or at least 60,000 (1min) 
> if anyone has a solid reason arguing that 5min is too long, 
> This is also related to https://issues.apache.org/jira/browse/FLINK-6365



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7223) Increase DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS for Flink-kinesis-connector

2017-07-31 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107888#comment-16107888
 ] 

Stephan Ewen commented on FLINK-7223:
-

Hmmm... Is there a way to auto-configure this value as in the following way:
  - The Flink job would theoretically do discovery once per 5 seconds (this 
means running up to 50 Flink jobs er account)
  - The interval in which the TaskManagers can discover is than {{5 seconds x 
source parallelism}}


> Increase DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS for Flink-kinesis-connector
> 
>
> Key: FLINK-7223
> URL: https://issues.apache.org/jira/browse/FLINK-7223
> Project: Flink
>  Issue Type: Improvement
>  Components: Kinesis Connector
>Affects Versions: 1.3.0
>Reporter: Bowen Li
>Assignee: Bowen Li
>Priority: Minor
> Fix For: 1.4.0
>
>
> Background: {{DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS}} in 
> {{org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants}}
>  is the default value for Flink to call Kinesis's {{describeStream()}} API.
> Problem: Right now, its value is 10,000millis (10sec), which is too short. We 
> ran into problems that Flink-kinesis-connector's call of {{describeStream()}} 
> exceeds Kinesis rate limit, and broken Flink taskmanager.
> According to 
> http://docs.aws.amazon.com/kinesis/latest/APIReference/API_DescribeStream.html,
>  
> "This operation has a limit of 10 transactions per second per account.". What 
> it means is that the 10transaction/account is a limit on a single 
> organization's AWS account..:(  We contacted AWS Support, and confirmed 
> this. If you have more applications (either other Flink apps or non-Flink 
> apps) competing aggressively with your Flink app on this API, your Flink app 
> breaks. 
> I propose increasing the value DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS from 
> 10,000millis(10sec) to preferably 300,000 (5min). Or at least 60,000 (1min) 
> if anyone has a solid reason arguing that 5min is too long, 
> This is also related to https://issues.apache.org/jira/browse/FLINK-6365



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7223) Increase DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS for Flink-kinesis-connector

2017-07-29 Thread Bowen Li (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106325#comment-16106325
 ] 

Bowen Li commented on FLINK-7223:
-

What do you guys think 3min or 1min?

I strongly believe that this default value should be the most practical one so 
it won't cause much trouble to users, rather than the theoretically shortest 
one but rarely works in practice

> Increase DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS for Flink-kinesis-connector
> 
>
> Key: FLINK-7223
> URL: https://issues.apache.org/jira/browse/FLINK-7223
> Project: Flink
>  Issue Type: Improvement
>  Components: Kinesis Connector
>Affects Versions: 1.3.0
>Reporter: Bowen Li
>Assignee: Bowen Li
>Priority: Minor
> Fix For: 1.4.0
>
>
> Background: {{DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS}} in 
> {{org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants}}
>  is the default value for Flink to call Kinesis's {{describeStream()}} API.
> Problem: Right now, its value is 10,000millis (10sec), which is too short. We 
> ran into problems that Flink-kinesis-connector's call of {{describeStream()}} 
> exceeds Kinesis rate limit, and broken Flink taskmanager.
> According to 
> http://docs.aws.amazon.com/kinesis/latest/APIReference/API_DescribeStream.html,
>  
> "This operation has a limit of 10 transactions per second per account.". What 
> it means is that the 10transaction/account is a limit on a single 
> organization's AWS account..:(  We contacted AWS Support, and confirmed 
> this. If you have more applications (either other Flink apps or non-Flink 
> apps) competing aggressively with your Flink app on this API, your Flink app 
> breaks. 
> I propose increasing the value DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS from 
> 10,000millis(10sec) to preferably 300,000 (5min). Or at least 60,000 (1min) 
> if anyone has a solid reason arguing that 5min is too long, 
> This is also related to https://issues.apache.org/jira/browse/FLINK-6365



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7223) Increase DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS for Flink-kinesis-connector

2017-07-19 Thread Bowen Li (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094173#comment-16094173
 ] 

Bowen Li commented on FLINK-7223:
-

[~sthm] can we also get your insights here?

> Increase DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS for Flink-kinesis-connector
> 
>
> Key: FLINK-7223
> URL: https://issues.apache.org/jira/browse/FLINK-7223
> Project: Flink
>  Issue Type: Improvement
>  Components: Kinesis Connector
>Affects Versions: 1.3.0
>Reporter: Bowen Li
>Assignee: Bowen Li
>Priority: Minor
> Fix For: 1.4.0
>
>
> Background: {{DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS}} in 
> {{org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants}}
>  is the default value for Flink to call Kinesis's {{describeStream()}} API.
> Problem: Right now, its value is 10,000millis (10sec), which is too short. We 
> ran into problems that Flink-kinesis-connector's call of {{describeStream()}} 
> exceeds Kinesis rate limit, and broken Flink taskmanager.
> According to 
> http://docs.aws.amazon.com/kinesis/latest/APIReference/API_DescribeStream.html,
>  
> "This operation has a limit of 10 transactions per second per account.". What 
> it means is that the 10transaction/account is a limit on a single 
> organization's AWS account..:(  We contacted AWS Support, and confirmed 
> this. If you have more applications (either other Flink apps or non-Flink 
> apps) competing aggressively with your Flink app on this API, your Flink app 
> breaks. 
> I propose increasing the value DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS from 
> 10,000millis(10sec) to preferably 300,000 (5min). Or at least 60,000 (1min) 
> if anyone has a solid reason arguing that 5min is too long, 
> This is also related to https://issues.apache.org/jira/browse/FLINK-6365



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7223) Increase DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS for Flink-kinesis-connector

2017-07-19 Thread Bowen Li (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16093498#comment-16093498
 ] 

Bowen Li commented on FLINK-7223:
-

I believe what Gordon and I proposed are complementary. We need to do both. We 
can create another ticket for what Gordon proposed for code change, and link it 
to this ticket. 

10sec is simply too short and intensive.  What do you think about 3min or 1min? 
AWS takes about 5 minutes to update kinesics shard itself, so users are of 
right expectation that Flink takes the same amount of time. 




> Increase DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS for Flink-kinesis-connector
> 
>
> Key: FLINK-7223
> URL: https://issues.apache.org/jira/browse/FLINK-7223
> Project: Flink
>  Issue Type: Improvement
>  Components: Kinesis Connector
>Affects Versions: 1.3.0
>Reporter: Bowen Li
>Assignee: Bowen Li
>Priority: Minor
> Fix For: 1.4.0
>
>
> Background: {{DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS}} in 
> {{org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants}}
>  is the default value for Flink to call Kinesis's {{describeStream()}} API.
> Problem: Right now, its value is 10,000millis (10sec), which is too short. We 
> ran into problems that Flink-kinesis-connector's call of {{describeStream()}} 
> exceeds Kinesis rate limit, and broken Flink taskmanager.
> According to 
> http://docs.aws.amazon.com/kinesis/latest/APIReference/API_DescribeStream.html,
>  
> "This operation has a limit of 10 transactions per second per account.". What 
> it means is that the 10transaction/account is a limit on a single 
> organization's AWS account..:(  We contacted AWS Support, and confirmed 
> this. If you have more applications (either other Flink apps or non-Flink 
> apps) competing aggressively with your Flink app on this API, your Flink app 
> breaks. 
> I propose increasing the value DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS from 
> 10,000millis(10sec) to preferably 300,000 (5min). Or at least 60,000 (1min) 
> if anyone has a solid reason arguing that 5min is too long, 
> This is also related to https://issues.apache.org/jira/browse/FLINK-6365



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7223) Increase DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS for Flink-kinesis-connector

2017-07-19 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16093375#comment-16093375
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-7223:


This config affects the responsiveness to new shards of the stream. I agree 
that 5 minutes is a bit too much by default.
One other thing: the AWS limitation is "This operation has a limit of 10 
transactions per second per account." Therefore, IMO, even if we change this 
discovery interval to 5 minutes, you would still bump into the limitation if 
the source parallelism is high (each source instance performs discovery 
independently).

We could consider:
1. Be a little less strict when handling exceptions due to the limitation cap
2. Perhaps add a little more jitter into the discovery interval to even out the 
requests across multiple subtasks.

My consideration is that simply increasing the discovery interval would not 
really solve the problem of {{describeStream}} API rate limitations.

> Increase DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS for Flink-kinesis-connector
> 
>
> Key: FLINK-7223
> URL: https://issues.apache.org/jira/browse/FLINK-7223
> Project: Flink
>  Issue Type: Improvement
>  Components: Kinesis Connector
>Affects Versions: 1.3.0
>Reporter: Bowen Li
>Assignee: Bowen Li
>Priority: Minor
> Fix For: 1.4.0
>
>
> Background: {{DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS}} in 
> {{org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants}}
>  is the default value for Flink to call Kinesis's {{describeStream()}} API.
> Problem: Right now, its value is 10,000millis (10sec), which is too short. We 
> ran into problems that Flink-kinesis-connector's call of {{describeStream()}} 
> exceeds Kinesis rate limit, and broken Flink taskmanager.
> According to 
> http://docs.aws.amazon.com/kinesis/latest/APIReference/API_DescribeStream.html,
>  
> "This operation has a limit of 10 transactions per second per account.". What 
> it means is that the 10transaction/account is a limit on a single 
> organization's AWS account..:(  We contacted AWS Support, and confirmed 
> this. If you have more applications (either other Flink apps or non-Flink 
> apps) competing aggressively with your Flink app on this API, your Flink app 
> breaks. 
> I propose increasing the value DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS from 
> 10,000millis(10sec) to preferably 300,000 (5min). Or at least 60,000 (1min) 
> if anyone has a solid reason arguing that 5min is too long, 
> This is also related to https://issues.apache.org/jira/browse/FLINK-6365



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7223) Increase DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS for Flink-kinesis-connector

2017-07-19 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16093334#comment-16093334
 ] 

Stephan Ewen commented on FLINK-7223:
-

[~tzulitai] what do you think about that?

What is affected by that? The responsiveness to appearance of new shards 
(elasticity of Kinesis), or the responsiveness to new streams?
If the former was affected, then 5 minutes can be a lot. For the later I would 
think it is not too bad.

> Increase DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS for Flink-kinesis-connector
> 
>
> Key: FLINK-7223
> URL: https://issues.apache.org/jira/browse/FLINK-7223
> Project: Flink
>  Issue Type: Improvement
>  Components: Kinesis Connector
>Affects Versions: 1.3.0
>Reporter: Bowen Li
>Assignee: Bowen Li
>Priority: Minor
> Fix For: 1.4.0
>
>
> Background: {{DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS}} in 
> {{org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants}}
>  is the default value for Flink to call Kinesis's {{describeStream()}} API.
> Problem: Right now, its value is 10,000millis (10sec), which is too short. We 
> ran into problems that Flink-kinesis-connector's call of {{describeStream()}} 
> exceeds Kinesis rate limit, and broken Flink taskmanager.
> According to 
> http://docs.aws.amazon.com/kinesis/latest/APIReference/API_DescribeStream.html,
>  
> "This operation has a limit of 10 transactions per second per account.". What 
> it means is that the 10transaction/account is a limit on a single 
> organization's AWS account..:(  We contacted AWS Support, and confirmed 
> this. If you have more applications (either other Flink apps or non-Flink 
> apps) competing aggressively with your Flink app on this API, your Flink app 
> breaks. 
> I propose increasing the value DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS from 
> 10,000millis(10sec) to preferably 300,000 (5min). Or at least 60,000 (1min) 
> if anyone has a solid reason arguing that 5min is too long, 
> This is also related to https://issues.apache.org/jira/browse/FLINK-6365



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)