[jira] [Commented] (FLINK-4020) Remove shard list querying from Kinesis consumer constructor
[ https://issues.apache.org/jira/browse/FLINK-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338614#comment-15338614 ]

ASF GitHub Bot commented on FLINK-4020:
---

Github user tzulitai closed the pull request at:

    https://github.com/apache/flink/pull/2081

> Remove shard list querying from Kinesis consumer constructor
>
>                 Key: FLINK-4020
>                 URL: https://issues.apache.org/jira/browse/FLINK-4020
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Kinesis Connector, Streaming Connectors
>            Reporter: Tzu-Li (Gordon) Tai
>            Assignee: Tzu-Li (Gordon) Tai
>             Fix For: 1.1.0
>
> Currently FlinkKinesisConsumer is querying for the whole list of shards in
> the constructor, forcing the client to be able to access Kinesis as well.
> This is also a drawback for handling Kinesis-side resharding, since we'd want
> all shard listing / shard-to-task assigning / shard end (result of
> resharding) handling logic to be capable of being independently done within
> task life cycle methods, with defined and definite results.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (FLINK-4020) Remove shard list querying from Kinesis consumer constructor
[ https://issues.apache.org/jira/browse/FLINK-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338613#comment-15338613 ]

ASF GitHub Bot commented on FLINK-4020:
---

Github user tzulitai commented on the issue:

    https://github.com/apache/flink/pull/2081

Hi @rmetzger,
Update: I'm closing this PR now. The new PR with FLINK-4020 & FLINK-3231 is at https://github.com/apache/flink/pull/2131.
[jira] [Commented] (FLINK-4020) Remove shard list querying from Kinesis consumer constructor
[ https://issues.apache.org/jira/browse/FLINK-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329401#comment-15329401 ]

ASF GitHub Bot commented on FLINK-4020:
---

Github user rmetzger commented on the issue:

    https://github.com/apache/flink/pull/2081

Okay, thank you. I'll wait then.
[jira] [Commented] (FLINK-4020) Remove shard list querying from Kinesis consumer constructor
[ https://issues.apache.org/jira/browse/FLINK-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328943#comment-15328943 ]

ASF GitHub Bot commented on FLINK-4020:
---

Github user tzulitai commented on the issue:

    https://github.com/apache/flink/pull/2081

Hi @rmetzger,
Thanks for letting me know. However, I'd like to close this PR for now, for the following reasons:

1. The new shard-to-subtask assignment logic introduced with this change will need to be moved again, into run(), as part of implementing Kinesis reshard handling [FLINK-3231](https://issues.apache.org/jira/browse/FLINK-3231).
2. I've been testing this change a bit more on Kinesis streams with high shard counts, and it seems the implementation needs a stronger guarantee that all subtasks can get the shard list without failing with Amazon's LimitExceededException, even after 3 retries. Since the implementation for FLINK-3231 will have a separate thread that polls for changes in the shard list, I'd like to strengthen this guarantee as part of FLINK-3231's PR.

I'm almost done with FLINK-3231, and will open a new PR that resolves FLINK-3231 and FLINK-4020 together. I'll keep you updated!
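For readers following point 2 above: one common way to make a rate-limited listing call more robust is to retry it with exponentially growing, randomized delays, so that many subtasks do not retry at the same instant. The sketch below only illustrates that general idea and is not the connector's actual code. `RateLimitedException` is a hypothetical stand-in for Amazon's LimitExceededException, and `ShardListRetry`, `callWithBackoff`, `maxAttempts` and `baseDelayMillis` are names made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical stand-in for Amazon's LimitExceededException, so the sketch runs without the AWS SDK. */
class RateLimitedException extends RuntimeException {
    RateLimitedException(String message) {
        super(message);
    }
}

class ShardListRetry {

    /**
     * Calls listShards until it succeeds or maxAttempts calls have been made.
     * After each rate-limit failure it sleeps for a random time between 0 and
     * baseDelayMillis * 2^(attempt - 1) milliseconds.
     */
    static <T> T callWithBackoff(Callable<T> listShards, int maxAttempts, long baseDelayMillis)
            throws Exception {
        if (maxAttempts < 1 || baseDelayMillis < 0) {
            throw new IllegalArgumentException("maxAttempts must be >= 1 and baseDelayMillis >= 0");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return listShards.call();
            } catch (RateLimitedException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the last failure to the caller
                }
                long cap = baseDelayMillis << (attempt - 1);
                // Randomizing the delay spreads out retries from parallel subtasks.
                Thread.sleep(ThreadLocalRandom.current().nextLong(cap + 1));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated listing call that is rate limited on its first two invocations.
        List<String> shards = callWithBackoff(() -> {
            if (++calls[0] < 3) {
                throw new RateLimitedException("rate exceeded");
            }
            return Arrays.asList("shardId-000000000000", "shardId-000000000001");
        }, 5, 10L);
        System.out.println(shards + " after " + calls[0] + " calls");
    }
}
```

Backoff makes success more likely but does not by itself guarantee that every subtask gets the shard list; that stronger guarantee is what the comment defers to FLINK-3231.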
[jira] [Commented] (FLINK-4020) Remove shard list querying from Kinesis consumer constructor
[ https://issues.apache.org/jira/browse/FLINK-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15322207#comment-15322207 ]

ASF GitHub Bot commented on FLINK-4020:
---

Github user rmetzger commented on the issue:

    https://github.com/apache/flink/pull/2081

I'll try to review this change soon.
[jira] [Commented] (FLINK-4020) Remove shard list querying from Kinesis consumer constructor
[ https://issues.apache.org/jira/browse/FLINK-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320388#comment-15320388 ]

ASF GitHub Bot commented on FLINK-4020:
---

GitHub user tzulitai opened a pull request:

    https://github.com/apache/flink/pull/2081

    [FLINK-4020][streaming-connectors] Move shard list querying to open() for Kinesis consumer

Remove shard list querying from the constructor, and let each subtask independently discover in open() which shards it should consume from. This change is a prerequisite for [FLINK-3231](https://issues.apache.org/jira/browse/FLINK-3231).

Explanation for some changes that might seem irrelevant:

1. Changed naming of some variables / methods: since the behaviour of shard assignment to subtasks is now (and will continue to be after FLINK-3231) more like "discovering shards for consuming" than "being assigned shards", I've renamed the "assignedShards"-related names to "discoveredShards".
2. I've removed some tests, because the corresponding parts of the code will change considerably with the upcoming work on [FLINK-3231](https://issues.apache.org/jira/browse/FLINK-3231). Tests will be added back with FLINK-3231.
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tzulitai/flink FLINK-4020

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2081.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2081

commit 1db426be73f572aec2041cb1a9da6ad49425f392
Author: Gordon Tai
Date:   2016-06-08T10:46:02Z

    [FLINK-4020] Move shard list querying to open() for Kinesis consumer
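The change described in this pull request, moving shard listing out of the constructor so that each subtask discovers its own shards in open(), can be illustrated with a small sketch. It is a simplified model under stated assumptions, not the FlinkKinesisConsumer code: `ShardLister` and `DeferredShardConsumer` are hypothetical names, open() takes the subtask index and parallelism as plain arguments instead of reading them from a Flink runtime context, and the modulo rule used to pick shards is assumed for illustration only, since the thread does not state the actual assignment rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical abstraction over the Kinesis "list shards" call; not a Flink or AWS interface. */
interface ShardLister {
    List<String> listShards(String streamName);
}

/** Simplified, hypothetical consumer whose constructor makes no remote call. */
class DeferredShardConsumer {
    private final String streamName;
    private final ShardLister lister;
    private List<String> discoveredShards = new ArrayList<>();

    DeferredShardConsumer(String streamName, ShardLister lister) {
        // Only configuration is stored here, so the client that builds the job
        // does not need to be able to reach Kinesis.
        this.streamName = streamName;
        this.lister = lister;
    }

    /** Runs once per parallel subtask; each subtask lists the shards and keeps its own share. */
    void open(int subtaskIndex, int parallelism) {
        List<String> allShards = lister.listShards(streamName);
        List<String> mine = new ArrayList<>();
        for (int i = 0; i < allShards.size(); i++) {
            // Assumed rule: position modulo parallelism. It only works if every
            // subtask sees the same shards in the same order.
            if (i % parallelism == subtaskIndex) {
                mine.add(allShards.get(i));
            }
        }
        discoveredShards = mine;
    }

    List<String> getDiscoveredShards() {
        return discoveredShards;
    }

    public static void main(String[] args) {
        ShardLister fakeLister = stream -> Arrays.asList("shard-0", "shard-1", "shard-2", "shard-3");
        int parallelism = 2;
        for (int subtask = 0; subtask < parallelism; subtask++) {
            DeferredShardConsumer consumer = new DeferredShardConsumer("example-stream", fakeLister);
            consumer.open(subtask, parallelism);
            System.out.println("subtask " + subtask + " -> " + consumer.getDiscoveredShards());
        }
    }
}
```

Because every subtask issues its own listing call in open(), the number of concurrent requests grows with the parallelism, which is consistent with the rate-limit concern raised later in the thread.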