[jira] [Commented] (FLINK-8718) Non-parallel DataStreamSource does not set max parallelism
[ https://issues.apache.org/jira/browse/FLINK-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17336653#comment-17336653 ] Flink Jira Bot commented on FLINK-8718: --- This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Non-parallel DataStreamSource does not set max parallelism > -- > > Key: FLINK-8718 > URL: https://issues.apache.org/jira/browse/FLINK-8718 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Affects Versions: 1.5.0 >Reporter: Gary Yao >Priority: Major > Labels: stale-major > > {{org.apache.flink.streaming.api.datastream.DataStreamSource}} does not set > {{maxParallelism}} to 1 if it is non-parallel. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-8718) Non-parallel DataStreamSource does not set max parallelism
[ https://issues.apache.org/jira/browse/FLINK-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17328659#comment-17328659 ] Flink Jira Bot commented on FLINK-8718: --- This major issue is unassigned and itself and all of its Sub-Tasks have not been updated for 30 days. So, it has been labeled "stale-major". If this ticket is indeed "major", please either assign yourself or give an update. Afterwards, please remove the label. In 7 days the issue will be deprioritized.
[jira] [Commented] (FLINK-8718) Non-parallel DataStreamSource does not set max parallelism
[ https://issues.apache.org/jira/browse/FLINK-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16492071#comment-16492071 ] Gary Yao commented on FLINK-8718: - Can this be fixed at all? > Non-parallel DataStreamSource does not set max parallelism > -- > > Key: FLINK-8718 > URL: https://issues.apache.org/jira/browse/FLINK-8718 > Project: Flink > Issue Type: Bug > Components: DataStream API, Streaming >Affects Versions: 1.5.0 >Reporter: Gary Yao >Priority: Major > Fix For: 1.5.1 > > > {{org.apache.flink.streaming.api.datastream.DataStreamSource}} does not set > {{maxParallelism}} to 1 if it is non-parallel. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8718) Non-parallel DataStreamSource does not set max parallelism
[ https://issues.apache.org/jira/browse/FLINK-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375998#comment-16375998 ] ASF GitHub Bot commented on FLINK-8718: --- Github user GJL closed the pull request at: https://github.com/apache/flink/pull/5545 > Non-parallel DataStreamSource does not set max parallelism > -- > > Key: FLINK-8718 > URL: https://issues.apache.org/jira/browse/FLINK-8718 > Project: Flink > Issue Type: Bug > Components: DataStream API, Streaming >Affects Versions: 1.5.0 >Reporter: Gary Yao >Assignee: Gary Yao >Priority: Major > Fix For: 1.5.0 > > > {{org.apache.flink.streaming.api.datastream.DataStreamSource}} does not set > {{maxParallelism}} to 1 if it is non-parallel. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8718) Non-parallel DataStreamSource does not set max parallelism
[ https://issues.apache.org/jira/browse/FLINK-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372660#comment-16372660 ] ASF GitHub Bot commented on FLINK-8718: --- Github user StefanRRichter commented on the issue: https://github.com/apache/flink/pull/5545 I agree with @aljoscha, there is a sanity check that `maxParallelism` of operators did not change when restoring.
[jira] [Commented] (FLINK-8718) Non-parallel DataStreamSource does not set max parallelism
[ https://issues.apache.org/jira/browse/FLINK-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372641#comment-16372641 ] ASF GitHub Bot commented on FLINK-8718: --- Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/5545 Yes, no source can use keyed state. 😄 But the `StateAssignmentOperation` will still fail a restore if the `maxParallelism` of an operator changes. Maybe @StefanRRichter can chime in on that but I think we need to at least touch those parts.
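The restore failure described in this comment can be pictured with a small, Flink-free sketch. Everything here is hypothetical: `RestoreCheckSketch` and `checkMaxParallelism` are illustrative names, and the strict equality check is a simplification that ignores any special cases the real `StateAssignmentOperation` may handle. It only shows why a source that had some `maxParallelism` in an old savepoint would clash with a new `maxParallelism` of 1.

```java
// Hypothetical sketch of a restore-time sanity check; not Flink's actual code.
final class RestoreCheckSketch {

    // Rejects a restore when the max parallelism recorded in the savepoint
    // differs from the max parallelism of the operator in the new job graph.
    static void checkMaxParallelism(
            String operatorId, int savepointMaxParallelism, int jobMaxParallelism) {
        if (savepointMaxParallelism != jobMaxParallelism) {
            throw new IllegalStateException(
                    "Max parallelism mismatch for operator " + operatorId
                            + ": savepoint has " + savepointMaxParallelism
                            + ", job has " + jobMaxParallelism);
        }
    }
}
```

Under this simplified rule, a source stored with `maxParallelism` 128 could not be restored into a job where the same source now reports 1, which is the compatibility concern raised in the thread.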
[jira] [Commented] (FLINK-8718) Non-parallel DataStreamSource does not set max parallelism
[ https://issues.apache.org/jira/browse/FLINK-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372076#comment-16372076 ] ASF GitHub Bot commented on FLINK-8718: --- Github user tillrohrmann commented on the issue: https://github.com/apache/flink/pull/5545 Shouldn't it always be possible to merge n key groups into a single key group? Maybe this could be the backwards compatibility path. Moreover, which sources use keyed state? Don't almost all of the sources use operator state if they are stateful?
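The idea of merging n key groups into one can be sketched as follows. This is an illustration under stated assumptions, not Flink's implementation: `KeyGroupSketch` and both methods are hypothetical, and the key-group assignment uses a plain `hashCode` modulo instead of the hash function Flink applies. The point is only that with a `maxParallelism` of 1 every key lands in key group 0, so state from any number of old key groups could in principle be concatenated into that single group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and hashing are simplified assumptions.
final class KeyGroupSketch {

    // Simplified key-group assignment: a key maps to one of maxParallelism groups.
    static int assignToKeyGroup(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Merges the state of all old key groups into the single key group 0.
    static <V> List<V> mergeIntoSingleKeyGroup(Map<Integer, List<V>> stateByKeyGroup) {
        List<V> merged = new ArrayList<>();
        for (List<V> groupState : stateByKeyGroup.values()) {
            merged.addAll(groupState);
        }
        return merged;
    }
}
```

Merging in this direction loses no information because the target has only one group; the reverse (splitting one group into n) would require re-hashing every key.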
[jira] [Commented] (FLINK-8718) Non-parallel DataStreamSource does not set max parallelism
[ https://issues.apache.org/jira/browse/FLINK-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371789#comment-16371789 ] ASF GitHub Bot commented on FLINK-8718: --- Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/5545 I'm not sure we can do this change since it breaks compatibility for savepoints because previous versions of Flink had some `maxParallelism` setting for sources which would be incompatible with the new `maxParallelism` of 1. Thoughts?
[jira] [Commented] (FLINK-8718) Non-parallel DataStreamSource does not set max parallelism
[ https://issues.apache.org/jira/browse/FLINK-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371659#comment-16371659 ] ASF GitHub Bot commented on FLINK-8718: --- Github user GJL commented on the issue: https://github.com/apache/flink/pull/5545 The test relies on being able to call `setMaxParallelism` on a non-parallel source: `DataStream<Integer> input1 = env.fromElements(1, 2, 3, 4).setMaxParallelism(128);`
[jira] [Commented] (FLINK-8718) Non-parallel DataStreamSource does not set max parallelism
[ https://issues.apache.org/jira/browse/FLINK-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371645#comment-16371645 ] ASF GitHub Bot commented on FLINK-8718: --- Github user tillrohrmann commented on the issue: https://github.com/apache/flink/pull/5545 Could that be a faulty test?
[jira] [Commented] (FLINK-8718) Non-parallel DataStreamSource does not set max parallelism
[ https://issues.apache.org/jira/browse/FLINK-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371601#comment-16371601 ] ASF GitHub Bot commented on FLINK-8718: --- Github user GJL commented on the issue: https://github.com/apache/flink/pull/5545 I have test failures, e.g., `StreamGraphGeneratorTest.testMaxParallelismWithConnectedKeyedStream` 😢
[jira] [Commented] (FLINK-8718) Non-parallel DataStreamSource does not set max parallelism
[ https://issues.apache.org/jira/browse/FLINK-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371477#comment-16371477 ]

ASF GitHub Bot commented on FLINK-8718:
---

GitHub user GJL opened a pull request:

https://github.com/apache/flink/pull/5545

[FLINK-8718][DataStream] Set maxParallelism on non-parallel DataStreamSource

## What is the purpose of the change

*Set maxParallelism to 1 on `org.apache.flink.streaming.api.datastream.DataStreamSource` if the operator is non-parallel.*

cc: @tillrohrmann @bowenli86 @aljoscha

## Brief change log

- *Call `forceNonParallel()` if source operator is non-parallel.*
- *Unit test `DataStreamSource`.*

## Verifying this change

This change added tests and can be verified as follows:

- *Added unit tests for `DataStreamSource`*

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): (yes / **no**)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (**yes** / no)
- The serializers: (yes / **no** / don't know)
- The runtime per-record code paths (performance sensitive): (**yes** / no / don't know)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
- The S3 file system connector: (yes / **no** / don't know)

## Documentation

- Does this pull request introduce a new feature? (yes / **no**)
- If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/GJL/flink FLINK-8718

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5545.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #5545

commit 3450c455e8e28bc83858f1c611fe8a648cc8cd97
Author: gyao
Date: 2018-02-21T14:26:49Z

    [FLINK-8718][DataStream] Set maxParallelism to 1 on non-parallel DataStreamSource
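The change log entry "call `forceNonParallel()` if source operator is non-parallel" can be illustrated with a simplified model. This is a sketch under assumptions, not the code from the pull request: `SourceSketch`, its constructor parameters, and its getters are hypothetical, and the body of `forceNonParallel()` here only reflects the intent stated in the issue (pin the maximum parallelism of a non-parallel source to 1).

```java
// Hypothetical, simplified model of a source stream; not Flink's DataStreamSource.
final class SourceSketch {
    private final boolean isParallel;
    private int parallelism;
    private int maxParallelism;

    SourceSketch(boolean isParallel, int defaultParallelism, int defaultMaxParallelism) {
        this.isParallel = isParallel;
        this.parallelism = defaultParallelism;
        this.maxParallelism = defaultMaxParallelism;
        // The idea of the proposed fix: a non-parallel source is pinned at construction.
        if (!isParallel) {
            forceNonParallel();
        }
    }

    // Assumed behavior for this sketch: both values are set to 1.
    private void forceNonParallel() {
        this.parallelism = 1;
        this.maxParallelism = 1;
    }

    boolean isParallel() {
        return isParallel;
    }

    int getParallelism() {
        return parallelism;
    }

    int getMaxParallelism() {
        return maxParallelism;
    }
}
```

In this model, `new SourceSketch(false, 4, 128)` reports a parallelism and a maximum parallelism of 1, while `new SourceSketch(true, 4, 128)` keeps the defaults it was given.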
[jira] [Commented] (FLINK-8718) Non-parallel DataStreamSource does not set max parallelism
[ https://issues.apache.org/jira/browse/FLINK-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371070#comment-16371070 ] Bowen Li commented on FLINK-8718: - Hi [~gjy], I looked at the code. As an extension, what's the difference between the two variables {{DataStreamSource#isParallel}} and {{SingleOutputStreamOperator#nonParallel}}? I feel like they can actually be consolidated into a single variable, but there's some refactoring needed. For example, {{SingleOutputStreamOperator#setMaxParallelism()}} requires {{Preconditions.checkArgument(canBeParallel() || maxParallelism == 1, xxx)}}, and it won't work if we consolidate the above two variables into one. > Non-parallel DataStreamSource does not set max parallelism > -- > > Key: FLINK-8718 > URL: https://issues.apache.org/jira/browse/FLINK-8718 > Project: Flink > Issue Type: Bug > Components: Core >Affects Versions: 1.5.0 >Reporter: Gary Yao >Priority: Blocker > Fix For: 1.5.0 > > > {{org.apache.flink.streaming.api.datastream.DataStreamSource}} does not set > {{maxParallelism}} to 1 if it is non-parallel. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
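The precondition quoted in this comment can be tried out without Flink on the classpath. The sketch below is illustrative only: `OperatorSketch` is a hypothetical stand-in for `SingleOutputStreamOperator`, the initial values are assumptions, and plain `IllegalArgumentException`s replace `Preconditions.checkArgument`. Only the shape of the check, `canBeParallel() || maxParallelism == 1`, is taken from the comment.

```java
// Hypothetical stand-in for SingleOutputStreamOperator; this is not Flink's class.
final class OperatorSketch {
    private final boolean nonParallel;
    private int maxParallelism;

    OperatorSketch(boolean nonParallel) {
        this.nonParallel = nonParallel;
        // Assumption for this sketch: a non-parallel operator starts at 1,
        // a parallel one at -1 ("not set").
        this.maxParallelism = nonParallel ? 1 : -1;
    }

    boolean canBeParallel() {
        return !nonParallel;
    }

    OperatorSketch setMaxParallelism(int maxParallelism) {
        if (maxParallelism <= 0) {
            throw new IllegalArgumentException(
                    "The maximum parallelism must be greater than 0.");
        }
        // Same shape as the precondition quoted in the comment above.
        if (!(canBeParallel() || maxParallelism == 1)) {
            throw new IllegalArgumentException(
                    "The maximum parallelism of a non-parallel operator must be 1.");
        }
        this.maxParallelism = maxParallelism;
        return this;
    }

    int getMaxParallelism() {
        return maxParallelism;
    }
}
```

With this guard, `new OperatorSketch(true).setMaxParallelism(128)` throws, while `setMaxParallelism(1)` on the same object is accepted. A source whose non-parallel flag is tracked somewhere else would pass the guard unnoticed, which appears to be the gap the issue describes.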