[jira] [Commented] (FLINK-21364) piggyback finishedSplitIds in RequestSplitEvent
[ https://issues.apache.org/jira/browse/FLINK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503240#comment-17503240 ] Jiangjie Qin commented on FLINK-21364: -- [~stevenz3wu] Sorry for the late response. I somehow missed the notification email. Personally I feel that we should probably have a separate {{FinishedSplitsEvent}} regardless whether we piggyback the finishedSplitIds in the {{{}RequestSplitEvent{}}}. That sounds more of an optimization to save an event sending when possible. > piggyback finishedSplitIds in RequestSplitEvent > --- > > Key: FLINK-21364 > URL: https://issues.apache.org/jira/browse/FLINK-21364 > Project: Flink > Issue Type: Improvement > Components: Connectors / Common >Affects Versions: 1.12.1 >Reporter: Steven Zhen Wu >Priority: Minor > Labels: auto-deprioritized-major, pull-request-available > > For some split assignment strategy, the enumerator/assigner needs to track > the completed splits to advance watermark for event time alignment or rough > ordering. Right now, `RequestSplitEvent` for FLIP-27 source doesn't support > pass-along of the `finishedSplitIds` info and hence we have to create our own > custom source event type for Iceberg source. > Here is the proposal of add such optional info to `RequestSplitEvent`. > {code} > public RequestSplitEvent( > @Nullable String hostName, > @Nullable Collection finishedSplitIds) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-21364) piggyback finishedSplitIds in RequestSplitEvent
[ https://issues.apache.org/jira/browse/FLINK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501576#comment-17501576 ] Steven Zhen Wu commented on FLINK-21364: actually, I think we should look into FLIP-182 FLINK-18450 first and see if Iceberg source can use the same mechanism to watermark alignment. > piggyback finishedSplitIds in RequestSplitEvent > --- > > Key: FLINK-21364 > URL: https://issues.apache.org/jira/browse/FLINK-21364 > Project: Flink > Issue Type: Improvement > Components: Connectors / Common >Affects Versions: 1.12.1 >Reporter: Steven Zhen Wu >Priority: Minor > Labels: auto-deprioritized-major, pull-request-available > > For some split assignment strategy, the enumerator/assigner needs to track > the completed splits to advance watermark for event time alignment or rough > ordering. Right now, `RequestSplitEvent` for FLIP-27 source doesn't support > pass-along of the `finishedSplitIds` info and hence we have to create our own > custom source event type for Iceberg source. > Here is the proposal of add such optional info to `RequestSplitEvent`. > {code} > public RequestSplitEvent( > @Nullable String hostName, > @Nullable Collection finishedSplitIds) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-21364) piggyback finishedSplitIds in RequestSplitEvent
[ https://issues.apache.org/jira/browse/FLINK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501528#comment-17501528 ] Steven Zhen Wu commented on FLINK-21364: [~jqin] [~thw] [~pnowojski] I like to restart this discussion. Besides the proposal in this PR, we can also go with the other approach that Thomas mentioned where we create a separate `FinishedSplitsEvent`. Please let me know your thoughts. > piggyback finishedSplitIds in RequestSplitEvent > --- > > Key: FLINK-21364 > URL: https://issues.apache.org/jira/browse/FLINK-21364 > Project: Flink > Issue Type: Improvement > Components: Connectors / Common >Affects Versions: 1.12.1 >Reporter: Steven Zhen Wu >Priority: Minor > Labels: auto-deprioritized-major, pull-request-available > > For some split assignment strategy, the enumerator/assigner needs to track > the completed splits to advance watermark for event time alignment or rough > ordering. Right now, `RequestSplitEvent` for FLIP-27 source doesn't support > pass-along of the `finishedSplitIds` info and hence we have to create our own > custom source event type for Iceberg source. > Here is the proposal of add such optional info to `RequestSplitEvent`. > {code} > public RequestSplitEvent( > @Nullable String hostName, > @Nullable Collection finishedSplitIds) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-21364) piggyback finishedSplitIds in RequestSplitEvent
[ https://issues.apache.org/jira/browse/FLINK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501525#comment-17501525 ] Steven Zhen Wu commented on FLINK-21364: [~pnowojski] sorry for missed you message earlier. Yeah, the motivation is for watermark alignment for Iceberg source, where the watermark alignment is happening in the enumerator. Hence the enumerator needs to know which files/splits are completed and decides whether to advance watermark or not. https://github.com/stevenzwu/iceberg/blob/flip27IcebergSource/flink/v1.13/flink/src/main/java/org/apache/iceberg/flink/source/enumerator/AbstractIcebergEnumerator.java#L89 For sources with unbounded split (like Kafka), I believe the watermark alignment is done at readers side. > piggyback finishedSplitIds in RequestSplitEvent > --- > > Key: FLINK-21364 > URL: https://issues.apache.org/jira/browse/FLINK-21364 > Project: Flink > Issue Type: Improvement > Components: Connectors / Common >Affects Versions: 1.12.1 >Reporter: Steven Zhen Wu >Priority: Minor > Labels: auto-deprioritized-major, pull-request-available > > For some split assignment strategy, the enumerator/assigner needs to track > the completed splits to advance watermark for event time alignment or rough > ordering. Right now, `RequestSplitEvent` for FLIP-27 source doesn't support > pass-along of the `finishedSplitIds` info and hence we have to create our own > custom source event type for Iceberg source. > Here is the proposal of add such optional info to `RequestSplitEvent`. > {code} > public RequestSplitEvent( > @Nullable String hostName, > @Nullable Collection finishedSplitIds) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-21364) piggyback finishedSplitIds in RequestSplitEvent
[ https://issues.apache.org/jira/browse/FLINK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17494536#comment-17494536 ] Piotr Nowojski commented on FLINK-21364: Hi [~stevenz3wu] and [~thw]. Sorry for a late response (usually I'm not involved with sources and connectors, but this issue has popped up while I was searching for something else). What is/was the motivation for this feature request? Watermark alignment? If such, isn't this a duplicate of FLIP-182 FLINK-18450 ? > piggyback finishedSplitIds in RequestSplitEvent > --- > > Key: FLINK-21364 > URL: https://issues.apache.org/jira/browse/FLINK-21364 > Project: Flink > Issue Type: Improvement > Components: Connectors / Common >Affects Versions: 1.12.1 >Reporter: Steven Zhen Wu >Priority: Minor > Labels: auto-deprioritized-major, pull-request-available > > For some split assignment strategy, the enumerator/assigner needs to track > the completed splits to advance watermark for event time alignment or rough > ordering. Right now, `RequestSplitEvent` for FLIP-27 source doesn't support > pass-along of the `finishedSplitIds` info and hence we have to create our own > custom source event type for Iceberg source. > Here is the proposal of add such optional info to `RequestSplitEvent`. > {code} > public RequestSplitEvent( > @Nullable String hostName, > @Nullable Collection finishedSplitIds) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-21364) piggyback finishedSplitIds in RequestSplitEvent
[ https://issues.apache.org/jira/browse/FLINK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335907#comment-17335907 ] Flink Jira Bot commented on FLINK-21364: This issue was labeled "stale-major" 7 ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > piggyback finishedSplitIds in RequestSplitEvent > --- > > Key: FLINK-21364 > URL: https://issues.apache.org/jira/browse/FLINK-21364 > Project: Flink > Issue Type: Improvement > Components: Connectors / Common >Affects Versions: 1.12.1 >Reporter: Steven Zhen Wu >Priority: Major > Labels: pull-request-available, stale-major > > For some split assignment strategy, the enumerator/assigner needs to track > the completed splits to advance watermark for event time alignment or rough > ordering. Right now, `RequestSplitEvent` for FLIP-27 source doesn't support > pass-along of the `finishedSplitIds` info and hence we have to create our own > custom source event type for Iceberg source. > Here is the proposal of add such optional info to `RequestSplitEvent`. > {code} > public RequestSplitEvent( > @Nullable String hostName, > @Nullable Collection finishedSplitIds) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-21364) piggyback finishedSplitIds in RequestSplitEvent
[ https://issues.apache.org/jira/browse/FLINK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17327336#comment-17327336 ] Flink Jira Bot commented on FLINK-21364: This major issue is unassigned and itself and all of its Sub-Tasks have not been updated for 30 days. So, it has been labeled "stale-major". If this ticket is indeed "major", please either assign yourself or give an update. Afterwards, please remove the label. In 7 days the issue will be deprioritized. > piggyback finishedSplitIds in RequestSplitEvent > --- > > Key: FLINK-21364 > URL: https://issues.apache.org/jira/browse/FLINK-21364 > Project: Flink > Issue Type: Improvement > Components: Connectors / Common >Affects Versions: 1.12.1 >Reporter: Steven Zhen Wu >Priority: Major > Labels: pull-request-available, stale-major > > For some split assignment strategy, the enumerator/assigner needs to track > the completed splits to advance watermark for event time alignment or rough > ordering. Right now, `RequestSplitEvent` for FLIP-27 source doesn't support > pass-along of the `finishedSplitIds` info and hence we have to create our own > custom source event type for Iceberg source. > Here is the proposal of add such optional info to `RequestSplitEvent`. > {code} > public RequestSplitEvent( > @Nullable String hostName, > @Nullable Collection finishedSplitIds) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-21364) piggyback finishedSplitIds in RequestSplitEvent
[ https://issues.apache.org/jira/browse/FLINK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17284024#comment-17284024 ] Steven Zhen Wu commented on FLINK-21364: For pull based source (like file/Iceberg), it is probably more natural/efficient to piggyback the `finishedSplitsIds` in the `RequestSplitEvent`. A reader request a new split when the current split is done. It doesn't mean that a reader has to request for a new split when finishing some splits, like bounded Kafka source case. You have a good point that some sources (like Kafka/Kineses) may still need to communicate the watermark info to coordinator/enumerator. In this case, it definitely will be a separate type of event (like `WatermarkUpdateEvent`). In our Iceberg source use cases, readers didn't actually report watermark. They just need to report which split are finished. But I can see that this may not be a very generic scenario to change the `RequestSplitEvent` in flink-runtime. cc [~sundaram] > piggyback finishedSplitIds in RequestSplitEvent > --- > > Key: FLINK-21364 > URL: https://issues.apache.org/jira/browse/FLINK-21364 > Project: Flink > Issue Type: Improvement > Components: Connectors / Common >Affects Versions: 1.12.1 >Reporter: Steven Zhen Wu >Priority: Major > Labels: pull-request-available > > For some split assignment strategy, the enumerator/assigner needs to track > the completed splits to advance watermark for event time alignment or rough > ordering. Right now, `RequestSplitEvent` for FLIP-27 source doesn't support > pass-along of the `finishedSplitIds` info and hence we have to create our own > custom source event type for Iceberg source. > Here is the proposal of add such optional info to `RequestSplitEvent`. > {code} > public RequestSplitEvent( > @Nullable String hostName, > @Nullable Collection finishedSplitIds) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-21364) piggyback finishedSplitIds in RequestSplitEvent
[ https://issues.apache.org/jira/browse/FLINK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283981#comment-17283981 ] Thomas Weise commented on FLINK-21364: -- I wonder if finished splits should be communicated separately to the enumerator? Theoretically splits could finish and the reader not (yet) request new splits. Regarding the event time alignment: For the file source the split boundary may provide sufficient granularity. But for sources like Kafka and Kinesis where readers work on their splits "forever", this won't be the case. There would need to be a different mechanism to synchronize. In the old Kinesis source that is accomplished by exchanging the actual watermark information. > piggyback finishedSplitIds in RequestSplitEvent > --- > > Key: FLINK-21364 > URL: https://issues.apache.org/jira/browse/FLINK-21364 > Project: Flink > Issue Type: Improvement > Components: Connectors / Common >Affects Versions: 1.12.1 >Reporter: Steven Zhen Wu >Priority: Major > Labels: pull-request-available > > For some split assignment strategy, the enumerator/assigner needs to track > the completed splits to advance watermark for event time alignment or rough > ordering. Right now, `RequestSplitEvent` for FLIP-27 source doesn't support > pass-along of the `finishedSplitIds` info and hence we have to create our own > custom source event type for Iceberg source. > Here is the proposal of add such optional info to `RequestSplitEvent`. > {code} > public RequestSplitEvent( > @Nullable String hostName, > @Nullable Collection finishedSplitIds) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-21364) piggyback finishedSplitIds in RequestSplitEvent
[ https://issues.apache.org/jira/browse/FLINK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283152#comment-17283152 ] Steven Zhen Wu commented on FLINK-21364: cc [~sewen] [~jqin] [~thomasWeise] > piggyback finishedSplitIds in RequestSplitEvent > --- > > Key: FLINK-21364 > URL: https://issues.apache.org/jira/browse/FLINK-21364 > Project: Flink > Issue Type: Improvement > Components: Connectors / Common >Affects Versions: 1.12.1 >Reporter: Steven Zhen Wu >Priority: Major > > For some split assignment strategy, the enumerator/assigner needs to track > the completed splits to advance watermark for event time alignment or rough > ordering. Right now, `RequestSplitEvent` for FLIP-27 source doesn't support > pass-along of the `finishedSplitIds` info and hence we have to create our own > custom source event type for Iceberg source. > Here is the proposal of add such optional info to `RequestSplitEvent`. > ``` > public RequestSplitEvent( > @Nullable String hostName, > @Nullable Collection finishedSplitIds) > ``` -- This message was sent by Atlassian Jira (v8.3.4#803005)