[jira] [Commented] (FLINK-21364) piggyback finishedSplitIds in RequestSplitEvent

2022-03-08 Thread Jiangjie Qin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503240#comment-17503240
 ] 

Jiangjie Qin commented on FLINK-21364:
--

[~stevenz3wu] Sorry for the late response. I somehow missed the notification 
email. Personally I feel that we should probably have a separate 
{{FinishedSplitsEvent}} regardless whether we piggyback the finishedSplitIds in 
the {{{}RequestSplitEvent{}}}. That sounds more of an optimization to save an 
event sending when possible.

> piggyback finishedSplitIds in RequestSplitEvent
> ---
>
> Key: FLINK-21364
> URL: https://issues.apache.org/jira/browse/FLINK-21364
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Common
>Affects Versions: 1.12.1
>Reporter: Steven Zhen Wu
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available
>
> For some split assignment strategy, the enumerator/assigner needs to track 
> the completed splits to advance watermark for event time alignment or rough 
> ordering. Right now, `RequestSplitEvent` for FLIP-27 source doesn't support 
> pass-along of the `finishedSplitIds` info and hence we have to create our own 
> custom source event type for Iceberg source. 
> Here is the proposal of add such optional info to `RequestSplitEvent`.
> {code}
> public RequestSplitEvent(
> @Nullable String hostName, 
> @Nullable Collection finishedSplitIds)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-21364) piggyback finishedSplitIds in RequestSplitEvent

2022-03-04 Thread Steven Zhen Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501576#comment-17501576
 ] 

Steven Zhen Wu commented on FLINK-21364:


actually, I think we should look into FLIP-182 FLINK-18450 first and see if 
Iceberg source can use the same mechanism to watermark alignment.

> piggyback finishedSplitIds in RequestSplitEvent
> ---
>
> Key: FLINK-21364
> URL: https://issues.apache.org/jira/browse/FLINK-21364
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Common
>Affects Versions: 1.12.1
>Reporter: Steven Zhen Wu
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available
>
> For some split assignment strategy, the enumerator/assigner needs to track 
> the completed splits to advance watermark for event time alignment or rough 
> ordering. Right now, `RequestSplitEvent` for FLIP-27 source doesn't support 
> pass-along of the `finishedSplitIds` info and hence we have to create our own 
> custom source event type for Iceberg source. 
> Here is the proposal of add such optional info to `RequestSplitEvent`.
> {code}
> public RequestSplitEvent(
> @Nullable String hostName, 
> @Nullable Collection finishedSplitIds)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-21364) piggyback finishedSplitIds in RequestSplitEvent

2022-03-04 Thread Steven Zhen Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501528#comment-17501528
 ] 

Steven Zhen Wu commented on FLINK-21364:


[~jqin] [~thw] [~pnowojski] I like to restart this discussion. Besides the 
proposal in this PR, we can also go with the other approach that Thomas 
mentioned where we create a separate `FinishedSplitsEvent`. Please let me know 
your thoughts.

> piggyback finishedSplitIds in RequestSplitEvent
> ---
>
> Key: FLINK-21364
> URL: https://issues.apache.org/jira/browse/FLINK-21364
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Common
>Affects Versions: 1.12.1
>Reporter: Steven Zhen Wu
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available
>
> For some split assignment strategy, the enumerator/assigner needs to track 
> the completed splits to advance watermark for event time alignment or rough 
> ordering. Right now, `RequestSplitEvent` for FLIP-27 source doesn't support 
> pass-along of the `finishedSplitIds` info and hence we have to create our own 
> custom source event type for Iceberg source. 
> Here is the proposal of add such optional info to `RequestSplitEvent`.
> {code}
> public RequestSplitEvent(
> @Nullable String hostName, 
> @Nullable Collection finishedSplitIds)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-21364) piggyback finishedSplitIds in RequestSplitEvent

2022-03-04 Thread Steven Zhen Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501525#comment-17501525
 ] 

Steven Zhen Wu commented on FLINK-21364:


[~pnowojski] sorry for missed you message earlier.

Yeah, the motivation is for watermark alignment for Iceberg source, where the 
watermark alignment is happening in the enumerator. Hence the enumerator needs 
to know which files/splits are completed and decides whether to advance 
watermark or not.
https://github.com/stevenzwu/iceberg/blob/flip27IcebergSource/flink/v1.13/flink/src/main/java/org/apache/iceberg/flink/source/enumerator/AbstractIcebergEnumerator.java#L89

For sources with unbounded split (like Kafka), I believe the watermark 
alignment is done at readers side.

> piggyback finishedSplitIds in RequestSplitEvent
> ---
>
> Key: FLINK-21364
> URL: https://issues.apache.org/jira/browse/FLINK-21364
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Common
>Affects Versions: 1.12.1
>Reporter: Steven Zhen Wu
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available
>
> For some split assignment strategy, the enumerator/assigner needs to track 
> the completed splits to advance watermark for event time alignment or rough 
> ordering. Right now, `RequestSplitEvent` for FLIP-27 source doesn't support 
> pass-along of the `finishedSplitIds` info and hence we have to create our own 
> custom source event type for Iceberg source. 
> Here is the proposal of add such optional info to `RequestSplitEvent`.
> {code}
> public RequestSplitEvent(
> @Nullable String hostName, 
> @Nullable Collection finishedSplitIds)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-21364) piggyback finishedSplitIds in RequestSplitEvent

2022-02-18 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17494536#comment-17494536
 ] 

Piotr Nowojski commented on FLINK-21364:


Hi [~stevenz3wu] and [~thw]. Sorry for a late response (usually I'm not 
involved with sources and connectors, but this issue has popped up while I was 
searching for something else). What is/was the motivation for this feature 
request? Watermark alignment? If such, isn't this a duplicate of FLIP-182 
FLINK-18450 ?

> piggyback finishedSplitIds in RequestSplitEvent
> ---
>
> Key: FLINK-21364
> URL: https://issues.apache.org/jira/browse/FLINK-21364
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Common
>Affects Versions: 1.12.1
>Reporter: Steven Zhen Wu
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available
>
> For some split assignment strategy, the enumerator/assigner needs to track 
> the completed splits to advance watermark for event time alignment or rough 
> ordering. Right now, `RequestSplitEvent` for FLIP-27 source doesn't support 
> pass-along of the `finishedSplitIds` info and hence we have to create our own 
> custom source event type for Iceberg source. 
> Here is the proposal of add such optional info to `RequestSplitEvent`.
> {code}
> public RequestSplitEvent(
> @Nullable String hostName, 
> @Nullable Collection finishedSplitIds)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-21364) piggyback finishedSplitIds in RequestSplitEvent

2021-04-29 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335907#comment-17335907
 ] 

Flink Jira Bot commented on FLINK-21364:


This issue was labeled "stale-major" 7 ago and has not received any updates so 
it is being deprioritized. If this ticket is actually Major, please raise the 
priority and ask a committer to assign you the issue or revive the public 
discussion.


> piggyback finishedSplitIds in RequestSplitEvent
> ---
>
> Key: FLINK-21364
> URL: https://issues.apache.org/jira/browse/FLINK-21364
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Common
>Affects Versions: 1.12.1
>Reporter: Steven Zhen Wu
>Priority: Major
>  Labels: pull-request-available, stale-major
>
> For some split assignment strategy, the enumerator/assigner needs to track 
> the completed splits to advance watermark for event time alignment or rough 
> ordering. Right now, `RequestSplitEvent` for FLIP-27 source doesn't support 
> pass-along of the `finishedSplitIds` info and hence we have to create our own 
> custom source event type for Iceberg source. 
> Here is the proposal of add such optional info to `RequestSplitEvent`.
> {code}
> public RequestSplitEvent(
> @Nullable String hostName, 
> @Nullable Collection finishedSplitIds)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-21364) piggyback finishedSplitIds in RequestSplitEvent

2021-04-22 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17327336#comment-17327336
 ] 

Flink Jira Bot commented on FLINK-21364:


This major issue is unassigned and itself and all of its Sub-Tasks have not 
been updated for 30 days. So, it has been labeled "stale-major". If this ticket 
is indeed "major", please either assign yourself or give an update. Afterwards, 
please remove the label. In 7 days the issue will be deprioritized.

> piggyback finishedSplitIds in RequestSplitEvent
> ---
>
> Key: FLINK-21364
> URL: https://issues.apache.org/jira/browse/FLINK-21364
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Common
>Affects Versions: 1.12.1
>Reporter: Steven Zhen Wu
>Priority: Major
>  Labels: pull-request-available, stale-major
>
> For some split assignment strategy, the enumerator/assigner needs to track 
> the completed splits to advance watermark for event time alignment or rough 
> ordering. Right now, `RequestSplitEvent` for FLIP-27 source doesn't support 
> pass-along of the `finishedSplitIds` info and hence we have to create our own 
> custom source event type for Iceberg source. 
> Here is the proposal of add such optional info to `RequestSplitEvent`.
> {code}
> public RequestSplitEvent(
> @Nullable String hostName, 
> @Nullable Collection finishedSplitIds)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-21364) piggyback finishedSplitIds in RequestSplitEvent

2021-02-12 Thread Steven Zhen Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17284024#comment-17284024
 ] 

Steven Zhen Wu commented on FLINK-21364:


For pull based source (like file/Iceberg), it is probably more 
natural/efficient to piggyback the `finishedSplitsIds` in the 
`RequestSplitEvent`. A reader request a new split when the current split is 
done.

It doesn't mean that a reader has to request for a new split when finishing 
some splits, like bounded Kafka source case.

You have a good point that some sources (like Kafka/Kineses) may still need to 
communicate the watermark info to coordinator/enumerator. In this case, it 
definitely will be a separate type of event (like `WatermarkUpdateEvent`). 

In our Iceberg source use cases, readers didn't actually report watermark. They 
just need to report which split are finished. But I can see that this may not 
be a very generic scenario to change the `RequestSplitEvent` in flink-runtime.

cc [~sundaram]

> piggyback finishedSplitIds in RequestSplitEvent
> ---
>
> Key: FLINK-21364
> URL: https://issues.apache.org/jira/browse/FLINK-21364
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Common
>Affects Versions: 1.12.1
>Reporter: Steven Zhen Wu
>Priority: Major
>  Labels: pull-request-available
>
> For some split assignment strategy, the enumerator/assigner needs to track 
> the completed splits to advance watermark for event time alignment or rough 
> ordering. Right now, `RequestSplitEvent` for FLIP-27 source doesn't support 
> pass-along of the `finishedSplitIds` info and hence we have to create our own 
> custom source event type for Iceberg source. 
> Here is the proposal of add such optional info to `RequestSplitEvent`.
> {code}
> public RequestSplitEvent(
> @Nullable String hostName, 
> @Nullable Collection finishedSplitIds)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-21364) piggyback finishedSplitIds in RequestSplitEvent

2021-02-12 Thread Thomas Weise (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283981#comment-17283981
 ] 

Thomas Weise commented on FLINK-21364:
--

I wonder if finished splits should be communicated separately to the 
enumerator? Theoretically splits could finish and the reader not (yet) request 
new splits.

Regarding the event time alignment: For the file source the split boundary may 
provide sufficient granularity. But for sources like Kafka and Kinesis where 
readers work on their splits "forever", this won't be the case. There would 
need to be a different mechanism to synchronize. In the old Kinesis source that 
is accomplished by exchanging the actual watermark information. 

> piggyback finishedSplitIds in RequestSplitEvent
> ---
>
> Key: FLINK-21364
> URL: https://issues.apache.org/jira/browse/FLINK-21364
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Common
>Affects Versions: 1.12.1
>Reporter: Steven Zhen Wu
>Priority: Major
>  Labels: pull-request-available
>
> For some split assignment strategy, the enumerator/assigner needs to track 
> the completed splits to advance watermark for event time alignment or rough 
> ordering. Right now, `RequestSplitEvent` for FLIP-27 source doesn't support 
> pass-along of the `finishedSplitIds` info and hence we have to create our own 
> custom source event type for Iceberg source. 
> Here is the proposal of add such optional info to `RequestSplitEvent`.
> {code}
> public RequestSplitEvent(
> @Nullable String hostName, 
> @Nullable Collection finishedSplitIds)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-21364) piggyback finishedSplitIds in RequestSplitEvent

2021-02-11 Thread Steven Zhen Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283152#comment-17283152
 ] 

Steven Zhen Wu commented on FLINK-21364:


cc [~sewen] [~jqin] [~thomasWeise]

> piggyback finishedSplitIds in RequestSplitEvent
> ---
>
> Key: FLINK-21364
> URL: https://issues.apache.org/jira/browse/FLINK-21364
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Common
>Affects Versions: 1.12.1
>Reporter: Steven Zhen Wu
>Priority: Major
>
> For some split assignment strategy, the enumerator/assigner needs to track 
> the completed splits to advance watermark for event time alignment or rough 
> ordering. Right now, `RequestSplitEvent` for FLIP-27 source doesn't support 
> pass-along of the `finishedSplitIds` info and hence we have to create our own 
> custom source event type for Iceberg source. 
> Here is the proposal of add such optional info to `RequestSplitEvent`.
> ```
> public RequestSplitEvent(
> @Nullable String hostName, 
> @Nullable Collection finishedSplitIds)
> ```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)