[jira] [Updated] (FLINK-29852) Adaptive Scheduler duplicates operators for each parallel instance in the Web UI

2023-03-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FLINK-29852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Morávek updated FLINK-29852:
--
Summary: Adaptive Scheduler duplicates operators for each parallel instance 
in the Web UI  (was: The operator is repeatedly displayed on the Flink Web UI)

> Adaptive Scheduler duplicates operators for each parallel instance in the Web 
> UI
> 
>
> Key: FLINK-29852
> URL: https://issues.apache.org/jira/browse/FLINK-29852
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Runtime / Web Frontend
>Affects Versions: 1.16.0
> Environment: Flink 1.16.0
>Reporter: JasonLee
>Assignee: Weihua Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0, 1.16.2
>
> Attachments: image-2022-11-02-23-57-39-387.png, 
> image-2022-11-09-16-09-44-233.png, image-2022-11-09-17-32-27-377.png
>
>
> All the operators in the DAG are shown repeatedly
> !image-2022-11-02-23-57-39-387.png!
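
The duplication is explained by the fix in PR #22066 (reviewed later in this
thread): the Adaptive Scheduler built the JSON plan from
getAllExecutionVertices(), which yields one entry per parallel subtask. A rough
sketch of the corrected derivation, based on that diff (assuming Flink's
runtime types ExecutionGraph, ExecutionJobVertex, and JobVertex):

{code}
import java.util.Iterator;
import java.util.stream.StreamSupport;

import org.apache.flink.runtime.executiongraph.ExecutionJobVertex;
import org.apache.flink.runtime.jobgraph.JobVertex;

// Derive the plan from job vertices (one per operator chain) rather than from
// execution vertices (one per parallel subtask), which duplicated every
// operator in the Web UI.
Iterator<JobVertex> planVertices =
        StreamSupport.stream(
                        executionGraph.getVerticesTopologically().spliterator(), false)
                .map(ExecutionJobVertex::getJobVertex)
                .iterator();
{code}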



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-29852) The operator is repeatedly displayed on the Flink Web UI

2023-03-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FLINK-29852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Morávek updated FLINK-29852:
--
Component/s: Runtime / Coordination

> The operator is repeatedly displayed on the Flink Web UI
> 
>
> Key: FLINK-29852
> URL: https://issues.apache.org/jira/browse/FLINK-29852
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Runtime / Web Frontend
>Affects Versions: 1.16.0
> Environment: Flink 1.16.0
>Reporter: JasonLee
>Assignee: Weihua Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0, 1.16.2
>
> Attachments: image-2022-11-02-23-57-39-387.png, 
> image-2022-11-09-16-09-44-233.png, image-2022-11-09-17-32-27-377.png
>
>
> All the operators in the DAG are shown repeatedly
> !image-2022-11-02-23-57-39-387.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-29852) The operator is repeatedly displayed on the Flink Web UI

2023-03-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FLINK-29852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Morávek updated FLINK-29852:
--
Priority: Major  (was: Critical)

> The operator is repeatedly displayed on the Flink Web UI
> 
>
> Key: FLINK-29852
> URL: https://issues.apache.org/jira/browse/FLINK-29852
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.16.0
> Environment: Flink 1.16.0
>Reporter: JasonLee
>Assignee: Weihua Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0, 1.16.2
>
> Attachments: image-2022-11-02-23-57-39-387.png, 
> image-2022-11-09-16-09-44-233.png, image-2022-11-09-17-32-27-377.png
>
>
> All the operators in the DAG are shown repeatedly
> !image-2022-11-02-23-57-39-387.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-web] dannycranmer commented on a diff in pull request #611: Flink 1.15.4

2023-03-01 Thread via GitHub


dannycranmer commented on code in PR #611:
URL: https://github.com/apache/flink-web/pull/611#discussion_r1122714090


##
docs/content/posts/2023-02-23-release-1.15.4.md:
##
@@ -0,0 +1,144 @@
+---
+authors:
+- danny: null
+  name: Danny Cranmer
+date: "2023-02-23T17:00:00Z"
+excerpt: The Apache Flink Community is pleased to announce a bug fix release 
for Flink
+  1.15.
+title: Apache Flink 1.15.4 Release Announcement
+---
+
+The Apache Flink Community is pleased to announce the fourth bug fix release 
of the Flink 1.15 series.
+
+This release includes 46 bug fixes, vulnerability fixes, and minor 
improvements for Flink 1.15.

Review Comment:
   > Checking the [JIRA 
list](https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20fixVersion%20%3D%201.15.4),
 there are 49 issues in total with 1.15.4 as FixVersion, and 48 are resolved. 
Please check and update the list here.
   
   [FLINK-31272](https://issues.apache.org/jira/browse/FLINK-31272) and 
[FLINK-31286](https://issues.apache.org/jira/browse/FLINK-31286) had been 
merged after the release branch was cut. I have bumped them to 1.15.5. Now I 
see [46](https://issues.apache.org/jira/projects/FLINK/versions/12352526) 
issues in the release again. FYI there is a step in the release process to 
check this later. 
   
   > Related, [FLINK-31133](https://issues.apache.org/jira/browse/FLINK-31133) 
is still not resolved, and we need to confirm whether it should be excluded. 
I've also left a comment on the JIRA issue.
   
   [FLINK-31133](https://issues.apache.org/jira/browse/FLINK-31133) has been 
punted to 1.15.5 already.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-29852) The operator is repeatedly displayed on the Flink Web UI

2023-03-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FLINK-29852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Morávek updated FLINK-29852:
--
Fix Version/s: 1.17.0
   1.16.2

> The operator is repeatedly displayed on the Flink Web UI
> 
>
> Key: FLINK-29852
> URL: https://issues.apache.org/jira/browse/FLINK-29852
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.16.0
> Environment: Flink 1.16.0
>Reporter: JasonLee
>Assignee: Weihua Hu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.17.0, 1.16.2
>
> Attachments: image-2022-11-02-23-57-39-387.png, 
> image-2022-11-09-16-09-44-233.png, image-2022-11-09-17-32-27-377.png
>
>
> All the operators in the DAG are shown repeatedly
> !image-2022-11-02-23-57-39-387.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] flinkbot commented on pull request #22067: [FLINK-31288][runtime] Disable overdraft buffer for non pipelined result partition

2023-03-01 Thread via GitHub


flinkbot commented on PR #22067:
URL: https://github.com/apache/flink/pull/22067#issuecomment-1451441513

   
   ## CI report:
   
   * dc1a58eb2e7966be6aed8b3cf3d21b047b5b4e53 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-30517) Decrease log output interval while waiting for the YARN JobManager to be allocated

2023-03-01 Thread Weijie Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weijie Guo closed FLINK-30517.
--
Fix Version/s: 1.17.0
   Resolution: Fixed

master (1.18) via 1993ad0c968efa60ad5071e8472f630b247251f0.

release-1.17 via 30e32eaecf926b8a32916f0b614c1a57dc58bf22.

> Decrease log output interval while waiting for the YARN JobManager to be allocated
> ---
>
> Key: FLINK-30517
> URL: https://issues.apache.org/jira/browse/FLINK-30517
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.16.0
>Reporter: Weihua Hu
>Assignee: Weihua Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0
>
> Attachments: image-2022-12-28-15-56-56-045.png
>
>
> The Flink client retrieves the application status every 250ms after 
> submitting to YARN. 
> If the JobManager does not start within 60 seconds, it logs "Deployment took more 
> than 60 seconds. Please check if the requested resources are available in the 
> YARN cluster" every 250ms. This leads to far too many log entries. 
> We can keep the check interval at 250ms, but log the message every 1 minute.
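
A minimal sketch of the proposed throttling (names such as
isJobManagerAllocated() and LOG are placeholders, not Flink's actual code):

{code}
// Keep polling YARN every 250 ms, but emit the "deployment is slow" warning
// at most once per minute instead of on every poll.
// (InterruptedException handling elided in this sketch.)
final long checkIntervalMs = 250L;
final long logIntervalMs = 60_000L;
final long start = System.currentTimeMillis();
long lastLog = 0L;

while (!isJobManagerAllocated()) {  // hypothetical status check
    long now = System.currentTimeMillis();
    if (now - start > 60_000L && now - lastLog >= logIntervalMs) {
        LOG.info(
                "Deployment took more than 60 seconds. Please check if the "
                        + "requested resources are available in the YARN cluster");
        lastLog = now;
    }
    Thread.sleep(checkIntervalMs);
}
{code}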



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-30517) Decrease log output interval while waiting for the YARN JobManager to be allocated

2023-03-01 Thread Weijie Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weijie Guo reassigned FLINK-30517:
--

Assignee: Weihua Hu

> Decrease log output interval while waiting for the YARN JobManager to be allocated
> ---
>
> Key: FLINK-30517
> URL: https://issues.apache.org/jira/browse/FLINK-30517
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.16.0
>Reporter: Weihua Hu
>Assignee: Weihua Hu
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2022-12-28-15-56-56-045.png
>
>
> The Flink client retrieves the application status every 250ms after 
> submitting to YARN. 
> If the JobManager does not start within 60 seconds, it logs "Deployment took more 
> than 60 seconds. Please check if the requested resources are available in the 
> YARN cluster" every 250ms. This leads to far too many log entries. 
> We can keep the check interval at 250ms, but log the message every 1 minute.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-29852) The operator is repeatedly displayed on the Flink Web UI

2023-03-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FLINK-29852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Morávek reassigned FLINK-29852:
-

Assignee: Weihua Hu

> The operator is repeatedly displayed on the Flink Web UI
> 
>
> Key: FLINK-29852
> URL: https://issues.apache.org/jira/browse/FLINK-29852
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.16.0
> Environment: Flink 1.16.0
>Reporter: JasonLee
>Assignee: Weihua Hu
>Priority: Critical
>  Labels: pull-request-available
> Attachments: image-2022-11-02-23-57-39-387.png, 
> image-2022-11-09-16-09-44-233.png, image-2022-11-09-17-32-27-377.png
>
>
> All the operators in the DAG are shown repeatedly
> !image-2022-11-02-23-57-39-387.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31286) Python processes are still alive when shutting down a session cluster directly without stopping the jobs

2023-03-01 Thread Danny Cranmer (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Cranmer updated FLINK-31286:
--
Fix Version/s: 1.15.5
   (was: 1.15.4)

> Python processes are still alive when shutting down a session cluster 
> directly without stopping the jobs
> 
>
> Key: FLINK-31286
> URL: https://issues.apache.org/jira/browse/FLINK-31286
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Reporter: Dian Fu
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0, 1.16.2, 1.15.5
>
> Attachments: image-2023-03-02-10-55-34-863.png
>
>
> Steps to reproduce:
> 1) Start a standalone cluster: ./bin/start-cluster.sh
> 2) Submit a PyFlink job which contains Python UDFs
> 3) Stop the cluster: ./bin/stop-cluster.sh
> 4) Check whether the Python processes still exist: ps aux | grep -i beam_boot
> !image-2023-03-02-10-55-34-863.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31272) Duplicate operators appear in the StreamGraph for Python DataStream API jobs

2023-03-01 Thread Danny Cranmer (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Cranmer updated FLINK-31272:
--
Fix Version/s: 1.15.5
   (was: 1.15.4)

> Duplicate operators appear in the StreamGraph for Python DataStream API jobs
> 
>
> Key: FLINK-31272
> URL: https://issues.apache.org/jira/browse/FLINK-31272
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.15.0
>Reporter: Dian Fu
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0, 1.16.2, 1.15.5
>
>
> For the following job:
> {code}
> import argparse
> import json
> import sys
> import time
> from typing import Iterable, cast
>
> from pyflink.common import Types, Time, Encoder
> from pyflink.datastream import StreamExecutionEnvironment, ProcessWindowFunction, \
>     EmbeddedRocksDBStateBackend, PredefinedOptions, FileSystemCheckpointStorage, \
>     CheckpointingMode, ExternalizedCheckpointCleanup
> from pyflink.datastream.connectors.file_system import FileSink, RollingPolicy, OutputFileConfig
> from pyflink.datastream.state import ReducingState, ReducingStateDescriptor
> from pyflink.datastream.window import TimeWindow, Trigger, TriggerResult, T, \
>     TumblingProcessingTimeWindows, ProcessingTimeTrigger
>
>
> class CountWithProcessTimeoutTrigger(ProcessingTimeTrigger):
>
>     def __init__(self, window_size: int):
>         self._window_size = window_size
>         self._count_state_descriptor = ReducingStateDescriptor(
>             "count", lambda a, b: a + b, Types.LONG())
>
>     @staticmethod
>     def of(window_size: int) -> 'CountWithProcessTimeoutTrigger':
>         return CountWithProcessTimeoutTrigger(window_size)
>
>     def on_element(self,
>                    element: T,
>                    timestamp: int,
>                    window: TimeWindow,
>                    ctx: 'Trigger.TriggerContext') -> TriggerResult:
>         count_state = cast(ReducingState,
>                            ctx.get_partitioned_state(self._count_state_descriptor))
>         count_state.add(1)
>         # print("element arrive:", element, "count_state:", count_state.get(),
>         #       window.max_timestamp(), ctx.get_current_watermark())
>         if count_state.get() >= self._window_size:  # must fire
>             print("fire element count", element, count_state.get(), window.max_timestamp(),
>                   ctx.get_current_watermark())
>             count_state.clear()
>             return TriggerResult.FIRE_AND_PURGE
>         if timestamp >= window.end:
>             count_state.clear()
>             return TriggerResult.FIRE_AND_PURGE
>         else:
>             return TriggerResult.CONTINUE
>
>     def on_processing_time(self,
>                            timestamp: int,
>                            window: TimeWindow,
>                            ctx: Trigger.TriggerContext) -> TriggerResult:
>         if timestamp >= window.end:
>             return TriggerResult.CONTINUE
>         else:
>             print("fire with process_time:", timestamp)
>             count_state = cast(ReducingState,
>                                ctx.get_partitioned_state(self._count_state_descriptor))
>             count_state.clear()
>             return TriggerResult.FIRE_AND_PURGE
>
>     def on_event_time(self,
>                       timestamp: int,
>                       window: TimeWindow,
>                       ctx: 'Trigger.TriggerContext') -> TriggerResult:
>         return TriggerResult.CONTINUE
>
>     def clear(self,
>               window: TimeWindow,
>               ctx: 'Trigger.TriggerContext') -> None:
>         count_state = ctx.get_partitioned_state(self._count_state_descriptor)
>         count_state.clear()
>
>
> def to_dict_map(v):
>     time.sleep(1)
>     dict_value = json.loads(v)
>     return dict_value
>
>
> def get_group_key(value, keys):
>     group_key_values = []
>     for key in keys:
>         one_key_value = 'null'
>         if key in value:
>             list_value = value[key]
>             if list_value:
>                 one_key_value = str(list_value[0])
>         group_key_values.append(one_key_value)
>     group_key = '_'.join(group_key_values)
>     # print("group_key=", group_key)
>     return group_key
>
>
> class CountWindowProcessFunction(ProcessWindowFunction[dict, dict, str, TimeWindow]):
>
>     def __init__(self, uf):
>         self._user_function = uf
>
>     def process(self,
>                 key: str,
>                 context: ProcessWindowFunction.Context[TimeWindow],
>                 elements: Iterable[dict]) -> Iterable[dict]:
>         result_list = self._user_function.process_after_group_by_function(elements)
>         return result_list
>
>
> if __name__ == '__main__':
>     parser = 

[GitHub] [flink] dmvk commented on a diff in pull request #22066: [FLINK-29852][Runtime] Fix AdaptiveScheduler add operator repeatedly in json plan

2023-03-01 Thread via GitHub


dmvk commented on code in PR #22066:
URL: https://github.com/apache/flink/pull/22066#discussion_r1122709873


##
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/CreatingExecutionGraph.java:
##
@@ -128,10 +129,10 @@ private void handleExecutionGraphCreation(
 () ->
 StreamSupport.stream(
 executionGraph
-
.getAllExecutionVertices()
+
.getVerticesTopologically()
 .spliterator(),
 false)
-.map(v -> 
v.getJobVertex().getJobVertex())
+
.map(ExecutionJobVertex::getJobVertex)
 .iterator(),

Review Comment:
   nit: it would be nice to mark `updatedPlan` as final along-side the above 
change, to make it consistent with rest of the code



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] reswqa merged pull request #21569: [FLINK-30517][log]Decrease log output interval while waiting for YARN…

2023-03-01 Thread via GitHub


reswqa merged PR #21569:
URL: https://github.com/apache/flink/pull/21569


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-31288) Disable overdraft buffer for batch shuffle

2023-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-31288:
---
Labels: pull-request-available  (was: )

> Disable overdraft buffer for batch shuffle
> --
>
> Key: FLINK-31288
> URL: https://issues.apache.org/jira/browse/FLINK-31288
> Project: Flink
>  Issue Type: Bug
>Reporter: Weijie Guo
>Assignee: Weijie Guo
>Priority: Major
>  Labels: pull-request-available
>
> Only pipelined / pipelined-bounded partition needs overdraft buffer. More 
> specifically, there is no reason to request more buffers for non-pipelined 
> (i.e. batch) shuffle. The reasons are as follows:
>  # For BoundedBlockingShuffle, each full buffer will be directly released.
>  # For SortMergeShuffle, the maximum capacity of buffer pool is 4 * 
> numSubpartitions. It is efficient enough to spill this part of memory to disk.
>  # For Hybrid Shuffle, the buffer pool is unbounded. If it can't get a normal 
> buffer, it also can't get an overdraft buffer.
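
As a rough sketch of the intended behavior (the enum and method below are
illustrative only, not Flink's actual API):

{code}
// Overdraft buffers only help pipelined shuffles; for batch-style and hybrid
// partitions the effective overdraft budget becomes zero.
enum PartitionKind { PIPELINED, PIPELINED_BOUNDED, BOUNDED_BLOCKING, SORT_MERGE, HYBRID }

static int effectiveOverdraftBuffers(PartitionKind kind, int configuredOverdraft) {
    switch (kind) {
        case PIPELINED:
        case PIPELINED_BOUNDED:
            return configuredOverdraft;
        default:
            return 0; // non-pipelined result partitions: overdraft disabled
    }
}
{code}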



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] dmvk commented on a diff in pull request #22066: [FLINK-29852][Runtime] Fix AdaptiveScheduler add operator repeatedly in json plan

2023-03-01 Thread via GitHub


dmvk commented on code in PR #22066:
URL: https://github.com/apache/flink/pull/22066#discussion_r1122707632


##
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/CreatingExecutionGraph.java:
##
@@ -128,10 +129,10 @@ private void handleExecutionGraphCreation(
 () ->
 StreamSupport.stream(
 executionGraph
-
.getAllExecutionVertices()
+
.getVerticesTopologically()
 .spliterator(),
 false)
-.map(v -> 
v.getJobVertex().getJobVertex())
+
.map(ExecutionJobVertex::getJobVertex)
 .iterator(),

Review Comment:
   Can we reuse `IterableUtils` here to make this slightly more readable?
   
   
   ```
   IterableUtils.toStream(
   executionGraph.getVerticesTopologically())
   .map(ExecutionJobVertex::getJobVertex)
   .iterator(),
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] reswqa opened a new pull request, #22067: [FLINK-31288][runtime] Disable overdraft buffer for non pipelined result partition

2023-03-01 Thread via GitHub


reswqa opened a new pull request, #22067:
URL: https://github.com/apache/flink/pull/22067

   ## What is the purpose of the change
   
   *Only pipelined / pipelined-bounded partition needs overdraft buffer. More 
specifically, there is no reason to request more buffers for non-pipelined 
(i.e. batch) shuffle. The reasons are as follows:*
   
- *For BoundedBlockingShuffle, each full buffer will be directly released.*
- *For SortMergeShuffle, the maximum capacity of buffer pool is 4 * 
numSubpartitions. It is efficient enough to spill this part of memory to disk.*
- *For Hybrid Shuffle, the buffer pool is unbounded. If it can't get a 
normal buffer, it also can't get an overdraft buffer.*
   
   
   ## Brief change log
   
 - *Disable overdraft buffer for non pipelined result partition.*
   
   
   ## Verifying this change
   
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-31234) Add an option to redirect stdout/stderr for flink on kubernetes

2023-03-01 Thread Weihua Hu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695527#comment-17695527
 ] 

Weihua Hu commented on FLINK-31234:
---

Hi [~wangyang0918], [~chesnay], looking forward to your opinions.

> Add an option to redirect stdout/stderr for flink on kubernetes
> ---
>
> Key: FLINK-31234
> URL: https://issues.apache.org/jira/browse/FLINK-31234
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes
>Affects Versions: 1.17.0
>Reporter: Weihua Hu
>Priority: Major
> Fix For: 1.18.0
>
>
> Flink on Kubernetes does not support redirecting stdout/stderr to files; this 
> allows users to get logs via "kubectl logs".
> But in our internal scenario, we use a single Kubernetes user to submit all jobs to 
> the k8s cluster and provide a platform for users to submit jobs. Users can't 
> access Kubernetes directly, so we need to display logs/stdout in the Flink Web UI.
> Because the Web UI retrieves the stdout file by a filename that shares its 
> prefix with \{taskmanager}.log (such as 
> flink--kubernetes-taskmanager-0-my-first-flink-cluster-taskmanager-1-4.log), 
> we can't support this with a simple custom image.
> IMO, we should add an option for redirecting stdout/stderr to files. When 
> this is enabled:
> 1. flink-console.sh will redirect stdout/stderr to files.
> 2. flink-console.sh will use log4j.properties as the Log4j configuration to avoid 
> writing logs to both the log file and the stdout file.
> Of course, this option is false by default.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31233) no error should be logged when retrieving the task manager's stdout if it does not exist

2023-03-01 Thread Weihua Hu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695526#comment-17695526
 ] 

Weihua Hu commented on FLINK-31233:
---

Hi [~wangyang0918], [~chesnay], looking forward to your opinions.

> no error should be logged when retrieving the task manager's stdout if it 
> does not exist
> -
>
> Key: FLINK-31233
> URL: https://issues.apache.org/jira/browse/FLINK-31233
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / REST
>Affects Versions: 1.17.0
>Reporter: Weihua Hu
>Priority: Minor
> Fix For: 1.18.0
>
> Attachments: image-2023-02-27-13-56-40-718.png, 
> image-2023-02-27-13-57-27-190.png
>
>
> When running Flink on Kubernetes, the stdout log is not redirected to files, 
> so it will not be shown in the Web UI. This is as expected.
> But the REST API returns “500 Internal error” and an error is logged in 
> jobmanager.log, which is confusing and misleading.
>  
> I think this API should return “404 Not Found” without any error logs, 
> similar to how jobmanager/stdout works. 
>  
> !image-2023-02-27-13-57-27-190.png!
> !image-2023-02-27-13-56-40-718.png!
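
A minimal sketch of the proposed handler behavior (the Response type and the
method are hypothetical, not Flink's actual REST code):

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

record Response(int status, String body) {}  // hypothetical response type

static Response fetchTaskManagerStdout(Path stdoutFile) throws IOException {
    if (!Files.exists(stdoutFile)) {
        // Proposed: a plain 404 Not Found with no ERROR entry in
        // jobmanager.log, mirroring how jobmanager/stdout behaves.
        return new Response(404, "Log file not found");
    }
    // Today a missing file effectively surfaces as a 500 Internal Server
    // Error plus an error log, which is what this issue wants to change.
    return new Response(200, Files.readString(stdoutFile));
}
{code}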



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31288) Disable overdraft buffer for batch shuffle

2023-03-01 Thread Weijie Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weijie Guo updated FLINK-31288:
---
Description: 
Only pipelined / pipelined-bounded partition needs overdraft buffer. More 
specifically, there is no reason to request more buffers for non-pipelined 
(i.e. batch) shuffle. The reasons are as follows:
 # For BoundedBlockingShuffle, each full buffer will be directly released.
 # For SortMergeShuffle, the maximum capacity of buffer pool is 4 * 
numSubpartitions. It is efficient enough to spill this part of memory to disk.
 # For Hybrid Shuffle, the buffer pool is unbounded. If it can't get a normal 
buffer, it also can't get an overdraft buffer.

  was:
Only pipelined / pipelined-bounded partition needs overdraft buffer. More 
specifically, there is no reason to request more buffers for non-pipelined 
(i.e. batch) shuffle. The reasons are as follows:
1. For BoundedBlockingShuffle, each full buffer will be directly released.
2. For SortMergeShuffle, the maximum capacity of buffer pool is 4 * 
numSubpartitions. It is efficient enough to spill this part of memory to disk.
3. For Hybrid Shuffle, the buffer pool is unbounded. If it can't get a normal 
buffer, it also can't get an overdraft buffer.


> Disable overdraft buffer for batch shuffle
> --
>
> Key: FLINK-31288
> URL: https://issues.apache.org/jira/browse/FLINK-31288
> Project: Flink
>  Issue Type: Bug
>Reporter: Weijie Guo
>Assignee: Weijie Guo
>Priority: Major
>
> Only pipelined / pipelined-bounded partition needs overdraft buffer. More 
> specifically, there is no reason to request more buffers for non-pipelined 
> (i.e. batch) shuffle. The reasons are as follows:
>  # For BoundedBlockingShuffle, each full buffer will be directly released.
>  # For SortMergeShuffle, the maximum capacity of buffer pool is 4 * 
> numSubpartitions. It is efficient enough to spill this part of memory to disk.
>  # For Hybrid Shuffle, the buffer pool is unbounded. If it can't get a normal 
> buffer, it also can't get an overdraft buffer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31288) Disable overdraft buffer for batch shuffle

2023-03-01 Thread Weijie Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weijie Guo updated FLINK-31288:
---
Description: 
Only pipelined / pipelined-bounded partition needs overdraft buffer. More 
specifically, there is no reason to request more buffers for non-pipelined 
(i.e. batch) shuffle. The reasons are as follows:
1. For BoundedBlockingShuffle, each full buffer will be directly released.
2. For SortMergeShuffle, the maximum capacity of buffer pool is 4 * 
numSubpartitions. It is efficient enough to spill this part of memory to disk.
3. For Hybrid Shuffle, the buffer pool is unbounded. If it can't get a normal 
buffer, it also can't get an overdraft buffer.

> Disable overdraft buffer for batch shuffle
> --
>
> Key: FLINK-31288
> URL: https://issues.apache.org/jira/browse/FLINK-31288
> Project: Flink
>  Issue Type: Bug
>Reporter: Weijie Guo
>Assignee: Weijie Guo
>Priority: Major
>
> Only pipelined / pipelined-bounded partition needs overdraft buffer. More 
> specifically, there is no reason to request more buffers for non-pipelined 
> (i.e. batch) shuffle. The reasons are as follows:
> 1. For BoundedBlockingShuffle, each full buffer will be directly released.
> 2. For SortMergeShuffle, the maximum capacity of buffer pool is 4 * 
> numSubpartitions. It is efficient enough to spill this part of memory to disk.
> 3. For Hybrid Shuffle, the buffer pool is unbounded. If it can't get a normal 
> buffer, it also can't get an overdraft buffer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31133) PartiallyFinishedSourcesITCase hangs if a checkpoint fails

2023-03-01 Thread Roman Khachatryan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695523#comment-17695523
 ] 

Roman Khachatryan commented on FLINK-31133:
---

Thanks [~liyu], you are right, it will likely not make it into 1.15.4. I'll 
update the version.

> PartiallyFinishedSourcesITCase hangs if a checkpoint fails
> --
>
> Key: FLINK-31133
> URL: https://issues.apache.org/jira/browse/FLINK-31133
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.15.3, 1.16.1, 1.18.0, 1.17.1
>Reporter: Matthias Pohl
>Assignee: Roman Khachatryan
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.16.2, 1.18.0, 1.17.1, 1.15.5
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b
> This build ran into a timeout. Based on the stacktraces reported, it was 
> either caused by 
> [SnapshotMigrationTestBase.restoreAndExecute|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=13475]:
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x7f23d800b800 nid=0x60cdd waiting on 
> condition [0x7f23e1c0d000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.flink.test.checkpointing.utils.SnapshotMigrationTestBase.restoreAndExecute(SnapshotMigrationTestBase.java:382)
>   at 
> org.apache.flink.test.migration.TypeSerializerSnapshotMigrationITCase.testSnapshot(TypeSerializerSnapshotMigrationITCase.java:172)
>   at sun.reflect.GeneratedMethodAccessor47.invoke(Unknown Source)
> [...]
> {code}
> or 
> [PartiallyFinishedSourcesITCase.test|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=10401]:
> {code}
> 2023-02-20T07:13:05.6084711Z "main" #1 prio=5 os_prio=0 
> tid=0x7fd35c00b800 nid=0x8c8a waiting on condition [0x7fd363d0f000]
> 2023-02-20T07:13:05.6085149Zjava.lang.Thread.State: TIMED_WAITING 
> (sleeping)
> 2023-02-20T07:13:05.6085487Z  at java.lang.Thread.sleep(Native Method)
> 2023-02-20T07:13:05.6085925Z  at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:145)
> 2023-02-20T07:13:05.6086512Z  at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:138)
> 2023-02-20T07:13:05.6087103Z  at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitForSubtasksToFinish(CommonTestUtils.java:291)
> 2023-02-20T07:13:05.6087730Z  at 
> org.apache.flink.runtime.operators.lifecycle.TestJobExecutor.waitForSubtasksToFinish(TestJobExecutor.java:226)
> 2023-02-20T07:13:05.6088410Z  at 
> org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase.test(PartiallyFinishedSourcesITCase.java:138)
> 2023-02-20T07:13:05.6088957Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [...]
> {code}
> Still, it sounds odd: based on a code analysis, it's quite unlikely that those 
> two caused the issue. The former one has a 5-minute timeout (see related code in 
> [SnapshotMigrationTestBase:382|https://github.com/apache/flink/blob/release-1.15/flink-tests/src/test/java/org/apache/flink/test/checkpointing/utils/SnapshotMigrationTestBase.java#L382]).
>  For the other one, we previously found it not to be responsible when some 
> other concurrent test caused the issue (see FLINK-30261).
> An investigation into where the time before the timeout was spent revealed that 
> {{AdaptiveSchedulerITCase}} took 2980s to finish (see [build 
> logs|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=5265]).
> {code}
> 2023-02-20T03:43:55.4546050Z Feb 20 03:43:55 [ERROR] Picked up 
> JAVA_TOOL_OPTIONS: -XX:+HeapDumpOnOutOfMemoryError
> 2023-02-20T03:43:58.0448506Z Feb 20 03:43:58 [INFO] Running 
> org.apache.flink.test.scheduling.AdaptiveSchedulerITCase
> 2023-02-20T04:33:38.6824634Z Feb 20 04:33:38 [INFO] Tests run: 6, Failures: 
> 0, Errors: 0, Skipped: 0, Time elapsed: 2,980.445 s - in 
> org.apache.flink.test.scheduling.AdaptiveSchedulerITCase
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31133) PartiallyFinishedSourcesITCase hangs if a checkpoint fails

2023-03-01 Thread Roman Khachatryan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Khachatryan updated FLINK-31133:
--
Fix Version/s: 1.15.5
   (was: 1.15.4)

> PartiallyFinishedSourcesITCase hangs if a checkpoint fails
> --
>
> Key: FLINK-31133
> URL: https://issues.apache.org/jira/browse/FLINK-31133
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.15.3, 1.16.1, 1.18.0, 1.17.1
>Reporter: Matthias Pohl
>Assignee: Roman Khachatryan
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.16.2, 1.18.0, 1.17.1, 1.15.5
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b
> This build ran into a timeout. Based on the stacktraces reported, it was 
> either caused by 
> [SnapshotMigrationTestBase.restoreAndExecute|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=13475]:
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x7f23d800b800 nid=0x60cdd waiting on 
> condition [0x7f23e1c0d000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.flink.test.checkpointing.utils.SnapshotMigrationTestBase.restoreAndExecute(SnapshotMigrationTestBase.java:382)
>   at 
> org.apache.flink.test.migration.TypeSerializerSnapshotMigrationITCase.testSnapshot(TypeSerializerSnapshotMigrationITCase.java:172)
>   at sun.reflect.GeneratedMethodAccessor47.invoke(Unknown Source)
> [...]
> {code}
> or 
> [PartiallyFinishedSourcesITCase.test|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=10401]:
> {code}
> 2023-02-20T07:13:05.6084711Z "main" #1 prio=5 os_prio=0 
> tid=0x7fd35c00b800 nid=0x8c8a waiting on condition [0x7fd363d0f000]
> 2023-02-20T07:13:05.6085149Zjava.lang.Thread.State: TIMED_WAITING 
> (sleeping)
> 2023-02-20T07:13:05.6085487Z  at java.lang.Thread.sleep(Native Method)
> 2023-02-20T07:13:05.6085925Z  at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:145)
> 2023-02-20T07:13:05.6086512Z  at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:138)
> 2023-02-20T07:13:05.6087103Z  at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitForSubtasksToFinish(CommonTestUtils.java:291)
> 2023-02-20T07:13:05.6087730Z  at 
> org.apache.flink.runtime.operators.lifecycle.TestJobExecutor.waitForSubtasksToFinish(TestJobExecutor.java:226)
> 2023-02-20T07:13:05.6088410Z  at 
> org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase.test(PartiallyFinishedSourcesITCase.java:138)
> 2023-02-20T07:13:05.6088957Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [...]
> {code}
> Still, it sounds odd: based on a code analysis, it's quite unlikely that those 
> two caused the issue. The former one has a 5-minute timeout (see related code in 
> [SnapshotMigrationTestBase:382|https://github.com/apache/flink/blob/release-1.15/flink-tests/src/test/java/org/apache/flink/test/checkpointing/utils/SnapshotMigrationTestBase.java#L382]).
>  For the other one, we previously found it not to be responsible when some 
> other concurrent test caused the issue (see FLINK-30261).
> An investigation into where the time before the timeout was spent revealed that 
> {{AdaptiveSchedulerITCase}} took 2980s to finish (see [build 
> logs|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=5265]).
> {code}
> 2023-02-20T03:43:55.4546050Z Feb 20 03:43:55 [ERROR] Picked up 
> JAVA_TOOL_OPTIONS: -XX:+HeapDumpOnOutOfMemoryError
> 2023-02-20T03:43:58.0448506Z Feb 20 03:43:58 [INFO] Running 
> org.apache.flink.test.scheduling.AdaptiveSchedulerITCase
> 2023-02-20T04:33:38.6824634Z Feb 20 04:33:38 [INFO] Tests run: 6, Failures: 
> 0, Errors: 0, Skipped: 0, Time elapsed: 2,980.445 s - in 
> org.apache.flink.test.scheduling.AdaptiveSchedulerITCase
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] ruanhang1993 commented on pull request #22048: [FLINK-31268][metrics] Change not to initialize operator coordinator metric group lazily

2023-03-01 Thread via GitHub


ruanhang1993 commented on PR #22048:
URL: https://github.com/apache/flink/pull/22048#issuecomment-1451412221

   @PatrickRen @zentol Would you like to help review this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #540: [FLINK-31087] Introduce merge-into action

2023-03-01 Thread via GitHub


JingsongLi commented on code in PR #540:
URL: https://github.com/apache/flink-table-store/pull/540#discussion_r1122671042


##
docs/content/docs/how-to/writing-tables.md:
##
@@ -268,3 +268,146 @@ For more information of 'delete', see
 {{< /tab >}}
 
 {{< /tabs >}}
+
+## Merging into table
+
+Table Store supports "MERGE INTO" by submitting a 'merge-into' job through 
`flink run`. The design references the following syntax:

Review Comment:
   Document that only primary-key tables are supported; you can refer to 
https://nightlies.apache.org/flink/flink-table-store-docs-master/docs/concepts/primary-key-table/



##
docs/content/docs/how-to/writing-tables.md:
##
@@ -268,3 +268,146 @@ For more information of 'delete', see
 {{< /tab >}}
 
 {{< /tabs >}}
+
+## Merging into table
+
+Table Store supports "MERGE INTO" by submitting a 'merge-into' job through 
`flink run`. The design references the following syntax:
+```sql
+MERGE INTO target-table
+  USING source-table | source-expr AS source-alias
+  ON merge-condition
+  WHEN MATCHED [AND matched-condition]
+THEN UPDATE SET xxx
+  WHEN MATCHED [AND matched-condition]
+THEN DELETE
+  WHEN NOT MATCHED [AND not-matched-condition]
+THEN INSERT VALUES (xxx)
+  WHEN NOT MATCHED BY SOURCE [AND not-matched-by-source-condition]
+THEN UPDATE SET xxx
+  WHEN NOT MATCHED BY SOURCE [AND not-matched-by-source-condition]
+THEN DELETE
+```
+
+{{< tabs "merge-into" >}}
+
+{{< tab "Flink Job" >}}
+
+Run the following command to submit a 'merge-into' job for the table.
+
+```bash
+/bin/flink run \
+-c org.apache.flink.table.store.connector.action.FlinkActions \
+/path/to/flink-table-store-flink-**-{{< version >}}.jar \
+merge-into \
+--warehouse <warehouse-path> \
+--database <database-name> \
+--table <target-table-name> \
+--using-table <source-table-name> \
+--on <merge-condition> \
+--merge-actions <action-list> \
+--matched-upsert-condition <matched-condition> \
+--matched-upsert-set <set-clauses> \
+--matched-delete-condition <matched-condition> \
+--not-matched-insert-condition <not-matched-condition> \
+--not-matched-insert-values <insert-values> \
+--not-matched-by-source-upsert-condition <not-matched-by-source-condition> \
+--not-matched-by-source-upsert-set <set-clauses> \
+--not-matched-by-source-delete-condition <not-matched-by-source-condition>
+
+An example:
+-- Target table T (pk (k, dt)) is:
+++--+-+---+
+|  k |v | last_action |dt |
+++--+-+---+
+|  1 |  v_1 |creation | 02-27 |
+|  2 |  v_2 |creation | 02-27 |
+|  3 |  v_3 |creation | 02-27 |
+|  4 |  v_4 |creation | 02-27 |
+|  5 |  v_5 |creation | 02-28 |
+|  6 |  v_6 |creation | 02-28 |
+|  7 |  v_7 |creation | 02-28 |
+|  8 |  v_8 |creation | 02-28 |
+|  9 |  v_9 |creation | 02-28 |
+| 10 | v_10 |creation | 02-28 |
+++--+-+---+
+
+-- Source table S is:
++++---+
+|  k |  v |dt |
++++---+
+|  1 |v_1 | 02-27 |
+|  4 |  | 02-27 |
+|  7 |  Seven | 02-28 |
+|  8 |  | 02-28 |
+|  8 |v_8 | 02-29 |
+| 11 |   v_11 | 02-29 |
+| 12 |   v_12 | 02-29 |
++++---+
+
+-- Supposed SQL is:
+MERGE INTO T

Review Comment:
   Can we provide a simpler example?



##
flink-table-store-common/src/main/java/org/apache/flink/table/store/types/DataTypeCasts.java:
##
@@ -180,6 +201,32 @@ public static boolean supportsExplicitCast(DataType 
sourceType, DataType targetT
 return supportsCasting(sourceType, targetType, true);
 }
 
+/**
+ * Returns whether the source type can be compatibly cast to the target 
type.
+ *
+ * If two types are compatible, they should have the same underlying 
data structure. For
+ * example, {@link CharType} and {@link VarCharType} are both in the {@link
+ * DataTypeFamily#CHARACTER_STRING} family, meaning they both represent a 
character string. But
+ * the rest types are only compatible with themselves. For example, 
although {@link IntType} and
+ * {@link BigIntType} are both in the {@link DataTypeFamily#NUMERIC} 
family, they are not
+ * compatible because IntType represents a 4-byte signed integer while 
BigIntType represents an
+ * 8-byte signed integer. Especially, two {@link DecimalType}s are 
compatible only when they
+ * have the same {@code precision} and {@code scale}.
+ */
+public static boolean supportsCompatibleCast(DataType sourceType, DataType 
targetType) {
+if (sourceType.isNullable() && !targetType.isNullable()) {

Review Comment:
   Remove this? The not-null check is very difficult to use.
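   
   For illustration, the Javadoc above implies results like the following 
(the constructor shapes are assumed from the Flink-style type classes, so 
treat this as a sketch):
   
   ```java
   // Assumed semantics of supportsCompatibleCast, per the Javadoc:
   boolean a = supportsCompatibleCast(new CharType(10), new VarCharType(20));          // true: both CHARACTER_STRING
   boolean b = supportsCompatibleCast(new IntType(), new BigIntType());                // false: 4-byte vs 8-byte integers
   boolean c = supportsCompatibleCast(new DecimalType(10, 2), new DecimalType(10, 2)); // true: same precision and scale
   boolean d = supportsCompatibleCast(new DecimalType(10, 2), new DecimalType(12, 2)); // false: precision differs
   ```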



##
docs/content/docs/how-to/writing-tables.md:
##
@@ -268,3 +268,146 @@ For more information of 'delete', see
 {{< /tab >}}
 
 {{< /tabs >}}
+
+## Merging into table
+
+Table Store supports "MERGE INTO" by submitting a 'merge-into' job through 
`flink run`. The design references the following syntax:
+```sql
+MERGE INTO target-table
+  USING source-table | source-expr AS source-alias
+  ON merge-condition
+  WHEN MATCHED [AND matched-condition]
+THEN UPDATE SET xxx
+  WHEN MATCHED [AND matched-condition]
+  

[GitHub] [flink-web] carp84 commented on a diff in pull request #611: Flink 1.15.4

2023-03-01 Thread via GitHub


carp84 commented on code in PR #611:
URL: https://github.com/apache/flink-web/pull/611#discussion_r1122666900


##
docs/content/posts/2023-02-23-release-1.15.4.md:
##
@@ -0,0 +1,144 @@
+---
+authors:
+- danny: null
+  name: Danny Cranmer
+date: "2023-02-23T17:00:00Z"
+excerpt: The Apache Flink Community is pleased to announce a bug fix release 
for Flink
+  1.15.
+title: Apache Flink 1.15.4 Release Announcement
+---
+
+The Apache Flink Community is pleased to announce the fourth bug fix release 
of the Flink 1.15 series.
+
+This release includes 46 bug fixes, vulnerability fixes, and minor 
improvements for Flink 1.15.

Review Comment:
   Checking the [JIRA 
list](https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20fixVersion%20%3D%201.15.4),
 there are 49 issues in total with 1.15.4 as FixVersion, and 48 are resolved. 
Please check and update the list here.
   
   Related, FLINK-31133 is still not resolved, and we need to confirm whether 
it should be excluded. I've also left a comment on the JIRA issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-31133) PartiallyFinishedSourcesITCase hangs if a checkpoint fails

2023-03-01 Thread Yu Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695501#comment-17695501
 ] 

Yu Li commented on FLINK-31133:
---

The 1.15.4 version is about to be released, with the [RC under 
vote|https://lists.apache.org/thread/4463cypc257l7j9rj2pycofbsdbbjx59]. Please 
check and confirm whether this issue can still make it into 1.15.4, and move it 
out if not. Thanks.

> PartiallyFinishedSourcesITCase hangs if a checkpoint fails
> --
>
> Key: FLINK-31133
> URL: https://issues.apache.org/jira/browse/FLINK-31133
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.15.3, 1.16.1, 1.18.0, 1.17.1
>Reporter: Matthias Pohl
>Assignee: Roman Khachatryan
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.15.4, 1.16.2, 1.18.0, 1.17.1
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b
> This build ran into a timeout. Based on the stacktraces reported, it was 
> either caused by 
> [SnapshotMigrationTestBase.restoreAndExecute|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=13475]:
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x7f23d800b800 nid=0x60cdd waiting on 
> condition [0x7f23e1c0d000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.flink.test.checkpointing.utils.SnapshotMigrationTestBase.restoreAndExecute(SnapshotMigrationTestBase.java:382)
>   at 
> org.apache.flink.test.migration.TypeSerializerSnapshotMigrationITCase.testSnapshot(TypeSerializerSnapshotMigrationITCase.java:172)
>   at sun.reflect.GeneratedMethodAccessor47.invoke(Unknown Source)
> [...]
> {code}
> or 
> [PartiallyFinishedSourcesITCase.test|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=10401]:
> {code}
> 2023-02-20T07:13:05.6084711Z "main" #1 prio=5 os_prio=0 
> tid=0x7fd35c00b800 nid=0x8c8a waiting on condition [0x7fd363d0f000]
> 2023-02-20T07:13:05.6085149Zjava.lang.Thread.State: TIMED_WAITING 
> (sleeping)
> 2023-02-20T07:13:05.6085487Z  at java.lang.Thread.sleep(Native Method)
> 2023-02-20T07:13:05.6085925Z  at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:145)
> 2023-02-20T07:13:05.6086512Z  at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:138)
> 2023-02-20T07:13:05.6087103Z  at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitForSubtasksToFinish(CommonTestUtils.java:291)
> 2023-02-20T07:13:05.6087730Z  at 
> org.apache.flink.runtime.operators.lifecycle.TestJobExecutor.waitForSubtasksToFinish(TestJobExecutor.java:226)
> 2023-02-20T07:13:05.6088410Z  at 
> org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase.test(PartiallyFinishedSourcesITCase.java:138)
> 2023-02-20T07:13:05.6088957Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [...]
> {code}
> Still, it sounds odd: based on a code analysis, it's quite unlikely that those 
> two caused the issue. The former one has a 5-minute timeout (see related code in 
> [SnapshotMigrationTestBase:382|https://github.com/apache/flink/blob/release-1.15/flink-tests/src/test/java/org/apache/flink/test/checkpointing/utils/SnapshotMigrationTestBase.java#L382]).
>  For the other one, we previously found it not to be responsible when some 
> other concurrent test caused the issue (see FLINK-30261).
> An investigation into where the time before the timeout was spent revealed that 
> {{AdaptiveSchedulerITCase}} took 2980s to finish (see [build 
> logs|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=5265]).
> {code}
> 2023-02-20T03:43:55.4546050Z Feb 20 03:43:55 [ERROR] Picked up 
> JAVA_TOOL_OPTIONS: -XX:+HeapDumpOnOutOfMemoryError
> 2023-02-20T03:43:58.0448506Z Feb 20 03:43:58 [INFO] Running 
> org.apache.flink.test.scheduling.AdaptiveSchedulerITCase
> 2023-02-20T04:33:38.6824634Z Feb 20 04:33:38 [INFO] Tests run: 6, Failures: 
> 0, Errors: 0, Skipped: 0, Time elapsed: 2,980.445 s - in 
> org.apache.flink.test.scheduling.AdaptiveSchedulerITCase
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-table-store] yuzelin commented on a diff in pull request #563: [FLINK-31252] Improve StaticFileStoreSplitEnumerator to assign batch splits

2023-03-01 Thread via GitHub


yuzelin commented on code in PR #563:
URL: https://github.com/apache/flink-table-store/pull/563#discussion_r1122663315


##
flink-table-store-flink/flink-table-store-flink-common/src/main/java/org/apache/flink/table/store/connector/source/StaticFileStoreSplitEnumerator.java:
##
@@ -61,17 +75,27 @@ public void handleSplitRequest(int subtask, @Nullable String hostname) {
             return;
         }
 
-        FileStoreSourceSplit split = splits.poll();
-        if (split != null) {
-            context.assignSplit(split, subtask);
+        // The following batch assignment operation is for two things:
+        // 1. It can be evenly distributed during batch reading to avoid scheduling problems (for
+        // example, the current resource can only schedule part of the tasks) that cause some tasks
+        // to fail to read data.
+        // 2. Read with limit, if split is assigned one by one, it may cause the task to repeatedly
+        // create SplitFetchers. After the task is created, it is found that it is idle and then
+        // closed. Then, new split coming, it will create SplitFetcher and repeatedly read the data
+        // of the limit number (the limit status is in the SplitFetcher).
+        List<FileStoreSourceSplit> splits = pendingSplitAssignment.remove(subtask);
+        if (splits != null && splits.size() > 0) {
+            context.assignSplits(new SplitsAssignment<>(Collections.singletonMap(subtask, splits)));

Review Comment:
   Comment:
   The following batch assignment operation serves two purposes:
   1. To distribute splits evenly during batch reading, avoiding failures of some 
tasks to read data caused by scheduling problems (for example, when the current 
resources can only schedule part of the tasks).
   2. To optimize limit reading. In limit reading, the task repeatedly creates a 
SplitFetcher to read up to the limit for each incoming split (the limit state is 
kept in the SplitFetcher). So if the splits are divided too finely, the task 
will spend more time creating SplitFetchers and re-reading data.
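
   For illustration, a minimal sketch of the pre-bucketing idea (a hypothetical helper, not part of the PR; only `pendingSplitAssignment` and the split type come from the diff above):

   ```java
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   // Round-robin bucketing of all splits into one batch per subtask, so that
   // each handleSplitRequest can hand over a whole batch at once.
   final class BatchAssignmentSketch {
       static <SplitT> Map<Integer, List<SplitT>> createBatchAssignment(
               List<SplitT> splits, int parallelism) {
           Map<Integer, List<SplitT>> assignment = new HashMap<>();
           for (int i = 0; i < splits.size(); i++) {
               assignment
                       .computeIfAbsent(i % parallelism, k -> new ArrayList<>())
                       .add(splits.get(i));
           }
           return assignment;
       }
   }
   ```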



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-31252) Improve StaticFileStoreSplitEnumerator to assign batch splits

2023-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-31252:
---
Labels: pull-request-available  (was: )

> Improve StaticFileStoreSplitEnumerator to assign batch splits
> -
>
> Key: FLINK-31252
> URL: https://issues.apache.org/jira/browse/FLINK-31252
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: table-store-0.4.0
>
>
> The batch assignment operation serves two purposes:
> 1. Splits can be distributed evenly during batch reading, avoiding scheduling 
> problems (for example, when the current resources can only schedule part of 
> the tasks) that cause some tasks to fail to read data.
> 2. It optimizes reads with a limit. If splits are assigned one by one, the 
> task may repeatedly create SplitFetchers: a fetcher is created, found to be 
> idle, and closed; then the next split arrives, a new SplitFetcher is created, 
> and the limited data is read again (the limit state is kept in the 
> SplitFetcher).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-table-store] yuzelin commented on a diff in pull request #563: [FLINK-31252] Improve StaticFileStoreSplitEnumerator to assign batch splits

2023-03-01 Thread via GitHub


yuzelin commented on code in PR #563:
URL: https://github.com/apache/flink-table-store/pull/563#discussion_r1122663315


##
flink-table-store-flink/flink-table-store-flink-common/src/main/java/org/apache/flink/table/store/connector/source/StaticFileStoreSplitEnumerator.java:
##
@@ -61,17 +75,27 @@ public void handleSplitRequest(int subtask, @Nullable String hostname) {
             return;
         }
 
-        FileStoreSourceSplit split = splits.poll();
-        if (split != null) {
-            context.assignSplit(split, subtask);
+        // The following batch assignment operation is for two things:
+        // 1. It can be evenly distributed during batch reading to avoid scheduling problems (for
+        // example, the current resource can only schedule part of the tasks) that cause some tasks
+        // to fail to read data.
+        // 2. Read with limit, if split is assigned one by one, it may cause the task to repeatedly
+        // create SplitFetchers. After the task is created, it is found that it is idle and then
+        // closed. Then, new split coming, it will create SplitFetcher and repeatedly read the data
+        // of the limit number (the limit status is in the SplitFetcher).
+        List<FileStoreSourceSplit> splits = pendingSplitAssignment.remove(subtask);
+        if (splits != null && splits.size() > 0) {
+            context.assignSplits(new SplitsAssignment<>(Collections.singletonMap(subtask, splits)));

Review Comment:
   Comment:
   The following batch assignment operation serves two purposes:
   1. To distribute splits evenly during batch reading, avoiding failures of some 
tasks to read data caused by scheduling problems (for example, when the current 
resources can only schedule part of the tasks).
   2. To optimize limit reading. In limit reading, the task repeatedly creates a 
SplitFetcher to read up to the limit for each incoming split (the limit state is 
kept in the SplitFetcher). So if splits are assigned one by one, it costs more time.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #22066: [FLINK-29852][Runtime] Fix AdaptiveScheduler add operator repeatedly in json plan

2023-03-01 Thread via GitHub


flinkbot commented on PR #22066:
URL: https://github.com/apache/flink/pull/22066#issuecomment-1451378565

   
   ## CI report:
   
   * 4c27324d51a07484182bbca3e54c4c0d20a6b9f1 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-29852) The operator is repeatedly displayed on the Flink Web UI

2023-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-29852:
---
Labels: pull-request-available  (was: )

> The operator is repeatedly displayed on the Flink Web UI
> 
>
> Key: FLINK-29852
> URL: https://issues.apache.org/jira/browse/FLINK-29852
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.16.0
> Environment: Flink 1.16.0
>Reporter: JasonLee
>Priority: Critical
>  Labels: pull-request-available
> Attachments: image-2022-11-02-23-57-39-387.png, 
> image-2022-11-09-16-09-44-233.png, image-2022-11-09-17-32-27-377.png
>
>
> All the operators in the DAG are shown repeatedly
> !image-2022-11-02-23-57-39-387.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] huwh opened a new pull request, #22066: [FLINK-29852][Runtime] Fix AdaptiveScheduler add operator repeatedly in json plan

2023-03-01 Thread via GitHub


huwh opened a new pull request, #22066:
URL: https://github.com/apache/flink/pull/22066

   ## What is the purpose of the change
   The adaptive scheduler updates the execution graph when the parallelism changes.
   It added all execution vertices to the JSON plan, which should only contain job 
vertices.
   This caused operators to be displayed repeatedly on the Flink Web UI. 
   
   
   ## Brief change log
  - *Use JobVertices instead of ExecutionVertices when generating the new JSON plan (see the sketch below)*
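
   A minimal sketch of the fix direction (assumes `JsonPlanGenerator` from flink-runtime; the wrapper class is illustrative, not the actual patch):

   ```java
   import org.apache.flink.runtime.jobgraph.JobGraph;
   import org.apache.flink.runtime.jobgraph.jsonplan.JsonPlanGenerator;

   // Regenerate the JSON plan from the logical JobGraph: getVertices() yields
   // one JobVertex per operator chain, whereas iterating ExecutionVertices
   // yields one entry per parallel subtask and duplicates operators in the UI.
   final class JsonPlanSketch {
       static String regeneratePlan(JobGraph jobGraph) {
           return JsonPlanGenerator.generatePlan(jobGraph);
       }
   }
   ```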
   
   
   ## Verifying this change
   This change added tests and can be verified as follows:
  - *Enrich the unit test to verify that operators in the JSON plan are not repeated*
 - *Manually verified the change by running adaptive scheduler with 
parallelism = 4*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-31286) Python processes are still alive when shutting down a session cluster directly without stopping the jobs

2023-03-01 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-31286.
---
Fix Version/s: 1.17.0
   1.15.4
   1.16.2
   Resolution: Fixed

Fixed in:
- master: ab4e85e8bda51088cf64d5ddfb9bc0dab1c6e1fd
- 1.17: ecec13a1cdf6622b8f0257f35c24d597f9956f41
- 1.16: ed47440231a75e5de50038919a21a1e914458baa
- 1.15: ada67c0c1bd5b51a2c7cf10624a0e4f2870a9cc5

> Python processes are still alive when shutting down a session cluster 
> directly without stopping the jobs
> 
>
> Key: FLINK-31286
> URL: https://issues.apache.org/jira/browse/FLINK-31286
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Reporter: Dian Fu
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0, 1.15.4, 1.16.2
>
> Attachments: image-2023-03-02-10-55-34-863.png
>
>
> Reproduce steps:
> 1) start a standalone cluster: ./bin/start-cluster.sh
> 2) submit a PyFlink job which contains Python UDFs
> 3) stop the cluster: ./bin/stop-cluster.sh
> 4) Check if the Python process still exists: ps aux | grep -i beam_boot
> !image-2023-03-02-10-55-34-863.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31293) Request memory segment from LocalBufferPool may hang forever.

2023-03-01 Thread Weijie Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weijie Guo updated FLINK-31293:
---
Description: 
In our TPC-DS test, we found that under fierce competition for network memory, 
some tasks may hang forever.

From the thread dump information, we can see that the task is waiting for the 
{{LocalBufferPool}} to become available. This is strange, because the other 
tasks have already finished and released their network memory. Undoubtedly, 
this is unexpected behavior, which implies that there must be a bug in the 
{{LocalBufferPool}} or {{NetworkBufferPool}}.

!image-2023-03-02-12-23-50-572.png|width=650,height=153!

By dumping the heap memory, we found a strange phenomenon: there are available 
buffers in the {{LocalBufferPool}}, but the pool is considered unavailable. 
Another thing to note is that it currently holds an overdraft buffer.

!image-2023-03-02-12-28-48-437.png|width=520,height=200!

!image-2023-03-02-12-29-03-003.png|width=438,height=84!

TL;DR: This problem is a multi-threaded race related to the introduction of 
the overdraft buffer.

Suppose we have two threads, called A and B. For simplicity, the 
{{LocalBufferPool}} is called {{LocalPool}} and the {{NetworkBufferPool}} is 
called {{GlobalPool}}.

Thread A continuously requests buffers from the {{LocalPool}}, blocking when 
none are available.
Thread B continuously returns buffers to the {{GlobalPool}}.
 # Thread A takes the last available buffer of the {{LocalPool}}, but the 
{{GlobalPool}} has no buffer at this time, so a callback function is 
registered with the {{GlobalPool}}.
 # Thread B returns one buffer to the {{GlobalPool}}, but has not yet 
triggered the callback.
 # Thread A continues to request buffers. Because the 
{{availableMemorySegments}} of the {{LocalPool}} is empty, it requests an 
overdraft buffer instead. Since there is already a buffer in the 
{{GlobalPool}}, it successfully gets one.
 # Thread B triggers the callback. Since there is no buffer in the 
{{GlobalPool}} anymore, the callback is re-registered.
 # Thread A continues to request buffers. Because there is no buffer in the 
{{GlobalPool}}, it blocks on {{CompletableFuture#get}}.
 # Thread B returns another buffer and triggers the recently registered 
callback. As a result, the {{LocalPool}} puts the buffer into 
{{availableMemorySegments}}. Unfortunately, the current logic of the 
{{shouldBeAvailable}} method is: if there is an overdraft buffer, the 
{{LocalPool}} is considered unavailable (see the sketch below).
 # Thread A will keep blocking forever.
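
A minimal Java sketch of the buggy predicate (field and method names are 
simplified stand-ins for the {{LocalBufferPool}} internals, not the exact 
Flink source):
{code}
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified stand-in for the availability check described in step 6.
class LocalPoolSketch {
    private final Queue<Object> availableMemorySegments = new ArrayDeque<>();
    private int numberOfRequestedOverdraftBuffers;

    // Buggy condition: an outstanding overdraft buffer makes the pool report
    // "unavailable" even though availableMemorySegments is non-empty, so the
    // thread blocked in step 5 is never unblocked.
    synchronized boolean shouldBeAvailable() {
        return !availableMemorySegments.isEmpty()
                && numberOfRequestedOverdraftBuffers == 0;
    }
}
{code}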

  was:
In our TPC-DS test, we found that in the case of fierce competition in network 
memory, some tasks may hanging forever.

>From the thread dump information, we can see that the task is waiting for the 
>{{LocalBufferPool}} to become available. It is strange that other tasks have 
>finished and released network memory already. Undoubtedly, this is an 
>unexpected behavior, which implies that there must be a bug in the 
>{{LocalBufferPool}} or {{{}NetworkBufferPool{}}}.

!image-2023-03-02-12-23-50-572.png|width=650,height=153!

By dumping the heap memory, we can find a strange phenomenon that there are 
available buffers in the {{{}LocalBufferPool{}}}, but it was considered to be 
un-available. Another thing to note is that it now holds an overdraft buffer.

!image-2023-03-02-12-28-48-437.png|width=520,height=200!

!image-2023-03-02-12-29-03-003.png|width=438,height=84!

TL;DR: This problem occurred in multi-thread race related to the introduction 
of overdraft buffer.

Suppose we have two threads, called A and B. For simplicity, 
{{LocalBufferPool}} is called {{LocalPool}} and {{NetworkBufferPool}} is called 
{{{}GlobalPool{}}}.

Thread A continuously request buffers blocking from the {{{}LocalPool{}}}.
Thread B continuously return buffers to {{{}GlobalPool{}}}.
 # If thread A takes the last available buffer of {{{}LocalPool{}}}, but 
{{GlobalPool}} does not have a buffer at this time, it will register a callback 
function with {{{}GlobalPool{}}}.
 #  Thread B returns one buffer to {{{}GlobalPool{}}}, but has not started to 
trigger the callback.
 # Thread A continues to request buffer. Because the 
{{availableMemorySegments}} of {{LocalPool}} is empty, it requests the 
overdraftBuffer instead. But there is already a buffer in the 
{{{}GlobalPool{}}}, it successfully gets the buffer.
 # Thread B triggers the callback. Since there is no buffer in {{GlobalPool}} 
now, the callback is re-registered.
 # Thread A continues to request buffer. Because there is no buffer in 
{{{}GlobalPool{}}}, it will block on {{{}CompletableFuture# get{}}}.
 # Thread B continues to return a buffer and triggers the recently registered 
callback. As a result, {{LocalPool}} puts the buffer into 
{{{}availableMemorySegments{}}}. Unfortunately, the current logic of 
{{shouldBeAvailable}} method is: if there is an overdraft buffer, 

[jira] [Updated] (FLINK-31293) Request memory segment from LocalBufferPool may hang forever.

2023-03-01 Thread Weijie Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weijie Guo updated FLINK-31293:
---
Description: 
In our TPC-DS test, we found that under fierce competition for network memory, 
some tasks may hang forever.

From the thread dump information, we can see that the task is waiting for the 
{{LocalBufferPool}} to become available. This is strange, because the other 
tasks have already finished and released their network memory. Undoubtedly, 
this is unexpected behavior, which implies that there must be a bug in the 
{{LocalBufferPool}} or {{NetworkBufferPool}}.

!image-2023-03-02-12-23-50-572.png|width=650,height=153!

By dumping the heap memory, we found a strange phenomenon: there are available 
buffers in the {{LocalBufferPool}}, but the pool is considered unavailable. 
Another thing to note is that it currently holds an overdraft buffer.

!image-2023-03-02-12-28-48-437.png|width=520,height=200!

!image-2023-03-02-12-29-03-003.png|width=438,height=84!

TL;DR: This problem is a multi-threaded race related to the introduction of 
the overdraft buffer.

Suppose we have two threads, called A and B. For simplicity, the 
{{LocalBufferPool}} is called {{LocalPool}} and the {{NetworkBufferPool}} is 
called {{GlobalPool}}.

Thread A continuously requests buffers from the {{LocalPool}}, blocking when 
none are available.
Thread B continuously returns buffers to the {{GlobalPool}}.
 # Thread A takes the last available buffer of the {{LocalPool}}, but the 
{{GlobalPool}} has no buffer at this time, so a callback function is 
registered with the {{GlobalPool}}.
 # Thread B returns one buffer to the {{GlobalPool}}, but has not yet 
triggered the callback.
 # Thread A continues to request buffers. Because the 
{{availableMemorySegments}} of the {{LocalPool}} is empty, it requests an 
overdraft buffer instead. Since there is already a buffer in the 
{{GlobalPool}}, it successfully gets one.
 # Thread B triggers the callback. Since there is no buffer in the 
{{GlobalPool}} anymore, the callback is re-registered.
 # Thread A continues to request buffers. Because there is no buffer in the 
{{GlobalPool}}, it blocks on {{CompletableFuture#get}}.
 # Thread B returns another buffer and triggers the recently registered 
callback. As a result, the {{LocalPool}} puts the buffer into 
{{availableMemorySegments}}. Unfortunately, the current logic of the 
{{shouldBeAvailable}} method is: if there is an overdraft buffer, the 
{{LocalPool}} is considered unavailable.
 # Thread A will keep blocking forever.

  was:
In our TPC-DS test, we found that in the case of fierce competition in network 
memory, some tasks may hanging forever.

>From the thread dump information, we can see that the task is waiting for the 
>{{LocalBufferPool}} to become available. It is strange that other tasks have 
>finished and released network memory already. Undoubtedly, this is an 
>unexpected behavior, which implies that there must be a bug in the 
>{{LocalBufferPool}} or {{{}NetworkBufferPool{}}}.

!image-2023-03-02-12-23-50-572.png|width=650,height=153!

By dumping the heap memory, we can find a strange phenomenon that there are 
available buffers in the {{{}LocalBufferPool{}}}, but it was considered to be 
un-available. Another thing to note is that it now holds an overdraft buffer.

!image-2023-03-02-12-28-48-437.png|width=520,height=200!

!image-2023-03-02-12-29-03-003.png|width=438,height=84!

TL;DR: This problem occurred in multi-thread race related to the introduction 
of overdraft buffer.

Suppose we have two threads, called A and B. For simplicity, 
{{LocalBufferPool}} is called {{LocalPool}} and {{NetworkBufferPool}} is called 
{{{}GlobalPool{}}}.

Thread A continuously request buffers blocking from the {{{}LocalPool{}}}.
Thread B continuously return buffers to {{{}GlobalPool{}}}.
 # If thread A takes the last available buffer of {{{}LocalPool{}}}, but 
{{GlobalPool}} does not have a buffer at this time, it will register a callback 
function with {{{}GlobalPool{}}}.
 # Thread B returns one buffer to {{{}GlobalPool{}}}, but has not started to 
trigger the callback.
 # Thread A continues to request buffer. Because the 
{{availableMemorySegments}} of {{LocalPool}} is empty, it requests the 
overdraftBuffer instead. But there is already a buffer in the 
{{{}GlobalPool{}}}, it successfully gets the buffer.
 # Thread B triggers the callback. Since there is no buffer in {{GlobalPool}} 
now, the callback is re-registered.
 # Thread A continues to request buffer. Because there is no buffer in 
{{{}GlobalPool{}}}, it will block on {{{}CompletableFuture# get{}}}.
 # Thread B continues to return a buffer and triggers the recently registered 
callback. As a result, {{LocalPool}} puts the buffer into 
{{{}availableMemorySegments{}}}. Unfortunately, the current logic of 
{{shouldBeAvailable}} method is: if there is an overdraft buffer, 

[jira] [Updated] (FLINK-31293) Request memory segment from LocalBufferPool may hang forever.

2023-03-01 Thread Weijie Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weijie Guo updated FLINK-31293:
---
Description: 
In our TPC-DS test, we found that under fierce competition for network memory, 
some tasks may hang forever.

From the thread dump information, we can see that the task is waiting for the 
{{LocalBufferPool}} to become available. This is strange, because the other 
tasks have already finished and released their network memory. Undoubtedly, 
this is unexpected behavior, which implies that there must be a bug in the 
{{LocalBufferPool}} or {{NetworkBufferPool}}.

!image-2023-03-02-12-23-50-572.png|width=650,height=153!

By dumping the heap memory, we found a strange phenomenon: there are available 
buffers in the {{LocalBufferPool}}, but the pool is considered unavailable. 
Another thing to note is that it currently holds an overdraft buffer.

!image-2023-03-02-12-28-48-437.png|width=520,height=200!

!image-2023-03-02-12-29-03-003.png|width=438,height=84!

TL;DR: This problem is a multi-threaded race related to the introduction of 
the overdraft buffer.

Suppose we have two threads, called A and B. For simplicity, the 
{{LocalBufferPool}} is called {{LocalPool}} and the {{NetworkBufferPool}} is 
called {{GlobalPool}}.

Thread A continuously requests buffers from the {{LocalPool}}, blocking when 
none are available.
Thread B continuously returns buffers to the {{GlobalPool}}.
 # Thread A takes the last available buffer of the {{LocalPool}}, but the 
{{GlobalPool}} has no buffer at this time, so a callback function is 
registered with the {{GlobalPool}}.
 # Thread B returns one buffer to the {{GlobalPool}}, but has not yet 
triggered the callback.
 # Thread A continues to request buffers. Because the 
{{availableMemorySegments}} of the {{LocalPool}} is empty, it requests an 
overdraft buffer instead. Since there is already a buffer in the 
{{GlobalPool}}, it successfully gets one.
 # Thread B triggers the callback. Since there is no buffer in the 
{{GlobalPool}} anymore, the callback is re-registered.
 # Thread A continues to request buffers. Because there is no buffer in the 
{{GlobalPool}}, it blocks on {{CompletableFuture#get}}.
 # Thread B returns another buffer and triggers the recently registered 
callback. As a result, the {{LocalPool}} puts the buffer into 
{{availableMemorySegments}}. Unfortunately, the current logic of the 
{{shouldBeAvailable}} method is: if there is an overdraft buffer, the 
{{LocalPool}} is considered unavailable.
 # Thread A will keep blocking forever.

  was:
In our TPC-DS test, we found that in the case of fierce competition in network 
memory, some tasks may hanging forever.

>From the thread dump information, we can see that the task is waiting for the 
>{{LocalBufferPool}} to become available. It is strange that other tasks have 
>finished and released network memory already. Undoubtedly, this is an 
>unexpected behavior, which implies that there must be a bug in the 
>{{LocalBufferPool}} or {{{}NetworkBufferPool{}}}.

!image-2023-03-02-12-23-50-572.png|width=650,height=153!

By dumping the heap memory, we can find a strange phenomenon that there are 
available buffers in the {{{}LocalBufferPool{}}}, but it was considered to be 
un-available. Another thing to note is that it now holds an overdraft buffer.

!image-2023-03-02-12-28-48-437.png|width=520,height=200!

!image-2023-03-02-12-29-03-003.png|width=438,height=84!

TL;DR: This problem occurred in multi-thread race related to the introduction 
of overdraft buffer.

Suppose we have two threads, called A and B. For simplicity, 
{{LocalBufferPool}} is called {{LocalPool}} and {{NetworkBufferPool}} is called 
{{{}GlobalPool{}}}.

Thread A continuously request buffers blocking from the \{{LocalPool}}.
Thread B continuously return buffers to \{{GlobalPool}}.


1.  If thread A takes the last available buffer of {{{}LocalPool{}}}, but 
{{GlobalPool}} does not have a buffer at this time, it will register a callback 
function with {{{}GlobalPool{}}}.
2. Thread B returns one buffer to {{{}GlobalPool{}}}, but has not started to 
trigger the callback.
3. Thread A continues to request buffer. Because the 
{{availableMemorySegments}} of {{LocalPool}} is empty, it requests the 
overdraftBuffer instead. But there is already a buffer in the 
{{{}GlobalPool{}}}, it successfully gets the buffer.
4. Thread B triggers the callback. Since there is no buffer in {{GlobalPool}} 
now, the callback is re-registered.
5. Thread B continues to return a buffer and triggers the last callback. 
LocalPool puts the buffer into availableMemorySegments. Because the current 
logic of the shouldBeAvailable method is: if there is an overflow buffer, 
LocalPool is not available.


> Request memory segment from LocalBufferPool may hang forever.
> 
>
> Key: 

[GitHub] [flink] dianfu closed pull request #22065: [FLINK-31286][python] Make sure Python processes are cleaned up when TaskManager crashes

2023-03-01 Thread via GitHub


dianfu closed pull request #22065: [FLINK-31286][python] Make sure Python 
processes are cleaned up when TaskManager crashes
URL: https://github.com/apache/flink/pull/22065


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-31296) Add JoinConditionEqualityTransferRule to stream optimizer

2023-03-01 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695491#comment-17695491
 ] 

Aitozi commented on FLINK-31296:


But this will affect the join key pair in the Join operator, which may affect 
state compatibility. I'm not sure how to handle this implicit breaking change.

> Add JoinConditionEqualityTransferRule to stream optimizer
> -
>
> Key: FLINK-31296
> URL: https://issues.apache.org/jira/browse/FLINK-31296
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Planner
>Reporter: Aitozi
>Priority: Major
>
> I find that {{JoinConditionEqualityTransferRule}} is a rule common to batch 
> and stream mode, so it should be added to the stream optimizer, which will 
> bring performance improvements in some cases.
> Maybe other rules also need to be reviewed to see whether they can be aligned 
> between the batch and stream cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31293) Request memory segment from LocalBufferPool may hang forever.

2023-03-01 Thread Weijie Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weijie Guo updated FLINK-31293:
---
Description: 
In our TPC-DS test, we found that under fierce competition for network memory, 
some tasks may hang forever.

From the thread dump information, we can see that the task is waiting for the 
{{LocalBufferPool}} to become available. This is strange, because the other 
tasks have already finished and released their network memory. Undoubtedly, 
this is unexpected behavior, which implies that there must be a bug in the 
{{LocalBufferPool}} or {{NetworkBufferPool}}.

!image-2023-03-02-12-23-50-572.png|width=650,height=153!

By dumping the heap memory, we found a strange phenomenon: there are available 
buffers in the {{LocalBufferPool}}, but the pool is considered unavailable. 
Another thing to note is that it currently holds an overdraft buffer.

!image-2023-03-02-12-28-48-437.png|width=520,height=200!

!image-2023-03-02-12-29-03-003.png|width=438,height=84!

TL;DR: This problem is a multi-threaded race related to the introduction of 
the overdraft buffer.

Suppose we have two threads, called A and B. For simplicity, the 
{{LocalBufferPool}} is called {{LocalPool}} and the {{NetworkBufferPool}} is 
called {{GlobalPool}}.

Thread A continuously requests buffers from the {{LocalPool}}, blocking when 
none are available.
Thread B continuously returns buffers to the {{GlobalPool}}.


1. Thread A takes the last available buffer of the {{LocalPool}}, but the 
{{GlobalPool}} has no buffer at this time, so a callback function is 
registered with the {{GlobalPool}}.
2. Thread B returns one buffer to the {{GlobalPool}}, but has not yet 
triggered the callback.
3. Thread A continues to request buffers. Because the 
{{availableMemorySegments}} of the {{LocalPool}} is empty, it requests an 
overdraft buffer instead. Since there is already a buffer in the 
{{GlobalPool}}, it successfully gets one.
4. Thread B triggers the callback. Since there is no buffer in the 
{{GlobalPool}} anymore, the callback is re-registered.
5. Thread B returns another buffer and triggers the last callback. The 
{{LocalPool}} puts the buffer into {{availableMemorySegments}}, but because 
the current logic of the {{shouldBeAvailable}} method is "if there is an 
overdraft buffer, the {{LocalPool}} is not available", thread A stays blocked.

  was:
In our TPC-DS test, we found that in the case of fierce competition in network 
memory, some tasks may hanging forever.

>From the thread dump information, we can see that the task is waiting for the 
>{{LocalBufferPool}} to become available. It is strange that other tasks have 
>finished and released network memory already. Undoubtedly, this is an 
>unexpected behavior, which implies that there must be a bug in the 
>{{LocalBufferPool}} or {{{}NetworkBufferPool{}}}.

!image-2023-03-02-12-23-50-572.png|width=650,height=153!

By dumping the heap memory, we can find a strange phenomenon that there are 
available buffers in the {{{}LocalBufferPool{}}}, but it was considered to be 
un-available. Another thing to note is that it now holds an overdraft buffer.

!image-2023-03-02-12-28-48-437.png|width=520,height=200!

!image-2023-03-02-12-29-03-003.png|width=438,height=84!

 


> Request memory segment from LocalBufferPool may hang forever.
> 
>
> Key: FLINK-31293
> URL: https://issues.apache.org/jira/browse/FLINK-31293
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Weijie Guo
>Priority: Major
> Attachments: image-2023-03-02-12-23-50-572.png, 
> image-2023-03-02-12-28-48-437.png, image-2023-03-02-12-29-03-003.png
>
>
> In our TPC-DS test, we found that under fierce competition for network 
> memory, some tasks may hang forever.
> From the thread dump information, we can see that the task is waiting for the 
> {{LocalBufferPool}} to become available. This is strange, because the other 
> tasks have already finished and released their network memory. Undoubtedly, 
> this is unexpected behavior, which implies that there must be a bug in the 
> {{LocalBufferPool}} or {{NetworkBufferPool}}.
> !image-2023-03-02-12-23-50-572.png|width=650,height=153!
> By dumping the heap memory, we found a strange phenomenon: there are 
> available buffers in the {{LocalBufferPool}}, but the pool is considered 
> unavailable. Another thing to note is that it currently holds an overdraft 
> buffer.
> !image-2023-03-02-12-28-48-437.png|width=520,height=200!
> !image-2023-03-02-12-29-03-003.png|width=438,height=84!
> TL;DR: This problem is a multi-threaded race related to the introduction of 
> the overdraft buffer.
> Suppose we have two threads, called A and B. For simplicity, the 
> {{LocalBufferPool}} is called {{LocalPool}} and the {{NetworkBufferPool}} is 
> called {{GlobalPool}}.
> Thread A 

[GitHub] [flink] snuyanzin commented on pull request #22063: [FLINK-24909][Table SQL/Client] Add syntax highlighting

2023-03-01 Thread via GitHub


snuyanzin commented on PR #22063:
URL: https://github.com/apache/flink/pull/22063#issuecomment-1451363517

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-26088) Add Elasticsearch 8.0 support

2023-03-01 Thread Matheus Felisberto (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695484#comment-17695484
 ] 

Matheus Felisberto commented on FLINK-26088:


Hi there [~qinjunjerry], how are you? No problem, it's a pleasure to work with 
you all. I'm sorry for my lack of updates here; I had a planned vacation in 
February. I'll be back at full speed next Monday, and I'll have plenty of time 
to dedicate myself to this.

> Add Elasticsearch 8.0 support
> -
>
> Key: FLINK-26088
> URL: https://issues.apache.org/jira/browse/FLINK-26088
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / ElasticSearch
>Reporter: Yuhao Bi
>Assignee: Matheus Felisberto
>Priority: Major
>
> Since Elasticsearch 8.0 is officially released, I think it's time to consider 
> adding es8 connector support.
> The High Level REST Client we used for connection [is marked deprecated in es 
> 7.15.0|https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high.html].
>  Maybe we can migrate to use the new [Java API 
> Client|https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/8.0/index.html]
>  at this time.
> Elasticsearch8.0 release note: 
> [https://www.elastic.co/guide/en/elasticsearch/reference/8.0/release-notes-8.0.0.html]
> release highlights: 
> [https://www.elastic.co/guide/en/elasticsearch/reference/8.0/release-highlights.html]
> REST API compatibility: 
> https://www.elastic.co/guide/en/elasticsearch/reference/8.0/rest-api-compatibility.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-31222) Remove usage of deprecated ConverterUtils.toApplicationId

2023-03-01 Thread Rui Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan closed FLINK-31222.
---
Fix Version/s: 1.18.0
   Resolution: Fixed

> Remove usage of deprecated ConverterUtils.toApplicationId
> -
>
> Key: FLINK-31222
> URL: https://issues.apache.org/jira/browse/FLINK-31222
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.17.1
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> When reading the code, I found that we use ConverterUtils.toApplicationId to 
> convert the applicationId. This method is deprecated; we should use 
> ApplicationId.fromString instead.
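
A minimal sketch of the suggested replacement (standard YARN API; the wrapper 
class is illustrative):
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;

final class AppIdSketch {
    // ApplicationId.fromString replaces the deprecated
    // ConverterUtils.toApplicationId for parsing application ids.
    static ApplicationId parse(String id) {
        return ApplicationId.fromString(id);
    }
}
{code}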



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-31222) Remove usage of deprecated ConverterUtils.toApplicationId

2023-03-01 Thread Rui Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan reassigned FLINK-31222:
---

Assignee: Shilun Fan

> Remove usage of deprecated ConverterUtils.toApplicationId
> -
>
> Key: FLINK-31222
> URL: https://issues.apache.org/jira/browse/FLINK-31222
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.17.1
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
>
> When reading the code, I found that we use ConverterUtils.toApplicationId to 
> convert the applicationId. This method is deprecated; we should use 
> ApplicationId.fromString instead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31222) Remove usage of deprecated ConverterUtils.toApplicationId

2023-03-01 Thread Rui Fan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695483#comment-17695483
 ] 

Rui Fan commented on FLINK-31222:
-

Thanks for the fix:)

 

apache:master: 95dd5423aad847ec41b81676deaf4cb94d9e11d6

> Remove usage of deprecated ConverterUtils.toApplicationId
> -
>
> Key: FLINK-31222
> URL: https://issues.apache.org/jira/browse/FLINK-31222
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.17.1
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
>
> When reading the code, I found that we use ConverterUtils.toApplicationId to 
> convert the applicationId. This method is deprecated; we should use 
> ApplicationId.fromString instead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] 1996fanrui merged pull request #22020: [FLINK-31222] Remove usage of deprecated ConverterUtils.toApplicationId.

2023-03-01 Thread via GitHub


1996fanrui merged PR #22020:
URL: https://github.com/apache/flink/pull/22020


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-31178) Public Writer API

2023-03-01 Thread Caizhi Weng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caizhi Weng closed FLINK-31178.
---
Resolution: Fixed

master: 9b8dc4e2f3cf8bc28a95d5381c8544868bd5b688

> Public Writer API
> -
>
> Key: FLINK-31178
> URL: https://issues.apache.org/jira/browse/FLINK-31178
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: table-store-0.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29852) The operator is repeatedly displayed on the Flink Web UI

2023-03-01 Thread JasonLee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695482#comment-17695482
 ] 

JasonLee commented on FLINK-29852:
--

[~huwh] Yeah, thank you for your reply. I've already reopened it.

> The operator is repeatedly displayed on the Flink Web UI
> 
>
> Key: FLINK-29852
> URL: https://issues.apache.org/jira/browse/FLINK-29852
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.16.0
> Environment: Flink 1.16.0
>Reporter: JasonLee
>Priority: Critical
> Attachments: image-2022-11-02-23-57-39-387.png, 
> image-2022-11-09-16-09-44-233.png, image-2022-11-09-17-32-27-377.png
>
>
> All the operators in the DAG are shown repeatedly
> !image-2022-11-02-23-57-39-387.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-31248) Improve documentation for append-only table

2023-03-01 Thread Caizhi Weng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caizhi Weng closed FLINK-31248.
---
Resolution: Fixed

master: b8a700082a6032dfed7cee4273f3f76ce0483b5a

> Improve documentation for append-only table
> ---
>
> Key: FLINK-31248
> URL: https://issues.apache.org/jira/browse/FLINK-31248
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: table-store-0.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (FLINK-29852) The operator is repeatedly displayed on the Flink Web UI

2023-03-01 Thread JasonLee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JasonLee reopened FLINK-29852:
--

> The operator is repeatedly displayed on the Flink Web UI
> 
>
> Key: FLINK-29852
> URL: https://issues.apache.org/jira/browse/FLINK-29852
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.16.0
> Environment: Flink 1.16.0
>Reporter: JasonLee
>Priority: Critical
> Attachments: image-2022-11-02-23-57-39-387.png, 
> image-2022-11-09-16-09-44-233.png, image-2022-11-09-17-32-27-377.png
>
>
> All the operators in the DAG are shown repeatedly
> !image-2022-11-02-23-57-39-387.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-table-store] tsreaper merged pull request #550: [FLINK-31178] Public Writer API

2023-03-01 Thread via GitHub


tsreaper merged PR #550:
URL: https://github.com/apache/flink-table-store/pull/550


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-30694) Translate "Windowing TVF" page of "Queries" into Chinese

2023-03-01 Thread chenhaiyang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695477#comment-17695477
 ] 

chenhaiyang commented on FLINK-30694:
-

Hi [~jark], I have opened the PR for this ticket.



> Translate "Windowing TVF" page of "Querys" into Chinese 
> 
>
> Key: FLINK-30694
> URL: https://issues.apache.org/jira/browse/FLINK-30694
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation
>Affects Versions: 1.16.0
>Reporter: chenhaiyang
>Assignee: chenhaiyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.2
>
>
> The page URL is 
> [https://nightlies.apache.org/flink/flink-docs-master/zh/docs/dev/table/sql/queries/window-tvf/]
> The markdown file is located in 
> docs/content.zh/docs/dev/table/sql/queries/window-tvf.md



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29852) The operator is repeatedly displayed on the Flink Web UI

2023-03-01 Thread Weihua Hu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695476#comment-17695476
 ] 

Weihua Hu commented on FLINK-29852:
---

Hi [~JasonLee], I found that this is a bug in how the AdaptiveScheduler updates 
the ExecutionGraph#JsonPlan: it puts all the execution vertices into the 
jsonPlan. Could you reopen this ticket? I would like to fix it. cc [~dwysakowicz] 

> The operator is repeatedly displayed on the Flink Web UI
> 
>
> Key: FLINK-29852
> URL: https://issues.apache.org/jira/browse/FLINK-29852
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.16.0
> Environment: Flink 1.16.0
>Reporter: JasonLee
>Priority: Critical
> Attachments: image-2022-11-02-23-57-39-387.png, 
> image-2022-11-09-16-09-44-233.png, image-2022-11-09-17-32-27-377.png
>
>
> All the operators in the DAG are shown repeatedly
> !image-2022-11-02-23-57-39-387.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31296) Add JoinConditionEqualityTransferRule to stream optimizer

2023-03-01 Thread Aitozi (Jira)
Aitozi created FLINK-31296:
--

 Summary: Add JoinConditionEqualityTransferRule to stream optimizer
 Key: FLINK-31296
 URL: https://issues.apache.org/jira/browse/FLINK-31296
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Reporter: Aitozi


I find that {{JoinConditionEqualityTransferRule}} is a rule common to batch 
and stream mode, so it should be added to the stream optimizer, which will 
bring performance improvements in some cases.

Maybe other rules also need to be reviewed to see whether they can be aligned 
between the batch and stream cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31292) Use HadoopUtils to get Configuration in CatalogContext

2023-03-01 Thread Yubin Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695473#comment-17695473
 ] 

Yubin Li commented on FLINK-31292:
--

[~lzljs3620320] Good idea! Could you please assign this to me?

> Use HadoopUtils to get Configuration in CatalogContext
> ---
>
> Key: FLINK-31292
> URL: https://issues.apache.org/jira/browse/FLINK-31292
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Reporter: Jingsong Lee
>Priority: Major
> Fix For: table-store-0.4.0
>
>
> At present, if the Hadoop configuration is not passed in the CatalogContext, 
> a new Hadoop configuration is generated directly, which may not have the 
> required parameters.
> We can refer to HadoopUtils to obtain the Hadoop configuration from the Flink 
> configuration and the environment variables.
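
A minimal sketch of the idea, assuming the HadoopUtils from flink-runtime (the 
CatalogContext wiring is illustrative, not the actual change):
{code}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.util.HadoopUtils;

final class CatalogContextSketch {
    // Derive the Hadoop configuration from the Flink configuration and the
    // usual environment variables (e.g. HADOOP_CONF_DIR) instead of creating
    // an empty one.
    static org.apache.hadoop.conf.Configuration hadoopConf(Configuration flinkConf) {
        return HadoopUtils.getHadoopConfiguration(flinkConf);
    }
}
{code}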



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] lindong28 commented on pull request #21557: [FLINK-30501] Update Flink build instruction to deprecate Java 8 instead of requiring Java 11

2023-03-01 Thread via GitHub


lindong28 commented on PR #21557:
URL: https://github.com/apache/flink/pull/21557#issuecomment-1451332786

   @zentol @yunfengzhou-hub Do you have time to review this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-31087) Introduce MergeIntoAction.

2023-03-01 Thread yuzelin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzelin updated FLINK-31087:

Summary: Introduce MergeIntoAction.  (was: Introduce InsertChangesAction.)

> Introduce MergeIntoAction.
> --
>
> Key: FLINK-31087
> URL: https://issues.apache.org/jira/browse/FLINK-31087
> Project: Flink
>  Issue Type: New Feature
>  Components: Table Store
>Affects Versions: table-store-0.4.0
>Reporter: yuzelin
>Priority: Major
>  Labels: pull-request-available
>
> Applying streaming changes to a table requires using the DataStream API in 
> Flink. This action can help do the job.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31087) Introduce MergeIntoAction.

2023-03-01 Thread yuzelin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzelin updated FLINK-31087:

Description: This action simulates the 'MERGE INTO' syntax.  (was: Applying 
streaming changes to table need to use DataStream API in Flink. This action can 
help do the job.)

> Introduce MergeIntoAction.
> --
>
> Key: FLINK-31087
> URL: https://issues.apache.org/jira/browse/FLINK-31087
> Project: Flink
>  Issue Type: New Feature
>  Components: Table Store
>Affects Versions: table-store-0.4.0
>Reporter: yuzelin
>Priority: Major
>  Labels: pull-request-available
>
> This action simulates the 'MERGE INTO' syntax.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-30501) Update Flink build instruction to deprecate Java 8 instead of requiring Java 11

2023-03-01 Thread Dong Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin updated FLINK-30501:
-
Description: 
Flink 1.15 and later versions require at least Java 11 to build from sources 
[1], whereas the pom.xml specifies 1.8 as the source/target. This inconsistency 
confuses users.

As mentioned in the FLINK-25247 title, the goal of that ticket is to "Inform 
users about deprecation". It would be better to inform users that "Java 8 is 
deprecated" instead of saying "Flink requires at least Java 11 to build", so 
that users have the right information to make the right choice for themselves.

[1] 
[https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/flinkdev/building/]

  was:
Flink 1.15 and later versions require at least Java 11 to build from sources 
[1], whereas the pom.xml specifies the source/target is 1.8. 

This is inconsistent. We update our doc to mention that Java 8 is deprecated 
instead of saying that Java 11 is required.

The statement "Fink requires at least Java 11 to build" was added in 
https://issues.apache.org/jira/browse/FLINK-25247. However the JIRA title only 
says "Inform users about deprecation".

 [1] 
[https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/flinkdev/building/]


> Update Flink build instruction to deprecate Java 8 instead of requiring Java 
> 11
> ---
>
> Key: FLINK-30501
> URL: https://issues.apache.org/jira/browse/FLINK-30501
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System / CI
>Reporter: Dong Lin
>Assignee: Dong Lin
>Priority: Major
>  Labels: pull-request-available
>
> Flink 1.15 and later versions require at least Java 11 to build from sources 
> [1], whereas the pom.xml specifies 1.8 as the source/target. This 
> inconsistency confuses users.
> As mentioned in the FLINK-25247 title, the goal of that ticket is to "Inform 
> users about deprecation". It would be better to inform users that "Java 8 is 
> deprecated" instead of saying "Flink requires at least Java 11 to build", so 
> that users have the right information to make the right choice for themselves.
> [1] 
> [https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/flinkdev/building/]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-30501) Update Flink build instruction to deprecate Java 8 instead of requiring Java 11

2023-03-01 Thread Dong Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin updated FLINK-30501:
-
Description: 
Flink 1.15 and later versions require at least Java 11 to build from sources 
[1], whereas the pom.xml specifies 1.8 as the source/target. 

This is inconsistent. We should update our docs to mention that Java 8 is 
deprecated instead of saying that Java 11 is required.

The statement "Flink requires at least Java 11 to build" was added in 
https://issues.apache.org/jira/browse/FLINK-25247. However, the JIRA title only 
says "Inform users about deprecation".

 [1] 
[https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/flinkdev/building/]

  was:
Flink 1.15 and later versions require at least Java 11 to build from sources 
[1], whereas the pom.xml specifies the source/target is 1.8. 

This is inconsistent. We update our doc to mention that Java 8 is deprecated 
instead of saying that Java 11 is required.

The statement "Fink requires at least Java 11 to build" was added in 
https://issues.apache.org/jira/browse/FLINK-25247. However the JIRA title only 
says "Inform users about deprecation". And the PR was not approved by any 
committer other than the PR author.

 [1] 
[https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/flinkdev/building/]


> Update Flink build instruction to deprecate Java 8 instead of requiring Java 
> 11
> ---
>
> Key: FLINK-30501
> URL: https://issues.apache.org/jira/browse/FLINK-30501
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System / CI
>Reporter: Dong Lin
>Assignee: Dong Lin
>Priority: Major
>  Labels: pull-request-available
>
> Flink 1.15 and later versions require at least Java 11 to build from sources 
> [1], whereas the pom.xml specifies 1.8 as the source/target. 
> This is inconsistent. We should update our docs to mention that Java 8 is 
> deprecated instead of saying that Java 11 is required.
> The statement "Flink requires at least Java 11 to build" was added in 
> https://issues.apache.org/jira/browse/FLINK-25247. However, the JIRA title 
> only says "Inform users about deprecation".
>  [1] 
> [https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/flinkdev/building/]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-31290) Remove features in documentation

2023-03-01 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee reassigned FLINK-31290:


Assignee: Guojun Li

> Remove features in documentation
> 
>
> Key: FLINK-31290
> URL: https://issues.apache.org/jira/browse/FLINK-31290
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Reporter: Jingsong Lee
>Assignee: Guojun Li
>Priority: Major
> Fix For: table-store-0.4.0
>
>
> "Features" is a confusing section in the documentation.
> Right now there are two pages under features: log system and lookup join.
> We can move log system to concepts,
> and move lookup join to how-to.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-31291) Document table.exec.sink.upsert-materialize to none

2023-03-01 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee reassigned FLINK-31291:


Assignee: Guojun Li

> Document table.exec.sink.upsert-materialize to none
> ---
>
> Key: FLINK-31291
> URL: https://issues.apache.org/jira/browse/FLINK-31291
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Reporter: Jingsong Lee
>Assignee: Guojun Li
>Priority: Major
> Fix For: table-store-0.4.0
>
>
> The table store has the ability to correct disorder, see:
> [https://nightlies.apache.org/flink/flink-table-store-docs-master/docs/concepts/primary-key-table/#sequence-field]
> But the Flink SQL default sink materialization results in strange behavior, 
> in particular when writing to an agg table of the FTS.
> We should document this: always set table.exec.sink.upsert-materialize to 
> none, and set 'sequence.field' on the table in case of disorder.
>  
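
A minimal sketch of the documented setting (the option key comes from the 
ticket; the surrounding TableEnvironment setup is assumed boilerplate):
{code}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

final class UpsertMaterializeSketch {
    static TableEnvironment create() {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        // Disable sink upsert materialization, as recommended above.
        tEnv.getConfig()
                .getConfiguration()
                .setString("table.exec.sink.upsert-materialize", "NONE");
        return tEnv;
    }
}
{code}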



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-31289) Default aggregate-function for field can be last_non_null_value

2023-03-01 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee closed FLINK-31289.

  Assignee: Shammon
Resolution: Fixed

master: 30f03264bf188df3ccf753cc53da421b3464318c

> Default aggregate-function for field can be last_non_null_value
> ---
>
> Key: FLINK-31289
> URL: https://issues.apache.org/jira/browse/FLINK-31289
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Reporter: Jingsong Lee
>Assignee: Shammon
>Priority: Major
>  Labels: pull-request-available
> Fix For: table-store-0.4.0
>
>
> At present, when no aggregate function is configured for a field, an NPE is 
> thrown. When the table has many fields, the configuration becomes troublesome.
> We can give fields a default aggregate function, such as last_non_null_value, 
> which is consistent with the partial-update table.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31290) Remove features in documentation

2023-03-01 Thread Guojun Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695468#comment-17695468
 ] 

Guojun Li commented on FLINK-31290:
---

Hi Jingsong, I can take this task. Would you please assign it to me?

> Remove features in documentation
> 
>
> Key: FLINK-31290
> URL: https://issues.apache.org/jira/browse/FLINK-31290
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Reporter: Jingsong Lee
>Priority: Major
> Fix For: table-store-0.4.0
>
>
> "Features" is a confusing section in the documentation.
> Right now there are two pages under features: log system and lookup join.
> We can move log system to concepts,
> and move lookup join to how-to.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31291) Document table.exec.sink.upsert-materialize to none

2023-03-01 Thread Guojun Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695466#comment-17695466
 ] 

Guojun Li commented on FLINK-31291:
---

Hi Jingsong, I can take this task. Would you please assign it to me?

> Document table.exec.sink.upsert-materialize to none
> ---
>
> Key: FLINK-31291
> URL: https://issues.apache.org/jira/browse/FLINK-31291
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Reporter: Jingsong Lee
>Priority: Major
> Fix For: table-store-0.4.0
>
>
> The table store has the ability to correct disorder, such as:
> [https://nightlies.apache.org/flink/flink-table-store-docs-master/docs/concepts/primary-key-table/#sequence-field]
> But Flink SQL's default sink materialization will result in strange behavior, in 
> particular when writing to the agg table of the FTS.
> We should document this: always set table.exec.sink.upsert-materialize to none, 
> and set 'sequence.field' on the table in case of disorder.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-table-store] JingsongLi merged pull request #568: [FLINK-31289] Default aggregate-function for field can be last_non_null_value

2023-03-01 Thread via GitHub


JingsongLi merged PR #568:
URL: https://github.com/apache/flink-table-store/pull/568


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-31295) When Kerberos authentication is enabled, HADOOP_PROXY_USER cannot take effect.

2023-03-01 Thread HunterXHunter (Jira)
HunterXHunter created FLINK-31295:
-

 Summary: When Kerberos authentication is enabled, HADOOP_PROXY_USER 
cannot take effect.
 Key: FLINK-31295
 URL: https://issues.apache.org/jira/browse/FLINK-31295
 Project: Flink
  Issue Type: Bug
Reporter: HunterXHunter


When I set :

security.kerberos.login.keytab: kerbero_user.keytab
security.kerberos.login.principal: kerbero_user

and set  HADOOP_PROXY_USER = proxy_user

Data is still written to HDFS as user kerbero_user.

But when I turn off Kerberos authentication, data is written to HDFS as user 
proxy_user.

Finally, I found that the logic in HadoopModule#install is not used for the 
Hadoop proxy when Kerberos authentication is enabled.
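
For context, a minimal sketch of how HADOOP_PROXY_USER is normally honored via 
Hadoop's UserGroupInformation API; the wiring around it (keytab path, the doAs 
body) is an assumption for illustration, not the actual HadoopModule code:

{code:java}
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserSketch {
    public static void main(String[] args) throws Exception {
        // Log in the "real" Kerberos user from the keytab.
        UserGroupInformation.loginUserFromKeytab("kerbero_user", "/path/to/kerbero_user.keytab");
        UserGroupInformation realUser = UserGroupInformation.getLoginUser();

        // Honor HADOOP_PROXY_USER by wrapping the real user; this is the step
        // that appears to be skipped when Kerberos authentication is enabled.
        String proxy = System.getenv("HADOOP_PROXY_USER");
        UserGroupInformation ugi =
                proxy != null ? UserGroupInformation.createProxyUser(proxy, realUser) : realUser;

        ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
            // Filesystem accesses performed here would run as proxy_user.
            return null;
        });
    }
}
{code}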




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31294) CommitterOperator forgot to close Committer when closing.

2023-03-01 Thread Ming Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695457#comment-17695457
 ] 

Ming Li commented on FLINK-31294:
-

Hi [~lzljs3620320], please take a look at this issue when you have time.

> CommitterOperator forgot to close Committer when closing.
> -
>
> Key: FLINK-31294
> URL: https://issues.apache.org/jira/browse/FLINK-31294
> Project: Flink
>  Issue Type: Bug
>  Components: Table Store
>Reporter: Ming Li
>Priority: Major
>
> {{CommitterOperator}} does not close the {{Committer}} when it closes, which 
> may lead to resource leaks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31140) Load additional dependencies in operator classpath

2023-03-01 Thread Tamir Sagi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tamir Sagi updated FLINK-31140:
---
Description: 
To date it is not possible to add additional jars to the operator classpath. 

In our case, we have a Kafka appender with a custom layout that works with IAM 
authentication and SSL along with AWS MSK.

log4j.properties file

 
{code:java}
appender.kafka.type = Kafka
appender.kafka.name = Kafka
appender.kafka.bootstrap.servers = 
appender.kafka.topic = 
appender.kafka.security.protocol = SASL_SSL
appender.kafka.sasl.mechanism = AWS_MSK_IAM
appender.kafka.sasl.jaas.config = software.amazon.msk.auth.iam.IAMLoginModule 
required;
appender.kafka.sasl.client.callback.handler.class = 
software.amazon.msk.auth.iam.IAMClientCallbackHandler
appender.kafka.layout.type = CustomJsonLayout
appender.kafka.layout.class = our.package.layouts.CustomJsonLayout 
appender.kafka.layout.service_name = {code}
If the custom layout is not present in the classpath, the logger fails to start.

*main ERROR Could not create plugin of type class 
org.apache.logging.log4j.core.appender.KafkaAppender for element Kafka: 
java.lang.NullPointerException java.lang.NullPointerException*

Furthermore, all the essential AWS dependencies must be added to the classpath.

 

To support additional jars, I needed to add them and then replace the following 
lines

[https://github.com/apache/flink-kubernetes-operator/blob/main/docker-entrypoint.sh#L32]

[https://github.com/apache/flink-kubernetes-operator/blob/main/docker-entrypoint.sh#L37]

using the 'sed' command.

LOGGER_JAR in that case is an uber jar that contains all the necessary jars.
{code:java}
ENV USER_DEPENDENCIES_DIR=/opt/flink/dependencies
RUN mkdir -p $USER_DEPENDENCIES_DIR
COPY target/$LOGGER_JAR $USER_DEPENDENCIES_DIR

USER root

RUN sed -i "s|java -cp |java -cp $USER_DEPENDENCIES_DIR/*:|" /docker-entrypoint.sh

USER flink{code}
It works, but it is not ideal and IMO should not be handled that way.

 

The idea of this ticket is to allow users to add additional jars into a specific 
location which is added to the classpath when the Kubernetes operator starts.

I'd like to add the 'OPERATOR_LIB' ENV and create that folder 
(/opt/flink/operator-lib) as well in the root Dockerfile, along with a minor 
modification in 'docker-entrypoint.sh' (add $OPERATOR_LIB to the 'cp' command).

 

Once it's supported, the only thing users have to do is copy all dependencies 
to that folder ($OPERATOR_LIB) while building a custom image.

I tested it locally and it seems to be working.

  was:
To date it is not possible to add additional jars to the operator classpath. 

In our case, we have a Kafka appender with a custom layout that works with IAM 
authentication and SSL along with AWS MSK.

log4j.properties file

 
{code:java}
appender.kafka.type = Kafka
appender.kafka.name = Kafka
appender.kafka.bootstrap.servers = 
appender.kafka.topic = 
appender.kafka.security.protocol = SASL_SSL
appender.kafka.sasl.mechanism = AWS_MSK_IAM
appender.kafka.sasl.jaas.config = software.amazon.msk.auth.iam.IAMLoginModule 
required;
appender.kafka.sasl.client.callback.handler.class = 
software.amazon.msk.auth.iam.IAMClientCallbackHandler
appender.kafka.layout.type = CustomJsonLayout
appender.kafka.layout.class = our.package.layouts.CustomJsonLayout 
appender.kafka.layout.service_name = {code}
If the custom layout is not present in the classpath, the logger fails to start.

*main ERROR Could not create plugin of type class 
org.apache.logging.log4j.core.appender.KafkaAppender for element Kafka: 
java.lang.NullPointerException java.lang.NullPointerException*

Furthermore, all the essential AWS dependencies must be added to the classpath.

 

To support additional jars, I needed to add them and then replace the following 
lines

[https://github.com/apache/flink-kubernetes-operator/blob/main/docker-entrypoint.sh#L32]

[https://github.com/apache/flink-kubernetes-operator/blob/main/docker-entrypoint.sh#L37]

using the 'sed' command.

LOGGER_JAR in that case is an uber jar that contains all the necessary jars.
{code:java}
ENV USER_DEPENDENCIES_DIR=/opt/flink/dependencies
RUN mkdir -p $USER_DEPENDENCIES_DIR
COPY target/$LOGGER_JAR $USER_DEPENDENCIES_DIR

USER root

RUN sed -i "s|java -cp |java -cp $USER_DEPENDENCIES_DIR/*:|" /docker-entrypoint.sh

USER flink{code}
It works, but it is not ideal and IMO should not be handled that way.

 

The idea of this ticket is to allow users to add additional jars into a specific 
location which is added to the classpath when the Kubernetes operator starts.

I'd like to add the 'USER_DEPENDENCIES_DIR' ENV and create that folder as well 
in the root Dockerfile, along with a minor modification in 'docker-entrypoint.sh' 
(add $USER_DEPENDENCIES_DIR to the 'cp' command).

 

Once it's supported, the only thing users have to do is copy all dependencies 
to that folder ($USER_DEPENDENCIES_DIR) while building a custom image.

I tested it locally and it seems to be working.


> Load additional 

[jira] [Created] (FLINK-31294) CommitterOperator forgot to close Committer when closing.

2023-03-01 Thread ming li (Jira)
ming li created FLINK-31294:
---

 Summary: CommitterOperator forgot to close Committer when closing.
 Key: FLINK-31294
 URL: https://issues.apache.org/jira/browse/FLINK-31294
 Project: Flink
  Issue Type: Bug
  Components: Table Store
Reporter: ming li


{{CommitterOperator}} does not close the {{Committer}} when it closes, which 
may lead to resource leaks.
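
For illustration, a minimal sketch of the likely fix, assuming the operator keeps 
its committer in a field named committer (the field name and the super call are 
assumptions):

{code:java}
@Override
public void close() throws Exception {
    // Close the committer first so its resources (files, connections) are released.
    if (committer != null) {
        committer.close();
    }
    super.close();
}
{code}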



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-table-store] Jackofmn12 commented on a diff in pull request #562: [FLINK-31248] Improve documentation for append-only table

2023-03-01 Thread via GitHub


Jackofmn12 commented on code in PR #562:
URL: https://github.com/apache/flink-table-store/pull/562#discussion_r1122591066


##
docs/content/docs/features/append-only-table.md:
##
@@ -0,0 +1,98 @@
+---
+title: "Append Only Table"
+weight: 2
+type: docs
+aliases:
+- /features/append-only-table.html
+---
+
+
+# Append Only Table
+
+By specifying `'write-mode' = 'append-only'` when creating the table, the user 
creates an append-only table.
+
+You can only insert a whole record into the table. No delete or update is 
supported and you cannot define primary keys.

Review Comment:
   A
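
   For readers skimming this page, a hedged sketch of the DDL it describes, wrapped in the Table API; the table name and schema are invented, and a Table Store catalog is assumed to be in use:

   ```java
   import org.apache.flink.table.api.EnvironmentSettings;
   import org.apache.flink.table.api.TableEnvironment;

   public class AppendOnlyExample {
       public static void main(String[] args) {
           TableEnvironment tEnv =
                   TableEnvironment.create(EnvironmentSettings.inStreamingMode());
           // 'write-mode' = 'append-only' is the option documented above;
           // inserts are the only supported changes on such a table.
           tEnv.executeSql(
                   "CREATE TABLE access_log (ts TIMESTAMP(3), url STRING) "
                           + "WITH ('write-mode' = 'append-only')");
       }
   }
   ```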



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-31289) Default aggregate-function for field can be last_non_null_value

2023-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-31289:
---
Labels: pull-request-available  (was: )

> Default aggregate-function for field can be last_non_null_value
> ---
>
> Key: FLINK-31289
> URL: https://issues.apache.org/jira/browse/FLINK-31289
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Reporter: Jingsong Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: table-store-0.4.0
>
>
> At present, when aggfunc is not configured, an NPE will be generated. When the 
> table has many fields, the configuration becomes more troublesome.
> We can give each field a default aggfunc, such as last_non_null_value, 
> which is consistent with the partial-update table.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-table-store] FangYongs opened a new pull request, #568: [FLINK-31289] Default aggregate-function for field can be last_non_null_value

2023-03-01 Thread via GitHub


FangYongs opened a new pull request, #568:
URL: https://github.com/apache/flink-table-store/pull/568

   If a field has no aggregate function for the aggregation merge-engine, use 
last_non_null_value aggregation as the default.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-31293) Request memory segment from LocalBufferPool may hang forever.

2023-03-01 Thread Weijie Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weijie Guo updated FLINK-31293:
---
Attachment: image-2023-03-02-12-29-03-003.png

> Request memory segment from LocalBufferPool may hang forever.
> 
>
> Key: FLINK-31293
> URL: https://issues.apache.org/jira/browse/FLINK-31293
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Weijie Guo
>Priority: Major
> Attachments: image-2023-03-02-12-23-50-572.png, 
> image-2023-03-02-12-28-48-437.png, image-2023-03-02-12-29-03-003.png
>
>
> From the thread dump information, we can see that the task is waiting for the 
> {{LocalBufferPool}} to become available. It is strange that the other tasks have 
> already finished and released network memory. Undoubtedly, this is unexpected 
> behavior, which implies that there must be a bug in the 
> {{LocalBufferPool}} or {{NetworkBufferPool}}.
> !image-2023-03-02-12-23-50-572.png|width=650,height=153!
> We can find a strange phenomenon: there are available buffers in the 
> LocalBufferPool, but they are considered to be unavailable. Another thing 
> to note is that it now holds an overdraft buffer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31293) Request memory segment from LocalBufferPool may hang forever.

2023-03-01 Thread Weijie Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weijie Guo updated FLINK-31293:
---
Description: 
In our TPC-DS test, we found that under fierce contention for network memory, 
some tasks may hang forever.

From the thread dump information, we can see that the task is waiting for the 
{{LocalBufferPool}} to become available. It is strange that the other tasks have 
already finished and released network memory. Undoubtedly, this is unexpected 
behavior, which implies that there must be a bug in the {{LocalBufferPool}} or 
{{NetworkBufferPool}}.

!image-2023-03-02-12-23-50-572.png|width=650,height=153!

By dumping the heap memory, we can find a strange phenomenon: there are 
available buffers in the {{LocalBufferPool}}, but it is considered to be 
unavailable. Another thing to note is that it now holds an overdraft buffer.

!image-2023-03-02-12-28-48-437.png|width=520,height=200!

!image-2023-03-02-12-29-03-003.png|width=438,height=84!

 

  was:
From the thread dump information, we can see that the task is waiting for the 
{{LocalBufferPool}} to become available. It is strange that the other tasks have 
already finished and released network memory. Undoubtedly, this is unexpected 
behavior, which implies that there must be a bug in the {{LocalBufferPool}} or 
{{NetworkBufferPool}}.

!image-2023-03-02-12-23-50-572.png|width=650,height=153!

We can find a strange phenomenon: there are available buffers in the 
LocalBufferPool, but they are considered to be unavailable. Another thing to 
note is that it now holds an overdraft buffer.


> Request memory segment from LocalBufferPool may hang forever.
> 
>
> Key: FLINK-31293
> URL: https://issues.apache.org/jira/browse/FLINK-31293
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Weijie Guo
>Priority: Major
> Attachments: image-2023-03-02-12-23-50-572.png, 
> image-2023-03-02-12-28-48-437.png, image-2023-03-02-12-29-03-003.png
>
>
> In our TPC-DS test, we found that under fierce contention for network 
> memory, some tasks may hang forever.
> From the thread dump information, we can see that the task is waiting for the 
> {{LocalBufferPool}} to become available. It is strange that the other tasks have 
> already finished and released network memory. Undoubtedly, this is unexpected 
> behavior, which implies that there must be a bug in the 
> {{LocalBufferPool}} or {{NetworkBufferPool}}.
> !image-2023-03-02-12-23-50-572.png|width=650,height=153!
> By dumping the heap memory, we can find a strange phenomenon: there are 
> available buffers in the {{LocalBufferPool}}, but it is considered to be 
> unavailable. Another thing to note is that it now holds an overdraft buffer.
> !image-2023-03-02-12-28-48-437.png|width=520,height=200!
> !image-2023-03-02-12-29-03-003.png|width=438,height=84!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31293) Request memory segment from LocalBufferPool may hang forever.

2023-03-01 Thread Weijie Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weijie Guo updated FLINK-31293:
---
Attachment: image-2023-03-02-12-28-48-437.png

> Request memory segment from LocalBufferPool may hang forever.
> 
>
> Key: FLINK-31293
> URL: https://issues.apache.org/jira/browse/FLINK-31293
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Weijie Guo
>Priority: Major
> Attachments: image-2023-03-02-12-23-50-572.png, 
> image-2023-03-02-12-28-48-437.png
>
>
> From the thread dump information, we can see that the task is waiting for the 
> {{LocalBufferPool}} to become available. It is strange that the other tasks have 
> already finished and released network memory. Undoubtedly, this is unexpected 
> behavior, which implies that there must be a bug in the 
> {{LocalBufferPool}} or {{NetworkBufferPool}}.
> !image-2023-03-02-12-23-50-572.png|width=650,height=153!
> We can find a strange phenomenon: there are available buffers in the 
> LocalBufferPool, but they are considered to be unavailable. Another thing 
> to note is that it now holds an overdraft buffer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-31208) KafkaSourceReader meaninglessly overrides a method (pauseOrResumeSplits)

2023-03-01 Thread Qingsheng Ren (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qingsheng Ren reassigned FLINK-31208:
-

Assignee: Hongshun Wang

> KafkaSourceReader meaninglessly overrides a method (pauseOrResumeSplits)
> ---
>
> Key: FLINK-31208
> URL: https://issues.apache.org/jira/browse/FLINK-31208
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Kafka
>Reporter: Hongshun Wang
>Assignee: Hongshun Wang
>Priority: Not a Priority
>  Labels: starter
>
> KafkaSourceReader meaninglessly overrides a method (pauseOrResumeSplits), 
> which is no different from its parent class (SourceReaderBase). Why not 
> remove this override method?
>  
> The relevant code is here; as we can see, there is no difference:
> {code:java}
> //org.apache.flink.connector.kafka.source.reader.KafkaSourceReader#pauseOrResumeSplits
> @Override
> public void pauseOrResumeSplits(
> Collection<String> splitsToPause, Collection<String> splitsToResume) {
> splitFetcherManager.pauseOrResumeSplits(splitsToPause, splitsToResume);
> } 
> //org.apache.flink.connector.base.source.reader.SourceReaderBase#pauseOrResumeSplits
> @Override
> public void pauseOrResumeSplits(
> Collection<String> splitsToPause, Collection<String> splitsToResume) {
> splitFetcherManager.pauseOrResumeSplits(splitsToPause, splitsToResume);
> }{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31293) Request memory segment from LocalBufferPool may hang forever.

2023-03-01 Thread Weijie Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weijie Guo updated FLINK-31293:
---
Description: 
From the thread dump information, we can see that the task is waiting for the 
{{LocalBufferPool}} to become available. It is strange that the other tasks have 
already finished and released network memory. Undoubtedly, this is unexpected 
behavior, which implies that there must be a bug in the {{LocalBufferPool}} or 
{{NetworkBufferPool}}.

!image-2023-03-02-12-23-50-572.png|width=650,height=153!

We can find a strange phenomenon: there are available buffers in the 
LocalBufferPool, but they are considered to be unavailable. Another thing to 
note is that it now holds an overdraft buffer.

> Request memory segment from LocalBufferPool may hang forever.
> 
>
> Key: FLINK-31293
> URL: https://issues.apache.org/jira/browse/FLINK-31293
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Weijie Guo
>Priority: Major
> Attachments: image-2023-03-02-12-23-50-572.png
>
>
> From the thread dump information, we can see that the task is waiting for the 
> {{LocalBufferPool}} to become available. It is strange that the other tasks have 
> already finished and released network memory. Undoubtedly, this is unexpected 
> behavior, which implies that there must be a bug in the 
> {{LocalBufferPool}} or {{NetworkBufferPool}}.
> !image-2023-03-02-12-23-50-572.png|width=650,height=153!
> We can find a strange phenomenon: there are available buffers in the 
> LocalBufferPool, but they are considered to be unavailable. Another thing 
> to note is that it now holds an overdraft buffer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31293) Request memory segment from LocalBufferPool may hang forever.

2023-03-01 Thread Weijie Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weijie Guo updated FLINK-31293:
---
Attachment: image-2023-03-02-12-23-50-572.png

> Request memory segment from LocalBufferPool may hang forever.
> 
>
> Key: FLINK-31293
> URL: https://issues.apache.org/jira/browse/FLINK-31293
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Weijie Guo
>Priority: Major
> Attachments: image-2023-03-02-12-23-50-572.png
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31293) Request memory segment from LocalBufferPool may hang forever.

2023-03-01 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-31293:
--

 Summary: Request memory segment from LocalBufferPool may hang forever.
 Key: FLINK-31293
 URL: https://issues.apache.org/jira/browse/FLINK-31293
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.17.0
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31292) Use HadoopUtils to get Configuration in CatalogContext

2023-03-01 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-31292:


 Summary: Use HadoopUtils to get Configuration in CatalogContext
 Key: FLINK-31292
 URL: https://issues.apache.org/jira/browse/FLINK-31292
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Reporter: Jingsong Lee
 Fix For: table-store-0.4.0


At present, if a HadoopConf is not passed into the CatalogContext, a new HadoopConf 
will be generated directly, which may not have the required parameters.

We can refer to HadoopUtils to obtain the hadoopConf from the configuration and 
environment variables.
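
For illustration, a rough sketch of the environment-based fallback being proposed; 
the exact lookup order in HadoopUtils may differ, and the file names below are the 
usual Hadoop conventions rather than anything specified in this ticket:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class HadoopConfSketch {
    /** Builds a Hadoop Configuration from HADOOP_CONF_DIR when it is set. */
    public static Configuration fromEnvironment() {
        Configuration conf = new Configuration();
        String confDir = System.getenv("HADOOP_CONF_DIR");
        if (confDir != null) {
            conf.addResource(new Path(confDir, "core-site.xml"));
            conf.addResource(new Path(confDir, "hdfs-site.xml"));
        }
        return conf;
    }
}
{code}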



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31291) Document table.exec.sink.upsert-materialize to none

2023-03-01 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-31291:


 Summary: Document table.exec.sink.upsert-materialize to none
 Key: FLINK-31291
 URL: https://issues.apache.org/jira/browse/FLINK-31291
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Reporter: Jingsong Lee
 Fix For: table-store-0.4.0


The table store has the ability to correct disorder, such as:

[https://nightlies.apache.org/flink/flink-table-store-docs-master/docs/concepts/primary-key-table/#sequence-field]

But Flink SQL's default sink materialization will result in strange behavior, in 
particular when writing to the agg table of the FTS.

We should document this: always set table.exec.sink.upsert-materialize to none, 
and set 'sequence.field' on the table in case of disorder.
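
To make the recommendation concrete, a hedged sketch of both settings together; 
the table schema is invented and a Table Store catalog is assumed to be in use:

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class UpsertMaterializeExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Always disable Flink SQL's sink upsert materialization for FTS sinks.
        tEnv.getConfig()
                .getConfiguration()
                .setString("table.exec.sink.upsert-materialize", "NONE");

        // Let the table store correct disorder itself via 'sequence.field'.
        tEnv.executeSql(
                "CREATE TABLE orders (order_id BIGINT, amount INT, update_time TIMESTAMP(3), "
                        + "PRIMARY KEY (order_id) NOT ENFORCED) "
                        + "WITH ('sequence.field' = 'update_time')");
    }
}
{code}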

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31287) Default value of 'changelog-producer.compaction-interval' can be zero

2023-03-01 Thread xzw0223 (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695442#comment-17695442
 ] 

xzw0223 commented on FLINK-31287:
-

[~lzljs3620320] Can I take this ticket? I think I can do it.

> Default value of 'changelog-producer.compaction-interval' can be zero
> -
>
> Key: FLINK-31287
> URL: https://issues.apache.org/jira/browse/FLINK-31287
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Reporter: Jingsong Lee
>Priority: Major
> Fix For: table-store-0.4.0
>
>
> At present, the 30-minute interval is too conservative. We can set it to 0 by 
> default, so that each checkpoint will trigger a full compaction and generate a 
> changelog.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31290) Remove features in documentation

2023-03-01 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-31290:


 Summary: Remove features in documentation
 Key: FLINK-31290
 URL: https://issues.apache.org/jira/browse/FLINK-31290
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Reporter: Jingsong Lee
 Fix For: table-store-0.4.0


Features is confusing in the documentation.

Now, there are two pages under features: log system and lookup join.

We can move log system to concepts, and move lookup join to how-to.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] flinkbot commented on pull request #22065: [FLINK-31286][python] Make sure Python processes are cleaned up when TaskManager crashes

2023-03-01 Thread via GitHub


flinkbot commented on PR #22065:
URL: https://github.com/apache/flink/pull/22065#issuecomment-1451258837

   
   ## CI report:
   
   * 849025d00ad19a620ea1930e4fdd0ffd56dbfb3d UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-31289) Default aggregate-function for field can be last_non_null_value

2023-03-01 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-31289:


 Summary: Default aggregate-function for field can be 
last_non_null_value
 Key: FLINK-31289
 URL: https://issues.apache.org/jira/browse/FLINK-31289
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Reporter: Jingsong Lee
 Fix For: table-store-0.4.0


At present, when aggfunc is not configured, an NPE will be generated. When the 
table has many fields, the configuration becomes more troublesome.

We can give each field a default aggfunc, such as last_non_null_value, which is 
consistent with the partial-update table.
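
To make this concrete, a hedged DDL sketch of an aggregation table where one field 
would rely on the proposed default; the schema is invented, and the per-field 
option key follows the existing fields.<field>.aggregate-function pattern:

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class DefaultAggFuncExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        // 'price' has an explicit aggregate function; with this change,
        // 'remark' would implicitly fall back to last_non_null_value
        // instead of triggering an NPE.
        tEnv.executeSql(
                "CREATE TABLE agg_t (k INT, price DOUBLE, remark STRING, "
                        + "PRIMARY KEY (k) NOT ENFORCED) WITH ("
                        + " 'merge-engine' = 'aggregation',"
                        + " 'fields.price.aggregate-function' = 'sum')");
    }
}
{code}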



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31288) Disable overdraft buffer for batch shuffle

2023-03-01 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-31288:
--

 Summary: Disable overdraft buffer for batch shuffle
 Key: FLINK-31288
 URL: https://issues.apache.org/jira/browse/FLINK-31288
 Project: Flink
  Issue Type: Bug
Reporter: Weijie Guo
Assignee: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31286) Python processes are still alive when shutting down a session cluster directly without stopping the jobs

2023-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-31286:
---
Labels: pull-request-available  (was: )

> Python processes are still alive when shutting down a session cluster 
> directly without stopping the jobs
> 
>
> Key: FLINK-31286
> URL: https://issues.apache.org/jira/browse/FLINK-31286
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Reporter: Dian Fu
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-03-02-10-55-34-863.png
>
>
> Reproduce steps:
> 1) start a standalone cluster: ./bin/start-cluster.sh
> 2) submit a PyFlink job which contains Python UDFs
> 3) stop the cluster: ./bin/stop-cluster.sh
> 4) Check if Python process still exists: ps aux | grep -i beam_boot
> !image-2023-03-02-10-55-34-863.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] dianfu opened a new pull request, #22065: [FLINK-31286][python] Make sure Python processes are cleaned up when TaskManager crashes

2023-03-01 Thread via GitHub


dianfu opened a new pull request, #22065:
URL: https://github.com/apache/flink/pull/22065

   
   
   ## What is the purpose of the change
   
   *(For example: This pull request makes task deployment go through the blob 
server, rather than through RPC. That way we avoid re-transferring them on each 
deployment (during recovery).)*
   
   
   ## Brief change log
   
   *(for example:)*
 - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
 - *Deployments RPC transmits only the blob storage reference*
 - *TaskManagers retrieve the TaskInfo from the blob cache*
   
   
   ## Verifying this change
   
   Please make sure both new and modified tests in this PR follows the 
conventions defined in our code quality guide: 
https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
 - *Manually verified the change by running a 4 node cluster with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
 - The serializers: (yes / no / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
 - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-31287) Default value of 'changelog-producer.compaction-interval' can be zero

2023-03-01 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-31287:


 Summary: Default value of 'changelog-producer.compaction-interval' 
can be zero
 Key: FLINK-31287
 URL: https://issues.apache.org/jira/browse/FLINK-31287
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Reporter: Jingsong Lee
 Fix For: table-store-0.4.0


At present, the 30-minute interval is too conservative. We can set it to 0 by 
default, so that each checkpoint will trigger a full compaction and generate a 
changelog.
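
For illustration, a hedged sketch of what opting in to this behavior looks like 
today; the table definition is invented:

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CompactionIntervalExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        // With the proposed default of 0, the explicit interval below would no
        // longer be needed: every checkpoint would trigger a full compaction.
        tEnv.executeSql(
                "CREATE TABLE t (k INT, v STRING, PRIMARY KEY (k) NOT ENFORCED) WITH ("
                        + " 'changelog-producer' = 'full-compaction',"
                        + " 'changelog-producer.compaction-interval' = '0')");
    }
}
{code}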



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] X-czh commented on a diff in pull request #21673: [FLINK-30513] Cleanup HA storage path on cluster termination

2023-03-01 Thread via GitHub


X-czh commented on code in PR #21673:
URL: https://github.com/apache/flink/pull/21673#discussion_r1122567761


##
flink-runtime/src/test/java/org/apache/flink/runtime/highavailability/AbstractHaServicesTest.java:
##
@@ -44,18 +44,20 @@
 
 import static org.hamcrest.Matchers.contains;
 import static org.hamcrest.Matchers.is;
+import static org.hamcrest.Matchers.not;

Review Comment:
   Thanks for the comment, I'll migrate this class to JUnit 5.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] X-czh commented on a diff in pull request #21673: [FLINK-30513] Cleanup HA storage path on cluster termination

2023-03-01 Thread via GitHub


X-czh commented on code in PR #21673:
URL: https://github.com/apache/flink/pull/21673#discussion_r1122567580


##
flink-runtime/src/main/java/org/apache/flink/runtime/highavailability/AbstractHaServices.java:
##
@@ -225,6 +237,13 @@ public CompletableFuture globalCleanupAsync(JobID 
jobID, Executor executor
 executor);
 }
 
+protected void cleanupClusterHaStoragePath() throws Exception {
+final Path clusterHaStoragePath =
+
HighAvailabilityServicesUtils.getClusterHighAvailableStoragePath(configuration);
+final FileSystem fileSystem = clusterHaStoragePath.getFileSystem();
+fileSystem.delete(clusterHaStoragePath, true);

Review Comment:
   Yes, it will already delete the blob dir. However, as you mentioned, the 
protocol of BlobStoreService#closeAndCleanupAllData() is not limited to 
deleting the directory, so it is still required.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-31285) FileSource should support reading files in order

2023-03-01 Thread Yaroslav Tkachenko (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695425#comment-17695425
 ] 

Yaroslav Tkachenko commented on FLINK-31285:


I think FileSplitAssigner is the only thing that has to be fixed to support 
this, so no other ideas. I'd also love to include an implementation that 
actually performs sorting, so users can just choose it when building a 
FileSource. Probably a similar approach for the Table API.
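
To sketch that direction, a rough order-preserving assigner, assuming the 
FileSplitAssigner interface of recent Flink versions (getNext/addSplits/
remainingSplits); the path-based comparator is just one possible ordering:

{code:java}
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.Optional;
import java.util.PriorityQueue;

import javax.annotation.Nullable;

import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.connector.file.src.assigners.FileSplitAssigner;

/** Hands out splits in ascending file-path order, ignoring locality. */
public class OrderedSplitAssigner implements FileSplitAssigner {

    private final PriorityQueue<FileSourceSplit> splits =
            new PriorityQueue<>(
                    Comparator.comparing((FileSourceSplit s) -> s.path().toString()));

    public OrderedSplitAssigner(Collection<FileSourceSplit> initialSplits) {
        splits.addAll(initialSplits);
    }

    @Override
    public Optional<FileSourceSplit> getNext(@Nullable String hostname) {
        return Optional.ofNullable(splits.poll());
    }

    @Override
    public void addSplits(Collection<FileSourceSplit> newSplits) {
        splits.addAll(newSplits);
    }

    @Override
    public Collection<FileSourceSplit> remainingSplits() {
        return new ArrayList<>(splits);
    }
}
{code}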

> FileSource should support reading files in order
> 
>
> Key: FLINK-31285
> URL: https://issues.apache.org/jira/browse/FLINK-31285
> Project: Flink
>  Issue Type: New Feature
>  Components: Connectors / FileSystem
>Affects Versions: 1.18.0
>Reporter: Yaroslav Tkachenko
>Priority: Major
>
> Currently, Flink's *FileSource* uses *LocalityAwareSplitAssigner* as a 
> default *FileSplitAssigner* and it doesn't guarantee any order. In many 
> scenarios involving processing historical data, reading files in order can be 
> a requirement, especially when using event-time processing. 
> I believe a new FileSplitAssigner should be implemented that supports 
> ordering. FileSourceBuilder should be extended to allow choosing a different 
> FileSplitAssigner.
> It's also clear that the files may not be read in _perfect_ order with 
> parallelism > 1. However, in some cases, using parallelism of 1 might be fine.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31286) Python processes are still alive when shutting down a session cluster directly without stopping the jobs

2023-03-01 Thread Dian Fu (Jira)
Dian Fu created FLINK-31286:
---

 Summary: Python processes are still alive when shutting down a 
session cluster directly without stopping the jobs
 Key: FLINK-31286
 URL: https://issues.apache.org/jira/browse/FLINK-31286
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Reporter: Dian Fu
Assignee: Dian Fu
 Attachments: image-2023-03-02-10-55-34-863.png

Reproduce steps:
1) start a standalone cluster: ./bin/start-cluster.sh
2) submit a PyFlink job which contains Python UDFs
3) stop the cluster: ./bin/stop-cluster.sh
4) Check if Python process still exists: ps aux | grep -i beam_boot

!image-2023-03-02-10-55-34-863.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] flinkbot commented on pull request #22064: [FLINK-26945][table] add DATE_SUB function.

2023-03-01 Thread via GitHub


flinkbot commented on PR #22064:
URL: https://github.com/apache/flink/pull/22064#issuecomment-1451176068

   
   ## CI report:
   
   * 4cde341af47b24b20088c5379ff260bed0e9849c UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] liuyongvs commented on pull request #22064: [FLINK-26945][table] add DATE_SUB function.

2023-03-01 Thread via GitHub


liuyongvs commented on PR #22064:
URL: https://github.com/apache/flink/pull/22064#issuecomment-1451174143

   After https://github.com/apache/flink/pull/22050 is fixed, the unit tests 
will pass, so once it is merged, I will rebase.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] liuyongvs opened a new pull request, #22064: [FLINK-26945][table] add DATE_SUB function.

2023-03-01 Thread via GitHub


liuyongvs opened a new pull request, #22064:
URL: https://github.com/apache/flink/pull/22064

   ## What is the purpose of the change
   
   *add DATE_SUB function*
   
   
   ## Brief change log
   - *SELECT DATE_SUB('2018-05-01',INTERVAL '1' YEAR/MONTH/DAY);* (see the sketch below)
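
   A sketch of the intended semantics (an assumption based on MySQL-compatible behavior; it runs only against this PR's branch, since vanilla Flink has no DATE_SUB):

   ```java
   import org.apache.flink.table.api.EnvironmentSettings;
   import org.apache.flink.table.api.TableEnvironment;

   public class DateSubExample {
       public static void main(String[] args) {
           TableEnvironment tEnv =
                   TableEnvironment.create(EnvironmentSettings.inBatchMode());
           // Expected result, assuming MySQL-compatible semantics: 2017-05-01.
           tEnv.executeSql(
                   "SELECT DATE_SUB(DATE '2018-05-01', INTERVAL '1' YEAR)").print();
       }
   }
   ```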
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: ( no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes)
 - If yes, how is the feature documented? (docs)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #550: [FLINK-31178] Public Writer API

2023-03-01 Thread via GitHub


JingsongLi commented on code in PR #550:
URL: https://github.com/apache/flink-table-store/pull/550#discussion_r1122527375


##
flink-table-store-core/src/main/java/org/apache/flink/table/store/table/sink/TableWrite.java:
##
@@ -18,39 +18,42 @@
 
 package org.apache.flink.table.store.table.sink;
 
+import org.apache.flink.table.store.annotation.Experimental;
 import org.apache.flink.table.store.data.BinaryRow;
 import org.apache.flink.table.store.data.InternalRow;
 import org.apache.flink.table.store.file.disk.IOManager;
-import org.apache.flink.table.store.file.io.DataFileMeta;
-
-import java.util.List;
+import org.apache.flink.table.store.table.Table;
 
 /**
- * An abstraction layer above {@link 
org.apache.flink.table.store.file.operation.FileStoreWrite} to
- * provide {@link InternalRow} writing.
+ * Write of {@link Table} to provide {@link InternalRow} writing.
+ *
+ * @since 0.4.0
  */
+@Experimental
 public interface TableWrite extends AutoCloseable {
 
-TableWrite withOverwrite(boolean overwrite);
-
+/** With {@link IOManager}, this is needed if 'write-buffer-spillable' is 
set to true. */
 TableWrite withIOManager(IOManager ioManager);
 
-SinkRecord write(InternalRow rowData) throws Exception;
+/** Calculate which partition {@code row} belongs to. */
+BinaryRow getPartition(InternalRow row);
 
-/** Log record need to preserve original pk (which includes partition 
fields). */
-SinkRecord toLogRecord(SinkRecord record);
+/** Calculate which bucket {@code row} belongs to. */
+int getBucket(InternalRow row);
 
-void compact(BinaryRow partition, int bucket, boolean fullCompaction) 
throws Exception;
+/** Write a row to the writer. */
+void write(InternalRow row) throws Exception;
 
 /**
- * Notify that some new files are created at given snapshot in given 
bucket.
+ * Compact a bucket of a partition. By default, it will determine whether 
to perform the
+ * compaction according to the 'num-sorted-run.compaction-trigger' option. 
If fullCompaction is
+ * true, it will force a full compaction, which is expensive.
  *
- * Most probably, these files are created by another job. Currently 
this method is only used
- * by the dedicated compact job to see files created by writer jobs.
+ * NOTE: In Java API, full compaction is not automatically executed. If 
you open
+ * 'changelog-producer' of 'full-compaction', please execute this method 
regularly to produce

Review Comment:
   set 'changelog-producer' to 'full-compaction'
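
   For readers of this new API, a minimal usage sketch built only from the methods visible in this diff; how the TableWrite instance is obtained and configured is assumed, and the compact signature is taken from the unchanged line above:

   ```java
   import org.apache.flink.table.store.data.BinaryRow;
   import org.apache.flink.table.store.data.InternalRow;
   import org.apache.flink.table.store.table.sink.TableWrite;

   class TableWriteUsage {
       static void writeAndFullCompact(TableWrite write, InternalRow row) throws Exception {
           BinaryRow partition = write.getPartition(row); // partition the row maps to
           int bucket = write.getBucket(row);             // bucket the row maps to
           write.write(row);
           // With 'changelog-producer' set to 'full-compaction', the Java API needs
           // this to be called regularly to actually produce changelogs.
           write.compact(partition, bucket, true); // true forces a full compaction (expensive)
       }
   }
   ```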



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


