[GitHub] [flink] X-czh commented on pull request #23302: [FLINK-32848][tests][JUnit5 migration] Migrate flink-runtime/shuffle tests to JUnit5

2023-09-13 Thread via GitHub


X-czh commented on PR #23302:
URL: https://github.com/apache/flink/pull/23302#issuecomment-1718807193

   Hi @FangYongs, could you help check the change when you are free?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-33066) Enable to inject environment variable from secret/configmap to operatorPod

2023-09-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FLINK-33066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Márton Balassi closed FLINK-33066.
--
Fix Version/s: kubernetes-operator-1.7.0
   Resolution: Fixed

1053e26 in main

> Enable to inject environment variable from secret/configmap to operatorPod
> --
>
> Key: FLINK-33066
> URL: https://issues.apache.org/jira/browse/FLINK-33066
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: dongwoo.kim
>Assignee: dongwoo.kim
>Priority: Minor
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.7.0
>
>
> Hello, I've been working with the Flink Kubernetes operator and noticed that 
> {{operatorPod.env}} only allows simple key-value pairs and doesn't support 
> the Kubernetes {{valueFrom}} syntax.
> How about changing the template to support a wider range of Kubernetes 
> syntax?
> *Current template*
> {code:java}
> {{- range $k, $v := .Values.operatorPod.env }}
>   - name: {{ $v.name | quote }}
>     value: {{ $v.value | quote }}
> {{- end }}{code}
>  
> *Proposed template*
> 1) Modify the template as below:
> {code:java}
> {{- with .Values.operatorPod.env }} 
> {{- toYaml . | nindent 12 }} 
> {{- end }} 
> {code}
> 2) Create an extra config value, *Values.operatorPod.envFrom*, and utilize it.
>  
> I'd be happy to implement this update if it's approved.
> Thanks in advance.
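
For illustration, with the proposed template a values entry could then use the
full Kubernetes env syntax (the Secret name below is hypothetical):

{code:yaml}
# values.yaml (hypothetical names, for illustration only)
operatorPod:
  env:
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: operator-db-secret
          key: password
    - name: POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
{code}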



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-kubernetes-operator] mbalassi merged pull request #671: [FLINK-33066] Support all k8s methods to configure env variable in operatorPod

2023-09-13 Thread via GitHub


mbalassi merged PR #671:
URL: https://github.com/apache/flink-kubernetes-operator/pull/671


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (FLINK-33051) GlobalFailureHandler interface should be retired in favor of LabeledGlobalFailureHandler

2023-09-13 Thread Fang Yong (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang Yong reassigned FLINK-33051:
-

Assignee: Matt Wang

> GlobalFailureHandler interface should be retired in favor of 
> LabeledGlobalFailureHandler
> 
>
> Key: FLINK-33051
> URL: https://issues.apache.org/jira/browse/FLINK-33051
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Panagiotis Garefalakis
>Assignee: Matt Wang
>Priority: Minor
>
> FLIP-304 introduced the `LabeledGlobalFailureHandler` interface as an 
> extension of the `GlobalFailureHandler` interface. The latter can thus be 
> removed in the future to avoid keeping interfaces with duplicate functions.
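
A minimal sketch of the relationship between the two interfaces (the method
signatures below are assumptions for illustration, not the exact FLIP-304
definitions):

{code:java}
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Sketch only (each interface in its own file); signatures are illustrative.
public interface GlobalFailureHandler {
    // Handles a failure that cannot be attributed to a single task.
    void handleGlobalFailure(Throwable cause);
}

// FLIP-304 extension that additionally carries failure labels produced by
// FailureEnrichers; once all call sites use it, the plain variant above
// becomes redundant.
public interface LabeledGlobalFailureHandler extends GlobalFailureHandler {
    void handleGlobalFailure(
            Throwable cause, CompletableFuture<Map<String, String>> failureLabels);
}
{code}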



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33051) GlobalFailureHandler interface should be retired in favor of LabeledGlobalFailureHandler

2023-09-13 Thread Matt Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764970#comment-17764970
 ] 

Matt Wang commented on FLINK-33051:
---

[~pgaref] got it, I will take it.

> GlobalFailureHandler interface should be retired in favor of 
> LabeledGlobalFailureHandler
> 
>
> Key: FLINK-33051
> URL: https://issues.apache.org/jira/browse/FLINK-33051
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Panagiotis Garefalakis
>Priority: Minor
>
> FLIP-304 introduced the `LabeledGlobalFailureHandler` interface as an 
> extension of the `GlobalFailureHandler` interface. The latter can thus be 
> removed in the future to avoid keeping interfaces with duplicate functions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33051) GlobalFailureHandler interface should be retired in favor of LabeledGlobalFailureHandler

2023-09-13 Thread Panagiotis Garefalakis (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764964#comment-17764964
 ] 

Panagiotis Garefalakis commented on FLINK-33051:


[~wangm92]  – not sure I am going to have spare cycles to wrap this up this 
week; feel free to take over if you are still interested.

> GlobalFailureHandler interface should be retired in favor of 
> LabeledGlobalFailureHandler
> 
>
> Key: FLINK-33051
> URL: https://issues.apache.org/jira/browse/FLINK-33051
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Panagiotis Garefalakis
>Priority: Minor
>
> FLIP-304 introduced the `LabeledGlobalFailureHandler` interface as an 
> extension of the `GlobalFailureHandler` interface. The latter can thus be 
> removed in the future to avoid keeping interfaces with duplicate functions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] swuferhong commented on a diff in pull request #23412: [FLINK-33083] Properly apply ReadingMetadataSpec for a TableSourceScan

2023-09-13 Thread via GitHub


swuferhong commented on code in PR #23412:
URL: https://github.com/apache/flink/pull/23412#discussion_r1325238554


##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/nodes/exec/spec/DynamicTableSourceSpec.java:
##
@@ -95,29 +92,19 @@ private DynamicTableSource getTableSource(FlinkContext 
context, FlinkTypeFactory
 FactoryUtil.createDynamicTableSource(
 factory,
 contextResolvedTable.getIdentifier(),
-resolvedCatalogTable,
+contextResolvedTable.getResolvedTable(),
 loadOptionsFromCatalogTable(contextResolvedTable, 
context),
 context.getTableConfig(),
 context.getClassLoader(),
 contextResolvedTable.isTemporary());
-// validate DynamicSource and apply Metadata
-DynamicSourceUtils.prepareDynamicSource(
-contextResolvedTable.getIdentifier().toString(),
-resolvedCatalogTable,
-tableSource,
-false,
-context.getTableConfig().getConfiguration());
 
 if (sourceAbilities != null) {
-//  Note: use DynamicSourceUtils.createProducedType to produce 
the type info so that
-//  keep consistent with sql2Rel phase which also called the 
method producing
-//  deterministic format (PHYSICAL COLUMNS + METADATA COLUMNS) 
when converts a given
-//  DynamicTableSource to a RelNode.
-// TODO should do a refactor(e.g., add serialized input type 
info into each
-//  SourceAbilitySpec so as to avoid this implicit logic 
dependency)
 RowType newProducedType =
-DynamicSourceUtils.createProducedType(
-contextResolvedTable.getResolvedSchema(), 
tableSource);
+(RowType)
+contextResolvedTable
+.getResolvedSchema()
+.toSourceRowDataType()
+.getLogicalType();

Review Comment:
   Hi, @dawidwys, thanks for your PR. I have a small question: why do we 
change the spec column order from `PHYSICAL COLUMNS + METADATA COLUMNS` to the 
original user-defined schema order here? If we want to change this order, more 
code that uses source specs needs to be modified, such as 
`ScanReuser.reuseDuplicatedScan`, cc @twalthr.
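
   For context, the ordering difference being discussed, on a hypothetical 
schema:

```java
// Hypothetical table: CREATE TABLE t (a INT, m STRING METADATA, b STRING)
//
// DynamicSourceUtils.createProducedType -> ROW<a INT, b STRING, m STRING>
//   (deterministic PHYSICAL COLUMNS + METADATA COLUMNS order)
//
// resolvedSchema.toSourceRowDataType()  -> ROW<a INT, m STRING, b STRING>
//   (original user-declared column order)
```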



##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/nodes/exec/spec/DynamicTableSourceSpec.java:
##
@@ -95,29 +92,19 @@ private DynamicTableSource getTableSource(FlinkContext 
context, FlinkTypeFactory
 FactoryUtil.createDynamicTableSource(
 factory,
 contextResolvedTable.getIdentifier(),
-resolvedCatalogTable,
+contextResolvedTable.getResolvedTable(),
 loadOptionsFromCatalogTable(contextResolvedTable, 
context),
 context.getTableConfig(),
 context.getClassLoader(),
 contextResolvedTable.isTemporary());
-// validate DynamicSource and apply Metadata
-DynamicSourceUtils.prepareDynamicSource(
-contextResolvedTable.getIdentifier().toString(),
-resolvedCatalogTable,
-tableSource,
-false,
-context.getTableConfig().getConfiguration());
 
 if (sourceAbilities != null) {
-//  Note: use DynamicSourceUtils.createProducedType to produce 
the type info so that
-//  keep consistent with sql2Rel phase which also called the 
method producing
-//  deterministic format (PHYSICAL COLUMNS + METADATA COLUMNS) 
when converts a given
-//  DynamicTableSource to a RelNode.
-// TODO should do a refactor(e.g., add serialized input type 
info into each
-//  SourceAbilitySpec so as to avoid this implicit logic 
dependency)
 RowType newProducedType =
-DynamicSourceUtils.createProducedType(
-contextResolvedTable.getResolvedSchema(), 
tableSource);
+(RowType)
+contextResolvedTable
+.getResolvedSchema()
+.toSourceRowDataType()
+.getLogicalType();

Review Comment:
   cc @twalthr 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] swuferhong commented on a diff in pull request #23412: [FLINK-33083] Properly apply ReadingMetadataSpec for a TableSourceScan

2023-09-13 Thread via GitHub


swuferhong commented on code in PR #23412:
URL: https://github.com/apache/flink/pull/23412#discussion_r1325320081


##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/nodes/exec/spec/DynamicTableSourceSpec.java:
##
@@ -95,29 +92,19 @@ private DynamicTableSource getTableSource(FlinkContext 
context, FlinkTypeFactory
 FactoryUtil.createDynamicTableSource(
 factory,
 contextResolvedTable.getIdentifier(),
-resolvedCatalogTable,
+contextResolvedTable.getResolvedTable(),
 loadOptionsFromCatalogTable(contextResolvedTable, 
context),
 context.getTableConfig(),
 context.getClassLoader(),
 contextResolvedTable.isTemporary());
-// validate DynamicSource and apply Metadata
-DynamicSourceUtils.prepareDynamicSource(
-contextResolvedTable.getIdentifier().toString(),
-resolvedCatalogTable,
-tableSource,
-false,
-context.getTableConfig().getConfiguration());
 
 if (sourceAbilities != null) {
-//  Note: use DynamicSourceUtils.createProducedType to produce 
the type info so that
-//  keep consistent with sql2Rel phase which also called the 
method producing
-//  deterministic format (PHYSICAL COLUMNS + METADATA COLUMNS) 
when converts a given
-//  DynamicTableSource to a RelNode.
-// TODO should do a refactor(e.g., add serialized input type 
info into each
-//  SourceAbilitySpec so as to avoid this implicit logic 
dependency)
 RowType newProducedType =
-DynamicSourceUtils.createProducedType(
-contextResolvedTable.getResolvedSchema(), 
tableSource);
+(RowType)
+contextResolvedTable
+.getResolvedSchema()
+.toSourceRowDataType()
+.getLogicalType();

Review Comment:
   cc @twalthr 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-ml] zhaozijun109 commented on pull request #255: [FLINK-33003] Support isolation forest algorithm in Flink ML ([fix] related ListStateWithCache)

2023-09-13 Thread via GitHub


zhaozijun109 commented on PR #255:
URL: https://github.com/apache/flink-ml/pull/255#issuecomment-1718703862

   @lindong28 Hi, could you please re-trigger the CI run? The failure (Error: 
BoundedPerRoundStreamIterationITCase.testPerRoundIterationWithState:170 
expected:<3> but was:<4>) passes in my local test, so I'm not sure how to fix 
it. Thank you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (FLINK-25593) A redundant scan could be skipped if it is an input of join and the other input is empty after partition prune

2023-09-13 Thread luoyuxia (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

luoyuxia reassigned FLINK-25593:


Assignee: Yunhong Zheng  (was: luoyuxia)

> A redundant scan could be skipped if it is an input of join and the other 
> input is empty after partition prune
> --
>
> Key: FLINK-25593
> URL: https://issues.apache.org/jira/browse/FLINK-25593
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Reporter: Jing Zhang
>Assignee: Yunhong Zheng
>Priority: Major
>
> A redundant scan could be skipped if it is an input of a join and the other 
> input is empty after partition pruning.
> For example:
> ltable has two partitions: pt=0 and pt=1; rtable has one partition, pt1=0.
> The schema of ltable is (lkey string, value int).
> The schema of rtable is (rkey string, value int).
> {code:sql}
> SELECT * FROM ltable, rtable WHERE pt=2 and pt1=0 and `lkey`=rkey
> {code}
> The plan is as follows.
> {code:java}
> Calc(select=[lkey, value, CAST(2 AS INTEGER) AS pt, rkey, value1, CAST(0 AS 
> INTEGER) AS pt1])
> +- HashJoin(joinType=[InnerJoin], where=[=(lkey, rkey)], select=[lkey, value, 
> rkey, value1], build=[right])
>:- Exchange(distribution=[hash[lkey]])
>:  +- TableSourceScan(table=[[hive, source_db, ltable, partitions=[], 
> project=[lkey, value]]], fields=[lkey, value])
>+- Exchange(distribution=[hash[rkey]])
>   +- TableSourceScan(table=[[hive, source_db, rtable, 
> partitions=[{pt1=0}], project=[rkey, value1]]], fields=[rkey, value1])
> {code}
> There is no need to scan the right side because the left input of the join 
> has 0 partitions after partition pruning.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-25593) A redundant scan could be skipped if it is an input of join and the other input is empty after partition prune

2023-09-13 Thread Yunhong Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764951#comment-17764951
 ] 

Yunhong Zheng commented on FLINK-25593:
---

Hi, [~jark]. I think the better final plan for this case would be empty values.
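
i.e., something like the following (an illustrative sketch, not actual planner 
output):

{code:java}
Calc(select=[lkey, value, CAST(2 AS INTEGER) AS pt, rkey, value1, CAST(0 AS 
INTEGER) AS pt1])
+- Values(tuples=[[]])
{code}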

> A redundant scan could be skipped if it is an input of join and the other 
> input is empty after partition prune
> --
>
> Key: FLINK-25593
> URL: https://issues.apache.org/jira/browse/FLINK-25593
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Reporter: Jing Zhang
>Assignee: luoyuxia
>Priority: Major
>
> A redundant scan could be skipped if it is an input of a join and the other 
> input is empty after partition pruning.
> For example:
> ltable has two partitions: pt=0 and pt=1; rtable has one partition, pt1=0.
> The schema of ltable is (lkey string, value int).
> The schema of rtable is (rkey string, value int).
> {code:sql}
> SELECT * FROM ltable, rtable WHERE pt=2 and pt1=0 and `lkey`=rkey
> {code}
> The plan is as follows.
> {code:java}
> Calc(select=[lkey, value, CAST(2 AS INTEGER) AS pt, rkey, value1, CAST(0 AS 
> INTEGER) AS pt1])
> +- HashJoin(joinType=[InnerJoin], where=[=(lkey, rkey)], select=[lkey, value, 
> rkey, value1], build=[right])
>:- Exchange(distribution=[hash[lkey]])
>:  +- TableSourceScan(table=[[hive, source_db, ltable, partitions=[], 
> project=[lkey, value]]], fields=[lkey, value])
>+- Exchange(distribution=[hash[rkey]])
>   +- TableSourceScan(table=[[hive, source_db, rtable, 
> partitions=[{pt1=0}], project=[rkey, value1]]], fields=[rkey, value1])
> {code}
> There is no need to scan the right side because the left input of the join 
> has 0 partitions after partition pruning.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33053) Watcher leak in Zookeeper HA mode

2023-09-13 Thread Zili Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764950#comment-17764950
 ] 

Zili Chen commented on FLINK-33053:
---

But it's possible to add an option to explicitly identify the ownership. You 
can open an issue in the Curator JIRA project and let me and the other 
maintainers figure it out.

> Watcher leak in Zookeeper HA mode
> -
>
> Key: FLINK-33053
> URL: https://issues.apache.org/jira/browse/FLINK-33053
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.0, 1.18.0, 1.17.1
>Reporter: Yangze Guo
>Assignee: Yangze Guo
>Priority: Blocker
> Attachments: 26.dump.zip, 26.log, 
> taskmanager_flink-native-test-117-taskmanager-1-9_thread_dump (1).json
>
>
> We observed a watcher leak in our OLAP stress test when enabling Zookeeper HA 
> mode: the TM's watches on the JobMaster leader are not stopped after the job 
> finishes.
> Here is how we reproduce this issue:
>  - Start a session cluster and enable Zookeeper HA mode.
>  - Continuously and concurrently submit short queries, e.g. WordCount, to the 
> cluster.
>  - echo -n wchp | nc \{zk host} \{zk port} to get current watches.
> We can see a lot of watches on 
> /flink/\{cluster_name}/leader/\{job_id}/connection_info.
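
A quick way to count the leaked watches from the repro above (a sketch; it 
assumes the ZooKeeper four-letter-word commands are whitelisted, and uses a 
hypothetical host/port):

{code:bash}
# Count watches on JobMaster connection_info nodes.
echo -n wchp | nc zk-host 2181 | grep -c 'connection_info'
{code}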



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-25593) A redundant scan could be skipped if it is an input of join and the other input is empty after partition prune

2023-09-13 Thread Yunhong Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764949#comment-17764949
 ] 

Yunhong Zheng commented on FLINK-25593:
---

Hi, [~luoyuxia]. Is this still in progress? If not, could you assign it to me? 
Thanks!

> A redundant scan could be skipped if it is an input of join and the other 
> input is empty after partition prune
> --
>
> Key: FLINK-25593
> URL: https://issues.apache.org/jira/browse/FLINK-25593
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Reporter: Jing Zhang
>Assignee: luoyuxia
>Priority: Major
>
> A redundant scan could be skipped if it is an input of join and the other 
> input is empty after partition prune.
> For example:
> ltable has two partitions: pt=0 ad pt=1, rtable has one partition pt1=0.
> The schema of ltable is (lkey string, value int).
> The schema of rtable is (rkey string, value int).
> {code:sql}
> SELECT * FROM ltable, rtable WHERE pt=2 and pt1=0 and `lkey`=rkey
> {code}
> The plan is as following.
> {code:java}
> Calc(select=[lkey, value, CAST(2 AS INTEGER) AS pt, rkey, value1, CAST(0 AS 
> INTEGER) AS pt1])
> +- HashJoin(joinType=[InnerJoin], where=[=(lkey, rkey)], select=[lkey, value, 
> rkey, value1], build=[right])
>:- Exchange(distribution=[hash[lkey]])
>:  +- TableSourceScan(table=[[hive, source_db, ltable, partitions=[], 
> project=[lkey, value]]], fields=[lkey, value])
>+- Exchange(distribution=[hash[rkey]])
>   +- TableSourceScan(table=[[hive, source_db, rtable, 
> partitions=[{pt1=0}], project=[rkey, value1]]], fields=[rkey, value1])
> {code}
> There is no need to scan right side because the left input of join has 0 
> partitions after partition prune.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33053) Watcher leak in Zookeeper HA mode

2023-09-13 Thread Zili Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764948#comment-17764948
 ] 

Zili Chen commented on FLINK-33053:
---

No. Neither {{CuratorCache}} nor {{TreeCache}} "owns" the path, so it's 
unclear whether other recipes sharing the same client (connection) have also 
set up watches. This is different from {{LeaderLatch}}, which owns the path 
and can therefore ensure that no one else (should) access the related nodes.
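
A minimal sketch of the recipe behavior in question (Curator 5.x 
{{CuratorCache}} API; the connection string and path are illustrative):

{code:java}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.cache.CuratorCache;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class WatcherLeakSketch {
    public static void main(String[] args) {
        CuratorFramework client =
                CuratorFrameworkFactory.newClient(
                        "zk-host:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // CuratorCache does not "own" the path: other recipes on the same
        // client may also have watches on this subtree, so close() cannot
        // safely remove the watches on the server side.
        CuratorCache cache = CuratorCache.build(client, "/flink/cluster/leader");
        cache.start();
        // ... job runs and finishes ...
        cache.close(); // the cache is released, but ZK-side watches may linger

        client.close();
    }
}
{code}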

> Watcher leak in Zookeeper HA mode
> -
>
> Key: FLINK-33053
> URL: https://issues.apache.org/jira/browse/FLINK-33053
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.0, 1.18.0, 1.17.1
>Reporter: Yangze Guo
>Assignee: Yangze Guo
>Priority: Blocker
> Attachments: 26.dump.zip, 26.log, 
> taskmanager_flink-native-test-117-taskmanager-1-9_thread_dump (1).json
>
>
> We observed a watcher leak in our OLAP stress test when enabling Zookeeper HA 
> mode: the TM's watches on the JobMaster leader are not stopped after the job 
> finishes.
> Here is how we reproduce this issue:
>  - Start a session cluster and enable Zookeeper HA mode.
>  - Continuously and concurrently submit short queries, e.g. WordCount, to the 
> cluster.
>  - echo -n wchp | nc \{zk host} \{zk port} to get current watches.
> We can see a lot of watches on 
> /flink/\{cluster_name}/leader/\{job_id}/connection_info.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32863) Improve Flink UI's time precision from second level to millisecond level

2023-09-13 Thread Yangze Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yangze Guo closed FLINK-32863.
--
Fix Version/s: 1.19.0
   Resolution: Fixed

> Improve Flink UI's time precision from second level to millisecond level
> 
>
> Key: FLINK-32863
> URL: https://issues.apache.org/jira/browse/FLINK-32863
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Web Frontend
>Affects Versions: 1.17.1
>Reporter: Runkang He
>Assignee: Jufang He
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> This is a UI improvement for OLAP jobs.
> OLAP queries are generally small queries that finish within seconds or 
> milliseconds, but the time precision currently displayed is at the second 
> level, which is not enough for OLAP queries. The millisecond part of the time 
> is very important for users and developers to see accurate times, for 
> performance measurement and optimization. The displayed times include job 
> duration, task duration, task start time, end time and so on.
> It would be nice to improve this for a better OLAP user experience.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32863) Improve Flink UI's time precision from second level to millisecond level

2023-09-13 Thread Yangze Guo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764946#comment-17764946
 ] 

Yangze Guo commented on FLINK-32863:


master: 2d74260f41275add3c16ea72ee4ecedbec45f9d4

> Improve Flink UI's time precision from second level to millisecond level
> 
>
> Key: FLINK-32863
> URL: https://issues.apache.org/jira/browse/FLINK-32863
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Web Frontend
>Affects Versions: 1.17.1
>Reporter: Runkang He
>Assignee: Jufang He
>Priority: Major
>  Labels: pull-request-available
>
> This is a UI improvement for OLAP jobs.
> OLAP queries are generally small queries that finish within seconds or 
> milliseconds, but the time precision currently displayed is at the second 
> level, which is not enough for OLAP queries. The millisecond part of the time 
> is very important for users and developers to see accurate times, for 
> performance measurement and optimization. The displayed times include job 
> duration, task duration, task start time, end time and so on.
> It would be nice to improve this for a better OLAP user experience.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32925) Select executing Release Manager

2023-09-13 Thread Qingsheng Ren (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764945#comment-17764945
 ] 

Qingsheng Ren commented on FLINK-32925:
---

Jing's public key has been published: 
[https://dist.apache.org/repos/dist/release/flink/KEYS]

[~jingge] could you check whether all steps described above have been done and 
close this issue before creating the first RC? Thanks

> Select executing Release Manager
> 
>
> Key: FLINK-32925
> URL: https://issues.apache.org/jira/browse/FLINK-32925
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Sergey Nuyanzin
>Assignee: Jing Ge
>Priority: Major
>
> h4. GPG Key
> You need to have a GPG key to sign the release artifacts. Please be aware of 
> the ASF-wide [release signing 
> guidelines|https://www.apache.org/dev/release-signing.html]. If you don’t 
> have a GPG key associated with your Apache account, please create one 
> according to the guidelines.
> Determine your Apache GPG Key and Key ID, as follows:
> {code:java}
> $ gpg --list-keys
> {code}
> This will list your GPG keys. One of these should reflect your Apache 
> account, for example:
> {code:java}
> --
> pub   2048R/845E6689 2016-02-23
> uid                  Nomen Nescio 
> sub   2048R/BA4D50BE 2016-02-23
> {code}
> In the example above, the key ID is the 8-digit hex string in the \{{pub}} 
> line: \{{{}845E6689{}}}.
> Now, add your Apache GPG key to the Flink’s \{{KEYS}} file in the [Apache 
> Flink release KEYS 
> file|https://dist.apache.org/repos/dist/release/flink/KEYS] repository at 
> [dist.apache.org|http://dist.apache.org/]. Follow the instructions listed at 
> the top of these files. (Note: Only PMC members have write access to the 
> release repository. If you end up getting 403 errors ask on the mailing list 
> for assistance.)
> Configure \{{git}} to use this key when signing code by giving it your key 
> ID, as follows:
> {code:java}
> $ git config --global user.signingkey 845E6689
> {code}
> You may drop the \{{--global}} option if you’d prefer to use this key for the 
> current repository only.
> You may wish to start \{{gpg-agent}} to unlock your GPG key only once using 
> your passphrase. Otherwise, you may need to enter this passphrase hundreds of 
> times. The setup for \{{gpg-agent}} varies based on operating system, but may 
> be something like this:
> {code:bash}
> $ eval $(gpg-agent --daemon --no-grab --write-env-file $HOME/.gpg-agent-info)
> $ export GPG_TTY=$(tty)
> $ export GPG_AGENT_INFO
> {code}
> h4. Access to Apache Nexus repository
> Configure access to the [Apache Nexus 
> repository|https://repository.apache.org/], which enables final deployment of 
> releases to the Maven Central Repository.
>  # You log in with your Apache account.
>  # Confirm you have appropriate access by finding \{{org.apache.flink}} under 
> \{{{}Staging Profiles{}}}.
>  # Navigate to your \{{Profile}} (top right drop-down menu of the page).
>  # Choose \{{User Token}} from the dropdown, then click \{{{}Access User 
> Token{}}}. Copy a snippet of the Maven XML configuration block.
>  # Insert this snippet twice into your global Maven \{{settings.xml}} file, 
> typically \{{{}${HOME}/.m2/settings.xml{}}}. The end result should look like 
> this, where \{{TOKEN_NAME}} and \{{TOKEN_PASSWORD}} are your secret tokens:
> {code:xml}
> <settings>
>    <servers>
>      <server>
>        <id>apache.releases.https</id>
>        <username>TOKEN_NAME</username>
>        <password>TOKEN_PASSWORD</password>
>      </server>
>      <server>
>        <id>apache.snapshots.https</id>
>        <username>TOKEN_NAME</username>
>        <password>TOKEN_PASSWORD</password>
>      </server>
>    </servers>
> </settings>
> {code}
> h4. Website development setup
> Get ready for updating the Flink website by following the [website 
> development 
> instructions|https://flink.apache.org/contributing/improve-website.html].
> h4. GNU Tar Setup for Mac (Skip this step if you are not using a Mac)
> The default tar application on Mac does not support GNU archive format and 
> defaults to Pax. This bloats the archive with unnecessary metadata that can 
> result in additional files when decompressing (see [1.15.2-RC2 vote 
> thread|https://lists.apache.org/thread/mzbgsb7y9vdp9bs00gsgscsjv2ygy58q]). 
> Install gnu-tar and create a symbolic link to use in preference of the 
> default tar program.
> {code:bash}
> $ brew install gnu-tar
> $ ln -s /usr/local/bin/gtar /usr/local/bin/tar
> $ which tar
> {code}
>  
> 
> h3. Expectations
>  * Release Manager’s GPG key is published to 
> [dist.apache.org|http://dist.apache.org/]
>  * Release Manager’s GPG key is configured in git configuration
>  * Release Manager's GPG key is configured as the default gpg key.
>  * Release Manager has \{{org.apache.flink}} listed under Staging Profiles in 
> Nexus
>  * Release Manager’s Nexus User Token is configured in settings.xml



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Assigned] (FLINK-32925) Select executing Release Manager

2023-09-13 Thread Qingsheng Ren (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qingsheng Ren reassigned FLINK-32925:
-

Assignee: Jing Ge

> Select executing Release Manager
> 
>
> Key: FLINK-32925
> URL: https://issues.apache.org/jira/browse/FLINK-32925
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Sergey Nuyanzin
>Assignee: Jing Ge
>Priority: Major
>
> h4. GPG Key
> You need to have a GPG key to sign the release artifacts. Please be aware of 
> the ASF-wide [release signing 
> guidelines|https://www.apache.org/dev/release-signing.html]. If you don’t 
> have a GPG key associated with your Apache account, please create one 
> according to the guidelines.
> Determine your Apache GPG Key and Key ID, as follows:
> {code:java}
> $ gpg --list-keys
> {code}
> This will list your GPG keys. One of these should reflect your Apache 
> account, for example:
> {code:java}
> --
> pub   2048R/845E6689 2016-02-23
> uid                  Nomen Nescio 
> sub   2048R/BA4D50BE 2016-02-23
> {code}
> In the example above, the key ID is the 8-digit hex string in the \{{pub}} 
> line: \{{{}845E6689{}}}.
> Now, add your Apache GPG key to the Flink’s \{{KEYS}} file in the [Apache 
> Flink release KEYS 
> file|https://dist.apache.org/repos/dist/release/flink/KEYS] repository at 
> [dist.apache.org|http://dist.apache.org/]. Follow the instructions listed at 
> the top of these files. (Note: Only PMC members have write access to the 
> release repository. If you end up getting 403 errors ask on the mailing list 
> for assistance.)
> Configure \{{git}} to use this key when signing code by giving it your key 
> ID, as follows:
> {code:java}
> $ git config --global user.signingkey 845E6689
> {code}
> You may drop the \{{--global}} option if you’d prefer to use this key for the 
> current repository only.
> You may wish to start \{{gpg-agent}} to unlock your GPG key only once using 
> your passphrase. Otherwise, you may need to enter this passphrase hundreds of 
> times. The setup for \{{gpg-agent}} varies based on operating system, but may 
> be something like this:
> {code:bash}
> $ eval $(gpg-agent --daemon --no-grab --write-env-file $HOME/.gpg-agent-info)
> $ export GPG_TTY=$(tty)
> $ export GPG_AGENT_INFO
> {code}
> h4. Access to Apache Nexus repository
> Configure access to the [Apache Nexus 
> repository|https://repository.apache.org/], which enables final deployment of 
> releases to the Maven Central Repository.
>  # You log in with your Apache account.
>  # Confirm you have appropriate access by finding \{{org.apache.flink}} under 
> \{{{}Staging Profiles{}}}.
>  # Navigate to your \{{Profile}} (top right drop-down menu of the page).
>  # Choose \{{User Token}} from the dropdown, then click \{{{}Access User 
> Token{}}}. Copy a snippet of the Maven XML configuration block.
>  # Insert this snippet twice into your global Maven \{{settings.xml}} file, 
> typically \{{{}${HOME}/.m2/settings.xml{}}}. The end result should look like 
> this, where \{{TOKEN_NAME}} and \{{TOKEN_PASSWORD}} are your secret tokens:
> {code:xml}
> <settings>
>    <servers>
>      <server>
>        <id>apache.releases.https</id>
>        <username>TOKEN_NAME</username>
>        <password>TOKEN_PASSWORD</password>
>      </server>
>      <server>
>        <id>apache.snapshots.https</id>
>        <username>TOKEN_NAME</username>
>        <password>TOKEN_PASSWORD</password>
>      </server>
>    </servers>
> </settings>
> {code}
> h4. Website development setup
> Get ready for updating the Flink website by following the [website 
> development 
> instructions|https://flink.apache.org/contributing/improve-website.html].
> h4. GNU Tar Setup for Mac (Skip this step if you are not using a Mac)
> The default tar application on Mac does not support GNU archive format and 
> defaults to Pax. This bloats the archive with unnecessary metadata that can 
> result in additional files when decompressing (see [1.15.2-RC2 vote 
> thread|https://lists.apache.org/thread/mzbgsb7y9vdp9bs00gsgscsjv2ygy58q]). 
> Install gnu-tar and create a symbolic link to use in preference of the 
> default tar program.
> {code:bash}
> $ brew install gnu-tar
> $ ln -s /usr/local/bin/gtar /usr/local/bin/tar
> $ which tar
> {code}
>  
> 
> h3. Expectations
>  * Release Manager’s GPG key is published to 
> [dist.apache.org|http://dist.apache.org/]
>  * Release Manager’s GPG key is configured in git configuration
>  * Release Manager's GPG key is configured as the default gpg key.
>  * Release Manager has \{{org.apache.flink}} listed under Staging Profiles in 
> Nexus
>  * Release Manager’s Nexus User Token is configured in settings.xml



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-32802) Release Testing: Verify FLIP-291: Externalized Declarative Resource Management

2023-09-13 Thread Qingsheng Ren (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qingsheng Ren resolved FLINK-32802.
---
Resolution: Done

> Release Testing: Verify FLIP-291: Externalized Declarative Resource Management
> --
>
> Key: FLINK-32802
> URL: https://issues.apache.org/jira/browse/FLINK-32802
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.18.0
>Reporter: Qingsheng Ren
>Assignee: ConradJam
>Priority: Major
> Fix For: 1.18.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-32726) [Umbrella] Test Flink Release 1.18

2023-09-13 Thread Qingsheng Ren (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qingsheng Ren resolved FLINK-32726.
---
Resolution: Done

> [Umbrella] Test Flink Release 1.18
> --
>
> Key: FLINK-32726
> URL: https://issues.apache.org/jira/browse/FLINK-32726
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.18.0
>Reporter: Qingsheng Ren
>Assignee: Qingsheng Ren
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.18.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33053) Watcher leak in Zookeeper HA mode

2023-09-13 Thread Yangze Guo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764943#comment-17764943
 ] 

Yangze Guo commented on FLINK-33053:


Thanks for the pointer [~tison]. I'd like to add a safety net for now, as 
ZOOKEEPER-4625 has not been fixed and the patch might not be cherry-picked to 
older versions. BTW, could we fix it in Curator?

> Watcher leak in Zookeeper HA mode
> -
>
> Key: FLINK-33053
> URL: https://issues.apache.org/jira/browse/FLINK-33053
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.0, 1.18.0, 1.17.1
>Reporter: Yangze Guo
>Priority: Blocker
> Attachments: 26.dump.zip, 26.log, 
> taskmanager_flink-native-test-117-taskmanager-1-9_thread_dump (1).json
>
>
> We observed a watcher leak in our OLAP stress test when enabling Zookeeper HA 
> mode: the TM's watches on the JobMaster leader are not stopped after the job 
> finishes.
> Here is how we reproduce this issue:
>  - Start a session cluster and enable Zookeeper HA mode.
>  - Continuously and concurrently submit short queries, e.g. WordCount, to the 
> cluster.
>  - echo -n wchp | nc \{zk host} \{zk port} to get current watches.
> We can see a lot of watches on 
> /flink/\{cluster_name}/leader/\{job_id}/connection_info.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33053) Watcher leak in Zookeeper HA mode

2023-09-13 Thread Yangze Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yangze Guo reassigned FLINK-33053:
--

Assignee: Yangze Guo

> Watcher leak in Zookeeper HA mode
> -
>
> Key: FLINK-33053
> URL: https://issues.apache.org/jira/browse/FLINK-33053
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.0, 1.18.0, 1.17.1
>Reporter: Yangze Guo
>Assignee: Yangze Guo
>Priority: Blocker
> Attachments: 26.dump.zip, 26.log, 
> taskmanager_flink-native-test-117-taskmanager-1-9_thread_dump (1).json
>
>
> We observed a watcher leak in our OLAP stress test when enabling Zookeeper HA 
> mode: the TM's watches on the JobMaster leader are not stopped after the job 
> finishes.
> Here is how we reproduce this issue:
>  - Start a session cluster and enable Zookeeper HA mode.
>  - Continuously and concurrently submit short queries, e.g. WordCount, to the 
> cluster.
>  - echo -n wchp | nc \{zk host} \{zk port} to get current watches.
> We can see a lot of watches on 
> /flink/\{cluster_name}/leader/\{job_id}/connection_info.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] flinkbot commented on pull request #23413: [FLINK-33086] Protect failure enrichment against unhandled exceptions

2023-09-13 Thread via GitHub


flinkbot commented on PR #23413:
URL: https://github.com/apache/flink/pull/23413#issuecomment-1718644833

   
   ## CI report:
   
   * b88ee0b7f2f6bb9a9ab8f496d333261434152c61 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] pgaref commented on pull request #23413: [FLINK-33086] Protect failure enrichment against unhandled exceptions

2023-09-13 Thread via GitHub


pgaref commented on PR #23413:
URL: https://github.com/apache/flink/pull/23413#issuecomment-1718641984

   cc @zentol 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-33086) Protect failure enrichment against unhandled exceptions

2023-09-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33086:
---
Labels: pull-request-available  (was: )

> Protect failure enrichment against unhandled exceptions
> ---
>
> Key: FLINK-33086
> URL: https://issues.apache.org/jira/browse/FLINK-33086
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>
> Existing 
> [labelFailure|https://github.com/apache/flink/blob/603181da811edb47c0d573492639a381fbbedc28/flink-runtime/src/main/java/org/apache/flink/runtime/failure/FailureEnricherUtils.java#L175]
>  async logic expects each FailureEnricher future to never fail (or to do its 
> own exception handling); however, there is no way to enforce that, since 
> enrichers are implemented as pluggable components. This could result in 
> throwing away labels from other enrichers that completed successfully.
> A better solution would be to handle the failures and LOG the errors.
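
A sketch of the proposed hardening ({{enricherFuture}}, {{enricherName}} and 
{{LOG}} below are hypothetical names, not the actual FailureEnricherUtils 
code):

{code:java}
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Guard each enricher future: if one enricher throws, log the error and
// contribute no labels, instead of failing the combined future and
// discarding labels from enrichers that completed successfully.
CompletableFuture<Map<String, String>> guarded =
        enricherFuture.exceptionally(
                t -> {
                    LOG.error("FailureEnricher {} failed to label failure", enricherName, t);
                    return Collections.emptyMap();
                });
{code}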



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] pgaref opened a new pull request, #23413: [FLINK-33086] Protect failure enrichment against unhandled exceptions

2023-09-13 Thread via GitHub


pgaref opened a new pull request, #23413:
URL: https://github.com/apache/flink/pull/23413

   
   
   ## What is the purpose of the change
   
   https://issues.apache.org/jira/browse/FLINK-33086
   
   
   ## Brief change log
   
   *(for example:)*
 - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
 - *Deployments RPC transmits only the blob storage reference*
 - *TaskManagers retrieve the TaskInfo from the blob cache*
   
   
   ## Verifying this change
   
   Please make sure both new and modified tests in this PR follows the 
conventions defined in our code quality guide: 
https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
 - *Manually verified the change by running a 4 node cluster with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
 - The serializers: (yes / no / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
 - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] libenchao commented on pull request #23323: [FLINK-32738][formats] PROTOBUF format supports projection push down

2023-09-13 Thread via GitHub


libenchao commented on PR #23323:
URL: https://github.com/apache/flink/pull/23323#issuecomment-1718633599

   @maosuhan @ljw-hit Are you interested in reviewing this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-33086) Protect failure enrichment against unhandled exceptions

2023-09-13 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated FLINK-33086:
---
Description: 
Existing 
[labelFailure|https://github.com/apache/flink/blob/603181da811edb47c0d573492639a381fbbedc28/flink-runtime/src/main/java/org/apache/flink/runtime/failure/FailureEnricherUtils.java#L175]
 async logic expects each FailureEnricher future to never fail (or to do its 
own exception handling); however, there is no way to enforce that, since 
enrichers are implemented as pluggable components. This could result in 
throwing away labels from other enrichers that completed successfully.

A better solution would be to handle the failures and LOG the errors.

> Protect failure enrichment against unhandled exceptions
> ---
>
> Key: FLINK-33086
> URL: https://issues.apache.org/jira/browse/FLINK-33086
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Panagiotis Garefalakis
>Priority: Major
>
> Existing 
> [labelFailure|https://github.com/apache/flink/blob/603181da811edb47c0d573492639a381fbbedc28/flink-runtime/src/main/java/org/apache/flink/runtime/failure/FailureEnricherUtils.java#L175]
>  async logic expects each FailureEnricher future to never fail (or to do its 
> own exception handling); however, there is no way to enforce that, since 
> enrichers are implemented as pluggable components. This could result in 
> throwing away labels from other enrichers that completed successfully.
> A better solution would be to handle the failures and LOG the errors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] wangzzu commented on pull request #23405: [FLINK-31895] Add End-to-end integration tests for failure labels

2023-09-13 Thread via GitHub


wangzzu commented on PR #23405:
URL: https://github.com/apache/flink/pull/23405#issuecomment-1718632460

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-32775) Support yarn.provided.lib.dirs to add parent directory to classpath

2023-09-13 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764937#comment-17764937
 ] 

Yang Wang commented on FLINK-32775:
---

Thanks [~argoyal]  for your contribution.

> Support yarn.provided.lib.dirs to add parent directory to classpath
> ---
>
> Key: FLINK-32775
> URL: https://issues.apache.org/jira/browse/FLINK-32775
> Project: Flink
>  Issue Type: New Feature
>  Components: Deployment / YARN
>Reporter: Archit Goyal
>Assignee: Archit Goyal
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, with {*}yarn.provided.lib.dirs{*}, Flink libs can be copied to an 
> HDFS location in each cluster; when the option is set, Flink tries to reuse 
> the same jars instead of uploading them every time, and YARN also caches them 
> on the nodes.
>  
> This works fine with jars, but if we try to add the parent directory of an 
> xml file to the path, Flink job submission fails. If I add the parent 
> directory of the xml to 
> {noformat}
> yarn.ship-files{noformat}
>  the Flink job is submitted successfully.
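
For illustration, the two options side by side (paths are hypothetical):

{noformat}
# flink-conf.yaml (hypothetical paths)
yarn.provided.lib.dirs: hdfs:///flink/lib;hdfs:///flink/plugins
yarn.ship-files: /opt/flink/conf/core-site.xml
{noformat}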



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32775) Support yarn.provided.lib.dirs to add parent directory to classpath

2023-09-13 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang closed FLINK-32775.
-
Resolution: Fixed

> Support yarn.provided.lib.dirs to add parent directory to classpath
> ---
>
> Key: FLINK-32775
> URL: https://issues.apache.org/jira/browse/FLINK-32775
> Project: Flink
>  Issue Type: New Feature
>  Components: Deployment / YARN
>Reporter: Archit Goyal
>Assignee: Archit Goyal
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, with {*}yarn.provided.lib.dirs{*}, Flink libs can be copied to an 
> HDFS location in each cluster; when the option is set, Flink tries to reuse 
> the same jars instead of uploading them every time, and YARN also caches them 
> on the nodes.
>  
> This works fine with jars, but if we try to add the parent directory of an 
> xml file to the path, Flink job submission fails. If I add the parent 
> directory of the xml to 
> {noformat}
> yarn.ship-files{noformat}
>  the Flink job is submitted successfully.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32775) Support yarn.provided.lib.dirs to add parent directory to classpath

2023-09-13 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated FLINK-32775:
--
Fix Version/s: 1.19.0

> Support yarn.provided.lib.dirs to add parent directory to classpath
> ---
>
> Key: FLINK-32775
> URL: https://issues.apache.org/jira/browse/FLINK-32775
> Project: Flink
>  Issue Type: New Feature
>  Components: Deployment / YARN
>Reporter: Archit Goyal
>Assignee: Archit Goyal
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, with {*}yarn.provided.lib.dirs{*}, Flink libs can be copied to an 
> HDFS location in each cluster; when the option is set, Flink tries to reuse 
> the same jars instead of uploading them every time, and YARN also caches them 
> on the nodes.
>  
> This works fine with jars, but if we try to add the parent directory of an 
> xml file to the path, Flink job submission fails. If I add the parent 
> directory of the xml to 
> {noformat}
> yarn.ship-files{noformat}
>  the Flink job is submitted successfully.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32775) Support yarn.provided.lib.dirs to add parent directory to classpath

2023-09-13 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764936#comment-17764936
 ] 

Yang Wang commented on FLINK-32775:
---

Merged in master via 1f9621806451411af26ccbab5c5342ef3308e219.

> Support yarn.provided.lib.dirs to add parent directory to classpath
> ---
>
> Key: FLINK-32775
> URL: https://issues.apache.org/jira/browse/FLINK-32775
> Project: Flink
>  Issue Type: New Feature
>  Components: Deployment / YARN
>Reporter: Archit Goyal
>Assignee: Archit Goyal
>Priority: Minor
>  Labels: pull-request-available
>
> Currently, with {*}yarn.provided.lib.dirs{*}, Flink libs can be copied to an 
> HDFS location in each cluster; when the option is set, Flink tries to reuse 
> the same jars instead of uploading them every time, and YARN also caches them 
> on the nodes.
>  
> This works fine with jars, but if we try to add the parent directory of an 
> xml file to the path, Flink job submission fails. If I add the parent 
> directory of the xml to 
> {noformat}
> yarn.ship-files{noformat}
>  the Flink job is submitted successfully.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33086) Protect failure enrichment against unhandled exceptions

2023-09-13 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created FLINK-33086:
--

 Summary: Protect failure enrichment against unhandled exceptions
 Key: FLINK-33086
 URL: https://issues.apache.org/jira/browse/FLINK-33086
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Panagiotis Garefalakis






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] wangyang0918 merged pull request #23164: [FLINK-32775] Add parent dir of files to classpath using yarn.provided.lib.dirs

2023-09-13 Thread via GitHub


wangyang0918 merged PR #23164:
URL: https://github.com/apache/flink/pull/23164


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] pgaref commented on pull request #23386: [FLINK-33022] Log an error when enrichers defined as part of the configuration can not be found/loaded

2023-09-13 Thread via GitHub


pgaref commented on PR #23386:
URL: https://github.com/apache/flink/pull/23386#issuecomment-1718622106

   > Thanks @pgaref for preparing this PR. I left two comments. Please take a 
look.
   
   Appreciate the review @huwh ! Updated the PR, please let me know what you 
think 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] swuferhong commented on a diff in pull request #23412: [FLINK-33083] Properly apply ReadingMetadataSpec for a TableSourceScan

2023-09-13 Thread via GitHub


swuferhong commented on code in PR #23412:
URL: https://github.com/apache/flink/pull/23412#discussion_r1325238554


##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/nodes/exec/spec/DynamicTableSourceSpec.java:
##
@@ -95,29 +92,19 @@ private DynamicTableSource getTableSource(FlinkContext 
context, FlinkTypeFactory
 FactoryUtil.createDynamicTableSource(
 factory,
 contextResolvedTable.getIdentifier(),
-resolvedCatalogTable,
+contextResolvedTable.getResolvedTable(),
 loadOptionsFromCatalogTable(contextResolvedTable, 
context),
 context.getTableConfig(),
 context.getClassLoader(),
 contextResolvedTable.isTemporary());
-// validate DynamicSource and apply Metadata
-DynamicSourceUtils.prepareDynamicSource(
-contextResolvedTable.getIdentifier().toString(),
-resolvedCatalogTable,
-tableSource,
-false,
-context.getTableConfig().getConfiguration());
 
 if (sourceAbilities != null) {
-//  Note: use DynamicSourceUtils.createProducedType to produce 
the type info so that
-//  keep consistent with sql2Rel phase which also called the 
method producing
-//  deterministic format (PHYSICAL COLUMNS + METADATA COLUMNS) 
when converts a given
-//  DynamicTableSource to a RelNode.
-// TODO should do a refactor(e.g., add serialized input type 
info into each
-//  SourceAbilitySpec so as to avoid this implicit logic 
dependency)
 RowType newProducedType =
-DynamicSourceUtils.createProducedType(
-contextResolvedTable.getResolvedSchema(), 
tableSource);
+(RowType)
+contextResolvedTable
+.getResolvedSchema()
+.toSourceRowDataType()
+.getLogicalType();

Review Comment:
   Hi, @dawidwys, thanks for your PR. I have a small question: why do we 
change the spec column order from `PHYSICAL COLUMNS + METADATA COLUMNS` to the 
original user-defined schema order here? If we want to change this order, more 
code that uses source specs needs to be modified, such as 
`ScanReuser.reuseDuplicatedScan`, cc @tillrohrmann.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-33052) codespeed and benchmark server is down

2023-09-13 Thread Jing Ge (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Ge updated FLINK-33052:

Summary: codespeed and benchmark server is down  (was: codespeed server is 
down)

> codespeed and benchmark server is down
> --
>
> Key: FLINK-33052
> URL: https://issues.apache.org/jira/browse/FLINK-33052
> Project: Flink
>  Issue Type: Bug
>  Components: Benchmarks, Test Infrastructure
>Affects Versions: 1.18.0
>Reporter: Jing Ge
>Assignee: Zakelly Lan
>Priority: Blocker
>
> No update in the #flink-dev-benchmarks slack channel since 25th August.
> It was an EC2 instance running in a legacy AWS account. Currently no one 
> knows which account it is.
>  
> https://apache-flink.slack.com/archives/C0471S0DFJ9/p1693932155128359



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32995) TPC-DS end-to-end test fails with chmod: cannot access '../target/generator/dsdgen_linux':

2023-09-13 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-32995:
---
Labels: stale-critical test-stability  (was: test-stability)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help 
the community manage its development. I see this issue has been marked as 
Critical but is unassigned and neither itself nor its Sub-Tasks have been 
updated for 14 days. I have gone ahead and marked it "stale-critical". If this 
ticket is critical, please either assign yourself or give an update. 
Afterwards, please remove the label or in 7 days the issue will be 
deprioritized.


> TPC-DS end-to-end test fails with chmod: cannot access 
> '../target/generator/dsdgen_linux': 
> ---
>
> Key: FLINK-32995
> URL: https://issues.apache.org/jira/browse/FLINK-32995
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Priority: Critical
>  Labels: stale-critical, test-stability
>
> This build 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=52773=logs=af184cdd-c6d8-5084-0b69-7e9c67b35f7a=0f3adb59-eefa-51c6-2858-3654d9e0749d=5504
>  fails as
> {noformat}
> Aug 29 10:03:20 [INFO] 10:03:20 Generating TPC-DS qualification data, this 
> need several minutes, please wait...
> chmod: cannot access '../target/generator/dsdgen_linux': No such file or 
> directory
> Aug 29 10:03:20 [FAIL] Test script contains errors.
> Aug 29 10:03:20 Checking for errors...
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32753) Print JVM flags on AZP

2023-09-13 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-32753:
---
Labels: pull-request-available stale-assigned  (was: pull-request-available)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help 
the community manage its development. I see this issue is assigned but has not 
received an update in 30 days, so it has been labeled "stale-assigned".
If you are still working on the issue, please remove the label and add a 
comment updating the community on your progress.  If this issue is waiting on 
feedback, please consider this a reminder to the committer/reviewer. Flink is a 
very active project, and so we appreciate your patience.
If you are no longer working on the issue, please unassign yourself so someone 
else may work on it.


> Print JVM flags on AZP
> --
>
> Key: FLINK-32753
> URL: https://issues.apache.org/jira/browse/FLINK-32753
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System / Azure Pipelines
>Reporter: Zakelly Lan
>Assignee: Zakelly Lan
>Priority: Minor
>  Labels: pull-request-available, stale-assigned
> Fix For: 1.18.0
>
>
> I suggest printing JVM flags before the tests run, which could help 
> investigate the test failures (especially memory or GC related issue). An 
> example of pipeline output 
> [here|https://dev.azure.com/lzq82555906/flink-for-Zakelly/_build/results?buildId=122=logs=9dc1b5dc-bcfa-5f83-eaa7-0cb181ddc267=511d2595-ec54-5ab7-86ce-92f328796f20=165].
>  You may search 'JVM information' in this log.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-kubernetes-operator] srpraneeth commented on a diff in pull request #670: [FLINK-31871] Interpret Flink MemoryUnits according to the actual user input

2023-09-13 Thread via GitHub


srpraneeth commented on code in PR #670:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/670#discussion_r1325010706


##
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/FlinkConfigBuilder.java:
##
@@ -420,13 +421,24 @@ private void setResource(Resource resource, Configuration 
effectiveConfig, boole
 ? JobManagerOptions.TOTAL_PROCESS_MEMORY
 : TaskManagerOptions.TOTAL_PROCESS_MEMORY;
 if (resource.getMemory() != null) {
-effectiveConfig.setString(memoryConfigOption.key(), 
resource.getMemory());
+effectiveConfig.setString(
+memoryConfigOption.key(), 
parseResourceMemoryString(resource.getMemory()));
 }
 
 configureCpu(resource, effectiveConfig, isJM);
 }
 }
 
+// Using the K8s units specification for the JM and TM memory settings
+private String parseResourceMemoryString(String memory) {
+try {
+return MemorySize.parse(memory).toString();

Review Comment:
   @gyfora Sorry, my bad, I should not have force pushed. 
   
   Just for the sake of understanding, I am using the term Approach 1 for the current approach and Approach 2 for the try/catch approach suggested by Gyula (my previous approach). 
   The interpreted values for the different configurations are listed below. 
   
   | Configuration | Previous InterpretedValue | New InterpretedValue (Approach1) | New InterpretedValue (Approach2) |
   |---|---|---|---|
   | 2g | 2147483648 b | 2000000000 b | 2147483648 b |
   | 2gb | 2147483648 b | 2000000000 b | Fail (Could not parse the syntax i by the MemorySize, could not parse the syntax g by K8s spec) |
   | 2G | 2147483648 b | 2000000000 b | 2147483648 b |
   | 2 g | 2147483648 b | 2000000000 b | 2147483648 b |
   | 512m | 536870912 b | 512000000 b | 536870912 b |
   | 2gi | Fail (Could not parse value '2gi') | 2147483648 b | Fail (Could not parse the syntax i by the MemorySize, could not parse the syntax g by K8s spec) |
   | 2Gi | Fail (Could not parse value '2Gi') | 2147483648 b | 2147483648 b |
   | 2gib | Fail (Could not parse value '2gib') | 2147483648 b | Fail (Could not parse the syntax i by the MemorySize, could not parse the syntax g by K8s spec) |
   | 2 Gi | Fail (Could not parse value '2 Gi') | 2147483648 b | 2147483648 b |
   | 512mi | Fail (Could not parse value '512mi') | 536870912 b | Fail (Could not parse the syntax i by the MemorySize, could not parse the syntax g by K8s spec) |
   | 100b | 100b | 100 b | 100 b |
   | 100 b | 100b |
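
   For reference, a minimal sketch of the Approach 2 fallback (try Flink's `MemorySize` grammar first, then the Kubernetes quantity grammar); the fabric8 `Quantity` usage is an assumption for illustration, not necessarily what the PR implements:

   ```java
   import java.math.BigDecimal;

   import io.fabric8.kubernetes.api.model.Quantity;
   import org.apache.flink.configuration.MemorySize;

   public class MemoryParseSketch {

       // Approach 2: Flink's own grammar wins ("2g" = 2 gibibytes); only inputs it
       // rejects ("2Gi", "512Mi", ...) are handed to the Kubernetes-style parser.
       static String parseResourceMemoryString(String memory) {
           try {
               return MemorySize.parse(memory).toString();
           } catch (IllegalArgumentException flinkGrammarFailed) {
               BigDecimal bytes = Quantity.getAmountInBytes(new Quantity(memory));
               return new MemorySize(bytes.longValueExact()).toString();
           }
       }

       public static void main(String[] args) {
           System.out.println(parseResourceMemoryString("2g"));  // Flink grammar
           System.out.println(parseResourceMemoryString("2Gi")); // K8s fallback
       }
   }
   ```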

[GitHub] [flink] architgyl commented on pull request #23164: [FLINK- 32775]Add parent dir of files to classpath using yarn.provided.lib.dirs

2023-09-13 Thread via GitHub


architgyl commented on PR #23164:
URL: https://github.com/apache/flink/pull/23164#issuecomment-1718211945

   > @architgyl Could you please verify this PR in a real YARN cluster to check whether it solves your original requirement about the hive config? After that I will merge this PR.
   
   I have verified the changes and they work as expected.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] venkata91 commented on a diff in pull request #23313: [FLINK-20767][table planner] Support filter push down on nested fields

2023-09-13 Thread via GitHub


venkata91 commented on code in PR #23313:
URL: https://github.com/apache/flink/pull/23313#discussion_r1324917758


##
flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/utils/RexNodeExtractor.scala:
##
@@ -395,9 +397,19 @@ class RexNodeToExpressionConverter(
 inputNames: Array[String],
 functionCatalog: FunctionCatalog,
 catalogManager: CatalogManager,
-timeZone: TimeZone)
+timeZone: TimeZone,
+relDataType: Option[RelDataType] = None)
   extends RexVisitor[Option[ResolvedExpression]] {
 
+  def this(
+  rexBuilder: RexBuilder,
+  inputNames: Array[String],
+  functionCatalog: FunctionCatalog,
+  catalogManager: CatalogManager,
+  timeZone: TimeZone) = {
+this(rexBuilder, inputNames, functionCatalog, catalogManager, timeZone, 
null)

Review Comment:
   Will change it to `None` instead of `null` to conform with Scala.



##
flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/utils/RexNodeExtractor.scala:
##
@@ -395,9 +397,19 @@ class RexNodeToExpressionConverter(
 inputNames: Array[String],
 functionCatalog: FunctionCatalog,
 catalogManager: CatalogManager,
-timeZone: TimeZone)
+timeZone: TimeZone,
+relDataType: Option[RelDataType] = None)
   extends RexVisitor[Option[ResolvedExpression]] {
 
+  def this(
+  rexBuilder: RexBuilder,
+  inputNames: Array[String],
+  functionCatalog: FunctionCatalog,
+  catalogManager: CatalogManager,
+  timeZone: TimeZone) = {
+this(rexBuilder, inputNames, functionCatalog, catalogManager, timeZone, 
null)

Review Comment:
   Will change it to `None` instead of `null` to conform with Scala 
rules/conventions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] venkata91 commented on a diff in pull request #23313: [FLINK-20767][table planner] Support filter push down on nested fields

2023-09-13 Thread via GitHub


venkata91 commented on code in PR #23313:
URL: https://github.com/apache/flink/pull/23313#discussion_r1324916866


##
flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/utils/RexNodeExtractor.scala:
##
@@ -395,9 +397,19 @@ class RexNodeToExpressionConverter(
 inputNames: Array[String],
 functionCatalog: FunctionCatalog,
 catalogManager: CatalogManager,
-timeZone: TimeZone)
+timeZone: TimeZone,
+relDataType: Option[RelDataType] = None)
   extends RexVisitor[Option[ResolvedExpression]] {
 
+  def this(
+  rexBuilder: RexBuilder,
+  inputNames: Array[String],
+  functionCatalog: FunctionCatalog,
+  catalogManager: CatalogManager,
+  timeZone: TimeZone) = {
+this(rexBuilder, inputNames, functionCatalog, catalogManager, timeZone, 
null)

Review Comment:
   I think `None` or `null` is required; without it, the Scala compiler fails with 
`constructor invokes itself`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] venkata91 commented on a diff in pull request #23313: [FLINK-20767][table planner] Support filter push down on nested fields

2023-09-13 Thread via GitHub


venkata91 commented on code in PR #23313:
URL: https://github.com/apache/flink/pull/23313#discussion_r1324893510


##
flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/utils/RexNodeExtractor.scala:
##
@@ -538,8 +550,34 @@ class RexNodeToExpressionConverter(
 }
   }
 
-  override def visitFieldAccess(fieldAccess: RexFieldAccess): 
Option[ResolvedExpression] = None
+  override def visitFieldAccess(fieldAccess: RexFieldAccess): 
Option[ResolvedExpression] = {
+fieldAccess.getReferenceExpr match {
+  // push down on nested field inside a composite type like map or array 
is not supported
+  case _: RexCall => return None

Review Comment:
   Do you mean making it clear in the comment that `lower(a.b.c) = "foo"` is not 
supported? The above addresses cases like:
   
   ```
   "SELECT * FROM NestedItemTable WHERE `Result`.`Mid`.data_arr[2].`value` = 3"
   ```
   where `data_arr` is an array and `data_arr[2]` is expressed as 
`RexCall(ITEM, data_arr, 2)`, which won't be supported.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] venkata91 commented on a diff in pull request #23313: [FLINK-20767][table planner] Support filter push down on nested fields

2023-09-13 Thread via GitHub


venkata91 commented on code in PR #23313:
URL: https://github.com/apache/flink/pull/23313#discussion_r1324865286


##
flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/plan/rules/logical/PushFilterIntoTableSourceScanRuleTest.java:
##
@@ -118,4 +151,34 @@ public void testWithInterval() {
 util.tableEnv().executeSql(ddl);
 super.testWithInterval();
 }
+
+@Test
+public void testBasicNestedFilter() {
+util.verifyRelPlan("SELECT * FROM NestedTable WHERE 
deepNested.nested1.`value` > 2");
+}
+
+@Test
+public void testNestedFilterWithDotInTheName() {
+util.verifyRelPlan(
+"SELECT id FROM NestedTable WHERE 
`deepNestedWith.`.nested.`.value` > 5");
+}
+
+@Test
+public void testNestedFilterWithBacktickInTheName() {
+util.verifyRelPlan(
+"SELECT id FROM NestedTable WHERE 
`deepNestedWith.`.nested.```name` = 'foo'");

Review Comment:
   This tests the case where the column name contains a backtick (`) and must be 
escaped, since the whole nested field expression name itself has to be wrapped 
inside backticks. File formats like ORC require the entire nested field to be 
within backticks (`).
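
   In other words, something like this hypothetical helper captures the rule (each path segment quoted, embedded backticks doubled; illustration only, not code from the PR):

   ```java
   // Quote each segment of a nested field path in backticks, doubling any
   // backtick inside a segment, as Flink SQL identifier escaping requires.
   static String quoteNestedPath(String... segments) {
       StringBuilder sb = new StringBuilder();
       for (int i = 0; i < segments.length; i++) {
           if (i > 0) {
               sb.append('.');
           }
           sb.append('`').append(segments[i].replace("`", "``")).append('`');
       }
       return sb.toString();
   }

   // quoteNestedPath("deepNestedWith.", "nested", "`name")
   // -> `deepNestedWith.`.`nested`.```name`
   ```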



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] mxm commented on a diff in pull request #23406: [FLINK-32884] [flink-clients] PyFlink remote execution should support URLs with paths and https scheme

2023-09-13 Thread via GitHub


mxm commented on code in PR #23406:
URL: https://github.com/apache/flink/pull/23406#discussion_r1324778392


##
flink-core/src/main/java/org/apache/flink/configuration/RestOptions.java:
##
@@ -80,6 +80,23 @@ public class RestOptions {
 .withDescription(
 "The address that should be used by clients to 
connect to the server. Attention: This option is respected only if the 
high-availability configuration is NONE.");
 
+/** The address that should be used by clients to interact with the 
server. */
+@Documentation.Section(Documentation.Sections.COMMON_HOST_PORT)
+public static final ConfigOption PATH =
+key("rest.path")
+.stringType()
+.defaultValue("")
+.withDescription(
+"The path that should be used by clients to 
interact to the server which is accessible via URL.");
+
+/** The address that should be used by clients to interact with the 
server. */

Review Comment:
   ```suggestion
   /** The protocol that should be used by clients to interact with the 
server. */
   ```



##
flink-clients/src/test/java/org/apache/flink/client/cli/DefaultCLITest.java:
##
@@ -46,6 +46,28 @@ void testCommandLineMaterialization() throws Exception {
 
 assertThat(configuration.get(RestOptions.ADDRESS)).isEqualTo(hostname);
 assertThat(configuration.get(RestOptions.PORT)).isEqualTo(port);
+
+final String httpProtocol = "http";
+final String hostnameWithHttpScheme = httpProtocol + "://" + hostname;
+final String[] httpArgs = {"-m", hostnameWithHttpScheme + ':' + port};
+final CommandLine httpCommandLine = 
defaultCLI.parseCommandLineOptions(httpArgs, false);
+
+Configuration newConfiguration = 
defaultCLI.toConfiguration(httpCommandLine);
+
+
assertThat(newConfiguration.get(RestOptions.ADDRESS)).isEqualTo(hostname);
+assertThat(newConfiguration.get(RestOptions.PORT)).isEqualTo(port);
+
assertThat(newConfiguration.get(RestOptions.PROTOCOL)).isEqualTo(httpProtocol);
+
+final String httpsProtocol = "https";
+final String hostnameWithHttpsScheme = httpsProtocol + "://" + 
hostname;
+final String[] httpsArgs = {"-m", hostnameWithHttpsScheme + ':' + 
port};
+final CommandLine httpsCommandLine = 
defaultCLI.parseCommandLineOptions(httpsArgs, false);
+
+Configuration httpsConfiguration = 
defaultCLI.toConfiguration(httpsCommandLine);
+
+
assertThat(httpsConfiguration.get(RestOptions.ADDRESS)).isEqualTo(hostname);
+assertThat(httpsConfiguration.get(RestOptions.PORT)).isEqualTo(port);
+
assertThat(httpsConfiguration.get(RestOptions.PROTOCOL)).isEqualTo(httpsProtocol);

Review Comment:
   Can we test the added `PATH` here as well?
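
   Something along these lines, perhaps (a sketch only; it reuses the test's `hostname`, `port` and `defaultCLI` fixtures, and the example path and expected `PATH` value are assumptions about how the option is materialized):

   ```java
   final String[] pathArgs = {"-m", "https://" + hostname + ':' + port + "/proxy/flink"};
   final CommandLine pathCommandLine = defaultCLI.parseCommandLineOptions(pathArgs, false);

   final Configuration pathConfiguration = defaultCLI.toConfiguration(pathCommandLine);

   assertThat(pathConfiguration.get(RestOptions.PATH)).isEqualTo("/proxy/flink");
   ```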



##
flink-core/src/main/java/org/apache/flink/configuration/RestOptions.java:
##
@@ -80,6 +80,23 @@ public class RestOptions {
 .withDescription(
 "The address that should be used by clients to 
connect to the server. Attention: This option is respected only if the 
high-availability configuration is NONE.");
 
+/** The address that should be used by clients to interact with the 
server. */

Review Comment:
   ```suggestion
   /** The path that should be used by clients to interact with the server. 
*/
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Resolved] (FLINK-33029) Drop python 3.7 support

2023-09-13 Thread Gabor Somogyi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Somogyi resolved FLINK-33029.
---
Resolution: Fixed

50cb4ee on master

> Drop python 3.7 support
> ---
>
> Key: FLINK-33029
> URL: https://issues.apache.org/jira/browse/FLINK-33029
> Project: Flink
>  Issue Type: New Feature
>  Components: API / Python
>Affects Versions: 1.19.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33029) Drop python 3.7 support

2023-09-13 Thread Gabor Somogyi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Somogyi closed FLINK-33029.
-

> Drop python 3.7 support
> ---
>
> Key: FLINK-33029
> URL: https://issues.apache.org/jira/browse/FLINK-33029
> Project: Flink
>  Issue Type: New Feature
>  Components: API / Python
>Affects Versions: 1.19.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33029) Drop python 3.7 support

2023-09-13 Thread Gabor Somogyi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Somogyi updated FLINK-33029:
--
Fix Version/s: 1.19.0

> Drop python 3.7 support
> ---
>
> Key: FLINK-33029
> URL: https://issues.apache.org/jira/browse/FLINK-33029
> Project: Flink
>  Issue Type: New Feature
>  Components: API / Python
>Affects Versions: 1.19.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] gaborgsomogyi merged pull request #23359: [FLINK-33029][python] Drop python 3.7 support

2023-09-13 Thread via GitHub


gaborgsomogyi merged PR #23359:
URL: https://github.com/apache/flink/pull/23359


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-33018) GCP Pubsub PubSubConsumingTest.testStoppingConnectorWhenDeserializationSchemaIndicatesEndOfStream failed

2023-09-13 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764749#comment-17764749
 ] 

Ryan Skraba commented on FLINK-33018:
-

I don't think this JIRA is necessarily a blocker -- it looks like an error in 
the test that shows up as flakiness (as opposed to an error in the connector).  
The above PR fixes the test, if anybody wants to take a look!

The "eventual segfault" that I noticed early happens before and after the fix, 
and is *very likely* a different error due to the 
{{MockStreamingRuntimeContext}} creating and not closing a {{MockEnvironment}}. 
 We could probably fix this, but it should never occur unless you try and run 
one of these tests 100K times!  What do you think?
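
If someone picks that up, a minimal sketch of the kind of cleanup meant here (assuming the {{MockEnvironmentBuilder}} from the flink-runtime test utilities and a {{MockStreamingRuntimeContext}} constructor that takes an environment):

{code:java}
import org.apache.flink.runtime.operators.testutils.MockEnvironment;
import org.apache.flink.runtime.operators.testutils.MockEnvironmentBuilder;
import org.apache.flink.streaming.util.MockStreamingRuntimeContext;

// Close the MockEnvironment explicitly instead of leaking it from the context.
try (MockEnvironment env = new MockEnvironmentBuilder().build()) {
    MockStreamingRuntimeContext ctx = new MockStreamingRuntimeContext(false, 1, 0, env);
    // ... exercise the source against ctx as the test does today ...
} // env is closed here, releasing its memory and I/O managers
{code}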

> GCP Pubsub 
> PubSubConsumingTest.testStoppingConnectorWhenDeserializationSchemaIndicatesEndOfStream
>  failed
> 
>
> Key: FLINK-33018
> URL: https://issues.apache.org/jira/browse/FLINK-33018
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Google Cloud PubSub
>Affects Versions: gcp-pubsub-3.0.2
>Reporter: Martijn Visser
>Priority: Blocker
>  Labels: pull-request-available, stale-blocker
>
> https://github.com/apache/flink-connector-gcp-pubsub/actions/runs/6061318336/job/16446392844#step:13:507
> {code:java}
> [INFO] 
> [INFO] Results:
> [INFO] 
> Error:  Failures: 
> Error:
> PubSubConsumingTest.testStoppingConnectorWhenDeserializationSchemaIndicatesEndOfStream:119
>  
> expected: ["1", "2", "3"]
>  but was: ["1", "2"]
> [INFO] 
> Error:  Tests run: 30, Failures: 1, Errors: 0, Skipped: 0
> [INFO] 
> [INFO] 
> 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] becketqin commented on a diff in pull request #23313: [FLINK-20767][table planner] Support filter push down on nested fields

2023-09-13 Thread via GitHub


becketqin commented on code in PR #23313:
URL: https://github.com/apache/flink/pull/23313#discussion_r1324619744


##
flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/utils/RexNodeExtractor.scala:
##
@@ -538,8 +550,34 @@ class RexNodeToExpressionConverter(
 }
   }
 
-  override def visitFieldAccess(fieldAccess: RexFieldAccess): 
Option[ResolvedExpression] = None
+  override def visitFieldAccess(fieldAccess: RexFieldAccess): 
Option[ResolvedExpression] = {
+fieldAccess.getReferenceExpr match {
+  // push down on nested field inside a composite type like map or array 
is not supported
+  case _: RexCall => return None

Review Comment:
   Nit: `RexCall` also includes aggregations. Maybe update the comment to 
make that clear as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #672: [FLINK-33081] Apply parallelism overrides in scale

2023-09-13 Thread via GitHub


gyfora commented on code in PR #672:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/672#discussion_r1324638095


##
flink-kubernetes-operator-autoscaler/src/main/java/org/apache/flink/kubernetes/operator/autoscaler/JobAutoScalerImpl.java:
##
@@ -74,6 +76,36 @@ public JobAutoScalerImpl(
 this.infoManager = new AutoscalerInfoManager();
 }
 
+@Override
+public void scale(FlinkResourceContext ctx) {
+var conf = ctx.getObserveConfig();
+var resource = ctx.getResource();
+var resourceId = ResourceID.fromResource(resource);
+var autoscalerMetrics = getOrInitAutoscalerFlinkMetrics(ctx, 
resourceId);
+
+try {
+if (resource.getSpec().getJob() == null || 
!conf.getBoolean(AUTOSCALER_ENABLED)) {
+LOG.debug("Autoscaler is disabled");
+return;
+}
+
+// Initialize metrics only if autoscaler is enabled
+var status = resource.getStatus();
+if (status.getLifecycleState() != ResourceLifecycleState.STABLE
+|| 
!status.getJobStatus().getState().equals(JobStatus.RUNNING.name())) {
+LOG.info("Autoscaler is waiting for RUNNING job state");
+lastEvaluatedMetrics.remove(resourceId);
+return;
+}
+
+updateParallelismOverrides(ctx, conf, resource, resourceId, 
autoscalerMetrics);
+} catch (Throwable e) {
+onError(ctx, resource, autoscalerMetrics, e);
+} finally {
+applyParallelismOverrides(ctx);

Review Comment:
I will move the disabled check in front of the try block, which will clear 
the overrides and simply return. That way we won't actually call this method if 
the autoscaler is disabled.
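
   Roughly along these lines, perhaps (a sketch of the control flow only; `clearParallelismOverrides` is an assumed helper name, the other names come from the diff):

   ```java
   @Override
   public void scale(FlinkResourceContext ctx) {
       var conf = ctx.getObserveConfig();
       var resource = ctx.getResource();

       // Disabled check in front of the try block: clear any stored overrides
       // and return, so nothing below runs for disabled resources.
       if (resource.getSpec().getJob() == null || !conf.getBoolean(AUTOSCALER_ENABLED)) {
           LOG.debug("Autoscaler is disabled");
           clearParallelismOverrides(ctx); // assumed helper
           return;
       }

       var resourceId = ResourceID.fromResource(resource);
       var autoscalerMetrics = getOrInitAutoscalerFlinkMetrics(ctx, resourceId);
       try {
           updateParallelismOverrides(ctx, conf, resource, resourceId, autoscalerMetrics);
       } catch (Throwable e) {
           onError(ctx, resource, autoscalerMetrics, e);
       } finally {
           applyParallelismOverrides(ctx);
       }
   }
   ```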



##
flink-kubernetes-operator-autoscaler/src/main/java/org/apache/flink/kubernetes/operator/autoscaler/JobAutoScalerImpl.java:
##
@@ -74,6 +76,36 @@ public JobAutoScalerImpl(
 this.infoManager = new AutoscalerInfoManager();
 }
 
+@Override
+public void scale(FlinkResourceContext ctx) {
+var conf = ctx.getObserveConfig();
+var resource = ctx.getResource();
+var resourceId = ResourceID.fromResource(resource);
+var autoscalerMetrics = getOrInitAutoscalerFlinkMetrics(ctx, 
resourceId);
+
+try {
+if (resource.getSpec().getJob() == null || 
!conf.getBoolean(AUTOSCALER_ENABLED)) {
+LOG.debug("Autoscaler is disabled");
+return;
+}
+
+// Initialize metrics only if autoscaler is enabled
+var status = resource.getStatus();
+if (status.getLifecycleState() != ResourceLifecycleState.STABLE
+|| 
!status.getJobStatus().getState().equals(JobStatus.RUNNING.name())) {
+LOG.info("Autoscaler is waiting for RUNNING job state");
+lastEvaluatedMetrics.remove(resourceId);
+return;
+}
+
+updateParallelismOverrides(ctx, conf, resource, resourceId, 
autoscalerMetrics);
+} catch (Throwable e) {
+onError(ctx, resource, autoscalerMetrics, e);
+} finally {
+applyParallelismOverrides(ctx);

Review Comment:
   good point



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #672: [FLINK-33081] Apply parallelism overrides in scale

2023-09-13 Thread via GitHub


gyfora commented on code in PR #672:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/672#discussion_r1324635598


##
flink-kubernetes-operator-autoscaler/src/main/java/org/apache/flink/kubernetes/operator/autoscaler/JobAutoScalerImpl.java:
##
@@ -140,86 +171,73 @@ public void 
applyParallelismOverrides(FlinkResourceContext ctx) {
 ConfigurationUtils.convertValue(userOverrides, 
String.class));
 }
 
-@Override
-public boolean scale(FlinkResourceContext ctx) {
+private boolean updateParallelismOverrides(

Review Comment:
   good catch



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #672: [FLINK-33081] Apply parallelism overrides in scale

2023-09-13 Thread via GitHub


gyfora commented on code in PR #672:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/672#discussion_r1324633942


##
flink-kubernetes-operator-autoscaler/src/main/java/org/apache/flink/kubernetes/operator/autoscaler/JobAutoScalerImpl.java:
##
@@ -74,6 +76,36 @@ public JobAutoScalerImpl(
 this.infoManager = new AutoscalerInfoManager();
 }
 
+@Override
+public void scale(FlinkResourceContext ctx) {
+var conf = ctx.getObserveConfig();
+var resource = ctx.getResource();
+var resourceId = ResourceID.fromResource(resource);
+var autoscalerMetrics = getOrInitAutoscalerFlinkMetrics(ctx, 
resourceId);
+

Review Comment:
I think the `finally` block is fairly clean and makes it obvious that 
overrides are always applied exactly once at the very end.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-kubernetes-operator] sbrother commented on pull request #668: Give cluster/job role access to k8s services API

2023-09-13 Thread via GitHub


sbrother commented on PR #668:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/668#issuecomment-1717779633

   No problem, I just requested an Apache Jira account.
   
   And yes, I think that's what I mean. I had created a Flink Session 
Controller using a basic FlinkDeployment manifest with no job listed (I'm not 
100% sure about the naming of all the different pods, but this is the pod that 
exposes the Flink dashboard over port 8081). When I sshed into this pod I was 
surprised that I couldn't run `flink list`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-33053) Watcher leak in Zookeeper HA mode

2023-09-13 Thread Zili Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764723#comment-17764723
 ] 

Zili Chen commented on FLINK-33053:
---

But we don't have other shared watchers, so we can force-remove watches as above.

> Watcher leak in Zookeeper HA mode
> -
>
> Key: FLINK-33053
> URL: https://issues.apache.org/jira/browse/FLINK-33053
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.0, 1.18.0, 1.17.1
>Reporter: Yangze Guo
>Priority: Blocker
> Attachments: 26.dump.zip, 26.log, 
> taskmanager_flink-native-test-117-taskmanager-1-9_thread_dump (1).json
>
>
> We observe a watcher leak in our OLAP stress test when enabling Zookeeper HA 
> mode. TM's watches on the leader of the JobMaster have not been stopped after 
> the job finished.
> Here is how we reproduce this issue:
>  - Start a session cluster and enable Zookeeper HA mode.
>  - Continuously and concurrently submit short queries, e.g. WordCount to the 
> cluster.
>  - echo -n wchp | nc \{zk host} \{zk port} to get current watches.
> We can see a lot of watches on 
> /flink/\{cluster_name}/leader/\{job_id}/connection_info.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33053) Watcher leak in Zookeeper HA mode

2023-09-13 Thread Zili Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764721#comment-17764721
 ] 

Zili Chen commented on FLINK-33053:
---

See https://lists.apache.org/thread/3b9hn9j4c05yfztlr2zcctbg7sqwdh58.

This seems to be a ZK issue that I ran into one year ago.

> Watcher leak in Zookeeper HA mode
> -
>
> Key: FLINK-33053
> URL: https://issues.apache.org/jira/browse/FLINK-33053
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.0, 1.18.0, 1.17.1
>Reporter: Yangze Guo
>Priority: Blocker
> Attachments: 26.dump.zip, 26.log, 
> taskmanager_flink-native-test-117-taskmanager-1-9_thread_dump (1).json
>
>
> We observe a watcher leak in our OLAP stress test when enabling Zookeeper HA 
> mode. TM's watches on the leader of the JobMaster have not been stopped after 
> the job finished.
> Here is how we reproduce this issue:
>  - Start a session cluster and enable Zookeeper HA mode.
>  - Continuously and concurrently submit short queries, e.g. WordCount to the 
> cluster.
>  - echo -n wchp | nc \{zk host} \{zk port} to get current watches.
> We can see a lot of watches on 
> /flink/\{cluster_name}/leader/\{job_id}/connection_info.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] flinkbot commented on pull request #23412: [FLINK-33083] Properly apply ReadingMetadataSpec for a TableSourceScan

2023-09-13 Thread via GitHub


flinkbot commented on PR #23412:
URL: https://github.com/apache/flink/pull/23412#issuecomment-1717739419

   
   ## CI report:
   
   * 75b651538a1fa083c211ab8c4020822590b89043 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-33083) SupportsReadingMetadata is not applied when loading a CompiledPlan

2023-09-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33083:
---
Labels: pull-request-available  (was: )

> SupportsReadingMetadata is not applied when loading a CompiledPlan
> --
>
> Key: FLINK-33083
> URL: https://issues.apache.org/jira/browse/FLINK-33083
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.16.2, 1.17.1
>Reporter: Dawid Wysakowicz
>Assignee: Dawid Wysakowicz
>Priority: Major
>  Labels: pull-request-available
>
> If a few conditions are met, we cannot apply the ReadingMetadata interface:
> # source overwrites:
>  {code}
> @Override
> public boolean supportsMetadataProjection() {
> return false;
> }
>  {code}
> # source does not implement {{SupportsProjectionPushDown}}
> # table has metadata columns e.g.
> {code}
> CREATE TABLE src (
>   physical_name STRING,
>   physical_sum INT,
>   timestamp TIMESTAMP_LTZ(3) NOT NULL METADATA VIRTUAL
> )
> {code}
> # we query the table {{SELECT * FROM src}}
> It fails with:
> {code}
> Caused by: java.lang.IllegalArgumentException: Row arity: 1, but serializer 
> arity: 2
>   at 
> org.apache.flink.table.runtime.typeutils.RowDataSerializer.copy(RowDataSerializer.java:124)
> {code}
> The reason is that {{SupportsReadingMetadataSpec}} is created only in 
> {{PushProjectIntoTableSourceScanRule}}, but the rule is not applied when 
> conditions 1 & 2 hold.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] dawidwys opened a new pull request, #23412: [FLINK-33083] Properly apply ReadingMetadataSpec for a TableSourceScan

2023-09-13 Thread via GitHub


dawidwys opened a new pull request, #23412:
URL: https://github.com/apache/flink/pull/23412

   ## What is the purpose of the change
   
   The PR fixes the creation of a TableSourceScan so that it also includes the 
proper ReadingMetadataSpec that has been applied to the source.
   
   
   ## Brief change log
   
   * revert the solution added in #22894 because it works around the outcome of 
the bug rather than fixing the bug
   * create a ReadingMetadataSpec when applying it on the `TableSource` in a 
`TableSourceScan`
   
   ## Verifying this change
   
   * tests added in #22894 should still pass
   * added a dedicated test 
`TableSourceJsonPlanITCase#testReadingMetadataWithProjectionPushDownDisabled`
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't 
know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Comment Edited] (FLINK-33052) codespeed server is down

2023-09-13 Thread Yuan Mei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764675#comment-17764675
 ] 

Yuan Mei edited comment on FLINK-33052 at 9/13/23 2:14 PM:
---

Hey [~pnowojski] 
 # Strictly speaking, yes it is a blocker for release.
 # But since 1.18 has been branch cut and most of the tests have already been 
done (No feature is allowed to be merged), maybe we should not make this a 
blocker issue for RC?
 # [~Zakelly] has already been working on this, but as you can see this issue 
takes time (applying for grants to buy new machines, setting everything up, 
etc.)
 # I agree we should have this set up as soon as possible, best before 1.18 
release.


was (Author: ym):
Hey [~pnowojski] 
 # Strictly speaking, yes it is a blocker for release.
 # But since 1.18 has been branch cut and most of the tests have already been 
done (No feature is allowed to be merged), maybe we should not make this a 
blocker issue for RC?
 # [~Zakelly] has already been working on this, but as you can see this issue 
takes time (applying for grants to buy new machines, setting everything up, 
etc.)
 # I agree we should have this set up before 1.18 release.

> codespeed server is down
> 
>
> Key: FLINK-33052
> URL: https://issues.apache.org/jira/browse/FLINK-33052
> Project: Flink
>  Issue Type: Bug
>  Components: Benchmarks, Test Infrastructure
>Affects Versions: 1.18.0
>Reporter: Jing Ge
>Assignee: Zakelly Lan
>Priority: Blocker
>
> No update in #flink-dev-benchmarks slack channel since 25th August.
> It was an EC2 instance running in a legacy AWS account. Currently no one
> knows which account it is. 
>  
> https://apache-flink.slack.com/archives/C0471S0DFJ9/p1693932155128359



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33053) Watcher leak in Zookeeper HA mode

2023-09-13 Thread Zili Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764702#comment-17764702
 ] 

Zili Chen commented on FLINK-33053:
---

I noticed that the {{TreeCache}}'s close calls {{removeWatches}} instead of the 
{{removeAllWatches}} called by your scripts above.

{{removeWatches}} only removes the watcher on the client side, leaving the 
server-side watcher as is.
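
For completeness, the server-side removal the scripts perform maps to the raw ZooKeeper API roughly like this (a sketch; the path layout is taken from the issue description):

{code:java}
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// removeAllWatches asks the *server* to drop the watches on a path, while
// removeWatches only cleans up the local, client-side bookkeeping.
void dropLeaderWatches(ZooKeeper zk, String clusterName, String jobId) throws Exception {
    String path = "/flink/" + clusterName + "/leader/" + jobId + "/connection_info";
    zk.removeAllWatches(path, Watcher.WatcherType.Any, false);
}
{code}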

> Watcher leak in Zookeeper HA mode
> -
>
> Key: FLINK-33053
> URL: https://issues.apache.org/jira/browse/FLINK-33053
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.0, 1.18.0, 1.17.1
>Reporter: Yangze Guo
>Priority: Blocker
> Attachments: 26.dump.zip, 26.log, 
> taskmanager_flink-native-test-117-taskmanager-1-9_thread_dump (1).json
>
>
> We observe a watcher leak in our OLAP stress test when enabling Zookeeper HA 
> mode. TM's watches on the leader of the JobMaster have not been stopped after 
> the job finished.
> Here is how we reproduce this issue:
>  - Start a session cluster and enable Zookeeper HA mode.
>  - Continuously and concurrently submit short queries, e.g. WordCount to the 
> cluster.
>  - echo -n wchp | nc \{zk host} \{zk port} to get current watches.
> We can see a lot of watches on 
> /flink/\{cluster_name}/leader/\{job_id}/connection_info.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-kubernetes-operator] mxm commented on a diff in pull request #672: [FLINK-33081] Apply parallelism overrides in scale

2023-09-13 Thread via GitHub


mxm commented on code in PR #672:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/672#discussion_r1324308777


##
flink-kubernetes-operator-autoscaler/src/main/java/org/apache/flink/kubernetes/operator/autoscaler/JobAutoScalerImpl.java:
##
@@ -74,6 +76,36 @@ public JobAutoScalerImpl(
 this.infoManager = new AutoscalerInfoManager();
 }
 
+@Override
+public void scale(FlinkResourceContext ctx) {
+var conf = ctx.getObserveConfig();
+var resource = ctx.getResource();
+var resourceId = ResourceID.fromResource(resource);
+var autoscalerMetrics = getOrInitAutoscalerFlinkMetrics(ctx, 
resourceId);
+
+try {
+if (resource.getSpec().getJob() == null || 
!conf.getBoolean(AUTOSCALER_ENABLED)) {
+LOG.debug("Autoscaler is disabled");
+return;
+}
+
+// Initialize metrics only if autoscaler is enabled
+var status = resource.getStatus();
+if (status.getLifecycleState() != ResourceLifecycleState.STABLE
+|| 
!status.getJobStatus().getState().equals(JobStatus.RUNNING.name())) {
+LOG.info("Autoscaler is waiting for RUNNING job state");
+lastEvaluatedMetrics.remove(resourceId);
+return;
+}
+
+updateParallelismOverrides(ctx, conf, resource, resourceId, 
autoscalerMetrics);
+} catch (Throwable e) {
+onError(ctx, resource, autoscalerMetrics, e);
+} finally {
+applyParallelismOverrides(ctx);

Review Comment:
   At first sight, this looks like the overrides will get applied, even if the 
autoscaler is disabled. There is another check though that prevents this here: 
https://github.com/apache/flink-kubernetes-operator/pull/672/files?diff=unified=1#diff-7df0c6b50a32c0055e6a1dcfcf9ab25cddb2a245b2125119fd9b57d65918698dR128
 (line 128)
   
   A bit confusing. See other comment line 88.



##
flink-kubernetes-operator-autoscaler/src/main/java/org/apache/flink/kubernetes/operator/autoscaler/JobAutoScalerImpl.java:
##
@@ -74,6 +76,36 @@ public JobAutoScalerImpl(
 this.infoManager = new AutoscalerInfoManager();
 }
 
+@Override
+public void scale(FlinkResourceContext ctx) {
+var conf = ctx.getObserveConfig();
+var resource = ctx.getResource();
+var resourceId = ResourceID.fromResource(resource);
+var autoscalerMetrics = getOrInitAutoscalerFlinkMetrics(ctx, 
resourceId);
+
+try {
+if (resource.getSpec().getJob() == null || 
!conf.getBoolean(AUTOSCALER_ENABLED)) {
+LOG.debug("Autoscaler is disabled");
+return;
+}
+
+// Initialize metrics only if autoscaler is enabled
+var status = resource.getStatus();
+if (status.getLifecycleState() != ResourceLifecycleState.STABLE
+|| 
!status.getJobStatus().getState().equals(JobStatus.RUNNING.name())) {
+LOG.info("Autoscaler is waiting for RUNNING job state");
+lastEvaluatedMetrics.remove(resourceId);
+return;
+}
+
+updateParallelismOverrides(ctx, conf, resource, resourceId, 
autoscalerMetrics);

Review Comment:
   ```suggestion
   runScalingLogic(ctx, conf, resource, resourceId, 
autoscalerMetrics);
   ```



##
flink-kubernetes-operator-autoscaler/src/main/java/org/apache/flink/kubernetes/operator/autoscaler/JobAutoScalerImpl.java:
##
@@ -74,6 +76,36 @@ public JobAutoScalerImpl(
 this.infoManager = new AutoscalerInfoManager();
 }
 
+@Override
+public void scale(FlinkResourceContext ctx) {
+var conf = ctx.getObserveConfig();
+var resource = ctx.getResource();
+var resourceId = ResourceID.fromResource(resource);
+var autoscalerMetrics = getOrInitAutoscalerFlinkMetrics(ctx, 
resourceId);
+
+try {
+if (resource.getSpec().getJob() == null || 
!conf.getBoolean(AUTOSCALER_ENABLED)) {
+LOG.debug("Autoscaler is disabled");

Review Comment:
   Would reset the overrides here.



##
flink-kubernetes-operator-autoscaler/src/main/java/org/apache/flink/kubernetes/operator/autoscaler/JobAutoScalerImpl.java:
##
@@ -74,6 +76,36 @@ public JobAutoScalerImpl(
 this.infoManager = new AutoscalerInfoManager();
 }
 
+@Override
+public void scale(FlinkResourceContext ctx) {
+var conf = ctx.getObserveConfig();
+var resource = ctx.getResource();
+var resourceId = ResourceID.fromResource(resource);
+var autoscalerMetrics = getOrInitAutoscalerFlinkMetrics(ctx, 
resourceId);
+

Review Comment:
   An alternative would be to apply the current overrides here and the new 
overrides after the scaling. That would get rid of the finally block.



-- 
This is an automated message from the Apache 

[jira] [Comment Edited] (FLINK-33052) codespeed server is down

2023-09-13 Thread Yuan Mei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764675#comment-17764675
 ] 

Yuan Mei edited comment on FLINK-33052 at 9/13/23 1:44 PM:
---

Hey [~pnowojski] 
 # Strictly speaking, yes it is a blocker for release.
 # But since 1.18 has been branch cut and most of the tests have already been 
done (No feature is allowed to be merged), maybe we should not make this a 
blocker issue for RC?
 # [~Zakelly] has already been working on this, but as you can see this issue 
takes time (applying for grants to buy new machines, setting everything up, 
etc.)
 # I agree we should have this set up before 1.18 release.


was (Author: ym):
Hey [~pnowojski] 
 # Strictly speaking, yes it is a blocker for release.
 # But since 1.18 has been branch cut and most of the tests have already been 
done (No feature is allowed to be merged), maybe we should not make this a 
blocker issue for RC1 of 1.18 (RC0 is out, waiting for RC1)?
 # [~Zakelly] has already been working on this, but as you can see this issue 
takes time (applying for grants to buy new machines, setting everything up, 
etc.)
 # I agree we should have this set up before 1.18 release.

> codespeed server is down
> 
>
> Key: FLINK-33052
> URL: https://issues.apache.org/jira/browse/FLINK-33052
> Project: Flink
>  Issue Type: Bug
>  Components: Benchmarks, Test Infrastructure
>Affects Versions: 1.18.0
>Reporter: Jing Ge
>Assignee: Zakelly Lan
>Priority: Blocker
>
> No update in #flink-dev-benchmarks slack channel since 25th August.
> It was an EC2 instance running in a legacy AWS account. Currently no one
> knows which account it is. 
>  
> https://apache-flink.slack.com/archives/C0471S0DFJ9/p1693932155128359



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-33052) codespeed server is down

2023-09-13 Thread Yuan Mei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764675#comment-17764675
 ] 

Yuan Mei edited comment on FLINK-33052 at 9/13/23 1:41 PM:
---

Hey [~pnowojski] 
 # Strictly speaking, yes it is a blocker for release.
 # But since 1.18 has been branch cut and most of the tests have already been 
done (No feature is allowed to be merged), maybe we should not make this a 
blocker issue for RC1 of 1.18 (RC0 is out, waiting for RC1)?
 # [~Zakelly] has already been working on this, but as you can see this issue 
takes time (applying for grants to buy new machines, setting everything up, 
etc.)
 # I agree we should have this set up before 1.18 release.


was (Author: ym):
Hey [~pnowojski] 
 # I think this issue is very critical as you mentioned
 # Since 1.18 has been branch cut and most of the tests have already been done 
(No feature is allowed to be merged). I do not see why this should be a blocker 
issue for release 1.18 (RC0 is out, waiting for RC1)
 # [~Zakelly] has already been working on this, but as you can see this issue 
takes time (applying for grants to buy new machines, setting everything up, 
etc.)
 # If you think this is indeed a blocker for 1.18, we can chat off-line as 
well. Or at least I do not want this to block the RC? 

> codespeed server is down
> 
>
> Key: FLINK-33052
> URL: https://issues.apache.org/jira/browse/FLINK-33052
> Project: Flink
>  Issue Type: Bug
>  Components: Benchmarks, Test Infrastructure
>Affects Versions: 1.18.0
>Reporter: Jing Ge
>Assignee: Zakelly Lan
>Priority: Blocker
>
> No update in #flink-dev-benchmarks slack channel since 25th August.
> It was an EC2 instance running in a legacy AWS account. Currently no one
> knows which account it is. 
>  
> https://apache-flink.slack.com/archives/C0471S0DFJ9/p1693932155128359



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-33052) codespeed server is down

2023-09-13 Thread Yuan Mei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764675#comment-17764675
 ] 

Yuan Mei edited comment on FLINK-33052 at 9/13/23 1:39 PM:
---

Hey [~pnowojski] 
 # I think this issue is very critical as you mentioned
 # Since 1.18 has been branch cut and most of the tests have already been done 
(No feature is allowed to be merged). I do not see why this should be a blocker 
issue for release 1.18 (RC0 is out, waiting for RC1)
 # [~Zakelly] has already been working on this, but as you can see this issue 
takes time (applying for grants to buy new machines, setting everything up, 
etc.)
 # If you think this is indeed a blocker for 1.18, we can chat off-line as 
well. Or at least I do not want this to block the RC? 


was (Author: ym):
Hey [~pnowojski] 
 # I think this issue is very critical as you mentioned
 # Since 1.18 has been branch cut and most of the tests have already been done 
(No feature is allowed to be merged). I do not see why this should be a blocker 
issue for release 1.18 (RC0 is out, waiting for RC1)
 # [~Zakelly] has already been working on this, but as you can see this issue 
takes time (applying for grants to buy new machines, setting everything up, 
etc.)
 # If you think this is indeed a blocker for 1.18, we can chat off-line as well.

> codespeed server is down
> 
>
> Key: FLINK-33052
> URL: https://issues.apache.org/jira/browse/FLINK-33052
> Project: Flink
>  Issue Type: Bug
>  Components: Benchmarks, Test Infrastructure
>Affects Versions: 1.18.0
>Reporter: Jing Ge
>Assignee: Zakelly Lan
>Priority: Blocker
>
> No update in #flink-dev-benchmarks slack channel since 25th August.
> It was an EC2 instance running in a legacy AWS account. Currently no one
> knows which account it is. 
>  
> https://apache-flink.slack.com/archives/C0471S0DFJ9/p1693932155128359



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-33052) codespeed server is down

2023-09-13 Thread Yuan Mei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764675#comment-17764675
 ] 

Yuan Mei edited comment on FLINK-33052 at 9/13/23 1:35 PM:
---

Hey [~pnowojski] 
 # I think this issue is very critical as you mentioned
 # Since 1.18 has been branch cut and most of the tests have already been done 
(No feature is allowed to be merged). I do not see why this should be a blocker 
issue for release 1.18
 # [~Zakelly] has already been working on this, but as you can see this issue 
takes time (applying for grants to buy new machines, setting everything up, 
etc.)
 # If you think this is indeed a blocker for 1.18, we can chat off-line as well.


was (Author: ym):
Hey [~pnowojski] 
 # I think this issue is very critical as you mentioned
 # Since 1.18 has been branch cut and most of the tests have already been done 
(No feature is allowed to be merged). I do not see why this should be a blocker 
issue for release 1.18
 # [~Zakelly] has already been working on this, but as you can see this issue 
takes time (applying for grants to buy new machines, setting everything up, 
etc.)
 # If you think this is indeed a blocker, we can chat off-line as well.

> codespeed server is down
> 
>
> Key: FLINK-33052
> URL: https://issues.apache.org/jira/browse/FLINK-33052
> Project: Flink
>  Issue Type: Bug
>  Components: Benchmarks, Test Infrastructure
>Affects Versions: 1.18.0
>Reporter: Jing Ge
>Assignee: Zakelly Lan
>Priority: Blocker
>
> No update in #flink-dev-benchmarks slack channel since 25th August.
> It was an EC2 instance running in a legacy AWS account. Currently no one
> knows which account it is. 
>  
> https://apache-flink.slack.com/archives/C0471S0DFJ9/p1693932155128359



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-33052) codespeed server is down

2023-09-13 Thread Yuan Mei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764675#comment-17764675
 ] 

Yuan Mei edited comment on FLINK-33052 at 9/13/23 1:35 PM:
---

Hey [~pnowojski] 
 # I think this issue is very critical as you mentioned
 # Since 1.18 has been branch cut and most of the tests have already been done 
(No feature is allowed to be merged). I do not see why this should be a blocker 
issue for release 1.18 (RC0 is out, waiting for RC1)
 # [~Zakelly] has already been working on this, but as you can see this issue 
takes time (applying for grants to buy new machines, setting everything up, 
etc.)
 # If you think this is indeed a blocker for 1.18, we can chat off-line as well.


was (Author: ym):
Hey [~pnowojski] 
 # I think this issue is very critical as you mentioned
 # Since 1.18 has been branch cut and most of the tests have already been done 
(No feature is allowed to be merged). I do not see why this should be a blocker 
issue for release 1.18
 # [~Zakelly] has already been working on this, but as you can see this issue 
takes time (applying for grants to buy new machines, setting everything up, 
etc.)
 # If you think this is indeed a blocker for 1.18, we can chat off-line as well.

> codespeed server is down
> 
>
> Key: FLINK-33052
> URL: https://issues.apache.org/jira/browse/FLINK-33052
> Project: Flink
>  Issue Type: Bug
>  Components: Benchmarks, Test Infrastructure
>Affects Versions: 1.18.0
>Reporter: Jing Ge
>Assignee: Zakelly Lan
>Priority: Blocker
>
> No update in #flink-dev-benchmarks slack channel since 25th August.
> It was an EC2 instance running in a legacy AWS account. Currently no one
> knows which account it is. 
>  
> https://apache-flink.slack.com/archives/C0471S0DFJ9/p1693932155128359



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-28758) FlinkKafkaConsumer fails to stop with savepoint

2023-09-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-28758:
---
Labels: pull-request-available  (was: )

> FlinkKafkaConsumer fails to stop with savepoint 
> 
>
> Key: FLINK-28758
> URL: https://issues.apache.org/jira/browse/FLINK-28758
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.15.0, 1.17.0, 1.16.1
> Environment: Flink version: 1.15.0
> deploy mode: K8s application mode. A local mini cluster also has this 
> problem.
> Kafka connector: uses the Kafka SourceFunction, not the new API.
>Reporter: hjw
>Assignee: Piotr Nowojski
>Priority: Critical
>  Labels: pull-request-available
> Attachments: image-2022-10-13-19-47-56-635.png
>
>
> I posted a stop-with-savepoint request to the Flink job through the REST API.
> An error happened in the Kafka connector close.
> The job then enters the restarting state.
> Using the savepoint command alone succeeds.
> {code:java}
> 13:33:42.857 [Kafka Fetcher for Source: nlp-kafka-source -> nlp-clean 
> (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer 
> clientId=consumer-hjw-3, groupId=hjw] Kafka consumer has been closed
> 13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean 
> (1/1)#0] INFO org.apache.kafka.common.utils.AppInfoParser - App info 
> kafka.consumer for consumer-hjw-4 unregistered
> 13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean 
> (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer 
> clientId=consumer-hjw-4, groupId=hjw] Kafka consumer has been closed
> 13:33:42.860 [Source: nlp-kafka-source -> nlp-clean (1/1)#0] DEBUG 
> org.apache.flink.streaming.runtime.tasks.StreamTask - Cleanup StreamTask 
> (operators closed: false, cancelled: false)
> 13:33:42.860 [jobmanager-io-thread-4] INFO 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Decline 
> checkpoint 5 by task eeefcb27475446241861ad8db3f33144 of job 
> d6fed247feab1c0bcc1b0dcc2cfb4736 at 79edfa88-ccc3-4140-b0e1-7ce8a7f8669f @ 
> 127.0.0.1 (dataPort=-1).
> org.apache.flink.util.SerializedThrowable: Task name with subtask : Source: 
> nlp-kafka-source -> nlp-clean (1/1)#0 Failure reason: Task has failed.
>  at 
> org.apache.flink.runtime.taskmanager.Task.declineCheckpoint(Task.java:1388)
>  at 
> org.apache.flink.runtime.taskmanager.Task.lambda$triggerCheckpointBarrier$3(Task.java:1331)
>  at 
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
>  at 
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
>  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>  at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>  at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:343)
> Caused by: org.apache.flink.util.SerializedThrowable: 
> org.apache.flink.streaming.connectors.kafka.internals.Handover$ClosedException
>  at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>  at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>  at 
> java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
>  at 
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
>  ... 3 common frames omitted
> Caused by: org.apache.flink.util.SerializedThrowable: null
>  at 
> org.apache.flink.streaming.connectors.kafka.internals.Handover.close(Handover.java:177)
>  at 
> org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.cancel(KafkaFetcher.java:164)
>  at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.cancel(FlinkKafkaConsumerBase.java:945)
>  at 
> org.apache.flink.streaming.api.operators.StreamSource.stop(StreamSource.java:128)
>  at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.stopOperatorForStopWithSavepoint(SourceStreamTask.java:305)
>  at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.lambda$triggerStopWithSavepointAsync$1(SourceStreamTask.java:285)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93)
>  at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
>  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338)
>  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324)
>  at 
> 

[GitHub] [flink-connector-kafka] boring-cyborg[bot] commented on pull request #48: [FLINK-28758] Fix stop-with-savepoint for FlinkKafkaConsumer

2023-09-13 Thread via GitHub


boring-cyborg[bot] commented on PR #48:
URL: 
https://github.com/apache/flink-connector-kafka/pull/48#issuecomment-1717647945

   Thanks for opening this pull request! Please check out our contributing 
guidelines. (https://flink.apache.org/contributing/how-to-contribute.html)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Comment Edited] (FLINK-33052) codespeed server is down

2023-09-13 Thread Yuan Mei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764675#comment-17764675
 ] 

Yuan Mei edited comment on FLINK-33052 at 9/13/23 1:31 PM:
---

Hey [~pnowojski] 
 # I think this issue is very critical, as you mentioned.
 # Since 1.18 has been branch-cut and most of the tests have already been done 
(no new features are allowed to be merged), I do not see why this should be a 
blocker issue for release 1.18.
 # [~Zakelly] has already been working on this, but as you can see this issue 
takes time (applying for grants to buy new machines, setting everything up, 
etc.).
 # If you think this is indeed a blocker, we can chat offline as well.


was (Author: ym):
Hey [~pnowojski] 
 # I think this issue is very critical, as you mentioned.
 # Since 1.18 has been branch-cut and most of the tests have already been done 
(no new features are allowed to be merged), I do not see why this should be a 
blocker issue for release 1.18.
 # [~Zakelly] has already been working on this, but as you can see this issue 
takes time (applying for grants to buy new machines, setting everything up, 
etc.).
 # If you think this is indeed a blocker, we can chat offline as well.

> codespeed server is down
> 
>
> Key: FLINK-33052
> URL: https://issues.apache.org/jira/browse/FLINK-33052
> Project: Flink
>  Issue Type: Bug
>  Components: Benchmarks, Test Infrastructure
>Affects Versions: 1.18.0
>Reporter: Jing Ge
>Assignee: Zakelly Lan
>Priority: Blocker
>
> No update in the #flink-dev-benchmarks Slack channel since 25th August.
> It was an EC2 instance running in a legacy AWS account. Currently no one
> knows which account it is.
>  
> https://apache-flink.slack.com/archives/C0471S0DFJ9/p1693932155128359



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-33052) codespeed server is down

2023-09-13 Thread Yuan Mei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764675#comment-17764675
 ] 

Yuan Mei edited comment on FLINK-33052 at 9/13/23 1:30 PM:
---

Hey [~pnowojski] 
 # I think this issue is very critical, as you mentioned.
 # Since 1.18 has been branch-cut and most of the tests have already been done 
(no new features are allowed to be merged), I do not see why this should be a 
blocker issue for release 1.18.
 # [~Zakelly] has already been working on this, but as you can see this issue 
takes time (applying for grants to buy new machines, setting everything up, 
etc.).
 # If you think this is indeed a blocker, we can chat offline as well.


was (Author: ym):
# I think this issue is very critical, as you mentioned.
 # Since 1.18 has been branch-cut and most of the tests have already been done 
(no new features are allowed to be merged), I do not see why this should be a 
blocker issue for release 1.18.
 # [~Zakelly] has already been working on this, but as you can see this issue 
takes time (applying for grants to buy new machines, setting everything up, 
etc.).
 # If you think this is indeed a blocker, we can chat offline as well.

> codespeed server is down
> 
>
> Key: FLINK-33052
> URL: https://issues.apache.org/jira/browse/FLINK-33052
> Project: Flink
>  Issue Type: Bug
>  Components: Benchmarks, Test Infrastructure
>Affects Versions: 1.18.0
>Reporter: Jing Ge
>Assignee: Zakelly Lan
>Priority: Blocker
>
> No update in the #flink-dev-benchmarks Slack channel since 25th August.
> It was an EC2 instance running in a legacy AWS account. Currently no one
> knows which account it is.
>  
> https://apache-flink.slack.com/archives/C0471S0DFJ9/p1693932155128359



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33052) codespeed server is down

2023-09-13 Thread Yuan Mei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764675#comment-17764675
 ] 

Yuan Mei commented on FLINK-33052:
--

# I think this issue is very critical, as you mentioned.
 # Since 1.18 has been branch-cut and most of the tests have already been done 
(no new features are allowed to be merged), I do not see why this should be a 
blocker issue for release 1.18.
 # [~Zakelly] has already been working on this, but as you can see this issue 
takes time (applying for grants to buy new machines, setting everything up, 
etc.).
 # If you think this is indeed a blocker, we can chat offline as well.

> codespeed server is down
> 
>
> Key: FLINK-33052
> URL: https://issues.apache.org/jira/browse/FLINK-33052
> Project: Flink
>  Issue Type: Bug
>  Components: Benchmarks, Test Infrastructure
>Affects Versions: 1.18.0
>Reporter: Jing Ge
>Assignee: Zakelly Lan
>Priority: Blocker
>
> No update in the #flink-dev-benchmarks Slack channel since 25th August.
> It was an EC2 instance running in a legacy AWS account. Currently no one
> knows which account it is.
>  
> https://apache-flink.slack.com/archives/C0471S0DFJ9/p1693932155128359



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31966) Flink Kubernetes operator lacks TLS support

2023-09-13 Thread Gyula Fora (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764674#comment-17764674
 ] 

Gyula Fora commented on FLINK-31966:


[~tagarr] I think that sounds reasonable. I think this would work, but 
unfortunately I don't really know the exact expectations of users requiring 
this feature :) 
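
For illustration only (not from the ticket): a minimal, hypothetical flink-conf 
fragment of the kind that triggers the described hang. The keystore/truststore 
paths resolve inside the JM/TM pods but not inside the operator pod, so the 
operator cannot build its REST connection:

{code}
# values are placeholders; the /certs mount exists only in the JM/TM pods
security.ssl.rest.enabled: true
security.ssl.rest.keystore: /certs/keystore.jks
security.ssl.rest.keystore-password: changeit
security.ssl.rest.key-password: changeit
security.ssl.rest.truststore: /certs/truststore.jks
security.ssl.rest.truststore-password: changeit
{code}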

> Flink Kubernetes operator lacks TLS support 
> 
>
> Key: FLINK-31966
> URL: https://issues.apache.org/jira/browse/FLINK-31966
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.4.0
>Reporter: Adrian Vasiliu
>Priority: Major
>
> *Summary*
> The Flink Kubernetes operator lacks support inside the FlinkDeployment 
> operand for configuring Flink with TLS (both one-way and mutual) for the 
> internal communication between jobmanagers and taskmanagers, and for the 
> external REST endpoint. Although a workaround exists to configure the job and 
> task managers, this breaks the operator and renders it unable to reconcile.
> *Additional information*
>  * The Apache Flink operator supports passing through custom flink 
> configuration to be applied to job and task managers.
>  * If you supply SSL-based properties, the operator can no longer speak to 
> the deployed job manager. The operator is reading the flink conf and using it 
> to create a connection to the job manager REST endpoint, but it uses the 
> truststore file paths within flink-conf.yaml, which are unresolvable from the 
> operator. This leaves the operator hanging in a pending state as it cannot 
> complete a reconcile.
> *Proposal*
> Our proposal is to make changes to the operator code. A simple change exists 
> that would be enough to enable anonymous SSL at the REST endpoint, but more 
> invasive changes would be required to enable full mTLS throughout.
> The simple change to enable anonymous SSL would be for the operator to parse 
> flink-conf and podTemplate to identify the Kubernetes resource that contains 
> the certificate from the job manager keystore and use it inside the 
> operator’s trust store.
> In the case of mutual TLS, further changes are required: the operator would 
> need to generate a certificate signed by the same issuing authority as the 
> job manager’s certificates and then use it in a keystore when challenged by 
> that job manager. We propose that the operator becomes responsible for making 
> CertificateSigningRequests to generate certificates for job manager, task 
> manager and operator. The operator can then coordinate deploying the job and 
> task managers with the correct flink-conf and volume mounts. This would also 
> work for anonymous SSL.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-28758) FlinkKafkaConsumer fails to stop with savepoint

2023-09-13 Thread Piotr Nowojski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski reassigned FLINK-28758:
--

Assignee: Piotr Nowojski  (was: Mark Cho)

> FlinkKafkaConsumer fails to stop with savepoint 
> 
>
> Key: FLINK-28758
> URL: https://issues.apache.org/jira/browse/FLINK-28758
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.15.0, 1.17.0, 1.16.1
> Environment: Flink version: 1.15.0
> deploy mode: K8s application mode. A local mini cluster also has this
> problem.
> Kafka Connector: uses the Kafka SourceFunction, not the new API.
>Reporter: hjw
>Assignee: Piotr Nowojski
>Priority: Critical
> Attachments: image-2022-10-13-19-47-56-635.png
>
>
> I posted a stop-with-savepoint request to the Flink job through the REST API.
> An error happened while closing the Kafka connector.
> The job then enters restarting.
> Using the savepoint command alone succeeds.
> {code:java}
> 13:33:42.857 [Kafka Fetcher for Source: nlp-kafka-source -> nlp-clean 
> (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer 
> clientId=consumer-hjw-3, groupId=hjw] Kafka consumer has been closed
> 13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean 
> (1/1)#0] INFO org.apache.kafka.common.utils.AppInfoParser - App info 
> kafka.consumer for consumer-hjw-4 unregistered
> 13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean 
> (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer 
> clientId=consumer-hjw-4, groupId=hjw] Kafka consumer has been closed
> 13:33:42.860 [Source: nlp-kafka-source -> nlp-clean (1/1)#0] DEBUG 
> org.apache.flink.streaming.runtime.tasks.StreamTask - Cleanup StreamTask 
> (operators closed: false, cancelled: false)
> 13:33:42.860 [jobmanager-io-thread-4] INFO 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Decline 
> checkpoint 5 by task eeefcb27475446241861ad8db3f33144 of job 
> d6fed247feab1c0bcc1b0dcc2cfb4736 at 79edfa88-ccc3-4140-b0e1-7ce8a7f8669f @ 
> 127.0.0.1 (dataPort=-1).
> org.apache.flink.util.SerializedThrowable: Task name with subtask : Source: 
> nlp-kafka-source -> nlp-clean (1/1)#0 Failure reason: Task has failed.
>  at 
> org.apache.flink.runtime.taskmanager.Task.declineCheckpoint(Task.java:1388)
>  at 
> org.apache.flink.runtime.taskmanager.Task.lambda$triggerCheckpointBarrier$3(Task.java:1331)
>  at 
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
>  at 
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
>  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>  at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>  at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:343)
> Caused by: org.apache.flink.util.SerializedThrowable: 
> org.apache.flink.streaming.connectors.kafka.internals.Handover$ClosedException
>  at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>  at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>  at 
> java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
>  at 
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
>  ... 3 common frames omitted
> Caused by: org.apache.flink.util.SerializedThrowable: null
>  at 
> org.apache.flink.streaming.connectors.kafka.internals.Handover.close(Handover.java:177)
>  at 
> org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.cancel(KafkaFetcher.java:164)
>  at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.cancel(FlinkKafkaConsumerBase.java:945)
>  at 
> org.apache.flink.streaming.api.operators.StreamSource.stop(StreamSource.java:128)
>  at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.stopOperatorForStopWithSavepoint(SourceStreamTask.java:305)
>  at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.lambda$triggerStopWithSavepointAsync$1(SourceStreamTask.java:285)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93)
>  at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
>  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338)
>  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324)
>  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)

[jira] [Commented] (FLINK-28758) FlinkKafkaConsumer fails to stop with savepoint

2023-09-13 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764673#comment-17764673
 ] 

Piotr Nowojski commented on FLINK-28758:


I will try to take care of that

> FlinkKafkaConsumer fails to stop with savepoint 
> 
>
> Key: FLINK-28758
> URL: https://issues.apache.org/jira/browse/FLINK-28758
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.15.0, 1.17.0, 1.16.1
> Environment: Flink version: 1.15.0
> deploy mode: K8s application mode. A local mini cluster also has this
> problem.
> Kafka Connector: uses the Kafka SourceFunction, not the new API.
>Reporter: hjw
>Assignee: Piotr Nowojski
>Priority: Critical
> Attachments: image-2022-10-13-19-47-56-635.png
>
>
> I posted a stop-with-savepoint request to the Flink job through the REST API.
> An error happened while closing the Kafka connector.
> The job then enters restarting.
> Using the savepoint command alone succeeds.
> {code:java}
> 13:33:42.857 [Kafka Fetcher for Source: nlp-kafka-source -> nlp-clean 
> (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer 
> clientId=consumer-hjw-3, groupId=hjw] Kafka consumer has been closed
> 13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean 
> (1/1)#0] INFO org.apache.kafka.common.utils.AppInfoParser - App info 
> kafka.consumer for consumer-hjw-4 unregistered
> 13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean 
> (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer 
> clientId=consumer-hjw-4, groupId=hjw] Kafka consumer has been closed
> 13:33:42.860 [Source: nlp-kafka-source -> nlp-clean (1/1)#0] DEBUG 
> org.apache.flink.streaming.runtime.tasks.StreamTask - Cleanup StreamTask 
> (operators closed: false, cancelled: false)
> 13:33:42.860 [jobmanager-io-thread-4] INFO 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Decline 
> checkpoint 5 by task eeefcb27475446241861ad8db3f33144 of job 
> d6fed247feab1c0bcc1b0dcc2cfb4736 at 79edfa88-ccc3-4140-b0e1-7ce8a7f8669f @ 
> 127.0.0.1 (dataPort=-1).
> org.apache.flink.util.SerializedThrowable: Task name with subtask : Source: 
> nlp-kafka-source -> nlp-clean (1/1)#0 Failure reason: Task has failed.
>  at 
> org.apache.flink.runtime.taskmanager.Task.declineCheckpoint(Task.java:1388)
>  at 
> org.apache.flink.runtime.taskmanager.Task.lambda$triggerCheckpointBarrier$3(Task.java:1331)
>  at 
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
>  at 
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
>  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>  at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>  at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:343)
> Caused by: org.apache.flink.util.SerializedThrowable: 
> org.apache.flink.streaming.connectors.kafka.internals.Handover$ClosedException
>  at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>  at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>  at 
> java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
>  at 
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
>  ... 3 common frames omitted
> Caused by: org.apache.flink.util.SerializedThrowable: null
>  at 
> org.apache.flink.streaming.connectors.kafka.internals.Handover.close(Handover.java:177)
>  at 
> org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.cancel(KafkaFetcher.java:164)
>  at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.cancel(FlinkKafkaConsumerBase.java:945)
>  at 
> org.apache.flink.streaming.api.operators.StreamSource.stop(StreamSource.java:128)
>  at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.stopOperatorForStopWithSavepoint(SourceStreamTask.java:305)
>  at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.lambda$triggerStopWithSavepointAsync$1(SourceStreamTask.java:285)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93)
>  at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
>  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338)
>  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324)
>  at 
> 

[jira] [Assigned] (FLINK-33083) SupportsReadingMetadata is not applied when loading a CompiledPlan

2023-09-13 Thread Dawid Wysakowicz (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz reassigned FLINK-33083:


Assignee: Dawid Wysakowicz

> SupportsReadingMetadata is not applied when loading a CompiledPlan
> --
>
> Key: FLINK-33083
> URL: https://issues.apache.org/jira/browse/FLINK-33083
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.16.2, 1.17.1
>Reporter: Dawid Wysakowicz
>Assignee: Dawid Wysakowicz
>Priority: Major
>
> If a few conditions are met, we cannot apply the SupportsReadingMetadata
> interface:
> # the source overrides:
>  {code}
> @Override
> public boolean supportsMetadataProjection() {
> return false;
> }
>  {code}
> # source does not implement {{SupportsProjectionPushDown}}
> # table has metadata columns e.g.
> {code}
> CREATE TABLE src (
>   physical_name STRING,
>   physical_sum INT,
>   timestamp TIMESTAMP_LTZ(3) NOT NULL METADATA VIRTUAL
> )
> {code}
> # we query the table {{SELECT * FROM src}}
> It fails with:
> {code}
> Caused by: java.lang.IllegalArgumentException: Row arity: 1, but serializer 
> arity: 2
>   at 
> org.apache.flink.table.runtime.typeutils.RowDataSerializer.copy(RowDataSerializer.java:124)
> {code}
> The reason is {{SupportsReadingMetadataSpec}} is created only in the 
> {{PushProjectIntoTableSourceScanRule}}, but the rule is not applied when 1 & 2



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33083) SupportsReadingMetadata is not applied when loading a CompiledPlan

2023-09-13 Thread Yunhong Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764666#comment-17764666
 ] 

Yunhong Zheng commented on FLINK-33083:
---

It looks like this is a planner bug, and we don't have related tests to cover 
the situation where a connector implements SupportsReadingMetadata and 
supportsMetadataProjection returns false:
{code:java}
default boolean supportsMetadataProjection() {
return false;
}{code}
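
For context, a compilable sketch (hypothetical class, not from the report) of a 
source that meets conditions 1 and 2: it implements SupportsReadingMetadata, 
disables metadata projection, and deliberately does not implement 
SupportsProjectionPushDown. The runtime wiring is stubbed out because only the 
planner-facing shape matters here:

{code:java}
import java.util.Collections;
import java.util.List;
import java.util.Map;

import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.connector.ChangelogMode;
import org.apache.flink.table.connector.source.DynamicTableSource;
import org.apache.flink.table.connector.source.ScanTableSource;
import org.apache.flink.table.connector.source.abilities.SupportsReadingMetadata;
import org.apache.flink.table.types.DataType;

/** Hypothetical source hitting conditions 1 & 2: metadata projection disabled,
 * SupportsProjectionPushDown NOT implemented. */
public class NoMetadataProjectionSource implements ScanTableSource, SupportsReadingMetadata {

    @Override
    public Map<String, DataType> listReadableMetadata() {
        return Collections.singletonMap("timestamp", DataTypes.TIMESTAMP_LTZ(3).notNull());
    }

    @Override
    public void applyReadableMetadata(List<String> metadataKeys, DataType producedDataType) {
        // A real source would remember the keys and extend its produced row here.
    }

    @Override
    public boolean supportsMetadataProjection() {
        return false; // condition 1 from the description
    }

    @Override
    public ChangelogMode getChangelogMode() {
        return ChangelogMode.insertOnly();
    }

    @Override
    public ScanRuntimeProvider getScanRuntimeProvider(ScanContext runtimeProviderContext) {
        throw new UnsupportedOperationException("runtime wiring omitted in this sketch");
    }

    @Override
    public DynamicTableSource copy() {
        return new NoMetadataProjectionSource();
    }

    @Override
    public String asSummaryString() {
        return "NoMetadataProjectionSource";
    }
}
{code}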
 

> SupportsReadingMetadata is not applied when loading a CompiledPlan
> --
>
> Key: FLINK-33083
> URL: https://issues.apache.org/jira/browse/FLINK-33083
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.16.2, 1.17.1
>Reporter: Dawid Wysakowicz
>Priority: Major
>
> If a few conditions are met, we cannot apply the SupportsReadingMetadata
> interface:
> # the source overrides:
>  {code}
> @Override
> public boolean supportsMetadataProjection() {
> return false;
> }
>  {code}
> # source does not implement {{SupportsProjectionPushDown}}
> # table has metadata columns e.g.
> {code}
> CREATE TABLE src (
>   physical_name STRING,
>   physical_sum INT,
>   timestamp TIMESTAMP_LTZ(3) NOT NULL METADATA VIRTUAL
> )
> {code}
> # we query the table {{SELECT * FROM src}}
> It fails with:
> {code}
> Caused by: java.lang.IllegalArgumentException: Row arity: 1, but serializer 
> arity: 2
>   at 
> org.apache.flink.table.runtime.typeutils.RowDataSerializer.copy(RowDataSerializer.java:124)
> {code}
> The reason is {{SupportsReadingMetadataSpec}} is created only in the 
> {{PushProjectIntoTableSourceScanRule}}, but the rule is not applied when 1 & 2



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] flinkbot commented on pull request #23411: [FLINK-21949][table] Support ARRAY_AGG aggregate function

2023-09-13 Thread via GitHub


flinkbot commented on PR #23411:
URL: https://github.com/apache/flink/pull/23411#issuecomment-1717597767

   
   ## CI report:
   
   * 10081ad9bdba84b3dac22fb7a6137994cc79622b UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-21949) Support ARRAY_AGG aggregate function

2023-09-13 Thread Jiabao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764658#comment-17764658
 ] 

Jiabao Sun commented on FLINK-21949:


The pull request is ready for review now.

This implementation made some simplifications based on Calcite's 
SqlLibraryOperators.ARRAY_AGG.
{code:java}
// calcite
ARRAY_AGG([ ALL | DISTINCT ] value [ RESPECT NULLS | IGNORE NULLS ] [ ORDER BY 
orderItem [, orderItem ]* ] )
// flink
ARRAY_AGG([ ALL | DISTINCT ] expression)
{code}

The differences from Calcite are as follows:
# Null values are ignored.
# The order by expression within the function is not supported because the 
complete row record cannot be accessed within the function implementation.
# The function returns null when there are no input rows, but Calcite's 
definition returns an empty array. This behavior follows BigQuery and Postgres.

https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#array_agg
https://www.postgresql.org/docs/8.4/functions-aggregate.html
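
To make the semantics concrete, a small self-contained sketch (hypothetical 
data; assumes a build that already includes this ARRAY_AGG implementation):

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ArrayAggExample {
    public static void main(String[] args) {
        TableEnvironment env = TableEnvironment.create(EnvironmentSettings.inBatchMode());
        // NULLs are ignored, so group 'b' yields [3] rather than [3, NULL];
        // with no input rows the function would return NULL, not an empty array.
        env.executeSql(
                        "SELECT f0, ARRAY_AGG(f1) AS agg "
                                + "FROM (VALUES ('a', 1), ('a', 2), ('b', 3), ('b', CAST(NULL AS INT))) "
                                + "AS t(f0, f1) GROUP BY f0")
                .print();
    }
}
{code}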

> Support ARRAY_AGG aggregate function
> 
>
> Key: FLINK-21949
> URL: https://issues.apache.org/jira/browse/FLINK-21949
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Affects Versions: 1.12.0
>Reporter: Jiabao Sun
>Assignee: Jiabao Sun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Some NoSQL databases like MongoDB and Elasticsearch support nested data
> types.
> The CollectToArray function is similar to Collect, except that it returns 
> ARRAY instead of MULTISET.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-21949) Support ARRAY_AGG aggregate function

2023-09-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-21949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-21949:
---
Labels: pull-request-available  (was: )

> Support ARRAY_AGG aggregate function
> 
>
> Key: FLINK-21949
> URL: https://issues.apache.org/jira/browse/FLINK-21949
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Affects Versions: 1.12.0
>Reporter: Jiabao Sun
>Assignee: Jiabao Sun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Some NoSQL databases like MongoDB and Elasticsearch support nested data
> types.
> The CollectToArray function is similar to Collect, except that it returns 
> ARRAY instead of MULTISET.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] Jiabao-Sun opened a new pull request, #23411: [FLINK-21949][table] Support ARRAY_AGG aggregate function

2023-09-13 Thread via GitHub


Jiabao-Sun opened a new pull request, #23411:
URL: https://github.com/apache/flink/pull/23411

   
   
   ## What is the purpose of the change
   
   [FLINK-21949][table] Support ARRAY_AGG aggregate function
   
   Some NoSQL databases like MongoDB and Elasticsearch support nested data 
types.
   Aggregating multiple rows into ARRAY is a common requirement.
   
   ## Brief change log
   
   Introduce the built-in function `ARRAY_AGG([ ALL | DISTINCT ] expression)`, 
which aggregates the input rows into an array and returns NULL if there are no 
input rows. NULL values are ignored. Use DISTINCT to keep one instance of each 
distinct value.
   
   ```sql
   SELECT ARRAY_AGG(f1)
 FROM tmp
GROUP BY f0
   ```
   
   
![image](https://github.com/apache/flink/assets/27403841/4ba953d0-92bf-485f-afc2-3fc292fc81ce)
   
   Note that we have made some simplifications based on Calcite's 
`SqlLibraryOperators.ARRAY_AGG`.
   ```sql
   -- calcite
   ARRAY_AGG([ ALL | DISTINCT ] value [ RESPECT NULLS | IGNORE NULLS ] [ ORDER 
BY orderItem [, orderItem ]* ] )
   -- flink
   ARRAY_AGG([ ALL | DISTINCT ] expression)
   ```
   
   **The differences from Calcite are as follows:**
 1. **Null values are ignored.**
 2. **The order by expression within the function is not supported because 
the complete row record cannot be accessed within the function implementation.**
 3. **The function returns null when there's no input rows, but calcite 
definition returns an empty array. The behavior was referenced from BigQuery 
and Postgres.**
   
   - 
https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#array_agg
   - https://www.postgresql.org/docs/8.4/functions-aggregate.html
   
   
   ## Verifying this change
   ITCase and UnitCase are added.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes)
 - The serializers: (no )
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes)
 - If yes, how is the feature documented? (docs)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-32704) Supports spilling to disk when feedback channel memory buffer is full

2023-09-13 Thread Dong Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin closed FLINK-32704.

Resolution: Fixed

> Supports spilling to disk when feedback channel memory buffer is full
> -
>
> Key: FLINK-32704
> URL: https://issues.apache.org/jira/browse/FLINK-32704
> Project: Flink
>  Issue Type: Improvement
>  Components: Library / Machine Learning
>Reporter: Jiang Xin
>Assignee: Jiang Xin
>Priority: Major
>  Labels: pull-request-available
> Fix For: ml-2.4.0
>
>
> Currently, Flink ML Iteration caches feedback data in memory, which can cause
> OOM in some cases. We need to support spilling to disk when the feedback
> channel's memory buffer is full.
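
Not the actual Flink ML implementation — just a minimal, hypothetical sketch of 
the general pattern the ticket asks for: keep records in memory up to a fixed 
budget, append the overflow to a temp file, and replay everything on drain.

{code:java}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

/** Toy feedback buffer: in-memory up to a byte budget, then spills to disk. */
public class SpillableFeedbackBuffer {
    private final Queue<byte[]> memory = new ArrayDeque<>();
    private final long memoryBudgetBytes;
    private long memoryUsedBytes;
    private File spillFile;
    private DataOutputStream spillOut;

    public SpillableFeedbackBuffer(long memoryBudgetBytes) {
        this.memoryBudgetBytes = memoryBudgetBytes;
    }

    public void add(byte[] record) throws IOException {
        if (memoryUsedBytes + record.length <= memoryBudgetBytes) {
            memory.add(record);
            memoryUsedBytes += record.length;
            return;
        }
        // Memory budget exhausted: append length-prefixed records to a temp file.
        if (spillOut == null) {
            spillFile = File.createTempFile("feedback-spill", ".bin");
            spillOut = new DataOutputStream(new FileOutputStream(spillFile));
        }
        spillOut.writeInt(record.length);
        spillOut.write(record);
    }

    /** Replays in-memory records first, then spilled ones, then frees resources. */
    public void drain(Consumer<byte[]> consumer) throws IOException {
        for (byte[] record : memory) {
            consumer.accept(record);
        }
        memory.clear();
        memoryUsedBytes = 0;
        if (spillOut != null) {
            spillOut.close();
            try (DataInputStream in = new DataInputStream(new FileInputStream(spillFile))) {
                while (in.available() > 0) {
                    byte[] record = new byte[in.readInt()];
                    in.readFully(record);
                    consumer.accept(record);
                }
            }
            spillFile.delete();
            spillOut = null;
            spillFile = null;
        }
    }
}
{code}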



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] hejufang commented on pull request #23403: [FLINK-32863][runtime-web] Improve Flink UI's time precision from sec…

2023-09-13 Thread via GitHub


hejufang commented on PR #23403:
URL: https://github.com/apache/flink/pull/23403#issuecomment-1717554816

   @masteryhx Thank you for your suggestion. I have adjusted the precision of 
checkpoint-related times. Please review. cc @KarmaGYZ 
   
![img_v2_a30822c5-c560-4789-9226-ca3899483b7g](https://github.com/apache/flink/assets/28342990/33f7b5b4-b01d-4217-b19c-b7425b0568d1)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-kubernetes-operator] dongwoo6kim commented on pull request #671: [FLINK-33066] Support all k8s methods to configure env variable in operatorPod

2023-09-13 Thread via GitHub


dongwoo6kim commented on PR #671:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/671#issuecomment-1717548787

   Thanks @mbalassi. I have updated the docs


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-33053) Watcher leak in Zookeeper HA mode

2023-09-13 Thread Yangze Guo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764640#comment-17764640
 ] 

Yangze Guo commented on FLINK-33053:


JFYI, I'm still investigating the root cause of this, but I found that the 
issue is fixed if we add a safety net in ZooKeeperLeaderRetrievalDriver#close 
like this:

{code:java}
client.watchers()
    .removeAll()
    .ofType(Watcher.WatcherType.Any)
    .forPath(connectionInformationPath);
{code}

> Watcher leak in Zookeeper HA mode
> -
>
> Key: FLINK-33053
> URL: https://issues.apache.org/jira/browse/FLINK-33053
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.0, 1.18.0, 1.17.1
>Reporter: Yangze Guo
>Priority: Blocker
> Attachments: 26.dump.zip, 26.log, 
> taskmanager_flink-native-test-117-taskmanager-1-9_thread_dump (1).json
>
>
> We observed a watcher leak in our OLAP stress test when enabling ZooKeeper HA
> mode. TMs' watches on the JobMaster leader are not stopped after the job
> finishes.
> Here is how we re-produce this issue:
>  - Start a session cluster and enable Zookeeper HA mode.
>  - Continuously and concurrently submit short queries, e.g. WordCount to the 
> cluster.
>  - echo -n wchp | nc \{zk host} \{zk port} to get current watches.
> We can see a lot of watches on 
> /flink/\{cluster_name}/leader/\{job_id}/connection_info.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] flinkbot commented on pull request #23410: [FLINK-33085][table-planner] Improve the error message when the invalidated lookupTableSource without primary key is used as temporal join t

2023-09-13 Thread via GitHub


flinkbot commented on PR #23410:
URL: https://github.com/apache/flink/pull/23410#issuecomment-1717541375

   
   ## CI report:
   
   * a5f93f4153d0b8430d836a35a6f79b80caa76457 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-32704) Supports spilling to disk when feedback channel memory buffer is full

2023-09-13 Thread Dong Lin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764635#comment-17764635
 ] 

Dong Lin commented on FLINK-32704:
--

Merged to apache/flink-ml master branch 865404910caf53259df5cea1fc25ca29f96ae9bd

> Supports spilling to disk when feedback channel memory buffer is full
> -
>
> Key: FLINK-32704
> URL: https://issues.apache.org/jira/browse/FLINK-32704
> Project: Flink
>  Issue Type: Improvement
>  Components: Library / Machine Learning
>Reporter: Jiang Xin
>Assignee: Jiang Xin
>Priority: Major
>  Labels: pull-request-available
> Fix For: ml-2.4.0
>
>
> Currently, Flink ML Iteration caches feedback data in memory, which can cause
> OOM in some cases. We need to support spilling to disk when the feedback
> channel's memory buffer is full.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-32704) Supports spilling to disk when feedback channel memory buffer is full

2023-09-13 Thread Dong Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin reassigned FLINK-32704:


Assignee: Jiang Xin

> Supports spilling to disk when feedback channel memory buffer is full
> -
>
> Key: FLINK-32704
> URL: https://issues.apache.org/jira/browse/FLINK-32704
> Project: Flink
>  Issue Type: Improvement
>  Components: Library / Machine Learning
>Reporter: Jiang Xin
>Assignee: Jiang Xin
>Priority: Major
>  Labels: pull-request-available
> Fix For: ml-2.4.0
>
>
> Currently, Flink ML Iteration caches feedback data in memory, which can cause
> OOM in some cases. We need to support spilling to disk when the feedback
> channel's memory buffer is full.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-ml] lindong28 merged pull request #248: [FLINK-32704] Supports spilling to disk when feedback channel memory buffer is full

2023-09-13 Thread via GitHub


lindong28 merged PR #248:
URL: https://github.com/apache/flink-ml/pull/248


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-ml] lindong28 commented on pull request #248: [FLINK-32704] Supports spilling to disk when feedback channel memory buffer is full

2023-09-13 Thread via GitHub


lindong28 commented on PR #248:
URL: https://github.com/apache/flink-ml/pull/248#issuecomment-1717534003

   Thanks for the update! LGTM.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-33085) Improve the error message when the invalidate lookupTableSource without primary key is used as temporal join table

2023-09-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33085:
---
Labels: pull-request-available  (was: )

> Improve the error message when the invalidate lookupTableSource without 
> primary key is used as temporal join table
> --
>
> Key: FLINK-33085
> URL: https://issues.apache.org/jira/browse/FLINK-33085
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Planner
>Affects Versions: 1.19.0
>Reporter: Yunhong Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Improve the error message when an invalid lookupTableSource without a primary
> key is used as a temporal join table. This PR checks the legality of temporal
> table join syntax in the sqlToRel phase and makes the thrown error clearer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] swuferhong opened a new pull request, #23410: [FLINK-33085][table-planner] Improve the error message when the invalidated lookupTableSource without primary key is used as temporal joi

2023-09-13 Thread via GitHub


swuferhong opened a new pull request, #23410:
URL: https://github.com/apache/flink/pull/23410

   
   
   ## What is the purpose of the change
   
   Improve the error message when an invalid `lookupTableSource` without a 
primary key is used as a temporal join table. This PR checks the legality of 
temporal table join syntax in the `sqlToRel` phase and makes the thrown error 
clearer, as the example below shows.
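   
   For example — with hypothetical tables `orders` and `rates`, where `rates` 
is backed by a lookup source that has no primary key — a query like the 
following now fails in the `sqlToRel` phase with a clear message instead of a 
confusing planner error later:
   
   ```sql
   -- `rates` is a lookup table without a PRIMARY KEY, so this temporal join
   -- is rejected early with an explanatory error message.
   SELECT o.id, o.amount * r.rate
   FROM orders AS o
   JOIN rates FOR SYSTEM_TIME AS OF o.proc_time AS r
   ON o.currency = r.currency;
   ```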
   
   
   ## Brief change log
   
   - Add the check logic in `SqlToRelConverter`.
   - Add tests in `LookupJoinTest` and `TemporalJoinTest`.
   
   ## Verifying this change
   
   - Tests added in `LookupJoinTest` and `TemporalJoinTest`.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper:  no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? no docs
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-33011) Operator deletes HA data unexpectedly

2023-09-13 Thread Gyula Fora (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora closed FLINK-33011.
--
Fix Version/s: kubernetes-operator-1.7.0
   Resolution: Fixed

merged to main 82739f62adda33e686da7d8aa30cbd41ea13012f

> Operator deletes HA data unexpectedly
> -
>
> Key: FLINK-33011
> URL: https://issues.apache.org/jira/browse/FLINK-33011
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: 1.17.1, kubernetes-operator-1.6.0
> Environment: Flink: 1.17.1
> Flink Kubernetes Operator: 1.6.0
>Reporter: Ruibin Xing
>Assignee: Gyula Fora
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.7.0
>
> Attachments: flink_operator_logs_0831.csv
>
>
> We encountered a problem where the operator unexpectedly deleted HA data.
> The timeline is as follows:
> 12:08 We submitted the first spec, which suspended the job with savepoint 
> upgrade mode.
> 12:08 The job was suspended, while the HA data was preserved, and the log 
> showed the observed job deployment status was MISSING.
> 12:10 We submitted the second spec, which deployed the job with the last 
> state upgrade mode.
> 12:10 Logs showed the operator deleted both the Flink deployment and the HA 
> data again.
> 12:10 The job failed to start because the HA data was missing.
> According to the log, the deletion was triggered by 
> https://github.com/apache/flink-kubernetes-operator/blob/a728ba768e20236184e2b9e9e45163304b8b196c/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/ApplicationReconciler.java#L168
> I think this would only be triggered if the job deployment status wasn't 
> MISSING. But the log before the deletion showed the observed job status was 
> MISSING at that moment.
> Related logs:
>  
> {code:java}
> 2023-08-30 12:08:48.190 + o.a.f.k.o.s.AbstractFlinkService [INFO 
> ][default/pipeline-pipeline-se-3] Cluster shutdown completed.
> 2023-08-30 12:10:27.010 + o.a.f.k.o.o.d.ApplicationObserver [INFO 
> ][default/pipeline-pipeline-se-3] Observing JobManager deployment. Previous 
> status: MISSING
> 2023-08-30 12:10:27.533 + o.a.f.k.o.l.AuditUtils         [INFO 
> ][default/pipeline-pipeline-se-3] >>> Event  | Info    | SPECCHANGED     | 
> UPGRADE change(s) detected (Diff: FlinkDeploymentSpec[image : 
> docker-registry.randomcompany.com/octopus/pipeline-pipeline-online:0835137c-362
>  -> 
> docker-registry.randomcompany.com/octopus/pipeline-pipeline-online:23db7ae8-365,
>  podTemplate.metadata.labels.app.kubernetes.io~1version : 
> 0835137cd803b7258695eb53a6ec520cb62a48a7 -> 
> 23db7ae84bdab8d91fa527fe2f8f2fce292d0abc, job.state : suspended -> running, 
> job.upgradeMode : last-state -> savepoint, restartNonce : 1545 -> 1547]), 
> starting reconciliation.
> 2023-08-30 12:10:27.679 + o.a.f.k.o.s.NativeFlinkService [INFO 
> ][default/pipeline-pipeline-se-3] Deleting JobManager deployment and HA 
> metadata.
> {code}
> A more complete log file is attached. Thanks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33085) Improve the error message when the invalidate lookupTableSource without primary key is used as temporal join table

2023-09-13 Thread Yunhong Zheng (Jira)
Yunhong Zheng created FLINK-33085:
-

 Summary: Improve the error message when the invalidate 
lookupTableSource without primary key is used as temporal join table
 Key: FLINK-33085
 URL: https://issues.apache.org/jira/browse/FLINK-33085
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Affects Versions: 1.19.0
Reporter: Yunhong Zheng
 Fix For: 1.19.0


Improve the error message when an invalid lookupTableSource without a primary 
key is used as a temporal join table. This PR checks the legality of temporal 
table join syntax in the sqlToRel phase and makes the thrown error clearer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-21949) Support ARRAY_AGG aggregate function

2023-09-13 Thread Jiabao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-21949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiabao Sun updated FLINK-21949:
---
Summary: Support ARRAY_AGG aggregate function  (was: Support collect to 
array aggregate function)

> Support ARRAY_AGG aggregate function
> 
>
> Key: FLINK-21949
> URL: https://issues.apache.org/jira/browse/FLINK-21949
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Affects Versions: 1.12.0
>Reporter: Jiabao Sun
>Assignee: Jiabao Sun
>Priority: Minor
> Fix For: 1.19.0
>
>
> Some nosql databases like mongodb and elasticsearch support nested data types.
> Aggregating multiple rows into ARRAY is a common requirement.
> The CollectToArray function is similar to Collect, except that it returns 
> ARRAY instead of MULTISET.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

