[jira] [Commented] (FLINK-35039) Create Profiling JobManager/TaskManager Instance failed

2024-04-19 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838951#comment-17838951
 ] 

Yun Tang commented on FLINK-35039:
--

[~wczhu] Already assigned to you.

> Create Profiling JobManager/TaskManager Instance failed
> ---
>
> Key: FLINK-35039
> URL: https://issues.apache.org/jira/browse/FLINK-35039
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.19.0
> Environment: Hadoop 3.2.2
> Flink 1.19
>Reporter: ude
>Assignee: ude
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-04-08-10-21-31-066.png, 
> image-2024-04-08-10-21-48-417.png, image-2024-04-08-10-30-16-683.png
>
>
> I'm testing the "async-profiler" feature in version 1.19, but when I submit a 
> job in YARN per-job mode, I get an error when I click "Create Profiling 
> Instance" on the Flink Web UI.
> !image-2024-04-08-10-21-31-066.png!
> !image-2024-04-08-10-21-48-417.png!
> The error message indicates that the YARN proxy server does not support 
> *POST* calls. I checked the code of _*WebAppProxyServlet.java*_ and found 
> that the *POST* method is indeed not supported, so I changed the call to use 
> the *PUT* method and it succeeded.
> !image-2024-04-08-10-30-16-683.png!
>  
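The failure mode described above can be sketched with a minimal, self-contained example (plain JDK, not YARN or Flink code): a proxy-style HTTP endpoint that only accepts a whitelist of methods rejects the UI's POST request, while an equivalent PUT goes through. The `/profiler` path, the GET/PUT whitelist, and the 405 status are illustrative assumptions, not the actual WebAppProxyServlet behavior.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class Main {
    public static void main(String[] args) throws Exception {
        // Proxy-like server that only handles a whitelist of HTTP methods.
        HttpServer proxy = HttpServer.create(new InetSocketAddress(0), 0);
        proxy.createContext("/profiler", exchange -> {
            String method = exchange.getRequestMethod();
            // Hypothetical whitelist: GET and PUT pass, everything else is rejected.
            int status = (method.equals("GET") || method.equals("PUT")) ? 200 : 405;
            exchange.sendResponseHeaders(status, -1); // -1: no response body
            exchange.close();
        });
        proxy.start();

        HttpClient client = HttpClient.newHttpClient();
        URI uri = URI.create("http://127.0.0.1:" + proxy.getAddress().getPort() + "/profiler");
        HttpRequest post = HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.noBody()).build();
        HttpRequest put = HttpRequest.newBuilder(uri)
                .PUT(HttpRequest.BodyPublishers.noBody()).build();

        // POST is rejected by the method whitelist; PUT succeeds.
        int postStatus = client.send(post, HttpResponse.BodyHandlers.discarding()).statusCode();
        int putStatus = client.send(put, HttpResponse.BodyHandlers.discarding()).statusCode();
        System.out.println(postStatus + " " + putStatus);
        proxy.stop(0);
    }
}
```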



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35039) Create Profiling JobManager/TaskManager Instance failed

2024-04-19 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang reassigned FLINK-35039:


Assignee: ude

> Create Profiling JobManager/TaskManager Instance failed
> ---
>
> Key: FLINK-35039
> URL: https://issues.apache.org/jira/browse/FLINK-35039
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.19.0
> Environment: Hadoop 3.2.2
> Flink 1.19
>Reporter: ude
>Assignee: ude
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-04-08-10-21-31-066.png, 
> image-2024-04-08-10-21-48-417.png, image-2024-04-08-10-30-16-683.png
>
>
> I'm testing the "async-profiler" feature in version 1.19, but when I submit a 
> job in YARN per-job mode, I get an error when I click "Create Profiling 
> Instance" on the Flink Web UI.
> !image-2024-04-08-10-21-31-066.png!
> !image-2024-04-08-10-21-48-417.png!
> The error message indicates that the YARN proxy server does not support 
> *POST* calls. I checked the code of _*WebAppProxyServlet.java*_ and found 
> that the *POST* method is indeed not supported, so I changed the call to use 
> the *PUT* method and it succeeded.
> !image-2024-04-08-10-30-16-683.png!
>  





[jira] [Updated] (FLINK-35111) Modify the spelling mistakes in the taskmanager html

2024-04-19 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-35111:
-
Affects Version/s: (was: 1.19.0)

> Modify the spelling mistakes in the taskmanager html
> 
>
> Key: FLINK-35111
> URL: https://issues.apache.org/jira/browse/FLINK-35111
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Reporter: ude
>Assignee: Yun Tang
>Priority: Major
> Fix For: 1.19.0
>
>
> Fix the spelling error: change "profiler" to "profiling"





[jira] [Updated] (FLINK-35111) Modify the spelling mistakes in the taskmanager html

2024-04-19 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-35111:
-
Fix Version/s: 1.19.0
   (was: 1.20.0)

> Modify the spelling mistakes in the taskmanager html
> 
>
> Key: FLINK-35111
> URL: https://issues.apache.org/jira/browse/FLINK-35111
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Affects Versions: 1.19.0
>Reporter: ude
>Assignee: Yun Tang
>Priority: Major
> Fix For: 1.19.0
>
>
> Fix the spelling error: change "profiler" to "profiling"





[jira] [Commented] (FLINK-35039) Create Profiling JobManager/TaskManager Instance failed

2024-04-19 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838867#comment-17838867
 ] 

Yun Tang commented on FLINK-35039:
--

[~wczhu] Thanks for finding this problem in the YARN environment. I'm just 
curious why YARN does not support POST.

> Create Profiling JobManager/TaskManager Instance failed
> ---
>
> Key: FLINK-35039
> URL: https://issues.apache.org/jira/browse/FLINK-35039
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.19.0
> Environment: Hadoop 3.2.2
> Flink 1.19
>Reporter: ude
>Priority: Major
> Attachments: image-2024-04-08-10-21-31-066.png, 
> image-2024-04-08-10-21-48-417.png, image-2024-04-08-10-30-16-683.png
>
>
> I'm testing the "async-profiler" feature in version 1.19, but when I submit a 
> job in YARN per-job mode, I get an error when I click "Create Profiling 
> Instance" on the Flink Web UI.
> !image-2024-04-08-10-21-31-066.png!
> !image-2024-04-08-10-21-48-417.png!
> The error message indicates that the YARN proxy server does not support 
> *POST* calls. I checked the code of _*WebAppProxyServlet.java*_ and found 
> that the *POST* method is indeed not supported, so I changed the call to use 
> the *PUT* method and it succeeded.
> !image-2024-04-08-10-30-16-683.png!
>  





[jira] [Assigned] (FLINK-33934) Flink SQL Source use raw format maybe lead to data lost

2024-04-07 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang reassigned FLINK-33934:


Assignee: Yuan Kui

> Flink SQL Source use raw format maybe lead to data lost
> ---
>
> Key: FLINK-33934
> URL: https://issues.apache.org/jira/browse/FLINK-33934
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table 
> SQL / Runtime
>Affects Versions: 1.12.0, 1.13.0, 1.14.0, 1.15.0, 1.16.0, 1.17.0, 1.18.0, 
> 1.19.0
>Reporter: Cai Liuyang
>Assignee: Yuan Kui
>Priority: Major
>
> In our production environment we encountered a case that led to data loss. The 
> job setup:
>    1. A Flink SQL job reads data from a message queue (our internal MQ) and 
> writes to Hive (it only selects the value field, with no metadata field).
>    2. The format of the source table is the raw format.
>  
> However, if we select the value field and a metadata field at the same time, 
> the data loss does not appear.
>  
> After reviewing the code, we found that the cause is the object reuse in the 
> raw format (see 
> [RawFormatDeserializationSchema|https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/formats/raw/RawFormatDeserializationSchema.java#L62]).
> Why object reuse leads to this problem (taking Kafka as an example):
>     1. RawFormatDeserializationSchema is used in the fetcher thread of the 
> SourceOperator; the fetcher thread reads and deserializes data from a Kafka 
> partition, then puts the data into the ElementQueue (see [SourceOperator 
> FetcherTask|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/fetcher/FetchTask.java#L64]).
>     2. The SourceOperator's main thread pulls data from the ElementQueue 
> (which is shared with the fetcher thread) and processes it (see [SourceOperator 
> main thread|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/SourceReaderBase.java#L188]).
>     3. RawFormatDeserializationSchema's deserialize function returns the same 
> object on every call ([reused rowData 
> object|https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/formats/raw/RawFormatDeserializationSchema.java#L62]).
>     4. So, if the ElementQueue holds elements that have not yet been consumed, 
> the fetcher thread can overwrite the fields of the reused RowData that 
> RawFormatDeserializationSchema::deserialize returned, which leads to data 
> loss.
>  
> The reason selecting the value field and a metadata field together does not 
> lose data is:
>    if we select a metadata field, a new RowData object is returned (see 
> [DynamicKafkaDeserializationSchema deserialize with metadata 
> field|https://github.com/apache/flink-connector-kafka/blob/main/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/table/DynamicKafkaDeserializationSchema.java#L249]),
>  whereas if we only select the value field, the RowData object returned by the 
> format's DeserializationSchema is reused (see 
> [DynamicKafkaDeserializationSchema deserialize only with value 
> field|https://github.com/apache/flink-connector-kafka/blob/main/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/table/DynamicKafkaDeserializationSchema.java#L113]).
>  
> To solve this problem, I think we should remove the object reuse from 
> RawFormatDeserializationSchema.
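The four steps above can be reproduced with a deterministic single-threaded simulation (the `Row` and deserializer classes are simplified stand-ins, not Flink classes): once two deserialized results sit in a queue before any of them is consumed, a reusing deserializer has already overwritten the first one, while a copying deserializer preserves both.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class Main {
    // Stand-in for a mutable row holding a single value field.
    static final class Row {
        String value;
    }

    // Object-reusing deserializer: always returns the same Row instance,
    // mirroring the behavior described for RawFormatDeserializationSchema.
    static final class ReusingDeserializer {
        private final Row reused = new Row();
        Row deserialize(String record) {
            reused.value = record;
            return reused; // same object every call; later calls overwrite it
        }
    }

    // Copying deserializer: returns a fresh Row per record (the proposed fix).
    static final class CopyingDeserializer {
        Row deserialize(String record) {
            Row row = new Row();
            row.value = record;
            return row;
        }
    }

    public static void main(String[] args) {
        Queue<Row> queue = new ArrayDeque<>();

        // Fetcher deserializes two records and queues both before anything
        // is consumed (the ElementQueue scenario): record-1 is lost because
        // both queue entries reference the same reused object.
        ReusingDeserializer reusing = new ReusingDeserializer();
        queue.add(reusing.deserialize("record-1"));
        queue.add(reusing.deserialize("record-2"));
        System.out.println(queue.poll().value + " " + queue.poll().value);

        // With one object per record, both values survive.
        CopyingDeserializer copying = new CopyingDeserializer();
        queue.add(copying.deserialize("record-1"));
        queue.add(copying.deserialize("record-2"));
        System.out.println(queue.poll().value + " " + queue.poll().value);
    }
}
```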





[jira] [Updated] (FLINK-33934) Flink SQL Source use raw format maybe lead to data lost

2024-04-07 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-33934:
-
Affects Version/s: 1.19.0
   1.18.0
   1.17.0
   1.16.0
   1.15.0
   1.14.0
   1.13.0
   1.12.0

> Flink SQL Source use raw format maybe lead to data lost
> ---
>
> Key: FLINK-33934
> URL: https://issues.apache.org/jira/browse/FLINK-33934
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table 
> SQL / Runtime
>Affects Versions: 1.12.0, 1.13.0, 1.14.0, 1.15.0, 1.16.0, 1.17.0, 1.18.0, 
> 1.19.0
>Reporter: Cai Liuyang
>Priority: Major
>
> In our production environment we encountered a case that led to data loss. The 
> job setup:
>    1. A Flink SQL job reads data from a message queue (our internal MQ) and 
> writes to Hive (it only selects the value field, with no metadata field).
>    2. The format of the source table is the raw format.
>  
> However, if we select the value field and a metadata field at the same time, 
> the data loss does not appear.
>  
> After reviewing the code, we found that the cause is the object reuse in the 
> raw format (see 
> [RawFormatDeserializationSchema|https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/formats/raw/RawFormatDeserializationSchema.java#L62]).
> Why object reuse leads to this problem (taking Kafka as an example):
>     1. RawFormatDeserializationSchema is used in the fetcher thread of the 
> SourceOperator; the fetcher thread reads and deserializes data from a Kafka 
> partition, then puts the data into the ElementQueue (see [SourceOperator 
> FetcherTask|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/fetcher/FetchTask.java#L64]).
>     2. The SourceOperator's main thread pulls data from the ElementQueue 
> (which is shared with the fetcher thread) and processes it (see [SourceOperator 
> main thread|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/SourceReaderBase.java#L188]).
>     3. RawFormatDeserializationSchema's deserialize function returns the same 
> object on every call ([reused rowData 
> object|https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/formats/raw/RawFormatDeserializationSchema.java#L62]).
>     4. So, if the ElementQueue holds elements that have not yet been consumed, 
> the fetcher thread can overwrite the fields of the reused RowData that 
> RawFormatDeserializationSchema::deserialize returned, which leads to data 
> loss.
>  
> The reason selecting the value field and a metadata field together does not 
> lose data is:
>    if we select a metadata field, a new RowData object is returned (see 
> [DynamicKafkaDeserializationSchema deserialize with metadata 
> field|https://github.com/apache/flink-connector-kafka/blob/main/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/table/DynamicKafkaDeserializationSchema.java#L249]),
>  whereas if we only select the value field, the RowData object returned by the 
> format's DeserializationSchema is reused (see 
> [DynamicKafkaDeserializationSchema deserialize only with value 
> field|https://github.com/apache/flink-connector-kafka/blob/main/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/table/DynamicKafkaDeserializationSchema.java#L113]).
>  
> To solve this problem, I think we should remove the object reuse from 
> RawFormatDeserializationSchema.





[jira] [Commented] (FLINK-33934) Flink SQL Source use raw format maybe lead to data lost

2024-04-07 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834765#comment-17834765
 ] 

Yun Tang commented on FLINK-33934:
--

I think it's easy to lose data when using the {{raw}} format with another 
customized connector. Moreover, the object-reuse semantics are hidden behind the 
{{raw}} format description. From my point of view, this is a potential bug, cc 
[~jark].

> Flink SQL Source use raw format maybe lead to data lost
> ---
>
> Key: FLINK-33934
> URL: https://issues.apache.org/jira/browse/FLINK-33934
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table 
> SQL / Runtime
>Reporter: Cai Liuyang
>Priority: Major
>
> In our production environment we encountered a case that led to data loss. The 
> job setup:
>    1. A Flink SQL job reads data from a message queue (our internal MQ) and 
> writes to Hive (it only selects the value field, with no metadata field).
>    2. The format of the source table is the raw format.
>  
> However, if we select the value field and a metadata field at the same time, 
> the data loss does not appear.
>  
> After reviewing the code, we found that the cause is the object reuse in the 
> raw format (see 
> [RawFormatDeserializationSchema|https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/formats/raw/RawFormatDeserializationSchema.java#L62]).
> Why object reuse leads to this problem (taking Kafka as an example):
>     1. RawFormatDeserializationSchema is used in the fetcher thread of the 
> SourceOperator; the fetcher thread reads and deserializes data from a Kafka 
> partition, then puts the data into the ElementQueue (see [SourceOperator 
> FetcherTask|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/fetcher/FetchTask.java#L64]).
>     2. The SourceOperator's main thread pulls data from the ElementQueue 
> (which is shared with the fetcher thread) and processes it (see [SourceOperator 
> main thread|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/SourceReaderBase.java#L188]).
>     3. RawFormatDeserializationSchema's deserialize function returns the same 
> object on every call ([reused rowData 
> object|https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/formats/raw/RawFormatDeserializationSchema.java#L62]).
>     4. So, if the ElementQueue holds elements that have not yet been consumed, 
> the fetcher thread can overwrite the fields of the reused RowData that 
> RawFormatDeserializationSchema::deserialize returned, which leads to data 
> loss.
>  
> The reason selecting the value field and a metadata field together does not 
> lose data is:
>    if we select a metadata field, a new RowData object is returned (see 
> [DynamicKafkaDeserializationSchema deserialize with metadata 
> field|https://github.com/apache/flink-connector-kafka/blob/main/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/table/DynamicKafkaDeserializationSchema.java#L249]),
>  whereas if we only select the value field, the RowData object returned by the 
> format's DeserializationSchema is reused (see 
> [DynamicKafkaDeserializationSchema deserialize only with value 
> field|https://github.com/apache/flink-connector-kafka/blob/main/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/table/DynamicKafkaDeserializationSchema.java#L113]).
>  
> To solve this problem, I think we should remove the object reuse from 
> RawFormatDeserializationSchema.





[jira] [Closed] (FLINK-34967) Update Website copyright footer to 2024

2024-04-01 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang closed FLINK-34967.

Resolution: Fixed

> Update Website copyright footer to 2024
> ---
>
> Key: FLINK-34967
> URL: https://issues.apache.org/jira/browse/FLINK-34967
> Project: Flink
>  Issue Type: Improvement
>  Components: Project Website
>Reporter: Yun Tang
>Assignee: Yu Chen
>Priority: Major
>
> It's already 2024; we should update the copyright footer.





[jira] [Resolved] (FLINK-34968) Update flink-web copyright to 2024

2024-04-01 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang resolved FLINK-34968.
--
  Assignee: Yu Chen
Resolution: Fixed

merged in asf-site: 3cda9946df98f0d2abda396ac87f7bc192ec70e0

> Update flink-web copyright to 2024
> --
>
> Key: FLINK-34968
> URL: https://issues.apache.org/jira/browse/FLINK-34968
> Project: Flink
>  Issue Type: Improvement
>  Components: Project Website
>Reporter: Yu Chen
>Assignee: Yu Chen
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (FLINK-34976) LD_PRELOAD environment may not be effective after su to flink user

2024-04-01 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832742#comment-17832742
 ] 

Yun Tang commented on FLINK-34976:
--

I don't think the running Flink process would drop the {{LD_PRELOAD}} 
environment variable; please use `pmap` to check whether {{jemalloc.so}} is 
actually loaded by your process.

> LD_PRELOAD environment may not be effective after su to flink user
> --
>
> Key: FLINK-34976
> URL: https://issues.apache.org/jira/browse/FLINK-34976
> Project: Flink
>  Issue Type: New Feature
>  Components: flink-docker
>Affects Versions: 1.19.0
>Reporter: xiaogang zhou
>Priority: Major
>
> I am not sure if LD_PRELOAD still takes effect after drop_privs_cmd. Should 
> we create a .bashrc file in the home directory of the flink user and export 
> LD_PRELOAD for the flink user?
>  
> [https://github.com/apache/flink-docker/blob/627987997ca7ec86bcc3d80b26df58aa595b91af/1.17/scala_2.12-java11-ubuntu/docker-entrypoint.sh#L92]
>  





[jira] [Created] (FLINK-34967) Update Website copyright footer to 2024

2024-03-30 Thread Yun Tang (Jira)
Yun Tang created FLINK-34967:


 Summary: Update Website copyright footer to 2024
 Key: FLINK-34967
 URL: https://issues.apache.org/jira/browse/FLINK-34967
 Project: Flink
  Issue Type: Improvement
  Components: Project Website
Reporter: Yun Tang
Assignee: Yu Chen


It's already 2024; we should update the copyright footer.





[jira] [Updated] (FLINK-32299) Upload python jar when sql contains python udf jar

2024-03-29 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-32299:
-
Fix Version/s: 1.19.0

> Upload python jar when sql contains python udf jar
> --
>
> Key: FLINK-32299
> URL: https://issues.apache.org/jira/browse/FLINK-32299
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Gateway, Table SQL / Runtime
>Reporter: Shengkai Fang
>Assignee: Yangze Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, the SQL gateway always uploads the Python jar when submitting jobs. 
> However, it's not required for every SQL job. We should add the Python jar 
> to PipelineOptions.JARS only when the user's job contains a Python UDF.





[jira] [Resolved] (FLINK-34617) Correct the Javadoc of org.apache.flink.api.common.time.Time

2024-03-08 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang resolved FLINK-34617.
--
Fix Version/s: 1.20.0
   1.19.1
   Resolution: Fixed

merged
master: 9617598de33b2b23b97ddb84887392659070c344
release-1.19: c6d96b7f7c07faad363779a5175d5772140891a5

> Correct the Javadoc of org.apache.flink.api.common.time.Time
> 
>
> Key: FLINK-34617
> URL: https://issues.apache.org/jira/browse/FLINK-34617
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: Yun Tang
>Assignee: Yun Tang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0, 1.19.1
>
>
> The current Javadoc of {{org.apache.flink.api.common.time.Time}} says it will 
> fully replace {{org.apache.flink.streaming.api.windowing.time.Time}} in Flink 
> 2.0. However, the {{Time}} class has been deprecated, so we should remove 
> that description.





[jira] [Resolved] (FLINK-34622) Typo of execution_mode configuration name in Chinese document

2024-03-08 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang resolved FLINK-34622.
--
Fix Version/s: 1.18.2
   1.20.0
   1.19.1
 Assignee: Yu Chen
   Resolution: Fixed

merged
master: 17487c0c944c3925b89b26eadf38169da35410f7
release-1.19: 0a85a08303ced5715437ead15ce60203c70aa58d
release-1.18: ff256ef85f4edf4e86ce2bd73e4bdef8b7e07fbb

> Typo of execution_mode configuration name in Chinese document
> -
>
> Key: FLINK-34622
> URL: https://issues.apache.org/jira/browse/FLINK-34622
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Yu Chen
>Assignee: Yu Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.2, 1.20.0, 1.19.1
>
> Attachments: image-2024-03-08-14-46-34-859.png
>
>
> !image-2024-03-08-14-46-34-859.png|width=794,height=380!





[jira] [Created] (FLINK-34617) Correct the Javadoc of org.apache.flink.api.common.time.Time

2024-03-07 Thread Yun Tang (Jira)
Yun Tang created FLINK-34617:


 Summary: Correct the Javadoc of 
org.apache.flink.api.common.time.Time
 Key: FLINK-34617
 URL: https://issues.apache.org/jira/browse/FLINK-34617
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.19.0
Reporter: Yun Tang
Assignee: Yun Tang


The current Javadoc of {{org.apache.flink.api.common.time.Time}} says it will 
fully replace {{org.apache.flink.streaming.api.windowing.time.Time}} in Flink 
2.0. However, the {{Time}} class has been deprecated, so we should remove that 
description.





[jira] [Commented] (FLINK-34522) StateTtlConfig#cleanupInRocksdbCompactFilter still uses the deprecated Time class

2024-03-01 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822754#comment-17822754
 ] 

Yun Tang commented on FLINK-34522:
--

merged in release-1.19: 161defe0bb2dc8136133e07699b6ac433d52dc65 ... 
7618bdeeab06c09219136a04a62262148c677134

> StateTtlConfig#cleanupInRocksdbCompactFilter still uses the deprecated Time 
> class
> -
>
> Key: FLINK-34522
> URL: https://issues.apache.org/jira/browse/FLINK-34522
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Core
>Affects Versions: 1.19.0
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.19.0, 1.20.0
>
>
> FLINK-32570 deprecated the Time class and refactored all Public or 
> PublicEvolving APIs to use Java's Duration.
> StateTtlConfig.Builder#cleanupInRocksdbCompactFilter still uses the Time 
> class. In general, we would expect to:
>  * Mark {{cleanupInRocksdbCompactFilter(long, Time)}} as {{@Deprecated}}
>  * Provide a new {{cleanupInRocksdbCompactFilter(long, Duration)}}
> Note: this is exactly what FLINK-32570 did, so I guess FLINK-32570 missed 
> cleanupInRocksdbCompactFilter.
> However, I found that this method was introduced in 1.19 (FLINK-30854), so a 
> better solution may be to only provide cleanupInRocksdbCompactFilter(long, 
> Duration) and not use Time at all.
> A deprecated API should be kept for 2 minor versions. IIUC, we cannot remove 
> the Time-related classes in Flink 2.0 if we don't deprecate them in 1.19. If 
> so, I think it's better to merge this JIRA into 1.19.0 as well.
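The deprecation pattern proposed above can be sketched as follows. `Time` and `TtlConfigBuilder` here are simplified stand-ins (not Flink's actual classes): the old Time-based overload stays as a deprecated wrapper that delegates to the new Duration-based overload, so both produce the same configuration.

```java
import java.time.Duration;

public class Main {
    // Stand-in for the deprecated Time class: a thin wrapper over milliseconds.
    static final class Time {
        private final long millis;
        Time(long millis) { this.millis = millis; }
        long toMillis() { return millis; }
    }

    // Stand-in builder illustrating the two overloads being discussed.
    static final class TtlConfigBuilder {
        long queryTimeAfterNumEntries;
        Duration periodicCompactionTime;

        /** @deprecated use {@link #cleanupInRocksdbCompactFilter(long, Duration)} */
        @Deprecated
        TtlConfigBuilder cleanupInRocksdbCompactFilter(long entries, Time time) {
            // Delegate to the Duration-based overload so behavior stays identical.
            return cleanupInRocksdbCompactFilter(entries, Duration.ofMillis(time.toMillis()));
        }

        TtlConfigBuilder cleanupInRocksdbCompactFilter(long entries, Duration period) {
            this.queryTimeAfterNumEntries = entries;
            this.periodicCompactionTime = period;
            return this;
        }
    }

    public static void main(String[] args) {
        TtlConfigBuilder viaTime = new TtlConfigBuilder()
                .cleanupInRocksdbCompactFilter(1000, new Time(30_000));
        TtlConfigBuilder viaDuration = new TtlConfigBuilder()
                .cleanupInRocksdbCompactFilter(1000, Duration.ofSeconds(30));
        // Both paths configure the same compaction period.
        System.out.println(viaTime.periodicCompactionTime.equals(viaDuration.periodicCompactionTime));
    }
}
```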





[jira] [Resolved] (FLINK-33436) Documentation on the built-in Profiler

2024-03-01 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang resolved FLINK-33436.
--
Fix Version/s: 1.19.0
   1.20.0
   Resolution: Fixed

master: 5f06ce765256b375945b9e69db2f16123b53f194
release-1.19: 12ea64c0e2a56da3c5f6a656b23a2f2ac54f19d5

> Documentation on the built-in Profiler
> --
>
> Key: FLINK-33436
> URL: https://issues.apache.org/jira/browse/FLINK-33436
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: Yu Chen
>Assignee: Yu Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0, 1.20.0
>
>






[jira] [Resolved] (FLINK-34500) Release Testing: Verify FLINK-33261 Support Setting Parallelism for Table/SQL Sources

2024-02-27 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang resolved FLINK-34500.
--
Resolution: Fixed

> Release Testing: Verify FLINK-33261 Support Setting Parallelism for Table/SQL 
> Sources
> -
>
> Key: FLINK-34500
> URL: https://issues.apache.org/jira/browse/FLINK-34500
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Parent, Table SQL / API
>Affects Versions: 1.19.0
>Reporter: SuDewei
>Assignee: Yun Tang
>Priority: Blocker
> Fix For: 1.19.0
>
>
> This issue aims to verify 
> [FLIP-367|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429150].
> Volunteers can verify it by following the [doc 
> changes|https://github.com/apache/flink/pull/24234]. Since currently only the 
> pre-defined DataGen connector and user-defined connectors support setting 
> source parallelism, volunteers can verify it through the DataGen connector.
> The basic steps are:
> 1. Start a Flink cluster and submit a Flink SQL job to the cluster.
> 2. In this Flink job, use the DataGen SQL connector to generate data.
> 3. Set the scan.parallelism option in the DataGen connector options to a 
> user-defined parallelism instead of the default parallelism.
> 4. Observe whether the parallelism of the source has changed on the job graph 
> in the Flink web UI, and whether the shuffle mode is correct.
> If everything is normal, you will see that the parallelism of the source 
> operator differs from that of the downstream operators, and that the shuffle 
> mode is rebalance by default.
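The steps above can be sketched as a minimal Flink SQL script (table names are hypothetical; `datagen` and `blackhole` are the built-in connectors, and `scan.parallelism` is the option under test):

```sql
-- DataGen source with an explicit source parallelism of 3,
-- while the job default parallelism stays at 1.
SET 'parallelism.default' = '1';

CREATE TABLE orders_gen (
  order_id BIGINT,
  amount   DOUBLE
) WITH (
  'connector' = 'datagen',
  'scan.parallelism' = '3'
);

CREATE TABLE sink (
  order_id BIGINT,
  amount   DOUBLE
) WITH (
  'connector' = 'blackhole'
);

-- The source operator should show parallelism 3 in the web UI,
-- followed by a rebalance shuffle into the parallelism-1 sink.
INSERT INTO sink SELECT * FROM orders_gen;
```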





[jira] [Commented] (FLINK-34500) Release Testing: Verify FLINK-33261 Support Setting Parallelism for Table/SQL Sources

2024-02-27 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821299#comment-17821299
 ] 

Yun Tang commented on FLINK-34500:
--

I started a local standalone cluster and submitted the SQL queries via the 
sql-client.
For a DataGen source whose parallelism differs from `parallelism.default`, 
whether larger or smaller, the source operator runs with exactly the 
parallelism configured via `scan.parallelism`.
If the `scan.parallelism` option is not set, the source parallelism is the same 
as the `parallelism.default` option.

> Release Testing: Verify FLINK-33261 Support Setting Parallelism for Table/SQL 
> Sources
> -
>
> Key: FLINK-34500
> URL: https://issues.apache.org/jira/browse/FLINK-34500
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Parent, Table SQL / API
>Affects Versions: 1.19.0
>Reporter: SuDewei
>Assignee: Yun Tang
>Priority: Blocker
> Fix For: 1.19.0
>
>
> This issue aims to verify 
> [FLIP-367|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429150].
> Volunteers can verify it by following the [doc 
> changes|https://github.com/apache/flink/pull/24234]. Since currently only the 
> pre-defined DataGen connector and user-defined connectors support setting 
> source parallelism, volunteers can verify it through the DataGen connector.
> The basic steps are:
> 1. Start a Flink cluster and submit a Flink SQL job to the cluster.
> 2. In this Flink job, use the DataGen SQL connector to generate data.
> 3. Set the scan.parallelism option in the DataGen connector options to a 
> user-defined parallelism instead of the default parallelism.
> 4. Observe whether the parallelism of the source has changed on the job graph 
> in the Flink web UI, and whether the shuffle mode is correct.
> If everything is normal, you will see that the parallelism of the source 
> operator differs from that of the downstream operators, and that the shuffle 
> mode is rebalance by default.





[jira] [Assigned] (FLINK-34500) Release Testing: Verify FLINK-33261 Support Setting Parallelism for Table/SQL Sources

2024-02-22 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang reassigned FLINK-34500:


Assignee: Yun Tang

> Release Testing: Verify FLINK-33261 Support Setting Parallelism for Table/SQL 
> Sources
> -
>
> Key: FLINK-34500
> URL: https://issues.apache.org/jira/browse/FLINK-34500
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Parent, Table SQL / API
>Affects Versions: 1.19.0
>Reporter: SuDewei
>Assignee: Yun Tang
>Priority: Blocker
> Fix For: 1.19.0
>
>
> This issue aims to verify 
> [FLIP-367|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429150].
> Volunteers can verify it by following the [doc 
> changes|https://github.com/apache/flink/pull/24234]. Since currently only the 
> pre-defined DataGen connector and user-defined connectors support setting 
> source parallelism, volunteers can verify it through the DataGen connector.
> The basic steps are:
> 1. Start a Flink cluster and submit a Flink SQL job to the cluster.
> 2. In this Flink job, use the DataGen SQL connector to generate data.
> 3. Set the scan.parallelism parameter in the DataGen connector options to a 
> user-defined parallelism instead of the default parallelism.
> 4. Observe on the job graph in the Flink web UI whether the parallelism of 
> the source has changed, and whether the shuffle mode is correct.
> If everything is normal, you will see that the parallelism of the source 
> operator indeed differs from that of the downstream operators, and the 
> shuffle mode is rebalance by default.
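The steps above can be sketched as a minimal Flink SQL script. This is an illustrative sketch only: the table and field names are made up, and `scan.parallelism` is the option under test per FLIP-367.

```sql
-- DataGen source with a user-defined source parallelism of 2.
CREATE TABLE orders (
    order_id BIGINT,
    price    DOUBLE
) WITH (
    'connector' = 'datagen',
    'rows-per-second' = '10',
    'scan.parallelism' = '2'  -- differs from the job-level default parallelism
);

-- Blackhole sink running at the default (job-level) parallelism.
CREATE TABLE blackhole_sink (
    order_id BIGINT,
    price    DOUBLE
) WITH (
    'connector' = 'blackhole'
);

INSERT INTO blackhole_sink SELECT order_id, price FROM orders;
```

With, e.g., a default parallelism of 4, the source vertex should show parallelism 2 in the web UI, connected to the downstream vertex by a rebalance shuffle.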





[jira] [Resolved] (FLINK-34388) Release Testing: Verify FLINK-28915 Support artifact fetching in Standalone and native K8s application mode

2024-02-22 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang resolved FLINK-34388.
--
Resolution: Fixed

Thanks for [~Yu Chen]'s work!
I'll mark this ticket as resolved first, and we can continue discussing the 
remaining question, [~ferenc-csaky].

> Release Testing: Verify FLINK-28915 Support artifact fetching in Standalone 
> and native K8s application mode
> ---
>
> Key: FLINK-34388
> URL: https://issues.apache.org/jira/browse/FLINK-34388
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Metrics
>Affects Versions: 1.19.0
>Reporter: Ferenc Csaky
>Assignee: Yu Chen
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>
> This ticket covers testing FLINK-28915. More details and the added docs are 
> accessible on the [PR|https://github.com/apache/flink/pull/24065]
> Test 1: Pass {{local://}} job jar in standalone mode, check the artifacts are 
> not actually copied.
> Test 2: Pass multiple artifacts in standalone mode.
> Test 3: Pass a non-local job jar in native k8s mode. [1]
> Test 4: Pass additional remote artifacts in native k8s mode.
> Available config options: 
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#artifact-fetching
> [1] Custom docker image build instructions: 
> https://github.com/apache/flink-docker/tree/dev-master
> Note: The docker build instructions also contain a web server example that 
> can be used to serve HTTP artifacts.
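As a rough sketch of Tests 3 and 4 (the cluster ids, image name, and all artifact URLs below are placeholders; the `user.artifacts.artifact-list` option is from the configuration page linked above):

```shell
# Test 3: pass a non-local job jar in native K8s application mode.
./bin/flink run-application \
    --target kubernetes-application \
    -Dkubernetes.cluster-id=artifact-test \
    -Dkubernetes.container.image.ref=my-registry/flink:1.19 \
    s3://my-bucket/jars/my-job.jar

# Test 4: fetch additional remote artifacts next to the job jar.
./bin/flink run-application \
    --target kubernetes-application \
    -Dkubernetes.cluster-id=artifact-test-2 \
    -Dkubernetes.container.image.ref=my-registry/flink:1.19 \
    -Duser.artifacts.artifact-list='s3://my-bucket/udfs/udf.jar;https://example.com/extra.jar' \
    s3://my-bucket/jars/my-job.jar
```

After submission, the JobManager pod logs should show the artifacts being fetched before the job starts; for the `local://` standalone case (Test 1), no copy should be logged at all.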





[jira] [Resolved] (FLINK-34390) Release Testing: Verify FLINK-33325 Built-in cross-platform powerful java profiler

2024-02-17 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang resolved FLINK-34390.
--
Resolution: Fixed

> Release Testing: Verify FLINK-33325 Built-in cross-platform powerful java 
> profiler
> --
>
> Key: FLINK-34390
> URL: https://issues.apache.org/jira/browse/FLINK-34390
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Web Frontend
>Affects Versions: 1.19.0
>Reporter: Yun Tang
>Assignee: junzhong qin
>Priority: Major
>  Labels: release-testing
> Fix For: 1.19.0
>
> Attachments: image-2024-02-08-10-43-27-679.png, 
> image-2024-02-08-10-44-55-401.png, image-2024-02-08-10-45-13-951.png, 
> image-2024-02-08-10-45-31-564.png
>
>
> See https://issues.apache.org/jira/browse/FLINK-34310





[jira] [Commented] (FLINK-34390) Release Testing: Verify FLINK-33325 Built-in cross-platform powerful java profiler

2024-02-17 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818228#comment-17818228
 ] 

Yun Tang commented on FLINK-34390:
--

[~easonqin] Thanks for the testing!

> Release Testing: Verify FLINK-33325 Built-in cross-platform powerful java 
> profiler
> --
>
> Key: FLINK-34390
> URL: https://issues.apache.org/jira/browse/FLINK-34390
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Web Frontend
>Affects Versions: 1.19.0
>Reporter: Yun Tang
>Assignee: junzhong qin
>Priority: Major
>  Labels: release-testing
> Fix For: 1.19.0
>
> Attachments: image-2024-02-08-10-43-27-679.png, 
> image-2024-02-08-10-44-55-401.png, image-2024-02-08-10-45-13-951.png, 
> image-2024-02-08-10-45-31-564.png
>
>
> See https://issues.apache.org/jira/browse/FLINK-34310





[jira] [Assigned] (FLINK-34388) Release Testing: Verify FLINK-28915 Support artifact fetching in Standalone and native K8s application mode

2024-02-07 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang reassigned FLINK-34388:


Assignee: Yu Chen

> Release Testing: Verify FLINK-28915 Support artifact fetching in Standalone 
> and native K8s application mode
> ---
>
> Key: FLINK-34388
> URL: https://issues.apache.org/jira/browse/FLINK-34388
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Metrics
>Affects Versions: 1.19.0
>Reporter: Ferenc Csaky
>Assignee: Yu Chen
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>
> This ticket covers testing FLINK-28915. More details and the added docs are 
> accessible on the [PR|https://github.com/apache/flink/pull/24065]
> Test 1: Pass {{local://}} job jar in standalone mode, check the artifacts are 
> not actually copied.
> Test 2: Pass multiple artifacts in standalone mode.
> Test 3: Pass a non-local job jar in native k8s mode. [1]
> Test 4: Pass additional remote artifacts in native k8s mode.
> Available config options: 
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#artifact-fetching
> [1] Custom docker image build instructions: 
> https://github.com/apache/flink-docker/tree/dev-master
> Note: The docker build instructions also contain a web server example that 
> can be used to serve HTTP artifacts.





[jira] [Updated] (FLINK-33644) FLIP-393: Make QueryOperations SQL serializable

2024-02-07 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-33644:
-
Description: 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-393%3A+Make+QueryOperations+SQL+serializable
  (was: https://cwiki.apache.org/confluence/x/4guZE)

> FLIP-393: Make QueryOperations SQL serializable
> ---
>
> Key: FLINK-33644
> URL: https://issues.apache.org/jira/browse/FLINK-33644
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / API
>Reporter: Dawid Wysakowicz
>Assignee: Dawid Wysakowicz
>Priority: Major
> Fix For: 1.19.0
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-393%3A+Make+QueryOperations+SQL+serializable





[jira] [Commented] (FLINK-34355) Release Testing: Verify FLINK-34054 Support named parameters for functions and procedures

2024-02-07 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815192#comment-17815192
 ] 

Yun Tang commented on FLINK-34355:
--

[~xu_shuai_] Already assigned to you, please go ahead.

> Release Testing: Verify FLINK-34054 Support named parameters for functions 
> and procedures
> -
>
> Key: FLINK-34355
> URL: https://issues.apache.org/jira/browse/FLINK-34355
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Shuai Xu
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>
> Test suggestion:
> 1. Implement a test UDF or procedure that supports named parameters.
> 2. When calling the function or procedure, use named parameters and verify 
> that the results are as expected.
> You can test the following scenarios:
> 1. Normal usage of named parameters, fully specifying each parameter.
> 2. Omitting optional parameters.
> 3. Omitting required parameters to confirm that an error is reported.
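The scenarios above can be sketched in Flink SQL with the named-argument (`=>`) syntax; the function name and parameters below are illustrative, assuming `a` is required and `b` is optional:

```sql
-- Scenario 1: fully specify each parameter by name (order no longer matters).
SELECT my_scalar_func(b => 42, a => 'hello');

-- Scenario 2: omit the optional parameter b.
SELECT my_scalar_func(a => 'hello');

-- Scenario 3: omitting the required parameter a should fail validation.
-- SELECT my_scalar_func(b => 42);

-- The same syntax applies to procedures:
-- CALL my_procedure(p1 => 'value');
```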





[jira] [Assigned] (FLINK-34355) Release Testing: Verify FLINK-34054 Support named parameters for functions and procedures

2024-02-07 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang reassigned FLINK-34355:


Assignee: Shuai Xu

> Release Testing: Verify FLINK-34054 Support named parameters for functions 
> and procedures
> -
>
> Key: FLINK-34355
> URL: https://issues.apache.org/jira/browse/FLINK-34355
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Shuai Xu
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>
> Test suggestion:
> 1. Implement a test UDF or procedure that supports named parameters.
> 2. When calling the function or procedure, use named parameters and verify 
> that the results are as expected.
> You can test the following scenarios:
> 1. Normal usage of named parameters, fully specifying each parameter.
> 2. Omitting optional parameters.
> 3. Omitting required parameters to confirm that an error is reported.





[jira] [Created] (FLINK-34390) Release Testing: Verify FLINK-33325 Built-in cross-platform powerful java profiler

2024-02-06 Thread Yun Tang (Jira)
Yun Tang created FLINK-34390:


 Summary: Release Testing: Verify FLINK-33325 Built-in 
cross-platform powerful java profiler
 Key: FLINK-34390
 URL: https://issues.apache.org/jira/browse/FLINK-34390
 Project: Flink
  Issue Type: Sub-task
Reporter: Yun Tang
Assignee: Rui Fan








[jira] [Resolved] (FLINK-34310) Release Testing Instructions: Verify FLINK-33325 Built-in cross-platform powerful java profiler

2024-02-06 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang resolved FLINK-34310.
--
Resolution: Fixed

> Release Testing Instructions: Verify FLINK-33325 Built-in cross-platform 
> powerful java profiler
> ---
>
> Key: FLINK-34310
> URL: https://issues.apache.org/jira/browse/FLINK-34310
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / REST, Runtime / Web Frontend
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Yun Tang
>Priority: Blocker
> Fix For: 1.19.0
>
> Attachments: image-2024-02-06-14-09-39-874.png, screenshot-1.png, 
> screenshot-2.png, screenshot-3.png, screenshot-4.png, screenshot-5.png
>
>
> Instructions:
> 1. In the default case, the web UI prints a hint telling users how to enable 
> this feature.
>  !screenshot-2.png! 
> 2. After adding {{rest.profiling.enabled: true}} to the configuration, the 
> feature becomes available, and the default mode should be {{ITIMER}}.
>  !screenshot-3.png! 
> 3. We cannot create another profiling instance while one is running.
>  !screenshot-4.png! 
> 4. At most 10 profiling snapshots are kept by default, and older ones are 
> deleted automatically.
>  !screenshot-5.png! 





[jira] [Updated] (FLINK-34310) Release Testing Instructions: Verify FLINK-33325 Built-in cross-platform powerful java profiler

2024-02-06 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-34310:
-
Description: 
Instructions:
1. In the default case, the web UI prints a hint telling users how to enable 
this feature.
 !screenshot-2.png! 

2. After adding {{rest.profiling.enabled: true}} to the configuration, the 
feature becomes available, and the default mode should be {{ITIMER}}.
 !screenshot-3.png! 

3. We cannot create another profiling instance while one is running.
 !screenshot-4.png! 

4. At most 10 profiling snapshots are kept by default, and older ones are 
deleted automatically.
 !screenshot-5.png! 
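The retention policy in step 4 behaves like a bounded queue: appending an 11th snapshot evicts the oldest. The sketch below illustrates that idea only; it is not Flink's implementation, and the limit of 10 matches the default mentioned above.

```python
from collections import deque

# Keep at most 10 snapshots; older entries are dropped automatically.
MAX_SNAPSHOTS = 10
snapshots = deque(maxlen=MAX_SNAPSHOTS)

for i in range(12):  # simulate 12 profiling runs
    snapshots.append(f"profiler-result-{i}")

print(len(snapshots))  # 10
print(snapshots[0])    # profiler-result-2 (the two oldest were evicted)
```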


> Release Testing Instructions: Verify FLINK-33325 Built-in cross-platform 
> powerful java profiler
> ---
>
> Key: FLINK-34310
> URL: https://issues.apache.org/jira/browse/FLINK-34310
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / REST, Runtime / Web Frontend
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Yun Tang
>Priority: Blocker
> Fix For: 1.19.0
>
> Attachments: image-2024-02-06-14-09-39-874.png, screenshot-1.png, 
> screenshot-2.png, screenshot-3.png, screenshot-4.png, screenshot-5.png
>
>
> Instructions:
> 1. In the default case, the web UI prints a hint telling users how to enable 
> this feature.
>  !screenshot-2.png! 
> 2. After adding {{rest.profiling.enabled: true}} to the configuration, the 
> feature becomes available, and the default mode should be {{ITIMER}}.
>  !screenshot-3.png! 
> 3. We cannot create another profiling instance while one is running.
>  !screenshot-4.png! 
> 4. At most 10 profiling snapshots are kept by default, and older ones are 
> deleted automatically.
>  !screenshot-5.png! 





[jira] [Updated] (FLINK-34310) Release Testing Instructions: Verify FLINK-33325 Built-in cross-platform powerful java profiler

2024-02-06 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-34310:
-
Attachment: screenshot-5.png

> Release Testing Instructions: Verify FLINK-33325 Built-in cross-platform 
> powerful java profiler
> ---
>
> Key: FLINK-34310
> URL: https://issues.apache.org/jira/browse/FLINK-34310
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / REST, Runtime / Web Frontend
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Yun Tang
>Priority: Blocker
> Fix For: 1.19.0
>
> Attachments: image-2024-02-06-14-09-39-874.png, screenshot-1.png, 
> screenshot-2.png, screenshot-3.png, screenshot-4.png, screenshot-5.png
>
>






[jira] [Updated] (FLINK-34310) Release Testing Instructions: Verify FLINK-33325 Built-in cross-platform powerful java profiler

2024-02-06 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-34310:
-
Attachment: screenshot-4.png

> Release Testing Instructions: Verify FLINK-33325 Built-in cross-platform 
> powerful java profiler
> ---
>
> Key: FLINK-34310
> URL: https://issues.apache.org/jira/browse/FLINK-34310
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / REST, Runtime / Web Frontend
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Yun Tang
>Priority: Blocker
> Fix For: 1.19.0
>
> Attachments: image-2024-02-06-14-09-39-874.png, screenshot-1.png, 
> screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>






[jira] [Updated] (FLINK-34310) Release Testing Instructions: Verify FLINK-33325 Built-in cross-platform powerful java profiler

2024-02-06 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-34310:
-
Attachment: screenshot-3.png

> Release Testing Instructions: Verify FLINK-33325 Built-in cross-platform 
> powerful java profiler
> ---
>
> Key: FLINK-34310
> URL: https://issues.apache.org/jira/browse/FLINK-34310
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / REST, Runtime / Web Frontend
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Yun Tang
>Priority: Blocker
> Fix For: 1.19.0
>
> Attachments: image-2024-02-06-14-09-39-874.png, screenshot-1.png, 
> screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>






[jira] [Updated] (FLINK-34310) Release Testing Instructions: Verify FLINK-33325 Built-in cross-platform powerful java profiler

2024-02-06 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-34310:
-
Attachment: screenshot-2.png

> Release Testing Instructions: Verify FLINK-33325 Built-in cross-platform 
> powerful java profiler
> ---
>
> Key: FLINK-34310
> URL: https://issues.apache.org/jira/browse/FLINK-34310
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / REST, Runtime / Web Frontend
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Yun Tang
>Priority: Blocker
> Fix For: 1.19.0
>
> Attachments: image-2024-02-06-14-09-39-874.png, screenshot-1.png, 
> screenshot-2.png
>
>






[jira] [Assigned] (FLINK-34310) Release Testing Instructions: Verify FLINK-33325 Built-in cross-platform powerful java profiler

2024-02-06 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang reassigned FLINK-34310:


Assignee: Yun Tang  (was: Yu Chen)

> Release Testing Instructions: Verify FLINK-33325 Built-in cross-platform 
> powerful java profiler
> ---
>
> Key: FLINK-34310
> URL: https://issues.apache.org/jira/browse/FLINK-34310
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / REST, Runtime / Web Frontend
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Yun Tang
>Priority: Blocker
> Fix For: 1.19.0
>
> Attachments: image-2024-02-06-14-09-39-874.png, screenshot-1.png
>
>






[jira] [Commented] (FLINK-34310) Release Testing Instructions: Verify FLINK-33325 Built-in cross-platform powerful java profiler

2024-02-06 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17814666#comment-17814666
 ] 

Yun Tang commented on FLINK-34310:
--

I could help to provide the testing instructions, and I will assign it to 
[~fanrui] later.

> Release Testing Instructions: Verify FLINK-33325 Built-in cross-platform 
> powerful java profiler
> ---
>
> Key: FLINK-34310
> URL: https://issues.apache.org/jira/browse/FLINK-34310
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / REST, Runtime / Web Frontend
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Yu Chen
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
> Attachments: image-2024-02-06-14-09-39-874.png, screenshot-1.png
>
>






[jira] [Commented] (FLINK-34007) Flink Job stuck in suspend state after losing leadership in HA Mode

2024-02-01 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813138#comment-17813138
 ] 

Yun Tang commented on FLINK-34007:
--

It seems we have had a long discussion; does this problem also exist in 
Flink 1.17? [~wangyang0918] [~mapohl]

> Flink Job stuck in suspend state after losing leadership in HA Mode
> ---
>
> Key: FLINK-34007
> URL: https://issues.apache.org/jira/browse/FLINK-34007
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0, 1.18.1, 1.18.2
>Reporter: Zhenqiu Huang
>Assignee: Matthias Pohl
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.19.0
>
> Attachments: Debug.log, LeaderElector-Debug.json, job-manager.log
>
>
> The observation is that the JobManager goes into the suspended state, with a 
> failed container unable to register itself with the ResourceManager before 
> the timeout.
> JM log: see attached.





[jira] [Updated] (FLINK-33325) FLIP-375: Built-in cross-platform powerful java profiler

2024-01-22 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-33325:
-
Fix Version/s: 1.19.0

> FLIP-375: Built-in cross-platform powerful java profiler
> 
>
> Key: FLINK-33325
> URL: https://issues.apache.org/jira/browse/FLINK-33325
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / REST, Runtime / Web Frontend
>Affects Versions: 1.19.0
>Reporter: Yu Chen
>Assignee: Yu Chen
>Priority: Major
> Fix For: 1.19.0
>
>
> This is an umbrella JIRA of 
> [FLIP-375|https://cwiki.apache.org/confluence/x/64lEE]





[jira] [Resolved] (FLINK-34029) Support different profiling mode on Flink WEB

2024-01-22 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang resolved FLINK-34029.
--
Fix Version/s: 1.19.0
 Assignee: Yu Chen
   Resolution: Fixed

merged in master: 4db6e72ed766791d25ee0379c7c29d1b4e2c08df

> Support different profiling mode on Flink WEB
> -
>
> Key: FLINK-34029
> URL: https://issues.apache.org/jira/browse/FLINK-34029
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Web Frontend
>Affects Versions: 1.19.0
>Reporter: Yu Chen
>Assignee: Yu Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>






[jira] [Commented] (FLINK-33696) FLIP-385: Add OpenTelemetryTraceReporter and OpenTelemetryMetricReporter

2024-01-22 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809320#comment-17809320
 ] 

Yun Tang commented on FLINK-33696:
--

[~pnowojski], is this ticket done for FLIP-385?

> FLIP-385: Add OpenTelemetryTraceReporter and OpenTelemetryMetricReporter
> 
>
> Key: FLINK-33696
> URL: https://issues.apache.org/jira/browse/FLINK-33696
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Metrics
>Reporter: Piotr Nowojski
>Assignee: Piotr Nowojski
>Priority: Major
> Fix For: 1.19.0
>
>
> h1. Motivation
> [FLIP-384|https://cwiki.apache.org/confluence/display/FLINK/FLIP-384%3A+Introduce+TraceReporter+and+use+it+to+create+checkpointing+and+recovery+traces]
>  is adding TraceReporter interface. However with 
> [FLIP-384|https://cwiki.apache.org/confluence/display/FLINK/FLIP-384%3A+Introduce+TraceReporter+and+use+it+to+create+checkpointing+and+recovery+traces]
>  alone, Log4jTraceReporter would be the only available implementation of 
> TraceReporter interface, which is not very helpful.
> In this FLIP I’m proposing to contribute both MetricExporter and 
> TraceReporter implementation using OpenTelemetry.





[jira] [Updated] (FLINK-34164) [Benchmark] Compilation error since Jan. 16th

2024-01-18 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-34164:
-
Fix Version/s: 1.19.0

> [Benchmark] Compilation error since Jan. 16th
> -
>
> Key: FLINK-34164
> URL: https://issues.apache.org/jira/browse/FLINK-34164
> Project: Flink
>  Issue Type: Bug
>  Components: Benchmarks
>Reporter: Zakelly Lan
>Assignee: Junrui Li
>Priority: Critical
> Fix For: 1.19.0
>
>
> An error occurred during the benchmark compilation:
> {code:java}
> 13:17:40 [ERROR] 
> /mnt/jenkins/workspace/flink-main-benchmarks/flink-benchmarks/warning:[options]
>  bootstrap class path not set in conjunction with -source 8
> 13:17:40 
> /mnt/jenkins/workspace/flink-main-benchmarks/flink-benchmarks/src/main/java/org/apache/flink/benchmark/StreamGraphUtils.java:38:19:
>  error: cannot find symbol {code}
> It seems related to FLINK-33980.





[jira] [Updated] (FLINK-34164) [Benchmark] Compilation error since Jan. 16th

2024-01-18 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-34164:
-
Priority: Critical  (was: Major)

> [Benchmark] Compilation error since Jan. 16th
> -
>
> Key: FLINK-34164
> URL: https://issues.apache.org/jira/browse/FLINK-34164
> Project: Flink
>  Issue Type: Bug
>  Components: Benchmarks
>Reporter: Zakelly Lan
>Assignee: Junrui Li
>Priority: Critical
>
> An error occurred during the benchmark compilation:
> {code:java}
> 13:17:40 [ERROR] 
> /mnt/jenkins/workspace/flink-main-benchmarks/flink-benchmarks/warning:[options]
>  bootstrap class path not set in conjunction with -source 8
> 13:17:40 
> /mnt/jenkins/workspace/flink-main-benchmarks/flink-benchmarks/src/main/java/org/apache/flink/benchmark/StreamGraphUtils.java:38:19:
>  error: cannot find symbol {code}
> It seems related to FLINK-33980.





[jira] [Updated] (FLINK-34148) Potential regression (Jan. 13): stringWrite with Java8

2024-01-18 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-34148:
-
Priority: Critical  (was: Major)

> Potential regression (Jan. 13): stringWrite with Java8
> --
>
> Key: FLINK-34148
> URL: https://issues.apache.org/jira/browse/FLINK-34148
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Type Serialization System
>Reporter: Zakelly Lan
>Priority: Critical
> Fix For: 1.19.0
>
>
> Significant performance drop in stringWrite with Java 8 from commit 
> [881062f352|https://github.com/apache/flink/commit/881062f352f8bf8c21ab7cbea95e111fd82fdf20]
>  to 
> [5d9d8748b6|https://github.com/apache/flink/commit/5d9d8748b64ff1a75964a5cd2857ab5061312b51]
>  . It only involves relatively short strings (length 128 or 4).
> stringWrite.128.ascii(Java8) baseline=1089.107756 current_value=754.52452
> stringWrite.128.chinese(Java8) baseline=504.244575 current_value=295.358989
> stringWrite.128.russian(Java8) baseline=655.582639 current_value=421.030188
> stringWrite.4.chinese(Java8) baseline=9598.791964 current_value=6627.929927
> stringWrite.4.russian(Java8) baseline=11070.666415 current_value=8289.95767
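To put the reported scores in perspective, the relative drop per case can be computed directly from the baseline/current pairs above (this is plain arithmetic over the figures in the description, not additional benchmark data):

```python
# Baseline vs. current benchmark scores copied from the report above.
results = {
    "stringWrite.128.ascii":   (1089.107756, 754.52452),
    "stringWrite.128.chinese": (504.244575, 295.358989),
    "stringWrite.128.russian": (655.582639, 421.030188),
    "stringWrite.4.chinese":   (9598.791964, 6627.929927),
    "stringWrite.4.russian":   (11070.666415, 8289.95767),
}

for name, (baseline, current) in results.items():
    drop = (1 - current / baseline) * 100
    print(f"{name}: {drop:.1f}% regression")
```

The regressions range from roughly 25% to 41%, with the Chinese 128-character case hit hardest.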





[jira] [Updated] (FLINK-34148) Potential regression (Jan. 13): stringWrite with Java8

2024-01-18 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-34148:
-
Fix Version/s: 1.19.0

> Potential regression (Jan. 13): stringWrite with Java8
> --
>
> Key: FLINK-34148
> URL: https://issues.apache.org/jira/browse/FLINK-34148
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Type Serialization System
>Reporter: Zakelly Lan
>Priority: Major
> Fix For: 1.19.0
>
>
> Significant performance drop in stringWrite with Java 8 from commit 
> [881062f352|https://github.com/apache/flink/commit/881062f352f8bf8c21ab7cbea95e111fd82fdf20]
>  to 
> [5d9d8748b6|https://github.com/apache/flink/commit/5d9d8748b64ff1a75964a5cd2857ab5061312b51]
>  . It only involves relatively short strings (length 128 or 4).
> stringWrite.128.ascii(Java8) baseline=1089.107756 current_value=754.52452
> stringWrite.128.chinese(Java8) baseline=504.244575 current_value=295.358989
> stringWrite.128.russian(Java8) baseline=655.582639 current_value=421.030188
> stringWrite.4.chinese(Java8) baseline=9598.791964 current_value=6627.929927
> stringWrite.4.russian(Java8) baseline=11070.666415 current_value=8289.95767





[jira] [Resolved] (FLINK-33434) Support invoke async-profiler on Taskmanager through REST API

2024-01-18 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang resolved FLINK-33434.
--
Fix Version/s: 1.19.0
   Resolution: Fixed

merged in master: 525f6bc818eb7f15a19fa81421584920de8f8876 ... 
4bee4e6e8ddb41ae9933d04bf21183223db6c2de

> Support invoke async-profiler on Taskmanager through REST API
> -
>
> Key: FLINK-33434
> URL: https://issues.apache.org/jira/browse/FLINK-33434
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / REST
>Affects Versions: 1.19.0
>Reporter: Yu Chen
>Assignee: Yu Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>






[jira] [Resolved] (FLINK-34072) Use JAVA_RUN in shell scripts

2024-01-17 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang resolved FLINK-34072.
--
Resolution: Fixed

merged in master: e7c8cd1562ebd45c1f7b48f519a11c6cd4fdf100

> Use JAVA_RUN in shell scripts
> -
>
> Key: FLINK-34072
> URL: https://issues.apache.org/jira/browse/FLINK-34072
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Scripts
>Reporter: Yun Tang
>Assignee: Yu Chen
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> We should call {{JAVA_RUN}} whenever we launch the {{java}} command; 
> otherwise we might not be able to run {{java}} if JAVA_HOME is not set,
> such as:
> {code:java}
> flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT/bin/config.sh: line 339: > 17 : 
> syntax error: operand expected (error token is "> 17 ")
> {code}
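The intended pattern can be sketched in shell as follows. This is an illustration of the idea behind {{JAVA_RUN}}, not the exact code in config.sh: resolve the java binary once, honoring JAVA_HOME when it is set, and reuse the variable everywhere instead of a bare `java`.

```shell
# Resolve the java binary once.
if [ -n "${JAVA_HOME:-}" ]; then
    JAVA_RUN="${JAVA_HOME}/bin/java"
else
    JAVA_RUN=java
fi

# All later launches use "$JAVA_RUN" rather than invoking 'java' directly.
echo "Resolved java binary: $JAVA_RUN"
```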





[jira] [Commented] (FLINK-18255) Add API annotations to RocksDB user-facing classes

2024-01-16 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17807535#comment-17807535
 ] 

Yun Tang commented on FLINK-18255:
--

I think we need a FLIP, just like 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=278465498
[~lijinzhong], already assigned to you.

> Add API annotations to RocksDB user-facing classes
> --
>
> Key: FLINK-18255
> URL: https://issues.apache.org/jira/browse/FLINK-18255
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.11.0
>Reporter: Nico Kruber
>Assignee: Jinzhong Li
>Priority: Major
>  Labels: auto-deprioritized-major, auto-deprioritized-minor
> Fix For: 1.19.0
>
>
> Several user-facing classes in {{flink-statebackend-rocksdb}} don't have any 
> API annotations, not even {{@PublicEvolving}}. These should be added to 
> clarify their usage.





[jira] [Assigned] (FLINK-18255) Add API annotations to RocksDB user-facing classes

2024-01-16 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang reassigned FLINK-18255:


Assignee: Jinzhong Li

> Add API annotations to RocksDB user-facing classes
> --
>
> Key: FLINK-18255
> URL: https://issues.apache.org/jira/browse/FLINK-18255
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.11.0
>Reporter: Nico Kruber
>Assignee: Jinzhong Li
>Priority: Major
>  Labels: auto-deprioritized-major, auto-deprioritized-minor
> Fix For: 1.19.0
>
>
> Several user-facing classes in {{flink-statebackend-rocksdb}} don't have any 
> API annotations, not even {{@PublicEvolving}}. These should be added to 
> clarify their usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-18255) Add API annotations to RocksDB user-facing classes

2024-01-16 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-18255:
-
Priority: Major  (was: Not a Priority)

> Add API annotations to RocksDB user-facing classes
> --
>
> Key: FLINK-18255
> URL: https://issues.apache.org/jira/browse/FLINK-18255
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.11.0
>Reporter: Nico Kruber
>Priority: Major
>  Labels: auto-deprioritized-major, auto-deprioritized-minor
>
> Several user-facing classes in {{flink-statebackend-rocksdb}} don't have any 
> API annotations, not even {{@PublicEvolving}}. These should be added to 
> clarify their usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-18255) Add API annotations to RocksDB user-facing classes

2024-01-16 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-18255:
-
Fix Version/s: 1.19.0

> Add API annotations to RocksDB user-facing classes
> --
>
> Key: FLINK-18255
> URL: https://issues.apache.org/jira/browse/FLINK-18255
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.11.0
>Reporter: Nico Kruber
>Priority: Major
>  Labels: auto-deprioritized-major, auto-deprioritized-minor
> Fix For: 1.19.0
>
>
> Several user-facing classes in {{flink-statebackend-rocksdb}} don't have any 
> API annotations, not even {{@PublicEvolving}}. These should be added to 
> clarify their usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34072) Use JAVA_RUN in shell scripts

2024-01-14 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806474#comment-17806474
 ] 

Yun Tang commented on FLINK-34072:
--

[~Yu Chen] Already assigned, please go ahead.

> Use JAVA_RUN in shell scripts
> -
>
> Key: FLINK-34072
> URL: https://issues.apache.org/jira/browse/FLINK-34072
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Scripts
>Reporter: Yun Tang
>Assignee: Yu Chen
>Priority: Minor
> Fix For: 1.19.0
>
>
> We should call {{JAVA_RUN}} in all cases when we launch the {{java}} command; 
> otherwise we might not be able to run {{java}} if JAVA_HOME is not set.
> such as:
> {code:java}
> flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT/bin/config.sh: line 339: > 17 : 
> syntax error: operand expected (error token is "> 17 ")
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34072) Use JAVA_RUN in shell scripts

2024-01-14 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang reassigned FLINK-34072:


Assignee: Yu Chen

> Use JAVA_RUN in shell scripts
> -
>
> Key: FLINK-34072
> URL: https://issues.apache.org/jira/browse/FLINK-34072
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Scripts
>Reporter: Yun Tang
>Assignee: Yu Chen
>Priority: Minor
> Fix For: 1.19.0
>
>
> We should call {{JAVA_RUN}} in all cases when we launch the {{java}} command; 
> otherwise we might not be able to run {{java}} if JAVA_HOME is not set.
> such as:
> {code:java}
> flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT/bin/config.sh: line 339: > 17 : 
> syntax error: operand expected (error token is "> 17 ")
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34072) Use JAVA_RUN in shell scripts

2024-01-14 Thread Yun Tang (Jira)
Yun Tang created FLINK-34072:


 Summary: Use JAVA_RUN in shell scripts
 Key: FLINK-34072
 URL: https://issues.apache.org/jira/browse/FLINK-34072
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / Scripts
Reporter: Yun Tang
 Fix For: 1.19.0


We should call {{JAVA_RUN}} in all cases when we launch the {{java}} command; 
otherwise we might not be able to run {{java}} if JAVA_HOME is not set.

such as:

{code:java}
flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT/bin/config.sh: line 339: > 17 : 
syntax error: operand expected (error token is "> 17 ")
{code}
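A minimal sketch of the pattern the ticket proposes (variable and path names are illustrative; Flink's actual config.sh may differ):

```shell
# Resolve the java binary once, preferring JAVA_HOME, instead of invoking a
# bare "java" that may not exist on PATH. Scripts then launch via "$JAVA_RUN".
if [ -n "${JAVA_HOME:-}" ]; then
    JAVA_RUN="$JAVA_HOME/bin/java"
else
    JAVA_RUN="java"
fi
echo "$JAVA_RUN"
```

Every launch site then uses `"$JAVA_RUN" ...` rather than `java ...`, so a missing JAVA_HOME degrades gracefully instead of producing arithmetic/expansion errors like the one above.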




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-34013) ProfilingServiceTest.testRollingDeletion is unstable on AZP

2024-01-09 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang resolved FLINK-34013.
--
Fix Version/s: 1.19.0
   Resolution: Fixed

merged in master: c09f07a406398bc4b2320e9b5ae0a8f5f27a00dc

> ProfilingServiceTest.testRollingDeletion is unstable on AZP
> ---
>
> Key: FLINK-34013
> URL: https://issues.apache.org/jira/browse/FLINK-34013
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Assignee: Yu Chen
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.19.0
>
>
> This build 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56073&view=logs&j=0e7be18f-84f2-53f0-a32d-4a5e4a174679&t=7c1d86e3-35bd-5fd5-3b7c-30c126a78702&l=8258
>  fails as 
> {noformat}
> Jan 06 02:09:28 org.opentest4j.AssertionFailedError: expected: <2> but was: 
> <3>
> Jan 06 02:09:28   at 
> org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
> Jan 06 02:09:28   at 
> org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
> Jan 06 02:09:28   at 
> org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
> Jan 06 02:09:28   at 
> org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150)
> Jan 06 02:09:28   at 
> org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:145)
> Jan 06 02:09:28   at 
> org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:531)
> Jan 06 02:09:28   at 
> org.apache.flink.runtime.util.profiler.ProfilingServiceTest.verifyRollingDeletionWorks(ProfilingServiceTest.java:167)
> Jan 06 02:09:28   at 
> org.apache.flink.runtime.util.profiler.ProfilingServiceTest.testRollingDeletion(ProfilingServiceTest.java:117)
> Jan 06 02:09:28   at java.lang.reflect.Method.invoke(Method.java:498)
> Jan 06 02:09:28   at 
> java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> Jan 06 02:09:28   at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> Jan 06 02:09:28   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> Jan 06 02:09:28   at 
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> Jan 06 02:09:28   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34013) ProfilingServiceTest.testRollingDeletion is unstable on AZP

2024-01-08 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang reassigned FLINK-34013:


Assignee: Yu Chen  (was: Yu Chen)

> ProfilingServiceTest.testRollingDeletion is unstable on AZP
> ---
>
> Key: FLINK-34013
> URL: https://issues.apache.org/jira/browse/FLINK-34013
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Assignee: Yu Chen
>Priority: Critical
>  Labels: pull-request-available, test-stability
>
> This build 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56073&view=logs&j=0e7be18f-84f2-53f0-a32d-4a5e4a174679&t=7c1d86e3-35bd-5fd5-3b7c-30c126a78702&l=8258
>  fails as 
> {noformat}
> Jan 06 02:09:28 org.opentest4j.AssertionFailedError: expected: <2> but was: 
> <3>
> Jan 06 02:09:28   at 
> org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
> Jan 06 02:09:28   at 
> org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
> Jan 06 02:09:28   at 
> org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
> Jan 06 02:09:28   at 
> org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150)
> Jan 06 02:09:28   at 
> org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:145)
> Jan 06 02:09:28   at 
> org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:531)
> Jan 06 02:09:28   at 
> org.apache.flink.runtime.util.profiler.ProfilingServiceTest.verifyRollingDeletionWorks(ProfilingServiceTest.java:167)
> Jan 06 02:09:28   at 
> org.apache.flink.runtime.util.profiler.ProfilingServiceTest.testRollingDeletion(ProfilingServiceTest.java:117)
> Jan 06 02:09:28   at java.lang.reflect.Method.invoke(Method.java:498)
> Jan 06 02:09:28   at 
> java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> Jan 06 02:09:28   at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> Jan 06 02:09:28   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> Jan 06 02:09:28   at 
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> Jan 06 02:09:28   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34013) ProfilingServiceTest.testRollingDeletion is unstable on AZP

2024-01-08 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang reassigned FLINK-34013:


Assignee: Yu Chen

> ProfilingServiceTest.testRollingDeletion is unstable on AZP
> ---
>
> Key: FLINK-34013
> URL: https://issues.apache.org/jira/browse/FLINK-34013
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Assignee: Yu Chen
>Priority: Critical
>  Labels: pull-request-available, test-stability
>
> This build 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56073&view=logs&j=0e7be18f-84f2-53f0-a32d-4a5e4a174679&t=7c1d86e3-35bd-5fd5-3b7c-30c126a78702&l=8258
>  fails as 
> {noformat}
> Jan 06 02:09:28 org.opentest4j.AssertionFailedError: expected: <2> but was: 
> <3>
> Jan 06 02:09:28   at 
> org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
> Jan 06 02:09:28   at 
> org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
> Jan 06 02:09:28   at 
> org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
> Jan 06 02:09:28   at 
> org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150)
> Jan 06 02:09:28   at 
> org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:145)
> Jan 06 02:09:28   at 
> org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:531)
> Jan 06 02:09:28   at 
> org.apache.flink.runtime.util.profiler.ProfilingServiceTest.verifyRollingDeletionWorks(ProfilingServiceTest.java:167)
> Jan 06 02:09:28   at 
> org.apache.flink.runtime.util.profiler.ProfilingServiceTest.testRollingDeletion(ProfilingServiceTest.java:117)
> Jan 06 02:09:28   at java.lang.reflect.Method.invoke(Method.java:498)
> Jan 06 02:09:28   at 
> java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> Jan 06 02:09:28   at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> Jan 06 02:09:28   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> Jan 06 02:09:28   at 
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> Jan 06 02:09:28   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-33433) Support invoke async-profiler on Jobmanager through REST API

2024-01-02 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang resolved FLINK-33433.
--
Fix Version/s: 1.19.0
   Resolution: Fixed

merged in master: 
240494fd6169cb98b47808a003ee00804a780360...3efe9d2b09bedde89322594f0f3927004b6b1adf

> Support invoke async-profiler on Jobmanager through REST API
> 
>
> Key: FLINK-33433
> URL: https://issues.apache.org/jira/browse/FLINK-33433
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / REST
>Affects Versions: 1.19.0
>Reporter: Yu Chen
>Assignee: Yu Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30535) Introduce TTL state based benchmarks

2023-12-25 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800407#comment-17800407
 ] 

Yun Tang commented on FLINK-30535:
--

Some work done on Flink side:
master: 0c82f8af859a4f463a07f5dfb35648970c1c3425

> Introduce TTL state based benchmarks
> 
>
> Key: FLINK-30535
> URL: https://issues.apache.org/jira/browse/FLINK-30535
> Project: Flink
>  Issue Type: New Feature
>  Components: Benchmarks
>Reporter: Yun Tang
>Assignee: Zakelly Lan
>Priority: Major
>  Labels: pull-request-available
>
> This ticket is inspired by https://issues.apache.org/jira/browse/FLINK-30088 
> which wants to optimize the TTL state performance. I think it would be useful 
> to introduce state benchmarks based on TTL as Flink has some overhead to 
> support TTL.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30535) Introduce TTL state based benchmarks

2023-12-13 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796562#comment-17796562
 ] 

Yun Tang commented on FLINK-30535:
--

[~Zakelly] would you like to take this ticket?

> Introduce TTL state based benchmarks
> 
>
> Key: FLINK-30535
> URL: https://issues.apache.org/jira/browse/FLINK-30535
> Project: Flink
>  Issue Type: New Feature
>  Components: Benchmarks
>Reporter: Yun Tang
>Priority: Major
>
> This ticket is inspired by https://issues.apache.org/jira/browse/FLINK-30088 
> which wants to optimize the TTL state performance. I think it would be useful 
> to introduce state benchmarks based on TTL as Flink has some overhead to 
> support TTL.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31752) SourceOperatorStreamTask increments numRecordsOut twice

2023-12-13 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-31752:
-
Fix Version/s: 1.17.1

> SourceOperatorStreamTask increments numRecordsOut twice
> ---
>
> Key: FLINK-31752
> URL: https://issues.apache.org/jira/browse/FLINK-31752
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Metrics
>Affects Versions: 1.17.0
>Reporter: huwh
>Assignee: Yunfeng Zhou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0, 1.17.1
>
> Attachments: image-2023-04-07-15-51-44-304.png
>
>
> The counter of numRecordsOut was introduced to ChainingOutput to reduce the 
> function call stack depth in 
> https://issues.apache.org/jira/browse/FLINK-30536
> But SourceOperatorStreamTask.AsyncDataOutputToOutput increments the counter 
> of numRecordsOut too. This results in the source operator's numRecordsOut 
> being doubled.
> We should delete the numRecordsOut.inc in 
> SourceOperatorStreamTask.AsyncDataOutputToOutput.
> [~xtsong][~lindong] Could you please take a look at this?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31752) SourceOperatorStreamTask increments numRecordsOut twice

2023-12-13 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-31752:
-
Fix Version/s: 1.18.0

> SourceOperatorStreamTask increments numRecordsOut twice
> ---
>
> Key: FLINK-31752
> URL: https://issues.apache.org/jira/browse/FLINK-31752
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Metrics
>Affects Versions: 1.17.0
>Reporter: huwh
>Assignee: Yunfeng Zhou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
> Attachments: image-2023-04-07-15-51-44-304.png
>
>
> The counter of numRecordsOut was introduced to ChainingOutput to reduce the 
> function call stack depth in 
> https://issues.apache.org/jira/browse/FLINK-30536
> But SourceOperatorStreamTask.AsyncDataOutputToOutput increments the counter 
> of numRecordsOut too. This results in the source operator's numRecordsOut 
> being doubled.
> We should delete the numRecordsOut.inc in 
> SourceOperatorStreamTask.AsyncDataOutputToOutput.
> [~xtsong][~lindong] Could you please take a look at this?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-23346) RocksDBStateBackend may core dump in flink_compactionfilterjni.cc

2023-12-12 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17795654#comment-17795654
 ] 

Yun Tang commented on FLINK-23346:
--

[~Zakelly] the Flink community is currently trying to release a new FRocksDB version 
due to https://issues.apache.org/jira/browse/FLINK-8. Do you think you can 
fix it in this version?

> RocksDBStateBackend may core dump in flink_compactionfilterjni.cc
> -
>
> Key: FLINK-23346
> URL: https://issues.apache.org/jira/browse/FLINK-23346
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.14.0, 1.13.1, 1.12.4
>Reporter: Congxian Qiu
>Priority: Major
>
> The code in [flink_compactionfilte.cpp 
> |https://github.com/ververica/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/java/rocksjni/flink_compactionfilterjni.cc#L21]
> {code:cpp}
> inline void CheckAndRethrowException(JNIEnv* env) const {
> if (env->ExceptionCheck()) {
>   env->ExceptionDescribe();
>   env->Throw(env->ExceptionOccurred());
> }
>   }
> {code}
> may core dump in some scenarios; see more information in [1][2][3]
> We can fix it by changing this to
> {code:cpp}
> inline void CheckAndRethrowException(JNIEnv* env) const {
> if (env->ExceptionCheck()) {
>   env->Throw(env->ExceptionOccurred());
> }
>   }
> {code}
> or
> {code:cpp}
>inline void CheckAndRethrowException(JNIEnv* env) const {
> if (env->ExceptionCheck()) {
>   jobject obj = env->ExceptionOccurred();
>   env->ExceptionDescribe();
>   env->Throw(obj);
> }
>   }
> {code}
> [1] 
> [https://stackoverflow.com/questions/30971068/does-jniexceptiondescribe-implicitily-clear-the-exception-trace-of-the-jni-env]
>  [2] [https://bugs.openjdk.java.net/browse/JDK-4067541]
>  [3] [https://bugs.openjdk.java.net/browse/JDK-8051947]
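The ordering pitfall can be modeled without real JNI (a self-contained mock: the class and method names below are invented to mirror the JNIEnv calls quoted above, since ExceptionDescribe() implicitly clears the pending exception):

```java
// Mock of the JNIEnv exception state: describing the exception clears it,
// so fetching it afterwards yields null -- the crash mode described above.
final class MockJniEnv {
    private Throwable pending;
    MockJniEnv(Throwable t) { this.pending = t; }
    Throwable exceptionOccurred() { return pending; }
    void exceptionDescribe() { pending = null; } // side effect: clears pending
}

public class JniExceptionOrder {
    // Buggy order: describe first, then fetch -> nothing left to rethrow.
    static Throwable buggy(MockJniEnv env) {
        env.exceptionDescribe();
        return env.exceptionOccurred(); // null
    }

    // Fixed order: save the exception reference BEFORE describing it.
    static Throwable fixed(MockJniEnv env) {
        Throwable t = env.exceptionOccurred();
        env.exceptionDescribe();
        return t; // still valid
    }

    public static void main(String[] args) {
        System.out.println(buggy(new MockJniEnv(new RuntimeException("boom"))) == null); // true
        System.out.println(fixed(new MockJniEnv(new RuntimeException("boom"))) != null); // true
    }
}
```

This is exactly the difference between the original snippet and the proposed fixes: both fixes ensure {{env->Throw}} receives the throwable captured before {{ExceptionDescribe}} runs.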



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-33246) Add RescalingIT case that uses checkpoints and resource requests

2023-12-11 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang resolved FLINK-33246.
--
Fix Version/s: 1.19.0
   Resolution: Fixed

merged in master: 98e4610f09f35a942e55472b5d358ebe113b0dba

> Add RescalingIT case that uses checkpoints and resource requests
> 
>
> Key: FLINK-33246
> URL: https://issues.apache.org/jira/browse/FLINK-33246
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Reporter: Stefan Richter
>Assignee: Stefan Richter
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> RescalingITCase currently uses savepoints and cancel/restart for rescaling. 
> We should add a test that also tests rescaling from checkpoints under 
> changing resource requirements, i.e. without cancelation of the job.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33341) Use available local keyed state for rescaling

2023-12-11 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-33341:
-
Fix Version/s: 1.19.0

> Use available local keyed state for rescaling
> -
>
> Key: FLINK-33341
> URL: https://issues.apache.org/jira/browse/FLINK-33341
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Reporter: Stefan Richter
>Assignee: Stefan Richter
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Local state is currently only used for recovery. However, it would make sense 
> to also use available local state in rescaling scenarios to reduce the amount 
> of data to download from remote storage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33741) Exposed Rocksdb statistics in Flink metrics and introduce 2 Rocksdb statistic related configuration

2023-12-10 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17795142#comment-17795142
 ] 

Yun Tang commented on FLINK-33741:
--

[~zhoujira86] I think there is valuable information in the RocksDB 
statistics. Assigned to you, please go ahead.

> Exposed Rocksdb statistics in Flink metrics and introduce 2 Rocksdb statistic 
> related configuration
> ---
>
> Key: FLINK-33741
> URL: https://issues.apache.org/jira/browse/FLINK-33741
> Project: Flink
>  Issue Type: New Feature
>Reporter: xiaogang zhou
>Assignee: xiaogang zhou
>Priority: Major
>
> I think we can also parse the multi-line string of the rocksdb statistics.
> {code:java}
> // code placeholder
> /**
>  * DB implements can export properties about their state
>  * via this method on a per column family level.
>  *
>  * If {@code property} is a valid property understood by this DB
>  * implementation, fills {@code value} with its current value and
>  * returns true. Otherwise returns false.
>  *
>  * Valid property names include:
>  * 
>  * "rocksdb.num-files-at-levelN" - return the number of files at
>  * level N, where N is an ASCII representation of a level
>  * number (e.g. "0").
>  * "rocksdb.stats" - returns a multi-line string that describes statistics
>  * about the internal operation of the DB.
>  * "rocksdb.sstables" - returns a multi-line string that describes all
>  *of the sstables that make up the db contents.
>  * 
>  *
>  * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle}
>  * instance, or null for the default column family.
>  * @param property to be fetched. See above for examples
>  * @return property value
>  *
>  * @throws RocksDBException thrown if error happens in underlying
>  *native library.
>  */
> public String getProperty(
> /* @Nullable */ final ColumnFamilyHandle columnFamilyHandle,
> final String property) throws RocksDBException { {code}
>  
> Then we can directly export these RT latency numbers in metrics.
>  
> I'd like to introduce 2 RocksDB statistics-related configuration options.
> Then we can customize stats:
> {code:java}
> // code placeholder
> Statistics s = new Statistics();
> s.setStatsLevel(EXCEPT_TIME_FOR_MUTEX);
> currentOptions.setStatsDumpPeriodSec(internalGetOption(RocksDBConfigurableOptions.STATISTIC_DUMP_PERIOD))
> .setStatistics(s); {code}
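As a sketch of the parsing idea, the multi-line {{rocksdb.stats}} string is line-oriented and can be split into name/value pairs to report as metrics (the sample input below is simplified and made up; real statistics output carries more fields, e.g. SUM and percentile columns for timers):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RocksdbStatsParser {
    // Parse counter lines of the form "rocksdb.block.cache.hit COUNT : 42"
    // into a name -> value map suitable for registering as gauges/counters.
    static Map<String, Long> parse(String stats) {
        Map<String, Long> out = new LinkedHashMap<>();
        for (String line : stats.split("\n")) {
            String[] parts = line.trim().split("\\s+COUNT\\s+:\\s+");
            if (parts.length == 2) {
                out.put(parts[0], Long.parseLong(parts[1].trim()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "rocksdb.block.cache.hit COUNT : 42\n"
                + "rocksdb.block.cache.miss COUNT : 7\n";
        System.out.println(parse(sample));
        // {rocksdb.block.cache.hit=42, rocksdb.block.cache.miss=7}
    }
}
```

Each parsed entry could then be registered with Flink's metric group, avoiding the periodic stats-dump-to-log indirection.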



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33741) Exposed Rocksdb statistics in Flink metrics and introduce 2 Rocksdb statistic related configuration

2023-12-10 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang reassigned FLINK-33741:


Assignee: xiaogang zhou

> Exposed Rocksdb statistics in Flink metrics and introduce 2 Rocksdb statistic 
> related configuration
> ---
>
> Key: FLINK-33741
> URL: https://issues.apache.org/jira/browse/FLINK-33741
> Project: Flink
>  Issue Type: New Feature
>Reporter: xiaogang zhou
>Assignee: xiaogang zhou
>Priority: Major
>
> I think we can also parse the multi-line string of the rocksdb statistics.
> {code:java}
> // code placeholder
> /**
>  * DB implements can export properties about their state
>  * via this method on a per column family level.
>  *
>  * If {@code property} is a valid property understood by this DB
>  * implementation, fills {@code value} with its current value and
>  * returns true. Otherwise returns false.
>  *
>  * Valid property names include:
>  * 
>  * "rocksdb.num-files-at-levelN" - return the number of files at
>  * level N, where N is an ASCII representation of a level
>  * number (e.g. "0").
>  * "rocksdb.stats" - returns a multi-line string that describes statistics
>  * about the internal operation of the DB.
>  * "rocksdb.sstables" - returns a multi-line string that describes all
>  *of the sstables that make up the db contents.
>  * 
>  *
>  * @param columnFamilyHandle {@link org.rocksdb.ColumnFamilyHandle}
>  * instance, or null for the default column family.
>  * @param property to be fetched. See above for examples
>  * @return property value
>  *
>  * @throws RocksDBException thrown if error happens in underlying
>  *native library.
>  */
> public String getProperty(
> /* @Nullable */ final ColumnFamilyHandle columnFamilyHandle,
> final String property) throws RocksDBException { {code}
>  
> Then we can directly export these RT latency numbers in metrics.
>  
> I'd like to introduce 2 RocksDB statistics-related configuration options.
> Then we can customize stats:
> {code:java}
> // code placeholder
> Statistics s = new Statistics();
> s.setStatsLevel(EXCEPT_TIME_FOR_MUTEX);
> currentOptions.setStatsDumpPeriodSec(internalGetOption(RocksDBConfigurableOptions.STATISTIC_DUMP_PERIOD))
> .setStatistics(s); {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-24819) Higher APIServer cpu load after using SharedIndexInformer replaced naked Kubernetes watch

2023-12-05 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-24819:
-
Fix Version/s: 1.19.0
   1.18.1
   1.17.3

> Higher APIServer cpu load after using SharedIndexInformer replaced naked 
> Kubernetes watch
> -
>
> Key: FLINK-24819
> URL: https://issues.apache.org/jira/browse/FLINK-24819
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes
>Affects Versions: 1.14.0
>Reporter: Yang Wang
>Priority: Major
> Fix For: 1.19.0, 1.18.1, 1.17.3
>
>
> In FLINK-22054, Flink used a shared informer for ConfigMap to replace the 
> naked K8s watch. After that, each Flink JVM process (JM/TM) only needs one 
> connection to the APIServer for ConfigMap watching. It aims to reduce the 
> network pressure on the K8s APIServer.
>  
> However, in our recent tests, we found that the CPU and memory costs of the 
> APIServer doubled while running the same Flink workloads. After digging into 
> more details in K8s, I think the root cause might be that ETCD does not 
> have indexes for labels. It means the APIServer needs to pull all the events 
> from ETCD for each watch and then filter with the specified labels (e.g. 
> app=xxx,type=flink-native-kubernetes,configmap-type=high-availability) 
> internally. Before FLINK-22054, we started a dedicated connection for each 
> ConfigMap watch. And it seems that the APIServer only needs to pull the 
> events for the specified ConfigMap name.
>  
> Watch URL example(Before):
> [https://kubernetes.default:6443/api/v1/namespaces/vvp-workload/configmaps?metadata.name=job-009d4f51-ca02-4793-a49b-a3344538719b-resourcemanager-leader=true|https://kubernetes.default:6443/api/v1/namespaces/vvp-workload/configmaps?labelSelector=app%3Dk8s-ha-app-1-1636077491-23461%2Ctype%3Dflink-native-kubernetes%2Cconfigmap-type%3Dhigh-availability=1153687321=true]
>  
> Watch URL example(After):
> [https://kubernetes.default:6443/api/v1/namespaces/vvp-workload/configmaps?labelSelector=app%3Dk8s-ha-app-1-1636077491-23461%2Ctype%3Dflink-native-kubernetes%2Cconfigmap-type%3Dhigh-availability=1153687321=true]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-24819) Higher APIServer cpu load after using SharedIndexInformer replaced naked Kubernetes watch

2023-12-05 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang resolved FLINK-24819.
--
Resolution: Fixed

> Higher APIServer cpu load after using SharedIndexInformer replaced naked 
> Kubernetes watch
> -
>
> Key: FLINK-24819
> URL: https://issues.apache.org/jira/browse/FLINK-24819
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes
>Affects Versions: 1.14.0
>Reporter: Yang Wang
>Priority: Major
>
> In FLINK-22054, Flink used a shared informer for ConfigMap to replace the 
> naked K8s watch. After that, each Flink JVM process (JM/TM) only needs one 
> connection to the APIServer for ConfigMap watching. It aims to reduce the 
> network pressure on the K8s APIServer.
>  
> However, in our recent tests, we found that the CPU and memory costs of the 
> APIServer doubled while running the same Flink workloads. After digging into 
> more details in K8s, I think the root cause might be that ETCD does not 
> have indexes for labels. It means the APIServer needs to pull all the events 
> from ETCD for each watch and then filter with the specified labels (e.g. 
> app=xxx,type=flink-native-kubernetes,configmap-type=high-availability) 
> internally. Before FLINK-22054, we started a dedicated connection for each 
> ConfigMap watch. And it seems that the APIServer only needs to pull the 
> events for the specified ConfigMap name.
>  
> Watch URL example(Before):
> [https://kubernetes.default:6443/api/v1/namespaces/vvp-workload/configmaps?metadata.name=job-009d4f51-ca02-4793-a49b-a3344538719b-resourcemanager-leader=true|https://kubernetes.default:6443/api/v1/namespaces/vvp-workload/configmaps?labelSelector=app%3Dk8s-ha-app-1-1636077491-23461%2Ctype%3Dflink-native-kubernetes%2Cconfigmap-type%3Dhigh-availability=1153687321=true]
>  
> Watch URL example(After):
> [https://kubernetes.default:6443/api/v1/namespaces/vvp-workload/configmaps?labelSelector=app%3Dk8s-ha-app-1-1636077491-23461%2Ctype%3Dflink-native-kubernetes%2Cconfigmap-type%3Dhigh-availability=1153687321=true]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-32611) Redirect to Apache Paimon's link instead of legacy flink table store

2023-12-04 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang resolved FLINK-32611.
--
Fix Version/s: 1.18.1
   Resolution: Fixed

merged in flink-web: 4cd8ab2fa927f48d74ed53e79a5e83efa674a720

> Redirect to Apache Paimon's link instead of legacy flink table store
> 
>
> Key: FLINK-32611
> URL: https://issues.apache.org/jira/browse/FLINK-32611
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, Project Website
>Reporter: Yun Tang
>Assignee: Yun Tang
>Priority: Major
>  Labels: pull-request-available, stale-assigned
> Fix For: 1.19.0, 1.18.1
>
>
> Current Flink's official web site would always point to the legacy flink 
> table store. However, we should point to the new Apache Paimon website and 
> docs.





[jira] [Created] (FLINK-33707) Verify the snapshot migration on Java17

2023-11-30 Thread Yun Tang (Jira)
Yun Tang created FLINK-33707:


 Summary: Verify the snapshot migration on Java17
 Key: FLINK-33707
 URL: https://issues.apache.org/jira/browse/FLINK-33707
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Checkpointing
Reporter: Yun Tang


This task is similar to FLINK-33699; I think we could introduce a 
StatefulJobSnapshotMigrationITCase-like test to restore snapshots containing 
Scala code.





[jira] [Commented] (FLINK-33699) Verify the snapshot migration on Java21

2023-11-30 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791592#comment-17791592
 ] 

Yun Tang commented on FLINK-33699:
--

I think it's better to introduce new tests to cover this problem. I checked the 
[CI|https://dev.azure.com/snuyanzin/flink/_build/results?buildId=2620=logs=0a15d512-44ac-5ba5-97ab-13a5d066c22c=9a028d19-6c4b-5a4e-d378-03fca149d0b1]
 you triggered before and noticed that the 
{{StatefulJobSnapshotMigrationITCase}}-related tests have passed, which 
confirms what I guessed earlier: most checkpoints/savepoints should be restored 
successfully.



> Verify the snapshot migration on Java21
> ---
>
> Key: FLINK-33699
> URL: https://issues.apache.org/jira/browse/FLINK-33699
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Checkpointing
>Reporter: Yun Tang
>Priority: Major
>
> In Java 21 builds, Scala is being bumped to 2.12.18, which causes 
> incompatibilities within Flink.
> This could affect loading savepoints from a Java 8/11/17 build. We already 
> have tests extending {{SnapshotMigrationTestBase}} to verify the logic of 
> migrating snapshots generated by the older Flink version. I think we can also 
> introduce similar tests to verify the logic across different Java versions.





[jira] [Created] (FLINK-33699) Verify the snapshot migration on Java21

2023-11-29 Thread Yun Tang (Jira)
Yun Tang created FLINK-33699:


 Summary: Verify the snapshot migration on Java21
 Key: FLINK-33699
 URL: https://issues.apache.org/jira/browse/FLINK-33699
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Checkpointing
Reporter: Yun Tang


In Java 21 builds, Scala is being bumped to 2.12.18, which causes 
incompatibilities within Flink.

This could affect loading savepoints from a Java 8/11/17 build. We already have 
tests extending {{SnapshotMigrationTestBase}} to verify the logic of migrating 
snapshots generated by the older Flink version. I think we can also introduce 
similar tests to verify the logic across different Java versions.





[jira] [Updated] (FLINK-33395) The join hint doesn't work when appears in subquery

2023-11-28 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-33395:
-
Fix Version/s: 1.17.3
   (was: 1.17.2)

> The join hint doesn't work when appears in subquery
> ---
>
> Key: FLINK-33395
> URL: https://issues.apache.org/jira/browse/FLINK-33395
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.16.0, 1.17.0, 1.18.0
>Reporter: xuyang
>Assignee: xuyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0, 1.18.1, 1.17.3
>
>
> See the existing test 
> 'NestLoopJoinHintTest#testJoinHintWithJoinHintInCorrelateAndWithAgg'; the 
> resulting plan is 
> {code:java}
> HashJoin(joinType=[LeftSemiJoin], where=[=(a1, EXPR$0)], select=[a1, b1], 
> build=[right], tryDistinctBuildRow=[true])
> :- Exchange(distribution=[hash[a1]])
> :  +- TableSourceScan(table=[[default_catalog, default_database, T1]], 
> fields=[a1, b1])
> +- Exchange(distribution=[hash[EXPR$0]])
>+- LocalHashAggregate(groupBy=[EXPR$0], select=[EXPR$0])
>   +- Calc(select=[EXPR$0])
>  +- HashAggregate(isMerge=[true], groupBy=[a1], select=[a1, 
> Final_COUNT(count$0) AS EXPR$0])
> +- Exchange(distribution=[hash[a1]])
>+- LocalHashAggregate(groupBy=[a1], select=[a1, 
> Partial_COUNT(a2) AS count$0])
>   +- NestedLoopJoin(joinType=[InnerJoin], where=[=(a2, a1)], 
> select=[a2, a1], build=[right])
>  :- TableSourceScan(table=[[default_catalog, 
> default_database, T2, project=[a2], metadata=[]]], fields=[a2], 
> hints=[[[ALIAS options:[T2)
>  +- Exchange(distribution=[broadcast])
> +- TableSourceScan(table=[[default_catalog, 
> default_database, T1, project=[a1], metadata=[]]], fields=[a1], 
> hints=[[[ALIAS options:[T1) {code}
> but the NestedLoopJoin should broadcast the left side.





[jira] [Commented] (FLINK-31385) Introduce extended Assertj Matchers for completable futures

2023-11-23 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789347#comment-17789347
 ] 

Yun Tang commented on FLINK-31385:
--

Also picked in release-1.17 2a83c910eee711f8b5f9dd4697de60221f21fb9d for the 
pick of FLINK-33598.

> Introduce extended Assertj Matchers for completable futures
> ---
>
> Key: FLINK-31385
> URL: https://issues.apache.org/jira/browse/FLINK-31385
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: David Morávek
>Assignee: David Morávek
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.18.0, 1.17.3
>
>
> Introduce extended Assertj Matchers for completable futures that don't rely 
> on timeouts.
> In general, we want to avoid relying on timeouts in the Flink test suite to 
> get additional context (thread dump) in case something gets stuck.





[jira] [Updated] (FLINK-31385) Introduce extended Assertj Matchers for completable futures

2023-11-23 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-31385:
-
Fix Version/s: 1.17.3

> Introduce extended Assertj Matchers for completable futures
> ---
>
> Key: FLINK-31385
> URL: https://issues.apache.org/jira/browse/FLINK-31385
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: David Morávek
>Assignee: David Morávek
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.18.0, 1.17.3
>
>
> Introduce extended Assertj Matchers for completable futures that don't rely 
> on timeouts.
> In general, we want to avoid relying on timeouts in the Flink test suite to 
> get additional context (thread dump) in case something gets stuck.





[jira] [Resolved] (FLINK-33598) Watch HA configmap via name instead of labels to reduce pressure on APIserver

2023-11-23 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang resolved FLINK-33598.
--
Resolution: Fixed

Merged
master: 608546e090f5d41c6a8b9af2c264467279181027 ... 
b7e8b792c086c3c445ee8429fbcfe035097a878c

release-1.18: 6f30c6e427251dd4b2e4ad03f89bed06a519b05f
release-1.17: 18d5a4696eccac3b5e7fe1d579547feef4537c08

> Watch HA configmap via name instead of labels to reduce pressure on APIserver 
> --
>
> Key: FLINK-33598
> URL: https://issues.apache.org/jira/browse/FLINK-33598
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes
>Affects Versions: 1.18.0, 1.17.1
>Reporter: Yun Tang
>Assignee: Yun Tang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.19.0, 1.18.1, 1.17.3
>
>
> As FLINK-24819 described, the k8s API server would receive more pressure when 
> HA is enabled, due to the configmap watching being achieved via filter with 
> labels instead of just querying the configmap name. This could be done after 
> FLINK-24038, which reduced the number of configmaps to only one as 
> {{-cluster-config-map}}.
> This ticket would not touch {{--config-map}}, which stores 
> the checkpoint information, as that configmap is directly accessed by JM and 
> not watched by taskmanagers.





[jira] [Commented] (FLINK-33149) Bump snappy-java to 1.1.10.4

2023-11-22 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788969#comment-17788969
 ] 

Yun Tang commented on FLINK-33149:
--

[~mapohl] When can we close this ticket? 

> Bump snappy-java to 1.1.10.4
> 
>
> Key: FLINK-33149
> URL: https://issues.apache.org/jira/browse/FLINK-33149
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Connectors / AWS, Connectors / HBase, 
> Connectors / Kafka, Stateful Functions
>Affects Versions: 1.18.0, 1.16.3, 1.17.2
>Reporter: Ryan Skraba
>Assignee: Ryan Skraba
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0, kafka-4.0.0, 1.16.3, 1.17.2
>
>
> Xerial published a security alert for a Denial of Service attack that [exists 
> on 
> 1.1.10.1|https://github.com/xerial/snappy-java/security/advisories/GHSA-55g7-9cwv-5qfv].
> This is included in flink-dist, but also in flink-statefun, and several 
> connectors.





[jira] [Updated] (FLINK-33598) Watch HA configmap via name instead of labels to reduce pressure on APIserver

2023-11-21 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-33598:
-
Description: 
As FLINK-24819 described, the k8s API server would receive more pressure when 
HA is enabled, due to the configmap watching being achieved via filter with 
labels instead of just querying the configmap name. This could be done after 
FLINK-24038, which reduced the number of configmaps to only one as 
{{-cluster-config-map}}.

This ticket would not touch {{--config-map}}, which stores 
the checkpoint information, as that configmap is directly accessed by JM and 
not watched by taskmanagers.

  was:
As FLINK-24819 described, the k8s API server would receive more pressure when 
HA is enabled, due to the configmap watching being achieved via filter with 
labels instead of just querying the configmap name. This could be done after 
FLINK-24038, which reduced the number of configmaps to only one as 
{{-cluster-config-map}}.

This ticket would not touch {{--config-map}}, which stores 
the checkpoint information, as that configmap is only used by JM and not 
watched by taskmanagers.


> Watch HA configmap via name instead of labels to reduce pressure on APIserver 
> --
>
> Key: FLINK-33598
> URL: https://issues.apache.org/jira/browse/FLINK-33598
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes
>Affects Versions: 1.18.0, 1.17.1
>Reporter: Yun Tang
>Assignee: Yun Tang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.19.0, 1.18.1, 1.17.3
>
>
> As FLINK-24819 described, the k8s API server would receive more pressure when 
> HA is enabled, due to the configmap watching being achieved via filter with 
> labels instead of just querying the configmap name. This could be done after 
> FLINK-24038, which reduced the number of configmaps to only one as 
> {{-cluster-config-map}}.
> This ticket would not touch {{--config-map}}, which stores 
> the checkpoint information, as that configmap is directly accessed by JM and 
> not watched by taskmanagers.





[jira] [Updated] (FLINK-33598) Watch HA configmap via name instead of labels to reduce pressure on APIserver

2023-11-21 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-33598:
-
Description: 
As FLINK-24819 described, the k8s API server would receive more pressure when 
HA is enabled, due to the configmap watching being achieved via filter with 
labels instead of just querying the configmap name. This could be done after 
FLINK-24038, which reduced the number of configmaps to only one as 
{{-cluster-config-map}}.

This ticket would not touch {{--config-map}}, which stores 
the checkpoint information, as that configmap is only used by JM and not 
watched by taskmanagers.

  was:As FLINK-24819 described, the k8s API server would receive more pressure 
when HA is enabled, due to the configmap watching being achieved via filter 
with labels instead of just querying the configmap name. This could be done 
after FLINK-24038, which reduced the number of configmaps to only one as 
{{-cluster-config-map}}.


> Watch HA configmap via name instead of labels to reduce pressure on APIserver 
> --
>
> Key: FLINK-33598
> URL: https://issues.apache.org/jira/browse/FLINK-33598
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes
>Affects Versions: 1.18.0, 1.17.1
>Reporter: Yun Tang
>Assignee: Yun Tang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.19.0, 1.18.1, 1.17.3
>
>
> As FLINK-24819 described, the k8s API server would receive more pressure when 
> HA is enabled, due to the configmap watching being achieved via filter with 
> labels instead of just querying the configmap name. This could be done after 
> FLINK-24038, which reduced the number of configmaps to only one as 
> {{-cluster-config-map}}.
> This ticket would not touch {{--config-map}}, which stores 
> the checkpoint information, as that configmap is only used by JM and not 
> watched by taskmanagers.





[jira] [Updated] (FLINK-33598) Watch HA configmap via name instead of labels to reduce pressure on APIserver

2023-11-21 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-33598:
-
Description: As FLINK-24819 described, the k8s API server would receive 
more pressure when HA is enabled, due to the configmap watching being achieved 
via filter with labels instead of just querying the configmap name. This could 
be done after FLINK-24038, which reduced the number of configmaps to only one 
as {{-cluster-config-map}}.  (was: As FLINK-24819 described, the k8s 
API server would receive more pressure when HA is enabled, due to the configmap 
watching being achieved via filter with labels instead of just querying the 
configmap name. This could be done after FLINK-24038, which reduced the number 
of configmaps to only one.)

> Watch HA configmap via name instead of labels to reduce pressure on APIserver 
> --
>
> Key: FLINK-33598
> URL: https://issues.apache.org/jira/browse/FLINK-33598
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes
>Affects Versions: 1.18.0, 1.17.1
>Reporter: Yun Tang
>Assignee: Yun Tang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.19.0, 1.18.1, 1.17.3
>
>
> As FLINK-24819 described, the k8s API server would receive more pressure when 
> HA is enabled, due to the configmap watching being achieved via filter with 
> labels instead of just querying the configmap name. This could be done after 
> FLINK-24038, which reduced the number of configmaps to only one as 
> {{-cluster-config-map}}.





[jira] [Created] (FLINK-33598) Watch HA configmap via name instead of labels to reduce pressure on APIserver

2023-11-20 Thread Yun Tang (Jira)
Yun Tang created FLINK-33598:


 Summary: Watch HA configmap via name instead of labels to reduce 
pressure on APIserver 
 Key: FLINK-33598
 URL: https://issues.apache.org/jira/browse/FLINK-33598
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / Kubernetes
Affects Versions: 1.17.1, 1.18.0
Reporter: Yun Tang
Assignee: Yun Tang
 Fix For: 1.19.0, 1.18.1, 1.17.3


As FLINK-24819 described, the k8s API server would receive more pressure when 
HA is enabled, due to the configmap watching being achieved via filter with 
labels instead of just querying the configmap name. This could be done after 
FLINK-24038, which reduced the number of configmaps to only one.





[jira] [Commented] (FLINK-33263) Implement ParallelismProvider for sources in the table planner

2023-11-15 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17786219#comment-17786219
 ] 

Yun Tang commented on FLINK-33263:
--

[~Zhanghao Chen] Thanks for the update. Looking forward to the PR.

> Implement ParallelismProvider for sources in the table planner
> --
>
> Key: FLINK-33263
> URL: https://issues.apache.org/jira/browse/FLINK-33263
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Reporter: Zhanghao Chen
>Priority: Major
>






[jira] [Comment Edited] (FLINK-33263) Implement ParallelismProvider for sources in Blink planner

2023-11-14 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17784448#comment-17784448
 ] 

Yun Tang edited comment on FLINK-33263 at 11/15/23 4:01 AM:


[~Zhanghao Chen] Do we still have a specific planner called the {{Blink}} 
planner? There is only one table planner now.


was (Author: yunta):
Do we still have some specific planner called {{Blink}} planner currently? 
There is only one table planner now.

> Implement ParallelismProvider for sources in Blink planner
> --
>
> Key: FLINK-33263
> URL: https://issues.apache.org/jira/browse/FLINK-33263
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Reporter: Zhanghao Chen
>Priority: Major
>






[jira] [Commented] (FLINK-33263) Implement ParallelismProvider for sources in Blink planner

2023-11-09 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17784448#comment-17784448
 ] 

Yun Tang commented on FLINK-33263:
--

Do we still have a specific planner called the {{Blink}} planner? There is 
only one table planner now.

> Implement ParallelismProvider for sources in Blink planner
> --
>
> Key: FLINK-33263
> URL: https://issues.apache.org/jira/browse/FLINK-33263
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Reporter: Zhanghao Chen
>Priority: Major
>






[jira] [Commented] (FLINK-20672) notifyCheckpointAborted RPC failure can fail JM

2023-11-08 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17784035#comment-17784035
 ] 

Yun Tang commented on FLINK-20672:
--

[~Zakelly] Thanks for the information. If so, I have another question: do we 
really need the {{io-executor}} to work with {{FatalExitExceptionHandler}}? 
From my point of view, if we fail to delete a savepoint correctly (deletion is 
also executed on the {{io-executor}}), do we really need to fail the whole 
JobManager?

If the correct behavior for the {{io-executor}}'s exception handler is not to 
exit fatally, I think we should correct that behavior first.
[~Zakelly], [~roman], [~srichter] WDYT?

> notifyCheckpointAborted RPC failure can fail JM
> ---
>
> Key: FLINK-20672
> URL: https://issues.apache.org/jira/browse/FLINK-20672
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.11.3, 1.12.0
>Reporter: Roman Khachatryan
>Assignee: Zakelly Lan
>Priority: Not a Priority
>  Labels: auto-deprioritized-major, auto-deprioritized-minor, 
> pull-request-available
>
> Introduced in FLINK-8871, aborted RPC notifications are done asynchronously:
>  
> {code}
>   private void sendAbortedMessages(long checkpointId, long timeStamp) {
>   // send notification of aborted checkpoints asynchronously.
>   executor.execute(() -> {
>   // send the "abort checkpoint" messages to necessary 
> vertices.
> // ..
>   });
>   }
> {code}
> However, the executor that eventually executes this request is created as 
> follows
> {code}
>   final ScheduledExecutorService futureExecutor = 
> Executors.newScheduledThreadPool(
>   Hardware.getNumberCPUCores(),
>   new ExecutorThreadFactory("jobmanager-future"));
> {code}
> ExecutorThreadFactory uses UncaughtExceptionHandler that exits JVM on error.
> cc: [~yunta]
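A minimal, pure-JDK sketch of the mechanism described above: a task that throws inside execute() kills its worker thread, and the thread's UncaughtExceptionHandler is invoked (in Flink's ExecutorThreadFactory that handler exits the JVM; here it just records the throwable). This uses a plain single-thread pool for simplicity; the class and thread names are hypothetical. Note that submit() would instead capture the exception in the returned Future.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class UncaughtHandlerDemo {

    /** Runs a throwing task on an executor whose worker threads record
     *  (rather than exit on) uncaught exceptions; returns what the handler saw. */
    static Throwable runThrowingTask() {
        AtomicReference<Throwable> caught = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        ThreadFactory factory = r -> {
            Thread t = new Thread(r, "jobmanager-future-demo");
            // Stand-in for Flink's FatalExitExceptionHandler: record instead of System.exit.
            t.setUncaughtExceptionHandler((thread, e) -> { caught.set(e); done.countDown(); });
            return t;
        };
        ExecutorService executor = Executors.newSingleThreadExecutor(factory);
        // execute() runs the Runnable directly on the worker thread, so the
        // exception escapes, kills the thread, and reaches the handler.
        executor.execute(() -> { throw new RuntimeException("notifyCheckpointAborted RPC failed"); });
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        executor.shutdownNow();
        return caught.get();
    }

    public static void main(String[] args) {
        System.out.println("handler saw: " + runThrowingTask());
    }
}
```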





[jira] [Comment Edited] (FLINK-20672) notifyCheckpointAborted RPC failure can fail JM

2023-11-07 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783868#comment-17783868
 ] 

Yun Tang edited comment on FLINK-20672 at 11/8/23 3:01 AM:
---

[~Zakelly] Thanks for picking up the stale tickets. However, I think this is 
no longer true after FLINK-23654 was resolved.


was (Author: yunta):
[~Zakelly] Thanks for picking up the stale tickets. However, I think this is 
not true after FLINK-20672 is resolved.

> notifyCheckpointAborted RPC failure can fail JM
> ---
>
> Key: FLINK-20672
> URL: https://issues.apache.org/jira/browse/FLINK-20672
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.11.3, 1.12.0
>Reporter: Roman Khachatryan
>Assignee: Zakelly Lan
>Priority: Not a Priority
>  Labels: auto-deprioritized-major, auto-deprioritized-minor, 
> pull-request-available
>
> Introduced in FLINK-8871, aborted RPC notifications are done asynchronously:
>  
> {code}
>   private void sendAbortedMessages(long checkpointId, long timeStamp) {
>   // send notification of aborted checkpoints asynchronously.
>   executor.execute(() -> {
>   // send the "abort checkpoint" messages to necessary 
> vertices.
> // ..
>   });
>   }
> {code}
> However, the executor that eventually executes this request is created as 
> follows
> {code}
>   final ScheduledExecutorService futureExecutor = 
> Executors.newScheduledThreadPool(
>   Hardware.getNumberCPUCores(),
>   new ExecutorThreadFactory("jobmanager-future"));
> {code}
> ExecutorThreadFactory uses UncaughtExceptionHandler that exits JVM on error.
> cc: [~yunta]





[jira] [Commented] (FLINK-20672) notifyCheckpointAborted RPC failure can fail JM

2023-11-07 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783868#comment-17783868
 ] 

Yun Tang commented on FLINK-20672:
--

[~Zakelly] Thanks for picking up the stale tickets. However, I think this is 
not true after FLINK-20672 is resolved.

> notifyCheckpointAborted RPC failure can fail JM
> ---
>
> Key: FLINK-20672
> URL: https://issues.apache.org/jira/browse/FLINK-20672
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.11.3, 1.12.0
>Reporter: Roman Khachatryan
>Assignee: Zakelly Lan
>Priority: Not a Priority
>  Labels: auto-deprioritized-major, auto-deprioritized-minor, 
> pull-request-available
>
> Introduced in FLINK-8871, aborted RPC notifications are done asynchronously:
>  
> {code}
>   private void sendAbortedMessages(long checkpointId, long timeStamp) {
>   // send notification of aborted checkpoints asynchronously.
>   executor.execute(() -> {
>   // send the "abort checkpoint" messages to necessary 
> vertices.
> // ..
>   });
>   }
> {code}
> However, the executor that eventually executes this request is created as 
> follows
> {code}
>   final ScheduledExecutorService futureExecutor = 
> Executors.newScheduledThreadPool(
>   Hardware.getNumberCPUCores(),
>   new ExecutorThreadFactory("jobmanager-future"));
> {code}
> ExecutorThreadFactory uses UncaughtExceptionHandler that exits JVM on error.
> cc: [~yunta]





[jira] [Resolved] (FLINK-33474) ShowPlan throws undefined exception In Flink Web Submit Page

2023-11-07 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang resolved FLINK-33474.
--
Fix Version/s: 1.17.2
   1.19.0
   1.18.1
   Resolution: Fixed

merged
master: 008e1916e8bbeb18c1d06c74e2797da5a439cd47
release-1.18: 2409184456aa2d07c5bbc580916370802fb3ae8e
release-1.17: 89cbd394a6cbfce1ca685362bf9ce4cf476bca7d


> ShowPlan throws undefined exception In Flink Web Submit Page
> 
>
> Key: FLINK-33474
> URL: https://issues.apache.org/jira/browse/FLINK-33474
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.19.0
>Reporter: Yu Chen
>Assignee: Yu Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.2, 1.19.0, 1.18.1
>
> Attachments: image-2023-11-07-13-53-08-216.png
>
>
> The exception is shown in the figure below; meanwhile, the job plan cannot be 
> displayed properly.
>  
> The root cause is that the dagreComponent is located in the nz-drawer and is 
> only loaded when the drawer is visible, so we need to wait for the drawer to 
> finish loading and then render the job plan.
> !image-2023-11-07-13-53-08-216.png|width=400,height=190!





[jira] [Assigned] (FLINK-33474) ShowPlan throws undefined exception In Flink Web Submit Page

2023-11-07 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang reassigned FLINK-33474:


Assignee: Yu Chen

> ShowPlan throws undefined exception In Flink Web Submit Page
> 
>
> Key: FLINK-33474
> URL: https://issues.apache.org/jira/browse/FLINK-33474
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.19.0
>Reporter: Yu Chen
>Assignee: Yu Chen
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-11-07-13-53-08-216.png
>
>
> The exception is shown in the figure below; meanwhile, the job plan cannot be 
> displayed properly.
>  
> The root cause is that the dagreComponent is located in the nz-drawer and is 
> only loaded when the drawer is visible, so we need to wait for the drawer to 
> finish loading and then render the job plan.
> !image-2023-11-07-13-53-08-216.png|width=400,height=190!





[jira] [Assigned] (FLINK-33433) Support invoke async-profiler on Jobmanager through REST API

2023-11-02 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang reassigned FLINK-33433:


Assignee: Yu Chen

> Support invoke async-profiler on Jobmanager through REST API
> 
>
> Key: FLINK-33433
> URL: https://issues.apache.org/jira/browse/FLINK-33433
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / REST
>Affects Versions: 1.19.0
>Reporter: Yu Chen
>Assignee: Yu Chen
>Priority: Major
>






[jira] [Assigned] (FLINK-33325) FLIP-375: Built-in cross-platform powerful java profiler

2023-11-02 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang reassigned FLINK-33325:


Assignee: Yu Chen

> FLIP-375: Built-in cross-platform powerful java profiler
> 
>
> Key: FLINK-33325
> URL: https://issues.apache.org/jira/browse/FLINK-33325
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / REST, Runtime / Web Frontend
>Affects Versions: 1.19.0
>Reporter: Yu Chen
>Assignee: Yu Chen
>Priority: Major
>
> This is an umbrella JIRA of 
> [FLIP-375|https://cwiki.apache.org/confluence/x/64lEE]





[jira] [Assigned] (FLINK-33436) Documentation on the built-in Profiler

2023-11-02 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang reassigned FLINK-33436:


Assignee: Yu Chen

> Documentation on the built-in Profiler
> --
>
> Key: FLINK-33436
> URL: https://issues.apache.org/jira/browse/FLINK-33436
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: Yu Chen
>Assignee: Yu Chen
>Priority: Major
>






[jira] [Assigned] (FLINK-33434) Support invoke async-profiler on Taskmanager through REST API

2023-11-02 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang reassigned FLINK-33434:


Assignee: Yu Chen

> Support invoke async-profiler on Taskmanager through REST API
> -
>
> Key: FLINK-33434
> URL: https://issues.apache.org/jira/browse/FLINK-33434
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / REST
>Affects Versions: 1.19.0
>Reporter: Yu Chen
>Assignee: Yu Chen
>Priority: Major
>






[jira] [Assigned] (FLINK-33435) The visualization and download capabilities of profiling history

2023-11-02 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang reassigned FLINK-33435:


Assignee: Yu Chen

> The visualization and download capabilities of profiling history 
> -
>
> Key: FLINK-33435
> URL: https://issues.apache.org/jira/browse/FLINK-33435
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Web Frontend
>Affects Versions: 1.19.0
>Reporter: Yu Chen
>Assignee: Yu Chen
>Priority: Major
>






[jira] [Commented] (FLINK-33355) can't reduce the parallelism from 'n' to '1' when recovering through a savepoint.

2023-10-25 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17779364#comment-17779364
 ] 

Yun Tang commented on FLINK-33355:
--

I think this is because you forgot to set a uid for each operator. Since the 
`windowAll` operator can only have parallelism 1, all operators chain together 
once you change the parallelism to 1. Please assign operator IDs as described in 
https://nightlies.apache.org/flink/flink-docs-stable/docs/ops/state/savepoints/#assigning-operator-ids
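The effect of explicit uids can be illustrated with a deliberately simplified model (this is not Flink's real StreamGraphHasher, and all names below are hypothetical): without a uid, an operator's id depends on the chained topology it sits in, so changing the chaining changes the id; with a uid, the id is stable.

```java
public class OperatorIdDemo {
    // Simplified model: auto-generated ids are derived from the chain the
    // operator belongs to; explicit uids depend on the uid alone.
    static String operatorId(String explicitUid, String chainSignature) {
        return explicitUid != null ? "uid:" + explicitUid : "auto:" + chainSignature;
    }

    public static void main(String[] args) {
        // At parallelism n, windowAll (parallelism 1) sits in its own chain;
        // at parallelism 1, everything chains into a single task.
        String autoAtN = operatorId(null, "source | windowAll");
        String autoAt1 = operatorId(null, "source->windowAll");
        String uidAtN  = operatorId("my-window", "source | windowAll");
        String uidAt1  = operatorId("my-window", "source->windowAll");
        System.out.println("auto id survives the rescale: " + autoAtN.equals(autoAt1));     // false
        System.out.println("explicit uid survives the rescale: " + uidAtN.equals(uidAt1));  // true
    }
}
```

In a real job this corresponds to calling `.uid("my-window")` on each stateful operator before taking the savepoint, as the linked documentation recommends.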

> can't reduce the parallelism from 'n' to '1' when recovering through a 
> savepoint.
> -
>
> Key: FLINK-33355
> URL: https://issues.apache.org/jira/browse/FLINK-33355
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
> Environment: flink 1.17.1
>Reporter: zhang
>Priority: Major
>
> If the program includes windowed operators, it is not possible to reduce 
> their parallelism from n to 1 when restarting from a 
> savepoint; it fails with the following error: 
> {code:java}
> //IllegalStateException: Failed to rollback to checkpoint/savepoint 
> Checkpoint Metadata. Max parallelism mismatch between checkpoint/savepoint 
> state and new program. Cannot map operator 0e059b9f403cf6f35592ab773c9408d4 
> with max parallelism 128 to new program with max parallelism 1. This 
> indicates that the program has been changed in a non-compatible way after the 
> checkpoint/savepoint. {code}





[jira] [Commented] (FLINK-33355) can't reduce the parallelism from 'n' to '1' when recovering through a savepoint.

2023-10-25 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17779358#comment-17779358
 ] 

Yun Tang commented on FLINK-33355:
--

[~edmond_j] How did you set the parallelism? Via the `parallelism.default` 
configuration option?



[jira] [Commented] (FLINK-33355) can't reduce the parallelism from 'n' to '1' when recovering through a savepoint.

2023-10-24 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17779340#comment-17779340
 ] 

Yun Tang commented on FLINK-33355:
--

[~edmond_j] could you please share the code to reproduce this problem?



[jira] [Commented] (FLINK-33355) can't reduce the parallelism from 'n' to '1' when recovering through a savepoint.

2023-10-24 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17779315#comment-17779315
 ] 

Yun Tang commented on FLINK-33355:
--

Changing the max parallelism (as opposed to the parallelism) breaks 
checkpoint compatibility, and this is by design. You can refer to 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution/parallel/#setting-the-maximum-parallelism
 for more details.
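The incompatibility can be sketched with a simplified Python model of Flink's key-group assignment (based on the logic in KeyGroupRangeAssignment; the real implementation additionally murmur-hashes the key hash, which is omitted here):

```python
def key_group_for_key(key_hash: int, max_parallelism: int) -> int:
    # Each key is permanently assigned to one of max_parallelism key groups,
    # and keyed state is stored per key group in checkpoints/savepoints.
    return key_hash % max_parallelism

def operator_index_for_key_group(max_parallelism: int, parallelism: int,
                                 key_group: int) -> int:
    # Key groups are split into contiguous ranges over the parallel instances,
    # which is what makes rescaling *within* one max parallelism cheap.
    return key_group * parallelism // max_parallelism

# With max parallelism 128, a savepoint contains state for key groups 0..127.
groups_in_savepoint = {key_group_for_key(h, 128) for h in range(1000)}
assert groups_in_savepoint == set(range(128))

# A job restored with max parallelism 1 only knows key group 0, so the state
# for groups 1..127 has no target: hence the "max parallelism mismatch" error.
groups_in_new_job = {key_group_for_key(h, 1) for h in range(1000)}
assert groups_in_new_job == {0}
```

Changing the parallelism only moves key-group ranges between instances; changing the max parallelism changes the key-to-group assignment itself, so the stored state can no longer be mapped.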



[jira] [Closed] (FLINK-33355) can't reduce the parallelism from 'n' to '1' when recovering through a savepoint.

2023-10-24 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang closed FLINK-33355.

Resolution: Information Provided


