[jira] [Comment Edited] (FLINK-32953) [State TTL]resolve data correctness problem after ttl was changed

2023-09-11 Thread Jinzhong Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764016#comment-17764016
 ] 

Jinzhong Li edited comment on FLINK-32953 at 9/12/23 5:39 AM:
--

[~masteryhx]  Currently, the stateBackend layer 
(HeapStateBackend/RocksDBStateBackend) doesn't contain any information about 
the state TTL. To solve this problem, I think we have to pass a TTL-state 
restore transformer from TtlStateFactory to the StateBackend, similar to 
[StateSnapshotTransformer|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/state/StateSnapshotTransformer.java].

 

And this requires changing the interface 
KeyedStateFactory#createOrUpdateInternalState (see my [draft 
commit|https://github.com/ljz2051/flink/commit/7a058fd0f9d34a8fb42254a7ca1d6f32419c69e1]).
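
For illustration, such a restore-time hook could mirror the shape of the 
snapshot-time transformer. A minimal sketch, assuming a hypothetical interface 
name (nothing like this exists in the stateBackend layer today):

{code:java}
import javax.annotation.Nullable;

/**
 * Hypothetical restore-time analogue of StateSnapshotTransformer; the name and
 * shape here are assumptions for illustration, not an existing Flink interface.
 */
public interface StateRestoreTransformer<T> {

    /**
     * Called for each state entry read back from a snapshot. Returning null
     * drops the entry (e.g. because it is already expired under the old TTL);
     * otherwise the returned value is written into the restored state.
     */
    @Nullable
    T filterOrTransformOnRestore(@Nullable T value);
}
{code}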

 

I think that fixing this corner case at the cost of changing a general 
interface of the stateBackend layer may not be worthwhile. But at least, I 
think we should alert users to this situation in the documentation.

 

WDYT? [~masteryhx] 

 


was (Author: lijinzhong):
[~masteryhx]  Currently, the stateBackend layer 
(HeapStateBackend/RocksDBStateBackend) doesn't contain any information about 
the state TTL. To solve this problem, I think we have to pass a TTL-state 
restore transformer from TtlStateFactory to the StateBackend, similar to 
[StateSnapshotTransformer|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/state/StateSnapshotTransformer.java].
  

 

And this requires changing the interface 
KeyedStateFactory#createOrUpdateInternalState (see my [draft 
commit|https://github.com/ljz2051/flink/commit/7a058fd0f9d34a8fb42254a7ca1d6f32419c69e1]).

 

I think that fixing this corner case at the cost of changing a general 
interface of the stateBackend layer may not be worthwhile. But at least, I 
think we should alert users to this situation in the documentation.

 

WDYT? [~masteryhx] 

 

> [State TTL]resolve data correctness problem after ttl was changed 
> --
>
> Key: FLINK-32953
> URL: https://issues.apache.org/jira/browse/FLINK-32953
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Reporter: Jinzhong Li
>Assignee: Jinzhong Li
>Priority: Major
>
> Because expired data is cleaned up in the background on a best-effort basis 
> (the hashmap backend uses the INCREMENTAL_CLEANUP strategy, rocksdb uses the 
> ROCKSDB_COMPACTION_FILTER strategy), some expired state is often persisted 
> into snapshots.
>  
> In some scenarios, a user changes the state TTL of the job and then restores 
> the job from the old state. If the user adjusts the state TTL from a short 
> value to a long value (e.g., from 12 hours to 24 hours), some expired data 
> that was not cleaned up will be alive again after the restore. Obviously this 
> is unreasonable, and may break data regulatory requirements. 
>  
> In particular, the rocksdb stateBackend may cause data correctness problems 
> due to level compaction in this case (e.g., one key has two versions, at 
> level-1 and level-2, both of which are TTL-expired. Then the level-1 version 
> is cleaned up by compaction, and the level-2 version isn't. If we adjust the 
> state TTL and restart the job, the stale data at level-2 will become valid 
> again after the restore.)
>  
> To solve this problem, I think we can
> 1) persist the old state TTL into the snapshot meta info (e.g. 
> RegisteredKeyValueStateBackendMetaInfo or others);
> 2) during state restore, compare the current TTL with the old TTL;
> 3) if the current TTL is longer than the old TTL, iterate over all data, 
> filter out data that is expired under the old TTL, and write the valid data 
> back into the stateBackend.
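
For reference, the two background cleanup strategies named in the description 
are enabled through StateTtlConfig. A minimal sketch of that configuration 
(standard Flink API as of 1.17; the TTL value and tuning numbers are arbitrary):

{code:java}
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

StateTtlConfig ttlConfig =
        StateTtlConfig.newBuilder(Time.hours(12))
                // INCREMENTAL_CLEANUP (hashmap backend): check 10 entries per
                // state access, without an extra pass per processed record.
                .cleanupIncrementally(10, false)
                // ROCKSDB_COMPACTION_FILTER (rocksdb backend): refresh the
                // timestamp used for expiration checks every 1000 entries.
                .cleanupInRocksdbCompactFilter(1000L)
                .build();

ValueStateDescriptor<String> descriptor =
        new ValueStateDescriptor<>("my-state", String.class);
descriptor.enableTimeToLive(ttlConfig);
{code}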



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] flinkbot commented on pull request #23399: [FLINK-33061][docs] Translate failure-enricher documentation to Chinese

2023-09-11 Thread via GitHub


flinkbot commented on PR #23399:
URL: https://github.com/apache/flink/pull/23399#issuecomment-1715000130

   
   ## CI report:
   
   * f9a5b318b86856634929075d2b3c37820a858e45 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] wangzzu commented on pull request #23399: [FLINK-33061][docs] Translate failure-enricher documentation to Chinese

2023-09-11 Thread via GitHub


wangzzu commented on PR #23399:
URL: https://github.com/apache/flink/pull/23399#issuecomment-1714996308

   cc @pgaref @huwh 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-33061) Translate failure-enricher documentation to Chinese

2023-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33061:
---
Labels: pull-request-available  (was: )

> Translate failure-enricher documentation to Chinese
> ---
>
> Key: FLINK-33061
> URL: https://issues.apache.org/jira/browse/FLINK-33061
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Matt Wang
>Priority: Not a Priority
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] wangzzu opened a new pull request, #23399: [FLINK-33061][docs] Translate failure-enricher documentation to Chinese

2023-09-11 Thread via GitHub


wangzzu opened a new pull request, #23399:
URL: https://github.com/apache/flink/pull/23399

   
   
   ## What is the purpose of the change
   
   - Translate failure-enricher documentation to Chinese
   
   ## Brief change log
   
   - Translate failure-enricher documentation to Chinese
   
   ## Verifying this change
   
   verified this locally
   
   
![image](https://github.com/apache/flink/assets/7333163/460ee6b0-59b3-45c7-884b-fd1ade007590)
   
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? docs
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] 1996fanrui commented on a diff in pull request #23391: [FLINK-33071][metrics,checkpointing] Log a json dump of checkpoint statistics when checkpoint completes or fails

2023-09-11 Thread via GitHub


1996fanrui commented on code in PR #23391:
URL: https://github.com/apache/flink/pull/23391#discussion_r1322399618


##
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointStatsTracker.java:
##
@@ -92,10 +113,18 @@ public class CheckpointStatsTracker {
  * progress ones.
  * @param metricGroup Metric group for exposed metrics
  */
+public CheckpointStatsTracker(int numRememberedCheckpoints, JobManagerJobMetricGroup metricGroup) {
+    this(numRememberedCheckpoints, metricGroup, metricGroup.jobId(), metricGroup.jobName());
+}
 public CheckpointStatsTracker(int numRememberedCheckpoints, MetricGroup metricGroup) {

Review Comment:
   Checkstyle fails
   
   
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53115=logs=52b61abe-a3cc-5bde-cc35-1bbe89bb7df5=54421a62-0c80-5aad-3319-094ff69180bb=5836



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-32988) HiveITCase failed due to TestContainer not coming up

2023-09-11 Thread Shengkai Fang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shengkai Fang updated FLINK-32988:
--
Affects Version/s: (was: 1.16.2)
   (was: 1.17.1)

> HiveITCase failed due to TestContainer not coming up
> 
>
> Key: FLINK-32988
> URL: https://issues.apache.org/jira/browse/FLINK-32988
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Affects Versions: 1.18.0, 1.19.0
>Reporter: Matthias Pohl
>Assignee: Shengkai Fang
>Priority: Critical
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=52740=logs=87489130-75dc-54e4-1f45-80c30aa367a3=efbee0b1-38ac-597d-6466-1ea8fc908c50=15866
> {code}
> Aug 29 02:47:56 org.testcontainers.containers.ContainerLaunchException: 
> Container startup failed for image prestodb/hive3.1-hive:10
> Aug 29 02:47:56   at 
> org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:349)
> Aug 29 02:47:56   at 
> org.apache.flink.tests.hive.containers.HiveContainer.doStart(HiveContainer.java:81)
> Aug 29 02:47:56   at 
> org.testcontainers.containers.GenericContainer.start(GenericContainer.java:322)
> Aug 29 02:47:56   at 
> org.testcontainers.containers.GenericContainer.starting(GenericContainer.java:1131)
> Aug 29 02:47:56   at 
> org.testcontainers.containers.FailureDetectingExternalResource$1.evaluate(FailureDetectingExternalResource.java:28)
> Aug 29 02:47:56   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> Aug 29 02:47:56   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> Aug 29 02:47:56   at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Aug 29 02:47:56   at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> Aug 29 02:47:56   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> Aug 29 02:47:56   at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
> Aug 29 02:47:56   at 
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42)
> Aug 29 02:47:56   at 
> org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80)
> Aug 29 02:47:56   at 
> org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:72)
> Aug 29 02:47:56   at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:147)
> Aug 29 02:47:56   at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:127)
> Aug 29 02:47:56   at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:90)
> Aug 29 02:47:56   at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:55)
> Aug 29 02:47:56   at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:102)
> Aug 29 02:47:56   at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:54)
> Aug 29 02:47:56   at 
> org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:114)
> Aug 29 02:47:56   at 
> org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:86)
> Aug 29 02:47:56   at 
> org.junit.platform.launcher.core.DefaultLauncherSession$DelegatingLauncher.execute(DefaultLauncherSession.java:86)
> Aug 29 02:47:56   at 
> org.junit.platform.launcher.core.SessionPerRequestLauncher.execute(SessionPerRequestLauncher.java:53)
> Aug 29 02:47:56   at 
> org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.execute(JUnitPlatformProvider.java:188)
> Aug 29 02:47:56   at 
> org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invokeAllTests(JUnitPlatformProvider.java:154)
> Aug 29 02:47:56   at 
> org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invoke(JUnitPlatformProvider.java:128)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32877) Support for HTTP connect and timeout options while writes in GCS connector

2023-09-11 Thread Jayadeep Jayaraman (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764005#comment-17764005
 ] 

Jayadeep Jayaraman commented on FLINK-32877:


[~martijnvisser] - can you please take a look?

> Support for HTTP connect and timeout options while writes in GCS connector
> --
>
> Key: FLINK-32877
> URL: https://issues.apache.org/jira/browse/FLINK-32877
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / FileSystem
>Affects Versions: 1.15.3, 1.16.2, 1.17.1
>Reporter: Jayadeep Jayaraman
>Priority: Major
>  Labels: pull-request-available
>
> The current GCS connector uses the GCS Java storage library and bypasses the 
> Hadoop GCS connector, which supports multiple HTTP options. There are 
> situations where GCS takes longer to respond to a PUT operation than the 
> default timeout allows.
> This change will allow users to customize their connect and read timeouts 
> based on their application.
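
If such options are introduced, usage could look roughly like the sketch below; 
the key names are purely illustrative, since the ticket proposes the capability 
without fixing any option names:

{code:java}
import org.apache.flink.configuration.Configuration;

Configuration conf = new Configuration();
// Hypothetical placeholder keys, not existing Flink options.
conf.setString("gs.http.connect-timeout", "60 s");
conf.setString("gs.http.read-timeout", "60 s");
{code}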



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] ljz2051 commented on a diff in pull request #23376: [FLINK-14032][state]Make the cache size of RocksDBPriorityQueueSetFactory configurable

2023-09-11 Thread via GitHub


ljz2051 commented on code in PR #23376:
URL: https://github.com/apache/flink/pull/23376#discussion_r1322334957


##
flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBOptions.java:
##
@@ -62,6 +63,16 @@ public class RocksDBOptions {
                 .withDescription(
                         "This determines the factory for timer service state implementation.");
 
+    @Documentation.Section(Documentation.Sections.STATE_BACKEND_ROCKSDB)
+    public static final ConfigOption<Integer> ROCKSDB_TIMER_SERVICE_FACTORY_CACHE_SIZE =
+            ConfigOptions.key("state.backend.rocksdb.timer-service.cache-size")
+                    .intType()
+                    .defaultValue(DEFAULT_CACHES_SIZE)
+                    .withDescription(
+                            String.format(
+                                    "The cache size for rocksdb timer service factory. This option only has an effect when '%s' is configured to '%s'",

Review Comment:
   Good idea. I have added a simple description of this.
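
   Assuming the option lands as shown in the diff above, enabling it would look 
like the following sketch (the key name is taken from the diff; the PR is still 
open, so it may change before merge):

{code:java}
import org.apache.flink.configuration.Configuration;

Configuration conf = new Configuration();
// Only effective when the timer service factory is set to ROCKSDB,
// per the option description in the diff above.
conf.setString("state.backend.rocksdb.timer-service.factory", "ROCKSDB");
conf.setInteger("state.backend.rocksdb.timer-service.cache-size", 256);
{code}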



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] ljz2051 commented on a diff in pull request #23376: [FLINK-14032][state]Make the cache size of RocksDBPriorityQueueSetFactory configurable

2023-09-11 Thread via GitHub


ljz2051 commented on code in PR #23376:
URL: https://github.com/apache/flink/pull/23376#discussion_r1322334666


##
flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackendBuilder.java:
##
@@ -126,6 +129,7 @@ public class RocksDBKeyedStateBackendBuilder<K> extends AbstractKeyedStateBackendBuilder<K>
 private RocksDBStateUploader injectRocksDBStateUploader; // for testing
 
 public RocksDBKeyedStateBackendBuilder(
+ReadableConfig configuration,

Review Comment:
   In the latest commit, I introduced a separate config class 
RocksDBPriorityQueueConfig, which contains the PriorityQueueStateType and the 
cache size. A sketch of that class follows below.
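
A minimal sketch of what such a holder class could look like, inferred only 
from this comment (names and fields are assumptions; the class in the actual 
commit may differ):

{code:java}
import java.io.Serializable;

import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend.PriorityQueueStateType;

public class RocksDBPriorityQueueConfig implements Serializable {

    private final PriorityQueueStateType priorityQueueStateType;
    private final int cacheSize;

    public RocksDBPriorityQueueConfig(PriorityQueueStateType type, int cacheSize) {
        this.priorityQueueStateType = type;
        this.cacheSize = cacheSize;
    }

    public PriorityQueueStateType getPriorityQueueStateType() {
        return priorityQueueStateType;
    }

    public int getCacheSize() {
        return cacheSize;
    }
}
{code}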



##
flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBOptions.java:
##
@@ -62,6 +63,16 @@ public class RocksDBOptions {
                 .withDescription(
                         "This determines the factory for timer service state implementation.");
 
+@Documentation.Section(Documentation.Sections.STATE_BACKEND_ROCKSDB)

Review Comment:
   done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] ljz2051 commented on a diff in pull request #23376: [FLINK-14032][state]Make the cache size of RocksDBPriorityQueueSetFactory configurable

2023-09-11 Thread via GitHub


ljz2051 commented on code in PR #23376:
URL: https://github.com/apache/flink/pull/23376#discussion_r1322333661


##
flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBPriorityQueueSetFactory.java:
##
@@ -52,7 +52,9 @@
 public class RocksDBPriorityQueueSetFactory implements PriorityQueueSetFactory {
 
     /** Default cache size per key-group. */
-    @VisibleForTesting static final int DEFAULT_CACHES_SIZE = 128; // TODO make this configurable
+@VisibleForTesting static final int DEFAULT_CACHES_SIZE = 128;
+
+private final int cacheSize;

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-14032) Make the cache size of RocksDBPriorityQueueSetFactory configurable

2023-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-14032:
---
Labels: auto-unassigned pull-request-available usability  (was: 
auto-unassigned usability)

> Make the cache size of RocksDBPriorityQueueSetFactory configurable
> --
>
> Key: FLINK-14032
> URL: https://issues.apache.org/jira/browse/FLINK-14032
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Reporter: Yun Tang
>Assignee: Jinzhong Li
>Priority: Major
>  Labels: auto-unassigned, pull-request-available, usability
> Fix For: 1.18.0
>
>
> Currently, the cache size of {{RocksDBPriorityQueueSetFactory}} is hard-coded 
> to 128, with no way to configure it to another value. (We could increase this 
> to obtain better performance if necessary.) Actually, this has also been a 
> TODO for quite a long time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] ljz2051 commented on pull request #23376: [FLINK-14032][state]Make the cache size of RocksDBPriorityQueueSetFactory configurable

2023-09-11 Thread via GitHub


ljz2051 commented on PR #23376:
URL: https://github.com/apache/flink/pull/23376#issuecomment-1714932710

   > Thanks for the pr. PTAL my comments. And you should also provide a UT Case 
to verify reading the correct config.
   
   @masteryhx  Thanks for the comments. In the latest commit, I added the UT 
RocksDBStateBackendConfigTest#testConfigureRocksDBPriorityQueueFactoryCacheSize 
to verify the cacheSize config.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-32978) Deprecate RichFunction#open(Configuration parameters)

2023-09-11 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-32978.

Resolution: Done

> Deprecate RichFunction#open(Configuration parameters)
> -
>
> Key: FLINK-32978
> URL: https://issues.apache.org/jira/browse/FLINK-32978
> Project: Flink
>  Issue Type: Technical Debt
>  Components: API / Core
>Affects Versions: 1.19.0
>Reporter: Wencong Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> The 
> [FLIP-344|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425231]
>  has decided that the parameter in RichFunction#open will be removed in the 
> next major version. We should deprecate it now and remove it in Flink 2.0. 
> The removal will be tracked in 
> [FLINK-6912|https://issues.apache.org/jira/browse/FLINK-6912].
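
Schematically, the change boils down to a deprecation marker on the existing 
signature. A trimmed sketch of the relevant slice of RichFunction (the 
replacement signature comes from FLIP-344, not from this thread):

{code:java}
import org.apache.flink.api.common.functions.Function;
import org.apache.flink.configuration.Configuration;

public interface RichFunction extends Function {

    /**
     * Deprecated by this ticket and slated for removal in Flink 2.0
     * (FLINK-6912). FLIP-344 introduces a Configuration-free replacement,
     * e.g. open(OpenContext); that is an assumption based on the FLIP and
     * is not shown in this thread.
     */
    @Deprecated
    void open(Configuration parameters) throws Exception;
}
{code}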



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32978) Deprecate RichFunction#open(Configuration parameters)

2023-09-11 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763999#comment-17763999
 ] 

Xintong Song commented on FLINK-32978:
--

master (1.19): e9353319ad625baa5b2c20fa709ab5b23f83c0f4

> Deprecate RichFunction#open(Configuration parameters)
> -
>
> Key: FLINK-32978
> URL: https://issues.apache.org/jira/browse/FLINK-32978
> Project: Flink
>  Issue Type: Technical Debt
>  Components: API / Core
>Affects Versions: 1.19.0
>Reporter: Wencong Liu
>Assignee: Wencong Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> The 
> [FLIP-344|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425231]
>  has decided that the parameter in RichFunction#open will be removed in the 
> next major version. We should deprecate it now and remove it in Flink 2.0. 
> The removal will be tracked in 
> [FLINK-6912|https://issues.apache.org/jira/browse/FLINK-6912].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-32978) Deprecate RichFunction#open(Configuration parameters)

2023-09-11 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reassigned FLINK-32978:


Assignee: Wencong Liu

> Deprecate RichFunction#open(Configuration parameters)
> -
>
> Key: FLINK-32978
> URL: https://issues.apache.org/jira/browse/FLINK-32978
> Project: Flink
>  Issue Type: Technical Debt
>  Components: API / Core
>Affects Versions: 1.19.0
>Reporter: Wencong Liu
>Assignee: Wencong Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> The 
> [FLIP-344|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425231]
>  has decided that the parameter in RichFunction#open will be removed in the 
> next major version. We should deprecate it now and remove it in Flink 2.0. 
> The removal will be tracked in 
> [FLINK-6912|https://issues.apache.org/jira/browse/FLINK-6912].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] xintongsong closed pull request #23058: [FLINK-32978] Deprecate RichFunction#open(Configuration parameters)

2023-09-11 Thread via GitHub


xintongsong closed pull request #23058: [FLINK-32978] Deprecate 
RichFunction#open(Configuration parameters)
URL: https://github.com/apache/flink/pull/23058


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-32870) Reading multiple small buffers by reading and slicing one large buffer for tiered storage

2023-09-11 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-32870.

Fix Version/s: 1.18.0
   Resolution: Fixed

- master(1.19): ade583cf80478ac60e9b83fc0a97f36bc2b26f1c
- release-1.18: d100ab65367fe0b3d74a9901bcaaa26049c996b0

> Reading multiple small buffers by reading and slicing one large buffer for 
> tiered storage
> -
>
> Key: FLINK-32870
> URL: https://issues.apache.org/jira/browse/FLINK-32870
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.18.0
>Reporter: Yuxin Tan
>Assignee: Yuxin Tan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> Currently, when the file reader of tiered storage loads data from the disk 
> file, it reads data at buffer granularity. Before compression, each buffer is 
> 32K by default. After compression, the size becomes smaller (possibly less 
> than 5K), which is pretty small for the network buffer and the file IO. 
> We should read multiple small buffers by reading and slicing one large buffer 
> to decrease buffer competition and file IO, leading to better performance.
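
The proposal is the classic read-one-large-chunk-then-slice pattern. A 
self-contained sketch of that pattern in plain NIO (illustrative only, not the 
actual tiered storage code; the length-prefix framing is an assumption):

{code:java}
import java.nio.ByteBuffer;

public class SliceDemo {

    // Decode a length-prefixed stream by slicing one large buffer into
    // zero-copy views, instead of issuing one small read per logical buffer.
    static void processAll(ByteBuffer large) {
        while (large.remaining() >= Integer.BYTES) {
            int len = large.getInt();          // hypothetical length prefix
            if (len > large.remaining()) {
                break;                         // incomplete trailing buffer
            }
            ByteBuffer slice = large.slice();  // zero-copy view on the payload
            slice.limit(len);
            large.position(large.position() + len);
            process(slice);
        }
    }

    static void process(ByteBuffer buffer) {
        // consume one logical small buffer
    }
}
{code}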



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] xintongsong closed pull request #23255: [FLINK-32870][network] Tiered storage supports reading multiple small buffers by reading and slicing one large buffer

2023-09-11 Thread via GitHub


xintongsong closed pull request #23255: [FLINK-32870][network] Tiered storage 
supports reading multiple small buffers by reading and slicing one large buffer
URL: https://github.com/apache/flink/pull/23255


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (FLINK-33011) Operator deletes HA data unexpectedly

2023-09-11 Thread Gyula Fora (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora reassigned FLINK-33011:
--

Assignee: Gyula Fora

> Operator deletes HA data unexpectedly
> -
>
> Key: FLINK-33011
> URL: https://issues.apache.org/jira/browse/FLINK-33011
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: 1.17.1, kubernetes-operator-1.6.0
> Environment: Flink: 1.17.1
> Flink Kubernetes Operator: 1.6.0
>Reporter: Ruibin Xing
>Assignee: Gyula Fora
>Priority: Critical
> Attachments: flink_operator_logs_0831.csv
>
>
> We encountered a problem where the operator unexpectedly deleted HA data.
> The timeline is as follows:
> 12:08 We submitted the first spec, which suspended the job with savepoint 
> upgrade mode.
> 12:08 The job was suspended, while the HA data was preserved, and the log 
> showed the observed job deployment status was MISSING.
> 12:10 We submitted the second spec, which deployed the job with the last 
> state upgrade mode.
> 12:10 Logs showed the operator deleted both the Flink deployment and the HA 
> data again.
> 12:10 The job failed to start because the HA data was missing.
> According to the log, the deletion was triggered by 
> https://github.com/apache/flink-kubernetes-operator/blob/a728ba768e20236184e2b9e9e45163304b8b196c/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/ApplicationReconciler.java#L168
> I think this would only be triggered if the job deployment status wasn't 
> MISSING. But the log before the deletion showed the observed job status was 
> MISSING at that moment.
> Related logs:
>  
> {code:java}
> 2023-08-30 12:08:48.190 + o.a.f.k.o.s.AbstractFlinkService [INFO 
> ][default/pipeline-pipeline-se-3] Cluster shutdown completed.
> 2023-08-30 12:10:27.010 + o.a.f.k.o.o.d.ApplicationObserver [INFO 
> ][default/pipeline-pipeline-se-3] Observing JobManager deployment. Previous 
> status: MISSING
> 2023-08-30 12:10:27.533 + o.a.f.k.o.l.AuditUtils         [INFO 
> ][default/pipeline-pipeline-se-3] >>> Event  | Info    | SPECCHANGED     | 
> UPGRADE change(s) detected (Diff: FlinkDeploymentSpec[image : 
> docker-registry.randomcompany.com/octopus/pipeline-pipeline-online:0835137c-362
>  -> 
> docker-registry.randomcompany.com/octopus/pipeline-pipeline-online:23db7ae8-365,
>  podTemplate.metadata.labels.app.kubernetes.io~1version : 
> 0835137cd803b7258695eb53a6ec520cb62a48a7 -> 
> 23db7ae84bdab8d91fa527fe2f8f2fce292d0abc, job.state : suspended -> running, 
> job.upgradeMode : last-state -> savepoint, restartNonce : 1545 -> 1547]), 
> starting reconciliation.
> 2023-08-30 12:10:27.679 + o.a.f.k.o.s.NativeFlinkService [INFO 
> ][default/pipeline-pipeline-se-3] Deleting JobManager deployment and HA 
> metadata.
> {code}
> A more complete log file is attached. Thanks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33011) Operator deletes HA data unexpectedly

2023-09-11 Thread Gyula Fora (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora updated FLINK-33011:
---
Priority: Blocker  (was: Critical)

> Operator deletes HA data unexpectedly
> -
>
> Key: FLINK-33011
> URL: https://issues.apache.org/jira/browse/FLINK-33011
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: 1.17.1, kubernetes-operator-1.6.0
> Environment: Flink: 1.17.1
> Flink Kubernetes Operator: 1.6.0
>Reporter: Ruibin Xing
>Assignee: Gyula Fora
>Priority: Blocker
> Attachments: flink_operator_logs_0831.csv
>
>
> We encountered a problem where the operator unexpectedly deleted HA data.
> The timeline is as follows:
> 12:08 We submitted the first spec, which suspended the job with savepoint 
> upgrade mode.
> 12:08 The job was suspended, while the HA data was preserved, and the log 
> showed the observed job deployment status was MISSING.
> 12:10 We submitted the second spec, which deployed the job with the last 
> state upgrade mode.
> 12:10 Logs showed the operator deleted both the Flink deployment and the HA 
> data again.
> 12:10 The job failed to start because the HA data was missing.
> According to the log, the deletion was triggered by 
> https://github.com/apache/flink-kubernetes-operator/blob/a728ba768e20236184e2b9e9e45163304b8b196c/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/ApplicationReconciler.java#L168
> I think this would only be triggered if the job deployment status wasn't 
> MISSING. But the log before the deletion showed the observed job status was 
> MISSING at that moment.
> Related logs:
>  
> {code:java}
> 2023-08-30 12:08:48.190 + o.a.f.k.o.s.AbstractFlinkService [INFO 
> ][default/pipeline-pipeline-se-3] Cluster shutdown completed.
> 2023-08-30 12:10:27.010 + o.a.f.k.o.o.d.ApplicationObserver [INFO 
> ][default/pipeline-pipeline-se-3] Observing JobManager deployment. Previous 
> status: MISSING
> 2023-08-30 12:10:27.533 + o.a.f.k.o.l.AuditUtils         [INFO 
> ][default/pipeline-pipeline-se-3] >>> Event  | Info    | SPECCHANGED     | 
> UPGRADE change(s) detected (Diff: FlinkDeploymentSpec[image : 
> docker-registry.randomcompany.com/octopus/pipeline-pipeline-online:0835137c-362
>  -> 
> docker-registry.randomcompany.com/octopus/pipeline-pipeline-online:23db7ae8-365,
>  podTemplate.metadata.labels.app.kubernetes.io~1version : 
> 0835137cd803b7258695eb53a6ec520cb62a48a7 -> 
> 23db7ae84bdab8d91fa527fe2f8f2fce292d0abc, job.state : suspended -> running, 
> job.upgradeMode : last-state -> savepoint, restartNonce : 1545 -> 1547]), 
> starting reconciliation.
> 2023-08-30 12:10:27.679 + o.a.f.k.o.s.NativeFlinkService [INFO 
> ][default/pipeline-pipeline-se-3] Deleting JobManager deployment and HA 
> metadata.
> {code}
> A more complete log file is attached. Thanks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33011) Operator deletes HA data unexpectedly

2023-09-11 Thread Gyula Fora (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora updated FLINK-33011:
---
Priority: Critical  (was: Major)

> Operator deletes HA data unexpectedly
> -
>
> Key: FLINK-33011
> URL: https://issues.apache.org/jira/browse/FLINK-33011
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: 1.17.1, kubernetes-operator-1.6.0
> Environment: Flink: 1.17.1
> Flink Kubernetes Operator: 1.6.0
>Reporter: Ruibin Xing
>Priority: Critical
> Attachments: flink_operator_logs_0831.csv
>
>
> We encountered a problem where the operator unexpectedly deleted HA data.
> The timeline is as follows:
> 12:08 We submitted the first spec, which suspended the job with savepoint 
> upgrade mode.
> 12:08 The job was suspended, while the HA data was preserved, and the log 
> showed the observed job deployment status was MISSING.
> 12:10 We submitted the second spec, which deployed the job with the last 
> state upgrade mode.
> 12:10 Logs showed the operator deleted both the Flink deployment and the HA 
> data again.
> 12:10 The job failed to start because the HA data was missing.
> According to the log, the deletion was triggered by 
> https://github.com/apache/flink-kubernetes-operator/blob/a728ba768e20236184e2b9e9e45163304b8b196c/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/ApplicationReconciler.java#L168
> I think this would only be triggered if the job deployment status wasn't 
> MISSING. But the log before the deletion showed the observed job status was 
> MISSING at that moment.
> Related logs:
>  
> {code:java}
> 2023-08-30 12:08:48.190 + o.a.f.k.o.s.AbstractFlinkService [INFO 
> ][default/pipeline-pipeline-se-3] Cluster shutdown completed.
> 2023-08-30 12:10:27.010 + o.a.f.k.o.o.d.ApplicationObserver [INFO 
> ][default/pipeline-pipeline-se-3] Observing JobManager deployment. Previous 
> status: MISSING
> 2023-08-30 12:10:27.533 + o.a.f.k.o.l.AuditUtils         [INFO 
> ][default/pipeline-pipeline-se-3] >>> Event  | Info    | SPECCHANGED     | 
> UPGRADE change(s) detected (Diff: FlinkDeploymentSpec[image : 
> docker-registry.randomcompany.com/octopus/pipeline-pipeline-online:0835137c-362
>  -> 
> docker-registry.randomcompany.com/octopus/pipeline-pipeline-online:23db7ae8-365,
>  podTemplate.metadata.labels.app.kubernetes.io~1version : 
> 0835137cd803b7258695eb53a6ec520cb62a48a7 -> 
> 23db7ae84bdab8d91fa527fe2f8f2fce292d0abc, job.state : suspended -> running, 
> job.upgradeMode : last-state -> savepoint, restartNonce : 1545 -> 1547]), 
> starting reconciliation.
> 2023-08-30 12:10:27.679 + o.a.f.k.o.s.NativeFlinkService [INFO 
> ][default/pipeline-pipeline-se-3] Deleting JobManager deployment and HA 
> metadata.
> {code}
> A more complete log file is attached. Thanks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-connector-elasticsearch] liyubin117 commented on pull request #39: [FLINK-29042][Connectors/ElasticSearch] Support lookup join for es connector

2023-09-11 Thread via GitHub


liyubin117 commented on PR #39:
URL: 
https://github.com/apache/flink-connector-elasticsearch/pull/39#issuecomment-1714909304

   @snuyanzin  @MartijnVisser Thanks a lot for your patient work.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] WencongLiu closed pull request #23375: Dev 0908 1101

2023-09-11 Thread via GitHub


WencongLiu closed pull request #23375: Dev 0908 1101
URL: https://github.com/apache/flink/pull/23375


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #23398: [hotfix][network] Fix some bugs of Hybrid Shuffle

2023-09-11 Thread via GitHub


flinkbot commented on PR #23398:
URL: https://github.com/apache/flink/pull/23398#issuecomment-1714902279

   
   ## CI report:
   
   * 2c46fa10143f17ad1a2946536046858556f14771 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #23397: [hotfix][network] Fix some bugs of Hybrid Shuffle

2023-09-11 Thread via GitHub


flinkbot commented on PR #23397:
URL: https://github.com/apache/flink/pull/23397#issuecomment-1714902192

   
   ## CI report:
   
   * edbcd0e920025a70f0aa770daeb8e0caa12d9c72 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] WencongLiu opened a new pull request, #23398: [hotfix][network] Fix some bugs of Hybrid Shuffle

2023-09-11 Thread via GitHub


WencongLiu opened a new pull request, #23398:
URL: https://github.com/apache/flink/pull/23398

   ## What is the purpose of the change
   
   [hotfix][network] Fix the memory usage threshold that triggers disk writing 
in Hybrid Shuffle
   [hotfix][network] Optimize the backlog calculation logic in Hybrid Shuffle


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] WencongLiu opened a new pull request, #23397: [hotfix][network] Fix some bugs of Hybrid Shuffle

2023-09-11 Thread via GitHub


WencongLiu opened a new pull request, #23397:
URL: https://github.com/apache/flink/pull/23397

   ## What is the purpose of the change
   
   [hotfix][network] Fix the memory usage threshold that triggers disk writing 
in Hybrid Shuffle
   [hotfix][network] Optimize the backlog calculation logic in Hybrid Shuffle


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-33063) udaf with user defined pojo object throw error while generate record equaliser

2023-09-11 Thread Shengkai Fang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shengkai Fang closed FLINK-33063.
-
Resolution: Fixed

> udaf with user defined pojo object throw error while generate record equaliser
> --
>
> Key: FLINK-33063
> URL: https://issues.apache.org/jira/browse/FLINK-33063
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.18.0
>Reporter: Yunhong Zheng
>Assignee: Yunhong Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0, 1.19.0
>
>
> A UDAF with a user-defined POJO object throws an error while generating the 
> record equaliser: 
> when a user creates a UDAF whose record contains a user-defined complex POJO 
> object (like a List or Map), the codegen 
> will throw an error while generating the record equaliser. The error is:
> {code:java}
> A method named "compareTo" is not declared in any enclosing class nor any 
> subtype, nor through a static import.{code}
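
A minimal sketch of the failing shape, as far as it can be inferred from the 
description (names are illustrative; the exact reproduction may differ):

{code:java}
import java.util.ArrayList;
import java.util.List;

import org.apache.flink.table.functions.AggregateFunction;

// Accumulator holding a user-defined complex POJO field (a List), the shape
// the description says trips the generated record equaliser.
public class CollectUdaf extends AggregateFunction<String, CollectUdaf.Acc> {

    public static class Acc {
        public List<String> values = new ArrayList<>();
    }

    @Override
    public Acc createAccumulator() {
        return new Acc();
    }

    public void accumulate(Acc acc, String value) {
        acc.values.add(value);
    }

    @Override
    public String getValue(Acc acc) {
        return String.join(",", acc.values);
    }
}
{code}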



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33063) udaf with user defined pojo object throw error while generate record equaliser

2023-09-11 Thread Shengkai Fang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763994#comment-17763994
 ] 

Shengkai Fang commented on FLINK-33063:
---

Merged into release-1.18: f2584a1df364a14ff50b5a52fe7cf5e38d4cdc9a


> udaf with user defined pojo object throw error while generate record equaliser
> --
>
> Key: FLINK-33063
> URL: https://issues.apache.org/jira/browse/FLINK-33063
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.18.0
>Reporter: Yunhong Zheng
>Assignee: Yunhong Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0, 1.19.0
>
>
> A UDAF with a user-defined POJO object throws an error while generating the 
> record equaliser: 
> when a user creates a UDAF whose record contains a user-defined complex POJO 
> object (like a List or Map), the codegen 
> will throw an error while generating the record equaliser. The error is:
> {code:java}
> A method named "compareTo" is not declared in any enclosing class nor any 
> subtype, nor through a static import.{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32952) Scan reuse with readable metadata and watermark push down will get wrong watermark

2023-09-11 Thread Shengkai Fang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shengkai Fang closed FLINK-32952.
-
Resolution: Fixed

> Scan reuse with readable metadata and watermark push down will get wrong 
> watermark 
> ---
>
> Key: FLINK-32952
> URL: https://issues.apache.org/jira/browse/FLINK-32952
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.18.0
>Reporter: Yunhong Zheng
>Assignee: Yunhong Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0, 1.19.0
>
>
> Scan reuse with readable metadata and watermark push-down will get a wrong 
> result. In the class ScanReuser, we re-build the watermark spec after 
> projection push-down. However, we get a wrong index while trying to find the 
> index in the new source type.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] fsk119 merged pull request #23388: [FLINK-33063][table-runtime] Fix udaf with complex user defined pojo object throw error while generate record equaliser

2023-09-11 Thread via GitHub


fsk119 merged PR #23388:
URL: https://github.com/apache/flink/pull/23388


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-32952) Scan reuse with readable metadata and watermark push down will get wrong watermark

2023-09-11 Thread Shengkai Fang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763993#comment-17763993
 ] 

Shengkai Fang commented on FLINK-32952:
---

Merged into release-1.18: f66679bfe5f5b344eec71a7579504762cc3c04ae

> Scan reuse with readable metadata and watermark push down will get wrong 
> watermark 
> ---
>
> Key: FLINK-32952
> URL: https://issues.apache.org/jira/browse/FLINK-32952
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.18.0
>Reporter: Yunhong Zheng
>Assignee: Yunhong Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0, 1.19.0
>
>
> Scan reuse with readable metadata and watermark push-down will get a wrong 
> result. In the class ScanReuser, we re-build the watermark spec after 
> projection push-down. However, we get a wrong index while trying to find the 
> index in the new source type.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] fsk119 merged pull request #23338: [FLINK-32952][table-planner] Fix scan reuse with readable metadata and watermark push down get wrong watermark error

2023-09-11 Thread via GitHub


fsk119 merged PR #23338:
URL: https://github.com/apache/flink/pull/23338


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-32488) Introduce configuration to control ExecutionGraph cache in REST API

2023-09-11 Thread Yangze Guo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763992#comment-17763992
 ] 

Yangze Guo commented on FLINK-32488:


Thanks for the PR [~hejufang001] and [~xiangyu0xf]. I'm now considering not 
introducing a configuration for it. As we all know, Flink's configuration 
surface has become massive, which harms usability. How about making the refresh 
interval adaptive to the total number of cached execution graphs in the 
JobManager?

> Introduce configuration to control ExecutionGraph cache in REST API
> ---
>
> Key: FLINK-32488
> URL: https://issues.apache.org/jira/browse/FLINK-32488
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / REST
>Affects Versions: 1.16.2, 1.17.1
>Reporter: Hong Liang Teoh
>Assignee: Jufang He
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> *What*
> Currently, REST handlers that inherit from AbstractExecutionGraphHandler 
> serve information derived from a cached ExecutionGraph.
> This ExecutionGraph cache currently derives its timeout from 
> {*}web.refresh-interval{*}. The *web.refresh-interval* controls both the 
> refresh rate of the Flink dashboard and the ExecutionGraph cache timeout. 
> We should introduce a new configuration to control the ExecutionGraph cache, 
> namely {*}rest.cache.execution-graph.expiry{*}.
> *Why*
> Sharing configuration between REST handler and Flink dashboard is a sign that 
> we are coupling the two. 
> Ideally, we want our REST API behaviour to be independent of the Flink dashboard 
> (e.g. supports programmatic access).
>  
> Mailing list discussion: 
> https://lists.apache.org/thread/7o330hfyoqqkkrfhtvz3kp448jcspjrm
>  
>  
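
If the option lands as proposed, usage would be a one-line configuration entry. 
A sketch (the key below is the one proposed in this ticket; it is not a 
released option, and the adaptive alternative discussed above may replace it):

{code:java}
import org.apache.flink.configuration.Configuration;

Configuration conf = new Configuration();
// Proposed key from this ticket; not yet part of any Flink release.
conf.setString("rest.cache.execution-graph.expiry", "30 s");
{code}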



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-32488) Introduce configuration to control ExecutionGraph cache in REST API

2023-09-11 Thread xiangyu feng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763989#comment-17763989
 ] 

xiangyu feng edited comment on FLINK-32488 at 9/12/23 2:51 AM:
---

Hi [~guoyangze], [~hejufang001] has created a 
[PR|https://github.com/apache/flink/pull/23387] for this issue; would you 
kindly assign it to him?


was (Author: JIRAUSER301129):
Hi [~guoyangze], [~hejufang001] has created a 
[PR|https://github.com/apache/flink/pull/23387] for this issue; would you 
kindly assign it to him?

> Introduce configuration to control ExecutionGraph cache in REST API
> ---
>
> Key: FLINK-32488
> URL: https://issues.apache.org/jira/browse/FLINK-32488
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / REST
>Affects Versions: 1.16.2, 1.17.1
>Reporter: Hong Liang Teoh
>Assignee: Jufang He
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> *What*
> Currently, REST handlers that inherit from AbstractExecutionGraphHandler 
> serve information derived from a cached ExecutionGraph.
> This ExecutionGraph cache currently derives its timeout from 
> {*}web.refresh-interval{*}. The *web.refresh-interval* controls both the 
> refresh rate of the Flink dashboard and the ExecutionGraph cache timeout. 
> We should introduce a new configuration to control the ExecutionGraph cache, 
> namely {*}rest.cache.execution-graph.expiry{*}.
> *Why*
> Sharing configuration between REST handler and Flink dashboard is a sign that 
> we are coupling the two. 
> Ideally, we want our REST API behaviour to be independent of the Flink dashboard 
> (e.g. supports programmatic access).
>  
> Mailing list discussion: 
> https://lists.apache.org/thread/7o330hfyoqqkkrfhtvz3kp448jcspjrm
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-32488) Introduce configuration to control ExecutionGraph cache in REST API

2023-09-11 Thread Yangze Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yangze Guo reassigned FLINK-32488:
--

Assignee: Jufang He  (was: xiangyu feng)

> Introduce configuration to control ExecutionGraph cache in REST API
> ---
>
> Key: FLINK-32488
> URL: https://issues.apache.org/jira/browse/FLINK-32488
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / REST
>Affects Versions: 1.16.2, 1.17.1
>Reporter: Hong Liang Teoh
>Assignee: Jufang He
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> *What*
> Currently, REST handlers that inherit from AbstractExecutionGraphHandler 
> serve information derived from a cached ExecutionGraph.
> This ExecutionGraph cache currently derives its timeout from 
> {*}web.refresh-interval{*}. The *web.refresh-interval* controls both the 
> refresh rate of the Flink dashboard and the ExecutionGraph cache timeout. 
> We should introduce a new configuration to control the ExecutionGraph cache, 
> namely {*}rest.cache.execution-graph.expiry{*}.
> *Why*
> Sharing configuration between REST handler and Flink dashboard is a sign that 
> we are coupling the two. 
> Ideally, we want our REST API behaviour to be independent of the Flink dashboard 
> (e.g. supports programmatic access).
>  
> Mailing list discussion: 
> https://lists.apache.org/thread/7o330hfyoqqkkrfhtvz3kp448jcspjrm
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32488) Introduce configuration to control ExecutionGraph cache in REST API

2023-09-11 Thread xiangyu feng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763989#comment-17763989
 ] 

xiangyu feng commented on FLINK-32488:
--

Hi [~guoyangze] , [~hejufang001] has created a 
[PR|https://github.com/apache/flink/pull/23387] for this issue. Would you 
kindly assign it to him?

> Introduce configuration to control ExecutionGraph cache in REST API
> ---
>
> Key: FLINK-32488
> URL: https://issues.apache.org/jira/browse/FLINK-32488
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / REST
>Affects Versions: 1.16.2, 1.17.1
>Reporter: Hong Liang Teoh
>Assignee: xiangyu feng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> *What*
> Currently, REST handlers that inherit from AbstractExecutionGraphHandler 
> serve information derived from a cached ExecutionGraph.
> This ExecutionGraph cache currently derives its timeout from 
> {*}web.refresh-interval{*}. The *web.refresh-interval* controls both the 
> refresh rate of the Flink dashboard and the ExecutionGraph cache timeout. 
> We should introduce a new configuration to control the ExecutionGraph cache, 
> namely {*}rest.cache.execution-graph.expiry{*}.
> *Why*
> Sharing configuration between REST handler and Flink dashboard is a sign that 
> we are coupling the two. 
> Ideally, we want our REST API behaviour to be independent of the Flink dashboard 
> (e.g. supports programmatic access).
>  
> Mailing list discussion: 
> https://lists.apache.org/thread/7o330hfyoqqkkrfhtvz3kp448jcspjrm
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] flinkbot commented on pull request #23396: Cherry pick to release-1.18

2023-09-11 Thread via GitHub


flinkbot commented on PR #23396:
URL: https://github.com/apache/flink/pull/23396#issuecomment-1714856384

   
   ## CI report:
   
   * 7b43c3eacd85d79286bc47629e738f7e674a6374 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] fsk119 opened a new pull request, #23396: Cherry pick to release-1.18

2023-09-11 Thread via GitHub


fsk119 opened a new pull request, #23396:
URL: https://github.com/apache/flink/pull/23396

   
   
   ## What is the purpose of the change
   
   *(For example: This pull request makes task deployment go through the blob 
server, rather than through RPC. That way we avoid re-transferring them on each 
deployment (during recovery).)*
   
   
   ## Brief change log
   
   *(for example:)*
 - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
 - *Deployments RPC transmits only the blob storage reference*
 - *TaskManagers retrieve the TaskInfo from the blob cache*
   
   
   ## Verifying this change
   
   Please make sure both new and modified tests in this PR follows the 
conventions defined in our code quality guide: 
https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
 - *Manually verified the change by running a 4 node cluster with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
 - The serializers: (yes / no / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
 - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-32988) HiveITCase failed due to TestContainer not coming up

2023-09-11 Thread Shengkai Fang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763984#comment-17763984
 ] 

Shengkai Fang commented on FLINK-32988:
---

Merged into master: 649b7fe197c8b03cce9595adcfea33c8d708a8b4

> HiveITCase failed due to TestContainer not coming up
> 
>
> Key: FLINK-32988
> URL: https://issues.apache.org/jira/browse/FLINK-32988
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Affects Versions: 1.16.2, 1.18.0, 1.17.1, 1.19.0
>Reporter: Matthias Pohl
>Assignee: Shengkai Fang
>Priority: Critical
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=52740=logs=87489130-75dc-54e4-1f45-80c30aa367a3=efbee0b1-38ac-597d-6466-1ea8fc908c50=15866
> {code}
> Aug 29 02:47:56 org.testcontainers.containers.ContainerLaunchException: 
> Container startup failed for image prestodb/hive3.1-hive:10
> Aug 29 02:47:56   at 
> org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:349)
> Aug 29 02:47:56   at 
> org.apache.flink.tests.hive.containers.HiveContainer.doStart(HiveContainer.java:81)
> Aug 29 02:47:56   at 
> org.testcontainers.containers.GenericContainer.start(GenericContainer.java:322)
> Aug 29 02:47:56   at 
> org.testcontainers.containers.GenericContainer.starting(GenericContainer.java:1131)
> Aug 29 02:47:56   at 
> org.testcontainers.containers.FailureDetectingExternalResource$1.evaluate(FailureDetectingExternalResource.java:28)
> Aug 29 02:47:56   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> Aug 29 02:47:56   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> Aug 29 02:47:56   at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Aug 29 02:47:56   at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> Aug 29 02:47:56   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> Aug 29 02:47:56   at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
> Aug 29 02:47:56   at 
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42)
> Aug 29 02:47:56   at 
> org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80)
> Aug 29 02:47:56   at 
> org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:72)
> Aug 29 02:47:56   at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:147)
> Aug 29 02:47:56   at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:127)
> Aug 29 02:47:56   at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:90)
> Aug 29 02:47:56   at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:55)
> Aug 29 02:47:56   at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:102)
> Aug 29 02:47:56   at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:54)
> Aug 29 02:47:56   at 
> org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:114)
> Aug 29 02:47:56   at 
> org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:86)
> Aug 29 02:47:56   at 
> org.junit.platform.launcher.core.DefaultLauncherSession$DelegatingLauncher.execute(DefaultLauncherSession.java:86)
> Aug 29 02:47:56   at 
> org.junit.platform.launcher.core.SessionPerRequestLauncher.execute(SessionPerRequestLauncher.java:53)
> Aug 29 02:47:56   at 
> org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.execute(JUnitPlatformProvider.java:188)
> Aug 29 02:47:56   at 
> org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invokeAllTests(JUnitPlatformProvider.java:154)
> Aug 29 02:47:56   at 
> org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invoke(JUnitPlatformProvider.java:128)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] fsk119 closed pull request #23370: [FLINK-32731][e2e] Fix NameNode uses port 9870 in hadoop3

2023-09-11 Thread via GitHub


fsk119 closed pull request #23370: [FLINK-32731][e2e] Fix NameNode uses port 
9870 in hadoop3
URL: https://github.com/apache/flink/pull/23370


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] elkhand commented on a diff in pull request #23383: [FLINK-32885 ] [flink-clients] Refactoring: Moving UrlPrefixDecorator into flink-clients so it can be used by RestClusterClient for

2023-09-11 Thread via GitHub


elkhand commented on code in PR #23383:
URL: https://github.com/apache/flink/pull/23383#discussion_r132058


##
flink-table/flink-sql-gateway/src/main/java/org/apache/flink/table/gateway/rest/header/util/SQLGatewayUrlPrefixDecorator.java:
##
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.gateway.rest.header.util;
+
+import org.apache.flink.client.program.rest.UrlPrefixDecorator;
+import org.apache.flink.runtime.rest.messages.MessageHeaders;
+import org.apache.flink.runtime.rest.messages.MessageParameters;
+import org.apache.flink.runtime.rest.messages.RequestBody;
+import org.apache.flink.runtime.rest.messages.ResponseBody;
+import org.apache.flink.runtime.rest.versioning.RestAPIVersion;
+import org.apache.flink.table.gateway.rest.util.SqlGatewayRestAPIVersion;
+
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.stream.Collectors;
+
+/**
+ * This class decorates the URL of the original message headers by adding a 
prefix. The purpose of
+ * this decorator is to provide a consistent URL prefix for all the REST API 
endpoints while
+ * preserving the behavior of the original message headers.
+ *
+ * @param <R> the type of the request body
+ * @param <P> the type of the response body
+ * @param <M> the type of the message parameters
+ */
+public class SQLGatewayUrlPrefixDecorator<
+R extends RequestBody, P extends ResponseBody, M extends 
MessageParameters>
+extends UrlPrefixDecorator<R, P, M> {

Review Comment:
   Thanks, updated correspondingly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (FLINK-33073) Implement end-to-end tests for the Kinesis Streams Sink

2023-09-11 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh reassigned FLINK-33073:
---

Assignee: Hong Liang Teoh

> Implement end-to-end tests for the Kinesis Streams Sink
> ---
>
> Key: FLINK-33073
> URL: https://issues.apache.org/jira/browse/FLINK-33073
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / AWS
>Reporter: Hong Liang Teoh
>Assignee: Hong Liang Teoh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>
> *What*
> Implement end-to-end tests for KinesisStreamsSink.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33072) Implement end-to-end tests for AWS Kinesis Connectors

2023-09-11 Thread Hong Liang Teoh (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh updated FLINK-33072:

Description: 
*What*

We want to implement end-to-end tests that target real Kinesis Data Streams.

*Why*

This solidifies our testing to ensure we pick up any integration issues with 
Kinesis Data Streams API.

We especially want to test happy cases and failure cases to ensure those cases 
are handled as expected by the KDS connector.

 

Reference: https://issues.apache.org/jira/browse/INFRA-24474

 

  was:
*What*

We want to implement end-to-end tests that target real Kinesis Data Streams.

*Why*

This solidifies our testing to ensure we pick up any integration issues with 
Kinesis Data Streams API.

We especially want to test happy cases and failure cases to ensure those cases 
are handled as expected by the KDS connector.

 


> Implement end-to-end tests for AWS Kinesis Connectors
> -
>
> Key: FLINK-33072
> URL: https://issues.apache.org/jira/browse/FLINK-33072
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / AWS
>Reporter: Hong Liang Teoh
>Priority: Major
> Fix For: 2.0.0
>
>
> *What*
> We want to implement end-to-end tests that target real Kinesis Data Streams.
> *Why*
> This solidifies our testing to ensure we pick up any integration issues with 
> Kinesis Data Streams API.
> We especially want to test happy cases and failure cases to ensure those 
> cases are handled as expected by the KDS connector.
>  
> Reference: https://issues.apache.org/jira/browse/INFRA-24474
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33073) Implement end-to-end tests for the Kinesis Streams Sink

2023-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33073:
---
Labels: pull-request-available  (was: )

> Implement end-to-end tests for the Kinesis Streams Sink
> ---
>
> Key: FLINK-33073
> URL: https://issues.apache.org/jira/browse/FLINK-33073
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / AWS
>Reporter: Hong Liang Teoh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>
> *What*
> Implement end-to-end tests for KinesisStreamsSink.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-connector-aws] hlteoh37 opened a new pull request, #94: [FLINK-33073][Connectors/AWS] Implement end-to-end tests for KinesisS…

2023-09-11 Thread via GitHub


hlteoh37 opened a new pull request, #94:
URL: https://github.com/apache/flink-connector-aws/pull/94

   …treamsSink
   
   ## Purpose of the change
   Implement end-to-end tests for KinesisStreamsSink.
   
   ## Verifying this change
   This change added tests and can be verified as follows:
   - Added end-to-end tests
   
   ## Significant changes
   *(Please check any boxes [x] if the answer is "yes". You can first publish 
the PR and check them afterwards, for convenience.)*
   - [ ] Dependencies have been added or upgraded
   - [ ] Public API has been changed (Public API is any class annotated with 
`@Public(Evolving)`)
   - [ ] Serializers have been changed
   - [ ] New feature has been introduced
 - If yes, how is this documented? (not applicable / docs / JavaDocs / not 
documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-33073) Implement end-to-end tests for the Kinesis Streams Sink

2023-09-11 Thread Hong Liang Teoh (Jira)
Hong Liang Teoh created FLINK-33073:
---

 Summary: Implement end-to-end tests for the Kinesis Streams Sink
 Key: FLINK-33073
 URL: https://issues.apache.org/jira/browse/FLINK-33073
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / AWS
Reporter: Hong Liang Teoh
 Fix For: 2.0.0


*What*

Implement end-to-end tests for KinesisStreamsSink.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33072) Implement end-to-end tests for AWS Kinesis Connectors

2023-09-11 Thread Hong Liang Teoh (Jira)
Hong Liang Teoh created FLINK-33072:
---

 Summary: Implement end-to-end tests for AWS Kinesis Connectors
 Key: FLINK-33072
 URL: https://issues.apache.org/jira/browse/FLINK-33072
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / AWS
Reporter: Hong Liang Teoh
 Fix For: 2.0.0


*What*

We want to implement end-to-end tests that target real Kinesis Data Streams.

*Why*

This solidifies our testing to ensure we pick up any integration issues with 
Kinesis Data Streams API.

We especially want to test happy cases and failure cases to ensure those cases 
are handled as expected by the KDS connector.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] tweise commented on a diff in pull request #23383: [FLINK-32885 ] [flink-clients] Refactoring: Moving UrlPrefixDecorator into flink-clients so it can be used by RestClusterClient for

2023-09-11 Thread via GitHub


tweise commented on code in PR #23383:
URL: https://github.com/apache/flink/pull/23383#discussion_r1322170417


##
flink-table/flink-sql-gateway/src/main/java/org/apache/flink/table/gateway/rest/header/util/SQLGatewayUrlPrefixDecorator.java:
##
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.gateway.rest.header.util;
+
+import org.apache.flink.client.program.rest.UrlPrefixDecorator;
+import org.apache.flink.runtime.rest.messages.MessageHeaders;
+import org.apache.flink.runtime.rest.messages.MessageParameters;
+import org.apache.flink.runtime.rest.messages.RequestBody;
+import org.apache.flink.runtime.rest.messages.ResponseBody;
+import org.apache.flink.runtime.rest.versioning.RestAPIVersion;
+import org.apache.flink.table.gateway.rest.util.SqlGatewayRestAPIVersion;
+
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.stream.Collectors;
+
+/**
+ * This class decorates the URL of the original message headers by adding a 
prefix. The purpose of
+ * this decorator is to provide a consistent URL prefix for all the REST API 
endpoints while
+ * preserving the behavior of the original message headers.
+ *
+ * @param <R> the type of the request body
+ * @param <P> the type of the response body
+ * @param <M> the type of the message parameters
+ */
+public class SQLGatewayUrlPrefixDecorator<
+R extends RequestBody, P extends ResponseBody, M extends 
MessageParameters>
+extends UrlPrefixDecorator<R, P, M> {

Review Comment:
   +1, that would probably make it a bit easier to understand



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] elkhand commented on a diff in pull request #23383: [FLINK-32885 ] [flink-clients] Refactoring: Moving UrlPrefixDecorator into flink-clients so it can be used by RestClusterClient for

2023-09-11 Thread via GitHub


elkhand commented on code in PR #23383:
URL: https://github.com/apache/flink/pull/23383#discussion_r1322163343


##
flink-table/flink-sql-gateway/src/main/java/org/apache/flink/table/gateway/rest/header/util/SQLGatewayUrlPrefixDecorator.java:
##
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.gateway.rest.header.util;
+
+import org.apache.flink.client.program.rest.UrlPrefixDecorator;
+import org.apache.flink.runtime.rest.messages.MessageHeaders;
+import org.apache.flink.runtime.rest.messages.MessageParameters;
+import org.apache.flink.runtime.rest.messages.RequestBody;
+import org.apache.flink.runtime.rest.messages.ResponseBody;
+import org.apache.flink.runtime.rest.versioning.RestAPIVersion;
+import org.apache.flink.table.gateway.rest.util.SqlGatewayRestAPIVersion;
+
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.stream.Collectors;
+
+/**
+ * This class decorates the URL of the original message headers by adding a 
prefix. The purpose of
+ * this decorator is to provide a consistent URL prefix for all the REST API 
endpoints while
+ * preserving the behavior of the original message headers.
+ *
+ * @param <R> the type of the request body
+ * @param <P> the type of the response body
+ * @param <M> the type of the message parameters
+ */
+public class SQLGatewayUrlPrefixDecorator<
+R extends RequestBody, P extends ResponseBody, M extends 
MessageParameters>
+extends UrlPrefixDecorator<R, P, M> {

Review Comment:
   @tweise @mxm , I can get rid of `MonitoringApiMessageHeaders` and make 
`getSupportedAPIVersions` a public method of `UrlPrefixDecorator`. Please let 
me know if you want me to do so. Thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] elkhand commented on a diff in pull request #23383: [FLINK-32885 ] [flink-clients] Refactoring: Moving UrlPrefixDecorator into flink-clients so it can be used by RestClusterClient for

2023-09-11 Thread via GitHub


elkhand commented on code in PR #23383:
URL: https://github.com/apache/flink/pull/23383#discussion_r1322162928


##
flink-clients/src/main/java/org/apache/flink/client/program/rest/UrlPrefixDecorator.java:
##
@@ -41,7 +40,7 @@
  */
 public class UrlPrefixDecorator<
 R extends RequestBody, P extends ResponseBody, M extends 
MessageParameters>
-implements SqlGatewayMessageHeaders<R, P, M> {

Review Comment:
   `SQLGatewayUrlPrefixDecorator`, the subclass of `UrlPrefixDecorator`, 
overrides `getSupportedAPIVersions` and does what 
`SqlGatewayMessageHeaders::getSupportedAPIVersions` was doing.
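
A hedged sketch of that override, reusing the imports already shown in the new file above; the constructor shape is an assumption, and the exact class body in the PR may differ:

```java
public class SQLGatewayUrlPrefixDecorator<
                R extends RequestBody, P extends ResponseBody, M extends MessageParameters>
        extends UrlPrefixDecorator<R, P, M> {

    // Assumed constructor: wrap the original headers plus a URL prefix.
    public SQLGatewayUrlPrefixDecorator(MessageHeaders<R, P, M> decorated, String urlPrefix) {
        super(decorated, urlPrefix);
    }

    @Override
    public Collection<? extends RestAPIVersion<?>> getSupportedAPIVersions() {
        // Keep only stable SQL Gateway REST versions, mirroring what
        // SqlGatewayMessageHeaders::getSupportedAPIVersions did before.
        return Arrays.stream(SqlGatewayRestAPIVersion.values())
                .filter(SqlGatewayRestAPIVersion::isStableVersion)
                .collect(Collectors.toList());
    }
}
```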



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-elasticsearch] mtfelisb commented on a diff in pull request #53: [FLINK-26088][Connectors/ElasticSearch] Add Elasticsearch 8.0 support

2023-09-11 Thread via GitHub


mtfelisb commented on code in PR #53:
URL: 
https://github.com/apache/flink-connector-elasticsearch/pull/53#discussion_r1322154473


##
flink-connector-elasticsearch8/src/main/java/org/apache/flink/connector/elasticsearch/sink/INetworkConfigFactory.java:
##
@@ -0,0 +1,29 @@
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+package org.apache.flink.connector.elasticsearch.sink;
+
+import co.elastic.clients.elasticsearch.ElasticsearchAsyncClient;
+
+import java.io.Serializable;
+
+public interface INetworkConfigFactory extends Serializable {

Review Comment:
   You're right. Just removed it! Thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-24241) test_table_environment_api.py fail with NPE

2023-09-11 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-24241:
---
Labels: stale-major test-stability  (was: test-stability)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help 
the community manage its development. I see this issue has been marked as 
Major but is unassigned, and neither it nor its Sub-Tasks have been updated 
for 60 days. I have gone ahead and added a "stale-major" label to the issue. If this 
ticket is a Major, please either assign yourself or give an update. Afterwards, 
please remove the label, or in 7 days the issue will be deprioritized.


> test_table_environment_api.py fail with NPE
> ---
>
> Key: FLINK-24241
> URL: https://issues.apache.org/jira/browse/FLINK-24241
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Table SQL / Planner
>Affects Versions: 1.14.0, 1.15.0
>Reporter: Xintong Song
>Priority: Major
>  Labels: stale-major, test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=23876=logs=821b528f-1eed-5598-a3b4-7f748b13f261=6bb545dd-772d-5d8c-f258-f5085fba3295=23263
> {code}
> Sep 10 03:03:39 E   py4j.protocol.Py4JJavaError: An error 
> occurred while calling o16211.execute.
> Sep 10 03:03:39 E   : java.lang.NullPointerException
> Sep 10 03:03:39 E at 
> java.util.Objects.requireNonNull(Objects.java:203)
> Sep 10 03:03:39 E at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.(RelMetadataQuery.java:144)
> Sep 10 03:03:39 E at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.(RelMetadataQuery.java:108)
> Sep 10 03:03:39 E at 
> org.apache.flink.table.planner.plan.metadata.FlinkRelMetadataQuery.(FlinkRelMetadataQuery.java:78)
> Sep 10 03:03:39 E at 
> org.apache.flink.table.planner.plan.metadata.FlinkRelMetadataQuery.instance(FlinkRelMetadataQuery.java:59)
> Sep 10 03:03:39 E at 
> org.apache.flink.table.planner.calcite.FlinkRelOptClusterFactory$$anon$1.get(FlinkRelOptClusterFactory.scala:39)
> Sep 10 03:03:39 E at 
> org.apache.flink.table.planner.calcite.FlinkRelOptClusterFactory$$anon$1.get(FlinkRelOptClusterFactory.scala:38)
> Sep 10 03:03:39 E at 
> org.apache.calcite.plan.RelOptCluster.getMetadataQuery(RelOptCluster.java:178)
> Sep 10 03:03:39 E at 
> org.apache.calcite.rel.metadata.RelMdUtil.clearCache(RelMdUtil.java:965)
> Sep 10 03:03:39 E at 
> org.apache.calcite.plan.hep.HepPlanner.buildFinalPlan(HepPlanner.java:942)
> Sep 10 03:03:39 E at 
> org.apache.calcite.plan.hep.HepPlanner.buildFinalPlan(HepPlanner.java:939)
> Sep 10 03:03:39 E at 
> org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:194)
> Sep 10 03:03:39 E at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkHepProgram.optimize(FlinkHepProgram.scala:69)
> Sep 10 03:03:39 E at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkHepRuleSetProgram.optimize(FlinkHepRuleSetProgram.scala:87)
> Sep 10 03:03:39 E at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1$$anonfun$apply$1.apply(FlinkGroupProgram.scala:63)
> Sep 10 03:03:39 E at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1$$anonfun$apply$1.apply(FlinkGroupProgram.scala:60)
> Sep 10 03:03:39 E at 
> scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
> Sep 10 03:03:39 E at 
> scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
> Sep 10 03:03:39 E at 
> scala.collection.Iterator$class.foreach(Iterator.scala:891)
> Sep 10 03:03:39 E at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
> Sep 10 03:03:39 E at 
> scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> Sep 10 03:03:39 E at 
> scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> Sep 10 03:03:39 E at 
> scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
> Sep 10 03:03:39 E at 
> scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104)
> Sep 10 03:03:39 E at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1.apply(FlinkGroupProgram.scala:60)
> Sep 10 03:03:39 E at 
> 

[jira] [Updated] (FLINK-31857) Support pluginable observers mechanism

2023-09-11 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-31857:
---
  Labels: auto-deprioritized-major pull-request-available  (was: 
pull-request-available stale-major)
Priority: Minor  (was: Major)

This issue was labeled "stale-major" 7 days ago and has not received any 
updates, so it is being deprioritized. If this ticket is actually Major, please 
raise the priority and ask a committer to assign you the issue or revive the 
public discussion.


> Support pluginable observers mechanism
> --
>
> Key: FLINK-31857
> URL: https://issues.apache.org/jira/browse/FLINK-31857
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.5.0
>Reporter: Daren Wong
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available
>
> Currently, the Kubernetes Operator uses AbstractFlinkResourceObserver to observe 
> Flink Job state, cluster state, JM/TM state, etc., and updates the Custom Resources 
> FlinkDeployment and FlinkSessionJob with the gathered info.
> This Jira is to introduce a Flink plugin to allow users to implement custom 
> observer logic after the default observer logic is run.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32938) flink-connector-pulsar should remove all `PulsarAdmin` calls

2023-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-32938:
---
Labels: pull-request-available  (was: )

> flink-connector-pulsar should remove all `PulsarAdmin` calls
> 
>
> Key: FLINK-32938
> URL: https://issues.apache.org/jira/browse/FLINK-32938
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Pulsar
>Reporter: Neng Lu
>Priority: Major
>  Labels: pull-request-available
>
> The flink-connector-pulsar should not access or interact with the admin 
> endpoint. This could introduce potential security issues.
> In a production environment, a Pulsar cluster admin will not grant the 
> permissions for the flink application to conduct any admin operations. 
> Currently, the connector does various admin calls:
> {code:java}
> PulsarAdmin.topics().getPartitionedTopicMetadata(topic)
> PulsarAdmin.namespaces().getTopics(namespace)
> PulsarAdmin.topics().getLastMessageId(topic)
> PulsarAdmin.topics().getMessageIdByTimestamp(topic, timestamp)
> PulsarAdmin.topics().getSubscriptions(topic)
> PulsarAdmin.topics().createSubscription(topic, subscription, 
> MessageId.earliest)
> PulsarAdmin.topics().resetCursor(topic, subscription, initial, !include)
> {code}
> We need to replace these calls with consumer or client calls.
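
As one illustration of the requested direction, a minimal hedged sketch replacing the first admin call with a client call; `PulsarClient#getPartitionsForTopic` is part of the public Pulsar client API, but the surrounding helper is a hypothetical example, not connector code:

```java
import java.util.List;

import org.apache.pulsar.client.api.PulsarClient;

public class AdminFreeLookups {
    // Replaces PulsarAdmin.topics().getPartitionedTopicMetadata(topic):
    // the client API can enumerate partitions without admin permissions.
    static List<String> partitionsOf(PulsarClient client, String topic) throws Exception {
        return client.getPartitionsForTopic(topic).get();
    }
}
```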



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-connector-pulsar] nlu90 commented on pull request #59: [FLINK-32938] Replace pulsar admin calls

2023-09-11 Thread via GitHub


nlu90 commented on PR #59:
URL: 
https://github.com/apache/flink-connector-pulsar/pull/59#issuecomment-1714662290

   Can someone help rerun the CI?  On my local laptop, everything runs fine:
   
   ```
   -> mvn clean verify
   ...
   ...
   ...
   [INFO] 

   [INFO] Reactor Summary for Flink : Connectors : Pulsar : Parent 4.0-SNAPSHOT:
   [INFO] 
   [INFO] Flink : Connectors : Pulsar : Parent ... SUCCESS [  1.819 
s]
   [INFO] Flink : Connectors : Pulsar  SUCCESS [27:41 
min]
   [INFO] Flink : Connectors : Pulsar : SQL .. SUCCESS [  5.184 
s]
   [INFO] Flink : Connectors : Pulsar : E2E Tests  SUCCESS [  1.157 
s]
   [INFO] 

   [INFO] BUILD SUCCESS
   [INFO] 

   [INFO] Total time:  27:50 min
   [INFO] Finished at: 2023-09-11T15:01:51-07:00
   [INFO] 

   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #23395: [FLINK-33058][formats] Add encoding option to Avro format

2023-09-11 Thread via GitHub


flinkbot commented on PR #23395:
URL: https://github.com/apache/flink/pull/23395#issuecomment-1714582332

   
   ## CI report:
   
   * 858c3eebae767ea71e1d43fc5e8f4af88f924c4a UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-33058) Support for JSON-encoded Avro

2023-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33058:
---
Labels: avro flink flink-formats pull-request-available  (was: avro flink 
flink-formats)

> Support for JSON-encoded Avro
> -
>
> Key: FLINK-33058
> URL: https://issues.apache.org/jira/browse/FLINK-33058
> Project: Flink
>  Issue Type: Improvement
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Reporter: Dale Lane
>Priority: Minor
>  Labels: avro, flink, flink-formats, pull-request-available
>
> Avro supports two serialization encoding methods: binary and JSON
> cf. [https://avro.apache.org/docs/1.11.1/specification/#encodings] 
> flink-avro currently has a hard-coded assumption that Avro data is 
> binary-encoded (and cannot process Avro data that has been JSON-encoded).
> I propose adding a new optional format option to flink-avro: *avro.encoding*
> It will support two options: 'binary' and 'json'. 
> If unset, it will default to 'binary' to maintain compatibility/consistency 
> with current behaviour. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] dalelane opened a new pull request, #23395: [FLINK-33058][formats] Add encoding option to Avro format

2023-09-11 Thread via GitHub


dalelane opened a new pull request, #23395:
URL: https://github.com/apache/flink/pull/23395

   ## What is the purpose of the change
   
   Initially proposed in https://issues.apache.org/jira/browse/FLINK-33058 
   
   Avro supports two serialization encoding methods: binary and JSON (cf. [Avro 
docs](https://avro.apache.org/docs/1.11.1/specification/#encodings))
   
   flink-avro currently has a hard-coded assumption that Avro data is 
binary-encoded (and cannot process Avro data that has been JSON-encoded).
   
   This pull request introduces a new optional format option to flink-avro: 
`avro.encoding`
   It supports two options: 'binary' and 'json'. 
   If unset, it will default to 'binary' to maintain compatibility/consistency 
with current behaviour. 
   
   
   ## Brief change log
   
   Flink uses Avro 
[`Decoder`](https://avro.apache.org/docs/1.8.2/api/java/org/apache/avro/io/Decoder.html) 
and 
[`Encoder`](https://avro.apache.org/docs/1.8.2/api/java/org/apache/avro/io/Encoder.html) 
classes for deserializing/serializing Avro data. 
   
   However it was hard-coding the use of factory classes to only use the 
binary-encoding implementations of these abstract classes.  
(`DecoderFactory.get().binaryDecoder` and 
`EncoderFactory.get().directBinaryEncoder`)
   
   In this pull request, I'm using the value of the new `avro.encoding` option 
to create the JSON Decoder/Encoder classes where appropriate. 
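
   A minimal hedged sketch of that factory switch; the Avro `DecoderFactory` calls are existing API, while the enum and helper names are illustrative assumptions rather than this PR's actual code:

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.avro.Schema;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;

public class AvroDecoderSketch {
    // Hypothetical mirror of the new 'avro.encoding' option values.
    enum AvroEncoding { BINARY, JSON }

    static Decoder createDecoder(AvroEncoding encoding, Schema schema, InputStream in)
            throws IOException {
        if (encoding == AvroEncoding.JSON) {
            // JSON-encoded Avro records.
            return DecoderFactory.get().jsonDecoder(schema, in);
        }
        // Default: binary encoding, matching the previously hard-coded behaviour.
        return DecoderFactory.get().binaryDecoder(in, null);
    }
}
```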
   
   ## Verifying this change
   
   This change modified existing tests by re-running all of the tests that 
perform Avro serialization/deserialization, repeating each test with both the 
binary and JSON encodings. 
   
   This verifies that the existing binary behaviour is unaffected by the new 
option, as well as the new JSON support.
   
   I've also manually verified the new support using Flink SQL such as:
   
   ```sql
   CREATE TABLE JSONAVRORAW
   (
   ... my columns ...
   )
   WITH (
   'connector' = 'kafka',
   'topic' = 'MY.TOPIC',
   'properties.bootstrap.servers' = 'localhost:9092',
   'properties.group.id' = 'my-group',
   'scan.startup.mode' = 'earliest-offset',
   'format' = 'avro',
   'avro.encoding' = 'json'
   );
   ```
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: yes
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): yes
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? yes
 - If yes, how is the feature documented? update to the Avro format page to 
explain the new option
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-29042) Support lookup join for es connector

2023-09-11 Thread Sergey Nuyanzin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763915#comment-17763915
 ] 

Sergey Nuyanzin commented on FLINK-29042:
-

Merged as 
[fef5c964f4a2fc8b61c482b0bc8504757581adfb|https://github.com/apache/flink-connector-elasticsearch/commit/fef5c964f4a2fc8b61c482b0bc8504757581adfb]

> Support lookup join for es connector
> 
>
> Key: FLINK-29042
> URL: https://issues.apache.org/jira/browse/FLINK-29042
> Project: Flink
>  Issue Type: New Feature
>  Components: Connectors / ElasticSearch
>Affects Versions: 1.16.0
>Reporter: Yubin Li
>Assignee: Yubin Li
>Priority: Major
>  Labels: pull-request-available, stale-assigned
> Fix For: elasticsearch-4.0.0
>
>
> Now the es connector can only be used as a sink, but in many business 
> scenarios we treat es as an index database, so we should support making it 
> lookupable in flink.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] flinkbot commented on pull request #23394: [BP-1.16][FLINK-33010][table] GREATEST/LEAST should work with other functions as input

2023-09-11 Thread via GitHub


flinkbot commented on PR #23394:
URL: https://github.com/apache/flink/pull/23394#issuecomment-1714538178

   
   ## CI report:
   
   * fa40b4f52627a9b2fc32b48b138310311d211a6e UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-29042) Support lookup join for es connector

2023-09-11 Thread Sergey Nuyanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Nuyanzin closed FLINK-29042.
---
Fix Version/s: elasticsearch-4.0.0
   Resolution: Fixed

> Support lookup join for es connector
> 
>
> Key: FLINK-29042
> URL: https://issues.apache.org/jira/browse/FLINK-29042
> Project: Flink
>  Issue Type: New Feature
>  Components: Connectors / ElasticSearch
>Affects Versions: 1.16.0
>Reporter: Yubin Li
>Assignee: Yubin Li
>Priority: Major
>  Labels: pull-request-available, stale-assigned
> Fix For: elasticsearch-4.0.0
>
>
> Now the es connector can only be used as a sink, but in many business 
> scenarios we treat es as an index database, so we should support making it 
> lookupable in flink.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-connector-elasticsearch] boring-cyborg[bot] commented on pull request #39: [FLINK-29042][Connectors/ElasticSearch] Support lookup join for es connector

2023-09-11 Thread via GitHub


boring-cyborg[bot] commented on PR #39:
URL: 
https://github.com/apache/flink-connector-elasticsearch/pull/39#issuecomment-1714536181

   Awesome work, congrats on your first merged pull request!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-elasticsearch] snuyanzin merged pull request #39: [FLINK-29042][Connectors/ElasticSearch] Support lookup join for es connector

2023-09-11 Thread via GitHub


snuyanzin merged PR #39:
URL: https://github.com/apache/flink-connector-elasticsearch/pull/39


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #23393: [BP-1.18][FLINK-33010][table] GREATEST/LEAST should work with other functions as input

2023-09-11 Thread via GitHub


flinkbot commented on PR #23393:
URL: https://github.com/apache/flink/pull/23393#issuecomment-1714524505

   
   ## CI report:
   
   * e296671f364fdeda71054bc090ad35e3f77ad07b UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #23392: [BP-1.17][FLINK-33010][table] GREATEST/LEAST should work with other functions as input

2023-09-11 Thread via GitHub


flinkbot commented on PR #23392:
URL: https://github.com/apache/flink/pull/23392#issuecomment-1714518365

   
   ## CI report:
   
   * d7e7c703c9a23d18f5c44a0c8b42db7d6afe46a8 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Resolved] (FLINK-32999) Remove HBase connector from master branch

2023-09-11 Thread Sergey Nuyanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Nuyanzin resolved FLINK-32999.
-
Fix Version/s: 1.18.0
   1.19.0
   Resolution: Fixed

> Remove HBase connector from master branch
> -
>
> Key: FLINK-32999
> URL: https://issues.apache.org/jira/browse/FLINK-32999
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / HBase
>Reporter: Sergey Nuyanzin
>Assignee: Ferenc Csaky
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0, 1.19.0
>
>
> The connector was externalized at FLINK-30061
> Once it is released it would make sense to remove it from master branch



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32999) Remove HBase connector from master branch

2023-09-11 Thread Sergey Nuyanzin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763905#comment-17763905
 ] 

Sergey Nuyanzin commented on FLINK-32999:
-

Merged to 1.18 as 
[53981d81d378770f8a1f87717c5d80b35e4010d7|https://github.com/apache/flink/commit/53981d81d378770f8a1f87717c5d80b35e4010d7]
master: 
[62e41acdf56cccde53cc6226b2cc7c0bc3cda3e6|https://github.com/apache/flink/commit/62e41acdf56cccde53cc6226b2cc7c0bc3cda3e6]

> Remove HBase connector from master branch
> -
>
> Key: FLINK-32999
> URL: https://issues.apache.org/jira/browse/FLINK-32999
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / HBase
>Reporter: Sergey Nuyanzin
>Assignee: Ferenc Csaky
>Priority: Major
>  Labels: pull-request-available
>
> The connector was externalized at FLINK-30061
> Once it is released it would make sense to remove it from master branch



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] snuyanzin merged pull request #23343: [BP-1.18][FLINK-32999][connectors/hbase] Remove HBase connector code from main repo

2023-09-11 Thread via GitHub


snuyanzin merged PR #23343:
URL: https://github.com/apache/flink/pull/23343


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] rkhachatryan commented on a diff in pull request #23391: [FLINK-33071][metrics,checkpointing] Log a json dump of checkpoint statistics when checkpoint complets or fails

2023-09-11 Thread via GitHub


rkhachatryan commented on code in PR #23391:
URL: https://github.com/apache/flink/pull/23391#discussion_r1321980461


##
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointStatsTracker.java:
##
@@ -227,11 +257,23 @@ void reportFailedCheckpoint(FailedCheckpointStats failed) 
{
 history.replacePendingCheckpointById(failed);
 
 dirty = true;
+logCheckpointStatistics(failed);
 } finally {
 statsReadWriteLock.unlock();
 }
 }
 
+private void logCheckpointStatistics(AbstractCheckpointStats 
checkpointStats) {
+try {
+StringWriter sw = new StringWriter();
+MAPPER.writeValue(sw, 
CheckpointStatistics.generateCheckpointStatistics(checkpointStats, true));
+String jsonDump = sw.toString();
+LOG.debug("CheckpointStatistics (for jobID={}, jobName={}) dump = 
{} ", jobID, jobName, jsonDump);

Review Comment:
   How about explicitly logging the checkpoint ID along with the job ID, so 
that we don't need to parse or know the JSON to get the stats for a specific 
checkpoint?



##
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointStatsTracker.java:
##
@@ -227,11 +257,23 @@ void reportFailedCheckpoint(FailedCheckpointStats failed) 
{
 history.replacePendingCheckpointById(failed);
 
 dirty = true;
+logCheckpointStatistics(failed);
 } finally {
 statsReadWriteLock.unlock();
 }
 }
 
+private void logCheckpointStatistics(AbstractCheckpointStats 
checkpointStats) {
+try {
+StringWriter sw = new StringWriter();
+MAPPER.writeValue(sw, 
CheckpointStatistics.generateCheckpointStatistics(checkpointStats, true));
+String jsonDump = sw.toString();

Review Comment:
   Should we save some resources by guarding this with isDebugEnabled?



##
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointStatsTracker.java:
##
@@ -227,11 +257,23 @@ void reportFailedCheckpoint(FailedCheckpointStats failed) 
{
 history.replacePendingCheckpointById(failed);
 
 dirty = true;
+logCheckpointStatistics(failed);
 } finally {
 statsReadWriteLock.unlock();
 }
 }
 
+private void logCheckpointStatistics(AbstractCheckpointStats 
checkpointStats) {
+try {
+StringWriter sw = new StringWriter();
+MAPPER.writeValue(sw, 
CheckpointStatistics.generateCheckpointStatistics(checkpointStats, true));
+String jsonDump = sw.toString();
+LOG.debug("CheckpointStatistics (for jobID={}, jobName={}) dump = 
{} ", jobID, jobName, jsonDump);
+} catch (Exception ex) {
+LOG.error("Fail to log CheckpointStatistics", ex);

Review Comment:
   nit: I'd log it as a warning
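
Pulling the three suggestions together, a hedged sketch of how the method could look (debug guard, explicit checkpoint ID, warning on failure); it reuses the fields from the diff above, and `AbstractCheckpointStats#getCheckpointId` is assumed to be available:

```java
private void logCheckpointStatistics(AbstractCheckpointStats checkpointStats) {
    if (!LOG.isDebugEnabled()) {
        return; // skip the JSON serialization cost when debug logging is off
    }
    try {
        StringWriter sw = new StringWriter();
        MAPPER.writeValue(
                sw, CheckpointStatistics.generateCheckpointStatistics(checkpointStats, true));
        LOG.debug(
                "CheckpointStatistics (for checkpointId={}, jobID={}, jobName={}) dump = {}",
                checkpointStats.getCheckpointId(),
                jobID,
                jobName,
                sw.toString());
    } catch (Exception ex) {
        LOG.warn("Failed to log CheckpointStatistics", ex);
    }
}
```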



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (FLINK-33066) Enable to inject environment variable from secret/configmap to operatorPod

2023-09-11 Thread Gyula Fora (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora reassigned FLINK-33066:
--

Assignee: dongwoo.kim

> Enable to inject environment variable from secret/configmap to operatorPod
> --
>
> Key: FLINK-33066
> URL: https://issues.apache.org/jira/browse/FLINK-33066
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: dongwoo.kim
>Assignee: dongwoo.kim
>Priority: Minor
>
> Hello, I've been working with the Flink Kubernetes operator and noticed that 
> the {{operatorPod.env}} only allows for simple key-value pairs and doesn't 
> support Kubernetes {{valueFrom}} syntax.
> How about changing the template to support more varied k8s syntax? 
> *Current template*
> {code:java}
> {{- range $k, $v := .Values.operatorPod.env }}
>   - name: {{ $v.name | quote }}
>     value: {{ $v.value | quote }}
> {{- end }}{code}
>  
> *Proposed template*
> 1) Modify template like below 
> {code:java}
> {{- with .Values.operatorPod.env }} 
> {{- toYaml . | nindent 12 }} 
> {{- end }} 
> {code}
> 2) Create an extra config, *Values.operatorPod.envFrom*, and utilize it
>  
> I'd be happy to implement this update if it's approved.
> Thanks in advance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33066) Enable to inject environment variable from secret/configmap to operatorPod

2023-09-11 Thread Gyula Fora (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763887#comment-17763887
 ] 

Gyula Fora commented on FLINK-33066:


I think this would be a nice improvement :) 

> Enable to inject environment variable from secret/configmap to operatorPod
> --
>
> Key: FLINK-33066
> URL: https://issues.apache.org/jira/browse/FLINK-33066
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: dongwoo.kim
>Priority: Minor
>
> Hello, I've been working with the Flink Kubernetes operator and noticed that 
> the {{operatorPod.env}} only allows for simple key-value pairs and doesn't 
> support Kubernetes {{valueFrom}} syntax.
> How about changing the template to support a wider range of k8s syntax?
> *Current template*
> {code:java}
> {{- range $k, $v := .Values.operatorPod.env }}
>   - name: {{ $v.name | quote }}
>     value: {{ $v.value | quote }}
> {{- end }}{code}
>  
> *Proposed template*
> 1) Modify the template like below:
> {code:java}
> {{- with .Values.operatorPod.env }} 
> {{- toYaml . | nindent 12 }} 
> {{- end }} 
> {code}
> 2) Create an extra config, *Values.operatorPod.envFrom*, and utilize it
>  
> I'd be happy to implement this update if it's approved.
> Thanks in advance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] pgaref commented on a diff in pull request #23386: [FLINK-33022] Log an error when enrichers defined as part of the configuration can not be found/loaded

2023-09-11 Thread via GitHub


pgaref commented on code in PR #23386:
URL: https://github.com/apache/flink/pull/23386#discussion_r1321976643


##
flink-runtime/src/main/java/org/apache/flink/runtime/failure/FailureEnricherUtils.java:
##
@@ -106,23 +108,34 @@ static Collection<FailureEnricher> getFailureEnrichers(
 }
 }
 
+        if (!includedEnrichers.values().stream().allMatch(Boolean::booleanValue)) {

Review Comment:
   I agree. I went back and forth on changing the method signature, but this simplifies things a lot.
   Updated, and thanks ;)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] elkhand commented on a diff in pull request #23383: [FLINK-32885 ] [flink-clients] Refactoring: Moving UrlPrefixDecorator into flink-clients so it can be used by RestClusterClient for

2023-09-11 Thread via GitHub


elkhand commented on code in PR #23383:
URL: https://github.com/apache/flink/pull/23383#discussion_r1321874986


##
flink-clients/src/main/java/org/apache/flink/client/program/rest/UrlPrefixDecorator.java:
##
@@ -41,7 +40,7 @@
  */
 public class UrlPrefixDecorator<
         R extends RequestBody, P extends ResponseBody, M extends MessageParameters>
-        implements SqlGatewayMessageHeaders<R, P, M> {

Review Comment:
   It is used in other places that are not related to this PR. The reason `UrlPrefixDecorator` implemented `SqlGatewayMessageHeaders` was to fetch `SqlGatewayRestAPIVersion::isStableVersion`, since `UrlPrefixDecorator` was initially meant to be used only with the Flink SQL Gateway.
   Now I intend to move `UrlPrefixDecorator` into the `flink-clients` module, so that it can also be used by `RestClusterClient` for PyFlink remote execution. `flink-sql-gateway` has a dependency on the `flink-clients` module, and thus will be able to extend the `UrlPrefixDecorator` class and use it as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] elkhand commented on a diff in pull request #23383: [FLINK-32885 ] [flink-clients] Refactoring: Moving UrlPrefixDecorator into flink-clients so it can be used by RestClusterClient for

2023-09-11 Thread via GitHub


elkhand commented on code in PR #23383:
URL: https://github.com/apache/flink/pull/23383#discussion_r1321869568


##
flink-clients/src/main/java/org/apache/flink/client/program/rest/MonitoringApiMessageHeaders.java:
##
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.client.program.rest;
+
+import org.apache.flink.runtime.rest.messages.MessageHeaders;
+import org.apache.flink.runtime.rest.messages.MessageParameters;
+import org.apache.flink.runtime.rest.messages.RequestBody;
+import org.apache.flink.runtime.rest.messages.ResponseBody;
+import org.apache.flink.runtime.rest.versioning.RestAPIVersion;
+import org.apache.flink.runtime.rest.versioning.RuntimeRestAPIVersion;
+
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.stream.Collectors;
+
+/**
+ * This class links {@link RequestBody}s to {@link ResponseBody} types and contains meta-data
+ * required for their http headers in the runtime module.
+ *
+ * <p>Implementations must be state-less.
+ *
+ * @param <R> request message type
+ * @param <P> response message type
+ * @param <M> message parameters type
+ */
+public interface MonitoringApiMessageHeaders<

Review Comment:
   This interface provides a method `getSupportedAPIVersions` which enables fetching the required stable REST versions. For example, for the Flink SQL Gateway it can be `SqlGatewayRestAPIVersion::isStableVersion`, but for PyFlink or other cases it will be `RuntimeRestAPIVersion::isStableVersion`.

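   As a rough sketch of what such a default could look like for the runtime case (hedged; the exact signature and return type in the PR may differ):
   
   ```
   default Collection<? extends RestAPIVersion<?>> getSupportedAPIVersions() {
       return Arrays.stream(RuntimeRestAPIVersion.values())
               .filter(RuntimeRestAPIVersion::isStableVersion)
               .collect(Collectors.toList());
   }
   ```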


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] elkhand commented on a diff in pull request #23383: [FLINK-32885 ] [flink-clients] Refactoring: Moving UrlPrefixDecorator into flink-clients so it can be used by RestClusterClient for

2023-09-11 Thread via GitHub


elkhand commented on code in PR #23383:
URL: https://github.com/apache/flink/pull/23383#discussion_r1321868038


##
flink-table/flink-sql-gateway/src/main/java/org/apache/flink/table/gateway/rest/header/util/SQLGatewayUrlPrefixDecorator.java:
##
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.gateway.rest.header.util;
+
+import org.apache.flink.client.program.rest.UrlPrefixDecorator;
+import org.apache.flink.runtime.rest.messages.MessageHeaders;
+import org.apache.flink.runtime.rest.messages.MessageParameters;
+import org.apache.flink.runtime.rest.messages.RequestBody;
+import org.apache.flink.runtime.rest.messages.ResponseBody;
+import org.apache.flink.runtime.rest.versioning.RestAPIVersion;
+import org.apache.flink.table.gateway.rest.util.SqlGatewayRestAPIVersion;
+
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.stream.Collectors;
+
+/**
+ * This class decorates the URL of the original message headers by adding a 
prefix. The purpose of
+ * this decorator is to provide a consistent URL prefix for all the REST API 
endpoints while
+ * preserving the behavior of the original message headers.
+ *
+ * @param <R> the type of the request body
+ * @param <P> the type of the response body
+ * @param <M> the type of the message parameters
+ */
+public class SQLGatewayUrlPrefixDecorator<
+        R extends RequestBody, P extends ResponseBody, M extends MessageParameters>
+        extends UrlPrefixDecorator<R, P, M> {

Review Comment:
   The overridden method `getSupportedAPIVersions()` of 
SQLGatewayUrlPrefixDecorator will return SQL Gateway's stable REST API 
versions. UrlPrefixDecorator will be generic and can be used on the PyFlink 
side, which will need to fetch RuntimeRestAPIVersion stable versions.

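   For illustration, the override could look roughly like this (a sketch; the exact signature may differ):
   
   ```
   @Override
   public Collection<? extends RestAPIVersion<?>> getSupportedAPIVersions() {
       return Arrays.stream(SqlGatewayRestAPIVersion.values())
               .filter(SqlGatewayRestAPIVersion::isStableVersion)
               .collect(Collectors.toList());
   }
   ```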


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] elkhand commented on a diff in pull request #23383: [FLINK-32885 ] [flink-clients] Refactoring: Moving UrlPrefixDecorator into flink-clients so it can be used by RestClusterClient for

2023-09-11 Thread via GitHub


elkhand commented on code in PR #23383:
URL: https://github.com/apache/flink/pull/23383#discussion_r1321865428


##
flink-table/flink-sql-gateway/src/main/java/org/apache/flink/table/gateway/rest/header/util/SQLGatewayUrlPrefixDecorator.java:
##
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.gateway.rest.header.util;
+
+import org.apache.flink.client.program.rest.UrlPrefixDecorator;
+import org.apache.flink.runtime.rest.messages.MessageHeaders;
+import org.apache.flink.runtime.rest.messages.MessageParameters;
+import org.apache.flink.runtime.rest.messages.RequestBody;
+import org.apache.flink.runtime.rest.messages.ResponseBody;
+import org.apache.flink.runtime.rest.versioning.RestAPIVersion;
+import org.apache.flink.table.gateway.rest.util.SqlGatewayRestAPIVersion;
+
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.stream.Collectors;
+
+/**
+ * This class decorates the URL of the original message headers by adding a 
prefix. The purpose of
+ * this decorator is to provide a consistent URL prefix for all the REST API 
endpoints while
+ * preserving the behavior of the original message headers.
+ *
+ * @param <R> the type of the request body
+ * @param <P> the type of the response body
+ * @param <M> the type of the message parameters
+ */
+public class SQLGatewayUrlPrefixDecorator<
+        R extends RequestBody, P extends ResponseBody, M extends MessageParameters>
+        extends UrlPrefixDecorator<R, P, M> {

Review Comment:
   The overridden method `getSupportedAPIVersions()` of 
SQLGatewayUrlPrefixDecorator will return SQL Gateway's stable REST API 
versions. UrlPrefixDecorator will be generic and can be used on the PyFlink 
side, which will need to fetch different stable rest API versions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #23391: [FLINK-33071][metrics,checkpointing] Log a json dump of checkpoint statistics when checkpoint completes or fails

2023-09-11 Thread via GitHub


flinkbot commented on PR #23391:
URL: https://github.com/apache/flink/pull/23391#issuecomment-1714278117

   
   ## CI report:
   
   * 7571dc3f4381b14c4764a19de346fd4e44bb59dc UNKNOWN
   
   
   Bot commands:
   The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] pnowojski opened a new pull request, #23391: [FLINK-33071][metrics,checkpointing] Log a json dump of checkpoint statistics when checkpoint completes or fails

2023-09-11 Thread via GitHub


pnowojski opened a new pull request, #23391:
URL: https://github.com/apache/flink/pull/23391

   ## What is the purpose of the change
   
   This adds a DEBUG level log message that contains all checkpoint stats on 
every completed/failed checkpoint. A stop-gap solution until we come up with a 
proper way to expose those stats as metrics somewhere (FLINK-23411).
   
   
   ## Verifying this change
   
   I manually checked log output in some of the ITCases:
   ```
   8299 [jobmanager-io-thread-10] DEBUG 
org.apache.flink.runtime.checkpoint.CheckpointStatsTracker [] - 
CheckpointStatistics (for jobID=cf70776f3762dd273961bd57a2dbdc62, jobName=Flink 
Streaming Job) dump = 
{"className":"completed","id":2,"status":"COMPLETED","is_savepoint":false,"savepointFormat":null,"trigger_timestamp":1694452019812,"latest_ack_timestamp":1694452019823,"checkpointed_size":0,"state_size":0,"end_to_end_duration":11,"alignment_buffered":0,"processed_data":0,"persisted_data":0,"num_subtasks":10,"num_acknowledged_subtasks":10,"checkpoint_type":"CHECKPOINT","tasks":{"feca28aff5a3958840bee985ee7de4d3":{"id":2,"status":"COMPLETED","latest_ack_timestamp":1694452019823,"checkpointed_size":0,"state_size":0,"end_to_end_duration":11,"alignment_buffered":0,"processed_data":0,"persisted_data":0,"num_subtasks":1,"num_acknowledged_subtasks":1},"8b9f3e513101bcc5d198aee685286d98":{"id":2,"status":"COMPLETED","latest_ack_timestamp":1694452019823,"checkpointed_size":0,"state_size":0,"e
 
nd_to_end_duration":11,"alignment_buffered":0,"processed_data":0,"persisted_data":0,"num_subtasks":4,"num_acknowledged_subtasks":4},"51397532e2d9c7a21097a30d590b3114":{"id":2,"status":"COMPLETED","latest_ack_timestamp":1694452019823,"checkpointed_size":0,"state_size":0,"end_to_end_duration":11,"alignment_buffered":0,"processed_data":0,"persisted_data":0,"num_subtasks":4,"num_acknowledged_subtasks":4},"bc764cd8ddf7a0cff126f51c16239658":{"id":2,"status":"COMPLETED","latest_ack_timestamp":1694452019823,"checkpointed_size":0,"state_size":0,"end_to_end_duration":11,"alignment_buffered":0,"processed_data":0,"persisted_data":0,"num_subtasks":1,"num_acknowledged_subtasks":1}},"external_path":"","discarded":false}
 
   8357 [Checkpoint Timer] DEBUG 
org.apache.flink.runtime.checkpoint.CheckpointStatsTracker [] - 
CheckpointStatistics (for jobID=cf70776f3762dd273961bd57a2dbdc62, jobName=Flink 
Streaming Job) dump = 
{"className":"failed","id":3,"status":"FAILED","is_savepoint":false,"savepointFormat":null,"trigger_timestamp":1694452019832,"latest_ack_timestamp":-1,"checkpointed_size":0,"state_size":0,"end_to_end_duration":48,"alignment_buffered":0,"processed_data":0,"persisted_data":0,"num_subtasks":10,"num_acknowledged_subtasks":0,"checkpoint_type":"CHECKPOINT","tasks":{"feca28aff5a3958840bee985ee7de4d3":{"id":3,"status":"FAILED","latest_ack_timestamp":-1,"checkpointed_size":0,"state_size":0,"end_to_end_duration":-1,"alignment_buffered":0,"processed_data":0,"persisted_data":0,"num_subtasks":1,"num_acknowledged_subtasks":0},"8b9f3e513101bcc5d198aee685286d98":{"id":3,"status":"FAILED","latest_ack_timestamp":-1,"checkpointed_size":0,"state_size":0,"end_to_end_duration":-1,"alignment_buffered":0,"proces
 
sed_data":0,"persisted_data":0,"num_subtasks":4,"num_acknowledged_subtasks":0},"51397532e2d9c7a21097a30d590b3114":{"id":3,"status":"FAILED","latest_ack_timestamp":-1,"checkpointed_size":0,"state_size":0,"end_to_end_duration":-1,"alignment_buffered":0,"processed_data":0,"persisted_data":0,"num_subtasks":4,"num_acknowledged_subtasks":0},"bc764cd8ddf7a0cff126f51c16239658":{"id":3,"status":"FAILED","latest_ack_timestamp":-1,"checkpointed_size":0,"state_size":0,"end_to_end_duration":-1,"alignment_buffered":0,"processed_data":0,"persisted_data":0,"num_subtasks":1,"num_acknowledged_subtasks":0}},"failure_timestamp":1694452019880,"failure_message":"TaskManager
 received a checkpoint request for unknown task 
4d61cc52dbd40f4ac96e1ebf5f561cba_bc764cd8ddf7a0cff126f51c16239658_0_0. Failure 
reason: Task local checkpoint failure."} 
   ```
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't 
know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

[jira] [Updated] (FLINK-33071) Log checkpoint statistics

2023-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33071:
---
Labels: pull-request-available  (was: )

> Log checkpoint statistics 
> --
>
> Key: FLINK-33071
> URL: https://issues.apache.org/jira/browse/FLINK-33071
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Checkpointing, Runtime / Metrics
>Affects Versions: 1.18.0
>Reporter: Piotr Nowojski
>Assignee: Piotr Nowojski
>Priority: Major
>  Labels: pull-request-available
>
> This is a stop-gap solution until we have a proper way of solving FLINK-23411.
> The plan is to dump JSON serialised checkpoint statistics into Flink JM's 
> log, with a {{DEBUG}} level. This could be used to analyse what has happened 
> with a certain checkpoint in the past.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] ferenc-csaky commented on pull request #23343: [BP-1.18][FLINK-32999][connectors/hbase] Remove HBase connector code from main repo

2023-09-11 Thread via GitHub


ferenc-csaky commented on PR #23343:
URL: https://github.com/apache/flink/pull/23343#issuecomment-1714173307

   @snuyanzin can you merge this one as well?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-33018) GCP Pubsub PubSubConsumingTest.testStoppingConnectorWhenDeserializationSchemaIndicatesEndOfStream failed

2023-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33018:
---
Labels: pull-request-available  (was: )

> GCP Pubsub 
> PubSubConsumingTest.testStoppingConnectorWhenDeserializationSchemaIndicatesEndOfStream
>  failed
> 
>
> Key: FLINK-33018
> URL: https://issues.apache.org/jira/browse/FLINK-33018
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Google Cloud PubSub
>Affects Versions: gcp-pubsub-3.0.2
>Reporter: Martijn Visser
>Priority: Blocker
>  Labels: pull-request-available
>
> https://github.com/apache/flink-connector-gcp-pubsub/actions/runs/6061318336/job/16446392844#step:13:507
> {code:java}
> [INFO] 
> [INFO] Results:
> [INFO] 
> Error:  Failures: 
> Error:
> PubSubConsumingTest.testStoppingConnectorWhenDeserializationSchemaIndicatesEndOfStream:119
>  
> expected: ["1", "2", "3"]
>  but was: ["1", "2"]
> [INFO] 
> Error:  Tests run: 30, Failures: 1, Errors: 0, Skipped: 0
> [INFO] 
> [INFO] 
> 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-connector-gcp-pubsub] RyanSkraba opened a new pull request, #19: [FLINK-33018][Tests] Fix flaky test when cancelling source

2023-09-11 Thread via GitHub


RyanSkraba opened a new pull request, #19:
URL: https://github.com/apache/flink-connector-gcp-pubsub/pull/19

   We sometimes observe the test failure described in FLINK-33018
   
   ```
   Expected :["1", "2", "3"]
   Actual   :["1", "2"]
   ```
   
   This occurs because the thread running the source is explicitly cancelled in 
the **finally** clause.  Unlike the other tests in this class, the source 
thread should run to completion and finish when the last message ID (`"3"`) 
arrives without external intervention.
   
   In rare cases, it was possible for the explicit `cancel()` to happen after 
the two records arrived, but before the third record causes the source to 
cleanly shut itself down and finish.
   
   This is fixed by no longer waiting for any records to arrive, just letting 
the thread run to completion, and _then_ testing the results.
   
   I checked this by running this particular test as a 
`@RepeatedTest(100_000)`.  I have never observed the behaviour described in the 
issue after this change, but I've noticed that *with or without* the change, 
this test will consistently fail after the 16,248-16,250th execution, during 
the `thread.start()`.

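   The shape of the fix, as a self-contained sketch (illustrative only, not the actual test code):
   
   ```
   import java.util.List;
   import java.util.concurrent.CopyOnWriteArrayList;
   
   public class RunToCompletionExample {
       public static void main(String[] args) throws InterruptedException {
           List<String> results = new CopyOnWriteArrayList<>();
           // the "source": emits its messages and then finishes on its own,
           // mimicking a source that shuts down on the end-of-stream element
           Thread source = new Thread(() -> {
               for (String id : new String[] {"1", "2", "3"}) {
                   results.add(id);
               }
           });
           source.start();
           // the fix: let the thread run to completion instead of cancelling it,
           // and only then assert on the results
           source.join();
           if (!results.equals(List.of("1", "2", "3"))) {
               throw new AssertionError("unexpected results: " + results);
           }
       }
   }
   ```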

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Comment Edited] (FLINK-23411) Expose Flink checkpoint details metrics

2023-09-11 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763789#comment-17763789
 ] 

Piotr Nowojski edited comment on FLINK-23411 at 9/11/23 3:51 PM:
-

Hey, I've just stumbled upon this ticket. Should we really expose point-wise metrics as continuous ones? It's following the pattern of {{lastCheckpointDuration}}, but I don't think this is a good pattern:
* This is wasteful if {{checkpoint duration >> metrics collecting/reporting interval}}. We will report the same values many times over.
* This is losing data if {{checkpoint duration < metrics collecting/reporting interval}}. Only a fraction of the checkpoint's stats will be visible in the metric system.
* I don't know how this approach will scale with larger jobs (parallelism > 20). How could users/devops actually visualize and analyze hundreds or thousands of these metrics for a single checkpoint, given {{parallelism * number_of_tasks * 8 (number of metrics) == a lot}} (800 for parallelism 20 and 5 tasks)? From my experience, if there is an issue with a checkpoint, I need to look down to subtask-level checkpoint stats to track down the problem and then often correlate numbers between different subtasks. I just don't see how this can be done with the current metric reporting scheme.

I think in the long term, this should be exposed as something like [distributed 
tracing|https://newrelic.com/blog/how-to-relic/distributed-tracing-anomaly-detection]
 on the user side, and reported by Flink as a bunch of traces via OTEL. Maybe 
we should expand the current {{MetricReporter}} plugins to support that?

As a stop-gap solution, I would propose FLINK-33071.


was (Author: pnowojski):
Hey, I've just stumbled upon this ticket, as we are also looking into the lack of those metrics. Should we really expose point-wise metrics as continuous ones? It's following the pattern of {{lastCheckpointDuration}}, but I don't think this is a good pattern:
* this is wasteful if {{checkpoint duration >> metrics collecting/reporting interval}}
* this is losing data if {{checkpoint duration < metrics collecting/reporting interval}}
* I don't know how this approach will scale with larger jobs (parallelism > 20). How could users/devops actually visualise and analyse hundreds/thousands of these metrics for a single checkpoint {{parallelism * number_of_tasks * 8 (number of metrics) == a lot}} (800 for parallelism 20 and 5 tasks)?

I think in the long term, this should be exposed as something like [distributed 
tracing|https://newrelic.com/blog/how-to-relic/distributed-tracing-anomaly-detection]
 on the user side, and reported by Flink as a bunch of traces via OTEL. Maybe 
we should expand the current {{MetricReporter}} plugins to support that?

As a stop-gap solution, I would propose FLINK-33071.

> Expose Flink checkpoint details metrics
> ---
>
> Key: FLINK-23411
> URL: https://issues.apache.org/jira/browse/FLINK-23411
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Metrics
>Affects Versions: 1.13.1, 1.12.4
>Reporter: Jun Qin
>Assignee: Hangxiang Yu
>Priority: Major
>  Labels: pull-request-available, stale-assigned
> Fix For: 1.18.0
>
>
> The checkpoint metrics as shown in the Flink Web UI like the 
> sync/async/alignment/start delay are not exposed to the metrics system. This 
> makes problem investigation harder when Web UI is not enabled: those numbers 
> can not get in the DEBUG logs. I think we should see how we can expose 
> metrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-23411) Expose Flink checkpoint details metrics

2023-09-11 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763789#comment-17763789
 ] 

Piotr Nowojski commented on FLINK-23411:


Hey, I've just stumbled upon this ticket, as we are also looking into the lack of those metrics. Should we really expose point-wise metrics as continuous ones? It's following the pattern of {{lastCheckpointDuration}}, but I don't think this is a good pattern:
* this is wasteful if {{checkpoint duration >> metrics collecting/reporting interval}}
* this is losing data if {{checkpoint duration < metrics collecting/reporting interval}}
* I don't know how this approach will scale with larger jobs (parallelism > 20). How could users/devops actually visualise and analyse hundreds/thousands of these metrics for a single checkpoint {{parallelism * number_of_tasks * 8 (number of metrics) == a lot}} (800 for parallelism 20 and 5 tasks)?

I think in the long term, this should be exposed as something like [distributed 
tracing|https://newrelic.com/blog/how-to-relic/distributed-tracing-anomaly-detection]
 on the user side, and reported by Flink as a bunch of traces via OTEL. Maybe 
we should expand the current {{MetricReporter}} plugins to support that?

As a stop-gap solution, I would propose FLINK-33071.

> Expose Flink checkpoint details metrics
> ---
>
> Key: FLINK-23411
> URL: https://issues.apache.org/jira/browse/FLINK-23411
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Metrics
>Affects Versions: 1.13.1, 1.12.4
>Reporter: Jun Qin
>Assignee: Hangxiang Yu
>Priority: Major
>  Labels: pull-request-available, stale-assigned
> Fix For: 1.18.0
>
>
> The checkpoint metrics as shown in the Flink Web UI like the 
> sync/async/alignment/start delay are not exposed to the metrics system. This 
> makes problem investigation harder when Web UI is not enabled: those numbers 
> can not get in the DEBUG logs. I think we should see how we can expose 
> metrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33071) Log checkpoint statistics

2023-09-11 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-33071:
--

 Summary: Log checkpoint statistics 
 Key: FLINK-33071
 URL: https://issues.apache.org/jira/browse/FLINK-33071
 Project: Flink
  Issue Type: New Feature
  Components: Runtime / Checkpointing, Runtime / Metrics
Affects Versions: 1.18.0
Reporter: Piotr Nowojski
Assignee: Piotr Nowojski


This is a stop-gap solution until we have a proper way of solving FLINK-23411.

The plan is to dump JSON serialised checkpoint statistics into Flink JM's log, 
with a {{DEBUG}} level. This could be used to analyse what has happened with a 
certain checkpoint in the past.

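To consume these dumps, one would enable DEBUG logging for the tracker. A sketch of a log4j2 properties snippet (the logger alias "checkpointstats" is made up; the logger name is taken from the sample output in the pull request):

{code:java}
logger.checkpointstats.name = org.apache.flink.runtime.checkpoint.CheckpointStatsTracker
logger.checkpointstats.level = DEBUG
{code}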


--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] tweise commented on pull request #23383: [FLINK-32885 ] [flink-clients] Refactoring: Moving UrlPrefixDecorator into flink-clients so it can be used by RestClusterClient for PyFlink re

2023-09-11 Thread via GitHub


tweise commented on PR #23383:
URL: https://github.com/apache/flink/pull/23383#issuecomment-1714118207

   @afedulov PTAL


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] tweise commented on a diff in pull request #23383: [FLINK-32885 ] [flink-clients] Refactoring: Moving UrlPrefixDecorator into flink-clients so it can be used by RestClusterClient for

2023-09-11 Thread via GitHub


tweise commented on code in PR #23383:
URL: https://github.com/apache/flink/pull/23383#discussion_r1321705462


##
flink-clients/src/main/java/org/apache/flink/client/program/rest/UrlPrefixDecorator.java:
##
@@ -41,7 +40,7 @@
  */
 public class UrlPrefixDecorator<
         R extends RequestBody, P extends ResponseBody, M extends MessageParameters>
-        implements SqlGatewayMessageHeaders<R, P, M> {

Review Comment:
   Where/how is `SqlGatewayMessageHeaders` used after this change?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-32854) [JUnit5 Migration] The state package of flink-runtime module

2023-09-11 Thread Rui Fan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763765#comment-17763765
 ] 

Rui Fan commented on FLINK-32854:
-

Merged via 554810abbbd44a656a446d051b3dd199b0df9055

> [JUnit5 Migration] The state package of flink-runtime module
> 
>
> Key: FLINK-32854
> URL: https://issues.apache.org/jira/browse/FLINK-32854
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Rui Fan
>Assignee: Jiabao Sun
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-32854) [JUnit5 Migration] The state package of flink-runtime module

2023-09-11 Thread Rui Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan resolved FLINK-32854.
-
Fix Version/s: 1.19.0
   Resolution: Fixed

> [JUnit5 Migration] The state package of flink-runtime module
> 
>
> Key: FLINK-32854
> URL: https://issues.apache.org/jira/browse/FLINK-32854
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Rui Fan
>Assignee: Jiabao Sun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] gaborgsomogyi commented on a diff in pull request #23359: [FLINK-33029][python] Drop python 3.7 support

2023-09-11 Thread via GitHub


gaborgsomogyi commented on code in PR #23359:
URL: https://github.com/apache/flink/pull/23359#discussion_r1321570938


##
tools/releasing/create_binary_release.sh:
##
@@ -129,8 +129,8 @@ make_python_release() {
   cp ${pyflink_actual_name} "${PYTHON_RELEASE_DIR}/${pyflink_release_name}"
 
   wheel_packages_num=0
-  # py37,py38,py39,py310 for mac and linux (11 wheel packages)
-  EXPECTED_WHEEL_PACKAGES_NUM=11
+  # py38,py39,py310 for mac 10.9, 11.0 and linux (9 wheel packages)
+  EXPECTED_WHEEL_PACKAGES_NUM=9

Review Comment:
   https://pypi.org/project/apache-flink/#files
   
   Here is what we originally built:
   * (py38, py39, py310) X (mac 10.9, mac 11.0, linux) = 9
   * (py37) X (mac 10.9, linux) = 2
   
   Now that the latter is dropped we should have 9, but I was unable to test it on Azure.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] tweise commented on a diff in pull request #23383: [FLINK-32885 ] [flink-clients] Refactoring: Moving UrlPrefixDecorator into flink-clients so it can be used by RestClusterClient for

2023-09-11 Thread via GitHub


tweise commented on code in PR #23383:
URL: https://github.com/apache/flink/pull/23383#discussion_r1321682712


##
flink-table/flink-sql-gateway/src/main/java/org/apache/flink/table/gateway/rest/header/util/SQLGatewayUrlPrefixDecorator.java:
##
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.gateway.rest.header.util;
+
+import org.apache.flink.client.program.rest.UrlPrefixDecorator;
+import org.apache.flink.runtime.rest.messages.MessageHeaders;
+import org.apache.flink.runtime.rest.messages.MessageParameters;
+import org.apache.flink.runtime.rest.messages.RequestBody;
+import org.apache.flink.runtime.rest.messages.ResponseBody;
+import org.apache.flink.runtime.rest.versioning.RestAPIVersion;
+import org.apache.flink.table.gateway.rest.util.SqlGatewayRestAPIVersion;
+
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.stream.Collectors;
+
+/**
+ * This class decorates the URL of the original message headers by adding a 
prefix. The purpose of
+ * this decorator is to provide a consistent URL prefix for all the REST API 
endpoints while
+ * preserving the behavior of the original message headers.
+ *
+ * @param <R> the type of the request body
+ * @param <P> the type of the response body
+ * @param <M> the type of the message parameters
+ */
+public class SQLGatewayUrlPrefixDecorator<
+        R extends RequestBody, P extends ResponseBody, M extends MessageParameters>
+        extends UrlPrefixDecorator<R, P, M> {

Review Comment:
   Based on what I see here, the class filters a SQL-Gateway-specific header.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] mxm commented on a diff in pull request #23383: [FLINK-32885 ] [flink-clients] Refactoring: Moving UrlPrefixDecorator into flink-clients so it can be used by RestClusterClient for PyF

2023-09-11 Thread via GitHub


mxm commented on code in PR #23383:
URL: https://github.com/apache/flink/pull/23383#discussion_r1321673233


##
flink-table/flink-sql-gateway/src/main/java/org/apache/flink/table/gateway/rest/header/util/SQLGatewayUrlPrefixDecorator.java:
##
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.gateway.rest.header.util;
+
+import org.apache.flink.client.program.rest.UrlPrefixDecorator;
+import org.apache.flink.runtime.rest.messages.MessageHeaders;
+import org.apache.flink.runtime.rest.messages.MessageParameters;
+import org.apache.flink.runtime.rest.messages.RequestBody;
+import org.apache.flink.runtime.rest.messages.ResponseBody;
+import org.apache.flink.runtime.rest.versioning.RestAPIVersion;
+import org.apache.flink.table.gateway.rest.util.SqlGatewayRestAPIVersion;
+
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.stream.Collectors;
+
+/**
+ * This class decorates the URL of the original message headers by adding a 
prefix. The purpose of
+ * this decorator is to provide a consistent URL prefix for all the REST API 
endpoints while
+ * preserving the behavior of the original message headers.
+ *
+ * @param <R> the type of the request body
+ * @param <P> the type of the response body
+ * @param <M> the type of the message parameters
+ */
+public class SQLGatewayUrlPrefixDecorator<
+        R extends RequestBody, P extends ResponseBody, M extends MessageParameters>
+        extends UrlPrefixDecorator<R, P, M> {

Review Comment:
   Why is `SQLGatewayUrlPrefixDecorator` required? There is already `UrlPrefixDecorator`, which this class is using as a superclass. I don't understand why we need a new class; we can simply use `UrlPrefixDecorator`.



##
flink-clients/src/main/java/org/apache/flink/client/program/rest/MonitoringApiMessageHeaders.java:
##
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.client.program.rest;
+
+import org.apache.flink.runtime.rest.messages.MessageHeaders;
+import org.apache.flink.runtime.rest.messages.MessageParameters;
+import org.apache.flink.runtime.rest.messages.RequestBody;
+import org.apache.flink.runtime.rest.messages.ResponseBody;
+import org.apache.flink.runtime.rest.versioning.RestAPIVersion;
+import org.apache.flink.runtime.rest.versioning.RuntimeRestAPIVersion;
+
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.stream.Collectors;
+
+/**
+ * This class links {@link RequestBody}s to {@link ResponseBody} types and contains meta-data
+ * required for their http headers in the runtime module.
+ *
+ * <p>Implementations must be state-less.
+ *
+ * @param <R> request message type
+ * @param <P> response message type
+ * @param <M> message parameters type
+ */
+public interface MonitoringApiMessageHeaders<

Review Comment:
   Why is this class required?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] 1996fanrui commented on a diff in pull request #22985: [FLINK-21883][scheduler] Implement cooldown period for adaptive scheduler

2023-09-11 Thread via GitHub


1996fanrui commented on code in PR #22985:
URL: https://github.com/apache/flink/pull/22985#discussion_r1321654007


##
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/Executing.java:
##
@@ -124,17 +154,33 @@ private void handleDeploymentFailure(ExecutionVertex 
executionVertex, JobExcepti
 
 @Override
 public void onNewResourcesAvailable() {
-maybeRescale();
+rescaleWhenCooldownPeriodIsOver();
 }
 
 @Override
 public void onNewResourceRequirements() {
-maybeRescale();
+rescaleWhenCooldownPeriodIsOver();
 }
 
 private void maybeRescale() {
-if (context.shouldRescale(getExecutionGraph())) {
-getLogger().info("Can change the parallelism of job. Restarting 
job.");
+final Duration timeSinceLastRescale = timeSinceLastRescale();
+rescaleScheduled = false;
+final boolean shouldForceRescale =
+(scalingIntervalMax != null)
+&& (timeSinceLastRescale.compareTo(scalingIntervalMax) 
> 0)
+&& (lastRescale != Instant.EPOCH); // initial rescale 
is not forced
+if (shouldForceRescale || context.shouldRescale(getExecutionGraph())) {
+if (shouldForceRescale) {
+getLogger()
+.info(
+"Time since last rescale ({}) >  {} ({}). 
Force-changing the parallelism of the job. Restarting the job.",
+timeSinceLastRescale,
+
JobManagerOptions.SCHEDULER_SCALING_INTERVAL_MAX.key(),
+scalingIntervalMax);
+} else {
+getLogger().info("Can change the parallelism of the job. 
Restarting the job.");
+}
+lastRescale = Instant.now();
 context.goToRestarting(
 getExecutionGraph(),

Review Comment:
   >> Do you mean we force-trigger a rescale when new resources come and the current time > lastRescaleTime + scalingIntervalMax?
   >
   > yes, that is what the community agreed on in the FLIP.
   
   I think this strategy only solves one case: the last resource comes after scalingIntervalMax, so the job can rescale directly.
   
   However, it cannot solve the case where the last resource comes before scalingIntervalMax. Assume the scalingIntervalMax is 5 minutes, `jobmanager.adaptive-scheduler.min-parallelism-increase` is 2, and the expected parallelism is 100.
   
   - The job has 99 TMs at 09:00:00, so it runs with parallelism 99.
   - At 09:03:00, the last TM comes.
   - `Executing` will ignore the rescale because the parallelism diff is 1, which is less than `min-parallelism-increase`, and `current time < lastRescaleTime + scalingIntervalMax`.
   
   After that, the job doesn't need any new resources and the expected parallelism doesn't change, so the job won't rescale anymore.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] 1996fanrui commented on a diff in pull request #22985: [FLINK-21883][scheduler] Implement cooldown period for adaptive scheduler

2023-09-11 Thread via GitHub


1996fanrui commented on code in PR #22985:
URL: https://github.com/apache/flink/pull/22985#discussion_r1321641317


##
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/Executing.java:
##
@@ -124,17 +154,33 @@ private void handleDeploymentFailure(ExecutionVertex 
executionVertex, JobExcepti
 
 @Override
 public void onNewResourcesAvailable() {
-maybeRescale();
+rescaleWhenCooldownPeriodIsOver();
 }
 
 @Override
 public void onNewResourceRequirements() {
-maybeRescale();
+rescaleWhenCooldownPeriodIsOver();
 }
 
 private void maybeRescale() {
-if (context.shouldRescale(getExecutionGraph())) {
-getLogger().info("Can change the parallelism of job. Restarting 
job.");
+final Duration timeSinceLastRescale = timeSinceLastRescale();
+rescaleScheduled = false;
+final boolean shouldForceRescale =
+(scalingIntervalMax != null)
+&& (timeSinceLastRescale.compareTo(scalingIntervalMax) 
> 0)
+&& (lastRescale != Instant.EPOCH); // initial rescale 
is not forced
+if (shouldForceRescale || context.shouldRescale(getExecutionGraph())) {
+if (shouldForceRescale) {
+getLogger()
+.info(
+"Time since last rescale ({}) >  {} ({}). 
Force-changing the parallelism of the job. Restarting the job.",
+timeSinceLastRescale,
+
JobManagerOptions.SCHEDULER_SCALING_INTERVAL_MAX.key(),
+scalingIntervalMax);
+} else {
+getLogger().info("Can change the parallelism of the job. 
Restarting the job.");
+}
+lastRescale = Instant.now();
 context.goToRestarting(
 getExecutionGraph(),

Review Comment:
   > > If so, I think the logic related to scalingIntervalMax should be moved into maybeRescale.
   > 
   > This logic is already in the maybeRescale() method.
   
   Sorry, I had a typo there. I meant that the logic related to scalingIntervalMax should be moved out of maybeRescale.
   
   `onNewResourceRequirements` also calls `rescaleWhenCooldownPeriodIsOver` when the user changes the parallelism; maybeRescale shouldn't force-trigger a rescale in this case, even if the `current time > lastRescaleTime + scalingIntervalMax`.
   
   This is because resources cannot be ready immediately after the requirements are adjusted, so a rescale is meaningless at that point even if the `current time > lastRescaleTime + scalingIntervalMax`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] echauchot commented on a diff in pull request #22985: [FLINK-21883][scheduler] Implement cooldown period for adaptive scheduler

2023-09-11 Thread via GitHub


echauchot commented on code in PR #22985:
URL: https://github.com/apache/flink/pull/22985#discussion_r1321602343


##
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/Executing.java:
##
@@ -124,17 +154,33 @@ private void handleDeploymentFailure(ExecutionVertex 
executionVertex, JobExcepti
 
 @Override
 public void onNewResourcesAvailable() {
-maybeRescale();
+rescaleWhenCooldownPeriodIsOver();
 }
 
 @Override
 public void onNewResourceRequirements() {
-maybeRescale();
+rescaleWhenCooldownPeriodIsOver();
 }
 
 private void maybeRescale() {
-if (context.shouldRescale(getExecutionGraph())) {
-getLogger().info("Can change the parallelism of job. Restarting 
job.");
+final Duration timeSinceLastRescale = timeSinceLastRescale();
+rescaleScheduled = false;
+final boolean shouldForceRescale =
+(scalingIntervalMax != null)
+&& (timeSinceLastRescale.compareTo(scalingIntervalMax) 
> 0)
+&& (lastRescale != Instant.EPOCH); // initial rescale 
is not forced
+if (shouldForceRescale || context.shouldRescale(getExecutionGraph())) {
+if (shouldForceRescale) {
+getLogger()
+.info(
+"Time since last rescale ({}) >  {} ({}). 
Force-changing the parallelism of the job. Restarting the job.",
+timeSinceLastRescale,
+
JobManagerOptions.SCHEDULER_SCALING_INTERVAL_MAX.key(),
+scalingIntervalMax);
+} else {
+getLogger().info("Can change the parallelism of the job. 
Restarting the job.");
+}
+lastRescale = Instant.now();
 context.goToRestarting(
 getExecutionGraph(),

Review Comment:
   > Do you mean we force-trigger a rescale when new resources come and the current time > lastRescaleTime + scalingIntervalMax?
   
   yes, that is what the community agreed on in the FLIP.
   
   > If so, I think the logic related to scalingIntervalMax should be moved into maybeRescale.
   
   This logic is already in the maybeRescale() method.
   
   > onNewResourceRequirements also calls rescaleWhenCooldownPeriodIsOver when the user changes the parallelism; maybeRescale will force-trigger a rescale as well, right?
   
   yes, `onNewResourceRequirements` is ultimately called by a REST endpoint to react to user action on the web UI, so it will call maybeRescale in the end. But that does not always mean a rescale: if timeSinceLastRescale < scalingIntervalMax and minimal resources are not met, then there will be no rescale.

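   As a self-contained sketch of the cooldown pattern under discussion (the `CooldownGate` class and every name in it are made up for illustration; this is not the PR's code):
   
   ```
   import java.time.Duration;
   import java.time.Instant;
   import java.util.concurrent.Executors;
   import java.util.concurrent.ScheduledExecutorService;
   import java.util.concurrent.TimeUnit;
   
   // Act immediately if the cooldown has elapsed since the last action,
   // otherwise schedule exactly one deferred attempt for when it will have.
   class CooldownGate {
       private final ScheduledExecutorService scheduler =
               Executors.newSingleThreadScheduledExecutor();
       private final Duration cooldown;
       private Instant lastAction = Instant.EPOCH;
       private boolean attemptScheduled = false;
   
       CooldownGate(Duration cooldown) {
           this.cooldown = cooldown;
       }
   
       synchronized void tryAct(Runnable action) {
           Duration sinceLast = Duration.between(lastAction, Instant.now());
           if (sinceLast.compareTo(cooldown) >= 0) {
               lastAction = Instant.now();
               action.run();
           } else if (!attemptScheduled) {
               attemptScheduled = true;
               scheduler.schedule(
                       () -> {
                           synchronized (this) {
                               attemptScheduled = false;
                           }
                           tryAct(action);
                       },
                       cooldown.minus(sinceLast).toMillis(),
                       TimeUnit.MILLISECONDS);
           }
       }
   }
   ```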


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


