[jira] [Updated] (FLINK-34566) Flink Kubernetes Operator reconciliation parallelism setting not work

2024-03-08 Thread Gyula Fora (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora updated FLINK-34566:
---
Fix Version/s: kubernetes-operator-1.8.0

> Flink Kubernetes Operator reconciliation parallelism setting not work
> -
>
> Key: FLINK-34566
> URL: https://issues.apache.org/jira/browse/FLINK-34566
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.7.0
>Reporter: Fei Feng
>Assignee: Fei Feng
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.8.0
>
> Attachments: image-2024-03-04-10-58-37-679.png, 
> image-2024-03-04-11-17-22-877.png, image-2024-03-04-11-31-44-451.png
>
>
> After we upgraded JOSDK from version 4.3.0 to version 4.4.2 in FLINK-33005,
> we can no longer increase the reconciliation parallelism: the maximum
> reconciliation parallelism is only 10. This results in FlinkDeployment and
> SessionJob reconciliation delays of about 10-30 seconds when we have a
> large-scale Flink session cluster and session jobs in the k8s cluster.
>  
> After investigating and validating, I found that the cause is that the logic
> for creating the reconciliation thread pool in JOSDK changed significantly
> between these two versions.
> v4.3.0:
> The reconciliation thread pool was created as a FixedThreadPool
> (maximumPoolSize was the same as corePoolSize), so we passed the number of
> reconciliation threads and got a thread pool that matched our expectations.
> !image-2024-03-04-10-58-37-679.png|width=497,height=91!
> [https://github.com/operator-framework/java-operator-sdk/blob/v4.3.0/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ConfigurationServiceOverrider.java#L198]
>  
> But in v4.4.2:
> The reconciliation thread pool is created as a custom executor to which we
> can pass both corePoolSize and maximumPoolSize. The problem is that we only
> set the maximumPoolSize of the thread pool, while its corePoolSize defaults
> to 10. As a result, the thread pool size stays at 10, and most events wait
> in the workQueue for a while.
> !image-2024-03-04-11-17-22-877.png|width=569,height=112!
> [https://github.com/operator-framework/java-operator-sdk/blob/v4.4.2/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ExecutorServiceManager.java#L37]
>  
> The solution is also simple: we can create the thread pool in the Flink
> Kubernetes Operator and pass it to JOSDK, so that we control the
> reconciliation thread pool directly, for example:
> !image-2024-03-04-11-31-44-451.png|width=483,height=98!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34566) Flink Kubernetes Operator reconciliation parallelism setting not work

2024-03-05 Thread ASF GitHub Bot (Jira)


ASF GitHub Bot updated FLINK-34566:
---
Labels: pull-request-available  (was: )


[jira] [Updated] (FLINK-34566) Flink Kubernetes Operator reconciliation parallelism setting not work

2024-03-03 Thread Fei Feng (Jira)


Fei Feng updated FLINK-34566:
-
Description: 
After we upgraded JOSDK from version 4.3.0 to version 4.4.2 in FLINK-33005, we
can no longer increase the reconciliation parallelism: the maximum
reconciliation parallelism is only 10. This results in FlinkDeployment and
SessionJob reconciliation delays of about 10-30 seconds when we have a
large-scale Flink session cluster and session jobs in the k8s cluster.
 

After investigating and validating, I found that the cause is that the logic
for creating the reconciliation thread pool in JOSDK changed significantly
between these two versions.

v4.3.0:
The reconciliation thread pool was created as a FixedThreadPool
(maximumPoolSize was the same as corePoolSize), so we passed the number of
reconciliation threads and got a thread pool that matched our expectations.

!image-2024-03-04-10-58-37-679.png|width=497,height=91!

[https://github.com/operator-framework/java-operator-sdk/blob/v4.3.0/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ConfigurationServiceOverrider.java#L198]
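To make the v4.3.0 behaviour concrete, here is a minimal JDK-only sketch (not JOSDK's code). It assumes the fixed pool is built the way `Executors.newFixedThreadPool(n)` builds one, i.e. a `ThreadPoolExecutor` whose core size equals its maximum size, with an unbounded queue. The class `FixedPoolSketch`, its helpers, and the thread count of 50 are illustrative only.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only; not the JOSDK implementation.
final class FixedPoolSketch {

    // Same shape as Executors.newFixedThreadPool(threads):
    // corePoolSize == maximumPoolSize, unbounded work queue.
    static ThreadPoolExecutor fixedPool(int threads) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
    }

    // Submits `tasks` blocking "reconciliations" and returns a snapshot of
    // {live worker threads, tasks waiting in the queue} while they all block.
    static int[] saturate(ThreadPoolExecutor pool, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return new int[] {pool.getPoolSize(), pool.getQueue().size()};
        } finally {
            release.countDown(); // unblock every task
            pool.shutdown();     // workers exit once the queue drains
        }
    }

    public static void main(String[] args) {
        int[] s = saturate(fixedPool(50), 50);
        System.out.println("threads=" + s[0] + ", queued=" + s[1]); // threads=50, queued=0
    }
}
```

Every submitted task gets its own thread, because the core size already equals the requested parallelism.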

 

But in v4.4.2:

The reconciliation thread pool is created as a custom executor to which we can
pass both corePoolSize and maximumPoolSize. The problem is that we only set the
maximumPoolSize of the thread pool, while its corePoolSize defaults to 10. As a
result, the thread pool size stays at 10, and most events wait in the workQueue
for a while.

!image-2024-03-04-11-17-22-877.png|width=569,height=112!

[https://github.com/operator-framework/java-operator-sdk/blob/v4.4.2/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ExecutorServiceManager.java#L37]
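The cap of 10 follows from standard `java.util.concurrent.ThreadPoolExecutor` semantics: the executor starts threads up to `corePoolSize`, then queues further tasks, and only grows toward `maximumPoolSize` when the queue refuses a task. With an unbounded queue, the queue never refuses one. The sketch below assumes such an unbounded queue (which queue JOSDK 4.4.2 actually passes at the linked line is not re-checked here). The core size of 10 comes from the description above; the maximum of 50 and the class name `CoreSizeCapSketch` are illustrative.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only; assumes an unbounded queue, not verified
// against the JOSDK 4.4.2 source.
final class CoreSizeCapSketch {

    // Only the maximum is raised; the core size stays at its default.
    static ThreadPoolExecutor pool(int core, int max) {
        return new ThreadPoolExecutor(core, max, 1L, TimeUnit.MINUTES,
                new LinkedBlockingQueue<>());
    }

    // Submits `tasks` blocking "reconciliations" and returns a snapshot of
    // {live worker threads, tasks waiting in the queue} while they all block.
    static int[] saturate(ThreadPoolExecutor pool, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return new int[] {pool.getPoolSize(), pool.getQueue().size()};
        } finally {
            release.countDown(); // unblock every task
            pool.shutdown();     // workers exit once the queue drains
        }
    }

    public static void main(String[] args) {
        int[] s = saturate(pool(10, 50), 50);
        // The queue accepts every task beyond the 10th, so no extra threads start.
        System.out.println("threads=" + s[0] + ", queued=" + s[1]); // threads=10, queued=40
    }
}
```

Although the maximum is 50, only 10 tasks run at once and the other 40 wait in the queue, which matches the delay pattern reported above.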

 

The solution is also simple: we can create the thread pool in the Flink
Kubernetes Operator and pass it to JOSDK, so that we control the reconciliation
thread pool directly, for example:

!image-2024-03-04-11-31-44-451.png|width=483,height=98!
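Here is a minimal JDK-only sketch of that idea. It assumes the operator builds its own `ThreadPoolExecutor` and takes both the core and maximum sizes from the configured reconciliation parallelism. `reconcileExecutor`, the `reconciler-` thread-name prefix, and the one-minute idle timeout are hypothetical choices. Passing the executor to JOSDK (shown in the attached screenshot) is not reproduced here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical operator-side factory; not the actual Flink operator code.
final class ReconcileExecutorSketch {

    static ThreadPoolExecutor reconcileExecutor(int parallelism) {
        AtomicInteger seq = new AtomicInteger();
        ThreadFactory named = r -> new Thread(r, "reconciler-" + seq.incrementAndGet());
        // core == max, so the configured parallelism is actually reachable
        // even though the work queue is unbounded.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(parallelism, parallelism,
                1L, TimeUnit.MINUTES, new LinkedBlockingQueue<>(), named);
        // Let idle threads (core ones included) exit after the keep-alive, so a
        // large parallelism does not pin threads while the operator is quiet.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = reconcileExecutor(50);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < 50; i++) {
            pool.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        System.out.println("threads=" + pool.getPoolSize()
                + ", queued=" + pool.getQueue().size()); // threads=50, queued=0
        release.countDown();
        pool.shutdown();
    }
}
```

Setting the core size equal to the maximum is what makes the parallelism effective here. Raising only the maximum would reproduce the cap of 10 described above.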

  was:
After upgrading JOSDK from version 4.3.0 to version 4.4.2 in FLINK-33005, we
can no longer increase the reconciliation parallelism: the maximum
reconciliation parallelism is 10. This results in FlinkDeployment and
SessionJob reconciliation delays of about 10-20 seconds when we have a
large-scale Flink session cluster and Flink jobs.
 

After investigating and validating, I found that the cause is that the logic
for creating the reconciliation thread pool in JOSDK changed significantly
between these two versions.

v4.3.0:
The reconciliation thread pool was created as a FixedThreadPool
(maximumPoolSize was the same as corePoolSize), so we passed the number of
reconciliation threads and got a thread pool that matched our expectations.

!image-2024-03-04-10-58-37-679.png|width=497,height=91!

[https://github.com/operator-framework/java-operator-sdk/blob/v4.3.0/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ConfigurationServiceOverrider.java#L198]

 

But in v4.4.2:

The reconciliation thread pool is created as a custom executor to which we can
pass both corePoolSize and maximumPoolSize. The problem is that we only set the
maximumPoolSize of the thread pool, while its corePoolSize defaults to 10. As a
result, the thread pool size stays at 10, and most events wait in the workQueue
for a while.

!image-2024-03-04-11-17-22-877.png|width=569,height=112!

[https://github.com/operator-framework/java-operator-sdk/blob/v4.4.2/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ExecutorServiceManager.java#L37]

 

The solution is also simple: we can create the thread pool in the Flink
Kubernetes Operator and pass it to JOSDK, so that we control the reconciliation
thread pool directly, for example:

!image-2024-03-04-11-31-44-451.png|width=483,height=98!


[jira] [Updated] (FLINK-34566) Flink Kubernetes Operator reconciliation parallelism setting not work

2024-03-03 Thread Fei Feng (Jira)


Fei Feng updated FLINK-34566:
-
Attachment: (was: image-2024-03-04-11-30-53-118.png)


[jira] [Updated] (FLINK-34566) Flink Kubernetes Operator reconciliation parallelism setting not work

2024-03-03 Thread Fei Feng (Jira)


Fei Feng updated FLINK-34566:
-
Description: 
After upgrading JOSDK from version 4.3.0 to version 4.4.2 in FLINK-33005, we
can no longer increase the reconciliation parallelism: the maximum
reconciliation parallelism is 10. This results in FlinkDeployment and
SessionJob reconciliation delays of about 10-20 seconds when we have a
large-scale Flink session cluster and Flink jobs.
 

After investigating and validating, I found that the cause is that the logic
for creating the reconciliation thread pool in JOSDK changed significantly
between these two versions.

v4.3.0:
The reconciliation thread pool was created as a FixedThreadPool
(maximumPoolSize was the same as corePoolSize), so we passed the number of
reconciliation threads and got a thread pool that matched our expectations.

!image-2024-03-04-10-58-37-679.png|width=497,height=91!

[https://github.com/operator-framework/java-operator-sdk/blob/v4.3.0/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ConfigurationServiceOverrider.java#L198]

 

But in v4.4.2:

The reconciliation thread pool is created as a custom executor to which we can
pass both corePoolSize and maximumPoolSize. The problem is that we only set the
maximumPoolSize of the thread pool, while its corePoolSize defaults to 10. As a
result, the thread pool size stays at 10, and most events wait in the workQueue
for a while.

!image-2024-03-04-11-17-22-877.png|width=569,height=112!

[https://github.com/operator-framework/java-operator-sdk/blob/v4.4.2/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ExecutorServiceManager.java#L37]

 

The solution is also simple: we can create the thread pool in the Flink
Kubernetes Operator and pass it to JOSDK, so that we control the reconciliation
thread pool directly, for example:

!image-2024-03-04-11-31-44-451.png|width=483,height=98!

  was:
After upgrading JOSDK from version 4.3.0 to version 4.4.2 in FLINK-33005, we
can no longer increase the reconciliation parallelism: the maximum
reconciliation parallelism is 10. This results in FlinkDeployment and
SessionJob reconciliation delays of about 10-20 seconds when we have a
large-scale Flink session cluster and Flink jobs.
 

After investigating and validating, I found that the cause is that the logic
for creating the reconciliation thread pool in JOSDK changed significantly
between these two versions.

v4.3.0:
The reconciliation thread pool was created as a FixedThreadPool
(maximumPoolSize was the same as corePoolSize), so we passed the number of
reconciliation threads and got a thread pool that matched our expectations.

!image-2024-03-04-10-58-37-679.png|width=628,height=115!

[https://github.com/operator-framework/java-operator-sdk/blob/v4.3.0/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ConfigurationServiceOverrider.java#L198]

 

But in v4.4.2:

The reconciliation thread pool is created as a custom executor to which we can
pass both corePoolSize and maximumPoolSize. The problem is that we only set the
maximumPoolSize of the thread pool, while its corePoolSize defaults to 10. As a
result, the thread pool size stays at 10, and most events wait in the workQueue
for a while.

!image-2024-03-04-11-17-22-877.png|width=594,height=117!

[https://github.com/operator-framework/java-operator-sdk/blob/v4.4.2/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ExecutorServiceManager.java#L37]

 

 

!image-2024-03-04-11-31-44-451.png!


[jira] [Updated] (FLINK-34566) Flink Kubernetes Operator reconciliation parallelism setting not work

2024-03-03 Thread Fei Feng (Jira)


Fei Feng updated FLINK-34566:
-
Description: 
After upgrading JOSDK from version 4.3.0 to version 4.4.2 in FLINK-33005, we
can no longer increase the reconciliation parallelism: the maximum
reconciliation parallelism is 10. This results in FlinkDeployment and
SessionJob reconciliation delays of about 10-20 seconds when we have a
large-scale Flink session cluster and Flink jobs.
 

After investigating and validating, I found that the cause is that the logic
for creating the reconciliation thread pool in JOSDK changed significantly
between these two versions.

v4.3.0:
The reconciliation thread pool was created as a FixedThreadPool
(maximumPoolSize was the same as corePoolSize), so we passed the number of
reconciliation threads and got a thread pool that matched our expectations.

!image-2024-03-04-10-58-37-679.png|width=628,height=115!

[https://github.com/operator-framework/java-operator-sdk/blob/v4.3.0/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ConfigurationServiceOverrider.java#L198]

 

But in v4.4.2:

The reconciliation thread pool is created as a custom executor to which we can
pass both corePoolSize and maximumPoolSize. The problem is that we only set the
maximumPoolSize of the thread pool, while its corePoolSize defaults to 10. As a
result, the thread pool size stays at 10, and most events wait in the workQueue
for a while.

!image-2024-03-04-11-17-22-877.png|width=594,height=117!

[https://github.com/operator-framework/java-operator-sdk/blob/v4.4.2/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ExecutorServiceManager.java#L37]

 

 

!image-2024-03-04-11-31-44-451.png!

  was:
After upgrading JOSDK from version 4.3.0 to version 4.4.2 in FLINK-33005, we
can no longer increase the reconciliation parallelism: the maximum
reconciliation parallelism is 10. This results in FlinkDeployment and
SessionJob reconciliation delays of about 10-20 seconds when we have a
large-scale Flink session cluster and Flink jobs.
 

After investigating and validating, I found that the cause is that the logic
for creating the reconciliation thread pool in JOSDK changed significantly
between these two versions.

v4.3.0:
The reconciliation thread pool was created as a FixedThreadPool
(maximumPoolSize was the same as corePoolSize), so we passed the number of
reconciliation threads and got a thread pool that matched our expectations.

!image-2024-03-04-10-58-37-679.png|width=628,height=115!

[https://github.com/operator-framework/java-operator-sdk/blob/v4.3.0/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ConfigurationServiceOverrider.java#L198]

 

But in v4.4.2:

The reconciliation thread pool is created as a custom executor to which we can
pass both corePoolSize and maximumPoolSize. The problem is that we only set the
maximumPoolSize of the thread pool, while its corePoolSize defaults to 10. As a
result, the thread pool size stays at 10, and most events wait in the workQueue
for a while.

!image-2024-03-04-11-17-22-877.png|width=594,height=117!

[https://github.com/operator-framework/java-operator-sdk/blob/v4.4.2/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ExecutorServiceManager.java#L37]

 

The solution was quite simple: we define an executor directly in the Flink
Kubernetes Operator to control the thread pool creation, for example:

!image-2024-03-04-11-30-53-118.png!

 


[jira] [Updated] (FLINK-34566) Flink Kubernetes Operator reconciliation parallelism setting not work

2024-03-03 Thread Fei Feng (Jira)


Fei Feng updated FLINK-34566:
-
Attachment: image-2024-03-04-11-31-44-451.png


[jira] [Updated] (FLINK-34566) Flink Kubernetes Operator reconciliation parallelism setting not work

2024-03-03 Thread Fei Feng (Jira)


Fei Feng updated FLINK-34566:
-
Description: 
After upgrading JOSDK from version 4.3.0 to version 4.4.2 in FLINK-33005, we
can no longer increase the reconciliation parallelism: the maximum
reconciliation parallelism is 10. This results in FlinkDeployment and
SessionJob reconciliation delays of about 10-20 seconds when we have a
large-scale Flink session cluster and Flink jobs.
 

After investigating and validating, I found that the cause is that the logic
for creating the reconciliation thread pool in JOSDK changed significantly
between these two versions.

v4.3.0:
The reconciliation thread pool was created as a FixedThreadPool
(maximumPoolSize was the same as corePoolSize), so we passed the number of
reconciliation threads and got a thread pool that matched our expectations.

!image-2024-03-04-10-58-37-679.png|width=628,height=115!

[https://github.com/operator-framework/java-operator-sdk/blob/v4.3.0/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ConfigurationServiceOverrider.java#L198]

 

But in v4.4.2:

The reconciliation thread pool is created as a custom executor to which we can
pass both corePoolSize and maximumPoolSize. The problem is that we only set the
maximumPoolSize of the thread pool, while its corePoolSize defaults to 10. As a
result, the thread pool size stays at 10, and most events wait in the workQueue
for a while.

!image-2024-03-04-11-17-22-877.png|width=594,height=117!

[https://github.com/operator-framework/java-operator-sdk/blob/v4.4.2/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ExecutorServiceManager.java#L37]

 

The solution was quite simple: we define an executor directly in the Flink
Kubernetes Operator to control the thread pool creation, for example:

!image-2024-03-04-11-30-53-118.png!

 

  was:
After upgrading JOSDK from version 4.3.0 to version 4.4.2 in FLINK-33005, we
can no longer increase the reconciliation parallelism: the maximum
reconciliation parallelism is 10. This results in FlinkDeployment and
SessionJob reconciliation delays of about 10-20 seconds when we have a
large-scale Flink session cluster and Flink jobs.
 

After investigating and validating, I found that the cause is that the logic
for creating the reconciliation thread pool in JOSDK changed significantly
between these two versions.

v4.3.0:
The reconciliation thread pool was created as a FixedThreadPool
(maximumPoolSize was the same as corePoolSize), so we passed the number of
reconciliation threads and got a thread pool that matched our expectations.


!image-2024-03-04-10-58-37-679.png|width=628,height=115!

[https://github.com/operator-framework/java-operator-sdk/blob/v4.3.0/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ConfigurationServiceOverrider.java#L198]

 

But in v4.4.2:

The reconciliation thread pool is created as a custom executor to which we can
pass both corePoolSize and maximumPoolSize. The problem is that we only set the
maximumPoolSize of the thread pool, while its corePoolSize defaults to 10. As a
result, the thread pool size stays at 10, and most events wait in the workQueue
for a while.

!image-2024-03-04-11-17-22-877.png|width=594,height=117!

https://github.com/operator-framework/java-operator-sdk/blob/v4.4.2/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ExecutorServiceManager.java#L37

 


[jira] [Updated] (FLINK-34566) Flink Kubernetes Operator reconciliation parallelism setting not work

2024-03-03 Thread Fei Feng (Jira)


Fei Feng updated FLINK-34566:
-
Attachment: image-2024-03-04-11-30-53-118.png
