[jira] [Commented] (FLINK-34588) FineGrainedSlotManager checks whether resources need to reconcile but doesn't act on the result

2024-03-11 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825261#comment-17825261
 ] 

Matthias Pohl commented on FLINK-34588:
---

OK, thanks for the clarification. I might add this information as comments to 
my FLINK-34427 PR. (y)

> FineGrainedSlotManager checks whether resources need to reconcile but doesn't 
> act on the result
> ---
>
> Key: FLINK-34588
> URL: https://issues.apache.org/jira/browse/FLINK-34588
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.19.0, 1.18.1, 1.20.0
> Reporter: Matthias Pohl
> Priority: Major
>
> There are a few locations in {{FineGrainedSlotManager}} where we check 
> whether resources can/need to be reconciled but don't care about the result 
> and just trigger the resource update (e.g. in 
> [FineGrainedSlotManager:626|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/FineGrainedSlotManager.java#L626]
>  and 
> [FineGrainedSlotManager:682|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/FineGrainedSlotManager.java#L682]).
>  Looks like we could reduce the calls to the backend here.
> It's not having a major impact because this feature is only used in the 
> {{ActiveResourceManager}} which triggers 
> [checkResourceDeclarations|https://github.com/apache/flink/blob/c678244a3890273145a786b9e1bf1a4f96f6dcfd/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/active/ActiveResourceManager.java#L331]
>  and reevaluates the {{resourceDeclarations}}. Not sure whether I missed 
> something here and there's actually a bigger issue with it. But considering 
> that nobody complained about it in the past, I'd assume that it's not a 
> severe issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34588) FineGrainedSlotManager checks whether resources need to reconcile but doesn't act on the result

2024-03-07 Thread Weihua Hu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824616#comment-17824616
 ] 

Weihua Hu commented on FLINK-34588:
---

Thanks [~mapohl] for reporting this.

Originally, the function `checkResourcesNeedReconcile` was called 
`checkTaskManagerReleasable` and was only responsible for releasing idle task 
managers. So we only cared about the result of `checkTaskManagerReleasable` in 
the release path ([Line 
816|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/FineGrainedSlotManager.java#L816]).

In [FLINK-32880|https://issues.apache.org/jira/browse/FLINK-32880], we renamed 
it to `checkResourcesNeedReconcile` and let it also check whether a redundant 
task manager needs to be allocated.


There are now two functions that allocate/release task managers:

`checkResourcesNeedReconcile`: allocates redundant task managers and releases 
idle task managers

`checkResourceRequirements`: allocates task managers for job requirements

So, in the periodic check of `checkClusterReconciliation`, we take the result of 
`checkResourcesNeedReconcile` into account because we don't try to fulfill the 
job requirements there. In the other places, we ignore the result of 
`checkResourcesNeedReconcile` because `checkResourceRequirements` may also 
allocate/release task managers.
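
To make the two call paths concrete, here is a minimal, self-contained sketch. It is not the actual `FineGrainedSlotManager` code: the class `ReconcileSketch`, the boolean flags, the `declareNeededResources(String)` helper, and the methods `periodicClusterReconciliation` and `onResourceRequirementsChanged` are hypothetical names made up for illustration. Only `checkResourcesNeedReconcile` and `checkResourceRequirements` are taken from the discussion above, and the order in which the requirement path calls them is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for the slot manager; not Flink's actual code. */
class ReconcileSketch {
    /** Records every resource declaration sent to the (hypothetical) backend. */
    final List<String> declarations = new ArrayList<>();

    // Hypothetical cluster state flags used only by this sketch.
    boolean redundantTaskManagerMissing;
    boolean idleTaskManagerPresent;
    boolean jobRequirementsUnfulfilled;

    /** Allocates redundant task managers and releases idle ones; true if anything changed. */
    boolean checkResourcesNeedReconcile() {
        boolean changed = redundantTaskManagerMissing || idleTaskManagerPresent;
        redundantTaskManagerMissing = false;
        idleTaskManagerPresent = false;
        return changed;
    }

    /** Allocates task managers for job requirements; this may change resources as well. */
    void checkResourceRequirements() {
        jobRequirementsUnfulfilled = false;
    }

    /** Stand-in for the call to the backend that the issue wants to avoid when unnecessary. */
    void declareNeededResources(String trigger) {
        declarations.add(trigger);
    }

    /** Periodic path: job requirements are not evaluated here, so the result decides. */
    void periodicClusterReconciliation() {
        if (checkResourcesNeedReconcile()) {
            declareNeededResources("periodic");
        }
    }

    /** Requirement path: the result is ignored because the requirement check runs too. */
    void onResourceRequirementsChanged() {
        checkResourcesNeedReconcile();
        checkResourceRequirements();
        declareNeededResources("requirements");
    }
}
```

In this sketch the periodic path skips the backend call when nothing changed, while the requirement path always declares resources because either check may have changed them.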

 

 



[jira] [Commented] (FLINK-34588) FineGrainedSlotManager checks whether resources need to reconcile but doesn't act on the result

2024-03-06 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824278#comment-17824278
 ] 

Matthias Pohl commented on FLINK-34588:
---

Sorry for that. I updated the links. They should work now. For the record: This 
was also just observed in a code review. I'm not aware of any actual issues 
that arise from this.



[jira] [Commented] (FLINK-34588) FineGrainedSlotManager checks whether resources need to reconcile but doesn't act on the result

2024-03-06 Thread Gyula Fora (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824082#comment-17824082
 ] 

Gyula Fora commented on FLINK-34588:


The links in the description don't seem to work :/ 

> FineGrainedSlotManager checks whether resources need to reconcile but doesn't 
> act on the result
> ---
>
> Key: FLINK-34588
> URL: https://issues.apache.org/jira/browse/FLINK-34588
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.19.0, 1.18.1, 1.20.0
> Reporter: Matthias Pohl
> Priority: Major
>
> There are a few locations in {{FineGrainedSlotManager}} where we check 
> whether resources can/need to be reconciled but don't care about the result 
> and just trigger the resource update (e.g. in 
> [FineGrainedSlotManager:620|https://github.com/apache/flink/blob/c0d3e495f4c2316a80f251de77b05b943b5be1f8/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/FineGrainedSlotManager.java#L620]
>  and 
> [FineGrainedSlotManager:676|https://github.com/apache/flink/blob/c0d3e495f4c2316a80f251de77b05b943b5be1f8/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/FineGrainedSlotManager.java#L676]).
>  Looks like we could reduce the calls to the backend here.
> It's not having a major impact because this feature is only used in the 
> {{ActiveResourceManager}} which triggers 
> [checkResourceDeclarations|https://github.com/apache/flink/blob/c678244a3890273145a786b9e1bf1a4f96f6dcfd/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/active/ActiveResourceManager.java#L331]
>  and reevaluates the {{resourceDeclarations}}. Not sure whether I missed 
> something here and there's actually a bigger issue with it. But considering 
> that nobody complained about it in the past, I'd assume that it's not a 
> severe issue.





[jira] [Commented] (FLINK-34588) FineGrainedSlotManager checks whether resources need to reconcile but doesn't act on the result

2024-03-06 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824061#comment-17824061
 ] 

Matthias Pohl commented on FLINK-34588:
---

cc [~huwh]
