[jira] [Updated] (FLINK-36015) Align rescale parameters

2024-09-19 Thread Zdenek Tison (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-36015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zdenek Tison updated FLINK-36015:
-
Release Note: 
FLIP-472 aligns the timeout logic across AdaptiveScheduler states. To make this 
alignment clearer to users, the configuration has also been aligned: 

- Parameter `jobmanager.adaptive-scheduler.resource-wait-timeout` was renamed 
to the `jobmanager.adaptive-scheduler.submission.resource-wait-timeout`
- Parameter `jobmanager.adaptive-scheduler.resource-stabilization-timeout` was 
renamed to the 
`jobmanager.adaptive-scheduler.submission.resource-stabilization-timeout`
- Parameter `jobmanager.adaptive-scheduler.scaling-interval.min` was renamed to 
the `jobmanager.adaptive-scheduler.executing.cooldown-after-rescaling`
- Parameter `jobmanager.adaptive-scheduler.scaling-interval.max` was replaced 
by the `jobmanager.adaptive-scheduler.executing.resource-stabilization-timeout` 
with a default value of 60s.
- Parameter `jobmanager.adaptive-scheduler.min-parallelism-increase` was 
removed without a replacement.

The following parameters, introduced in Flink 2.0, have also been renamed: 

- Parameter `jobmanager.adaptive-scheduler.max-delay-for-scale-trigger` was 
renamed to the `jobmanager.adaptive-scheduler.rescale-trigger.max-delays`
- Parameter `jobmanager.adaptive-scheduler.scale-on-failed-checkpoints-count` 
was renamed to the 
`jobmanager.adaptive-scheduler.rescale-trigger.max-checkpoint-failures`

See Flink's configuration documentation for further details.
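For anyone upgrading an existing configuration, the mapping above can be expressed as a small lookup table. The following is a minimal Java sketch, assuming the configuration is available as a flat `Map<String, String>`; the class `AdaptiveSchedulerKeyMigration` and its `migrate` method are hypothetical and not part of Flink. It deliberately does not carry over `scaling-interval.max`, because the release note describes that key as replaced rather than renamed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical migration helper (not part of Flink): rewrites pre-FLINK-36015
 * AdaptiveScheduler keys in a flat key/value configuration to the new names.
 */
final class AdaptiveSchedulerKeyMigration {

    // Plain renames from the release note: the old value can be carried over unchanged.
    static final Map<String, String> RENAMED = Map.of(
            "jobmanager.adaptive-scheduler.resource-wait-timeout",
            "jobmanager.adaptive-scheduler.submission.resource-wait-timeout",
            "jobmanager.adaptive-scheduler.resource-stabilization-timeout",
            "jobmanager.adaptive-scheduler.submission.resource-stabilization-timeout",
            "jobmanager.adaptive-scheduler.scaling-interval.min",
            "jobmanager.adaptive-scheduler.executing.cooldown-after-rescaling",
            "jobmanager.adaptive-scheduler.max-delay-for-scale-trigger",
            "jobmanager.adaptive-scheduler.rescale-trigger.max-delays",
            "jobmanager.adaptive-scheduler.scale-on-failed-checkpoints-count",
            "jobmanager.adaptive-scheduler.rescale-trigger.max-checkpoint-failures");

    // Keys without a one-to-one successor. scaling-interval.max was replaced by
    // executing.resource-stabilization-timeout (default 60s), so its old value is not
    // copied automatically; min-parallelism-increase was removed without a replacement.
    static final Set<String> DROPPED = Set.of(
            "jobmanager.adaptive-scheduler.scaling-interval.max",
            "jobmanager.adaptive-scheduler.min-parallelism-increase");

    /** Returns a new map with renamed keys; unrelated keys pass through untouched. */
    static Map<String, String> migrate(Map<String, String> config) {
        Map<String, String> migrated = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : config.entrySet()) {
            String key = entry.getKey();
            if (DROPPED.contains(key)) {
                System.err.println("Dropping " + key + "; review its replacement manually");
                continue;
            }
            // If both an old and a new key are present, the entry seen last wins.
            migrated.put(RENAMED.getOrDefault(key, key), entry.getValue());
        }
        return migrated;
    }

    public static void main(String[] args) {
        Map<String, String> legacy = new LinkedHashMap<>();
        legacy.put("jobmanager.adaptive-scheduler.resource-wait-timeout", "5 min");
        legacy.put("jobmanager.adaptive-scheduler.min-parallelism-increase", "2");
        legacy.put("parallelism.default", "4");
        // Prints {jobmanager.adaptive-scheduler.submission.resource-wait-timeout=5 min, parallelism.default=4}
        System.out.println(migrate(legacy));
    }
}
```

Whether Flink still accepts the old keys as deprecated aliases is not covered by this sketch; the configuration documentation is the authority on that.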

  was:
FLIP-472 aligns the timeout logic across AdaptiveScheduler states. To make this 
alignment clearer to users, the configuration has also been aligned: 

- Parameter `jobmanager.adaptive-scheduler.resource-wait-timeout` was renamed 
to the `jobmanager.adaptive-scheduler.submission.resource-wait-timeout`
- Parameter `jobmanager.adaptive-scheduler.resource-stabilization-timeout` was 
renamed to the 
`jobmanager.adaptive-scheduler.submission.resource-stabilization-timeout`
- Parameter `jobmanager.adaptive-scheduler.scaling-interval.min` was renamed to 
the `jobmanager.adaptive-scheduler.executing.cooldown-after-rescaling`
- Parameter `jobmanager.adaptive-scheduler.scaling-interval.max` was replaced 
by the `jobmanager.adaptive-scheduler.executing.resource-stabilization-timeout` 
with a default value of 60s.
- Parameter `jobmanager.adaptive-scheduler.min-parallelism-increase` was 
removed without a replacement.

The following parameters, introduced in Flink 2.0, have also been renamed: 

- Parameter `jobmanager.adaptive-scheduler.max-delay-for-scale-trigger` was 
renamed to the 
`jobmanager.adaptive-scheduler.rescale-trigger.max-checkpoint-failures`
- Parameter `jobmanager.adaptive-scheduler.scale-on-failed-checkpoints-count` 
was renamed to the 
`jobmanager.adaptive-scheduler.rescale-trigger.max-checkpoint-failures`

See Flink's configuration documentation for further details.


> Align rescale parameters
> 
>
> Key: FLINK-36015
> URL: https://issues.apache.org/jira/browse/FLINK-36015
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Reporter: Zdenek Tison
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0-preview
>
>
> * Parameter 
> [_jobmanager.adaptive-scheduler.resource-wait-timeout_|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-resource-wait-timeout]
>  will be renamed to the 
> jobmanager.adaptive-scheduler.submission.resource-wait-timeout
>  * Parameter 
> [_jobmanager.adaptive-scheduler.resource-stabilization-timeout_|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-resource-wait-timeout]
>  will be renamed to the 
> jobmanager.adaptive-scheduler.submission.resource-stabilization-timeout
>  * Parameter 
> [_jobmanager.adaptive-scheduler.scaling-interval.min_|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-scaling-interval-min]
>  will be renamed to the 
> jobmanager.adaptive-scheduler.executing.cooldown-after-rescaling
>  * Parameter 
> [_jobmanager.adaptive-scheduler.scaling-interval.max_|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-scaling-interval-max]
>  will be renamed to the 
> jobmanager.adaptive-scheduler.executing.resource-stabilization-timeout
>  with a default value of 60s.
>  * Parameter 
> [jobmanager.adaptive-scheduler.min-parallelism-increase|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-min-parallelism-increase]
>  will be removed without a direct replacement. Still, it will be superseded 
> by combining the parameters 
> jobmanager.adaptive-schedule

[jira] [Updated] (FLINK-36015) Align rescale parameters

2024-09-18 Thread Zdenek Tison (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-36015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zdenek Tison updated FLINK-36015:
-
Release Note: 
FLIP-472 aligns the timeout logic across AdaptiveScheduler states. To make this 
alignment clearer to users, the configuration has also been aligned: 

- Parameter `jobmanager.adaptive-scheduler.resource-wait-timeout` was renamed 
to the `jobmanager.adaptive-scheduler.submission.resource-wait-timeout`
- Parameter `jobmanager.adaptive-scheduler.resource-stabilization-timeout` was 
renamed to the 
`jobmanager.adaptive-scheduler.submission.resource-stabilization-timeout`
- Parameter `jobmanager.adaptive-scheduler.scaling-interval.min` was renamed to 
the `jobmanager.adaptive-scheduler.executing.cooldown-after-rescaling`
- Parameter `jobmanager.adaptive-scheduler.scaling-interval.max` was replaced 
by the `jobmanager.adaptive-scheduler.executing.resource-stabilization-timeout` 
with a default value of 60s.
- Parameter `jobmanager.adaptive-scheduler.min-parallelism-increase` was 
removed without a replacement.

The following parameters, introduced in Flink 2.0, have also been renamed: 

- Parameter `jobmanager.adaptive-scheduler.max-delay-for-scale-trigger` was 
renamed to the 
`jobmanager.adaptive-scheduler.rescale-trigger.max-checkpoint-failures`
- Parameter `jobmanager.adaptive-scheduler.scale-on-failed-checkpoints-count` 
was renamed to the 
`jobmanager.adaptive-scheduler.rescale-trigger.max-checkpoint-failures`

See Flink's configuration documentation for further details.

  was:
AdaptiveScheduler configuration has been aligned across the different 
AdaptiveScheduler stages:

- Parameter `jobmanager.adaptive-scheduler.resource-wait-timeout` was renamed 
to the `jobmanager.adaptive-scheduler.submission.resource-wait-timeout`
- Parameter `jobmanager.adaptive-scheduler.resource-stabilization-timeout` was 
renamed to the 
`jobmanager.adaptive-scheduler.submission.resource-stabilization-timeout`
- Parameter `jobmanager.adaptive-scheduler.scaling-interval.min` was renamed to 
the `jobmanager.adaptive-scheduler.executing.cooldown-after-rescaling`
- Parameter `jobmanager.adaptive-scheduler.scaling-interval.max` was replaced 
by the `jobmanager.adaptive-scheduler.executing.resource-stabilization-timeout` 
with a default value of 60s.
- Parameter `jobmanager.adaptive-scheduler.min-parallelism-increase` was 
removed without a replacement.

The following parameters, introduced in Flink 2.0, have also been renamed: 

- Parameter `jobmanager.adaptive-scheduler.max-delay-for-scale-trigger` was 
renamed to the 
`jobmanager.adaptive-scheduler.rescale-trigger.max-checkpoint-failures`
- Parameter `jobmanager.adaptive-scheduler.scale-on-failed-checkpoints-count` 
was renamed to the 
`jobmanager.adaptive-scheduler.rescale-trigger.max-checkpoint-failures`


[jira] [Updated] (FLINK-36015) Align rescale parameters

2024-09-18 Thread Zdenek Tison (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-36015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zdenek Tison updated FLINK-36015:
-
Release Note: 
AdaptiveScheduler configuration has been aligned across the different 
AdaptiveScheduler stages:

- Parameter `jobmanager.adaptive-scheduler.resource-wait-timeout` was renamed 
to the `jobmanager.adaptive-scheduler.submission.resource-wait-timeout`
- Parameter `jobmanager.adaptive-scheduler.resource-stabilization-timeout` was 
renamed to the 
`jobmanager.adaptive-scheduler.submission.resource-stabilization-timeout`
- Parameter `jobmanager.adaptive-scheduler.scaling-interval.min` was renamed to 
the `jobmanager.adaptive-scheduler.executing.cooldown-after-rescaling`
- Parameter `jobmanager.adaptive-scheduler.scaling-interval.max` was replaced 
by the `jobmanager.adaptive-scheduler.executing.resource-stabilization-timeout` 
with a default value of 60s.
- Parameter `jobmanager.adaptive-scheduler.min-parallelism-increase` was 
removed without a replacement.

The following parameters, introduced in Flink 2.0, have also been renamed: 

- Parameter `jobmanager.adaptive-scheduler.max-delay-for-scale-trigger` was 
renamed to the 
`jobmanager.adaptive-scheduler.rescale-trigger.max-checkpoint-failures`
- Parameter `jobmanager.adaptive-scheduler.scale-on-failed-checkpoints-count` 
was renamed to the 
`jobmanager.adaptive-scheduler.rescale-trigger.max-checkpoint-failures`

  was:
- Parameter `jobmanager.adaptive-scheduler.resource-wait-timeout` was renamed 
to the `jobmanager.adaptive-scheduler.submission.resource-wait-timeout`
- Parameter `jobmanager.adaptive-scheduler.resource-stabilization-timeout` was 
renamed to the 
`jobmanager.adaptive-scheduler.submission.resource-stabilization-timeout`
- Parameter `jobmanager.adaptive-scheduler.scaling-interval.min` was renamed to 
the `jobmanager.adaptive-scheduler.executing.cooldown-after-rescaling`
- Parameter `jobmanager.adaptive-scheduler.scaling-interval.max` was replaced 
by the `jobmanager.adaptive-scheduler.executing.resource-stabilization-timeout` 
with a default value of 60s.
- Parameter `jobmanager.adaptive-scheduler.min-parallelism-increase` was 
removed without a replacement.

The following parameters, introduced in Flink 2.0, have also been renamed: 

- Parameter `jobmanager.adaptive-scheduler.max-delay-for-scale-trigger` was 
renamed to the 
`jobmanager.adaptive-scheduler.rescale-trigger.max-checkpoint-failures`
- Parameter `jobmanager.adaptive-scheduler.scale-on-failed-checkpoints-count` 
was renamed to the 
`jobmanager.adaptive-scheduler.rescale-trigger.max-checkpoint-failures`


--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-36015) Align rescale parameters

2024-09-18 Thread Zdenek Tison (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-36015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zdenek Tison updated FLINK-36015:
-
Fix Version/s: 2.0-preview
 Release Note: 
- Parameter `jobmanager.adaptive-scheduler.resource-wait-timeout` was renamed to 
the `jobmanager.adaptive-scheduler.submission.resource-wait-timeout`
- Parameter `jobmanager.adaptive-scheduler.resource-stabilization-timeout` was 
renamed to the 
`jobmanager.adaptive-scheduler.submission.resource-stabilization-timeout`
- Parameter `jobmanager.adaptive-scheduler.scaling-interval.min` was renamed to 
the `jobmanager.adaptive-scheduler.executing.cooldown-after-rescaling`
- Parameter `jobmanager.adaptive-scheduler.scaling-interval.max` was replaced 
by the `jobmanager.adaptive-scheduler.executing.resource-stabilization-timeout` 
with a default value of 60s.
- Parameter `jobmanager.adaptive-scheduler.min-parallelism-increase` was 
removed without a replacement.

The following parameters are new in 2.0: 

- Parameter `jobmanager.adaptive-scheduler.max-delay-for-scale-trigger` was 
renamed to the 
`jobmanager.adaptive-scheduler.rescale-trigger.max-checkpoint-failures`
- Parameter `jobmanager.adaptive-scheduler.scale-on-failed-checkpoints-count` 
was renamed to the 
`jobmanager.adaptive-scheduler.rescale-trigger.max-checkpoint-failures`


[jira] [Updated] (FLINK-36015) Align rescale parameters

2024-09-18 Thread Zdenek Tison (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-36015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zdenek Tison updated FLINK-36015:
-
Release Note: 
- Parameter `jobmanager.adaptive-scheduler.resource-wait-timeout` was renamed 
to the `jobmanager.adaptive-scheduler.submission.resource-wait-timeout`
- Parameter `jobmanager.adaptive-scheduler.resource-stabilization-timeout` was 
renamed to the 
`jobmanager.adaptive-scheduler.submission.resource-stabilization-timeout`
- Parameter `jobmanager.adaptive-scheduler.scaling-interval.min` was renamed to 
the `jobmanager.adaptive-scheduler.executing.cooldown-after-rescaling`
- Parameter `jobmanager.adaptive-scheduler.scaling-interval.max` was replaced 
by the `jobmanager.adaptive-scheduler.executing.resource-stabilization-timeout` 
with a default value of 60s.
- Parameter `jobmanager.adaptive-scheduler.min-parallelism-increase` was 
removed without a replacement.

The following parameters are new in 2.0: 

- Parameter `jobmanager.adaptive-scheduler.max-delay-for-scale-trigger` was 
renamed to the 
`jobmanager.adaptive-scheduler.rescale-trigger.max-checkpoint-failures`
- Parameter `jobmanager.adaptive-scheduler.scale-on-failed-checkpoints-count` 
was renamed to the 
`jobmanager.adaptive-scheduler.rescale-trigger.max-checkpoint-failures`

  was:
- Parameter `jobmanager.adaptive-scheduler.resource-wait-timeout` was renamed to 
the `jobmanager.adaptive-scheduler.submission.resource-wait-timeout`
- Parameter `jobmanager.adaptive-scheduler.resource-stabilization-timeout` was 
renamed to the 
`jobmanager.adaptive-scheduler.submission.resource-stabilization-timeout`
- Parameter `jobmanager.adaptive-scheduler.scaling-interval.min` was renamed to 
the `jobmanager.adaptive-scheduler.executing.cooldown-after-rescaling`
- Parameter `jobmanager.adaptive-scheduler.scaling-interval.max` was replaced 
by the `jobmanager.adaptive-scheduler.executing.resource-stabilization-timeout` 
with a default value of 60s.
- Parameter `jobmanager.adaptive-scheduler.min-parallelism-increase` was 
removed without a replacement.

The following parameters are new in 2.0: 

- Parameter `jobmanager.adaptive-scheduler.max-delay-for-scale-trigger` was 
renamed to the 
`jobmanager.adaptive-scheduler.rescale-trigger.max-checkpoint-failures`
- Parameter `jobmanager.adaptive-scheduler.scale-on-failed-checkpoints-count` 
was renamed to the 
`jobmanager.adaptive-scheduler.rescale-trigger.max-checkpoint-failures`


[jira] [Updated] (FLINK-36015) Align rescale parameters

2024-09-18 Thread Zdenek Tison (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-36015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zdenek Tison updated FLINK-36015:
-
Release Note: 
- Parameter `jobmanager.adaptive-scheduler.resource-wait-timeout` was renamed 
to the `jobmanager.adaptive-scheduler.submission.resource-wait-timeout`
- Parameter `jobmanager.adaptive-scheduler.resource-stabilization-timeout` was 
renamed to the 
`jobmanager.adaptive-scheduler.submission.resource-stabilization-timeout`
- Parameter `jobmanager.adaptive-scheduler.scaling-interval.min` was renamed to 
the `jobmanager.adaptive-scheduler.executing.cooldown-after-rescaling`
- Parameter `jobmanager.adaptive-scheduler.scaling-interval.max` was replaced 
by the `jobmanager.adaptive-scheduler.executing.resource-stabilization-timeout` 
with a default value of 60s.
- Parameter `jobmanager.adaptive-scheduler.min-parallelism-increase` was 
removed without a replacement.

The following parameters, introduced in Flink 2.0, have also been renamed: 

- Parameter `jobmanager.adaptive-scheduler.max-delay-for-scale-trigger` was 
renamed to the 
`jobmanager.adaptive-scheduler.rescale-trigger.max-checkpoint-failures`
- Parameter `jobmanager.adaptive-scheduler.scale-on-failed-checkpoints-count` 
was renamed to the 
`jobmanager.adaptive-scheduler.rescale-trigger.max-checkpoint-failures`

  was:
- Parameter `jobmanager.adaptive-scheduler.resource-wait-timeout` was renamed 
to the `jobmanager.adaptive-scheduler.submission.resource-wait-timeout`
- Parameter `jobmanager.adaptive-scheduler.resource-stabilization-timeout` was 
renamed to the 
`jobmanager.adaptive-scheduler.submission.resource-stabilization-timeout`
- Parameter `jobmanager.adaptive-scheduler.scaling-interval.min` was renamed to 
the `jobmanager.adaptive-scheduler.executing.cooldown-after-rescaling`
- Parameter `jobmanager.adaptive-scheduler.scaling-interval.max` was replaced 
by the `jobmanager.adaptive-scheduler.executing.resource-stabilization-timeout` 
with a default value of 60s.
- Parameter `jobmanager.adaptive-scheduler.min-parallelism-increase` was 
removed without a replacement.

The following parameters are new in 2.0: 

- Parameter `jobmanager.adaptive-scheduler.max-delay-for-scale-trigger` was 
renamed to the 
`jobmanager.adaptive-scheduler.rescale-trigger.max-checkpoint-failures`
- Parameter `jobmanager.adaptive-scheduler.scale-on-failed-checkpoints-count` 
was renamed to the 
`jobmanager.adaptive-scheduler.rescale-trigger.max-checkpoint-failures`


[jira] [Commented] (FLINK-36279) AdaptiveScheduler#hasDesiredResources doesn't rely on all available slots which causes problems in Executing state

2024-09-17 Thread Zdenek Tison (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-36279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882589#comment-17882589
 ] 

Zdenek Tison commented on FLINK-36279:
--

[~mapohl] Thank you very much for taking care of this bug. 

> AdaptiveScheduler#hasDesiredResources doesn't rely on all available slots 
> which causes problems in Executing state
> --
>
> Key: FLINK-36279
> URL: https://issues.apache.org/jira/browse/FLINK-36279
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 2.0-preview
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 2.0-preview
>
> Attachments: FLINK-36279-FLINK-36014-pr.success.log, 
> FLINK-36279.20240914.6.success.log, FLINK-36279.fixed.success.log
>
>
> FLINK-36014 aligned the triggering of the execution graph creation in the 
> {{WaitingForResources}} state with the triggering of rescaling in the 
> {{Executing}} state. Before that change, only {{WaitingForResources}} relied 
> on this method. Relying on free slots was good enough there because no slots 
> have been allocated yet in the {{WaitingForResources}} state.
> Now that the {{Executing}} state uses this method as well, this premise no 
> longer holds: while the slot availability is checked, the running job already 
> holds slots that would become available again after the restart. Hence, it is 
> sufficient to also count these currently allocated slots in the slot 
> availability check. This does not break the premise for the 
> {{WaitingForResources}} state, where no slots are allocated.
> {{RescaleOnCheckpointITCase}} fails because of that issue:
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62105&view=logs&j=5c8e7682-d68f-54d1-16a2-a09310218a49&t=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba&l=11287
> {code}
> Sep 13 17:16:55 "ForkJoinPool-1-worker-25" #28 daemon prio=5 os_prio=0 tid=0x7f973f0c2800 nid=0x31a1 waiting on condition [0x7f97089fc000]
> Sep 13 17:16:55    java.lang.Thread.State: TIMED_WAITING (sleeping)
> Sep 13 17:16:55   at java.lang.Thread.sleep(Native Method)
> Sep 13 17:16:55   at org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:152)
> Sep 13 17:16:55   at org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:145)
> Sep 13 17:16:55   at org.apache.flink.test.scheduling.UpdateJobResourceRequirementsITCase.waitForRunningTasks(UpdateJobResourceRequirementsITCase.java:219)
> Sep 13 17:16:55   at org.apache.flink.test.scheduling.RescaleOnCheckpointITCase.testRescaleOnCheckpoint(RescaleOnCheckpointITCase.java:139)
> Sep 13 17:16:55   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Sep 13 17:16:55   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> [...]
> {code}
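The slot-availability fix described in this issue can be illustrated with a tiny model. This is a hypothetical Java sketch that reduces resources to plain slot counts; the class `SlotAvailabilityCheck` and its parameters are illustrative assumptions and do not mirror Flink's actual `AdaptiveScheduler#hasDesiredResources` signature.

```java
/**
 * Hypothetical, simplified model of the slot check discussed in FLINK-36279.
 * Resources are reduced to slot counts; names do not mirror Flink's classes.
 */
final class SlotAvailabilityCheck {

    /**
     * @param freeSlots      slots available to the job but not currently used by it
     * @param slotsHeldByJob slots allocated to the running job; they become reusable
     *                       after a restart (always 0 in WaitingForResources)
     * @param desiredSlots   slots required for the desired parallelism
     */
    static boolean hasDesiredResources(int freeSlots, int slotsHeldByJob, int desiredSlots) {
        // Counting only free slots under-reports capacity while a job is Executing,
        // so the slots the job already holds are added back in.
        return freeSlots + slotsHeldByJob >= desiredSlots;
    }

    public static void main(String[] args) {
        // WaitingForResources: nothing is allocated yet, so behaviour is unchanged.
        System.out.println(hasDesiredResources(6, 0, 6)); // true
        // Executing with 4 slots in use, 2 new free slots, desired parallelism needs 6.
        System.out.println(hasDesiredResources(2, 0, 6)); // false: free-slots-only view
        System.out.println(hasDesiredResources(2, 4, 6)); // true: allocated slots counted
    }
}
```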



[jira] [Resolved] (FLINK-36011) Generalize RescaleManager to become StateTransitionManager

2024-09-02 Thread Zdenek Tison (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-36011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zdenek Tison resolved FLINK-36011.
--
Resolution: Implemented

> Generalize RescaleManager to become StateTransitionManager
> --
>
> Key: FLINK-36011
> URL: https://issues.apache.org/jira/browse/FLINK-36011
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Zdenek Tison
>Assignee: Zdenek Tison
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>
> The goal is to change the RescaleManager component to one with a broader 
> responsibility that will manage the adaptive scheduler's state transitions.   



[jira] [Created] (FLINK-36016) Synchronize initialization time and clock usage

2024-08-08 Thread Zdenek Tison (Jira)
Zdenek Tison created FLINK-36016:


 Summary: Synchronize initialization time and clock usage 
 Key: FLINK-36016
 URL: https://issues.apache.org/jira/browse/FLINK-36016
 Project: Flink
  Issue Type: Sub-task
Reporter: Zdenek Tison


StateTransitionManager's initialization time and the clock passed to it as a 
parameter should be based on the same time source.
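One common way to keep both values on a single time base is to read the initialization time from the injected clock itself. The following is a hypothetical Java sketch using `java.time.Clock`; the class `ClockAlignedTransitionManager` and its nested `ManualClock` are illustrative assumptions, not Flink's `StateTransitionManager` or Flink's own clock utilities.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

/**
 * Hypothetical sketch for FLINK-36016: the initialization time is read from the same
 * Clock that later drives timeout checks, so both values share one time source.
 */
final class ClockAlignedTransitionManager {
    private final Clock clock;
    private final Instant initializationTime;

    ClockAlignedTransitionManager(Clock clock) {
        this.clock = clock;
        // Take the start time from the injected clock instead of System.currentTimeMillis().
        this.initializationTime = clock.instant();
    }

    Duration elapsedSinceInitialization() {
        return Duration.between(initializationTime, clock.instant());
    }

    /** Test clock that only advances when told to; the zone is fixed to UTC. */
    static final class ManualClock extends Clock {
        private Instant now;

        ManualClock(Instant start) {
            this.now = start;
        }

        void advance(Duration step) {
            now = now.plus(step);
        }

        @Override
        public ZoneId getZone() {
            return ZoneOffset.UTC;
        }

        @Override
        public Clock withZone(ZoneId zone) {
            return this; // zone changes are ignored in this sketch
        }

        @Override
        public Instant instant() {
            return now;
        }
    }

    public static void main(String[] args) {
        ManualClock clock = new ManualClock(Instant.parse("2024-08-08T00:00:00Z"));
        ClockAlignedTransitionManager manager = new ClockAlignedTransitionManager(clock);
        clock.advance(Duration.ofSeconds(30));
        System.out.println(manager.elapsedSinceInitialization()); // PT30S
    }
}
```

Because the start time comes from the same clock, a test can advance time deterministically and the elapsed duration is exactly what the test advanced.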



[jira] [Created] (FLINK-36015) Align rescale parameters

2024-08-08 Thread Zdenek Tison (Jira)
Zdenek Tison created FLINK-36015:


 Summary: Align rescale parameters
 Key: FLINK-36015
 URL: https://issues.apache.org/jira/browse/FLINK-36015
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Configuration
Reporter: Zdenek Tison


* Parameter 
[_jobmanager.adaptive-scheduler.resource-wait-timeout_|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-resource-wait-timeout]
 will be renamed to 
jobmanager.adaptive-scheduler.submission.resource-wait-timeout
 * Parameter 
[_jobmanager.adaptive-scheduler.resource-stabilization-timeout_|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-resource-wait-timeout]
 will be renamed to 
jobmanager.adaptive-scheduler.submission.resource-stabilization-timeout
 * Parameter 
[_jobmanager.adaptive-scheduler.scaling-interval.min_|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-scaling-interval-min]
 will be renamed to 
jobmanager.adaptive-scheduler.executing.cooldown-after-rescaling
 * Parameter 
[_jobmanager.adaptive-scheduler.scaling-interval.max_|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-scaling-interval-max]
 will be renamed to 
jobmanager.adaptive-scheduler.executing.resource-stabilization-timeout 
 with a default value of 60s.
 * Parameter 
[_jobmanager.adaptive-scheduler.min-parallelism-increase_|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-min-parallelism-increase]
 will be removed without a direct replacement. Its role will instead be covered by 
the combination of 
jobmanager.adaptive-scheduler.executing.cooldown-after-rescaling and 
jobmanager.adaptive-scheduler.executing.resource-stabilization-timeout.
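The renames above can be applied mechanically to an existing key/value configuration. The sketch below is a hypothetical helper (AdaptiveSchedulerKeyMigration is not part of Flink) built only from the key pairs listed in this ticket. It rewrites keys and drops the removed one; it does not check whether an old value still makes sense under the new semantics (for example, the new stabilization timeout defaults to 60s), so migrated values should still be reviewed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical key migration for the adaptive scheduler renames listed in FLINK-36015. */
final class AdaptiveSchedulerKeyMigration {
    private static final String P = "jobmanager.adaptive-scheduler.";

    // Old key -> new key, as listed in the ticket.
    static final Map<String, String> RENAMED = Map.of(
            P + "resource-wait-timeout", P + "submission.resource-wait-timeout",
            P + "resource-stabilization-timeout", P + "submission.resource-stabilization-timeout",
            P + "scaling-interval.min", P + "executing.cooldown-after-rescaling",
            P + "scaling-interval.max", P + "executing.resource-stabilization-timeout");

    // Removed without a direct replacement.
    static final Set<String> REMOVED = Set.of(P + "min-parallelism-increase");

    /** Returns a copy with old keys renamed; unrelated keys pass through unchanged. */
    static Map<String, String> migrate(Map<String, String> config) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : config.entrySet()) {
            String key = e.getKey();
            if (REMOVED.contains(key)) {
                continue; // dropped: no one-to-one replacement exists
            }
            // If both an old key and its new name are present, the entry iterated last wins.
            result.put(RENAMED.getOrDefault(key, key), e.getValue());
        }
        return result;
    }
}
```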



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-36014) Align the desired and sufficient resources definition in Executing and WaitForResources states

2024-08-08 Thread Zdenek Tison (Jira)
Zdenek Tison created FLINK-36014:


 Summary: Align the desired and sufficient resources definition in 
Executing and WaitForResources states
 Key: FLINK-36014
 URL: https://issues.apache.org/jira/browse/FLINK-36014
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Zdenek Tison


The goal is to use the same definition for the desired and sufficient resources 
in the Executing state as in the WaitingForResources state. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-36013) Introduce the transition from Restarting to CreatingExecutionGraph state

2024-08-08 Thread Zdenek Tison (Jira)
Zdenek Tison created FLINK-36013:


 Summary: Introduce the transition from Restarting to 
CreatingExecutionGraph state
 Key: FLINK-36013
 URL: https://issues.apache.org/jira/browse/FLINK-36013
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Zdenek Tison


To let the AdaptiveScheduler skip the WaitingForResources state when rescaling, 
pass a flag into the Restarting state that directs the state transition to 
CreatingExecutionGraph instead of WaitingForResources. 
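A minimal sketch of the flag-driven transition. SchedulerState, Restarting and skipWaitingForResources are hypothetical, heavily simplified names rather than Flink's actual classes; the sketch only models which state follows Restarting.

```java
/** Hypothetical subset of the adaptive scheduler states that can follow Restarting. */
enum SchedulerState { WAITING_FOR_RESOURCES, CREATING_EXECUTION_GRAPH }

/** Hypothetical, simplified Restarting state carrying the proposed flag. */
final class Restarting {
    // True when the restart was triggered by a rescale decision.
    private final boolean skipWaitingForResources;

    Restarting(boolean skipWaitingForResources) {
        this.skipWaitingForResources = skipWaitingForResources;
    }

    /** Chooses the next state once the old execution graph has been cancelled. */
    SchedulerState nextState() {
        return skipWaitingForResources
                ? SchedulerState.CREATING_EXECUTION_GRAPH
                : SchedulerState.WAITING_FOR_RESOURCES;
    }
}
```

Keeping the decision in a constructor flag means the Restarting state itself does not need to re-evaluate resources; the caller that decided to rescale also decides that waiting is unnecessary.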



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-36012) Integrate StateTransitionManager into WaitingForResources state

2024-08-08 Thread Zdenek Tison (Jira)
Zdenek Tison created FLINK-36012:


 Summary: Integrate StateTransitionManager into WaitingForResources 
state
 Key: FLINK-36012
 URL: https://issues.apache.org/jira/browse/FLINK-36012
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Zdenek Tison


The StateTransitionManager will be used in the WaitingForResources state to 
manage the transition to a subsequent state. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-36011) Generalize RescaleManager to become StateTransitionManager

2024-08-08 Thread Zdenek Tison (Jira)
Zdenek Tison created FLINK-36011:


 Summary: Generalize RescaleManager to become StateTransitionManager
 Key: FLINK-36011
 URL: https://issues.apache.org/jira/browse/FLINK-36011
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Zdenek Tison
 Fix For: 2.0.0


The goal is to change the RescaleManager component to one with a broader 
responsibility that will manage the adaptive scheduler's state transitions.   



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35035) Reduce job pause time when cluster resources are expanded in adaptive mode

2024-08-08 Thread Zdenek Tison (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872011#comment-17872011
 ] 

Zdenek Tison commented on FLINK-35035:
--

This task will be used as a parent task for changes proposed in 
[FLIP-472|https://cwiki.apache.org/confluence/display/FLINK/FLIP-472%3A+Aligning+timeout+logic+in+the+AdaptiveScheduler%27s+WaitingForResources+and+Executing+states]

> Reduce job pause time when cluster resources are expanded in adaptive mode
> --
>
> Key: FLINK-35035
> URL: https://issues.apache.org/jira/browse/FLINK-35035
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Task
>Affects Versions: 1.19.0
>Reporter: yuanfenghu
>Assignee: Zdenek Tison
>Priority: Minor
>
> When 'jobmanager.scheduler' is set to 'adaptive', job graph changes triggered by 
> cluster expansion cause long task pauses. We should reduce this impact.
> For example, take the job graph [v1 (maxp=10, minp=1)] -> [v2 (maxp=10, minp=1)].
> When my cluster has 5 slots, the job runs as [v1 p5] -> [v2 p5].
> When I add slots, 
> org.apache.flink.runtime.scheduler.adaptive.ResourceListener#onNewResourcesAvailable 
> triggers a job graph change.
> However, the five new slots I added are not discovered at the same time (for 
> simplicity, assume each TaskManager has one slot): in any environment, there is 
> no guarantee that new slots arrive all at once, so onNewResourcesAvailable is 
> triggered repeatedly. If the slots arrive at intervals, the job graph keeps 
> changing throughout that period. What I would like is a configurable 
> stabilization period: a job graph change should only be triggered after the 
> number of slots in the cluster has been stable for a certain time, which would 
> avoid this situation.
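The behaviour requested above (wait until the slot count has settled, then rescale once) can be sketched as a debounce. RescaleGate is a hypothetical class, not Flink's StateTransitionManager; cooldownMillis and stabilizationMillis only loosely mirror the executing.cooldown-after-rescaling and executing.resource-stabilization-timeout parameters from FLINK-36015, and the sketch assumes slot sharing (one slot per parallel pipeline instance) as in the example.

```java
/**
 * Hypothetical debounce for new-slot notifications. A rescale is allowed only when
 * (1) the slot count has not changed for stabilizationMillis, and
 * (2) at least cooldownMillis have passed since the previous rescale.
 * Times are passed in explicitly (epoch millis) to keep the sketch deterministic.
 */
final class RescaleGate {
    private final long cooldownMillis;
    private final long stabilizationMillis;
    private final int maxParallelism;
    private int parallelism;
    private int availableSlots;
    private long lastSlotChangeAt;
    private long lastRescaleAt;

    RescaleGate(long cooldownMillis, long stabilizationMillis,
                int maxParallelism, int parallelism, long now) {
        this.cooldownMillis = cooldownMillis;
        this.stabilizationMillis = stabilizationMillis;
        this.maxParallelism = maxParallelism;
        this.parallelism = parallelism;
        this.availableSlots = parallelism;
        this.lastSlotChangeAt = now;
        this.lastRescaleAt = now;
    }

    /** Records a new slot count; never rescales directly. */
    void onSlotsChanged(int slots, long now) {
        if (slots != availableSlots) {
            availableSlots = slots;
            lastSlotChangeAt = now; // restart the stabilization window
        }
    }

    /** Polled periodically; returns true if a rescale was triggered at 'now'. */
    boolean maybeRescale(long now) {
        // Slot sharing assumed: one slot per parallel pipeline instance, capped at maxp.
        int target = Math.min(availableSlots, maxParallelism);
        boolean stable = now - lastSlotChangeAt >= stabilizationMillis;
        boolean cooledDown = now - lastRescaleAt >= cooldownMillis;
        if (target != parallelism && stable && cooledDown) {
            parallelism = target;
            lastRescaleAt = now;
            return true;
        }
        return false;
    }

    int parallelism() {
        return parallelism;
    }
}
```

With the scenario above (5 slots, then five more slots arriving 2 s apart, a 10 s stabilization window), the gate fires once, 10 s after the last slot arrives, instead of once per slot.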



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35035) Reduce job pause time when cluster resources are expanded in adaptive mode

2024-07-12 Thread Zdenek Tison (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865350#comment-17865350
 ] 

Zdenek Tison commented on FLINK-35035:
--

Hi, [~echauchot] [~heigebupahei]

I took over [~mapohl]'s work and prepared the FLIP document on the discussed 
topic. Please take a look if the topic is still relevant to you. Thanks

https://lists.apache.org/thread/krnjv8fm62nbnrljmk3bfoons86pc1dw

> Reduce job pause time when cluster resources are expanded in adaptive mode
> --
>
> Key: FLINK-35035
> URL: https://issues.apache.org/jira/browse/FLINK-35035
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Task
>Affects Versions: 1.19.0
>Reporter: yuanfenghu
>Priority: Minor
>
> When 'jobmanager.scheduler = adaptive' , job graph changes triggered by 
> cluster expansion will cause long-term task stagnation. We should reduce this 
> impact.
> As an example:
> I have jobgraph for : [v1 (maxp=10 minp = 1)] -> [v2 (maxp=10, minp=1)]
> When my cluster has 5 slots, the job will be executed as [v1 p5]->[v2 p5]
> When I add slots the task will trigger jobgraph changes,by
> org.apache.flink.runtime.scheduler.adaptive.ResourceListener#onNewResourcesAvailable,
> However, the five new slots I added were not discovered at the same time (for 
> convenience, I assume that a taskmanager has one slot), because no matter 
> what environment we add, we cannot guarantee that the new slots will be added 
> at once, so this will cause onNewResourcesAvailable triggers repeatedly
> ,If each new slot action has a certain interval, then the jobgraph will 
> continue to change during this period. What I hope is that there will be a 
> stable time to configure the cluster resources  and then go to it after the 
> number of cluster slots has been stable for a certain period of time. Trigger 
> jobgraph changes to avoid this situation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30403) The reported latest completed checkpoint is discarded

2023-02-23 Thread Zdenek Tison (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692567#comment-17692567
 ] 

Zdenek Tison commented on FLINK-30403:
--

Hi, thanks for asking. No, let's close it. 

> The reported latest completed checkpoint is discarded
> -
>
> Key: FLINK-30403
> URL: https://issues.apache.org/jira/browse/FLINK-30403
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.16.0
>Reporter: Zdenek Tison
>Priority: Major
>
> There is a small window in which the reported latest completed checkpoint can 
> be marked as discarded while the new checkpoint has not yet been reported. 
> The reason is that the function 
> _addCompletedCheckpointToStoreAndSubsumeOldest_ is called before 
> _reportCompletedCheckpoint_ in _CheckpointCoordinator_.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30403) The reported latest completed checkpoint is discarded

2022-12-13 Thread Zdenek Tison (Jira)
Zdenek Tison created FLINK-30403:


 Summary: The reported latest completed checkpoint is discarded
 Key: FLINK-30403
 URL: https://issues.apache.org/jira/browse/FLINK-30403
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing
Affects Versions: 1.16.0
Reporter: Zdenek Tison


There is a small window in which the reported latest completed checkpoint can 
be marked as discarded while the new checkpoint has not yet been reported. 

The reason is that the function _addCompletedCheckpointToStoreAndSubsumeOldest_ 
 is called before _reportCompletedCheckpoint_ in _CheckpointCoordinator_.
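The window can be illustrated with a toy model. Checkpoint, CoordinatorModel and OrderingDemo are hypothetical stand-ins, not Flink's CheckpointCoordinator: the store retains a single checkpoint and marks the previous one as discarded when a new one is added. Running the two steps in the order described above exposes a moment where the reported checkpoint is already discarded. The model only illustrates the ordering; it does not show that reporting first is the right fix, since reporting before the store has accepted a checkpoint has its own failure cases.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a completed checkpoint. */
final class Checkpoint {
    final long id;
    boolean discarded;

    Checkpoint(long id) {
        this.id = id;
    }
}

/** Hypothetical model of the two steps named in the ticket. */
final class CoordinatorModel {
    private Checkpoint stored;       // the single retained checkpoint
    private Checkpoint reportedLatest; // what monitoring would show as "latest"
    final List<String> violations = new ArrayList<>();

    void addToStoreAndSubsumeOldest(Checkpoint cp) {
        if (stored != null) {
            stored.discarded = true; // subsume: only one checkpoint is retained
        }
        stored = cp;
        check();
    }

    void report(Checkpoint cp) {
        reportedLatest = cp;
        check();
    }

    /** Records every moment at which the reported checkpoint is already discarded. */
    private void check() {
        if (reportedLatest != null && reportedLatest.discarded) {
            violations.add("reported checkpoint " + reportedLatest.id + " is discarded");
        }
    }
}

final class OrderingDemo {
    /** Completes checkpoints 1 and 2 and returns any observed violations. */
    static List<String> run(boolean reportFirst) {
        CoordinatorModel c = new CoordinatorModel();
        Checkpoint cp1 = new Checkpoint(1);
        c.addToStoreAndSubsumeOldest(cp1);
        c.report(cp1);
        Checkpoint cp2 = new Checkpoint(2);
        if (reportFirst) {
            c.report(cp2);
            c.addToStoreAndSubsumeOldest(cp2);
        } else {
            c.addToStoreAndSubsumeOldest(cp2); // order described in the ticket
            c.report(cp2);
        }
        return c.violations;
    }
}
```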

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)