[jira] [Commented] (FLINK-34655) Autoscaler doesn't work for flink 1.15
[ https://issues.apache.org/jira/browse/FLINK-34655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826924#comment-17826924 ]

Rui Fan commented on FLINK-34655:

Merged to main (1.8.0) via: ab41083f38cbe27c7d0ee3d8ba29b527e13a4fcc

> Autoscaler doesn't work for flink 1.15
>
> Key: FLINK-34655
> URL: https://issues.apache.org/jira/browse/FLINK-34655
> Project: Flink
> Issue Type: Bug
> Components: Autoscaler
> Reporter: Rui Fan
> Assignee: Rui Fan
> Priority: Major
> Labels: pull-request-available
> Fix For: kubernetes-operator-1.8.0
>
> flink-kubernetes-operator is committed to supporting the latest 4 Flink minor
> versions, and the autoscaler is a part of flink-kubernetes-operator. Currently,
> the latest 4 Flink minor versions are 1.15, 1.16, 1.17 and 1.18.
> But the autoscaler doesn't work with Flink 1.15.
> h2. Root cause:
> * FLINK-28310 added some properties to IOMetricsInfo in Flink 1.16
> * IOMetricsInfo is a part of JobDetailsInfo
> * JobDetailsInfo is required by the autoscaler [1]
> * Flink's RestClient doesn't allow any property to be missing when deserializing
> the JSON
> That means the RestClient from Flink 1.16 or later cannot fetch JobDetailsInfo for
> 1.15 jobs.
> h2. How to fix it properly?
> - [[FLINK-34655](https://issues.apache.org/jira/browse/FLINK-34655)] Copy
> IOMetricsInfo to the flink-autoscaler-standalone module
> - Remove the copy once 1.15 is no longer supported
> [1] https://github.com/apache/flink-kubernetes-operator/blob/ede1a610b3375d31a2e82287eec67ace70c4c8df/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/ScalingMetricCollector.java#L109
> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-401%3A+REST+API+JSON+response+deserialization+unknown+field+tolerance

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
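To make the root cause described in the issue concrete (a client-side class declaring a property that an older server never sends), here is a minimal, stdlib-only Java sketch. It is hypothetical: it does not use Flink's RestClient or Jackson, and the property names (`readBytes`, `writeBytes`, `addedIn116`) are illustrative stand-ins, not the real IOMetricsInfo fields. It only models a deserializer that rejects a response when any declared property is missing.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not Flink's RestClient: a "strict" deserializer that
 * rejects a response if any property declared by the client-side class is missing.
 */
public class StrictDeserializationSketch {

    /** Properties declared by the newer client-side class (names are illustrative). */
    static final List<String> DECLARED = List.of("readBytes", "writeBytes", "addedIn116");

    /** Parses every declared property as a primitive long, failing on the first missing one. */
    static Map<String, Long> deserializeStrict(Map<String, Object> json) {
        Map<String, Long> result = new HashMap<>();
        for (String name : DECLARED) {
            Object value = json.get(name);
            if (value == null) {
                throw new IllegalArgumentException("Missing property: " + name);
            }
            result.put(name, ((Number) value).longValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // A response from a newer server carries every declared property and parses fine.
        Map<String, Object> newServerJson =
                Map.of("readBytes", 10L, "writeBytes", 20L, "addedIn116", 5L);
        System.out.println(deserializeStrict(newServerJson).get("addedIn116")); // 5

        // A response from an older server lacks the later-added property and is rejected.
        Map<String, Object> oldServerJson = Map.of("readBytes", 10L, "writeBytes", 20L);
        try {
            deserializeStrict(oldServerJson);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Missing property: addedIn116
        }
    }
}
```

In this sketch, the same client class works against a new server and fails against an old one, which is the asymmetry the issue describes for 1.16+ clients talking to 1.15 jobs.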
[jira] [Commented] (FLINK-34655) Autoscaler doesn't work for flink 1.15
[ https://issues.apache.org/jira/browse/FLINK-34655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826923#comment-17826923 ]

Rui Fan commented on FLINK-34655:

{quote}I understand the idea behind providing suggestions. However, it is difficult to assess the quality of Autoscaling decisions without applying them automatically. The reason is that suggestions become stale very quickly if the load pattern is not completely static. Even for static load patterns, if the user doesn't redeploy within a matter of minutes, the suggestions might already be stale again once the number of pending records has grown too much. In any case, production load patterns are rarely static, which means that autoscaling will inevitably trigger multiple times a day, but that is where its real power is unleashed.

It would be great to hear about any concerns your users have about turning on automatic scaling.
{quote}

Thanks for pointing it out! It is indeed difficult to observe the dynamic changes of the load. But users don't want to adopt a major feature without observing it first. This applies not only to the autoscaler but to all major features: users need to do enough research before applying them in a production environment.

Although the parallelism may change dynamically, historical experience shows that users care most about whether the parallelism is reasonable during peak periods. Currently, the JDBC event handler records all ScalingReports. Each ScalingReport includes its creation time, so users can check them conveniently.

{quote}We've been operating it in production for about a year now.{quote}

It's great to see that your users have been using the autoscaler for a long time. I believe it will give the entire community more confidence in using the autoscaler.

{quote}Back to the issue here, should we think about a patch release for 1.15 / 1.16 to add support for overriding vertex parallelism?{quote}

I agree with [~gyfora]: there won't be any further 1.15 or 1.16 releases.
So the community doesn't need to backport these features. Users who want them should upgrade to a newer version or cherry-pick the changes into their internal Flink version.
[jira] [Commented] (FLINK-34655) Autoscaler doesn't work for flink 1.15
[ https://issues.apache.org/jira/browse/FLINK-34655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826054#comment-17826054 ]

Gyula Fora commented on FLINK-34655:

[~mxm] I would be hesitant to backport these changes to 1.15/1.16. The community doesn't generally backport new features to older releases, and both versions are already outside the supported version range of Flink core anyway. For 1.15 we would also have to backport the aggregated metrics changes, which are not backward compatible with the current 1.15 REST API, so that is not possible.
[jira] [Commented] (FLINK-34655) Autoscaler doesn't work for flink 1.15
[ https://issues.apache.org/jira/browse/FLINK-34655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826015#comment-17826015 ]

Maximilian Michels commented on FLINK-34655:

Thanks for raising awareness of the Flink version compatibility, [~fanrui]! Although we've been using Flink Autoscaling with 1.16, it is true that only Flink 1.17 supports it out of the box.

{quote}In the short term, we only use the autoscaler to give suggestions instead of scaling directly. Once our users consider the parallelism calculation reliable, they will have a stronger motivation to upgrade their Flink version.
{quote}

I understand the idea behind providing suggestions. However, it is difficult to assess the quality of Autoscaling decisions without applying them automatically. The reason is that suggestions become stale very quickly if the load pattern is not completely static. Even for static load patterns, if the user doesn't redeploy within a matter of minutes, the suggestions might already be stale again once the number of pending records has grown too much. In any case, production load patterns are rarely static, which means that autoscaling will inevitably trigger multiple times a day, but that is where its real power is unleashed.

It would be great to hear about any concerns your users have about turning on automatic scaling. We've been operating it in production for about a year now.

Back to the issue here, should we think about a patch release for 1.15 / 1.16 to add support for overriding vertex parallelism?
[jira] [Commented] (FLINK-34655) Autoscaler doesn't work for flink 1.15
[ https://issues.apache.org/jira/browse/FLINK-34655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825669#comment-17825669 ]

Rui Fan commented on FLINK-34655:

{quote}But the vertex parallelism overrides feature was introduced in 1.17, so the autoscaler never really officially supported anything before that{quote}

We (our internal platform) want to use the autoscaler to give parallelism suggestions to our users. If users want automatic scaling, we suggest they upgrade their jobs to Flink 1.17 or later. That's why we want to parse the scaling report.

In the short term, we only use the autoscaler to give suggestions instead of scaling directly. Once our users consider the parallelism calculation reliable, they will have a stronger motivation to upgrade their Flink version.

> Autoscaler doesn't work for flink 1.15
>
> Key: FLINK-34655
> URL: https://issues.apache.org/jira/browse/FLINK-34655
> Project: Flink
> Issue Type: Bug
> Components: Autoscaler
> Reporter: Rui Fan
> Assignee: Rui Fan
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.8.0
>
> flink-kubernetes-operator is committed to supporting the latest 4 Flink minor
> versions, and the autoscaler is a part of flink-kubernetes-operator. Currently,
> the latest 4 Flink minor versions are 1.15, 1.16, 1.17 and 1.18.
> But the autoscaler doesn't work with Flink 1.15.
> h2. Root cause:
> * FLINK-28310 added some properties to IOMetricsInfo in Flink 1.16
> * IOMetricsInfo is a part of JobDetailsInfo
> * JobDetailsInfo is required by the autoscaler [1]
> * Flink's RestClient doesn't allow any property to be missing when deserializing
> the JSON
> That means the RestClient from Flink 1.16 or later cannot fetch JobDetailsInfo for
> 1.15 jobs.
> h2. How to fix it properly?
> The Flink side should support ignoring unknown properties.
> FLINK-33268 already does that. But when I tried running the autoscaler with a flink-1.15 job,
> it still didn't work, because the properties added to IOMetricsInfo are of
> primitive type.
> DeserializationFeature.FAIL_ON_NULL_FOR_PRIMITIVES should be disabled as well.
> (I'm not sure whether this should be a separate FLIP or part of
> FLIP-401 [2].)
> h2. How to fix it in the short term?
> 1. Copy the latest RestMapperUtils and RestClient from the master branch (they
> include FLINK-33268) to the flink-autoscaler module. (The copied classes will be
> loaded first.)
> 2. Disable DeserializationFeature.FAIL_ON_NULL_FOR_PRIMITIVES in
> RestMapperUtils#flexibleObjectMapper in the copied class.
> With these 2 steps, flink-1.15 works well with the autoscaler (I tried it
> locally).
> Once DeserializationFeature.FAIL_ON_NULL_FOR_PRIMITIVES is disabled in
> RestMapperUtils#flexibleObjectMapper and the corresponding code is released
> on the Flink side, flink-kubernetes-operator can remove these 2 copied
> classes.
> [1] https://github.com/apache/flink-kubernetes-operator/blob/ede1a610b3375d31a2e82287eec67ace70c4c8df/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/ScalingMetricCollector.java#L109
> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-401%3A+REST+API+JSON+response+deserialization+unknown+field+tolerance
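The short-term fix described in the issue can be illustrated with another hypothetical, stdlib-only Java sketch. The two boolean flags only mirror the intent of Jackson's `DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES` and `DeserializationFeature.FAIL_ON_NULL_FOR_PRIMITIVES`; this is not Jackson's or Flink's implementation, and the property names are illustrative. It shows why tolerating unknown properties alone (what FLINK-33268 provides) does not help when the missing properties are primitives.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a configurable deserializer. The flags only mirror the
 * intent of FAIL_ON_UNKNOWN_PROPERTIES and FAIL_ON_NULL_FOR_PRIMITIVES.
 */
public class LenientDeserializationSketch {

    /** Primitive long properties declared by the newer client-side class (illustrative). */
    static final List<String> DECLARED = List.of("readBytes", "writeBytes", "addedIn116");

    static Map<String, Long> deserialize(
            Map<String, Object> json, boolean failOnUnknown, boolean failOnNullForPrimitives) {
        // Unknown-property check: only matters when the server sends extra fields.
        if (failOnUnknown) {
            for (String key : json.keySet()) {
                if (!DECLARED.contains(key)) {
                    throw new IllegalArgumentException("Unknown property: " + key);
                }
            }
        }
        Map<String, Long> result = new HashMap<>();
        for (String name : DECLARED) {
            Object value = json.get(name);
            if (value == null) {
                if (failOnNullForPrimitives) {
                    throw new IllegalArgumentException("Missing primitive property: " + name);
                }
                // Lenient mode in this sketch: a missing primitive falls back to 0.
                result.put(name, 0L);
            } else {
                result.put(name, ((Number) value).longValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Response shape from an older server: the later-added property is absent.
        Map<String, Object> oldServerJson = Map.of("readBytes", 10L, "writeBytes", 20L);

        // Tolerating unknown properties alone is not enough.
        try {
            deserialize(oldServerJson, false, true);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Missing primitive property: addedIn116
        }

        // Also disabling the primitive check lets the older response parse.
        System.out.println(deserialize(oldServerJson, false, false).get("addedIn116")); // 0
    }
}
```

Note the trade-off in this sketch: once the primitive check is off, an absent metric is indistinguishable from a measured 0, so a caller would need to treat such values from older servers as unknown.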
[jira] [Commented] (FLINK-34655) Autoscaler doesn't work for flink 1.15
[ https://issues.apache.org/jira/browse/FLINK-34655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825657#comment-17825657 ]

Gyula Fora commented on FLINK-34655:

Also, this issue is already fixed in the Kubernetes operator package, where we have an overridden version of JobDetailsInfo.
[jira] [Commented] (FLINK-34655) Autoscaler doesn't work for flink 1.15
[ https://issues.apache.org/jira/browse/FLINK-34655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825656#comment-17825656 ]

Gyula Fora commented on FLINK-34655:

But the vertex parallelism overrides feature was introduced in 1.17, so the autoscaler never really officially supported anything before that. What do you think, [~mxm]?
[jira] [Commented] (FLINK-34655) Autoscaler doesn't work for flink 1.15
[ https://issues.apache.org/jira/browse/FLINK-34655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825655#comment-17825655 ]

Gyula Fora commented on FLINK-34655:

The bigger issue is that aggregated busy time metrics are not part of Flink 1.15.