[jira] [Comment Edited] (FLINK-33923) Upgrade upload-artifacts and download-artifacts to version 4

2024-01-18 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17808088#comment-17808088
 ] 

Matthias Pohl edited comment on FLINK-33923 at 1/18/24 8:55 AM:


Here are a few test runs with different solutions:
 * Plain upgrade of the {{*-artifact}} actions to v4
 ** [https://github.com/XComp/flink/actions/runs/7556760147]
 ** Result: Failure due to missing GLIBC versions (2.25, 2.27, 2.28), analogous 
to what was reported in FLINK-33277
 * Installing Node 16 as the default Node version through a separate step
 ** [https://github.com/XComp/flink/actions/runs/7557113216]
 ** Result: Still fails due to missing GLIBC versions for Node 20
 * Ubuntu 22.04 base image:
 ** [https://github.com/XComp/flink/actions/runs/7557690888]
 ** There was a problem with a missing caller ID, which resulted in each job 
trying to publish its artifacts under the same name
 ** But the Hadoop 3.1.3 job succeeded and revealed issues with Ubuntu 22.04: 
SSL-related tests were failing in {{core}}, and there was no {{python}} binary 
anymore, which made {{PythonEnvUtilsTest}} fail in the {{misc}} stage
 * Ubuntu 20.04 base image:
 ** [https://github.com/XComp/flink/actions/runs/7559103903]
 ** Revealed the same issues with SSL-related tests (the run was cancelled early on)
 * Ubuntu 18.04 base image:
 ** [https://github.com/XComp/flink/actions/runs/7559619643]
 ** Compilation failed again, this time because of the missing GLIBC version 2.28
 ** A local run in the Ubuntu 18.04 image revealed that the OpenSSL setup works 
with Netty here

The conclusion is to either switch to a newer Ubuntu version (20.04+), fix the 
OpenSSL issue, and hope that no additional errors pop up, or to find a way to 
install newer GLIBC versions in the current image.


was (Author: mapohl):
Here are a few test runs with different solutions:
 * Plain upgrade of the {{*-artifact}} actions to v4
 ** [https://github.com/XComp/flink/actions/runs/7556760147]
 ** Result: Failure due to missing GLIBC versions (2.25, 2.27, 2.28), analogous 
to what was reported in FLINK-33277
 * Installing Node 16 as the default Node version through a separate step
 ** [https://github.com/XComp/flink/actions/runs/7557113216]
 ** Result: Still fails due to missing GLIBC versions for Node 20
 * Switch to Ubuntu 22.04 base image:
 ** [https://github.com/XComp/flink/actions/runs/7557690888]
 ** There was a problem with a missing caller ID, which resulted in each job 
trying to publish its artifacts under the same name
 ** But the Hadoop 3.1.3 job succeeded and revealed issues with Ubuntu 22.04: 
SSL-related tests were failing in {{core}}, and there was no {{python}} binary 
anymore, which made {{PythonEnvUtilsTest}} fail in the {{misc}} stage
 * Switch to Ubuntu 20.04 base image:
 ** [https://github.com/XComp/flink/actions/runs/7559103903]
 ** Revealed the same issues with SSL-related tests (the run was cancelled early on)
 * Ubuntu 18.04 as a base image:
 ** [https://github.com/XComp/flink/actions/runs/7559619643]
 ** Compilation failed again, this time because of the missing GLIBC version 2.28
 ** A local run in the Ubuntu 18.04 image revealed that the OpenSSL setup works 
with Netty here

The conclusion is to either switch to a newer Ubuntu version (20.04+), fix the 
OpenSSL issue, and hope that no additional errors pop up, or to find a way to 
install newer GLIBC versions in the current image.

> Upgrade upload-artifacts and download-artifacts to version 4
> 
>
> Key: FLINK-33923
> URL: https://issues.apache.org/jira/browse/FLINK-33923
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Affects Versions: 1.18.0, 1.19.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: github-actions
>
> {{upload-artifacts}} and {{download-artifacts}} can be upgraded to v4. This 
> would bring us support for unique build artifact names (see 
> [https://github.com/actions/toolkit/tree/main/packages/artifact#breaking-changes])
>  and performance improvements 
> ([https://github.blog/changelog/2023-12-14-github-actions-artifacts-v4-is-now-generally-available/]).
> Such a change would require upgrading the node version as stated in 
> FLINK-33277



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33923) Upgrade upload-artifacts and download-artifacts to version 4

2024-01-18 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17808088#comment-17808088
 ] 

Matthias Pohl commented on FLINK-33923:
---

Here are a few test runs with different solutions:
 * Plain upgrade of the {{*-artifact}} actions to v4
 ** [https://github.com/XComp/flink/actions/runs/7556760147]
 ** Result: Failure due to missing GLIBC versions (2.25, 2.27, 2.28), analogous 
to what was reported in FLINK-33277
 * Installing Node 16 as the default Node version through a separate step
 ** [https://github.com/XComp/flink/actions/runs/7557113216]
 ** Result: Still fails due to missing GLIBC versions for Node 20
 * Switch to Ubuntu 22.04 base image:
 ** [https://github.com/XComp/flink/actions/runs/7557690888]
 ** There was a problem with a missing caller ID, which resulted in each job 
trying to publish its artifacts under the same name
 ** But the Hadoop 3.1.3 job succeeded and revealed issues with Ubuntu 22.04: 
SSL-related tests were failing in {{core}}, and there was no {{python}} binary 
anymore, which made {{PythonEnvUtilsTest}} fail in the {{misc}} stage
 * Switch to Ubuntu 20.04 base image:
 ** [https://github.com/XComp/flink/actions/runs/7559103903]
 ** Revealed the same issues with SSL-related tests (the run was cancelled early on)
 * Ubuntu 18.04 as a base image:
 ** [https://github.com/XComp/flink/actions/runs/7559619643]
 ** Compilation failed again, this time because of the missing GLIBC version 2.28

> Upgrade upload-artifacts and download-artifacts to version 4
> 
>
> Key: FLINK-33923
> URL: https://issues.apache.org/jira/browse/FLINK-33923
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Affects Versions: 1.18.0, 1.19.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: github-actions
>
> {{upload-artifacts}} and {{download-artifacts}} can be upgraded to v4. This 
> would bring us support for unique build artifact names (see 
> [https://github.com/actions/toolkit/tree/main/packages/artifact#breaking-changes])
>  and performance improvements 
> ([https://github.blog/changelog/2023-12-14-github-actions-artifacts-v4-is-now-generally-available/]).
> Such a change would require upgrading the node version as stated in 
> FLINK-33277



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34142) TaskManager WorkingDirectory is not removed during shutdown

2024-01-18 Thread Prabhu Joseph (Jira)
Prabhu Joseph created FLINK-34142:
-

 Summary: TaskManager WorkingDirectory is not removed during 
shutdown 
 Key: FLINK-34142
 URL: https://issues.apache.org/jira/browse/FLINK-34142
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN
Affects Versions: 1.17.1, 1.16.0
Reporter: Prabhu Joseph


TaskManager WorkingDirectory is not removed during shutdown. 

*Repro*

 
{code:java}
1. Execute a Flink batch job within a Flink on YARN Session

flink-yarn-session -d

flink run -d /usr/lib/flink/examples/batch/WordCount.jar --input 
s3://prabhuflinks3/INPUT --output s3://prabhuflinks3/OUT

{code}
The batch job completes successfully, but the taskmanager working directory is 
not being removed.
{code:java}
[root@ip-1-2-3-4 container_1705470896818_0017_01_02]# ls -R -lrt 
/mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_02
/mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_02:
total 0
drwxr-xr-x 2 yarn yarn  6 Jan 18 08:34 tmp
drwxr-xr-x 4 yarn yarn 66 Jan 18 08:34 blobStorage
drwxr-xr-x 2 yarn yarn  6 Jan 18 08:34 slotAllocationSnapshots
drwxr-xr-x 2 yarn yarn  6 Jan 18 08:34 localState

/mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_02/tmp:
total 0

/mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_02/blobStorage:
total 0
drwxr-xr-x 2 yarn yarn 94 Jan 18 08:34 job_d11f7085314ef1fb04c4e12fe292185a
drwxr-xr-x 2 yarn yarn  6 Jan 18 08:34 incoming

/mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_02/blobStorage/job_d11f7085314ef1fb04c4e12fe292185a:
total 12
-rw-r--r-- 1 yarn yarn 10323 Jan 18 08:34 
blob_p-cdd441a64b3ea6eed0058df02c6c10fd208c94a8-86d84864273dad1e8084d8ef0f5aad52

/mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_02/blobStorage/incoming:
total 0

/mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_02/slotAllocationSnapshots:
total 0

/mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_02/localState:
total 0


{code}
*Analysis*

1. The TaskManagerRunner removes the working directory only when its 'close' 
method is called, which never happens.
{code:java}
public void close() throws Exception {
    try {
        closeAsync().get();
    } catch (ExecutionException e) {
        ExceptionUtils.rethrowException(ExceptionUtils.stripExecutionException(e));
    }
}

public CompletableFuture<Result> closeAsync() {
    return closeAsync(Result.SUCCESS);
}
{code}
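
For illustration only, here is a minimal, hypothetical sketch (plain JDK, not 
Flink's actual cleanup code) of the recursive deletion that the never-invoked 
'close' call would normally trigger for the tm_* working directory shown above:
{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical helper, only to illustrate what "removing the working directory" means. */
final class WorkingDirCleanup {

    /** Recursively deletes e.g. .../tm_container_.../ and everything below it. */
    static void deleteRecursively(Path workingDir) throws IOException {
        if (!Files.exists(workingDir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(workingDir)) {
            paths.sorted(Comparator.reverseOrder()) // delete children before parents
                    .forEach(
                            path -> {
                                try {
                                    Files.delete(path);
                                } catch (IOException e) {
                                    throw new UncheckedIOException(e);
                                }
                            });
        }
    }
}
{code}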
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-34090][core] Introduce SerializerConfig [flink]

2024-01-18 Thread via GitHub


reswqa commented on code in PR #24127:
URL: https://github.com/apache/flink/pull/24127#discussion_r1457109072


##
flink-core/src/main/java/org/apache/flink/api/common/serialization/SerializerConfig.java:
##
@@ -74,8 +68,19 @@ public final class SerializerConfig implements Serializable {
     private LinkedHashMap<Class<?>, Class<? extends TypeInfoFactory<?>>>
             registeredTypeFactories = new LinkedHashMap<>();
 
+    private boolean hasGenericTypesEnabled;
+    private boolean forceKryo;
+    private boolean forceAvro;
+
     // ------------------------------------------------------------------------
 
+    public SerializerConfig() {
Review Comment:
   Will we always call `ExecutionConfig#configure`? If so, the default 
values should not matter. If not, can we pass these arguments in from the 
outside? I kind of dislike creating a configuration object in this class. :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-28915] Support fetching artifacts in native K8s and standalone application mode [flink]

2024-01-18 Thread via GitHub


ferenc-csaky commented on code in PR #24065:
URL: https://github.com/apache/flink/pull/24065#discussion_r1457107530


##
flink-kubernetes/src/main/java/org/apache/flink/kubernetes/entrypoint/KubernetesApplicationClusterEntrypoint.java:
##
@@ -111,14 +126,45 @@ private static PackagedProgramRetriever getPackagedProgramRetriever(
         // No need to do pipelineJars validation if it is a PyFlink job.
         if (!(PackagedProgramUtils.isPython(jobClassName)
                 || PackagedProgramUtils.isPython(programArguments))) {
-            final List<File> pipelineJars =
-                    KubernetesUtils.checkJarFileForApplicationMode(configuration);
-            Preconditions.checkArgument(pipelineJars.size() == 1, "Should only have one jar");
+            final ArtifactFetchManager.Result fetchRes = fetchArtifacts(configuration);
+
             return DefaultPackagedProgramRetriever.create(
-                    userLibDir, pipelineJars.get(0), jobClassName, programArguments, configuration);
+                    userLibDir,
+                    fetchRes.getUserArtifactDir(),
+                    fetchRes.getJobJar(),
+                    jobClassName,
+                    programArguments,
+                    configuration);
         }
 
         return DefaultPackagedProgramRetriever.create(
                 userLibDir, jobClassName, programArguments, configuration);
     }
+
+    private static ArtifactFetchManager.Result fetchArtifacts(Configuration configuration) {
+        try {
+            String targetDir = generateJarDir(configuration);
+            ArtifactFetchManager fetchMgr = new ArtifactFetchManager(configuration, targetDir);
+
+            List<String> uris = configuration.get(PipelineOptions.JARS);
+            checkArgument(uris.size() == 1, "Should only have one jar");
+            List<String> additionalUris =
+                    configuration
+                            .getOptional(ArtifactFetchOptions.ARTIFACT_LIST)
+                            .orElse(Collections.emptyList());
+
+            return fetchMgr.fetchArtifacts(uris.get(0), additionalUris);
+        } catch (Exception e) {
+            throw new RuntimeException(e);
+        }
+    }
+
+    static String generateJarDir(Configuration configuration) {
+        return String.join(
+                File.separator,
+                new File(configuration.get(ArtifactFetchOptions.ARTIFACT_BASE_DIR)).getAbsolutePath(),
+                configuration.get(KubernetesConfigOptions.NAMESPACE),
+                configuration.get(KubernetesConfigOptions.CLUSTER_ID));
+    }

Review Comment:
   I did not write this part and personally have limited K8s experience, but I 
think the reason behind this is that multiple deployments may end up on the 
same node, so adding the `namespace` and `cluster-id` to the dir path makes 
sure any deployment will have its own artifact base dir.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-33263) Implement ParallelismProvider for sources in the table planner

2024-01-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33263:
---
Labels: pull-request-available  (was: )

> Implement ParallelismProvider for sources in the table planner
> --
>
> Key: FLINK-33263
> URL: https://issues.apache.org/jira/browse/FLINK-33263
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Reporter: Zhanghao Chen
>Assignee: SuDewei
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [FLINK-33263][table-planner] Implement ParallelismProvider for sources in the table planner [flink]

2024-01-18 Thread via GitHub


BIOINSu opened a new pull request, #24128:
URL: https://github.com/apache/flink/pull/24128

   ## What is the purpose of the change
   
   Implement `ParallelismProvider` for sources in the table planner to support 
setting parallelism for Table/SQL sources.
   
   ## Brief change log
   
   - `CommonExecTableSourceScan` now gets the parallelism from the 
`ParallelismProvider` if available and, after validation, configures it on the 
source transformation
   - Introduce `SourceTransformationWrapper` to wrap the source transformation 
and expose a default parallelism to downstream operators
   - Use the real source transformation when generating the stream graph (see 
the provider sketch below)
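
   For illustration only (not part of this PR's diff; the class name and wiring 
are made up): a connector-side runtime provider that also implements 
`ParallelismProvider`, which is the kind of provider the planner-side changes 
above would query for a configured source parallelism.

   ```java
   import org.apache.flink.streaming.api.functions.source.SourceFunction;
   import org.apache.flink.table.connector.ParallelismProvider;
   import org.apache.flink.table.connector.source.SourceFunctionProvider;
   import org.apache.flink.table.data.RowData;

   import java.util.Optional;

   /** Hypothetical sketch: a ScanRuntimeProvider that also reports a desired parallelism. */
   final class ParallelSourceFunctionProvider
           implements SourceFunctionProvider, ParallelismProvider {

       private final SourceFunction<RowData> sourceFunction;
       private final boolean bounded;
       private final Integer parallelism; // null means "use the default parallelism"

       ParallelSourceFunctionProvider(
               SourceFunction<RowData> sourceFunction, boolean bounded, Integer parallelism) {
           this.sourceFunction = sourceFunction;
           this.bounded = bounded;
           this.parallelism = parallelism;
       }

       @Override
       public SourceFunction<RowData> createSourceFunction() {
           return sourceFunction;
       }

       @Override
       public boolean isBounded() {
           return bounded;
       }

       @Override
       public Optional<Integer> getParallelism() {
           return Optional.ofNullable(parallelism);
       }
   }
   ```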
   
   ## Verifying this change
   
   This change added tests and can be verified as follows:
   
   - Add end-to-end integration tests for setting the source parallelism via the 
Table API
   - Add a new `@Nullable` parallelism field to 
`TestValuesScanTableSourceWithoutProjectionPushDown` for the tests related to 
`ParallelismProvider` in sources
   - Test that `SourceFunctionProvider`, `InputFormatProvider` and 
`DataStreamScanProvider` work with `ParallelismProvider`
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (**yes** / no)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't 
know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (**yes** / no)
 - If yes, how is the feature documented? (not applicable / **docs** / 
JavaDocs / not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-28915] Support fetching artifacts in native K8s and standalone application mode [flink]

2024-01-18 Thread via GitHub


ferenc-csaky commented on code in PR #24065:
URL: https://github.com/apache/flink/pull/24065#discussion_r1457104509


##
flink-clients/src/main/java/org/apache/flink/client/cli/ArtifactFetchOptions.java:
##
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.client.cli;
+
+import org.apache.flink.configuration.ConfigOption;
+import org.apache.flink.configuration.ConfigOptions;
+
+import java.util.List;
+import java.util.Map;
+
+import static org.apache.flink.configuration.ConfigOptions.key;
+
+/** Artifact Fetch options. */
+public class ArtifactFetchOptions {
+
+    public static final ConfigOption<String> ARTIFACT_BASE_DIR =
+            ConfigOptions.key("user.artifacts.base-dir")
+                    .stringType()
+                    .defaultValue("/opt/flink/artifacts")
+                    .withDescription("The base dir to put the application job artifacts.");
+
+    public static final ConfigOption<List<String>> ARTIFACT_LIST =
+            key("user.artifacts.artifact-list")
+                    .stringType()
+                    .asList()
+                    .noDefaultValue()
+                    .withDescription(
+                            "A semicolon-separated list of the additional artifacts to fetch for the job before setting up the application cluster."
+                                    + " All of these have to be valid URIs. Example: s3://sandbox-bucket/format.jar,http://sandbox-server:1234/udf.jar");
+
+    public static final ConfigOption<Map<String, String>> ARTIFACT_HTTP_HEADERS =

Review Comment:
   It could hold auth info, so I think you have a good point, added it to the 
sensitive list.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-34131) Checkpoint check window should take in account checkpoint job configuration

2024-01-18 Thread Nicolas Fraison (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Fraison updated FLINK-34131:

Description: 
When enabling checkpoint progress check 
(kubernetes.operator.cluster.health-check.checkpoint-progress.enabled) to 
define cluster health the operator rely detect if a checkpoint has been 
performed during the 
kubernetes.operator.cluster.health-check.checkpoint-progress.window

As indicated in the doc it must be bigger to checkpointing interval.

But this is a manual configuration which can leads to misconfiguration and 
unwanted restart of the flink cluster if the checkpointing interval is bigger 
than the window one.

The operator must check that the config is healthy before to rely on this 
check. If it is not well set it should not execute the check (return true on 
[evaluateCheckpoints|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/observer/ClusterHealthEvaluator.java#L197C1-L199C50])
 and log a WARN message.

Also flink jobs have other checkpointing parameters that should be taken in 
account for this window configuration which are execution.checkpointing.timeout 
and execution.checkpointing.tolerable-failed-checkpoints

The idea would be to check that 
kubernetes.operator.cluster.health-check.checkpoint-progress.window >= 
max(execution.checkpointing.interval, execution.checkpointing.timeout * 
execution.checkpointing.tolerable-failed-checkpoints)
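
A minimal sketch of the proposed validation (class, method and parameter names 
are invented for illustration; this is not the operator's actual code):
{code:java}
import java.time.Duration;

/** Hypothetical sketch of the proposed sanity check. */
final class CheckpointProgressWindowCheck {

    static boolean isWindowLargeEnough(
            Duration progressWindow,          // ...checkpoint-progress.window
            Duration checkpointingInterval,   // execution.checkpointing.interval
            Duration checkpointingTimeout,    // execution.checkpointing.timeout
            int tolerableFailedCheckpoints) { // execution.checkpointing.tolerable-failed-checkpoints

        // Worst case: every tolerated failure takes the full checkpoint timeout.
        Duration worstCaseFailureTime =
                checkpointingTimeout.multipliedBy(tolerableFailedCheckpoints);

        Duration required =
                checkpointingInterval.compareTo(worstCaseFailureTime) >= 0
                        ? checkpointingInterval
                        : worstCaseFailureTime;

        // window >= max(interval, timeout * tolerable-failed-checkpoints)
        return progressWindow.compareTo(required) >= 0;
    }
}
{code}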

  was:
When enabling the checkpoint progress check 
(kubernetes.operator.cluster.health-check.checkpoint-progress.enabled) to 
determine cluster health, the operator relies on detecting whether a checkpoint 
has been completed within 
kubernetes.operator.cluster.health-check.checkpoint-progress.window.

As indicated in the docs, this window must be bigger than the checkpointing 
interval.

But this is a manual configuration, which can lead to misconfiguration and an 
unwanted restart of the Flink cluster if the checkpointing interval is bigger 
than the window.

The operator must verify that the configuration is sane before relying on this 
check. If it is not set correctly, it should not execute the check (return true in 
[evaluateCheckpoints|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/observer/ClusterHealthEvaluator.java#L197C1-L199C50])
 and log a WARN message.

Flink jobs also have other checkpointing parameters that should be taken into 
account for this window configuration, namely execution.checkpointing.timeout 
and execution.checkpointing.tolerable-failed-checkpoints.

The idea would be to check that 
kubernetes.operator.cluster.health-check.checkpoint-progress.window is >= 
(execution.checkpointing.interval + execution.checkpointing.timeout) * 
execution.checkpointing.tolerable-failed-checkpoints


> Checkpoint check window should take in account checkpoint job configuration
> ---
>
> Key: FLINK-34131
> URL: https://issues.apache.org/jira/browse/FLINK-34131
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Nicolas Fraison
>Priority: Minor
>
> When enabling the checkpoint progress check 
> (kubernetes.operator.cluster.health-check.checkpoint-progress.enabled) to 
> determine cluster health, the operator relies on detecting whether a checkpoint 
> has been completed within 
> kubernetes.operator.cluster.health-check.checkpoint-progress.window.
> As indicated in the docs, this window must be bigger than the checkpointing 
> interval.
> But this is a manual configuration, which can lead to misconfiguration and an 
> unwanted restart of the Flink cluster if the checkpointing interval is bigger 
> than the window.
> The operator must verify that the configuration is sane before relying on this 
> check. If it is not set correctly, it should not execute the check (return true in 
> [evaluateCheckpoints|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/observer/ClusterHealthEvaluator.java#L197C1-L199C50])
>  and log a WARN message.
> Flink jobs also have other checkpointing parameters that should be taken into 
> account for this window configuration, namely 
> execution.checkpointing.timeout and 
> execution.checkpointing.tolerable-failed-checkpoints.
> The idea would be to check that 
> kubernetes.operator.cluster.health-check.checkpoint-progress.window >= 
> max(execution.checkpointing.interval, execution.checkpointing.timeout * 
> execution.checkpointing.tolerable-failed-checkpoints)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34131) Checkpoint check window should take in account checkpoint job configuration

2024-01-18 Thread Nicolas Fraison (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Fraison updated FLINK-34131:

Priority: Minor  (was: Major)

> Checkpoint check window should take in account checkpoint job configuration
> ---
>
> Key: FLINK-34131
> URL: https://issues.apache.org/jira/browse/FLINK-34131
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Nicolas Fraison
>Priority: Minor
>
> When enabling the checkpoint progress check 
> (kubernetes.operator.cluster.health-check.checkpoint-progress.enabled) to 
> determine cluster health, the operator relies on detecting whether a checkpoint 
> has been completed within 
> kubernetes.operator.cluster.health-check.checkpoint-progress.window.
> As indicated in the docs, this window must be bigger than the checkpointing 
> interval.
> But this is a manual configuration, which can lead to misconfiguration and an 
> unwanted restart of the Flink cluster if the checkpointing interval is bigger 
> than the window.
> The operator must verify that the configuration is sane before relying on this 
> check. If it is not set correctly, it should not execute the check (return true in 
> [evaluateCheckpoints|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/observer/ClusterHealthEvaluator.java#L197C1-L199C50])
>  and log a WARN message.
> Flink jobs also have other checkpointing parameters that should be taken into 
> account for this window configuration, namely 
> execution.checkpointing.timeout and 
> execution.checkpointing.tolerable-failed-checkpoints.
> The idea would be to check that 
> kubernetes.operator.cluster.health-check.checkpoint-progress.window is >= 
> (execution.checkpointing.interval + execution.checkpointing.timeout) * 
> execution.checkpointing.tolerable-failed-checkpoints



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33914][ci] Adds basic Flink CI workflow [flink]

2024-01-18 Thread via GitHub


XComp commented on PR #23970:
URL: https://github.com/apache/flink/pull/23970#issuecomment-1898009721

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-34140) Rename WindowContext and TriggerContext in window

2024-01-18 Thread Martijn Visser (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17808069#comment-17808069
 ] 

Martijn Visser commented on FLINK-34140:


[~xuyangzhong] Are these internal renames, or changes to 
Public/PublicInternal/Experimental interfaces?

> Rename WindowContext and TriggerContext in window
> -
>
> Key: FLINK-34140
> URL: https://issues.apache.org/jira/browse/FLINK-34140
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Runtime
>Reporter: xuyang
>Assignee: xuyang
>Priority: Major
>
> Currently, WindowContext and TriggerContext not only contain a series of getter 
> methods to obtain context information, but also include behaviors such as 
> clear().
> Maybe it would be better to rename them to WindowDelegator and TriggerDelegator.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33182) Allow metadata columns in NduAnalyzer with ChangelogNormalize

2024-01-18 Thread lincoln lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17808066#comment-17808066
 ] 

lincoln lee commented on FLINK-33182:
-

[~twalthr] I've submitted a PR for this; if you could help with the review, 
that would be great.

> Allow metadata columns in NduAnalyzer with ChangelogNormalize
> -
>
> Key: FLINK-33182
> URL: https://issues.apache.org/jira/browse/FLINK-33182
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Reporter: Timo Walther
>Assignee: lincoln lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, the NduAnalyzer is very strict about metadata columns in updating 
> sources. However, for upsert sources (like Kafka) that contain an incomplete 
> changelog, the planner always adds a ChangelogNormalize node. 
> ChangelogNormalize will make sure that metadata columns can be considered 
> deterministic. So the NduAnalyzer should be satisfied in this case. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

