[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #153: [FLINK-27000] Support to set JVM args for operator
gyfora commented on code in PR #153:
URL: https://github.com/apache/flink-kubernetes-operator/pull/153#discussion_r841366071

## helm/flink-kubernetes-operator/values.yaml:

@@ -78,3 +78,8 @@ metrics:
 imagePullSecrets: []
 nameOverride: ""
 fullnameOverride: ""
+
+# Set the jvm start up options for webhook and operator
+jvmArgs:
+  webhook: ""
+  operator: ""

Review Comment:
I will try to work on this in another ticket; we need to clean up many things. It's OK as it is.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-kubernetes-operator] Aitozi commented on a diff in pull request #153: [FLINK-27000] Support to set JVM args for operator
Aitozi commented on code in PR #153:
URL: https://github.com/apache/flink-kubernetes-operator/pull/153#discussion_r841297739

## helm/flink-kubernetes-operator/values.yaml:

@@ -78,3 +78,8 @@ metrics:
 imagePullSecrets: []
 nameOverride: ""
 fullnameOverride: ""
+
+# Set the jvm start up options for webhook and operator
+jvmArgs:
+  webhook: ""
+  operator: ""

Review Comment:
I also feel the helm options are a bit of a mess now. Do you have any suggestions for this, or should it be improved in your ticket?
[GitHub] [flink-kubernetes-operator] Aitozi commented on a diff in pull request #153: [FLINK-27000] Support to set JVM args for operator
Aitozi commented on code in PR #153:
URL: https://github.com/apache/flink-kubernetes-operator/pull/153#discussion_r841297026

## docker-entrypoint.sh:

@@ -27,12 +27,12 @@ if [ "$1" = "help" ]; then
 elif [ "$1" = "operator" ]; then
     echo "Starting Operator"
-    exec java -cp /$FLINK_KUBERNETES_SHADED_JAR:/$OPERATOR_JAR $LOG_CONFIG org.apache.flink.kubernetes.operator.FlinkOperator
+    exec java $JVM_ARGS -cp /$FLINK_KUBERNETES_SHADED_JAR:/$OPERATOR_JAR $LOG_CONFIG org.apache.flink.kubernetes.operator.FlinkOperator

Review Comment:
Good point, fixed.
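The diff above splices an optional JVM_ARGS environment variable into the java invocation before the classpath. A minimal sketch of that pattern, with placeholder jar names and example JVM flags rather than the chart's real values:

```shell
# Sketch of the entrypoint pattern discussed above: JVM_ARGS (typically set
# from the helm values) is expanded ahead of the classpath. Jar names and
# flag values here are illustrative placeholders, not the actual defaults.
JVM_ARGS="-Xms256m -Xmx512m"
FLINK_KUBERNETES_SHADED_JAR="flink-kubernetes-shaded.jar"
OPERATOR_JAR="flink-kubernetes-operator.jar"
CMD="java $JVM_ARGS -cp /$FLINK_KUBERNETES_SHADED_JAR:/$OPERATOR_JAR org.apache.flink.kubernetes.operator.FlinkOperator"
echo "$CMD"
```

Note that leaving JVM_ARGS unquoted in the real entrypoint is what allows multiple space-separated flags to be passed as separate arguments to java.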
[jira] [Created] (FLINK-27042) StreamTaskITCase#testMailboxMetricsScheduling is unstable
Chesnay Schepler created FLINK-27042:
Summary: StreamTaskITCase#testMailboxMetricsScheduling is unstable
Key: FLINK-27042
URL: https://issues.apache.org/jira/browse/FLINK-27042
Project: Flink
Issue Type: Technical Debt
Components: Runtime / Task, Tests
Affects Versions: 1.15.0
Reporter: Chesnay Schepler
Fix For: 1.15.1, 1.16.0

{code:java}
java.lang.AssertionError:
Expected: a value greater than <0L>
     but: <0L> was equal to <0L>
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:8)
	at org.apache.flink.streaming.runtime.tasks.StreamTaskITCase.testMailboxMetricsScheduling(StreamTaskITCase.java:1823)
{code}

Can be reproduced locally by looping the test. Probably due to the wrong assumption that two consecutive time measurements always differ.

-- This message was sent by Atlassian Jira (v8.20.1#820001)
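The suspected root cause, a strict greater-than assertion on two consecutive clock reads, can be illustrated generically; this sketch is not the test's actual code, just a demonstration with a seconds-granularity clock:

```shell
# Two back-to-back reads of a coarse clock are usually identical, so an
# assertion that the second read is strictly greater than the first is
# flaky by construction; only >= holds reliably.
t1=$(date +%s)
t2=$(date +%s)
diff=$((t2 - t1))
echo "$diff"
```

The robust fix in such tests is to assert non-negative elapsed time (or use a controllable clock) rather than strict inequality.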
[jira] [Created] (FLINK-27041) KafkaSource in batch mode failing on 0 messages in any topic partition
Terxor created FLINK-27041:
Summary: KafkaSource in batch mode failing on 0 messages in any topic partition
Key: FLINK-27041
URL: https://issues.apache.org/jira/browse/FLINK-27041
Project: Flink
Issue Type: Bug
Components: Connectors / Kafka
Affects Versions: 1.14.4
Environment: Kafka cluster version: 3.1.0, Flink version 1.14.4
Reporter: Terxor

First, let's take the case of consuming from a Kafka topic with a single partition holding 0 messages. Execution in batch mode, with bounded offsets set to latest, is expected to finish gracefully. However, it fails with an exception. Consider this minimal working example (assume that test_topic exists with 1 partition and 0 messages):

{code:java}
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setRuntimeMode(RuntimeExecutionMode.BATCH);
KafkaSource<String> kafkaSource = KafkaSource
    .<String>builder()
    .setBootstrapServers("localhost:9092")
    .setTopics("test_topic")
    .setValueOnlyDeserializer(new SimpleStringSchema())
    .setBounded(OffsetsInitializer.latest())
    .build();
DataStream<String> stream = env.fromSource(
    kafkaSource,
    WatermarkStrategy.noWatermarks(),
    "Kafka Source");
stream.print();
env.execute("Flink KafkaSource test job");
{code}

This produces the exception:

{code}
Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
	at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
	... [omitted for readability]
Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
	at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138)
	... [omitted for readability]
Caused by: java.lang.RuntimeException: One or more fetchers have encountered exception
	at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager.checkErrors(SplitFetcherManager.java:225)
	at org.apache.flink.connector.base.source.reader.SourceReaderBase.getNextFetch(SourceReaderBase.java:169)
	at org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:130)
	at org.apache.flink.streaming.api.operators.SourceOperator.emitNext(SourceOperator.java:351)
	at org.apache.flink.streaming.runtime.io.StreamTaskSourceInput.emitNext(StreamTaskSourceInput.java:68)
	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:496)
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:203)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:809)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:761)
	at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
	at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.RuntimeException: SplitFetcher thread 0 received unexpected exception while polling the records
	at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:150)
	at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.run(SplitFetcher.java:105)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	... 1 more
Caused by: java.lang.IllegalStateException: Consumer is not subscribed to any topics or assigned any partitions
	at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1223)
	at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1211)
	at
{code}
[jira] [Created] (FLINK-27040) Better structure for the helm chart configs
Gyula Fora created FLINK-27040:
Summary: Better structure for the helm chart configs
Key: FLINK-27040
URL: https://issues.apache.org/jira/browse/FLINK-27040
Project: Flink
Issue Type: Sub-task
Components: Kubernetes Operator
Reporter: Gyula Fora
Fix For: kubernetes-operator-1.0.0

The config options and structure in the helm chart are very chaotic. Some options are not prefixed, while others are prefixed with operator/flink/job etc. We should try to improve and simplify this with more hierarchical structures.
[jira] [Created] (FLINK-27039) Publish Javadocs for Kubernetes Operator
Márton Balassi created FLINK-27039:
Summary: Publish Javadocs for Kubernetes Operator
Key: FLINK-27039
URL: https://issues.apache.org/jira/browse/FLINK-27039
Project: Flink
Issue Type: Improvement
Components: Kubernetes Operator
Reporter: Márton Balassi
Fix For: kubernetes-operator-1.0.0

We should add publishing the Javadocs to our existing doc GHA action pipeline here:
[https://github.com/apache/flink-kubernetes-operator/blob/main/.github/workflows/docs.yaml]
[https://github.com/apache/flink-kubernetes-operator/blob/main/.github/workflows/docs.sh#L39-L51]
It is based on the work done in Flink:
[https://github.com/apache/flink/blob/master/.github/workflows/docs.sh]
Results should be on the nightly docs: https://nightlies.apache.org/flink/flink-docs-release-1.14/api/java/
[jira] [Created] (FLINK-27038) Publish Javadocs for Kubernetes Operator
Márton Balassi created FLINK-27038:
Summary: Publish Javadocs for Kubernetes Operator
Key: FLINK-27038
URL: https://issues.apache.org/jira/browse/FLINK-27038
Project: Flink
Issue Type: Improvement
Components: Kubernetes Operator
Affects Versions: kubernetes-operator-0.1.0, kubernetes-operator-1.0.0
Reporter: Márton Balassi

We should add publishing the Javadocs to our existing doc GHA action pipeline here:
[https://github.com/apache/flink-kubernetes-operator/blob/main/.github/workflows/docs.yaml]
[https://github.com/apache/flink-kubernetes-operator/blob/main/.github/workflows/docs.sh#L39-L51]
It is based on the work done in Flink:
[https://github.com/apache/flink/blob/master/.github/workflows/docs.sh]
Results should be on the nightly docs: https://nightlies.apache.org/flink/flink-docs-release-1.14/api/java/
[jira] [Created] (FLINK-27037) Publish Javadocs for Kubernetes Operator
Márton Balassi created FLINK-27037:
Summary: Publish Javadocs for Kubernetes Operator
Key: FLINK-27037
URL: https://issues.apache.org/jira/browse/FLINK-27037
Project: Flink
Issue Type: Improvement
Components: Kubernetes Operator
Affects Versions: kubernetes-operator-0.1.0, kubernetes-operator-1.0.0
Reporter: Márton Balassi

We should add publishing the Javadocs to our existing doc GHA action pipeline here:
[https://github.com/apache/flink-kubernetes-operator/blob/main/.github/workflows/docs.yaml]
[https://github.com/apache/flink-kubernetes-operator/blob/main/.github/workflows/docs.sh#L39-L51]
It is based on the work done in Flink:
[https://github.com/apache/flink/blob/master/.github/workflows/docs.sh]
Results should be on the nightly docs: https://nightlies.apache.org/flink/flink-docs-release-1.14/api/java/
[jira] [Created] (FLINK-27036) Do not build and push images on final release tags
Gyula Fora created FLINK-27036:
Summary: Do not build and push images on final release tags
Key: FLINK-27036
URL: https://issues.apache.org/jira/browse/FLINK-27036
Project: Flink
Issue Type: Improvement
Components: Kubernetes Operator
Reporter: Gyula Fora
Fix For: kubernetes-operator-1.0.0

The current GitHub workflow for building and pushing images executes on all release-* tags and branches. We should change this to run only on release-*-rc* tags and not on final release tags such as release-0.1.0. The reason is that a final release always points to the same commit as one of the RCs, so we should simply re-tag the image instead of rebuilding it.
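The proposed trigger change boils down to a glob that matches release-candidate tags but not final release tags. A sketch of that selection logic (the matches helper is illustrative; GitHub Actions would express the same pattern as a tag filter):

```shell
# A filter like "release-*-rc*" matches RC tags only, so final tags such as
# release-0.1.0 would no longer trigger an image build and push.
matches() {
  case "$1" in
    release-*-rc*) echo yes ;;
    *) echo no ;;
  esac
}
echo "$(matches release-0.1.0-rc1)"  # yes: RC tag triggers the build
echo "$(matches release-0.1.0)"      # no: final tag is skipped
```

The final-release artifact would then be produced by re-tagging the already-verified RC image rather than rebuilding from the same commit.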
[jira] [Created] (FLINK-27035) Reduce memory usage of ByteArrayWrapperSerializerTest
Chesnay Schepler created FLINK-27035:
Summary: Reduce memory usage of ByteArrayWrapperSerializerTest
Key: FLINK-27035
URL: https://issues.apache.org/jira/browse/FLINK-27035
Project: Flink
Issue Type: Technical Debt
Components: API / Python, Tests
Reporter: Chesnay Schepler
Fix For: 1.16.0

The ByteArrayWrapperSerializerTest can consume up to 1 GB of heap space on its own. This is due to the parallelization in SerializerTestBase#testDuplicate running tests 10x in parallel, the test data being duplicated twice, and the test using 32 MB arrays.
[jira] [Created] (FLINK-27034) Use testcontainers for google cloud pubsub e2e test
Chesnay Schepler created FLINK-27034:
Summary: Use testcontainers for google cloud pubsub e2e test
Key: FLINK-27034
URL: https://issues.apache.org/jira/browse/FLINK-27034
Project: Flink
Issue Type: Technical Debt
Components: Connectors / Google Cloud PubSub, Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
Fix For: 1.16.0

We can use testcontainers to simplify the test, which will also make it easier to run in different environments.
[jira] [Created] (FLINK-27033) YARNITCase failed due to OOM
Matthias Pohl created FLINK-27033:
Summary: YARNITCase failed due to OOM
Key: FLINK-27033
URL: https://issues.apache.org/jira/browse/FLINK-27033
Project: Flink
Issue Type: Bug
Components: Deployment / YARN
Affects Versions: 1.16.0
Reporter: Matthias Pohl

We experienced a 137 exit code in [this build|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=34124=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=33678] while executing {{YARNITCase}}:

{code}
##[error]Exit code 137 returned from process: file name '/bin/docker', arguments 'exec -i -u 1004 -w /home/agent05_azpcontainer bb00bf8c80330d042d18da617194edc1ff1a8bf5f73851d8786eb6675d13b5f2 /__a/externals/node/bin/node /__w/_temp/containerHandlerInvoker.js'.
{code}
[jira] [Created] (FLINK-27032) PyFlink end-to-end test failed due to str object not having expected attribute _j_expr
Matthias Pohl created FLINK-27032:
Summary: PyFlink end-to-end test failed due to str object not having expected attribute _j_expr
Key: FLINK-27032
URL: https://issues.apache.org/jira/browse/FLINK-27032
Project: Flink
Issue Type: Bug
Components: API / Python
Affects Versions: 1.15.0
Reporter: Matthias Pohl

The PyFlink e2e test failed in [this build|https://dev.azure.com/mapohl/flink/_build/results?buildId=926=logs=0e31ee24-31a6-528c-a4bf-45cde9b2a14e=ff03a8fa-e84e-5199-efb2-5433077ce8e2=7124]:

{code:java}
Apr 01 21:35:58 AttributeError: 'str' object has no attribute '_j_expr'
org.apache.flink.client.program.ProgramAbortException: java.lang.RuntimeException: Python process exits with code: 1
	at org.apache.flink.client.python.PythonDriver.main(PythonDriver.java:140)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
	at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
	at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:836)
	at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:247)
	at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1078)
	at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1156)
	at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1156)
Caused by: java.lang.RuntimeException: Python process exits with code: 1
	at org.apache.flink.client.python.PythonDriver.main(PythonDriver.java:130)
	... 13 more
{code}

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-27031) ChangelogRescalingITCase.test failed due to IllegalStateException
Matthias Pohl created FLINK-27031:
Summary: ChangelogRescalingITCase.test failed due to IllegalStateException
Key: FLINK-27031
URL: https://issues.apache.org/jira/browse/FLINK-27031
Project: Flink
Issue Type: Bug
Components: Runtime / State Backends
Affects Versions: 1.16.0
Reporter: Matthias Pohl

[This build|https://dev.azure.com/mapohl/flink/_build/results?buildId=923=logs=cc649950-03e9-5fae-8326-2f1ad744b536=a9a20597-291c-5240-9913-a731d46d6dd1=12961] failed in {{ChangelogRescalingITCase.test}}:

{code}
Apr 01 20:26:53 Caused by: java.lang.IllegalArgumentException: Key group 94 is not in KeyGroupRange{startKeyGroup=96, endKeyGroup=127}. Unless you're directly using low level state access APIs, this is most likely caused by non-deterministic shuffle key (hashCode and equals implementation).
Apr 01 20:26:53 	at org.apache.flink.runtime.state.KeyGroupRangeOffsets.newIllegalKeyGroupException(KeyGroupRangeOffsets.java:37)
Apr 01 20:26:53 	at org.apache.flink.runtime.state.heap.StateTable.getMapForKeyGroup(StateTable.java:305)
Apr 01 20:26:53 	at org.apache.flink.runtime.state.heap.StateTable.get(StateTable.java:261)
Apr 01 20:26:53 	at org.apache.flink.runtime.state.heap.StateTable.get(StateTable.java:143)
Apr 01 20:26:53 	at org.apache.flink.runtime.state.heap.HeapListState.add(HeapListState.java:94)
Apr 01 20:26:53 	at org.apache.flink.state.changelog.ChangelogListState.add(ChangelogListState.java:78)
Apr 01 20:26:53 	at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processElement(WindowOperator.java:404)
Apr 01 20:26:53 	at org.apache.flink.streaming.runtime.tasks.ChainingOutput.pushToOperator(ChainingOutput.java:99)
Apr 01 20:26:53 	at org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:80)
Apr 01 20:26:53 	at org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:39)
Apr 01 20:26:53 	at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:56)
Apr 01 20:26:53 	at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:29)
Apr 01 20:26:53 	at org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:38)
Apr 01 20:26:53 	at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:233)
Apr 01 20:26:53 	at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
Apr 01 20:26:53 	at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
Apr 01 20:26:53 	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
Apr 01 20:26:53 	at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:531)
Apr 01 20:26:53 	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:227)
Apr 01 20:26:53 	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:841)
Apr 01 20:26:53 	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:767)
Apr 01 20:26:53 	at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
Apr 01 20:26:53 	at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927)
Apr 01 20:26:53 	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
Apr 01 20:26:53 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
Apr 01 20:26:53 	at java.lang.Thread.run(Thread.java:748)
{code}

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-27030) RpcEndpointTest.testCancelScheduledTask failed
Matthias Pohl created FLINK-27030:
Summary: RpcEndpointTest.testCancelScheduledTask failed
Key: FLINK-27030
URL: https://issues.apache.org/jira/browse/FLINK-27030
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination
Affects Versions: 1.15.0, 1.16.0
Reporter: Matthias Pohl

[This build|https://dev.azure.com/mapohl/flink/_build/results?buildId=922=logs=9dc1b5dc-bcfa-5f83-eaa7-0cb181ddc267=511d2595-ec54-5ab7-86ce-92f328796f20=9266] failed due to a failure in {{RpcEndpointTest.testCancelScheduledTask}}:

{code}
Test org.apache.flink.runtime.rpc.RpcEndpointTest.testCancelScheduledRunnable failed with:
org.opentest4j.AssertionFailedError: expected: but was:
	at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
	at org.junit.jupiter.api.AssertFalse.assertFalse(AssertFalse.java:40)
	at org.junit.jupiter.api.AssertFalse.assertFalse(AssertFalse.java:35)
	at org.junit.jupiter.api.Assertions.assertFalse(Assertions.java:227)
	at org.apache.flink.runtime.rpc.RpcEndpointTest.testCancelScheduledTask(RpcEndpointTest.java:432)
	at org.apache.flink.runtime.rpc.RpcEndpointTest.testCancelScheduledRunnable(RpcEndpointTest.java:325)
{code}

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-27029) DeploymentValidator should take default flink config into account during validation
Gyula Fora created FLINK-27029:
Summary: DeploymentValidator should take default flink config into account during validation
Key: FLINK-27029
URL: https://issues.apache.org/jira/browse/FLINK-27029
Project: Flink
Issue Type: Bug
Components: Kubernetes Operator
Reporter: Gyula Fora
Fix For: kubernetes-operator-1.0.0

Currently the DefaultDeploymentValidator only takes the FlinkDeployment object into account. However, where we validate the presence of config keys, we should also consider the default flink config, which might already provide default values for the required configs even if the deployment itself doesn't. We should make sure this works correctly both in the operator and the webhook.
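The requested behavior amounts to a fallback lookup: a key missing from the deployment spec should resolve against the default Flink configuration before validation reports it as absent. A minimal sketch of that precedence rule (the config value shown is hypothetical, not an actual default):

```shell
# Fallback lookup: prefer the value from the FlinkDeployment spec; if the
# spec leaves the key unset, fall back to the default flink-conf value.
# Both values below are illustrative placeholders.
SPEC_VALUE=""                               # key absent from the deployment spec
DEFAULT_CONF_VALUE="file:///flink-data/ha"  # assumed value from the default flink config
EFFECTIVE="${SPEC_VALUE:-$DEFAULT_CONF_VALUE}"
echo "$EFFECTIVE"
```

Validation would then fail only when the effective (merged) value is still empty, which is why both the operator and the webhook need access to the same default config.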