[jira] [Commented] (FLINK-27836) RocksDBMapState iteration may stop too early for var-length prefixes
[ https://issues.apache.org/jira/browse/FLINK-27836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642013#comment-17642013 ]

Nico Kruber commented on FLINK-27836:
-------------------------------------

Unfortunately, it's not that simple, because Flink has no control over the serializer that is used here. Just because Flink's own serializers are (by chance) correct here doesn't mean that an arbitrary serializer for user data is. And the choice of serializer is at the user's convenience...

> RocksDBMapState iteration may stop too early for var-length prefixes
> --------------------------------------------------------------------
>
>                 Key: FLINK-27836
>                 URL: https://issues.apache.org/jira/browse/FLINK-27836
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / State Backends
>    Affects Versions: 1.13.6, 1.14.4, 1.15.0
>            Reporter: Nico Kruber
>            Priority: Major
>
> A similar, yet orthogonal, issue to
> https://issues.apache.org/jira/browse/FLINK-11141 is that the iterators used
> in RocksDBMapState iterate over everything with a matching prefix of
> flink-key and namespace. With var-length serializers for either of them,
> however, it may return data for unrelated keys and/or namespaces.
> It looks like the built-in serializers of Flink are not affected, though, since
> they use a var-length encoding that is prefixed with the object's length and
> thus different lengths will not have the same prefix. More exotic
> serializers, e.g. relying on a terminating NUL character, may expose the
> above-mentioned behaviour.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
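[Editorial note] The property the issue relies on can be illustrated with a standalone sketch (hypothetical code, not taken from Flink's code base): prefix-based iteration over serialized key bytes is only safe if no serialized key can be a strict byte-prefix of a *different* serialized key. A length-prefixed encoding guarantees this; a raw encoding without length information does not:

```java
import java.nio.charset.StandardCharsets;

/**
 * Illustrative sketch: why prefix scans over serialized keys need an
 * encoding in which no key's bytes are a strict prefix of another key's.
 */
public class PrefixCollisionDemo {

    /** Returns true iff {@code prefix} is a byte-prefix of {@code bytes}. */
    static boolean startsWith(byte[] bytes, byte[] prefix) {
        if (prefix.length > bytes.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (bytes[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** "Exotic" encoding: raw bytes, no length information. */
    static byte[] rawEncode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Length-prefixed encoding (simplified: assumes length < 128). */
    static byte[] lengthPrefixedEncode(String s) {
        byte[] raw = s.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[raw.length + 1];
        out[0] = (byte) raw.length;
        System.arraycopy(raw, 0, out, 1, raw.length);
        return out;
    }

    public static void main(String[] args) {
        // Raw encoding: key "ab" is a byte-prefix of key "abc", so a
        // prefix scan for "ab" would also return entries of key "abc".
        System.out.println(startsWith(rawEncode("abc"), rawEncode("ab")));
        // Length-prefixed encoding: the first byte already differs
        // (2 vs 3), so the two keys' prefixes cannot collide.
        System.out.println(startsWith(lengthPrefixedEncode("abc"), lengthPrefixedEncode("ab")));
    }
}
```

Flink's actual RocksDB key layout additionally contains key-group and namespace bytes; the sketch only demonstrates the prefix-collision property itself.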
[jira] [Created] (FLINK-30072) Cannot assign instance of SerializedLambda to field KeyGroupStreamPartitioner.keySelector
Nico Kruber created FLINK-30072:
-----------------------------------

             Summary: Cannot assign instance of SerializedLambda to field KeyGroupStreamPartitioner.keySelector
                 Key: FLINK-30072
                 URL: https://issues.apache.org/jira/browse/FLINK-30072
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
    Affects Versions: 1.16.0
            Reporter: Nico Kruber


In application mode, if the {{usrlib}} directories of the JM and TM differ, e.g. contain the same jars but under different names, the job fails and throws this cryptic exception on the JM:

{code}
2022-11-17 09:55:12,968 INFO  org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler [] - Restarting job.
org.apache.flink.streaming.runtime.tasks.StreamTaskException: Could not instantiate outputs in order.
	at org.apache.flink.streaming.api.graph.StreamConfig.getVertexNonChainedOutputs(StreamConfig.java:537) ~[flink-dist-1.16.0-ok.0.jar:1.16.0-ok.0]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.createRecordWriters(StreamTask.java:1600) ~[flink-dist-1.16.0-ok.0.jar:1.16.0-ok.0]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.createRecordWriterDelegate(StreamTask.java:1584) ~[flink-dist-1.16.0-ok.0.jar:1.16.0-ok.0]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.<init>(StreamTask.java:408) ~[flink-dist-1.16.0-ok.0.jar:1.16.0-ok.0]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.<init>(StreamTask.java:362) ~[flink-dist-1.16.0-ok.0.jar:1.16.0-ok.0]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.<init>(StreamTask.java:335) ~[flink-dist-1.16.0-ok.0.jar:1.16.0-ok.0]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.<init>(StreamTask.java:327) ~[flink-dist-1.16.0-ok.0.jar:1.16.0-ok.0]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.<init>(StreamTask.java:317) ~[flink-dist-1.16.0-ok.0.jar:1.16.0-ok.0]
	at org.apache.flink.streaming.runtime.tasks.SourceOperatorStreamTask.<init>(SourceOperatorStreamTask.java:84) ~[flink-dist-1.16.0-ok.0.jar:1.16.0-ok.0]
	at jdk.internal.reflect.GeneratedConstructorAccessor38.newInstance(Unknown Source) ~[?:?]
	at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source) ~[?:?]
	at java.lang.reflect.Constructor.newInstance(Unknown Source) ~[?:?]
	at org.apache.flink.runtime.taskmanager.Task.loadAndInstantiateInvokable(Task.java:1589) ~[flink-dist-1.16.0-ok.0.jar:1.16.0-ok.0]
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:714) ~[flink-dist-1.16.0-ok.0.jar:1.16.0-ok.0]
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550) ~[flink-dist-1.16.0-ok.0.jar:1.16.0-ok.0]
	at java.lang.Thread.run(Unknown Source) ~[?:?]
Caused by: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.flink.streaming.runtime.partitioner.KeyGroupStreamPartitioner.keySelector of type org.apache.flink.api.java.functions.KeySelector in instance of org.apache.flink.streaming.runtime.partitioner.KeyGroupStreamPartitioner
	at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(Unknown Source) ~[?:?]
	at java.io.ObjectStreamClass$FieldReflector.checkObjectFieldValueTypes(Unknown Source) ~[?:?]
	at java.io.ObjectStreamClass.checkObjFieldValueTypes(Unknown Source) ~[?:?]
	at java.io.ObjectInputStream.defaultCheckFieldValues(Unknown Source) ~[?:?]
	at java.io.ObjectInputStream.readSerialData(Unknown Source) ~[?:?]
	at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source) ~[?:?]
	at java.io.ObjectInputStream.readObject0(Unknown Source) ~[?:?]
	at java.io.ObjectInputStream.defaultReadFields(Unknown Source) ~[?:?]
	at java.io.ObjectInputStream.readSerialData(Unknown Source) ~[?:?]
	at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source) ~[?:?]
	at java.io.ObjectInputStream.readObject0(Unknown Source) ~[?:?]
	at java.io.ObjectInputStream.readObject(Unknown Source) ~[?:?]
	at java.io.ObjectInputStream.readObject(Unknown Source) ~[?:?]
	at java.util.ArrayList.readObject(Unknown Source) ~[?:?]
	at jdk.internal.reflect.GeneratedMethodAccessor16.invoke(Unknown Source) ~[?:?]
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) ~[?:?]
	at java.lang.reflect.Method.invoke(Unknown Source) ~[?:?]
	at java.io.ObjectStreamClass.invokeReadObject(Unknown Source) ~[?:?]
	at java.io.ObjectInputStream.readSerialData(Unknown Source) ~[?:?]
	at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source) ~[?:?]
	at java.io.ObjectInputStream.readObject0(Unknown Source) ~[?:?]
	at java.io.ObjectInputStream.readObject(Unknown Source) ~[?:?]
	at java.io.ObjectInputStream.readObject(Unknown Source) ~[?:?]
	at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:617) ~[flink-dist-1.16.0-ok.0.jar:1.16.0-ok.0]
	at org.apache.flink.util.InstantiationUtil.deserial
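[Editorial note] The mechanism behind this exception can be shown in a standalone sketch (hypothetical code; the {{KeySelector}} interface below is a simplified stand-in for Flink's, not the real one): Java serializes a serializable lambda as a {{java.lang.invoke.SerializedLambda}}, which is resolved back into a lambda through a hook in its *capturing class* at read time. That is why the deserializing side must see the same user classes as the writing side:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

/**
 * Minimal sketch: a serializable lambda is written to the stream as a
 * java.lang.invoke.SerializedLambda and is only resolved back into a lambda
 * if the capturing class is available on the deserializing side.
 */
public class LambdaSerializationDemo {

    /** Simplified stand-in for Flink's KeySelector: a serializable function. */
    interface KeySelector<IN, KEY> extends Function<IN, KEY>, Serializable {}

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        KeySelector<String, Integer> selector = String::length;

        // Within one JVM (identical classes on both ends) the round trip
        // works: the SerializedLambda is resolved via this class's
        // generated $deserializeLambda$ hook.
        @SuppressWarnings("unchecked")
        KeySelector<String, Integer> copy =
                (KeySelector<String, Integer>) deserialize(serialize(selector));
        System.out.println(copy.apply("flink")); // prints 5

        // If the reading JVM loads a *different* copy of the capturing
        // class (e.g. the same jar under another name in usrlib), the
        // lambda cannot be resolved and deserialization fails, as in the
        // ClassCastException quoted above.
    }
}
```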
[jira] [Updated] (FLINK-30045) FromClasspathEntryClassInformationProvider too eager to verify MainClass
[ https://issues.apache.org/jira/browse/FLINK-30045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Kruber updated FLINK-30045:
--------------------------------
    Description: 
{{FromClasspathEntryClassInformationProvider}} attempts to verify (eagerly) whether the given MainClass is on the user classpath. However, it doesn't handle cases where the main class is inside a nested jar.

We actually don't need this check at all since {{PackagedProgram}} is already doing it while attempting to load the main class. Having this check once should be enough.

  was:
{{FromClasspathEntryClassInformationProvider}} attempts to verify (eagerly) whether the given MainClass is on the user classpath. However, it doesn't handle cases where the main class is inside a nested jar. This is something you would see when using such a nested jar file with the {{StandaloneApplicationClusterEntryPoint}}, e.g. from {{standalone-job.sh}}.

We actually don't need this check at all since {{PackagedProgram}} is already doing it while attempting to load the main class. Having this check once should be enough.


> FromClasspathEntryClassInformationProvider too eager to verify MainClass
> ------------------------------------------------------------------------
>
>                 Key: FLINK-30045
>                 URL: https://issues.apache.org/jira/browse/FLINK-30045
>             Project: Flink
>          Issue Type: Bug
>          Components: Client / Job Submission
>    Affects Versions: 1.16.0, 1.17.0
>            Reporter: Nico Kruber
>            Assignee: Nico Kruber
>            Priority: Major
>             Fix For: 1.17.0, 1.16.1
>
> {{FromClasspathEntryClassInformationProvider}} attempts to verify (eagerly)
> whether the given MainClass is on the user classpath. However, it
> doesn't handle cases where the main class is inside a nested jar.
> We actually don't need this check at all since {{PackagedProgram}} is already
> doing it while attempting to load the main class. Having this check once should be
> enough.
[jira] [Created] (FLINK-30045) FromClasspathEntryClassInformationProvider too eager to verify MainClass
Nico Kruber created FLINK-30045:
-----------------------------------

             Summary: FromClasspathEntryClassInformationProvider too eager to verify MainClass
                 Key: FLINK-30045
                 URL: https://issues.apache.org/jira/browse/FLINK-30045
             Project: Flink
          Issue Type: Bug
          Components: Client / Job Submission
    Affects Versions: 1.16.0, 1.17.0
            Reporter: Nico Kruber
            Assignee: Nico Kruber
             Fix For: 1.17.0, 1.16.1


{{FromClasspathEntryClassInformationProvider}} attempts to verify (eagerly) whether the given MainClass is on the user classpath. However, it doesn't handle cases where the main class is inside a nested jar. This is something you would see when using such a nested jar file with the {{StandaloneApplicationClusterEntryPoint}}, e.g. from {{standalone-job.sh}}.

We actually don't need this check at all since {{PackagedProgram}} is already doing it while attempting to load the main class. Having this check once should be enough.
[jira] [Closed] (FLINK-15633) Javadocs cannot be built on Java 11
[ https://issues.apache.org/jira/browse/FLINK-15633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Kruber closed FLINK-15633.
-------------------------------
    Fix Version/s: 1.17.0
       Resolution: Fixed

Should be fixed now with [e962c6964929d9943a3884bf8f7e16df6599d77b|https://github.com/apache/flink/commit/e962c6964929d9943a3884bf8f7e16df6599d77b] on {{master}}.

> Javadocs cannot be built on Java 11
> -----------------------------------
>
>                 Key: FLINK-15633
>                 URL: https://issues.apache.org/jira/browse/FLINK-15633
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Build System, Release System
>    Affects Versions: 1.10.0
>            Reporter: Chesnay Schepler
>            Assignee: Nico Kruber
>            Priority: Not a Priority
>              Labels: pull-request-available
>             Fix For: 1.17.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The javadoc-plugin fails when run on Java 11. This isn't a big issue since we do
> our releases on JDK 8 anyway, but we should still fix it so users can
> reproduce releases on JDK 11.
> {code}
> java.lang.ExceptionInInitializerError
>     at org.apache.maven.plugin.javadoc.AbstractJavadocMojo.<init>(AbstractJavadocMojo.java:190)
>     at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:488)
>     at com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:86)
>     at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:105)
>     at com.google.inject.internal.ConstructorInjector.access$000(ConstructorInjector.java:32)
>     at com.google.inject.internal.ConstructorInjector$1.call(ConstructorInjector.java:89)
>     at com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:115)
>     at com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:133)
>     at com.google.inject.internal.ProvisionListenerStackCallback.provision(ProvisionListenerStackCallback.java:68)
>     at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:87)
>     at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:267)
>     at com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1016)
>     at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1103)
>     at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1012)
>     at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1051)
>     at org.eclipse.sisu.space.AbstractDeferredClass.get(AbstractDeferredClass.java:48)
>     at com.google.inject.internal.ProviderInternalFactory.provision(ProviderInternalFactory.java:81)
>     at com.google.inject.internal.InternalFactoryToInitializableAdapter.provision(InternalFactoryToInitializableAdapter.java:53)
>     at com.google.inject.internal.ProviderInternalFactory$1.call(ProviderInternalFactory.java:65)
>     at com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:115)
>     at com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:133)
>     at com.google.inject.internal.ProvisionListenerStackCallback.provision(ProvisionListenerStackCallback.java:68)
>     at com.google.inject.internal.ProviderInternalFactory.circularGet(ProviderInternalFactory.java:63)
>     at com.google.inject.internal.InternalFactoryToInitializableAdapter.get(InternalFactoryToInitializableAdapter.java:45)
>     at com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1016)
>     at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
>     at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1012)
>     at org.eclipse.sisu.inject.Guice4$1.get(Guice4.java:162)
>     at org.eclipse.sisu.inject.LazyBeanEntry.getValue(LazyBeanEntry.java:81)
>     at org.eclipse.sisu.plexus.LazyPlexusBean.getValue(LazyPlexusBean.java:51)
>     at org.codehaus.plexus.DefaultPlexusContainer.lookup(DefaultPlexusContainer.java:263)
>     at org.codehaus.plexus.DefaultPlexusContainer.lookup(DefaultPlexusContainer.java:255)
>     at org.apache.maven.plugin.internal.DefaultMavenPluginManager.getConfiguredMojo(Defau
[jira] [Commented] (FLINK-25306) Flink CLI end-to-end test timeout on azure
[ https://issues.apache.org/jira/browse/FLINK-25306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628811#comment-17628811 ]

Nico Kruber commented on FLINK-25306:
-------------------------------------

one more: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=42798&view=logs&j=bea52777-eaf8-5663-8482-18fbc3630e81&t=b2642e3a-5b86-574d-4c8a-f7e2842bfb14&l=3006

> Flink CLI end-to-end test timeout on azure
> ------------------------------------------
>
>                 Key: FLINK-25306
>                 URL: https://issues.apache.org/jira/browse/FLINK-25306
>             Project: Flink
>          Issue Type: Bug
>          Components: Build System / Azure Pipelines, Client / Job Submission
>    Affects Versions: 1.15.0
>            Reporter: Yun Gao
>            Priority: Minor
>              Labels: auto-deprioritized-critical, auto-deprioritized-major, test-stability
>
> {code:java}
> Dec 14 02:14:48 Waiting for Dispatcher REST endpoint to come up...
> Dec 14 02:16:59 Waiting for Dispatcher REST endpoint to come up...
> Dec 14 02:19:10 Waiting for Dispatcher REST endpoint to come up...
> Dec 14 02:21:21 Waiting for Dispatcher REST endpoint to come up...
> Dec 14 02:23:32 Waiting for Dispatcher REST endpoint to come up...
> Dec 14 02:25:43 Waiting for Dispatcher REST endpoint to come up...
> Dec 14 02:27:54 Waiting for Dispatcher REST endpoint to come up...
> Dec 14 02:30:05 Waiting for Dispatcher REST endpoint to come up...
> Dec 14 02:32:16 Waiting for Dispatcher REST endpoint to come up...
> Dec 14 02:34:27 Waiting for Dispatcher REST endpoint to come up...
> Dec 14 02:36:38 Waiting for Dispatcher REST endpoint to come up...
> Dec 14 02:38:49 Waiting for Dispatcher REST endpoint to come up...
> Dec 14 02:41:01 Waiting for Dispatcher REST endpoint to come up...
> Dec 14 02:43:12 Waiting for Dispatcher REST endpoint to come up...
> Dec 14 02:45:23 Waiting for Dispatcher REST endpoint to come up...
> Dec 14 02:47:34 Waiting for Dispatcher REST endpoint to come up...
> Dec 14 02:49:45 Waiting for Dispatcher REST endpoint to come up...
> Dec 14 02:49:46 Dispatcher REST endpoint has not started within a timeout of 30 sec
> Dec 14 02:49:46 [FAIL] Test script contains errors.
> Dec 14 02:49:46 Checking for errors...
> Dec 14 02:49:46 No errors in log files.
> Dec 14 02:49:46 Checking for exceptions...
> Dec 14 02:49:46 No exceptions in log files.
> Dec 14 02:49:46 Checking for non-empty .out files...
> Dec 14 02:49:46 No non-empty .out files.
> Dec 14 02:49:46
> Dec 14 02:49:46 [FAIL] 'Flink CLI end-to-end test' failed after 65 minutes and 35 seconds! Test exited with exit code 1
> Dec 14 02:49:46
> 02:49:46 ##[group]Environment Information
> Dec 14 02:49:46 Searching for .dump, .dumpstream and related files in '/home/vsts/work/1/s'
> dmesg: read kernel buffer failed: Operation not permitted
> Dec 14 02:49:48 Stopping taskexecutor daemon (pid: 93858) on host fv-az231-497.
> Dec 14 02:49:49 Stopping standalonesession daemon (pid: 93605) on host fv-az231-497.
> The STDIO streams did not close within 10 seconds of the exit event from process '/usr/bin/bash'. This may indicate a child process inherited the STDIO streams and has not yet exited.
> ##[error]Bash exited with code '1'.
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=28066&view=logs&j=2ffee335-fb12-54a6-1ba9-9610c8a56b81&t=ad628523-4b0b-5f7d-41f5-e8e2e6921343&l=108
[jira] [Closed] (FLINK-29865) Allow configuring the JDK in build-nightly-dist.yml
[ https://issues.apache.org/jira/browse/FLINK-29865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Kruber closed FLINK-29865.
-------------------------------
    Resolution: Fixed

Fixed on {{master}} via [2306e448bfa8ca78e2bf13e34d2838939bce31e1|https://github.com/apache/flink/commit/2306e448bfa8ca78e2bf13e34d2838939bce31e1].

> Allow configuring the JDK in build-nightly-dist.yml
> ---------------------------------------------------
>
>                 Key: FLINK-29865
>                 URL: https://issues.apache.org/jira/browse/FLINK-29865
>             Project: Flink
>          Issue Type: Improvement
>          Components: Build System / Azure Pipelines
>            Reporter: Nico Kruber
>            Assignee: Nico Kruber
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.17.0
>
> {{build-nightly-dist.yml}} currently uses the default JDK from
> https://github.com/flink-ci/flink-ci-docker, which happens to be the Java 1.8 we
> also use for releases. We should
> # not rely on this default being set to 1.8, and
> # be able to configure this in the workflows themselves
[jira] [Closed] (FLINK-29867) Update maven-enforcer-plugin to 3.1.0
[ https://issues.apache.org/jira/browse/FLINK-29867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Kruber closed FLINK-29867.
-------------------------------
    Resolution: Fixed

Fixed in {{master}} via [d0c6075be47e3d277a418d9fdefe513eeeb0c0f2|https://github.com/apache/flink/commit/d0c6075be47e3d277a418d9fdefe513eeeb0c0f2].

> Update maven-enforcer-plugin to 3.1.0
> -------------------------------------
>
>                 Key: FLINK-29867
>                 URL: https://issues.apache.org/jira/browse/FLINK-29867
>             Project: Flink
>          Issue Type: Improvement
>          Components: Build System
>            Reporter: Nico Kruber
>            Assignee: Nico Kruber
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.17.0
>
> We currently rely on 3.0.0-M1 but will have to skip 3.0.0 (final) due to
> MENFORCER-394, which hits Flink's current code base as well.
[jira] [Closed] (FLINK-29868) Dependency convergence error for org.osgi:org.osgi.core:jar
[ https://issues.apache.org/jira/browse/FLINK-29868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Kruber closed FLINK-29868.
-------------------------------
    Resolution: Fixed

Fixed in {{master}} via [614fc2a5fd301789f7daa1282a5b500bf8d67d4b|https://github.com/apache/flink/commit/614fc2a5fd301789f7daa1282a5b500bf8d67d4b].

> Dependency convergence error for org.osgi:org.osgi.core:jar
> -----------------------------------------------------------
>
>                 Key: FLINK-29868
>                 URL: https://issues.apache.org/jira/browse/FLINK-29868
>             Project: Flink
>          Issue Type: Bug
>          Components: Build System, Table SQL / Runtime
>    Affects Versions: 1.17.0
>            Reporter: Nico Kruber
>            Assignee: Chesnay Schepler
>            Priority: Major
>             Fix For: 1.17.0
>
> While working on FLINK-29867, the following new error is popping up while
> running
> {code}
> ./mvnw clean install -pl flink-dist -am -DskipTests -Dflink.convergence.phase=install -Pcheck-convergence
> {code}
> (this is also done by CI which therefore fails)
> {code}
> [WARNING]
> Dependency convergence error for org.osgi:org.osgi.core:jar:4.3.0:runtime paths to dependency are:
> +-org.apache.flink:flink-table-planner-loader-bundle:jar:1.17-SNAPSHOT
>   +-org.apache.flink:flink-table-planner_2.12:jar:1.17-SNAPSHOT:runtime
>     +-org.apache.flink:flink-table-api-java-bridge:jar:1.17-SNAPSHOT:runtime
>       +-org.apache.flink:flink-streaming-java:jar:1.17-SNAPSHOT:runtime
>         +-org.apache.flink:flink-runtime:jar:1.17-SNAPSHOT:runtime
>           +-org.xerial.snappy:snappy-java:jar:1.1.8.3:runtime
>             +-org.osgi:org.osgi.core:jar:4.3.0:runtime
> and
> +-org.apache.flink:flink-table-planner-loader-bundle:jar:1.17-SNAPSHOT
>   +-org.apache.flink:flink-table-planner_2.12:jar:1.17-SNAPSHOT:runtime
>     +-org.apache.flink:flink-scala_2.12:jar:1.17-SNAPSHOT:runtime
>       +-org.apache.flink:flink-core:jar:1.17-SNAPSHOT:runtime
>         +-org.apache.commons:commons-compress:jar:1.21:runtime
>           +-org.osgi:org.osgi.core:jar:6.0.0:runtime
> {code}
[jira] [Created] (FLINK-29884) Flaky test failure in finegrained_resource_management/SortMergeResultPartitionTest.testRelease
Nico Kruber created FLINK-29884:
-----------------------------------

             Summary: Flaky test failure in finegrained_resource_management/SortMergeResultPartitionTest.testRelease
                 Key: FLINK-29884
                 URL: https://issues.apache.org/jira/browse/FLINK-29884
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination, Runtime / Network, Tests
    Affects Versions: 1.17.0
            Reporter: Nico Kruber
             Fix For: 1.17.0


{{SortMergeResultPartitionTest.testRelease}} failed with a timeout in the finegrained_resource_management tests:

{code:java}
Nov 03 17:28:07 [ERROR] Tests run: 20, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 64.649 s <<< FAILURE! - in org.apache.flink.runtime.io.network.partition.SortMergeResultPartitionTest
Nov 03 17:28:07 [ERROR] SortMergeResultPartitionTest.testRelease  Time elapsed: 60.009 s  <<< ERROR!
Nov 03 17:28:07 org.junit.runners.model.TestTimedOutException: test timed out after 60 seconds
Nov 03 17:28:07 	at org.apache.flink.runtime.io.network.partition.SortMergeResultPartitionTest.testRelease(SortMergeResultPartitionTest.java:374)
Nov 03 17:28:07 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Nov 03 17:28:07 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Nov 03 17:28:07 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Nov 03 17:28:07 	at java.lang.reflect.Method.invoke(Method.java:498)
Nov 03 17:28:07 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
Nov 03 17:28:07 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
Nov 03 17:28:07 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
Nov 03 17:28:07 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
Nov 03 17:28:07 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
Nov 03 17:28:07 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
Nov 03 17:28:07 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
Nov 03 17:28:07 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
Nov 03 17:28:07 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
Nov 03 17:28:07 	at java.lang.Thread.run(Thread.java:748)
{code}

[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=42806&view=logs&j=a57e0635-3fad-5b08-57c7-a4142d7d6fa9&t=2ef0effc-1da1-50e5-c2bd-aab434b1c5b7]
[jira] [Updated] (FLINK-29884) Flaky test failure in finegrained_resource_management/SortMergeResultPartitionTest.testRelease
[ https://issues.apache.org/jira/browse/FLINK-29884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Kruber updated FLINK-29884:
--------------------------------
    Labels: test-stability  (was: )

> Flaky test failure in finegrained_resource_management/SortMergeResultPartitionTest.testRelease
> ----------------------------------------------------------------------------------------------
>
>                 Key: FLINK-29884
>                 URL: https://issues.apache.org/jira/browse/FLINK-29884
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination, Runtime / Network, Tests
>    Affects Versions: 1.17.0
>            Reporter: Nico Kruber
>            Priority: Major
>              Labels: test-stability
>             Fix For: 1.17.0
>
> {{SortMergeResultPartitionTest.testRelease}} failed with a timeout in the
> finegrained_resource_management tests:
> {code:java}
> Nov 03 17:28:07 [ERROR] Tests run: 20, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 64.649 s <<< FAILURE! - in org.apache.flink.runtime.io.network.partition.SortMergeResultPartitionTest
> Nov 03 17:28:07 [ERROR] SortMergeResultPartitionTest.testRelease  Time elapsed: 60.009 s  <<< ERROR!
> Nov 03 17:28:07 org.junit.runners.model.TestTimedOutException: test timed out after 60 seconds
> Nov 03 17:28:07 	at org.apache.flink.runtime.io.network.partition.SortMergeResultPartitionTest.testRelease(SortMergeResultPartitionTest.java:374)
> Nov 03 17:28:07 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Nov 03 17:28:07 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Nov 03 17:28:07 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Nov 03 17:28:07 	at java.lang.reflect.Method.invoke(Method.java:498)
> Nov 03 17:28:07 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> Nov 03 17:28:07 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Nov 03 17:28:07 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> Nov 03 17:28:07 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Nov 03 17:28:07 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> Nov 03 17:28:07 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> Nov 03 17:28:07 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> Nov 03 17:28:07 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> Nov 03 17:28:07 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> Nov 03 17:28:07 	at java.lang.Thread.run(Thread.java:748)
> {code}
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=42806&view=logs&j=a57e0635-3fad-5b08-57c7-a4142d7d6fa9&t=2ef0effc-1da1-50e5-c2bd-aab434b1c5b7]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (FLINK-15633) Javadocs cannot be built on Java 11
[ https://issues.apache.org/jira/browse/FLINK-15633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Kruber updated FLINK-15633:
--------------------------------
    Labels: pull-request-available  (was: auto-deprioritized-minor pull-request-available)

> Javadocs cannot be built on Java 11
> -----------------------------------
>
>                 Key: FLINK-15633
>                 URL: https://issues.apache.org/jira/browse/FLINK-15633
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Build System, Release System
>    Affects Versions: 1.10.0
>            Reporter: Chesnay Schepler
>            Assignee: Nico Kruber
>            Priority: Not a Priority
>              Labels: pull-request-available
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The javadoc-plugin fails when run on Java 11. This isn't a big issue since we do
> our releases on JDK 8 anyway, but we should still fix it so users can
> reproduce releases on JDK 11.
> {code}
> java.lang.ExceptionInInitializerError
>     at org.apache.maven.plugin.javadoc.AbstractJavadocMojo.<init>(AbstractJavadocMojo.java:190)
>     at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:488)
>     at com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:86)
>     at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:105)
>     at com.google.inject.internal.ConstructorInjector.access$000(ConstructorInjector.java:32)
>     at com.google.inject.internal.ConstructorInjector$1.call(ConstructorInjector.java:89)
>     at com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:115)
>     at com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:133)
>     at com.google.inject.internal.ProvisionListenerStackCallback.provision(ProvisionListenerStackCallback.java:68)
>     at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:87)
>     at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:267)
>     at com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1016)
>     at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1103)
>     at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1012)
>     at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1051)
>     at org.eclipse.sisu.space.AbstractDeferredClass.get(AbstractDeferredClass.java:48)
>     at com.google.inject.internal.ProviderInternalFactory.provision(ProviderInternalFactory.java:81)
>     at com.google.inject.internal.InternalFactoryToInitializableAdapter.provision(InternalFactoryToInitializableAdapter.java:53)
>     at com.google.inject.internal.ProviderInternalFactory$1.call(ProviderInternalFactory.java:65)
>     at com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:115)
>     at com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:133)
>     at com.google.inject.internal.ProvisionListenerStackCallback.provision(ProvisionListenerStackCallback.java:68)
>     at com.google.inject.internal.ProviderInternalFactory.circularGet(ProviderInternalFactory.java:63)
>     at com.google.inject.internal.InternalFactoryToInitializableAdapter.get(InternalFactoryToInitializableAdapter.java:45)
>     at com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1016)
>     at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
>     at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1012)
>     at org.eclipse.sisu.inject.Guice4$1.get(Guice4.java:162)
>     at org.eclipse.sisu.inject.LazyBeanEntry.getValue(LazyBeanEntry.java:81)
>     at org.eclipse.sisu.plexus.LazyPlexusBean.getValue(LazyPlexusBean.java:51)
>     at org.codehaus.plexus.DefaultPlexusContainer.lookup(DefaultPlexusContainer.java:263)
>     at org.codehaus.plexus.DefaultPlexusContainer.lookup(DefaultPlexusContainer.java:255)
>     at org.apache.maven.plugin.internal.DefaultMavenPluginManager.getConfiguredMojo(DefaultMavenPluginManager.java:519)
>     at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:121)
>     at
[jira] [Created] (FLINK-29868) Dependency convergence error for org.osgi:org.osgi.core:jar
Nico Kruber created FLINK-29868:
-----------------------------------

             Summary: Dependency convergence error for org.osgi:org.osgi.core:jar
                 Key: FLINK-29868
                 URL: https://issues.apache.org/jira/browse/FLINK-29868
             Project: Flink
          Issue Type: Bug
          Components: Build System, Table SQL / Runtime
    Affects Versions: 1.17.0
            Reporter: Nico Kruber
             Fix For: 1.17.0


While working on FLINK-29867, the following new error is popping up while running

{code}
./mvnw clean install -pl flink-dist -am -DskipTests -Dflink.convergence.phase=install -Pcheck-convergence
{code}

(this is also done by CI which therefore fails)

{code}
[WARNING]
Dependency convergence error for org.osgi:org.osgi.core:jar:4.3.0:runtime paths to dependency are:
+-org.apache.flink:flink-table-planner-loader-bundle:jar:1.17-SNAPSHOT
  +-org.apache.flink:flink-table-planner_2.12:jar:1.17-SNAPSHOT:runtime
    +-org.apache.flink:flink-table-api-java-bridge:jar:1.17-SNAPSHOT:runtime
      +-org.apache.flink:flink-streaming-java:jar:1.17-SNAPSHOT:runtime
        +-org.apache.flink:flink-runtime:jar:1.17-SNAPSHOT:runtime
          +-org.xerial.snappy:snappy-java:jar:1.1.8.3:runtime
            +-org.osgi:org.osgi.core:jar:4.3.0:runtime
and
+-org.apache.flink:flink-table-planner-loader-bundle:jar:1.17-SNAPSHOT
  +-org.apache.flink:flink-table-planner_2.12:jar:1.17-SNAPSHOT:runtime
    +-org.apache.flink:flink-scala_2.12:jar:1.17-SNAPSHOT:runtime
      +-org.apache.flink:flink-core:jar:1.17-SNAPSHOT:runtime
        +-org.apache.commons:commons-compress:jar:1.21:runtime
          +-org.osgi:org.osgi.core:jar:6.0.0:runtime
{code}
[jira] [Created] (FLINK-29867) Update maven-enforcer-plugin to 3.1.0
Nico Kruber created FLINK-29867: --- Summary: Update maven-enforcer-plugin to 3.1.0 Key: FLINK-29867 URL: https://issues.apache.org/jira/browse/FLINK-29867 Project: Flink Issue Type: Improvement Components: Build System Reporter: Nico Kruber Assignee: Nico Kruber Fix For: 1.17.0 We currently rely on 3.0.0-M1 but will have to skip 3.0.0 (final) due to MENFORCER-394 which hits Flink's current code base as well -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-29865) Allow configuring the JDK in build-nightly-dist.yml
Nico Kruber created FLINK-29865: --- Summary: Allow configuring the JDK in build-nightly-dist.yml Key: FLINK-29865 URL: https://issues.apache.org/jira/browse/FLINK-29865 Project: Flink Issue Type: Improvement Components: Build System / Azure Pipelines Reporter: Nico Kruber Assignee: Nico Kruber Fix For: 1.17.0 {{build-nightly-dist.yml}} currently uses the default JDK from https://github.com/flink-ci/flink-ci-docker, which happens to be the Java 1.8 that we use for releases. We should
# not rely on this default being set to 1.8, and
# be able to configure this in the workflows themselves
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (FLINK-15633) Javadocs cannot be built on Java 11
[ https://issues.apache.org/jira/browse/FLINK-15633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber reassigned FLINK-15633: --- Assignee: Nico Kruber
> Javadocs cannot be built on Java 11
> ---
>
> Key: FLINK-15633
> URL: https://issues.apache.org/jira/browse/FLINK-15633
> Project: Flink
> Issue Type: Sub-task
> Components: Build System, Release System
> Affects Versions: 1.10.0
> Reporter: Chesnay Schepler
> Assignee: Nico Kruber
> Priority: Not a Priority
> Labels: auto-deprioritized-minor, pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The javadoc-plugin fails when run on Java 11. This isn't a big issue since we do our releases on JDK 8 anyway, but we should still fix it so users can reproduce releases on JDK 11.
> {code}
> java.lang.ExceptionInInitializerError
> at org.apache.maven.plugin.javadoc.AbstractJavadocMojo.(AbstractJavadocMojo.java:190)
> at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:488)
> at com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:86)
> at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:105)
> at com.google.inject.internal.ConstructorInjector.access$000(ConstructorInjector.java:32)
> at com.google.inject.internal.ConstructorInjector$1.call(ConstructorInjector.java:89)
> at com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:115)
> at com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:133)
> at com.google.inject.internal.ProvisionListenerStackCallback.provision(ProvisionListenerStackCallback.java:68)
> at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:87)
> at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:267)
> at com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1016)
> at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1103)
> at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1012)
> at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1051)
> at org.eclipse.sisu.space.AbstractDeferredClass.get(AbstractDeferredClass.java:48)
> at com.google.inject.internal.ProviderInternalFactory.provision(ProviderInternalFactory.java:81)
> at com.google.inject.internal.InternalFactoryToInitializableAdapter.provision(InternalFactoryToInitializableAdapter.java:53)
> at com.google.inject.internal.ProviderInternalFactory$1.call(ProviderInternalFactory.java:65)
> at com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:115)
> at com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:133)
> at com.google.inject.internal.ProvisionListenerStackCallback.provision(ProvisionListenerStackCallback.java:68)
> at com.google.inject.internal.ProviderInternalFactory.circularGet(ProviderInternalFactory.java:63)
> at com.google.inject.internal.InternalFactoryToInitializableAdapter.get(InternalFactoryToInitializableAdapter.java:45)
> at com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1016)
> at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
> at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1012)
> at org.eclipse.sisu.inject.Guice4$1.get(Guice4.java:162)
> at org.eclipse.sisu.inject.LazyBeanEntry.getValue(LazyBeanEntry.java:81)
> at org.eclipse.sisu.plexus.LazyPlexusBean.getValue(LazyPlexusBean.java:51)
> at org.codehaus.plexus.DefaultPlexusContainer.lookup(DefaultPlexusContainer.java:263)
> at org.codehaus.plexus.DefaultPlexusContainer.lookup(DefaultPlexusContainer.java:255)
> at org.apache.maven.plugin.internal.DefaultMavenPluginManager.getConfiguredMojo(DefaultMavenPluginManager.java:519)
> at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:121)
> at org.apache.maven.lifecycle.int
[jira] [Created] (FLINK-29643) Possible NPE in ApplicationDispatcherBootstrap with failedJob submission and no HA
Nico Kruber created FLINK-29643: --- Summary: Possible NPE in ApplicationDispatcherBootstrap with failedJob submission and no HA Key: FLINK-29643 URL: https://issues.apache.org/jira/browse/FLINK-29643 Project: Flink Issue Type: Bug Components: Client / Job Submission Affects Versions: 1.15.2, 1.16.0, 1.17.0 Reporter: Nico Kruber If
- {{PipelineOptionsInternal.PIPELINE_FIXED_JOB_ID}} is not set, and
- high availability is not activated, and
- {{DeploymentOptions.SUBMIT_FAILED_JOB_ON_APPLICATION_ERROR}} is set,
then a failed job submission may throw an NPE since the appropriate code in {{ApplicationDispatcherBootstrap#runApplicationEntryPoint()}} tries to read the {{failedJobId}} from the configuration where it will not be present in these cases. Please refer to the conditions that set the {{jobId}} in {{ApplicationDispatcherBootstrap.fixJobIdAndRunApplicationAsync()}}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
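The guard the ticket asks for can be sketched as follows. This is an illustrative stand-in, not Flink's actual code: the class name, the key constant, and the fallback parameter are hypothetical, and a plain `Map` stands in for Flink's `Configuration`.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: read an optional "fixed job id" from a key-value
// configuration instead of assuming it is always present (the assumption
// that triggers the NPE described in the report).
public class FailedJobIdGuard {
    static final String FAILED_JOB_ID_KEY = "pipeline.fixed-job-id"; // illustrative key

    // Returns the configured id, or a caller-provided fallback when the key
    // was never set (no HA, no fixed job id).
    static String resolveFailedJobId(Map<String, String> conf, String fallbackId) {
        return Optional.ofNullable(conf.get(FAILED_JOB_ID_KEY)).orElse(fallbackId);
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of(); // no HA, no fixed job id configured
        System.out.println(resolveFailedJobId(conf, "fallback-id")); // prints "fallback-id"
    }
}
```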
[jira] [Updated] (FLINK-29208) Logs and stdout endpoints not mentioned in Docs or OpenAPI spec
[ https://issues.apache.org/jira/browse/FLINK-29208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-29208: Summary: Logs and stdout endpoints not mentioned in Docs or OpenAPI spec (was: Logs and stdout endpoints not mentioned on OpenAPI spec) > Logs and stdout endpoints not mentioned in Docs or OpenAPI spec > --- > > Key: FLINK-29208 > URL: https://issues.apache.org/jira/browse/FLINK-29208 > Project: Flink > Issue Type: Bug > Components: Runtime / REST >Affects Versions: 1.16.0, 1.15.2 >Reporter: Nico Kruber >Priority: Major > > Using Flink's web UI and clicking on "Stdout" or "Logs" in a JM or TM > accesses endpoints {{/jobmanager/logs}} and {{/jobmanager/stdout}} (and > similar for TMs) but these don't seem to exist in the [REST API > docs|https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/] > or the [REST API OpenAPI > spec|https://nightlies.apache.org/flink/flink-docs-master/generated/rest_v1_dispatcher.yml]. > Either these should become some webui-internal APIs (for which no concept > exists at the moment), or these endpoints should be added to the docs and > spec. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-29208) Logs and stdout endpoints not mentioned on OpenAPI spec
[ https://issues.apache.org/jira/browse/FLINK-29208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-29208: Affects Version/s: 1.16.0 > Logs and stdout endpoints not mentioned on OpenAPI spec > --- > > Key: FLINK-29208 > URL: https://issues.apache.org/jira/browse/FLINK-29208 > Project: Flink > Issue Type: Bug > Components: Runtime / REST >Affects Versions: 1.16.0, 1.15.2 >Reporter: Nico Kruber >Priority: Major > > Using Flink's web UI and clicking on "Stdout" or "Logs" in a JM or TM > accesses endpoints {{/jobmanager/logs}} and {{/jobmanager/stdout}} (and > similar for TMs) but these don't seem to exist in the [REST API > docs|https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/] > or the [REST API OpenAPI > spec|https://nightlies.apache.org/flink/flink-docs-master/generated/rest_v1_dispatcher.yml]. > Either these should become some webui-internal APIs (for which no concept > exists at the moment), or these endpoints should be added to the docs and > spec. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-29208) Logs and stdout endpoints not mentioned on OpenAPI spec
Nico Kruber created FLINK-29208: --- Summary: Logs and stdout endpoints not mentioned on OpenAPI spec Key: FLINK-29208 URL: https://issues.apache.org/jira/browse/FLINK-29208 Project: Flink Issue Type: Bug Components: Runtime / REST Affects Versions: 1.15.2 Reporter: Nico Kruber Using Flink's web UI and clicking on "Stdout" or "Logs" in a JM or TM accesses endpoints {{/jobmanager/logs}} and {{/jobmanager/stdout}} (and similar for TMs) but these don't seem to exist in the [REST API docs|https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/] or the [REST API OpenAPI spec|https://nightlies.apache.org/flink/flink-docs-master/generated/rest_v1_dispatcher.yml]. Either these should become some webui-internal APIs (for which no concept exists at the moment), or these endpoints should be added to the docs and spec. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-27836) RocksDBMapState iteration may stop too early for var-length prefixes
Nico Kruber created FLINK-27836: --- Summary: RocksDBMapState iteration may stop too early for var-length prefixes Key: FLINK-27836 URL: https://issues.apache.org/jira/browse/FLINK-27836 Project: Flink Issue Type: Bug Components: Runtime / State Backends Affects Versions: 1.14.4, 1.13.6, 1.15.0 Reporter: Nico Kruber A similar, yet orthogonal, issue to https://issues.apache.org/jira/browse/FLINK-11141 is that the iterators used in RocksDBMapState iterate over everything with a matching prefix of flink-key and namespace. With var-length serializers for either of them, however, they may return data for unrelated keys and/or namespaces. It looks like the built-in serializers of Flink are not affected, though, since they use a var-length encoding that is prefixed with the object's length, and thus different lengths will not have the same prefix. More exotic serializers, e.g. relying on a terminating NUL character, may expose the above-mentioned behaviour, though. -- This message was sent by Atlassian Jira (v8.20.7#820007)
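The encoding argument from the ticket can be demonstrated in isolation. The sketch below is not Flink's serializer code; it uses a single length byte where Flink's serializers use a var-length int, but the principle is the same: with a length prefix, two values of different lengths can never be byte prefixes of each other, while a "raw" var-length encoding allows exactly the collision the ticket describes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Why length-prefixed var-length encodings avoid prefix collisions.
public class PrefixCollision {
    // "Raw" encoding: just the UTF-8 bytes, no length, no terminator.
    static byte[] raw(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Length-prefixed encoding (one length byte for brevity).
    static byte[] lengthPrefixed(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[b.length + 1];
        out[0] = (byte) b.length;
        System.arraycopy(b, 0, out, 1, b.length);
        return out;
    }

    static boolean isPrefix(byte[] prefix, byte[] full) {
        return full.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(full, prefix.length), prefix);
    }

    public static void main(String[] args) {
        // Raw: ser("ab") is a byte prefix of ser("abc"), so a prefix scan
        // for key "ab" would also return entries belonging to key "abc".
        System.out.println(isPrefix(raw("ab"), raw("abc")));                       // true
        // Length-prefixed: the first byte already differs (2 vs 3).
        System.out.println(isPrefix(lengthPrefixed("ab"), lengthPrefixed("abc"))); // false
    }
}
```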
[jira] [Closed] (FLINK-27353) Update training exercises to use Flink 1.15
[ https://issues.apache.org/jira/browse/FLINK-27353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-27353. --- Fix Version/s: 1.15.0 (was: 1.15.1) Resolution: Fixed fixed in 05791e55ad7ff0358b5c57ea8f40eada4a1f626a > Update training exercises to use Flink 1.15 > --- > > Key: FLINK-27353 > URL: https://issues.apache.org/jira/browse/FLINK-27353 > Project: Flink > Issue Type: New Feature > Components: Documentation / Training / Exercises >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (FLINK-27353) Update training exercises to use Flink 1.15
Nico Kruber created FLINK-27353: --- Summary: Update training exercises to use Flink 1.15 Key: FLINK-27353 URL: https://issues.apache.org/jira/browse/FLINK-27353 Project: Flink Issue Type: New Feature Components: Documentation / Training / Exercises Reporter: Nico Kruber Assignee: Nico Kruber Fix For: 1.15.0 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Assigned] (FLINK-24144) Improve DataGenerator to prevent excessive creation of new Random objects
[ https://issues.apache.org/jira/browse/FLINK-24144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber reassigned FLINK-24144: --- Assignee: (was: Nico Kruber) > Improve DataGenerator to prevent excessive creation of new Random objects > - > > Key: FLINK-24144 > URL: https://issues.apache.org/jira/browse/FLINK-24144 > Project: Flink > Issue Type: Improvement > Components: Documentation / Training / Exercises >Affects Versions: 1.14.0, 1.13.2 >Reporter: Nico Kruber >Priority: Not a Priority > > For a couple of methods in {{DataGenerator}}, new {{Random}} objects are > created with a specific seed. Instead, we could create a single {{Random}} > object and reset the seed when needed. -- This message was sent by Atlassian Jira (v8.20.7#820007)
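The improvement described above can be sketched as follows; the class is an illustrative stand-in for {{DataGenerator}}, not the training repo's actual code. {{java.util.Random.setSeed(long)}} resets the generator as if it had just been constructed with that seed, so re-seeding one shared instance is deterministic and avoids per-call allocation.

```java
import java.util.Random;

// Reuse a single Random instance and re-seed it instead of allocating
// a new Random per call (illustrative stand-in for DataGenerator).
public class ReusableRandom {
    private final Random rnd = new Random();

    long nextWithSeed(long seed) {
        rnd.setSeed(seed); // resets generator state; no new allocation
        return rnd.nextLong();
    }

    public static void main(String[] args) {
        ReusableRandom r = new ReusableRandom();
        // Same seed -> same value as a freshly constructed Random(seed).
        System.out.println(r.nextWithSeed(42) == new Random(42).nextLong()); // true
    }
}
```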
[jira] [Updated] (FLINK-24144) Improve DataGenerator to prevent excessive creation of new Random objects
[ https://issues.apache.org/jira/browse/FLINK-24144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-24144: Priority: Not a Priority (was: Major) > Improve DataGenerator to prevent excessive creation of new Random objects > - > > Key: FLINK-24144 > URL: https://issues.apache.org/jira/browse/FLINK-24144 > Project: Flink > Issue Type: Improvement > Components: Documentation / Training / Exercises >Affects Versions: 1.14.0, 1.13.2 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Not a Priority > > For a couple of methods in {{DataGenerator}}, new {{Random}} objects are > created with a specific seed. Instead, we could create a single {{Random}} > object and reset the seed when needed. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (FLINK-26382) Add Chinese documents for flink-training exercises
[ https://issues.apache.org/jira/browse/FLINK-26382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber resolved FLINK-26382. - Fix Version/s: 1.15.0 1.14.5 (was: 1.14.3) Assignee: tonny Resolution: Fixed Fixed on
- release-1.14: 0132dd7be8c881607f9a374613309493ade8c6dd, 18e6db2206ca4156e21276b14d35bebaf222c151
- master: aca6c47b79d486eb38969492c7e2dc8cb200d146
> Add Chinese documents for flink-training exercises > -- > > Key: FLINK-26382 > URL: https://issues.apache.org/jira/browse/FLINK-26382 > Project: Flink > Issue Type: New Feature > Components: Documentation / Training / Exercises >Affects Versions: 1.14.3 >Reporter: tonny >Assignee: tonny >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0, 1.14.5 > > > Provide Chinese documents for all `README` and `DISCUSSION` accompanied by > Chinese documents of Flink -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Closed] (FLINK-27322) Add license headers and spotless checks for them
[ https://issues.apache.org/jira/browse/FLINK-27322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-27322. --- Fix Version/s: 1.15.0 Resolution: Fixed Fixed on master via:
- a47f38e7b3e7fe0e1bf372d97852da3b997df461
- bc18cea87ef713fb8e103751c588b0703b08ee22
- 108c8b35dc294c511dc5e57b500d4641d89752ee
> Add license headers and spotless checks for them > > > Key: FLINK-27322 > URL: https://issues.apache.org/jira/browse/FLINK-27322 > Project: Flink > Issue Type: Bug > Components: Documentation / Training / Exercises >Affects Versions: 1.14.4 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > > It looks as if there are a couple of files that are missing their appropriate > license headers, e.g. > https://github.com/apache/flink-training/blob/0b1c83b16065484200564402bef2ca10ef19cb30/common/src/main/java/org/apache/flink/training/exercises/common/datatypes/RideAndFare.java > We should fix that by:
> # adding the missing license headers
> # adding spotless checks to ensure this doesn't happen again
> Potential downside: if a user doing the training exercises creates files on > their own, these would need the license header as well. On the other hand, a > simple `./gradlew spotlessApply` can fix that easily -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (FLINK-25313) Enable flink runtime web
[ https://issues.apache.org/jira/browse/FLINK-25313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber resolved FLINK-25313. - Fix Version/s: 1.15.0 Resolution: Fixed fixed in master via 9c25c54a73578b24b15e9c52e28740460a687f93 > Enable flink runtime web > > > Key: FLINK-25313 > URL: https://issues.apache.org/jira/browse/FLINK-25313 > Project: Flink > Issue Type: Improvement > Components: Documentation / Training / Exercises >Reporter: Junfan Zhang >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (FLINK-27322) Add license headers and spotless checks for them
Nico Kruber created FLINK-27322: --- Summary: Add license headers and spotless checks for them Key: FLINK-27322 URL: https://issues.apache.org/jira/browse/FLINK-27322 Project: Flink Issue Type: Bug Components: Documentation / Training / Exercises Affects Versions: 1.14.4 Reporter: Nico Kruber Assignee: Nico Kruber It looks as if there are a couple of files that are missing their appropriate license headers, e.g. https://github.com/apache/flink-training/blob/0b1c83b16065484200564402bef2ca10ef19cb30/common/src/main/java/org/apache/flink/training/exercises/common/datatypes/RideAndFare.java We should fix that by:
# adding the missing license headers
# adding spotless checks to ensure this doesn't happen again
Potential downside: if a user doing the training exercises creates files on their own, these would need the license header as well. On the other hand, a simple `./gradlew spotlessApply` can fix that easily -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (FLINK-20255) Nested decorrelate failed
[ https://issues.apache.org/jira/browse/FLINK-20255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524261#comment-17524261 ] Nico Kruber commented on FLINK-20255: - The following example (which was simplified from a more complex join that makes more sense than this version) also seems to be an incarnation of the described problem (tested in Flink 1.14.3):
{code}
CREATE TEMPORARY TABLE Messages (
  `id` CHAR(1),
  `userId` TINYINT,
  `relatedUserIds` ARRAY
) WITH (
  'connector' = 'datagen',
  'fields.id.length' = '10',
  'fields.userId.kind' = 'random',
  'fields.userId.min' = '1',
  'fields.userId.max' = '10',
  'fields.relatedUserIds.kind' = 'random',
  'fields.relatedUserIds.element.min' = '1',
  'fields.relatedUserIds.element.max' = '10',
  'rows-per-second' = '1000'
);

-- the non-working version:
SELECT * FROM Messages outer_message
WHERE outer_message.userId IN (
  SELECT relatedUserId
  FROM Messages inner_message
  CROSS JOIN UNNEST(inner_message.relatedUserIds) AS t (relatedUserId)
  WHERE inner_message.id = outer_message.id
)

-- this one is working:
/*
SELECT * FROM Messages
CROSS JOIN UNNEST(relatedUserIds) AS t (relatedUserId)
WHERE userId = t.relatedUserId
*/
{code}
It produces the following exception:
{code}
org.apache.flink.table.api.TableException: unexpected correlate variable $cor1 in the plan
at org.apache.flink.table.planner.plan.optimize.program.FlinkDecorrelateProgram.checkCorrelVariableExists(FlinkDecorrelateProgram.scala:57) ~[flink-table_2.11-1.14.3.jar:1.14.3]
at org.apache.flink.table.planner.plan.optimize.program.FlinkDecorrelateProgram.optimize(FlinkDecorrelateProgram.scala:42) ~[flink-table_2.11-1.14.3.jar:1.14.3]
at org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1$$anonfun$apply$1.apply(FlinkGroupProgram.scala:63) ~[flink-table_2.11-1.14.3.jar:1.14.3]
at org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1$$anonfun$apply$1.apply(FlinkGroupProgram.scala:60) ~[flink-table_2.11-1.14.3.jar:1.14.3]
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
at scala.collection.Iterator$class.foreach(Iterator.scala:891) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
at scala.collection.AbstractIterable.foreach(Iterable.scala:54) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
at org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1.apply(FlinkGroupProgram.scala:60) ~[flink-table_2.11-1.14.3.jar:1.14.3]
at org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1.apply(FlinkGroupProgram.scala:55) ~[flink-table_2.11-1.14.3.jar:1.14.3]
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
at scala.collection.immutable.Range.foreach(Range.scala:160) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
at org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram.optimize(FlinkGroupProgram.scala:55) ~[flink-table_2.11-1.14.3.jar:1.14.3]
at org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram$$anonfun$optimize$1.apply(FlinkChainedProgram.scala:62) ~[flink-table_2.11-1.14.3.jar:1.14.3]
at org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram$$anonfun$optimize$1.apply(FlinkChainedProgram.scala:58) ~[flink-table_2.11-1.14.3.jar:1.14.3]
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
at scala.collection.Iterat
[jira] [Created] (FLINK-26852) RocksDBMapState#clear not forwarding exceptions
Nico Kruber created FLINK-26852: --- Summary: RocksDBMapState#clear not forwarding exceptions Key: FLINK-26852 URL: https://issues.apache.org/jira/browse/FLINK-26852 Project: Flink Issue Type: Bug Components: Runtime / State Backends Affects Versions: 1.14.4, 1.15.0 Reporter: Nico Kruber I accidentally found an inconsistent behaviour in the RocksDB state backend implementation: if there's an exception in {{AbstractRocksDBState#clear()}}, it will forward that inside a {{FlinkRuntimeException}}. However, if there's an exception in {{RocksDBMapState#clear}}, it will merely print the exception stacktrace and continue as is. I assume that forwarding the exception as a {{FlinkRuntimeException}} should be the desired behaviour in both cases... -- This message was sent by Atlassian Jira (v8.20.1#820001)
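The behaviour the ticket asks for can be sketched as follows. This is not Flink's actual {{RocksDBMapState}} code: the interface and class names are illustrative, and a plain {{RuntimeException}} stands in for {{FlinkRuntimeException}}.

```java
// Sketch: wrap and rethrow checked failures from clear() instead of only
// printing the stack trace and continuing (the inconsistency in the report).
public class ClearExample {
    // Illustrative stand-in for the underlying RocksDB delete call.
    interface Backend { void delete(String key) throws Exception; }

    private final Backend backend;
    ClearExample(Backend backend) { this.backend = backend; }

    // Desired behaviour: forward the failure to the caller.
    void clear(String key) {
        try {
            backend.delete(key);
        } catch (Exception e) {
            throw new RuntimeException("Error while removing entry from RocksDB", e);
        }
    }

    public static void main(String[] args) {
        ClearExample ex = new ClearExample(key -> { throw new Exception("delete failed"); });
        try {
            ex.clear("some-key");
        } catch (RuntimeException e) {
            System.out.println("forwarded: " + e.getCause().getMessage()); // forwarded: delete failed
        }
    }
}
```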
[jira] [Commented] (FLINK-5129) make the BlobServer use a distributed file system
[ https://issues.apache.org/jira/browse/FLINK-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17474622#comment-17474622 ] Nico Kruber commented on FLINK-5129: No, blob.storage.directory has to be a local path. I also suggest writing to the user/dev mailing list instead of adding comments to an old (and closed) ticket > make the BlobServer use a distributed file system > - > > Key: FLINK-5129 > URL: https://issues.apache.org/jira/browse/FLINK-5129 > Project: Flink > Issue Type: Improvement > Components: Runtime / Network >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Fix For: 1.3.0 > > Attachments: image-2022-01-11-11-27-59-280.png > > > Currently, the BlobServer uses a local storage and, in addition when the HA > mode is set, a distributed file system, e.g. hdfs. This, however, is only > used by the JobManager and all TaskManager instances request blobs from the > JobManager. By using the distributed file system there as well, we would > lower the load on the JobManager and increase scalability. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (FLINK-25362) Incorrect dependencies in Table Confluent/Avro docs
[ https://issues.apache.org/jira/browse/FLINK-25362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-25362. --- Fix Version/s: 1.15.0 1.13.6 Resolution: Fixed Fixed in * master: e9dba5e81f35..7bbf368bdfbf * release-1.14: d8b6f896c424..7b24c90c125d * release-1.13: 327113d26e80..e3d06b8807fd > Incorrect dependencies in Table Confluent/Avro docs > --- > > Key: FLINK-25362 > URL: https://issues.apache.org/jira/browse/FLINK-25362 > Project: Flink > Issue Type: Bug > Components: Documentation >Affects Versions: 1.12.7, 1.13.5, 1.14.2 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0, 1.13.6, 1.14.3 > > > "Confluent Avro Format" is missing an explanation to also > * add the dependency to flink-avro > * have the confluent repository defined > "Avro Format" should not show the maven dependency to {{flink-sql-avro}} but > instead {{flink-avro}} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-25362) Incorrect dependencies in Table Confluent/Avro docs
Nico Kruber created FLINK-25362: --- Summary: Incorrect dependencies in Table Confluent/Avro docs Key: FLINK-25362 URL: https://issues.apache.org/jira/browse/FLINK-25362 Project: Flink Issue Type: Bug Components: Documentation Affects Versions: 1.14.2, 1.13.5, 1.12.7 Reporter: Nico Kruber Assignee: Nico Kruber Fix For: 1.14.3 "Confluent Avro Format" is missing an explanation to also * add the dependency to flink-avro * have the confluent repository defined "Avro Format" should not show the maven dependency to {{flink-sql-avro}} but instead {{flink-avro}} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-25027) Allow GC of a finished job's JobMaster before the slot timeout is reached
Nico Kruber created FLINK-25027: --- Summary: Allow GC of a finished job's JobMaster before the slot timeout is reached Key: FLINK-25027 URL: https://issues.apache.org/jira/browse/FLINK-25027 Project: Flink Issue Type: Improvement Components: Runtime / Coordination Affects Versions: 1.13.3, 1.12.5, 1.14.0 Reporter: Nico Kruber Attachments: image-2021-11-23-20-32-20-479.png In a session cluster, after a (batch) job is finished, the JobMaster seems to stick around for another couple of minutes before becoming eligible for garbage collection. Looking into a heap dump, it seems to be tied to a {{PhysicalSlotRequestBulkCheckerImpl}} which is enqueued in the underlying Akka executor (and keeps the JM from being GC'd). By default, the action is scheduled with a delay of {{slot.request.timeout}}, which defaults to 5 min (thanks [~trohrmann] for helping out here) !image-2021-11-23-20-32-20-479.png! With this setting, you have to account for enough metaspace to cover 5 minutes of time, which may needlessly span a couple of jobs! The problem seems to be that Flink uses the main thread executor for the scheduling, which uses the {{ActorSystem}}'s scheduler, and the future task scheduled with Akka can (probably) not be easily cancelled. One idea could be to use a dedicated thread pool per JM that we shut down when the JM terminates. That way we would not keep the JM from being GC'd. (The concrete example we investigated was a DataSet job) -- This message was sent by Atlassian Jira (v8.20.1#820001)
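The "dedicated thread pool per JM" idea can be sketched with plain JDK executors; this is an illustration under that assumption, not Flink's actual JobMaster code. Shutting the scheduler down drops the pending far-future task, so its references (and the owning component) become garbage-collectable immediately.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: give the component its own scheduler and shut it down on
// termination, so tasks scheduled minutes in the future (like the
// slot-request bulk check) don't pin the component in memory.
public class OwnScheduler implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    void scheduleCheck(Runnable check, long delayMinutes) {
        scheduler.schedule(check, delayMinutes, TimeUnit.MINUTES);
    }

    boolean isShutdown() { return scheduler.isShutdown(); }

    @Override
    public void close() {
        scheduler.shutdownNow(); // drops pending tasks; references become GC-eligible
    }

    public static void main(String[] args) {
        OwnScheduler s = new OwnScheduler();
        s.scheduleCheck(() -> System.out.println("bulk check"), 5);
        s.close(); // the pending check is discarded immediately
        System.out.println(s.isShutdown()); // true
    }
}
```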
[jira] [Created] (FLINK-25023) ClassLoader leak on JM/TM through indirectly-started Hadoop threads out of user code
Nico Kruber created FLINK-25023: --- Summary: ClassLoader leak on JM/TM through indirectly-started Hadoop threads out of user code Key: FLINK-25023 URL: https://issues.apache.org/jira/browse/FLINK-25023 Project: Flink Issue Type: Bug Components: Connectors / FileSystem, Connectors / Hadoop Compatibility, FileSystems Affects Versions: 1.13.3, 1.12.5, 1.14.0 Reporter: Nico Kruber If a Flink job is using HDFS through Flink's filesystem abstraction (either on the JM or TM), that code may actually spawn a few threads, e.g. from static class members:
* {{org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner}}
* {{IPC Parameter Sending Thread#*}}
These threads are started as soon as the classes are loaded, which may be in the context of the user code. In this specific scenario, the created threads may contain references to the context class loader (I did not see that though) or, as happened here, they may inherit thread contexts such as the {{ProtectionDomain}} (from an {{AccessController}}). Hence user contexts and user class loaders are leaked into long-running threads that run in Flink's (parent) classloader. Fortunately, it seems to only *leak a single* {{ChildFirstClassLoader}} in this concrete example, but that may depend on which code paths each client execution is walking. A *proper solution* doesn't seem so simple:
* We could try to proactively initialize available file systems in the hope of starting all threads in the parent classloader with the parent context.
* We could create a default {{ProtectionDomain}} for spawned threads as discussed at [https://dzone.com/articles/javalangoutofmemory-permgen]; however, the {{StatisticsDataReferenceCleaner}} isn't actually spawned from any callback but from a static variable, and thus with the class loading itself (but maybe this is still possible somehow).
-- This message was sent by Atlassian Jira (v8.20.1#820001)
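The first idea (proactive initialization) could look roughly like this. The helper name is hypothetical, and the class name passed in `main` is the one from the report; the load is guarded since the class may well not be on the classpath.

```java
// Sketch: eagerly load (and initialize) thread-spawning classes through the
// parent/system class loader at process start, so statically started threads
// inherit the parent context instead of a user class loader.
public class EagerInit {
    static void preloadInParent(String className) {
        try {
            // initialize = true runs static initializers, which is where the
            // Hadoop cleaner thread in the report gets started.
            Class.forName(className, /* initialize = */ true,
                    ClassLoader.getSystemClassLoader());
        } catch (ClassNotFoundException e) {
            // Not on the classpath; nothing to preload.
        }
    }

    public static void main(String[] args) {
        preloadInParent("org.apache.hadoop.fs.FileSystem"); // class name from the report
        System.out.println("preload attempted");
    }
}
```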
[jira] [Created] (FLINK-25022) ClassLoader leak with ThreadLocals on the JM when submitting a job through the REST API
Nico Kruber created FLINK-25022: --- Summary: ClassLoader leak with ThreadLocals on the JM when submitting a job through the REST API Key: FLINK-25022 URL: https://issues.apache.org/jira/browse/FLINK-25022 Project: Flink Issue Type: Bug Components: Runtime / REST Affects Versions: 1.13.3, 1.12.5, 1.14.0 Reporter: Nico Kruber If a job is submitted using the REST API's {{/jars/:jarid/run}} endpoint, user code has to be executed on the JobManager and it is doing this in a couple of (pooled) dispatcher threads like {{Flink-DispatcherRestEndpoint-thread-*}}. If the user code is using thread locals (and not cleaning them up), they may remain in the thread with references to the {{ChildFirstClassloader}} of the job and thus leaking that. We saw this for the {{jsoniter}} scala library at the JM which [creates ThreadLocal instances|https://github.com/plokhotnyuk/jsoniter-scala/blob/95c7053cfaa558877911f3448382f10d53c4fcbf/jsoniter-scala-core/jvm/src/main/scala/com/github/plokhotnyuk/jsoniter_scala/core/package.scala] but doesn't remove them, but it can actually happen with any user code or (worse) library used in user code. There are a few *workarounds* a user can use, e.g. putting the library in Flink's lib/ folder or submitting via the Flink CLI, but these may actually not be possible to use, depending on the circumstances. A *proper fix* should happen in Flink by guarding against any of these things in the dispatcher threads. We could, for example, spawn a separate thread for executing the user's {{main()}} method and once the job is submitted exit that thread and destroy all thread locals along with it. -- This message was sent by Atlassian Jira (v8.20.1#820001)
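The proposed fix — running the user's {{main()}} in a short-lived thread — can be sketched with plain JDK threads (this is an illustration, not Flink's dispatcher code). {{ThreadLocal}} values live in the {{Thread}} object itself, so once the throwaway thread terminates and is unreferenced, any leftover thread-locals (and the user class loader they point to) become garbage-collectable.

```java
// Sketch: execute user code in a dedicated, short-lived thread so that
// un-removed ThreadLocals die with the thread instead of lingering in a
// pooled dispatcher thread.
public class IsolatedMain {
    static void runUserMain(Runnable userMain) throws InterruptedException {
        Thread t = new Thread(userMain, "user-main");
        t.start();
        t.join(); // thread terminates here; its ThreadLocal map dies with it
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadLocal<String> leaky = new ThreadLocal<>();
        runUserMain(() -> leaky.set("user-code value")); // set only in the user thread
        // The value never reaches the calling thread's ThreadLocal map.
        System.out.println(leaky.get()); // null
    }
}
```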
[jira] [Commented] (FLINK-24783) Improve monitoring experience and usability of state backend
[ https://issues.apache.org/jira/browse/FLINK-24783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17439496#comment-17439496 ] Nico Kruber commented on FLINK-24783: - FYI: I'm actually planning to upstream a couple of options from my OptionsFactory that I implemented for this training course (see https://github.com/ververica/flink-training/blob/master/troubleshooting/rocksdb/src/solution/java/com/ververica/flink/training/solutions/RocksDBTuningJobOptionsFactory.java ), most notably the JVM logger [~sjwiesman] mentioned and the choice of the compression algorithm, which has proven very effective in our tests. I'll soon create tickets for these... > Improve monitoring experience and usability of state backend > - > > Key: FLINK-24783 > URL: https://issues.apache.org/jira/browse/FLINK-24783 > Project: Flink > Issue Type: Improvement > Components: Runtime / Metrics, Runtime / State Backends >Reporter: Yun Tang >Assignee: Yun Tang >Priority: Major > > This ticket targets improving the monitoring experience and usability > of the HashMap and EmbeddedRocksDB state backends. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-24769) FlameGraphs in 1.14 do not aggregate subtasks' stack traces anymore
[ https://issues.apache.org/jira/browse/FLINK-24769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438706#comment-17438706 ] Nico Kruber commented on FLINK-24769: - Actually, it seems like this is the same with Flink 1.13.3, so maybe it is actually a missing feature (I thought that it was different before)... > FlameGraphs in 1.14 do not aggregate subtasks' stack traces anymore > --- > > Key: FLINK-24769 > URL: https://issues.apache.org/jira/browse/FLINK-24769 > Project: Flink > Issue Type: Bug > Components: Runtime / Web Frontend >Affects Versions: 1.14.0, 1.13.3 >Reporter: Nico Kruber >Priority: Major > Attachments: image-2021-11-04-12-59-24-308.png > > > Since Flink 1.14.0, after enabling FlameGraphs and gathering statistics for a > task, it doesn't aggregate the results from the parallel instances anymore > but instead shows each individual one - something that easily gets too messy > for higher parallelism. > It seems the last shared method on the stack is > {{Task.runWithSystemExitMonitoring}} and then it spawns off into individual > lambda functions: > !image-2021-11-04-12-59-24-308.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-24769) FlameGraphs in 1.14 do not aggregate subtasks' stack traces anymore
[ https://issues.apache.org/jira/browse/FLINK-24769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-24769: Affects Version/s: 1.13.3 > FlameGraphs in 1.14 do not aggregate subtasks' stack traces anymore > --- > > Key: FLINK-24769 > URL: https://issues.apache.org/jira/browse/FLINK-24769 > Project: Flink > Issue Type: Bug > Components: Runtime / Web Frontend >Affects Versions: 1.14.0, 1.13.3 >Reporter: Nico Kruber >Priority: Major > Attachments: image-2021-11-04-12-59-24-308.png > > > Since Flink 1.14.0, after enabling FlameGraphs and gathering statistics for a > task, it doesn't aggregate the results from the parallel instances anymore > but instead shows each individual one - something that easily gets too messy > for higher parallelism. > It seems the last shared method on the stack is > {{Task.runWithSystemExitMonitoring}} and then it spawns off into individual > lambda functions: > !image-2021-11-04-12-59-24-308.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-24769) FlameGraphs in 1.14 do not aggregate subtasks' stack traces anymore
Nico Kruber created FLINK-24769: --- Summary: FlameGraphs in 1.14 do not aggregate subtasks' stack traces anymore Key: FLINK-24769 URL: https://issues.apache.org/jira/browse/FLINK-24769 Project: Flink Issue Type: Bug Components: Runtime / Web Frontend Affects Versions: 1.14.0 Reporter: Nico Kruber Attachments: image-2021-11-04-12-59-24-308.png Since Flink 1.14.0, after enabling FlameGraphs and gathering statistics for a task, it doesn't aggregate the results from the parallel instances anymore but instead shows each individual one - something that easily gets too messy for higher parallelism. It seems the last shared method on the stack is {{Task.runWithSystemExitMonitoring}} and then it spawns off into individual lambda functions: !image-2021-11-04-12-59-24-308.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (FLINK-21336) Activate bloom filter in RocksDB State Backend via Flink configuration
[ https://issues.apache.org/jira/browse/FLINK-21336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber resolved FLINK-21336. - Resolution: Duplicate > Activate bloom filter in RocksDB State Backend via Flink configuration > -- > > Key: FLINK-21336 > URL: https://issues.apache.org/jira/browse/FLINK-21336 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Reporter: Jun Qin >Assignee: Jun Qin >Priority: Major > Labels: auto-unassigned, stale-assigned > > Activating bloom filter in the RocksDB state backend improves read > performance. Currently activating bloom filter can only be done by > implementing a custom ConfigurableRocksDBOptionsFactory. I think we should > provide an option to activate bloom filter via Flink configuration. > See also the discussion in ML: > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Activate-bloom-filter-in-RocksDB-State-Backend-via-Flink-configuration-td48636.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (FLINK-24432) RocksIteratorWrapper.seekToLast() calls the wrong RocksIterator method
[ https://issues.apache.org/jira/browse/FLINK-24432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber reassigned FLINK-24432: --- Assignee: Victor Xu > RocksIteratorWrapper.seekToLast() calls the wrong RocksIterator method > -- > > Key: FLINK-24432 > URL: https://issues.apache.org/jira/browse/FLINK-24432 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Affects Versions: 1.14.0 >Reporter: Victor Xu >Assignee: Victor Xu >Priority: Minor > Labels: pull-request-available > > The RocksIteratorWrapper is a wrapper of RocksIterator to do additional > status check for all the methods. However, there's a typo that > RocksIteratorWrapper.*seekToLast*() method calls RocksIterator's > *seekToFirst*(), which is obviously wrong. I guess this issue wasn't found > before as it was only referenced in the > RocksTransformingIteratorWrapper.seekToLast() method and nowhere else. > {code:java} > @Override > public void seekToFirst() { > iterator.seekToFirst(); > status(); > } > @Override > public void seekToLast() { > iterator.seekToFirst(); > status(); > }{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
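The snippet quoted above shows the bug; a toy reproduction (deliberately not Flink's real classes — all names below are invented for illustration) makes the fixed delegation concrete:

```java
// Toy reproduction of the delegation typo fixed by this ticket: the wrapper
// must forward seekToLast() to the delegate's seekToLast(), not seekToFirst().
public class SeekDelegationDemo {
    static class ToyIterator {
        int position = -1;
        void seekToFirst() { position = 0; }
        void seekToLast()  { position = 99; }
    }

    static class ToyIteratorWrapper {
        private final ToyIterator iterator;
        ToyIteratorWrapper(ToyIterator it) { this.iterator = it; }

        void seekToFirst() { iterator.seekToFirst(); }

        // fixed version: matching delegation (the bug called seekToFirst() here)
        void seekToLast()  { iterator.seekToLast(); }

        int position()     { return iterator.position; }
    }

    public static void main(String[] args) {
        ToyIteratorWrapper w = new ToyIteratorWrapper(new ToyIterator());
        w.seekToLast();
        System.out.println(w.position()); // 99 with the fix; 0 with the bug
    }
}
```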
[jira] [Closed] (FLINK-23653) improve training exercises and tests so they are better examples
[ https://issues.apache.org/jira/browse/FLINK-23653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-23653. --- Fix Version/s: 1.13.3 Resolution: Fixed Fixed on master via bbb051ac458e1555f76f396f9424337d27cf4fd6 > improve training exercises and tests so they are better examples > > > Key: FLINK-23653 > URL: https://issues.apache.org/jira/browse/FLINK-23653 > Project: Flink > Issue Type: Improvement > Components: Documentation / Training / Exercises >Affects Versions: 1.13.2 >Reporter: David Anderson >Assignee: David Anderson >Priority: Major > Labels: pull-request-available > Fix For: 1.13.3 > > > The tests for the training exercises are implemented in a way that permits > the same tests to be used for both the exercises and the solutions, and for > both the Java and Scala implementations. The way that this was done is a bit > awkward. > It would be better to > * eliminate the ExerciseBase class and its mechanisms for setting the > source(s) and sink and parallelism > * have tests that run with parallelism > 1 > * speed up the tests by using MiniClusterWithClientResource > It's also the case that the watermarking is done by calling emitWatermark in > the sources. This is confusing; the watermarking should be visibly > implemented in the exercises and solutions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-24411) Add docs navigation for release notes
Nico Kruber created FLINK-24411: --- Summary: Add docs navigation for release notes Key: FLINK-24411 URL: https://issues.apache.org/jira/browse/FLINK-24411 Project: Flink Issue Type: Improvement Components: Documentation Affects Versions: 1.13.2, 1.12.5, 1.14.0, 1.11.4 Reporter: Nico Kruber I propose to add a "Release Notes" section into the documentation's navigation bar for things like https://nightlies.apache.org/flink/flink-docs-release-1.14/release-notes/flink-1.14/ At the moment, I feel a bit lost in the navigation when viewing that page (which is only linked from the docs home page which I barely ever look at). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-24118) enable TaxiFareGenerator to produce a bounded stream
[ https://issues.apache.org/jira/browse/FLINK-24118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-24118. --- Fix Version/s: 1.13.3 1.14.0 Resolution: Fixed Fixed on master via bfcd25a9d52e71f018720ea4865090b5bab5b135 > enable TaxiFareGenerator to produce a bounded stream > > > Key: FLINK-24118 > URL: https://issues.apache.org/jira/browse/FLINK-24118 > Project: Flink > Issue Type: Improvement > Components: Documentation / Training / Exercises >Reporter: David Anderson >Assignee: David Anderson >Priority: Major > Labels: pull-request-available > Fix For: 1.14.0, 1.13.3 > > > I would like to use the TaxiFareGenerator in tests for the training > exercises. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-23886) An exception is thrown out when recover job timers from checkpoint file
[ https://issues.apache.org/jira/browse/FLINK-23886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-23886: Affects Version/s: 1.10.0 1.11.3 > An exception is thrown out when recover job timers from checkpoint file > --- > > Key: FLINK-23886 > URL: https://issues.apache.org/jira/browse/FLINK-23886 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Affects Versions: 1.10.0, 1.11.3 >Reporter: JING ZHANG >Priority: Major > Attachments: image-2021-08-25-16-38-04-023.png, > image-2021-08-25-16-38-12-308.png, image-2021-08-25-17-06-29-806.png, > image-2021-08-25-17-07-38-327.png > > > A user reported the bug on the [mailing > list|http://mail-archives.apache.org/mod_mbox/flink-user/202108.mbox/%3ccakmsf43j14nkjmgjuy4dh5qn2vbjtw4tfh4pmmuyvcvfhgf...@mail.gmail.com%3E]. I > paste the content here. > Setup Specifics: > Version: 1.6.2 > RocksDB Map State > Timers stored in rocksdb > > When we have this job running for long periods of time like > 30 days, if > for some reason the job restarts, we encounter "Error while deserializing the > element". Is this a known issue fixed in later versions? I see some changes > to code for FLINK-10175, but we don't use any queryable state > > Below is the stack trace > > org.apache.flink.util.FlinkRuntimeException: Error while deserializing the > element. 
> at > org.apache.flink.contrib.streaming.state.RocksDBCachingPriorityQueueSet.deserializeElement(RocksDBCachingPriorityQueueSet.java:389) > at > org.apache.flink.contrib.streaming.state.RocksDBCachingPriorityQueueSet.peek(RocksDBCachingPriorityQueueSet.java:146) > at > org.apache.flink.contrib.streaming.state.RocksDBCachingPriorityQueueSet.peek(RocksDBCachingPriorityQueueSet.java:56) > at > org.apache.flink.runtime.state.heap.KeyGroupPartitionedPriorityQueue$InternalPriorityQueueComparator.comparePriority(KeyGroupPartitionedPriorityQueue.java:274) > at > org.apache.flink.runtime.state.heap.KeyGroupPartitionedPriorityQueue$InternalPriorityQueueComparator.comparePriority(KeyGroupPartitionedPriorityQueue.java:261) > at > org.apache.flink.runtime.state.heap.HeapPriorityQueue.isElementPriorityLessThen(HeapPriorityQueue.java:164) > at > org.apache.flink.runtime.state.heap.HeapPriorityQueue.siftUp(HeapPriorityQueue.java:121) > at > org.apache.flink.runtime.state.heap.HeapPriorityQueue.addInternal(HeapPriorityQueue.java:85) > at > org.apache.flink.runtime.state.heap.AbstractHeapPriorityQueue.add(AbstractHeapPriorityQueue.java:73) > at > org.apache.flink.runtime.state.heap.KeyGroupPartitionedPriorityQueue.(KeyGroupPartitionedPriorityQueue.java:89) > at > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBPriorityQueueSetFactory.create(RocksDBKeyedStateBackend.java:2792) > at > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.create(RocksDBKeyedStateBackend.java:450) > at > org.apache.flink.streaming.api.operators.InternalTimeServiceManager.createTimerPriorityQueue(InternalTimeServiceManager.java:121) > at > org.apache.flink.streaming.api.operators.InternalTimeServiceManager.registerOrGetTimerService(InternalTimeServiceManager.java:106) > at > org.apache.flink.streaming.api.operators.InternalTimeServiceManager.getInternalTimerService(InternalTimeServiceManager.java:87) > at > 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.getInternalTimerService(AbstractStreamOperator.java:764) > at > org.apache.flink.streaming.api.operators.KeyedProcessOperator.open(KeyedProcessOperator.java:61) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:424) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:290) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.EOFException > at java.io.DataInputStream.readUnsignedByte(DataInputStream.java:290) > at org.apache.flink.types.StringValue.readString(StringValue.java:769) > at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:69) > at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:28) > at > org.apache.flink.api.java.typeutils.runtime.RowSerializer.deserialize(RowSerializer.java:179) > at > org.apache.flink.api.java.typeutils.runtime.RowSerializer.deserialize(RowSerializer.java:46) > at > org.apache.flink.streaming.api.operators.TimerSerializer.deserialize(TimerSerializer.java:168) > at > org.apache.flink.streaming.api.operators.TimerSerializer.deserialize(TimerSerializer.java:45) > at > org.apache.flink.contrib.streaming.state.RocksDBCachingPriorityQueueSet.deserializeElement(
[jira] [Created] (FLINK-24144) Improve DataGenerator to prevent excessive creation of new Random objects
Nico Kruber created FLINK-24144: --- Summary: Improve DataGenerator to prevent excessive creation of new Random objects Key: FLINK-24144 URL: https://issues.apache.org/jira/browse/FLINK-24144 Project: Flink Issue Type: Improvement Components: Documentation / Training / Exercises Affects Versions: 1.13.2, 1.14.0 Reporter: Nico Kruber Assignee: Nico Kruber For a couple of methods in {{DataGenerator}}, new {{Random}} objects are created with a specific seed. Instead, we could create a single {{Random}} object and reset the seed when needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
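The proposed improvement can be sketched as below. The class and method names are made up for illustration, not the actual training-exercise API; the point is that {{Random#setSeed}} produces the same sequence as constructing {{new Random(seed)}}, so one long-lived instance suffices:

```java
import java.util.Random;

// Sketch of the proposal: keep one Random and reseed it per call instead of
// allocating a fresh 'new Random(seed)' each time a value is generated.
public class SeededGenerator {
    private final Random rnd = new Random(); // single long-lived instance

    /** Deterministic value per id: reseeding replaces object allocation. */
    public int valueFor(long id) {
        rnd.setSeed(id);
        return rnd.nextInt(1000);
    }
}
```

Note this sketch is not thread-safe — fine for a per-subtask generator, but worth keeping in mind if the generator were ever shared.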
[jira] [Closed] (FLINK-23926) change TaxiRide data model to have a single timestamp
[ https://issues.apache.org/jira/browse/FLINK-23926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-23926. --- Fix Version/s: 1.13.3 1.14.0 Resolution: Fixed Fixed on master via fd65b110714cbf487435b5dfdf0374550bd0c820 > change TaxiRide data model to have a single timestamp > - > > Key: FLINK-23926 > URL: https://issues.apache.org/jira/browse/FLINK-23926 > Project: Flink > Issue Type: Improvement > Components: Documentation / Training / Exercises >Reporter: David Anderson >Assignee: David Anderson >Priority: Major > Labels: pull-request-available > Fix For: 1.14.0, 1.13.3 > > > The current TaxiRide events have two timestamps – the startTime and endTime. > Which timestamp applies to a given event depends on the value of the isStart > field. This is awkward, and unnecessary. It would be better to have a single > eventTime field. This will make the exercises better examples, and allow for > more straightforward conversion from DataStream to Table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
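The simplified data model described above could look roughly like this — field names are illustrative, not the final flink-training API:

```java
import java.time.Instant;

// Sketch of the change: a single eventTime field whose meaning no longer has
// to be decoded from the isStart flag (start vs. end time of the ride).
public class TaxiRideEvent {
    public final long rideId;
    public final boolean isStart;
    public final Instant eventTime; // always the event's timestamp, regardless of isStart

    public TaxiRideEvent(long rideId, boolean isStart, Instant eventTime) {
        this.rideId = rideId;
        this.isStart = isStart;
        this.eventTime = eventTime;
    }
}
```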
[jira] [Created] (FLINK-24115) Outdated SQL Temporal Join Example
Nico Kruber created FLINK-24115: --- Summary: Outdated SQL Temporal Join Example Key: FLINK-24115 URL: https://issues.apache.org/jira/browse/FLINK-24115 Project: Flink Issue Type: Bug Components: Documentation, Table SQL / API Reporter: Nico Kruber [https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/sql/queries/joins/#event-time-temporal-join] is missing a primary key in the Table DDL. Also, the following note does not match the current example anymore: {quote} Note: The event-time temporal join requires the primary key contained in the equivalence condition of the temporal join condition, e.g., The primary key P.product_id of table product_changelog to be constrained in the condition O.product_id = P.product_id. {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-23670) Add Scala formatting checks to training exercises
[ https://issues.apache.org/jira/browse/FLINK-23670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-23670. --- Fix Version/s: 1.14.0 Resolution: Fixed Fixed on master via: - bd4a088fd3aa23b13eea5e75307b26a15afb3608 - a22bc8f088b6f6794f931a1f6e7bf961729ad66b - 177cadf30fb0f73341f89b3db1e2fc13dcedc2dd > Add Scala formatting checks to training exercises > - > > Key: FLINK-23670 > URL: https://issues.apache.org/jira/browse/FLINK-23670 > Project: Flink > Issue Type: Bug > Components: Documentation / Training / Exercises >Affects Versions: 1.13.2 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.14.0, 1.13.3 > > > Currently, there are no formatting checks for Scala code in the training > exercises. We should employ the same checks that the main Flink project is > using. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-23895) Upsert materializer is not inserted for all sink providers
[ https://issues.apache.org/jira/browse/FLINK-23895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-23895: Description: The new {{SinkUpsertMaterializer}} is not inserted for {{TransformationSinkProvider}} or {{DataStreamSinkProvider}} which means that neither {{toChangelogStream}} nor the current {{KafkaDynamicSink}} work correctly. (was: The new {{SinkUpsertMaterializer}} is not inserted for {{TransformationSinkProvider}} or {{DataStreamSinkProvider}} which means that neither {{toChangelogStream}} not the current {{KafkaDynamicSink}} work correctly.) > Upsert materializer is not inserted for all sink providers > -- > > Key: FLINK-23895 > URL: https://issues.apache.org/jira/browse/FLINK-23895 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Reporter: Timo Walther >Assignee: Timo Walther >Priority: Blocker > Labels: pull-request-available > Fix For: 1.14.0, 1.15.0 > > > The new {{SinkUpsertMaterializer}} is not inserted for > {{TransformationSinkProvider}} or {{DataStreamSinkProvider}} which means that > neither {{toChangelogStream}} nor the current {{KafkaDynamicSink}} work > correctly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-24022) Scala checks not running in flink-training CI
[ https://issues.apache.org/jira/browse/FLINK-24022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-24022. --- Resolution: Fixed Fixed on master via 65cde7910e5bff6283eef7ea536fde3ea5d4c9d4 > Scala checks not running in flink-training CI > - > > Key: FLINK-24022 > URL: https://issues.apache.org/jira/browse/FLINK-24022 > Project: Flink > Issue Type: Bug > Components: Documentation / Training / Exercises >Affects Versions: 1.14.0, 1.13.3 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.14.0, 1.13.3 > > > FLINK-23339 disabled Scala by default but therefore also disabled CI for > newly checked-in changes on the Scala code. > We should run CI with Scala enabled -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-24022) Scala checks not running in flink-training CI
[ https://issues.apache.org/jira/browse/FLINK-24022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-24022: Summary: Scala checks not running in flink-training CI (was: Re-Enable Scala checks in flink-training CI) > Scala checks not running in flink-training CI > - > > Key: FLINK-24022 > URL: https://issues.apache.org/jira/browse/FLINK-24022 > Project: Flink > Issue Type: Bug > Components: Documentation / Training / Exercises >Affects Versions: 1.14.0, 1.13.3 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.14.0, 1.13.3 > > > FLINK-23339 disabled Scala by default but therefore also disabled CI for > newly checked-in changes on the Scala code. > We should run CI with Scala enabled -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-24022) Re-Enable Scala checks in flink-training CI
Nico Kruber created FLINK-24022: --- Summary: Re-Enable Scala checks in flink-training CI Key: FLINK-24022 URL: https://issues.apache.org/jira/browse/FLINK-24022 Project: Flink Issue Type: Bug Components: Documentation / Training / Exercises Affects Versions: 1.14.0, 1.13.3 Reporter: Nico Kruber Assignee: Nico Kruber Fix For: 1.14.0, 1.13.3 FLINK-23339 disabled Scala by default but therefore also disabled CI for newly checked-in changes on the Scala code. We should run CI with Scala enabled -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-24016) Restore 1.12 SQL docs page on state retention/TTL
[ https://issues.apache.org/jira/browse/FLINK-24016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-24016: Summary: Restore 1.12 SQL docs page on state retention/TTL (was: Restore 1.12 SQL docs page on state retention) > Restore 1.12 SQL docs page on state retention/TTL > - > > Key: FLINK-24016 > URL: https://issues.apache.org/jira/browse/FLINK-24016 > Project: Flink > Issue Type: Improvement > Components: Documentation, Table SQL / API >Affects Versions: 1.14.0, 1.13.2 >Reporter: Nico Kruber >Priority: Major > > {color:#1d1c1d}It seems that the whole {color}[section about state retention > from the > docs|https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/streaming/query_configuration.html#idle-state-retention-time]{color:#1d1c1d} > in Flink 1.12 vanished with Flink 1.13. It was outdated with these min/max > settings but instead of updating it, it was just removed and state > retention/TTL is now safely hidden in > [https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/config/#table-exec-state-ttl]{color} > {color:#1d1c1d}The discussion in the 1.12 docs is, however, superior since it > explains a bit more why we need it and the types of queries that need it. I > therefore propose to restore it somewhere in the docs. > {color} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-24016) Restore 1.12 SQL docs page on state retention
Nico Kruber created FLINK-24016: --- Summary: Restore 1.12 SQL docs page on state retention Key: FLINK-24016 URL: https://issues.apache.org/jira/browse/FLINK-24016 Project: Flink Issue Type: Improvement Components: Documentation, Table SQL / API Affects Versions: 1.13.2, 1.14.0 Reporter: Nico Kruber {color:#1d1c1d}It seems that the whole {color}[section about state retention from the docs|https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/streaming/query_configuration.html#idle-state-retention-time]{color:#1d1c1d} in Flink 1.12 vanished with Flink 1.13. It was outdated with these min/max settings but instead of updating it, it was just removed and state retention/TTL is now safely hidden in [https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/config/#table-exec-state-ttl]{color} {color:#1d1c1d}The discussion in the 1.12 docs is, however, superior since it explains a bit more why we need it and the types of queries that need it. I therefore propose to restore it somewhere in the docs. {color} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-24008) Support state cleanup based on unique keys
[ https://issues.apache.org/jira/browse/FLINK-24008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-24008: Description: In a join of two tables where we join on unique columns, e.g. from primary keys, we could clean up join state more pro-actively since we know whether future joins with this row are still possible (assuming uniqueness of that key). While this may not solve all issues of growing state in non-time-based joins it may still considerably reduce state size, depending on the involved columns. This would add one more way of expiring state that the operator stores; currently there are only these * time-based joins, e.g. interval join * idle state retention via {{TableConfig#setIdleStateRetention()}} was: In a join of two tables where we join on unique columns, e.g. from primary keys, we could clean up join state more pro-actively since we know whether future joins with this row are still possible (assuming uniqueness of that key). While this may not solve all issues of growing state in non-time-based joins it may still considerably reduce state size, depending on the involved columns. This would add one more way of expiring state that the operator stores; currently there are only these * time-based joins, e.g. interval join * idle state retention via \{{TableConfig#setIdleStateRetention()}} > Support state cleanup based on unique keys > -- > > Key: FLINK-24008 > URL: https://issues.apache.org/jira/browse/FLINK-24008 > Project: Flink > Issue Type: New Feature > Components: Table SQL / Runtime >Affects Versions: 1.14.0 >Reporter: Nico Kruber >Priority: Major > > In a join of two tables where we join on unique columns, e.g. from primary > keys, we could clean up join state more pro-actively since we know whether > future joins with this row are still possible (assuming uniqueness of that > key). 
While this may not solve all issues of growing state in non-time-based > joins it may still considerably reduce state size, depending on the involved > columns. > This would add one more way of expiring state that the operator stores; > currently there are only these > * time-based joins, e.g. interval join > * idle state retention via {{TableConfig#setIdleStateRetention()}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-24008) Support state cleanup based on unique keys
Nico Kruber created FLINK-24008: --- Summary: Support state cleanup based on unique keys Key: FLINK-24008 URL: https://issues.apache.org/jira/browse/FLINK-24008 Project: Flink Issue Type: New Feature Components: Table SQL / Runtime Affects Versions: 1.14.0 Reporter: Nico Kruber In a join of two tables where we join on unique columns, e.g. from primary keys, we could clean up join state more pro-actively since we know whether future joins with this row are still possible (assuming uniqueness of that key). While this may not solve all issues of growing state in non-time-based joins it may still considerably reduce state size, depending on the involved columns. This would add one more way of expiring state that the operator stores; currently there are only these * time-based joins, e.g. interval join * idle state retention via {{TableConfig#setIdleStateRetention()}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23993) Describe eventually-consistency of materializing upserts
Nico Kruber created FLINK-23993: --- Summary: Describe eventually-consistency of materializing upserts Key: FLINK-23993 URL: https://issues.apache.org/jira/browse/FLINK-23993 Project: Flink Issue Type: Sub-task Components: Documentation, Table SQL / Ecosystem Affects Versions: 1.14.0 Reporter: Nico Kruber FLINK-20374 added an upsert materialization operator which fixes the order of shuffled streams. The results of this operator are actually _eventually consistent_ (it collects the latest value it has seen and retracts older versions when these are not valid anymore). You could see a result stream like this, based on the order in which the materialization receives events: +I10, -I10, +I5, -I5, +I10, -I10, +I3, -I3, +I10 Each time, the value stored in Kafka would change until the "final" result is in. It may be acceptable for upsert sinks, but should be documented (or changed/fixed) nonetheless. -- This message was sent by Atlassian Jira (v8.3.4#803005)
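To make the described behaviour concrete, here is a small standalone replay — not Flink code, just an assumed-faithful model of how a sink observes that changelog: each +I/-I flips the materialized value, and only the last event yields the final result:

```java
import java.util.ArrayList;
import java.util.List;

// Replays a +I (insert) / -I (retract) changelog and records the value an
// upsert sink would see after each operation, illustrating that every
// intermediate value is visible until the "final" +I arrives.
public class UpsertReplay {
    public static List<String> observedValues(List<String> changelog) {
        List<String> observed = new ArrayList<>();
        String current = null;
        for (String op : changelog) {
            // "+Ix" materializes x; "-Ix" retracts the current value
            current = op.startsWith("+I") ? op.substring(2) : null;
            observed.add(current == null ? "<retracted>" : current);
        }
        return observed;
    }
}
```

For the stream from the ticket, the sink transiently observes 10, a retraction, 5, a retraction, and so on — eventually converging on 10.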
[jira] [Commented] (FLINK-22826) flink sql1.13.1 causes data loss based on change log stream data join
[ https://issues.apache.org/jira/browse/FLINK-22826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405135#comment-17405135 ] Nico Kruber commented on FLINK-22826: - While I see what [~lzljs3620320] mentioned as very important for upsert stream processing, I don't see a reference to filtering in the text above; only about the order of +u and -u; but maybe I'm wrong or I'm missing some details here. If I am wrong, then the mentioned issue of filtering UPDATE_BEFORE and UPDATE_AFTER differently is still a very serious issue which we should either fix or prevent users from running into (by not allowing such queries?). I suppose that would be a priority higher than "minor"... > flink sql1.13.1 causes data loss based on change log stream data join > - > > Key: FLINK-22826 > URL: https://issues.apache.org/jira/browse/FLINK-22826 > Project: Flink > Issue Type: Bug > Components: Table SQL / API >Affects Versions: 1.12.0, 1.13.1 >Reporter: 徐州州 >Priority: Minor > Labels: auto-deprioritized-major, stale-blocker > > {code:java} > insert into dwd_order_detail > select >ord.Id, >ord.Code, >Status > concat(cast(ord.Id as String),if(oed.Id is null,'oed_null',cast(oed.Id > as STRING)),DATE_FORMAT(LOCALTIMESTAMP,'-MM-dd')) as uuids, > TO_DATE(DATE_FORMAT(LOCALTIMESTAMP,'-MM-dd')) as As_Of_Date > from > orders ord > left join order_extend oed on ord.Id=oed.OrderId and oed.IsDeleted=0 and > oed.CreateTime>CAST(DATE_FORMAT(LOCALTIMESTAMP,'-MM-dd') AS TIMESTAMP) > where ( ord.OrderTime>CAST(DATE_FORMAT(LOCALTIMESTAMP,'-MM-dd') AS > TIMESTAMP) > or ord.ReviewTime>CAST(DATE_FORMAT(LOCALTIMESTAMP,'-MM-dd') AS TIMESTAMP) > or ord.RejectTime>CAST(DATE_FORMAT(LOCALTIMESTAMP,'-MM-dd') AS TIMESTAMP) > ) and ord.IsDeleted=0; > {code} > My upsert-kafka table for PRIMARY KEY for uuids. 
> This is the logic of my Kafka-based canal-json stream data join, written to > upsert-kafka tables. I confirm that version 1.12 also has this problem; I just > upgraded from 1.12 to 1.13. > I looked up a user's order data with order number XJ0120210531004794 in the > canal-json original table as +U, which is normal. > {code:java} > | +U | XJ0120210531004794 | 50 | > | +U | XJ0120210531004672 | 50 | > {code} > But when written to upsert-kafka via the join, the data consumed from upsert-kafka is: > {code:java} > | +I | XJ0120210531004794 | 50 | > | -U | XJ0120210531004794 | 50 | > {code} > This order has two records; the rows in the orders and order_extend tables have not > changed since creation, so the -U status caused my data loss — the record was not computed and > the final result was wrong. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-21566) Improve error message for "Unsupported casting"
[ https://issues.apache.org/jira/browse/FLINK-21566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-21566: Labels: stale-major usability (was: stale-major) > Improve error message for "Unsupported casting" > --- > > Key: FLINK-21566 > URL: https://issues.apache.org/jira/browse/FLINK-21566 > Project: Flink > Issue Type: New Feature > Components: Table SQL / API >Affects Versions: 1.12.1 >Reporter: Nico Kruber >Priority: Minor > Labels: stale-major, usability > > In a situation like from FLINK-21565, neither the error message {{Unsupported > casting from TINYINT to INTERVAL SECOND(3)}}, nor the exception trace (see > FLINK-21565) gives you a good hint on where the error is, especially if you > have many statements with TINYINTs or operations on these. > The query parser could highlight the location of the error inside the SQL > statement that the user provided. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-21563) Support using computed columns when defining new computed columns
[ https://issues.apache.org/jira/browse/FLINK-21563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-21563: Labels: auto-deprioritized-major usability (was: auto-deprioritized-major) > Support using computed columns when defining new computed columns > - > > Key: FLINK-21563 > URL: https://issues.apache.org/jira/browse/FLINK-21563 > Project: Flink > Issue Type: New Feature > Components: Table SQL / API >Affects Versions: 1.11.3 >Reporter: Nico Kruber >Priority: Minor > Labels: auto-deprioritized-major, usability > Attachments: flights-21563.csv > > > To avoid code duplications, it would be nice to be able to use computed > columns in later computations of new computed columns, e.g. > {code} > CREATE TABLE `flights` ( > `_YEAR` CHAR(4), > `_MONTH` CHAR(2), > `_DAY` CHAR(2), > `_SCHEDULED_DEPARTURE` CHAR(4), > `SCHEDULED_DEPARTURE` AS TO_TIMESTAMP(`_YEAR` || '-' || `_MONTH` || '-' || > `_DAY` || ' ' || SUBSTRING(`_SCHEDULED_DEPARTURE` FROM 0 FOR 2) || ':' || > SUBSTRING(`_SCHEDULED_DEPARTURE` FROM 3) || ':00'), > `_DEPARTURE_TIME` CHAR(4), > `DEPARTURE_DELAY` SMALLINT, > `DEPARTURE_TIME` AS TIMESTAMPADD(MINUTE, CAST(`DEPARTURE_DELAY` AS INT), > SCHEDULED_DEPARTURE) > )... > {code} > Otherwise, a user would have to repeat these calculations over and over again > which is not that maintainable. > Currently, for a minimal working example with the attached input file, it > would look like this, e.g. 
in the SQL CLI: > {code} > CREATE TABLE `flights` ( > `_YEAR` CHAR(4), > `_MONTH` CHAR(2), > `_DAY` CHAR(2), > `_SCHEDULED_DEPARTURE` CHAR(4), > `SCHEDULED_DEPARTURE` AS TO_TIMESTAMP(`_YEAR` || '-' || `_MONTH` || '-' || > `_DAY` || ' ' || SUBSTRING(`_SCHEDULED_DEPARTURE` FROM 0 FOR 2) || ':' || > SUBSTRING(`_SCHEDULED_DEPARTURE` FROM 3) || ':00'), > `_DEPARTURE_TIME` CHAR(4), > `DEPARTURE_DELAY` SMALLINT, > `DEPARTURE_TIME` AS TIMESTAMPADD(MINUTE, CAST(`DEPARTURE_DELAY` AS INT), > TO_TIMESTAMP(`_YEAR` || '-' || `_MONTH` || '-' || `_DAY` || ' ' || > SUBSTRING(`_SCHEDULED_DEPARTURE` FROM 0 FOR 2) || ':' || > SUBSTRING(`_SCHEDULED_DEPARTURE` FROM 3) || ':00')) > ) WITH ( > 'connector' = 'filesystem', > 'path' = 'file:///tmp/kaggle-flight-delay/flights-21563.csv', > 'format' = 'csv' > ); > SELECT * FROM flights; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-21562) Add more informative message on CSV parsing errors
[ https://issues.apache.org/jira/browse/FLINK-21562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-21562: Labels: auto-deprioritized-major usability (was: auto-deprioritized-major) > Add more informative message on CSV parsing errors > -- > > Key: FLINK-21562 > URL: https://issues.apache.org/jira/browse/FLINK-21562 > Project: Flink > Issue Type: Improvement > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table > SQL / Ecosystem >Affects Versions: 1.11.3 >Reporter: Nico Kruber >Priority: Minor > Labels: auto-deprioritized-major, usability > > I was parsing a CSV file with comments in it and used {{'csv.allow-comments' > = 'true'}} without also passing {{'csv.ignore-parse-errors' = 'true'}} to the > table DDL to not hide any actual format errors. > Since I didn't just have strings in my table, this did of course stumble on > the commented-out line with the following error: > {code} > 2021-02-16 17:45:53,055 WARN org.apache.flink.runtime.taskmanager.Task > [] - Source: TableSourceScan(table=[[default_catalog, > default_database, airports]], fields=[IATA_CODE, AIRPORT, CITY, STATE, > COUNTRY, LATITUDE, LONGITUDE]) -> SinkConversionToTuple2 -> Sink: SQL Client > Stream Collect Sink (1/1)#0 (9f3a3965f18ed99ee42580bdb559ba66) switched from > RUNNING to FAILED. > java.io.IOException: Failed to deserialize CSV row. 
> at > org.apache.flink.formats.csv.CsvFileSystemFormatFactory$CsvInputFormat.nextRecord(CsvFileSystemFormatFactory.java:257) > ~[flink-csv-1.12.1.jar:1.12.1] > at > org.apache.flink.formats.csv.CsvFileSystemFormatFactory$CsvInputFormat.nextRecord(CsvFileSystemFormatFactory.java:162) > ~[flink-csv-1.12.1.jar:1.12.1] > at > org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:90) > ~[flink-dist_2.12-1.12.1.jar:1.12.1] > at > org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110) > ~[flink-dist_2.12-1.12.1.jar:1.12.1] > at > org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:66) > ~[flink-dist_2.12-1.12.1.jar:1.12.1] > at > org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:241) > ~[flink-dist_2.12-1.12.1.jar:1.12.1] > Caused by: java.lang.NumberFormatException: empty String > at > sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1842) > ~[?:1.8.0_275] > at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110) > ~[?:1.8.0_275] > at java.lang.Double.parseDouble(Double.java:538) ~[?:1.8.0_275] > at > org.apache.flink.formats.csv.CsvToRowDataConverters.convertToDouble(CsvToRowDataConverters.java:203) > ~[flink-csv-1.12.1.jar:1.12.1] > at > org.apache.flink.formats.csv.CsvToRowDataConverters.lambda$createNullableConverter$ac6e531e$1(CsvToRowDataConverters.java:113) > ~[flink-csv-1.12.1.jar:1.12.1] > at > org.apache.flink.formats.csv.CsvToRowDataConverters.lambda$createRowConverter$18bb1dd$1(CsvToRowDataConverters.java:98) > ~[flink-csv-1.12.1.jar:1.12.1] > at > org.apache.flink.formats.csv.CsvFileSystemFormatFactory$CsvInputFormat.nextRecord(CsvFileSystemFormatFactory.java:251) > ~[flink-csv-1.12.1.jar:1.12.1] > ... 
5 more > {code} > Two things should be improved here: > # commented-out lines should be ignored by default (potentially, FLINK-17133 > addresses this or at least gives the user the power to do so) > # the error message itself is not very informative: "empty String". > This ticket is about the latter. I would suggest having at least a few more > pointers to direct the user toward finding the source in the CSV file/item/... - > here, the data type could just be wrong, or the CSV file itself may be > wrong/corrupted, and the user would need to investigate. > What exactly may help here probably depends on the actual input connector > this format is currently working with, e.g. the line number in a CSV file would > be best; where that is not possible, we could show the whole line > or at least a few surrounding fields... -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-20427) Remove CheckpointConfig.setPreferCheckpointForRecovery because it can lead to data loss
[ https://issues.apache.org/jira/browse/FLINK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17402158#comment-17402158 ] Nico Kruber commented on FLINK-20427: - [~trohrmann] it may be cleaner to remove this along with the new checkpoint/savepoint semantics, but I also don't have any user in mind who relies on this flag. > Remove CheckpointConfig.setPreferCheckpointForRecovery because it can lead to > data loss > --- > > Key: FLINK-20427 > URL: https://issues.apache.org/jira/browse/FLINK-20427 > Project: Flink > Issue Type: Bug > Components: API / DataStream, Runtime / Checkpointing >Affects Versions: 1.12.0 >Reporter: Till Rohrmann >Priority: Critical > Labels: auto-deprioritized-critical > Fix For: 1.14.0 > > > The {{CheckpointConfig.setPreferCheckpointForRecovery}} allows configuring > whether Flink prefers checkpoints for recovery if the > {{CompletedCheckpointStore}} contains savepoints and checkpoints. This is > problematic because due to this feature, Flink might prefer older checkpoints > over newer savepoints for recovery. Since some components expect that > always the latest checkpoint/savepoint is used (e.g. the > {{SourceCoordinator}}), it breaks assumptions and can lead to > {{SourceSplits}} which are not read. This effectively means that the system > loses data. Similarly, this behaviour can cause exactly-once sinks to > output results multiple times, which violates the processing guarantees. > Hence, I believe that we should remove this setting because it changes > Flink's behaviour in some very significant way potentially w/o the user > noticing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-22826) flink sql1.13.1 causes data loss based on change log stream data join
[ https://issues.apache.org/jira/browse/FLINK-22826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17402117#comment-17402117 ] Nico Kruber commented on FLINK-22826: - [~lzljs3620320] Why did you re-open this ticket? Isn't it what FLINK-20374 is about? > flink sql1.13.1 causes data loss based on change log stream data join > - > > Key: FLINK-22826 > URL: https://issues.apache.org/jira/browse/FLINK-22826 > Project: Flink > Issue Type: Bug > Components: Table SQL / API >Affects Versions: 1.12.0, 1.13.1 >Reporter: 徐州州 >Priority: Minor > Labels: auto-deprioritized-major, stale-blocker > > {code:java} > insert into dwd_order_detail > select >ord.Id, >ord.Code, >Status > concat(cast(ord.Id as String),if(oed.Id is null,'oed_null',cast(oed.Id > as STRING)),DATE_FORMAT(LOCALTIMESTAMP,'yyyy-MM-dd')) as uuids, > TO_DATE(DATE_FORMAT(LOCALTIMESTAMP,'yyyy-MM-dd')) as As_Of_Date > from > orders ord > left join order_extend oed on ord.Id=oed.OrderId and oed.IsDeleted=0 and > oed.CreateTime>CAST(DATE_FORMAT(LOCALTIMESTAMP,'yyyy-MM-dd') AS TIMESTAMP) > where ( ord.OrderTime>CAST(DATE_FORMAT(LOCALTIMESTAMP,'yyyy-MM-dd') AS > TIMESTAMP) > or ord.ReviewTime>CAST(DATE_FORMAT(LOCALTIMESTAMP,'yyyy-MM-dd') AS TIMESTAMP) > or ord.RejectTime>CAST(DATE_FORMAT(LOCALTIMESTAMP,'yyyy-MM-dd') AS TIMESTAMP) > ) and ord.IsDeleted=0; > {code} > My upsert-kafka table uses uuids as its PRIMARY KEY. > This is the logic of my Kafka-based canal-json stream data join that writes to > upsert-kafka tables. I confirm that version 1.12 also has this problem; I just > upgraded from 1.12 to 1.13. > I looked up a user's order data, order number XJ0120210531004794, in the > canal-json original table as +U, which is normal. 
> {code:java} > | +U | XJ0120210531004794 | 50 | > | +U | XJ0120210531004672 | 50 | > {code} > But when written to upsert-kafka via the join, the data consumed from upsert-kafka is: > {code:java} > | +I | XJ0120210531004794 | 50 | > | -U | XJ0120210531004794 | 50 | > {code} > These two records in the orders and order_extend tables have not > changed since creation; the -U status caused my data loss: the row was not computed and the > final result was wrong. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-23054) Correct upsert optimization by upsert keys
[ https://issues.apache.org/jira/browse/FLINK-23054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-23054: Parent: FLINK-20374 Issue Type: Sub-task (was: Bug) > Correct upsert optimization by upsert keys > -- > > Key: FLINK-23054 > URL: https://issues.apache.org/jira/browse/FLINK-23054 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Runtime >Reporter: Jingsong Lee >Assignee: Jingsong Lee >Priority: Major > Labels: pull-request-available > Fix For: 1.14.0 > > > After FLINK-22901. > We can use upsert keys to fix upsert join, upsert rank, and upsert sink. > * For join and rank: if input has no upsert keys, do not use upsert > optimization. > * For upsert sink: if input has unique keys but no upsert keys, we need add > a materialize operator to produce upsert records. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-22901) Introduce getChangeLogUpsertKeys in FlinkRelMetadataQuery
[ https://issues.apache.org/jira/browse/FLINK-22901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-22901: Parent: FLINK-20374 Issue Type: Sub-task (was: Bug) > Introduce getChangeLogUpsertKeys in FlinkRelMetadataQuery > - > > Key: FLINK-22901 > URL: https://issues.apache.org/jira/browse/FLINK-22901 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Planner >Reporter: Jingsong Lee >Assignee: Jingsong Lee >Priority: Major > Labels: pull-request-available > Fix For: 1.14.0 > > > For fix FLINK-20374, we need to resolve streaming computation disorder. we > need to introduce a change log upsert keys, this is not unique keys. > > {code:java} > /** > * Determines the set of change log upsert minimal keys for this expression. > A key is > * represented as an {@link org.apache.calcite.util.ImmutableBitSet}, where > each bit position > * represents a 0-based output column ordinal. > * > * Different from the unique keys: In distributed streaming computing, one > record may be > * divided into RowKind.UPDATE_BEFORE and RowKind.UPDATE_AFTER. If a key > changing join is > * connected downstream, the two records will be divided into different > tasks, resulting in > * disorder. In this case, the downstream cannot rely on the order of the > original key. So in > * this case, it has unique keys in the traditional sense, but it doesn't > have change log upsert > * keys. > * > * @return set of keys, or null if this information cannot be determined > (whereas empty set > * indicates definitely no keys at all) > */ > public Set getChangeLogUpsertKeys(RelNode rel); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-22899) ValuesUpsertSinkFunction needs to use global upsert
[ https://issues.apache.org/jira/browse/FLINK-22899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-22899: Parent: FLINK-20374 Issue Type: Sub-task (was: Bug) > ValuesUpsertSinkFunction needs to use global upsert > --- > > Key: FLINK-22899 > URL: https://issues.apache.org/jira/browse/FLINK-22899 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Runtime >Reporter: Jingsong Lee >Assignee: Jingsong Lee >Priority: Major > Labels: pull-request-available > Fix For: 1.14.0 > > > At present, each task does its own upsert. We need to simulate the external > connector and use the global map to do the upsert. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23839) Unclear severity of Kafka transaction recommit warning in logs
Nico Kruber created FLINK-23839: --- Summary: Unclear severity of Kafka transaction recommit warning in logs Key: FLINK-23839 URL: https://issues.apache.org/jira/browse/FLINK-23839 Project: Flink Issue Type: Sub-task Components: Connectors / Kafka Affects Versions: 1.13.2, 1.12.5, 1.11.4 Reporter: Nico Kruber In a transactional Kafka sink, after recovery, all transactions from the recovered checkpoint are recommitted even though they may have already been committed before because this is not part of the checkpoint. This second commit can lead to a number of WARN entries in the logs coming from [KafkaCommitter|https://github.com/apache/flink/blob/6c9818323b41a84137c52822d2993df788dbc9bb/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/sink/KafkaCommitter.java#L66] or [FlinkKafkaProducer|https://github.com/apache/flink/blob/6c9818323b41a84137c52822d2993df788dbc9bb/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducer.java#L1057]. Examples: {code} WARN [org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer] ... Encountered error org.apache.kafka.common.errors.InvalidTxnStateException: The producer attempted a transactional operation in an invalid state. while recovering transaction KafkaTransactionState [transactionalId=..., producerId=12345, epoch=123]. Presumably this transaction has been already committed before. WARN [org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer] ... Encountered error org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker. while recovering transaction KafkaTransactionState [transactionalId=..., producerId=12345, epoch=12345]. 
Presumably this transaction has been already committed before {code} It sounds to me like the second exception is useful and indicates that the transaction timeout is too short. The first exception, however, seems superfluous and alerts the user more than it helps. Or what would you do with it? Can we filter out such superfluous exceptions, or at least log them at DEBUG level instead? -- This message was sent by Atlassian Jira (v8.3.4#803005)
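A possible user-side mitigation until this is addressed in Flink itself (a sketch only; it assumes the Log4j 2 properties-based logging setup shipped with the Flink distribution, and note that it also hides genuine warnings from these classes):
{code}
# Hypothetical log4j.properties fragment: only pass ERRORs from the
# producer/committer classes that emit the recommit warnings above.
logger.kafkaproducer.name = org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer
logger.kafkaproducer.level = ERROR
logger.kafkacommitter.name = org.apache.flink.connector.kafka.sink.KafkaCommitter
logger.kafkacommitter.level = ERROR
{code}
This is a blunt instrument, which is why filtering the superfluous exceptions inside Flink, as suggested above, would be preferable.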
[jira] [Commented] (FLINK-21336) Activate bloom filter in RocksDB State Backend via Flink configuration
[ https://issues.apache.org/jira/browse/FLINK-21336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400022#comment-17400022 ] Nico Kruber commented on FLINK-21336: - Looks like the document linked in FLINK-7588 is a good source for some arguments to add to the Flink docs should we allow this config setting > Activate bloom filter in RocksDB State Backend via Flink configuration > -- > > Key: FLINK-21336 > URL: https://issues.apache.org/jira/browse/FLINK-21336 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Reporter: Jun Qin >Assignee: Jun Qin >Priority: Major > Labels: auto-unassigned, stale-assigned > > Activating bloom filter in the RocksDB state backend improves read > performance. Currently activating bloom filter can only be done by > implementing a custom ConfigurableRocksDBOptionsFactory. I think we should > provide an option to activate bloom filter via Flink configuration. > See also the discussion in ML: > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Activate-bloom-filter-in-RocksDB-State-Backend-via-Flink-configuration-td48636.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
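If such a configuration option were added, usage could look like the following flink-conf.yaml sketch (the option names and defaults are illustrative only and not confirmed by this ticket; they mirror RocksDB's bloom-filter parameters):
{code}
# Illustrative only: enable a bloom filter for the RocksDB state backend
# with 10 bits per key, using full (non-block-based) filters.
state.backend.rocksdb.use-bloom-filter: true
state.backend.rocksdb.bloom-filter.bits-per-key: 10
state.backend.rocksdb.bloom-filter.block-based-mode: false
{code}
Until then, as the ticket notes, the same effect requires a custom ConfigurableRocksDBOptionsFactory that sets a BloomFilter policy on the column family's table format config.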
[jira] [Updated] (FLINK-23812) Support configuration of the RocksDB info logging via configuration
[ https://issues.apache.org/jira/browse/FLINK-23812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-23812: Summary: Support configuration of the RocksDB info logging via configuration (was: Support configuration of the RocksDB logging via configuration) > Support configuration of the RocksDB info logging via configuration > --- > > Key: FLINK-23812 > URL: https://issues.apache.org/jira/browse/FLINK-23812 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.13.2 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.14.0 > > > Since FLINK-14482 has been merged now, we should also allow users to > configure more than just the log level (FLINK-20911) but also the following > parameters so that they can safely enable RocksDB logging again (once > disabled by default in FLINK-15068) by using a rolling logger, for example: > - max log file size via {{state.backend.rocksdb.log.max-file-size}} > - logging files to keep via {{state.backend.rocksdb.log.file-num}} > - log directory {{state.backend.rocksdb.log.dir}}, e.g. to put these logs > onto a (separate) volume that may not be local and is retained after > container shutdown for debugging purposes -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-23812) Support configuration of the RocksDB logging via configuration
[ https://issues.apache.org/jira/browse/FLINK-23812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-23812. --- Fix Version/s: 1.14.0 Release Note: With RocksDB bumped to 6.20.3 (FLINK-14482), you can now also configure a rolling info logging strategy by configuring it accordingly via newly added state.backend.rocksdb.log.* settings. This can be helpful for debugging RocksDB (performance) issues in containerized environments where the local data dir is volatile but the logs should be retained on a separate volume mount. Resolution: Fixed Merged on master via a86f8a5f59fbc49fb62feeda26c159a716631752 > Support configuration of the RocksDB logging via configuration > -- > > Key: FLINK-23812 > URL: https://issues.apache.org/jira/browse/FLINK-23812 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.13.2 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.14.0 > > > Since FLINK-14482 has been merged now, we should also allow users to > configure more than just the log level (FLINK-20911) but also the following > parameters so that they can safely enable RocksDB logging again (once > disabled by default in FLINK-15068) by using a rolling logger, for example: > - max log file size via {{state.backend.rocksdb.log.max-file-size}} > - logging files to keep via {{state.backend.rocksdb.log.file-num}} > - log directory {{state.backend.rocksdb.log.dir}}, e.g. to put these logs > onto a (separate) volume that may not be local and is retained after > container shutdown for debugging purposes -- This message was sent by Atlassian Jira (v8.3.4#803005)
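Putting the release note together with FLINK-20911, a rolling RocksDB info log could then be configured roughly as follows (the file-size and file-num keys come straight from this ticket's description, the level key from FLINK-20911; the values shown are examples only):
{code}
# Keep up to 4 RocksDB info log files of at most 25MB each
# on a separate volume that survives container shutdown:
state.backend.rocksdb.log.level: INFO_LEVEL
state.backend.rocksdb.log.dir: /var/log/flink/rocksdb
state.backend.rocksdb.log.max-file-size: 25MB
state.backend.rocksdb.log.file-num: 4
{code}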
[jira] [Closed] (FLINK-12141) Allow @TypeInfo annotation on POJO field declarations
[ https://issues.apache.org/jira/browse/FLINK-12141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-12141. --- Fix Version/s: 1.14.0 Release Note: @TypeInfo annotations can now also be used on POJO fields which, for example, can help to define custom serializers for third-party classes that can otherwise not be annotated themselves. Resolution: Fixed Fixed on master via d326b0574a373bd5eef63a44261f8762709265f8 > Allow @TypeInfo annotation on POJO field declarations > - > > Key: FLINK-12141 > URL: https://issues.apache.org/jira/browse/FLINK-12141 > Project: Flink > Issue Type: New Feature > Components: API / Type Serialization System >Reporter: Gyula Fora >Assignee: Nico Kruber >Priority: Minor > Labels: auto-unassigned, pull-request-available, starter, > usability > Fix For: 1.14.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The TypeInfo annotation is a great way to declare serializers for custom > types; however, I feel that its usage is limited by the fact that it can only > be used on types that are declared in the project. > By allowing the annotation to be used on field declarations we could improve > the TypeExtractor logic to use the type factory when creating the > PojoTypeInformation. > This would be a big improvement as in many cases classes from other libraries > or collection types are used within custom Pojo classes and Flink would > default to Kryo serialization which would hurt performance and cause problems > later. > The current workaround in these cases is to implement a custom serializer for > the entire pojo which is a waste of effort when only a few fields might > require custom serialization logic. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (FLINK-12141) Allow @TypeInfo annotation on POJO field declarations
[ https://issues.apache.org/jira/browse/FLINK-12141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber reassigned FLINK-12141: --- Assignee: Nico Kruber > Allow @TypeInfo annotation on POJO field declarations > - > > Key: FLINK-12141 > URL: https://issues.apache.org/jira/browse/FLINK-12141 > Project: Flink > Issue Type: New Feature > Components: API / Type Serialization System >Reporter: Gyula Fora >Assignee: Nico Kruber >Priority: Minor > Labels: auto-unassigned, pull-request-available, starter, > usability > Time Spent: 20m > Remaining Estimate: 0h > > The TypeInfo annotation is a great way to declare serializers for custom > types; however, I feel that its usage is limited by the fact that it can only > be used on types that are declared in the project. > By allowing the annotation to be used on field declarations we could improve > the TypeExtractor logic to use the type factory when creating the > PojoTypeInformation. > This would be a big improvement as in many cases classes from other libraries > or collection types are used within custom Pojo classes and Flink would > default to Kryo serialization which would hurt performance and cause problems > later. > The current workaround in these cases is to implement a custom serializer for > the entire pojo which is a waste of effort when only a few fields might > require custom serialization logic. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-23812) Support configuration of the RocksDB logging via configuration
[ https://issues.apache.org/jira/browse/FLINK-23812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-23812: Description: Since FLINK-14482 has been merged now, we should also allow users to configure more than just the log level (FLINK-20911) but also the following parameters so that they can safely enable RocksDB logging again (once disabled by default in FLINK-15068) by using a rolling logger, for example: - max log file size via {{state.backend.rocksdb.log.max-file-size}} - logging files to keep via {{state.backend.rocksdb.log.file-num}} - log directory {{state.backend.rocksdb.log.dir}}, e.g. to put these logs onto a (separate) volume that may not be local and is retained after container shutdown for debugging purposes was: Since FLINK-14482 has been merged now, we should also allow users to configure more than just the log level (FLINK-20911) but also the following parameters so that they can safely enable RocksDB logging again by using a rolling logger, for example: - max log file size via {{state.backend.rocksdb.log.max-file-size}} - logging files to keep via {{state.backend.rocksdb.log.file-num}} - log directory {{state.backend.rocksdb.log.dir}}, e.g. 
to put these logs onto a (separate) volume that may not be local and is retained after container shutdown for debugging purposes > Support configuration of the RocksDB logging via configuration > -- > > Key: FLINK-23812 > URL: https://issues.apache.org/jira/browse/FLINK-23812 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.13.2 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > > Since FLINK-14482 has been merged now, we should also allow users to > configure more than just the log level (FLINK-20911) but also the following > parameters so that they can safely enable RocksDB logging again (once > disabled by default in FLINK-15068) by using a rolling logger, for example: > - max log file size via {{state.backend.rocksdb.log.max-file-size}} > - logging files to keep via {{state.backend.rocksdb.log.file-num}} > - log directory {{state.backend.rocksdb.log.dir}}, e.g. to put these logs > onto a (separate) volume that may not be local and is retained after > container shutdown for debugging purposes -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23812) Support configuration of the RocksDB logging via configuration
Nico Kruber created FLINK-23812: --- Summary: Support configuration of the RocksDB logging via configuration Key: FLINK-23812 URL: https://issues.apache.org/jira/browse/FLINK-23812 Project: Flink Issue Type: Improvement Components: Runtime / State Backends Affects Versions: 1.13.2 Reporter: Nico Kruber Assignee: Nico Kruber Since FLINK-14482 has been merged now, we should also allow users to configure more than just the log level (FLINK-20911) but also the following parameters so that they can safely enable RocksDB logging again by using a rolling logger, for example: - max log file size via {{state.backend.rocksdb.log.max-file-size}} - logging files to keep via {{state.backend.rocksdb.log.file-num}} - log directory {{state.backend.rocksdb.log.dir}}, e.g. to put these logs onto a (separate) volume that may not be local and is retained after container shutdown for debugging purposes -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-15747) Enable setting RocksDB log level from configuration
[ https://issues.apache.org/jira/browse/FLINK-15747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-15747. --- Fix Version/s: (was: 1.14.0) Resolution: Duplicate > Enable setting RocksDB log level from configuration > --- > > Key: FLINK-15747 > URL: https://issues.apache.org/jira/browse/FLINK-15747 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Reporter: Yu Li >Priority: Minor > Labels: auto-deprioritized-major, auto-unassigned > > Currently to open the RocksDB local log, one has to create a customized > {{OptionsFactory}}, which is not quite convenient. This JIRA proposes to > enable setting it from configuration in flink-conf.yaml. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20911) Support configuration of RocksDB log level
[ https://issues.apache.org/jira/browse/FLINK-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-20911: Fix Version/s: 1.13.0 > Support configuration of RocksDB log level > -- > > Key: FLINK-20911 > URL: https://issues.apache.org/jira/browse/FLINK-20911 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.12.1, 1.13.0 >Reporter: fanrui >Assignee: fanrui >Priority: Major > Labels: pull-request-available > Fix For: 1.13.0 > > > The LOG level of RocksDB is hard-coded and is always HEADER_LEVEL. Sometimes > RocksDB becomes unstable due to unreasonable parameter settings, and the > problem can be reproduced after restart. At this time, you may need to check > some LOGs to troubleshoot. > It may be better if the LOG level of RocksDB can be adjusted. Of course, the > default LOG level is still HEADER_LEVEL. When dealing with problems, it can > be adjusted to DEBUG_LEVEL. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (FLINK-23667) Fix training exercises IDE setup description for Scala
[ https://issues.apache.org/jira/browse/FLINK-23667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber reassigned FLINK-23667: --- Assignee: Nico Kruber (was: Nico Kruber) > Fix training exercises IDE setup description for Scala > -- > > Key: FLINK-23667 > URL: https://issues.apache.org/jira/browse/FLINK-23667 > Project: Flink > Issue Type: Bug > Components: Documentation / Training / Exercises >Affects Versions: 1.13.2 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Minor > Labels: pull-request-available > Fix For: 1.13.3 > > > If you follow the training exercises instructions to set up your IDE with > code formatting and the Save Actions plugin while having Scala enabled, it > will completely reformat your Scala code files instead of keeping them as is. > The instructions should be updated to match the ones used for the Flink main > project. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (FLINK-23669) Avoid using Scala >= 2.12.8 in Flink Training exercises
[ https://issues.apache.org/jira/browse/FLINK-23669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber reassigned FLINK-23669: --- Assignee: Nico Kruber (was: Nico Kruber) > Avoid using Scala >= 2.12.8 in Flink Training exercises > --- > > Key: FLINK-23669 > URL: https://issues.apache.org/jira/browse/FLINK-23669 > Project: Flink > Issue Type: Bug > Components: Documentation / Training / Exercises >Affects Versions: 1.13.2 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.13.3 > > > The current IDE setup instructions of the Flink training exercises do not > mention a specific Scala SDK to set up. For compatibility reasons described > in > https://ci.apache.org/projects/flink/flink-docs-stable/docs/dev/datastream/project-configuration/#scala-versions, > we should also not use 2.12.8 and up. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-23669) Avoid using Scala >= 2.12.8 in Flink Training exercises
[ https://issues.apache.org/jira/browse/FLINK-23669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-23669. --- Resolution: Fixed Merged on master via 1704fd959122aa197b48803441338c1fb0470091 > Avoid using Scala >= 2.12.8 in Flink Training exercises > --- > > Key: FLINK-23669 > URL: https://issues.apache.org/jira/browse/FLINK-23669 > Project: Flink > Issue Type: Bug > Components: Documentation / Training / Exercises >Affects Versions: 1.13.2 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.13.3 > > > The current IDE setup instructions of the Flink training exercises do not > mention a specific Scala SDK to set up. For compatibility reasons described > in > https://ci.apache.org/projects/flink/flink-docs-stable/docs/dev/datastream/project-configuration/#scala-versions, > we should also not use 2.12.8 and up. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-23667) Fix training exercises IDE setup description for Scala
[ https://issues.apache.org/jira/browse/FLINK-23667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-23667. --- Resolution: Fixed Merged on master via 08f5bf4d93d241f58d67148dc3e054a2325b64a0 > Fix training exercises IDE setup description for Scala > -- > > Key: FLINK-23667 > URL: https://issues.apache.org/jira/browse/FLINK-23667 > Project: Flink > Issue Type: Bug > Components: Documentation / Training / Exercises >Affects Versions: 1.13.2 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Minor > Labels: pull-request-available > Fix For: 1.13.3 > > > If you follow the training exercises instructions to set up your IDE with > code formatting and the Save Actions plugin while having Scala enabled, it > will completely reformat your Scala code files instead of keeping them as is. > The instructions should be updated to match the ones used for the Flink main > project. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23670) Add Scala formatting checks to training exercises
Nico Kruber created FLINK-23670: --- Summary: Add Scala formatting checks to training exercises Key: FLINK-23670 URL: https://issues.apache.org/jira/browse/FLINK-23670 Project: Flink Issue Type: Bug Components: Documentation / Training / Exercises Affects Versions: 1.13.2 Reporter: Nico Kruber Assignee: Nico Kruber Fix For: 1.13.3 Currently, there are no formatting checks for Scala code in the training exercises. We should employ the same checks that the main Flink project is using. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23669) Avoid using Scala >= 2.12.8 in Flink Training exercises
Nico Kruber created FLINK-23669: --- Summary: Avoid using Scala >= 2.12.8 in Flink Training exercises Key: FLINK-23669 URL: https://issues.apache.org/jira/browse/FLINK-23669 Project: Flink Issue Type: Bug Components: Documentation / Training / Exercises Affects Versions: 1.13.2 Reporter: Nico Kruber Assignee: Nico Kruber Fix For: 1.13.3 The current IDE setup instructions of the Flink training exercises do not mention a specific Scala SDK to set up. For compatibility reasons described in https://ci.apache.org/projects/flink/flink-docs-stable/docs/dev/datastream/project-configuration/#scala-versions, we should also not use 2.12.8 and up. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-23667) Fix training exercises IDE setup description for Scala
[ https://issues.apache.org/jira/browse/FLINK-23667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17394725#comment-17394725 ] Nico Kruber commented on FLINK-23667: - The same actually also applies to the new {{CONTRIBUTING.md}} file > Fix training exercises IDE setup description for Scala > -- > > Key: FLINK-23667 > URL: https://issues.apache.org/jira/browse/FLINK-23667 > Project: Flink > Issue Type: Bug > Components: Documentation / Training / Exercises >Affects Versions: 1.13.2 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Minor > Fix For: 1.13.3 > > > If you follow the training exercises instructions to set up your IDE with > code formatting and the Save Actions plugin while having Scala enabled, it > will completely reformat your Scala code files instead of keeping them as is. > The instructions should be updated to match the ones used for the Flink main > project. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23667) Fix training exercises IDE setup description for Scala
Nico Kruber created FLINK-23667: --- Summary: Fix training exercises IDE setup description for Scala Key: FLINK-23667 URL: https://issues.apache.org/jira/browse/FLINK-23667 Project: Flink Issue Type: Bug Components: Documentation / Training / Exercises Affects Versions: 1.13.2 Reporter: Nico Kruber Assignee: Nico Kruber Fix For: 1.13.3 If you follow the training exercises instructions to set up your IDE with code formatting and the Save Actions plugin while having Scala enabled, it will completely reformat your Scala code files instead of keeping them as is. The instructions should be updated to match the ones used for the Flink main project. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-23495) [GCP PubSub] Make checkpoint optional for preview/staging mode
[ https://issues.apache.org/jira/browse/FLINK-23495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17394161#comment-17394161 ] Nico Kruber commented on FLINK-23495: - [~airblader] or [~arvid] do you think there would be any downsides to letting this connector also work without checkpoints (without having to explicitly allow it) and, in this case, acknowledging every message in the source when sending it to the rest of the pipeline? -> I don't see a reason not to allow it - without checkpoints you are already working without fault tolerance, so I guess acknowledging messages (and keeping the rest best-effort) is the best (and most reasonable) thing to do, isn't it? > [GCP PubSub] Make checkpoint optional for preview/staging mode > -- > > Key: FLINK-23495 > URL: https://issues.apache.org/jira/browse/FLINK-23495 > Project: Flink > Issue Type: Improvement > Components: Connectors / Google Cloud PubSub >Affects Versions: 1.13.0, 1.13.1 >Reporter: Brachi Packter >Priority: Major > Labels: pull-request-available > > I'm using PubSub connector with Flink sql. > The issue that I get all the time error that PubSub required checkpoints, My > question is if I/you can submit a PR that adds a property that can configure > PubSub to start without checkpoints, and we can describe that it is just for > preview/staging mode (interactive sql, Jupiter..) > Other connectors support starting without checkpoints. > What will be the impact for this change? I tried it locally and it seems to > work ok. 
> That is the code that always fail the source if no checkpoint is configured, > i want to add some condition here: > {code:java} > if (hasNoCheckpointingEnabled(getRuntimeContext())) { > throw new IllegalArgumentException( "The PubSubSource REQUIRES Checkpointing > to be enabled and " + "the checkpointing frequency must be MUCH lower than > the PubSub timeout for it to retry a message."); > } > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-23495) [GCP PubSub] Make checkpoint optional for preview/staging mode
[ https://issues.apache.org/jira/browse/FLINK-23495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17394015#comment-17394015 ] Nico Kruber commented on FLINK-23495: - Looks ok to me as well since you wouldn't lose anything if you don't have checkpoints anyway. You'll have to fine-tune the code a bit more though, e.g. you are still populating {{acknowledgeOnCheckpoint}} even if you don't have checkpoints, and the early exit with {{collector.isEndOfStreamSignalled}} doesn't acknowledge anything in your code... > [GCP PubSub] Make checkpoint optional for preview/staging mode > -- > > Key: FLINK-23495 > URL: https://issues.apache.org/jira/browse/FLINK-23495 > Project: Flink > Issue Type: Improvement > Components: Connectors / Google Cloud PubSub >Affects Versions: 1.13.0, 1.13.1 >Reporter: Brachi Packter >Priority: Major > Labels: pull-request-available > > I'm using PubSub connector with Flink sql. > The issue that I get all the time error that PubSub required checkpoints, My > question is if I/you can submit a PR that adds a property that can configure > PubSub to start without checkpoints, and we can describe that it is just for > preview/staging mode (interactive sql, Jupiter..) > Other connectors support starting without checkpoints. > What will be the impact for this change? I tried it locally and it seems to > work ok. > That is the code that always fail the source if no checkpoint is configured, > i want to add some condition here: > {code:java} > if (hasNoCheckpointingEnabled(getRuntimeContext())) { > throw new IllegalArgumentException( "The PubSubSource REQUIRES Checkpointing > to be enabled and " + "the checkpointing frequency must be MUCH lower than > the PubSub timeout for it to retry a message."); > } > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (FLINK-16556) TopSpeedWindowing should implement checkpointing for its source
[ https://issues.apache.org/jira/browse/FLINK-16556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber reassigned FLINK-16556: --- Assignee: Liebing Yu > TopSpeedWindowing should implement checkpointing for its source > --- > > Key: FLINK-16556 > URL: https://issues.apache.org/jira/browse/FLINK-16556 > Project: Flink > Issue Type: Bug > Components: Examples >Affects Versions: 1.10.0 >Reporter: Nico Kruber >Assignee: Liebing Yu >Priority: Minor > Labels: auto-deprioritized-major, starter > > {{org.apache.flink.streaming.examples.windowing.TopSpeedWindowing.CarSource}} > does not implement checkpointing of its state, namely the current speeds and > distances per car. The main problem with this is that the window trigger only > fires if the new distance has increased by at least 50 but after restore, it > will be reset to 0 and could thus not produce output for a while. > > Either the distance calculation could use {{Math.abs}} or the source needs > proper checkpointing. Optionally with allowing the number of cars to > increase/decrease. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-16556) TopSpeedWindowing should implement checkpointing for its source
[ https://issues.apache.org/jira/browse/FLINK-16556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393744#comment-17393744 ] Nico Kruber commented on FLINK-16556: - Looks about right... Luckily, the source isn't declared as a parallel source, otherwise more changes would be needed. One thing I'm not 100% sure about: do we also need to serialize the Random instance in order to get the same sequence of actions after restart? > TopSpeedWindowing should implement checkpointing for its source > --- > > Key: FLINK-16556 > URL: https://issues.apache.org/jira/browse/FLINK-16556 > Project: Flink > Issue Type: Bug > Components: Examples >Affects Versions: 1.10.0 >Reporter: Nico Kruber >Priority: Minor > Labels: auto-deprioritized-major, starter > > {{org.apache.flink.streaming.examples.windowing.TopSpeedWindowing.CarSource}} > does not implement checkpointing of its state, namely the current speeds and > distances per car. The main problem with this is that the window trigger only > fires if the new distance has increased by at least 50 but after restore, it > will be reset to 0 and could thus not produce output for a while. > > Either the distance calculation could use {{Math.abs}} or the source needs > proper checkpointing. Optionally with allowing the number of cars to > increase/decrease. -- This message was sent by Atlassian Jira (v8.3.4#803005)
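As a quick pure-JDK check of the question above (not Flink code, just a sketch with a hypothetical class name): {{java.util.Random}} is {{Serializable}}, and a serialized copy continues with the identical sequence, which suggests that checkpointing the {{Random}} would indeed reproduce the same actions after restart.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Random;

class RandomCheckpointDemo {

    // Round-trip a Random through Java serialization, as checkpointing its state would.
    static Random roundTrip(Random r) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(r);
        }
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            return (Random) ois.readObject();
        }
    }

    // Returns true iff original and restored Random produce identical values after restore.
    static boolean sequencesMatchAfterRoundTrip(long seed, int steps) {
        try {
            Random original = new Random(seed);
            original.nextInt(); // advance the stream a bit, as a running source would
            Random restored = roundTrip(original);
            for (int i = 0; i < steps; i++) {
                if (original.nextInt(100) != restored.nextInt(100)) {
                    return false;
                }
            }
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(sequencesMatchAfterRoundTrip(42L, 5));
    }
}
```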
[jira] [Commented] (FLINK-23495) [GCP PubSub] Make checkpoint optional for preview/staging mode
[ https://issues.apache.org/jira/browse/FLINK-23495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393734#comment-17393734 ] Nico Kruber commented on FLINK-23495: - Messages are only acknowledged once the checkpoint is complete on all tasks of the job. If there is any failure during processing or the checkpoint, they won't be ack'd and will be re-delivered after the job is restarted. If you ack them right after receiving them, you will not get these messages again upon failure and that only seems acceptable without checkpoints (which means no failure tolerance) > [GCP PubSub] Make checkpoint optional for preview/staging mode > -- > > Key: FLINK-23495 > URL: https://issues.apache.org/jira/browse/FLINK-23495 > Project: Flink > Issue Type: Improvement > Components: Connectors / Google Cloud PubSub >Affects Versions: 1.13.0, 1.13.1 >Reporter: Brachi Packter >Priority: Major > Labels: pull-request-available > > I'm using PubSub connector with Flink sql. > The issue that I get all the time error that PubSub required checkpoints, My > question is if I/you can submit a PR that adds a property that can configure > PubSub to start without checkpoints, and we can describe that it is just for > preview/staging mode (interactive sql, Jupiter..) > Other connectors support starting without checkpoints. > What will be the impact for this change? I tried it locally and it seems to > work ok. > That is the code that always fail the source if no checkpoint is configured, > i want to add some condition here: > {code:java} > if (hasNoCheckpointingEnabled(getRuntimeContext())) { > throw new IllegalArgumentException( "The PubSubSource REQUIRES Checkpointing > to be enabled and " + "the checkpointing frequency must be MUCH lower than > the PubSub timeout for it to retry a message."); > } > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-22452) Support specifying custom transactional.id prefix in FlinkKafkaProducer
[ https://issues.apache.org/jira/browse/FLINK-22452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393242#comment-17393242 ] Nico Kruber commented on FLINK-22452: - Hmm... sorry, I overlooked this part. For discoverability reasons, I think I would prefer these explanations in the docs. You wouldn't look into the javadocs of a function that you'd expect not to use... Maybe instead a short section under the "Troubleshooting" part of this page? So that users can find the symptoms they see and have the page point them to the solution... > Support specifying custom transactional.id prefix in FlinkKafkaProducer > --- > > Key: FLINK-22452 > URL: https://issues.apache.org/jira/browse/FLINK-22452 > Project: Flink > Issue Type: Improvement > Components: Connectors / Kafka >Affects Versions: 1.12.2 >Reporter: Wenhao Ji >Assignee: Wenhao Ji >Priority: Major > Labels: pull-request-available > Fix For: 1.14.0 > > > Currently, the "transactional.id"s of the Kafka producers in > FlinkKafkaProducer are generated based on the task name. This mechanism has > some limitations: > * It will exceed Kafka's limitation if the task name is too long. (resolved > in FLINK-17691) > * They will very likely clash each other if the job topologies are similar. > (discussed in FLINK-11654) > * Only certain "transactional.id" may be authorized by [Prefixed > ACLs|https://docs.confluent.io/platform/current/kafka/authorization.html#prefixed-acls] > on the target Kafka cluster. > Besides, the spring community has introduced the > [setTransactionIdPrefix|https://docs.spring.io/spring-kafka/api/org/springframework/kafka/core/DefaultKafkaProducerFactory.html#setTransactionIdPrefix(java.lang.String)] > method to their Kafka client. > Therefore, I think it will be necessary to have this feature in the Flink > Kafka connector. 
> > As discussed in FLINK-11654, the possible solution will be, > * either introduce an additional method called > setTransactionalIdPrefix(String) in the FlinkKafkaProducer, > * or use the existing "transactional.id" properties as the prefix. > And the behavior of the "transactional.id" generation will be > * keep the behavior as it was if absent, > * use the one if present as the prefix for the TransactionalIdsGenerator. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-22452) Support specifying custom transactional.id prefix in FlinkKafkaProducer
[ https://issues.apache.org/jira/browse/FLINK-22452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393068#comment-17393068 ] Nico Kruber commented on FLINK-22452: - I think, for this property in particular, we should also explain where/why this is needed for some scenarios - so not just what it is doing but also where it helps (basically the text from your JavaDoc) > Support specifying custom transactional.id prefix in FlinkKafkaProducer > --- > > Key: FLINK-22452 > URL: https://issues.apache.org/jira/browse/FLINK-22452 > Project: Flink > Issue Type: Improvement > Components: Connectors / Kafka >Affects Versions: 1.12.2 >Reporter: Wenhao Ji >Assignee: Wenhao Ji >Priority: Major > Labels: pull-request-available > Fix For: 1.14.0 > > > Currently, the "transactional.id"s of the Kafka producers in > FlinkKafkaProducer are generated based on the task name. This mechanism has > some limitations: > * It will exceed Kafka's limitation if the task name is too long. (resolved > in FLINK-17691) > * They will very likely clash each other if the job topologies are similar. > (discussed in FLINK-11654) > * Only certain "transactional.id" may be authorized by [Prefixed > ACLs|https://docs.confluent.io/platform/current/kafka/authorization.html#prefixed-acls] > on the target Kafka cluster. > Besides, the spring community has introduced the > [setTransactionIdPrefix|https://docs.spring.io/spring-kafka/api/org/springframework/kafka/core/DefaultKafkaProducerFactory.html#setTransactionIdPrefix(java.lang.String)] > method to their Kafka client. > Therefore, I think it will be necessary to have this feature in the Flink > Kafka connector. > > As discussed in FLINK-11654, the possible solution will be, > * either introduce an additional method called > setTransactionalIdPrefix(String) in the FlinkKafkaProducer, > * or use the existing "transactional.id" properties as the prefix. 
> And the behavior of the "transactional.id" generation will be > * keep the behavior as it was if absent, > * use the one if present as the prefix for the TransactionalIdsGenerator. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-22452) Support specifying custom transactional.id prefix in FlinkKafkaProducer
[ https://issues.apache.org/jira/browse/FLINK-22452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392905#comment-17392905 ] Nico Kruber commented on FLINK-22452: - Thanks for taking care of this. Looking at the commits, however, I do not see any additions to our docs that describe the problem and the fix that is now available by setting the prefix manually. Should this be added to https://ci.apache.org/projects/flink/flink-docs-master/docs/connectors/datastream/kafka/ ? > Support specifying custom transactional.id prefix in FlinkKafkaProducer > --- > > Key: FLINK-22452 > URL: https://issues.apache.org/jira/browse/FLINK-22452 > Project: Flink > Issue Type: Improvement > Components: Connectors / Kafka >Affects Versions: 1.12.2 >Reporter: Wenhao Ji >Assignee: Wenhao Ji >Priority: Major > Labels: pull-request-available > Fix For: 1.14.0 > > > Currently, the "transactional.id"s of the Kafka producers in > FlinkKafkaProducer are generated based on the task name. This mechanism has > some limitations: > * It will exceed Kafka's limitation if the task name is too long. (resolved > in FLINK-17691) > * They will very likely clash each other if the job topologies are similar. > (discussed in FLINK-11654) > * Only certain "transactional.id" may be authorized by [Prefixed > ACLs|https://docs.confluent.io/platform/current/kafka/authorization.html#prefixed-acls] > on the target Kafka cluster. > Besides, the spring community has introduced the > [setTransactionIdPrefix|https://docs.spring.io/spring-kafka/api/org/springframework/kafka/core/DefaultKafkaProducerFactory.html#setTransactionIdPrefix(java.lang.String)] > method to their Kafka client. > Therefore, I think it will be necessary to have this feature in the Flink > Kafka connector. 
> > As discussed in FLINK-11654, the possible solution will be, > * either introduce an additional method called > setTransactionalIdPrefix(String) in the FlinkKafkaProducer, > * or use the existing "transactional.id" properties as the prefix. > And the behavior of the "transactional.id" generation will be > * keep the behavior as it was if absent, > * use the one if present as the prefix for the TransactionalIdsGenerator. -- This message was sent by Atlassian Jira (v8.3.4#803005)
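The prefix-based scheme discussed above can be illustrated with a small stand-alone toy model (hypothetical names; Flink's actual {{TransactionalIdsGenerator}} derives IDs from the prefix plus subtask and pool counters and is more involved). The point is that every generated ID starts with the user-supplied prefix, so a prefixed ACL on the Kafka cluster can authorize all of them, and two similar jobs with different prefixes can no longer clash:

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of prefix-based transactional.id generation: each parallel subtask
// gets a pool of IDs, all sharing the user-supplied prefix.
class TransactionalIdSketch {
    static List<String> generate(String prefix, int subtaskIndex, int poolSize) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < poolSize; i++) {
            // prefix first, so a "prefixed ACL" on the broker matches every ID
            ids.add(prefix + "-" + subtaskIndex + "-" + i);
        }
        return ids;
    }
}
```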
[jira] [Assigned] (FLINK-22405) Support fixed-lengh chars in the LeadLag built-in function
[ https://issues.apache.org/jira/browse/FLINK-22405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber reassigned FLINK-22405: --- Assignee: liwei li > Support fixed-lengh chars in the LeadLag built-in function > -- > > Key: FLINK-22405 > URL: https://issues.apache.org/jira/browse/FLINK-22405 > Project: Flink > Issue Type: Improvement > Components: Table SQL / API >Affects Versions: 1.12.2, 1.13.0 >Reporter: Nico Kruber >Assignee: liwei li >Priority: Minor > Labels: auto-deprioritized-major, pull-request-available, starter > > LeadLag aggregate function does not support type: ''CHAR'', as in the > following example (a CAST to VARCHAR works around this). Technically, there > should be no reason though to support STRING/VARCHAR but not CHAR: > {code:sql} > CREATE TEMPORARY VIEW test_cardinality AS > SELECT * FROM ( VALUES > ('Alice', 'al...@test.com', ARRAY [ 'al...@test.com' ], 'Test Ltd'), > ('Alice', 'al...@test.com', ARRAY [ 'al...@test.com' ], 'Test Ltd'), > ('Alice', 'al...@test2.com', ARRAY [ 'al...@test.com', 'al...@test2.com' ], > 'Test Ltd')) > AS t ( name, email, aliases, company ); > {code} > {code:sql} > SELECT > name, > LEAD(company, 0) AS company > FROM test_cardinality > WHERE CARDINALITY(aliases) >= 2 > GROUP BY name; > {code} > -> see > https://github.com/apache/flink/blob/release-1.13.0-rc1/flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/utils/AggFunctionFactory.scala#L331 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Issue Comment Deleted] (FLINK-22405) Support fixed-lengh chars in the LeadLag built-in function
[ https://issues.apache.org/jira/browse/FLINK-22405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-22405: Comment: was deleted (was: I think, it is... (I see no need of maintaining a specialized implementation) Do you want to work on this, create a test and open a PR? In that case, I would assign you to this ticket.) > Support fixed-lengh chars in the LeadLag built-in function > -- > > Key: FLINK-22405 > URL: https://issues.apache.org/jira/browse/FLINK-22405 > Project: Flink > Issue Type: Improvement > Components: Table SQL / API >Affects Versions: 1.12.2, 1.13.0 >Reporter: Nico Kruber >Priority: Minor > Labels: auto-deprioritized-major, pull-request-available, starter > > LeadLag aggregate function does not support type: ''CHAR'', as in the > following example (a CAST to VARCHAR works around this). Technically, there > should be no reason though to support STRING/VARCHAR but not CHAR: > {code:sql} > CREATE TEMPORARY VIEW test_cardinality AS > SELECT * FROM ( VALUES > ('Alice', 'al...@test.com', ARRAY [ 'al...@test.com' ], 'Test Ltd'), > ('Alice', 'al...@test.com', ARRAY [ 'al...@test.com' ], 'Test Ltd'), > ('Alice', 'al...@test2.com', ARRAY [ 'al...@test.com', 'al...@test2.com' ], > 'Test Ltd')) > AS t ( name, email, aliases, company ); > {code} > {code:sql} > SELECT > name, > LEAD(company, 0) AS company > FROM test_cardinality > WHERE CARDINALITY(aliases) >= 2 > GROUP BY name; > {code} > -> see > https://github.com/apache/flink/blob/release-1.13.0-rc1/flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/utils/AggFunctionFactory.scala#L331 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-23495) [GCP PubSub] Make checkpoint optional for preview/staging mode
[ https://issues.apache.org/jira/browse/FLINK-23495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392198#comment-17392198 ] Nico Kruber commented on FLINK-23495: - If you [look through the code|https://github.com/apache/flink/blob/99c2a415e9eeefafacf70762b6f54070f7911ceb/flink-connectors/flink-connector-gcp-pubsub/src/main/java/org/apache/flink/streaming/connectors/gcp/pubsub/common/AcknowledgeOnCheckpoint.java#L31], you will find the reason for requiring checkpoints: {quote} The mechanism for this source assumes that messages are identified by a unique ID. When messages are taken from the message queue, the message must not be dropped immediately from the external system, but must be retained until acknowledged. Messages that are not acknowledged within a certain time interval will be served again (to a different connection, established by the recovered source). {quote} If you do not have checkpoints, it looks like you'd have to acknowledge messages in a different way, e.g. when consuming them. In that case, the source would be at-most-once just like the Flink job, so that shouldn't be an issue. If you just remove the check, then all messages are unacknowledged and should be delivered again some time - probably not what you want. > [GCP PubSub] Make checkpoint optional for preview/staging mode > -- > > Key: FLINK-23495 > URL: https://issues.apache.org/jira/browse/FLINK-23495 > Project: Flink > Issue Type: Improvement > Components: Connectors / Google Cloud PubSub >Affects Versions: 1.13.0, 1.13.1 >Reporter: Brachi Packter >Priority: Major > Labels: pull-request-available > > I'm using PubSub connector with Flink sql. > The issue that I get all the time error that PubSub required checkpoints, My > question is if I/you can submit a PR that adds a property that can configure > PubSub to start without checkpoints, and we can describe that it is just for > preview/staging mode (interactive sql, Jupiter..) 
> Other connectors support starting without checkpoints. > What will be the impact for this change? I tried it locally and it seems to > work ok. > That is the code that always fail the source if no checkpoint is configured, > i want to add some condition here: > {code:java} > if (hasNoCheckpointingEnabled(getRuntimeContext())) { > throw new IllegalArgumentException( "The PubSubSource REQUIRES Checkpointing > to be enabled and " + "the checkpointing frequency must be MUCH lower than > the PubSub timeout for it to retry a message."); > } > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
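The acknowledge-on-checkpoint mechanism quoted above can be sketched as a small stand-alone model (plain Java, hypothetical class name; Flink's actual {{AcknowledgeOnCheckpoint}} is more involved): message IDs are staged per checkpoint and only acknowledged once that checkpoint completes, so anything un-acknowledged at failure time is re-delivered by PubSub.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of the acknowledge-on-checkpoint pattern: messages received since the
// last checkpoint are staged under that checkpoint's ID and only acknowledged
// (i.e. removed from the external system) once the checkpoint completes.
class AckOnCheckpointModel {
    private final List<String> sinceLastCheckpoint = new ArrayList<>();
    private final SortedMap<Long, List<String>> pendingByCheckpoint = new TreeMap<>();
    final List<String> acknowledged = new ArrayList<>();

    void onMessage(String id) {
        sinceLastCheckpoint.add(id);
    }

    void snapshotState(long checkpointId) {
        pendingByCheckpoint.put(checkpointId, new ArrayList<>(sinceLastCheckpoint));
        sinceLastCheckpoint.clear();
    }

    void notifyCheckpointComplete(long checkpointId) {
        // acknowledge everything up to and including this checkpoint
        Iterator<Map.Entry<Long, List<String>>> it =
                pendingByCheckpoint.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, List<String>> e = it.next();
            if (e.getKey() <= checkpointId) {
                acknowledged.addAll(e.getValue());
                it.remove();
            }
        }
    }
}
```

Without checkpoints, nothing ever triggers {{notifyCheckpointComplete}}, so no message is ever acknowledged and PubSub keeps re-delivering; that is why the check exists, and why an opt-out would need to ack messages on receipt instead.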
[jira] [Updated] (FLINK-23501) [GCP PubSub] add table source API for GCP PubSub
[ https://issues.apache.org/jira/browse/FLINK-23501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-23501: Issue Type: New Feature (was: Improvement) > [GCP PubSub] add table source API for GCP PubSub > - > > Key: FLINK-23501 > URL: https://issues.apache.org/jira/browse/FLINK-23501 > Project: Flink > Issue Type: New Feature > Components: Table SQL / Ecosystem >Reporter: Brachi Packter >Priority: Major > Labels: pull-request-available > > I want to add Table API Support for GCP PubSub (Source Only for now) > Based on this document: > [https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/sourcessinks/] > Users will be able to run Flink SQL on PubSub. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-22405) Support fixed-lengh chars in the LeadLag built-in function
[ https://issues.apache.org/jira/browse/FLINK-22405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392072#comment-17392072 ] Nico Kruber commented on FLINK-22405: - I think, it is... (I see no need of maintaining a specialized implementation) Do you want to work on this, create a test and open a PR? In that case, I would assign you to this ticket. > Support fixed-lengh chars in the LeadLag built-in function > -- > > Key: FLINK-22405 > URL: https://issues.apache.org/jira/browse/FLINK-22405 > Project: Flink > Issue Type: Improvement > Components: Table SQL / API >Affects Versions: 1.12.2, 1.13.0 >Reporter: Nico Kruber >Priority: Minor > Labels: auto-deprioritized-major, pull-request-available, starter > > LeadLag aggregate function does not support type: ''CHAR'', as in the > following example (a CAST to VARCHAR works around this). Technically, there > should be no reason though to support STRING/VARCHAR but not CHAR: > {code:sql} > CREATE TEMPORARY VIEW test_cardinality AS > SELECT * FROM ( VALUES > ('Alice', 'al...@test.com', ARRAY [ 'al...@test.com' ], 'Test Ltd'), > ('Alice', 'al...@test.com', ARRAY [ 'al...@test.com' ], 'Test Ltd'), > ('Alice', 'al...@test2.com', ARRAY [ 'al...@test.com', 'al...@test2.com' ], > 'Test Ltd')) > AS t ( name, email, aliases, company ); > {code} > {code:sql} > SELECT > name, > LEAD(company, 0) AS company > FROM test_cardinality > WHERE CARDINALITY(aliases) >= 2 > GROUP BY name; > {code} > -> see > https://github.com/apache/flink/blob/release-1.13.0-rc1/flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/utils/AggFunctionFactory.scala#L331 -- This message was sent by Atlassian Jira (v8.3.4#803005)