[jira] [Created] (FLINK-22520) KafkaSourceLegacyITCase.testMultipleSourcesOnePartition hangs
Guowei Ma created FLINK-22520: - Summary: KafkaSourceLegacyITCase.testMultipleSourcesOnePartition hangs Key: FLINK-22520 URL: https://issues.apache.org/jira/browse/FLINK-22520 Project: Flink Issue Type: Bug Components: Connectors / Kafka Affects Versions: 1.14.0 Reporter: Guowei Ma

There are no error messages. https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=17363=logs=c5f0071e-1851-543e-9a45-9ac140befc32=1fb1a56f-e8b5-5a82-00a0-a2db7757b4f5=42023

{code:java}
"main" #1 prio=5 os_prio=0 tid=0x7f4d3400b000 nid=0x203f waiting on condition [0x7f4d3be2e000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0xa68f3b68> (a java.util.concurrent.CompletableFuture$Signaller)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
	at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
	at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
	at org.apache.flink.test.util.TestUtils.tryExecute(TestUtils.java:49)
	at org.apache.flink.streaming.connectors.kafka.KafkaConsumerTestBase.runMultipleSourcesOnePartitionExactlyOnceTest(KafkaConsumerTestBase.java:1112)
	at org.apache.flink.connector.kafka.source.KafkaSourceLegacyITCase.testMultipleSourcesOnePartition(KafkaSourceLegacyITCase.java:87)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [VOTE] Release 1.13.0, release candidate #2
+1 (binding)
- checked signatures and checksums
- built from source
- ran example jobs with parallelism=8000 on a YARN cluster; checked output and logs
- the website PR looks good

Thanks,
Zhu

Matthias Pohl wrote on Thu, Apr 29, 2021 at 2:34 AM:

> Thanks Dawid and Guowei for managing this release.
>
> - downloaded the sources and binaries and checked the checksums
> - built Flink from the downloaded sources
> - executed example jobs with standalone deployments - I didn't find
> anything suspicious in the logs
> - reviewed release announcement pull request
>
> - I did a pass over dependency updates: git diff release-1.12.2
> release-1.13.0-rc2 */*.xml
> There's one thing someone should double-check whether that's supposed to be
> like that: We added org.conscrypt:conscrypt-openjdk-uber:2.5.1 as a
> dependency but I don't see it being reflected in the NOTICE file of the
> flink-python module. Or is this automatically added later on?
>
> +1 (non-binding; please see remark on dependency above)
>
> Matthias
>
> On Wed, Apr 28, 2021 at 1:52 PM Stephan Ewen wrote:
>
> > Glad to hear that outcome. And no worries about the false alarm.
> > Thank you for doing thorough testing, this is very helpful!
> >
> > On Wed, Apr 28, 2021 at 1:04 PM Caizhi Weng wrote:
> > >
> > > After the investigation we found that this issue is caused by the
> > > implementation of the connector, not by the Flink framework.
> > >
> > > Sorry for the false alarm.
> > >
> > > Stephan Ewen wrote on Wed, Apr 28, 2021 at 3:23 PM:
> > >
> > > > @Caizhi and @Becket - let me reach out to you to jointly debug this
> > > > issue.
> > > >
> > > > I am wondering if there is some incorrect reporting of failed events?
> > > > On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng wrote:
> > > >
> > > > > -1
> > > > >
> > > > > We're testing this version on batch jobs with large (600~1000) parallelisms
> > > > > and the following exception messages appear with high frequency:
> > > > >
> > > > > 2021-04-27 21:27:26
> > > > > org.apache.flink.util.FlinkException: An OperatorEvent from an
> > > > > OperatorCoordinator to a task was lost. Triggering task failover to ensure
> > > > > consistency. Event: '[NoMoreSplitEvent]', targetTask: - execution #0
> > > > > at org.apache.flink.runtime.operators.coordination.SubtaskGatewayImpl.lambda$sendEvent$0(SubtaskGatewayImpl.java:81)
> > > > > at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
> > > > > at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
> > > > > at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
> > > > > at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440)
> > > > > at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208)
> > > > > at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77)
> > > > > at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158)
> > > > > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
> > > > > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
> > > > > at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
> > > > > at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
> > > > > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
> > > > > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> > > > > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> > > > > at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
> > > > > at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
> > > > > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
> > > > > at akka.actor.ActorCell.invoke(ActorCell.scala:561)
> > > > > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
> > > > > at akka.dispatch.Mailbox.run(Mailbox.scala:225)
> > > > > at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
> > > > > at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> > > > > at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> > > > > at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> > > > > at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> > > > >
> > > > > Becket Qin is investigating this issue.
[jira] [Created] (FLINK-22519) Have python-archives also take tar.gz
Yik San Chan created FLINK-22519: Summary: Have python-archives also take tar.gz Key: FLINK-22519 URL: https://issues.apache.org/jira/browse/FLINK-22519 Project: Flink Issue Type: New Feature Components: API / Python Reporter: Yik San Chan

[python-archives|https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/python/python_config.html#python-archives] currently only takes zip. In our use case, we want to package the whole conda environment into python-archives, similar to how the [docs|https://ci.apache.org/projects/flink/flink-docs-stable/dev/python/faq.html#cluster] suggest using venv (a Python virtual environment). As we use PyFlink for ML, there are inevitably a few large dependencies (tensorflow, torch, pyarrow), as well as a lot of small dependencies. This pattern is not friendly to zip. According to this [post|https://superuser.com/a/173825], zip compresses each file independently, and it does not perform well when dealing with a lot of small files. On the other hand, tar simply bundles all files into a tarball, and we can then apply gzip to the whole tarball to achieve a smaller size. This may explain why the official packaging tool, [conda pack|https://conda.github.io/conda-pack/], produces tar.gz by default, even though zip is an option if we really want it. To further prove the idea, I used my laptop and a conda env to run an experiment.
My OS: macOS 10.15.7

# Create an environment.yaml as well as a requirements.txt
# Run `conda env create -f environment.yaml` to create the conda env
# Run conda pack to produce a tar.gz
# Run conda pack with featflow-ml-env.zip as the output to produce a zip

{code}
# environment.yaml
name: featflow-ml-env
channels:
  - pytorch
  - conda-forge
  - defaults
dependencies:
  - python=3.7
  - pytorch=1.8.0
  - scikit-learn=0.23.2
  - pip
  - pip:
    - -r file:requirements.txt
{code}

{code}
# requirements.txt
apache-flink==1.12.0
deepctr-torch==0.2.6
black==20.8b1
confluent-kafka==1.6.0
pytest==6.2.2
testcontainers==3.4.0
kafka-python==2.0.2
{code}

End result: the tar.gz is 854M, the zip is 1.6G.

So, long story short: python-archives only supports zip, while zip is not a good choice for packaging ML libs. Let's change this by adding tar.gz support to python-archives. The change will happen in this way: in ProcessPythonEnvironmentManager.java, check the suffix; if it is tar.gz, unarchive it using a gzip decompressor.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
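The proposed suffix check can be sketched in Python for illustration (the actual change would live on the Java side in ProcessPythonEnvironmentManager.java; the helper name below is hypothetical):

```python
import tarfile
import zipfile


def extract_python_archive(path: str, target_dir: str) -> None:
    """Unpack a python-archives entry, choosing the decompressor by suffix.

    Illustrative sketch only: mirrors the proposed logic of checking the
    file suffix and applying a gzip decompressor for tar.gz archives.
    """
    if path.endswith((".tar.gz", ".tgz")):
        # "r:gz" runs the gzip decompressor over the whole tarball, which is
        # why tar.gz beats per-file zip compression on many small files.
        with tarfile.open(path, "r:gz") as archive:
            archive.extractall(target_dir)
    elif path.endswith(".zip"):
        with zipfile.ZipFile(path) as archive:
            archive.extractall(target_dir)
    else:
        raise ValueError("unsupported python-archives suffix: " + path)
```

The else-branch keeps today's behavior of rejecting unknown formats, so adding tar.gz stays backward-compatible for existing zip users.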
[jira] [Created] (FLINK-22518) Translate the page of "High Availability (HA)" into Chinese
movesan created FLINK-22518: --- Summary: Translate the page of "High Availability (HA)" into Chinese Key: FLINK-22518 URL: https://issues.apache.org/jira/browse/FLINK-22518 Project: Flink Issue Type: New Feature Components: chinese-translation Reporter: movesan

The "High Availability (HA)" module contains the following three pages:
[https://ci.apache.org/projects/flink/flink-docs-release-1.12/zh/deployment/ha]
[https://ci.apache.org/projects/flink/flink-docs-release-1.12/zh/deployment/ha/zookeeper_ha.html]
[https://ci.apache.org/projects/flink/flink-docs-release-1.12/zh/deployment/ha/kubernetes_ha.html]

The English markdown files can be found at [https://github.com/apache/flink/tree/master/docs/content.zh/docs/deployment/ha].
-- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [VOTE] Release 1.13.0, release candidate #2
Hi Matthias,

Thank you very much for your careful inspection. I checked flink-python_2.11-1.13.0.jar and we do not bundle org.conscrypt:conscrypt-openjdk-uber:2.5.1 into it, so I think we do not need to add it to the NOTICE file. (BTW, the jar's scope is runtime.)

Best,
Guowei

On Thu, Apr 29, 2021 at 2:33 AM Matthias Pohl wrote:

> Thanks Dawid and Guowei for managing this release.
>
> - downloaded the sources and binaries and checked the checksums
> - built Flink from the downloaded sources
> - executed example jobs with standalone deployments - I didn't find
> anything suspicious in the logs
> - reviewed release announcement pull request
>
> - I did a pass over dependency updates: git diff release-1.12.2
> release-1.13.0-rc2 */*.xml
> There's one thing someone should double-check whether that's supposed to be
> like that: We added org.conscrypt:conscrypt-openjdk-uber:2.5.1 as a
> dependency but I don't see it being reflected in the NOTICE file of the
> flink-python module. Or is this automatically added later on?
>
> +1 (non-binding; please see remark on dependency above)
>
> Matthias
>
> On Wed, Apr 28, 2021 at 1:52 PM Stephan Ewen wrote:
>
> > Glad to hear that outcome. And no worries about the false alarm.
> > Thank you for doing thorough testing, this is very helpful!
> >
> > On Wed, Apr 28, 2021 at 1:04 PM Caizhi Weng wrote:
> > >
> > > After the investigation we found that this issue is caused by the
> > > implementation of the connector, not by the Flink framework.
> > >
> > > Sorry for the false alarm.
> > >
> > > Stephan Ewen wrote on Wed, Apr 28, 2021 at 3:23 PM:
> > >
> > > > @Caizhi and @Becket - let me reach out to you to jointly debug this
> > > > issue.
> > > >
> > > > I am wondering if there is some incorrect reporting of failed events?
> > > > On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng wrote:
> > > >
> > > > > -1
> > > > >
> > > > > We're testing this version on batch jobs with large (600~1000) parallelisms
> > > > > and the following exception messages appear with high frequency:
> > > > >
> > > > > 2021-04-27 21:27:26
> > > > > org.apache.flink.util.FlinkException: An OperatorEvent from an
> > > > > OperatorCoordinator to a task was lost. Triggering task failover to ensure
> > > > > consistency. Event: '[NoMoreSplitEvent]', targetTask: - execution #0
> > > > > [stack trace snipped; identical to the trace quoted earlier in this thread]
> > > > >
> > > > > Becket Qin is investigating this issue.
[jira] [Created] (FLINK-22517) Fix pickle compatibility problem in different Python versions
Huang Xingbo created FLINK-22517: Summary: Fix pickle compatibility problem in different Python versions Key: FLINK-22517 URL: https://issues.apache.org/jira/browse/FLINK-22517 Project: Flink Issue Type: Bug Components: API / Python Affects Versions: 1.12.3, 1.13.0 Reporter: Huang Xingbo Assignee: Huang Xingbo

Since release-1.12, PyFlink has supported Python 3.8. Starting from Python 3.8, the default protocol version used by pickle is protocol 5 (https://www.python.org/dev/peps/pep-0574/), which raises the following exception if the client uses Python 3.8 to compile the program and the cluster node uses Python 3.7 or Python 3.6 to run the Python UDF:

{code:python}
ValueError: unsupported pickle protocol: 5
{code}

The workaround is to make the Python version used by the client 3.6 or 3.7. For how to specify the client-side Python execution environment, please refer to the doc (https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/python/python_config.html#python-client-executable).
-- This message was sent by Atlassian Jira (v8.3.4#803005)
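The failure mode and the compatibility constraint behind the workaround can be reproduced with plain pickle. This is a hedged sketch of client-side behavior, not PyFlink's internal serialization code; the payload is made up, and the ticket's actual workaround is to run the client on Python 3.6/3.7 rather than to pin a protocol:

```python
import pickle

record = {"udf": "my_udf", "args": [1, 2, 3]}  # hypothetical payload

# A Python 3.8 client pickles with protocol 5 by default (PEP 574); a 3.6/3.7
# worker then fails with "ValueError: unsupported pickle protocol: 5".
# Pinning an older protocol on the producing side keeps old unpicklers happy.
COMPAT_PROTOCOL = 2  # readable by every Python 3.x unpickler

payload = pickle.dumps(record, protocol=COMPAT_PROTOCOL)

# Protocol-2+ streams start with the PROTO opcode (0x80) and the version byte,
# which is how the unpickler decides whether it can read the stream at all.
assert payload[:2] == b"\x80\x02"
assert pickle.loads(payload) == record
```

The version byte check is exactly what produces the quoted ValueError: an old interpreter sees `\x05` after the PROTO opcode and refuses the stream.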
[jira] [Created] (FLINK-22516) ResourceManager cannot establish leadership
Ricky Burnett created FLINK-22516: - Summary: ResourceManager cannot establish leadership Key: FLINK-22516 URL: https://issues.apache.org/jira/browse/FLINK-22516 Project: Flink Issue Type: Bug Reporter: Ricky Burnett Attachments: jobmanager_leadership.log

We are running Flink clusters with 2 JobManagers in HA mode. After a ZooKeeper restart, the two JMs begin leadership election and end up in a state where they are both trying to start their ResourceManager, until one of them writes to `leader//resource_manager_lock` and the JobMaster proceeds to execute `notifyOfNewResourceManagerLeader`, which restarts the ResourceManager. This in turn writes to `leader//resource_manager_lock`, which triggers the other JobMaster to restart its ResourceManager. We can see this in the logs from the "ResourceManager leader changed to new address" message, which goes back and forth between the two JMs and the two IP addresses. This cycle appears to continue indefinitely without outside interruption. I've attached combined logs from the two JMs in our environment that got into this state. The logs start with the loss of connection and end with a couple of cycles of back and forth. The two relevant hosts are "flink-jm-828d4aa2-d4d4-457b-995d-feb56d08c1fb-784cdb9c57-tsxb7" and "flink-jm-828d4aa2-d4d4-457b-995d-feb56d08c1fb-784cdb9c57-mpf9x". *-tsxb7 appears to be the last host that was granted leadership.
{code:java} {"thread":"Curator-Framework-0-EventThread","level":"INFO","loggerName":"org.apache.flink.runtime.jobmaster.JobManagerRunner","message":"JobManager runner for job tenant: ssademo, pipeline: 828d4aa2-d4d4-457b-995d-feb56d08c1fb, name: integration-test-detection (33e12948df69077ab3b33316eacbb5e4) was granted leadership with session id 97992805-9c60-40ba-8260-aaf036694cde at akka.tcp://flink@100.97.92.73:6123/user/jobmanager_3.","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","instant":{"epochSecond":1617129712,"nanoOfSecond":44700},"contextMap":{},"threadId":152,"threadPriority":5,"source":{"class":"org.apache.flink.runtime.jobmaster.JobManagerRunner","method":"startJobMaster","file":"JobManagerRunner.java","line":313},"service":"streams","time":"2021-03-30T18:41:52.447UTC","hostname":"flink-jm-828d4aa2-d4d4-457b-995d-feb56d08c1fb-784cdb9c57-tsxb7"} {code} But *-mpf9x continues to try to wrestle control back. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [VOTE] Release 1.13.0, release candidate #2
Thanks Dawid and Guowei for managing this release.

- downloaded the sources and binaries and checked the checksums
- built Flink from the downloaded sources
- executed example jobs with standalone deployments - I didn't find anything suspicious in the logs
- reviewed release announcement pull request
- I did a pass over dependency updates: git diff release-1.12.2 release-1.13.0-rc2 */*.xml

There's one thing someone should double-check whether that's supposed to be like that: We added org.conscrypt:conscrypt-openjdk-uber:2.5.1 as a dependency but I don't see it being reflected in the NOTICE file of the flink-python module. Or is this automatically added later on?

+1 (non-binding; please see remark on dependency above)

Matthias

On Wed, Apr 28, 2021 at 1:52 PM Stephan Ewen wrote:

> Glad to hear that outcome. And no worries about the false alarm.
> Thank you for doing thorough testing, this is very helpful!
>
> On Wed, Apr 28, 2021 at 1:04 PM Caizhi Weng wrote:
>
> > After the investigation we found that this issue is caused by the
> > implementation of the connector, not by the Flink framework.
> >
> > Sorry for the false alarm.
> >
> > Stephan Ewen wrote on Wed, Apr 28, 2021 at 3:23 PM:
> >
> > > @Caizhi and @Becket - let me reach out to you to jointly debug this
> > > issue.
> > >
> > > I am wondering if there is some incorrect reporting of failed events?
> > >
> > > On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng wrote:
> > >
> > > > -1
> > > >
> > > > We're testing this version on batch jobs with large (600~1000) parallelisms
> > > > and the following exception messages appear with high frequency:
> > > >
> > > > 2021-04-27 21:27:26
> > > > org.apache.flink.util.FlinkException: An OperatorEvent from an
> > > > OperatorCoordinator to a task was lost. Triggering task failover to ensure
> > > > consistency.
Event: '[NoMoreSplitEvent]', targetTask: - execution #0
> > > > [stack trace snipped; identical to the trace quoted earlier in this thread]
> > > >
> > > > Becket Qin is investigating this issue.
[jira] [Created] (FLINK-22515) Add Documentation for Flink Glue Schema Registry Integration
Linyu Yao created FLINK-22515: - Summary: Add Documentation for Flink Glue Schema Registry Integration Key: FLINK-22515 URL: https://issues.apache.org/jira/browse/FLINK-22515 Project: Flink Issue Type: New Feature Components: Documentation Reporter: Linyu Yao

Add documentation for the Flink Glue Schema Registry integration to the pages of the Kafka connector and the Kinesis connector:
[https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/connectors/kafka.html#apache-kafka-connector]
[https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/connectors/kinesis.html]
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-22514) TypeExtractor - Improving log message
Avishay Balderman created FLINK-22514: - Summary: TypeExtractor - Improving log message Key: FLINK-22514 URL: https://issues.apache.org/jira/browse/FLINK-22514 Project: Flink Issue Type: Improvement Components: API / DataStream Reporter: Avishay Balderman

org.apache.flink.api.java.typeutils.TypeExtractor checks whether a field in a class is a "valid POJO field". The method responsible for this is:

{code:java}
isValidPojoField
{code}

When isValidPojoField finds an invalid field, a log message is written (see below), but the log message does not tell which field is invalid, so the developer has to track down the "bad" field on their own. Adding the field info to the log message is easy and saves the developer time.

{code:java}
for (Field field : fields) {
    Type fieldType = field.getGenericType();
    if (!isValidPojoField(field, clazz, typeHierarchy) && clazz != Row.class) {
        LOG.info("Class " + clazz + " cannot be used as a POJO type because not all fields are valid POJO fields, "
                + "and must be processed as GenericType. Please read the Flink documentation "
                + "on \"Data Types & Serialization\" for details of the effect on performance.");
        return null;
    }
}
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-22513) Failed to upload compile artifacts
Dawid Wysakowicz created FLINK-22513: Summary: Failed to upload compile artifacts Key: FLINK-22513 URL: https://issues.apache.org/jira/browse/FLINK-22513 Project: Flink Issue Type: Bug Components: Build System / Azure Pipelines Affects Versions: 1.13.0 Reporter: Dawid Wysakowicz https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=17343=logs=52b61abe-a3cc-5bde-cc35-1bbe89bb7df5=5e2d176c-a6d4-5b54-7808-b11714e363ad -- This message was sent by Atlassian Jira (v8.3.4#803005)
Does flink support multi cluster deployment with HA on multi k8s clusters
Hi Team,

Our production requirement is to deploy Flink in multi-cluster mode, i.e. Flink cluster1 with HA on Kubernetes cluster1 in data center 1, and another Flink cluster2 with HA on Kubernetes cluster2 in data center 2. If Flink cluster1 on k8s cluster1 in DC1 goes down, it has to fail over to Flink cluster2 on k8s cluster2 in DC2, with an automatic HA mechanism.

Please let us know whether this is possible, or whether there is any solution for running Flink in multi-cluster mode with HA. If there is a solution, please share the information.

Note: We currently run a Flink session cluster on standalone Kubernetes with HA (Kubernetes HA).

-- Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
[jira] [Created] (FLINK-22512) Can't call current_timestamp with hive dialect for hive-3.1
Rui Li created FLINK-22512: -- Summary: Can't call current_timestamp with hive dialect for hive-3.1 Key: FLINK-22512 URL: https://issues.apache.org/jira/browse/FLINK-22512 Project: Flink Issue Type: Bug Components: Connectors / Hive Reporter: Rui Li -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: flink1.12.1 read hive orc table exception. could not initialize class org.apache.orc.impl.ZlibCodec
Hi WSJian,

The pictures can't be displayed. Please just copy/paste the stack trace. And btw, the user mailing list (u...@flink.apache.org) is usually more appropriate for such issues.

On Wed, Apr 28, 2021 at 11:36 AM WSJian <526165...@qq.com> wrote:

> flink lib jar:
>
> flink taskmanager log:
>
> hive orc table create ddl:
> create table xxx ... stored as orc
>
> flink java class, joining the hive ORC table and Kafka stream data:
>
> bsTableEnv.executeSql("my sql is join");
>
> flink pom.xml in the attachment.
>
> I converted the table to textfile format and it runs, but ORC throws this exception.
>
> Am I missing any dependency in the flink lib? Or is there a class file conflict?

-- Best regards! Rui Li
[RESULT][VOTE] Release 1.12.3, release candidate #1
Dear devs, I'm happy to announce that we have unanimously approved this release. There are 3 approving votes, 3 of which are binding: * Dian * Dawid * Robert There are no disapproving votes. Thanks everyone! Your friendly release manager Arvid On Wed, Apr 28, 2021 at 2:25 PM Arvid Heise wrote: > According to the latest exchange on "[VOTE] Release 1.13.0, release > candidate #2", it was a false alarm, so I'm going ahead with the release. > > On Wed, Apr 28, 2021 at 9:26 AM Stephan Ewen wrote: > >> A quick heads-up: >> >> A fix from 1.13.0 that I backported to 1.12.3 is apparently causing some >> issues at larger batch scale. >> We are investigating this, but it would affect this release as well. >> >> Please see the mail thread "[VOTE] Release 1.13.0, release candidate #2" >> for details. >> >> If we want to make sure this release isn't affected, we revert that issue. >> The tickets are: >> - https://issues.apache.org/jira/browse/FLINK-21996 >> - https://issues.apache.org/jira/browse/FLINK-18071 >> >> On Wed, Apr 28, 2021 at 7:59 AM Robert Metzger >> wrote: >> >> > +1 (binding) >> > >> > - started cluster, ran example job on macos >> > - sources look fine >> > - Eyeballed the diff: >> > >> https://github.com/apache/flink/compare/release-1.12.2...release-1.12.3-rc1 >> > . 
>> > According to "git diff release-1.12.2...release-1.12.3-rc1 '*.xml'", there
>> > was only one external dependency change (snappy-java, which seems to be
>> > properly reflected in the NOTICE file)
>> > - the last CI run of the "release-1.12" branch is okay-ish:
>> > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=17315=results
>> > (this one failure occurred: https://issues.apache.org/jira/browse/FLINK-20950)
>> >
>> > On Tue, Apr 27, 2021 at 5:03 PM Dawid Wysakowicz <dwysakow...@apache.org> wrote:
>> >
>> > > +1 (binding)
>> > >
>> > > - Verified checksums and signatures
>> > > - Reviewed the website PR
>> > > - Built from sources
>> > > - Verified dependency version upgrades and updates in NOTICE files
>> > > compared to 1.12.2
>> > > - Started a cluster and ran the WordCount example in BATCH mode, and
>> > > everything looked good
>> > >
>> > > On 23/04/2021 23:52, Arvid Heise wrote:
>> > > > Hi everyone,
>> > > > Please review and vote on the release candidate #1 for the version 1.12.3,
>> > > > as follows:
>> > > > [ ] +1, Approve the release
>> > > > [ ] -1, Do not approve the release (please provide specific comments)
>> > > >
>> > > > The complete staging area is available for your review, which includes:
>> > > > * JIRA release notes [1],
>> > > > * the official Apache source release and binary convenience releases to be
>> > > > deployed to dist.apache.org [2], which are signed with the key with
>> > > > fingerprint 476DAA5D1FF08189 [3],
>> > > > * all artifacts to be deployed to the Maven Central Repository [4],
>> > > > * source code tag "release-1.12.3-rc1" [5],
>> > > > * website pull request listing the new release and adding announcement blog
>> > > > post [6].
>> > > >
>> > > > The vote will be open for at least 72 hours. It is adopted by majority
>> > > > approval, with at least 3 PMC affirmative votes.
>> > > > >> > > > Thanks, >> > > > Your friendly release manager Arvid >> > > > >> > > > [1] >> > > > >> > > >> > >> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12349691 >> > > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.12.3-rc1/ >> > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS >> > > > [4] >> > > >> https://repository.apache.org/content/repositories/orgapacheflink-1419 >> > > > [5] https://github.com/apache/flink/releases/tag/release-1.12.3-rc1 >> > > > [6] https://github.com/apache/flink-web/pull/437 >> > > > >> > > >> > > >> > >> >
Re: [VOTE] Release 1.12.3, release candidate #1
According to the latest exchange on "[VOTE] Release 1.13.0, release candidate #2", it was a false alarm, so I'm going ahead with the release. On Wed, Apr 28, 2021 at 9:26 AM Stephan Ewen wrote: > A quick heads-up: > > A fix from 1.13.0 that I backported to 1.12.3 is apparently causing some > issues at larger batch scale. > We are investigating this, but it would affect this release as well. > > Please see the mail thread "[VOTE] Release 1.13.0, release candidate #2" > for details. > > If we want to make sure this release isn't affected, we revert that issue. > The tickets are: > - https://issues.apache.org/jira/browse/FLINK-21996 > - https://issues.apache.org/jira/browse/FLINK-18071 > > On Wed, Apr 28, 2021 at 7:59 AM Robert Metzger > wrote: > > > +1 (binding) > > > > - started cluster, ran example job on macos > > - sources look fine > > - Eyeballed the diff: > > > https://github.com/apache/flink/compare/release-1.12.2...release-1.12.3-rc1 > > . > > According to "git diff release-1.12.2...release-1.12.3-rc1 '*.xml'", > there > > was only one external dependency change (snappy-java, which seems to be > > properly reflected in the NOTICE file) > > - the last CI run of the "release-1.12" branch is okay-ish: > > > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=17315=results > > (this one failure occurred: > > https://issues.apache.org/jira/browse/FLINK-20950) > > > > > > On Tue, Apr 27, 2021 at 5:03 PM Dawid Wysakowicz > > > wrote: > > > > > +1 (binding) > > > > > > - Verified checksums and signatures > > > - Reviewed the website PR > > > - Built from sources > > > - verified dependency version upgrades and updates in NOTICE files > > > compared to 1.12.2 > > > - started cluster and run WordCount example in BATCH mode and > everything > > > looked good > > > > > > On 23/04/2021 23:52, Arvid Heise wrote: > > > > Hi everyone, > > > > Please review and vote on the release candidate #1 for the version > > > 1.12.3, > > > > as follows: > > > > [ 
] +1, Approve the release > > > > [ ] -1, Do not approve the release (please provide specific comments) > > > > > > > > > > > > The complete staging area is available for your review, which > includes: > > > > * JIRA release notes [1], > > > > * the official Apache source release and binary convenience releases > to > > > be > > > > deployed to dist.apache.org [2], which are signed with the key with > > > > fingerprint 476DAA5D1FF08189 [3], > > > > * all artifacts to be deployed to the Maven Central Repository [4], > > > > * source code tag "release-1.2.3-rc3" [5], > > > > * website pull request listing the new release and adding > announcement > > > blog > > > > post [6]. > > > > > > > > The vote will be open for at least 72 hours. It is adopted by > majority > > > > approval, with at least 3 PMC affirmative votes. > > > > > > > > Thanks, > > > > Your friendly release manager Arvid > > > > > > > > [1] > > > > > > > > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12349691 > > > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.12.3-rc1/ > > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS > > > > [4] > > > https://repository.apache.org/content/repositories/orgapacheflink-1419 > > > > [5] https://github.com/apache/flink/releases/tag/release-1.12.3-rc1 > > > > [6] https://github.com/apache/flink-web/pull/437 > > > > > > > > > > > > >
[jira] [Created] (FLINK-22511) Fix the bug of non-composite result type in Python TableAggregateFunction
Huang Xingbo created FLINK-22511: Summary: Fix the bug of non-composite result type in Python TableAggregateFunction Key: FLINK-22511 URL: https://issues.apache.org/jira/browse/FLINK-22511 Project: Flink Issue Type: Bug Components: API / Python Affects Versions: 1.13.0 Reporter: Huang Xingbo Assignee: Huang Xingbo -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-22510) ConfigurationUtils always serializes Duration to nanoseconds
Ingo Bürk created FLINK-22510: - Summary: ConfigurationUtils always serializes Duration to nanoseconds Key: FLINK-22510 URL: https://issues.apache.org/jira/browse/FLINK-22510 Project: Flink Issue Type: Improvement Affects Versions: 1.12.2, 1.13.0 Reporter: Ingo Bürk ConfigurationUtils#convertToString is hard-coded to serialize instances of Duration into nanoseconds. This often produces output that isn't really human-readable. It would be good to be a bit smarter about how the value is serialized so that it reads more naturally. A possible solution could be choosing the coarsest unit in which the value can be expressed as an integer, e.g. "1 min", "35 s", "3256 ms", … -- This message was sent by Atlassian Jira (v8.3.4#803005)
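The heuristic proposed in the ticket (pick a unit in which the duration is an integer) can be sketched as follows. This is a hedged illustration, not Flink's actual `ConfigurationUtils` code; the class name, method name, and unit labels are invented for the example.

```java
import java.time.Duration;

// Illustrative sketch of FLINK-22510's proposal: render a Duration in the
// coarsest unit in which its value is an integer ("1 min" rather than
// "60000000000 ns"). Names are hypothetical, not Flink's API.
public class DurationFormatter {

    // Unit sizes in nanoseconds, coarsest first.
    private static final long[] SIZES = {
        86_400_000_000_000L, // 1 d
        3_600_000_000_000L,  // 1 h
        60_000_000_000L,     // 1 min
        1_000_000_000L,      // 1 s
        1_000_000L,          // 1 ms
        1_000L,              // 1 us
        1L                   // 1 ns
    };
    private static final String[] LABELS = {"d", "h", "min", "s", "ms", "us", "ns"};

    public static String humanReadable(Duration d) {
        long nanos = d.toNanos();
        if (nanos == 0) {
            return "0 ms"; // arbitrary choice of unit for zero
        }
        for (int i = 0; i < SIZES.length; i++) {
            if (nanos % SIZES[i] == 0) {
                return (nanos / SIZES[i]) + " " + LABELS[i];
            }
        }
        return nanos + " ns"; // unreachable: every value is divisible by 1 ns
    }

    public static void main(String[] args) {
        System.out.println(humanReadable(Duration.ofMinutes(1)));   // 1 min
        System.out.println(humanReadable(Duration.ofMillis(3256))); // 3256 ms
    }
}
```

A real implementation would also need to stay parseable by the configuration framework's duration parser, which is why the ticket leaves the exact unit set open.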
Re: [VOTE] Release 1.13.0, release candidate #2
Glad to hear that outcome. And no worries about the false alarm. Thank you for doing thorough testing, this is very helpful! On Wed, Apr 28, 2021 at 1:04 PM Caizhi Weng wrote: > After the investigation we found that this issue is caused by the > implementation of connector, not by the Flink framework. > > Sorry for the false alarm. > > Stephan Ewen 于2021年4月28日周三 下午3:23写道: > > > @Caizhi and @Becket - let me reach out to you to jointly debug this > issue. > > > > I am wondering if there is some incorrect reporting of failed events? > > > > On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng > wrote: > > > > > -1 > > > > > > We're testing this version on batch jobs with large (600~1000) > > parallelisms > > > and the following exception messages appear with high frequency: > > > > > > 2021-04-27 21:27:26 > > > org.apache.flink.util.FlinkException: An OperatorEvent from an > > > OperatorCoordinator to a task was lost. Triggering task failover to > > ensure > > > consistency. Event: '[NoMoreSplitEvent]', targetTask: - > > > execution #0 > > > at > > > > > > > > > org.apache.flink.runtime.operators.coordination.SubtaskGatewayImpl.lambda$sendEvent$0(SubtaskGatewayImpl.java:81) > > > at > > > > > > > > > java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822) > > > at > > > > > > > > > java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797) > > > at > > > > > > > > > java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) > > > at > > > > > > > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440) > > > at > > > > > > > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208) > > > at > > > > > > > > > org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) > > > at > > > > > > > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158) > > > at 
akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) > > > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) > > > at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) > > > at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) > > > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) > > > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > > > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > > > at akka.actor.Actor$class.aroundReceive(Actor.scala:517) > > > at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) > > > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) > > > at akka.actor.ActorCell.invoke(ActorCell.scala:561) > > > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) > > > at akka.dispatch.Mailbox.run(Mailbox.scala:225) > > > at akka.dispatch.Mailbox.exec(Mailbox.scala:235) > > > at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > > > at > > > > > > > > > akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > > > at > akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > > > at > > > > > > > > > akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > > > > > > Becket Qin is investigating this issue. > > > > > >
[jira] [Created] (FLINK-22509) ./bin/flink run -m yarn-cluster -d submission leads to IllegalStateException
Robert Metzger created FLINK-22509: -- Summary: ./bin/flink run -m yarn-cluster -d submission leads to IllegalStateException Key: FLINK-22509 URL: https://issues.apache.org/jira/browse/FLINK-22509 Project: Flink Issue Type: Bug Components: Deployment / YARN Affects Versions: 1.13.0, 1.14.0 Reporter: Robert Metzger Submitting a detached per-job YARN cluster in Flink (like this: {{./bin/flink run -m yarn-cluster -d ./examples/streaming/TopSpeedWindowing.jar}}) leads to the following exception: {code} 2021-04-28 11:39:00,786 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Found Web Interface ip-172-31-27-232.eu-central-1.compute.internal:45689 of application 'application_1619607372651_0005'. Job has been submitted with JobID 5543e81db9c2de78b646088891f23bfc Exception in thread "Thread-4" java.lang.IllegalStateException: Trying to access closed classloader. Please check if you store classloaders directly or indirectly in static fields. If the stacktrace suggests that the leak occurs in a third party library and cannot be fixed immediately, you can disable this check with the configuration 'classloader.check-leaked-classloader'. 
at org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:164) at org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResource(FlinkUserCodeClassLoaders.java:183) at org.apache.hadoop.conf.Configuration.getResource(Configuration.java:2570) at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2783) at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2758) at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2638) at org.apache.hadoop.conf.Configuration.get(Configuration.java:1100) at org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1707) at org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1688) at org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183) at org.apache.hadoop.util.ShutdownHookManager.shutdownExecutor(ShutdownHookManager.java:145) at org.apache.hadoop.util.ShutdownHookManager.access$300(ShutdownHookManager.java:65) at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:102) {code} The job is still running as expected. Detached submission with {{./bin/flink run-application -t yarn-application -d}} works as expected. This is also the documented approach. -- This message was sent by Atlassian Jira (v8.3.4#803005)
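The check that throws here can be illustrated with a minimal sketch of a safety-net wrapper classloader. This is a hypothetical simplification for illustration only, not the real code of Flink's `FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader`.

```java
// Minimal sketch (not Flink's real implementation): a wrapper that refuses
// access once the user-code classloader has been closed, so code that leaked
// a reference to it (here: a Hadoop shutdown hook) fails loudly instead of
// silently loading stale classes.
public class SafetyNetClassLoader extends ClassLoader {

    private volatile ClassLoader inner;

    public SafetyNetClassLoader(ClassLoader inner) {
        super(null); // no parent; all lookups go through the wrapped loader
        this.inner = inner;
    }

    /** Called when the job finishes; afterwards any access is an error. */
    public void close() {
        inner = null;
    }

    private ClassLoader ensureInner() {
        ClassLoader cl = inner;
        if (cl == null) {
            throw new IllegalStateException(
                "Trying to access closed classloader. Please check if you store "
                    + "classloaders directly or indirectly in static fields.");
        }
        return cl;
    }

    @Override
    public java.net.URL getResource(String name) {
        return ensureInner().getResource(name);
    }
}
```

In the report above, Hadoop's `ShutdownHookManager` holds on to a Hadoop `Configuration` backed by the job's classloader, so the hook trips this check after the detached client has already closed the loader.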
Re: [VOTE] Release 1.13.0, release candidate #2
After the investigation we found that this issue is caused by the implementation of connector, not by the Flink framework. Sorry for the false alarm. Stephan Ewen 于2021年4月28日周三 下午3:23写道: > @Caizhi and @Becket - let me reach out to you to jointly debug this issue. > > I am wondering if there is some incorrect reporting of failed events? > > On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng wrote: > > > -1 > > > > We're testing this version on batch jobs with large (600~1000) > parallelisms > > and the following exception messages appear with high frequency: > > > > 2021-04-27 21:27:26 > > org.apache.flink.util.FlinkException: An OperatorEvent from an > > OperatorCoordinator to a task was lost. Triggering task failover to > ensure > > consistency. Event: '[NoMoreSplitEvent]', targetTask: - > > execution #0 > > at > > > > > org.apache.flink.runtime.operators.coordination.SubtaskGatewayImpl.lambda$sendEvent$0(SubtaskGatewayImpl.java:81) > > at > > > > > java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822) > > at > > > > > java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797) > > at > > > > > java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) > > at > > > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440) > > at > > > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208) > > at > > > > > org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) > > at > > > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158) > > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) > > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) > > at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) > > at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) > > at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) > > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > > at akka.actor.Actor$class.aroundReceive(Actor.scala:517) > > at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) > > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) > > at akka.actor.ActorCell.invoke(ActorCell.scala:561) > > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) > > at akka.dispatch.Mailbox.run(Mailbox.scala:225) > > at akka.dispatch.Mailbox.exec(Mailbox.scala:235) > > at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > > at > > > > > akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > > at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > > at > > > > > akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > > > > Becket Qin is investigating this issue. > > >
[jira] [Created] (FLINK-22508) add serialVersionUID for all Serializable class in non-table module
lincoln lee created FLINK-22508: --- Summary: add serialVersionUID for all Serializable class in non-table module Key: FLINK-22508 URL: https://issues.apache.org/jira/browse/FLINK-22508 Project: Flink Issue Type: Sub-task Reporter: lincoln lee -- This message was sent by Atlassian Jira (v8.3.4#803005)
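For context on these sub-tasks: without an explicit serialVersionUID the JVM derives one from the class structure, so an otherwise compatible change (adding a field or method) breaks deserialization of previously serialized instances. A minimal illustration with a hypothetical class, not code from the Flink codebase:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical user function. The explicit serialVersionUID pins the
// serialized form, so old serialized instances stay readable after
// compatible refactorings of the class.
public class ScaleFunction implements Serializable {

    private static final long serialVersionUID = 1L; // illustrative value

    private final int factor;

    public ScaleFunction(int factor) {
        this.factor = factor;
    }

    public int apply(int x) {
        return x * factor;
    }

    /** Round-trips the function through Java serialization. */
    public static ScaleFunction roundTrip(ScaleFunction f) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(f);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (ScaleFunction) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}
```

This matters for Flink because user functions and operators are shipped between processes via Java serialization, so an implicit, structure-derived ID couples the wire format to incidental class details.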
[jira] [Created] (FLINK-22507) add serialVersionUID for all Serializable class in table module
lincoln lee created FLINK-22507: --- Summary: add serialVersionUID for all Serializable class in table module Key: FLINK-22507 URL: https://issues.apache.org/jira/browse/FLINK-22507 Project: Flink Issue Type: Sub-task Components: Table SQL / Runtime Reporter: lincoln lee -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-22506) YARN job cluster stuck in retrying creating JobManager if savepoint is corrupted
Paul Lin created FLINK-22506: Summary: YARN job cluster stuck in retrying creating JobManager if savepoint is corrupted Key: FLINK-22506 URL: https://issues.apache.org/jira/browse/FLINK-22506 Project: Flink Issue Type: Bug Components: Deployment / YARN Affects Versions: 1.11.3 Reporter: Paul Lin If a non-retryable error (e.g. the savepoint is corrupted or inaccessible) occurs during the initialization of the job manager, the job cluster exits with an error code. But since it does not mark the attempt as failed, it won't be counted as a failed attempt, and YARN will keep retrying forever. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-22505) Limit the precision of Resource
Yangze Guo created FLINK-22505: -- Summary: Limit the precision of Resource Key: FLINK-22505 URL: https://issues.apache.org/jira/browse/FLINK-22505 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.13.0 Reporter: Yangze Guo In our internal deployment, we found that a high-precision {{CPUResource}} may cause the required resource never to be fulfilled. Consider the following scenario: - The {{SlotManager}} receives a slot request with 1.001 CPU and decides to allocate a pending task manager with that resource spec. - The resource manager starts a task manager and sets the CPU via dynamic config. In this step, we cast the {{CPUResource}} to a double value, which is where the precision loss happens. The task manager will finally register with 1.0 CPU and thus cannot deduct any pending task manager or fulfill the slot request. To solve this issue, we propose to limit the precision of Resource to a safe value, e.g. a scale of 8, to prevent precision loss when casting to double. - For {{CPUResource}}, the supported scale for the CPU is 3 in Kubernetes, while in YARN the CPU must be an integer. - For {{ExternalResource}}, the value will always be treated as an integer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
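The proposed cap can be sketched with `BigDecimal`. The class and method names below are hypothetical; the real check would live in Flink's resource classes.

```java
import java.math.BigDecimal;

// Sketch of the proposed guard (illustrative, not Flink's actual API):
// reject resource values whose decimal scale exceeds a safe bound, so that
// converting the value to double and back cannot silently change it.
public class ResourcePrecision {

    static final int MAX_SCALE = 8;

    static BigDecimal checkScale(BigDecimal value) {
        if (value.stripTrailingZeros().scale() > MAX_SCALE) {
            throw new IllegalArgumentException(
                "Resource value " + value + " exceeds the maximum scale " + MAX_SCALE);
        }
        return value;
    }

    public static void main(String[] args) {
        BigDecimal cpu = checkScale(new BigDecimal("1.001")); // scale 3: fine for k8s
        // At scale <= 8 and moderate magnitude, the value survives a round
        // trip through double, because Double.toString recovers the shortest
        // decimal representation.
        BigDecimal back = BigDecimal.valueOf(cpu.doubleValue());
        System.out.println(back.compareTo(cpu) == 0);
    }
}
```

The bound of 8 is the value suggested in the ticket; any scale small enough that the decimal fits in double's ~15-16 significant digits would have the same round-trip property.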
[jira] [Created] (FLINK-22503) getting error on select query in flink sql
Bhagi created FLINK-22503: - Summary: getting error on select query in flink sql Key: FLINK-22503 URL: https://issues.apache.org/jira/browse/FLINK-22503 Project: Flink Issue Type: Bug Components: Deployment / Kubernetes Affects Versions: 1.12.2 Environment: Hi Team, I added a source table *orders* with the filesystem connector using a CSV file. When I run select * from orders, it throws an error; even after I changed the mode, it is still not working. !image-2021-04-28-13-33-36-784.png! [ERROR] Could not execute SQL statement. Reason: org.apache.flink.table.client.gateway.SqlExecutionException: Results of batch queries can only be served in table or tableau mode 2) I used Elasticsearch as a sink, but I am not able to see the data in the Elasticsearch index. Can you please check and help me with these two issues? !image-2021-04-28-13-40-42-375.png! !image-2021-04-28-13-43-00-455.png! Reporter: Bhagi Attachments: image-2021-04-28-13-33-36-784.png, image-2021-04-28-13-40-42-375.png, image-2021-04-28-13-43-00-455.png -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-22502) DefaultCompletedCheckpointStore drops unrecoverable checkpoints silently
Till Rohrmann created FLINK-22502: - Summary: DefaultCompletedCheckpointStore drops unrecoverable checkpoints silently Key: FLINK-22502 URL: https://issues.apache.org/jira/browse/FLINK-22502 Project: Flink Issue Type: Bug Components: Runtime / Checkpointing, Runtime / Coordination Affects Versions: 1.12.2, 1.11.3, 1.13.0, 1.14.0 Reporter: Till Rohrmann Fix For: 1.14.0, 1.13.1, 1.12.4 The {{DefaultCompletedCheckpointStore.recover()}} tries to be resilient if it cannot recover a checkpoint (e.g. due to a transient storage outage or a checkpoint being corrupted). This behaviour was introduced with FLINK-7783. The problem is that this behaviour might cause us to ignore the latest valid checkpoint if there is a transient problem when restoring it. This might be acceptable for at-least-once processing guarantees, but it clearly violates exactly-once processing guarantees. On top of that, it is very hard to spot. I propose to change this behaviour so that {{DefaultCompletedCheckpointStore.recover()}} fails if it cannot read the checkpoints it is supposed to read. If the {{recover}} method fails during a recovery, it will kill the process. This will usually restart the process, which will retry the checkpoint recovery operation. If the problem is of a transient nature, then it should eventually succeed. In case this problem occurs during an initial job submission, the job will directly transition to a {{FAILED}} state. The proposed behaviour entails that if there is a permanent problem with a checkpoint (e.g. a corrupted checkpoint), Flink won't be able to recover without the intervention of the user. I believe that this is the right decision because Flink can no longer give exactly-once guarantees in this situation and the user needs to explicitly resolve it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
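The behavioural change can be sketched as follows. All types and names here are hypothetical stand-ins; the real code operates on Flink's CompletedCheckpoint objects and state handles.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of FLINK-22502's proposal: recovery should abort on an
// unreadable checkpoint instead of silently skipping it. Types are invented.
public class CheckpointRecovery {

    public interface CheckpointReader {
        String read(long checkpointId) throws Exception;
    }

    // Old behaviour (simplified): skip checkpoints that fail to load. If the
    // last id is the latest checkpoint and its read fails transiently, the
    // job silently restores an older checkpoint, breaking exactly-once.
    public static List<String> recoverLenient(long[] ids, CheckpointReader reader) {
        List<String> recovered = new ArrayList<>();
        for (long id : ids) {
            try {
                recovered.add(reader.read(id));
            } catch (Exception e) {
                // silently dropped
            }
        }
        return recovered;
    }

    // Proposed behaviour: any read failure aborts recovery. The process dies
    // and is restarted, so a transient outage leads to a retry rather than a
    // silent rollback to an older checkpoint.
    public static List<String> recoverStrict(long[] ids, CheckpointReader reader)
            throws Exception {
        List<String> recovered = new ArrayList<>();
        for (long id : ids) {
            recovered.add(reader.read(id));
        }
        return recovered;
    }
}
```

The trade-off described in the ticket is visible in the sketch: under the strict variant a permanently corrupted checkpoint blocks recovery until a user intervenes, which is deliberate because correctness can no longer be guaranteed.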
[jira] [Created] (FLINK-22501) Flink HiveCatalog ParquetTable join Dim table error
zhanglw created FLINK-22501: --- Summary: Flink HiveCatalog ParquetTable join Dim table error Key: FLINK-22501 URL: https://issues.apache.org/jira/browse/FLINK-22501 Project: Flink Issue Type: Bug Affects Versions: 1.12.0 Reporter: zhanglw Caused by: java.lang.IllegalArgumentException at java.nio.Buffer.position(Buffer.java:244) at org.apache.flink.hive.shaded.parquet.io.api.Binary$ByteBufferBackedBinary.getBytes(Binary.java:424) at org.apache.flink.hive.shaded.formats.parquet.vector.ParquetDictionary.decodeToBinary(ParquetDictionary.java:59) at org.apache.flink.table.data.vector.heap.HeapBytesVector.getBytes(HeapBytesVector.java:124) at org.apache.flink.table.data.vector.VectorizedColumnBatch.getByteArray(VectorizedColumnBatch.java:97) at org.apache.flink.table.data.ColumnarRowData.getString(ColumnarRowData.java:113) at BatchCalc$12.processElement(Unknown Source) at org.apache.flink.streaming.runtime.tasks.ChainingOutput.pushToOperator(ChainingOutput.java:112) at org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:93) at org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:39) at org.apache.flink.streaming.runtime.tasks.SourceOperatorStreamTask$AsyncDataOutputToOutput.emitRecord(SourceOperatorStreamTask.java:163) at org.apache.flink.streaming.api.operators.source.SourceOutputWithWatermarks.collect(SourceOutputWithWatermarks.java:110) at org.apache.flink.streaming.api.operators.source.SourceOutputWithWatermarks.collect(SourceOutputWithWatermarks.java:101) at org.apache.flink.connector.file.src.impl.FileSourceRecordEmitter.emitRecord(FileSourceRecordEmitter.java:45) at org.apache.flink.connector.file.src.impl.FileSourceRecordEmitter.emitRecord(FileSourceRecordEmitter.java:35) at org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:126) at org.apache.flink.streaming.api.operators.SourceOperator.emitNext(SourceOperator.java:251) at 
org.apache.flink.streaming.runtime.io.StreamTaskSourceInput.emitNext(StreamTaskSourceInput.java:65) at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:67) at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:372) at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:186) at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:575) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:539) at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:722) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:547) at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-22500) flink stop command cannot find job 00000000000000000000000000000000
陈孝忠 created FLINK-22500: --- Summary: flink stop command cannot find the job Key: FLINK-22500 URL: https://issues.apache.org/jira/browse/FLINK-22500 Project: Flink Issue Type: Bug Components: Deployment / Kubernetes Reporter: 陈孝忠 Flink 1.12.1, Kubernetes application mode, using Kubernetes for HA. Both the stop and the run commands report that the job cannot be found. Below is the log of the STOP command. Submission started: 2021-04-28 06:18:37 Launch command: /opt/flink/bin/flink stop --target kubernetes-application -Dkubernetes.namespace=middle-flink -Dkubernetes.config.file=/opt/flink/conf/kubeconfig -Dkubernetes.cluster-id=test0001Suspending job "" with a savepoint. 2021-04-28 06:18:39,574 INFO org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Retrieve flink cluster market-data-full-chain-0427 successfully, JobManager Web Interface: http://market-data-full-chain-0427-rest.middle-flink:10243 rs=1 The program finished with the following exception: org.apache.flink.util.FlinkException: Could not stop with a savepoint job "". at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:585) at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1006) at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:573) at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1073) at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1136) at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1136)Caused by: java.util.concurrent.ExecutionException: java.util.concurrent.CompletionException: org.apache.flink.runtime.messages.FlinkJobNotFoundException: Could not find Flink job () at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:583) ... 
6 moreCaused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.messages.FlinkJobNotFoundException: Could not find Flink job () at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:989) at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2137) at org.apache.flink.runtime.dispatcher.Dispatcher.performOperationOnJobMasterGateway(Dispatcher.java:910) at org.apache.flink.runtime.dispatcher.Dispatcher.stopWithSavepoint(Dispatcher.java:709) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:306) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:213) at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:159) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) at akka.actor.Actor.aroundReceive(Actor.scala:517) at akka.actor.Actor.aroundReceive$(Actor.scala:515) at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) at 
akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) at akka.actor.ActorCell.invoke(ActorCell.scala:561) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) at akka.dispatch.Mailbox.run(Mailbox.scala:225) at akka.dispatch.Mailbox.exec(Mailbox.scala:235) at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)Caused by:
Re: [VOTE] Release 1.12.3, release candidate #1
A quick heads-up: A fix from 1.13.0 that I backported to 1.12.3 is apparently causing some issues at larger batch scale. We are investigating this, but it would affect this release as well. Please see the mail thread "[VOTE] Release 1.13.0, release candidate #2" for details. If we want to make sure this release isn't affected, we revert that issue. The tickets are: - https://issues.apache.org/jira/browse/FLINK-21996 - https://issues.apache.org/jira/browse/FLINK-18071 On Wed, Apr 28, 2021 at 7:59 AM Robert Metzger wrote: > +1 (binding) > > - started cluster, ran example job on macos > - sources look fine > - Eyeballed the diff: > https://github.com/apache/flink/compare/release-1.12.2...release-1.12.3-rc1 > . > According to "git diff release-1.12.2...release-1.12.3-rc1 '*.xml'", there > was only one external dependency change (snappy-java, which seems to be > properly reflected in the NOTICE file) > - the last CI run of the "release-1.12" branch is okay-ish: > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=17315=results > (this one failure occurred: > https://issues.apache.org/jira/browse/FLINK-20950) > > > On Tue, Apr 27, 2021 at 5:03 PM Dawid Wysakowicz > wrote: > > > +1 (binding) > > > > - Verified checksums and signatures > > - Reviewed the website PR > > - Built from sources > > - verified dependency version upgrades and updates in NOTICE files > > compared to 1.12.2 > > - started cluster and run WordCount example in BATCH mode and everything > > looked good > > > > On 23/04/2021 23:52, Arvid Heise wrote: > > > Hi everyone, > > > Please review and vote on the release candidate #1 for the version > > 1.12.3, > > > as follows: > > > [ ] +1, Approve the release > > > [ ] -1, Do not approve the release (please provide specific comments) > > > > > > > > > The complete staging area is available for your review, which includes: > > > * JIRA release notes [1], > > > * the official Apache source release and binary convenience releases to > > be 
> > > deployed to dist.apache.org [2], which are signed with the key with > > > fingerprint 476DAA5D1FF08189 [3], > > > * all artifacts to be deployed to the Maven Central Repository [4], > > > * source code tag "release-1.2.3-rc3" [5], > > > * website pull request listing the new release and adding announcement > > blog > > > post [6]. > > > > > > The vote will be open for at least 72 hours. It is adopted by majority > > > approval, with at least 3 PMC affirmative votes. > > > > > > Thanks, > > > Your friendly release manager Arvid > > > > > > [1] > > > > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12349691 > > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.12.3-rc1/ > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS > > > [4] > > https://repository.apache.org/content/repositories/orgapacheflink-1419 > > > [5] https://github.com/apache/flink/releases/tag/release-1.12.3-rc1 > > > [6] https://github.com/apache/flink-web/pull/437 > > > > > > > >
Re: [VOTE] Release 1.13.0, release candidate #2
@Caizhi and @Becket - let me reach out to you to jointly debug this issue. I am wondering if there is some incorrect reporting of failed events?

On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng wrote:
> -1
>
> We're testing this version on batch jobs with large (600~1000) parallelisms
> and the following exception messages appear with high frequency:
>
> 2021-04-27 21:27:26
> org.apache.flink.util.FlinkException: An OperatorEvent from an
> OperatorCoordinator to a task was lost. Triggering task failover to ensure
> consistency. Event: '[NoMoreSplitEvent]', targetTask: - execution #0
>
> Becket Qin is investigating this issue.
Re: [VOTE] Release 1.13.0, release candidate #2
-1

We're testing this version on batch jobs with large (600~1000) parallelisms and the following exception messages appear with high frequency:

2021-04-27 21:27:26
org.apache.flink.util.FlinkException: An OperatorEvent from an OperatorCoordinator to a task was lost. Triggering task failover to ensure consistency. Event: '[NoMoreSplitEvent]', targetTask: - execution #0
    at org.apache.flink.runtime.operators.coordination.SubtaskGatewayImpl.lambda$sendEvent$0(SubtaskGatewayImpl.java:81)
    at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
    at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208)
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
    at akka.actor.ActorCell.invoke(ActorCell.scala:561)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
    at akka.dispatch.Mailbox.run(Mailbox.scala:225)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Becket Qin is investigating this issue.
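For readers following along: the top stack frame (SubtaskGatewayImpl.lambda$sendEvent$0) points at the completion handler attached to the RPC acknowledgement future for a coordinator-to-task event. The sketch below is a hypothetical, simplified illustration of that pattern, not Flink's actual API: an event is sent, and if the acknowledgement future completes exceptionally, the gateway cannot know whether the task received the event, so it triggers task failover to restore consistency. All names here (SubtaskGatewaySketch, sendEvent, triggerFailover) are illustrative.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the coordinator-to-task event delivery pattern
// behind the FlinkException above. Not Flink's real implementation.
public class SubtaskGatewaySketch {

    // Attach a handler to the RPC acknowledgement future; if delivery fails,
    // the event may or may not have reached the task, so the only safe
    // recovery is to fail the task over (restart it from the last checkpoint).
    static void sendEvent(CompletableFuture<Void> rpcAck, Runnable triggerFailover) {
        rpcAck.handle((ack, failure) -> {
            if (failure != null) {
                triggerFailover.run();
            }
            return null;
        });
    }

    public static void main(String[] args) {
        AtomicBoolean failoverTriggered = new AtomicBoolean(false);

        // Simulate a lost event: the acknowledgement future fails.
        CompletableFuture<Void> lostAck = new CompletableFuture<>();
        sendEvent(lostAck, () -> failoverTriggered.set(true));
        lostAck.completeExceptionally(new RuntimeException("event lost"));

        System.out.println("failover triggered: " + failoverTriggered.get());
    }
}
```

With many subtasks per coordinator (the 600~1000 parallelism mentioned above), a burst of lost acknowledgements would trigger this handler once per affected execution, which matches the "high frequency" of the reported exception.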