Hi, Marco, The root cause is NoResourceAvailableException. Could you provide the following information? - How many slots each TM has? - Your job's topology, it would also be good to share the job manager log.
Best, Yangze Guo On Tue, May 25, 2021 at 12:10 PM Marco Villalobos <mvillalo...@kineteque.com> wrote: > > I am running with one job manager and three task managers. > > Each task manager is receiving at most 8 gb of data, but the job is timing > out. > > What parameters must I adjust? > > Sink: back fill db sink) (15/32) (50626268d1f0d4c0833c5fa548863abd) switched > from SCHEDULED to FAILED on [unassigned resource]. > java.util.concurrent.CompletionException: > org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: > Slot request bulk is not fulfillable! Could not allocate the required slot > within slot request timeout > at > java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) > ~[?:1.8.0_282] > at > java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) > ~[?:1.8.0_282] > at > java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) > ~[?:1.8.0_282] > at > java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) > ~[?:1.8.0_282] > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) > ~[?:1.8.0_282] > at > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) > ~[?:1.8.0_282] > at > org.apache.flink.runtime.scheduler.SharedSlot.cancelLogicalSlotRequest(SharedSlot.java:223) > ~[feature-LUM-3882-toledo--850a6747.jar:?] > at > org.apache.flink.runtime.scheduler.SlotSharingExecutionSlotAllocator.cancelLogicalSlotRequest(SlotSharingExecutionSlotAllocator.java:168) > ~[feature-LUM-3882-toledo--850a6747.jar:?] > at > org.apache.flink.runtime.scheduler.SharingPhysicalSlotRequestBulk.cancel(SharingPhysicalSlotRequestBulk.java:86) > ~[feature-LUM-3882-toledo--850a6747.jar:?] > at > org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkWithTimestamp.cancel(PhysicalSlotRequestBulkWithTimestamp.java:66) > ~[feature-LUM-3882-toledo--850a6747.jar:?] > at > org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:91) > ~[feature-LUM-3882-toledo--850a6747.jar:?] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[?:1.8.0_282] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_282] > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:442) > ~[feature-LUM-3882-toledo--850a6747.jar:?] > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:209) > ~[feature-LUM-3882-toledo--850a6747.jar:?] > at > org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) > ~[feature-LUM-3882-toledo--850a6747.jar:?] > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:159) > ~[feature-LUM-3882-toledo--850a6747.jar:?] > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) > [feature-LUM-3882-toledo--850a6747.jar:?] > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) > [feature-LUM-3882-toledo--850a6747.jar:?] > at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) > [feature-LUM-3882-toledo--850a6747.jar:?] > at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) > [feature-LUM-3882-toledo--850a6747.jar:?] > at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) > [feature-LUM-3882-toledo--850a6747.jar:?] > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > [feature-LUM-3882-toledo--850a6747.jar:?] > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) > [feature-LUM-3882-toledo--850a6747.jar:?] > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) > [feature-LUM-3882-toledo--850a6747.jar:?] > at akka.actor.Actor.aroundReceive(Actor.scala:517) > [feature-LUM-3882-toledo--850a6747.jar:?] > at akka.actor.Actor.aroundReceive$(Actor.scala:515) > [feature-LUM-3882-toledo--850a6747.jar:?] > at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) > [feature-LUM-3882-toledo--850a6747.jar:?] > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) > [feature-LUM-3882-toledo--850a6747.jar:?] > at akka.actor.ActorCell.invoke(ActorCell.scala:561) > [feature-LUM-3882-toledo--850a6747.jar:?] > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) > [feature-LUM-3882-toledo--850a6747.jar:?] > at akka.dispatch.Mailbox.run(Mailbox.scala:225) > [feature-LUM-3882-toledo--850a6747.jar:?] > at akka.dispatch.Mailbox.exec(Mailbox.scala:235) > [feature-LUM-3882-toledo--850a6747.jar:?] > at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > [feature-LUM-3882-toledo--850a6747.jar:?] > at > akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > [feature-LUM-3882-toledo--850a6747.jar:?] > at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > [feature-LUM-3882-toledo--850a6747.jar:?] > at > akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > [feature-LUM-3882-toledo--850a6747.jar:?] > Caused by: > org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: > Slot request bulk is not fulfillable! Could not allocate the required slot > within slot request timeout > at > org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:86) > ~[feature-LUM-3882-toledo--850a6747.jar:?] > ... 26 more > Caused by: java.util.concurrent.TimeoutException: Timeout has occurred: > 300000 ms > at > org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0