Hi, Deniz. The exception implies that there are not enough slots in your standalone cluster. You need to increase either `taskmanager.numberOfTaskSlots` or `numberOfTaskManagers`. You can search the JobManager log for the message "Received resource requirements from job", which indicates how many slots your job needs.
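
As a rough sketch of the slot arithmetic, using the values from the configuration quoted below (4 slots per TaskManager, 2 TaskManagers, parallelism 8): this assumes the default slot sharing, under which a job needs as many slots as its highest operator parallelism. The function names are hypothetical helpers for illustration, not part of Flink, and the result is only an estimate; the "Received resource requirements from job" log line remains the authoritative number.

```python
def total_slots(slots_per_taskmanager: int, taskmanagers: int) -> int:
    """Slots offered by a standalone cluster: slots per TaskManager times TaskManagers."""
    return slots_per_taskmanager * taskmanagers


def slot_shortfall(required: int, slots_per_taskmanager: int, taskmanagers: int) -> int:
    """Number of slots missing for the job (0 if the cluster has enough)."""
    return max(0, required - total_slots(slots_per_taskmanager, taskmanagers))


# Values taken from the quoted configuration.
slots_per_tm = 4   # taskmanager.numberOfTaskSlots
num_tms = 2        # numberOfTaskManagers
parallelism = 8    # assumed equal to the required slot count (default slot sharing)

available = total_slots(slots_per_tm, num_tms)
shortfall_all_up = slot_shortfall(parallelism, slots_per_tm, num_tms)
# Hypothetical scenario: one TaskManager pod is unavailable.
shortfall_one_down = slot_shortfall(parallelism, slots_per_tm, num_tms - 1)

print(f"available slots: {available}, required: {parallelism}")
print(f"shortfall with all TaskManagers up: {shortfall_all_up}")
print(f"shortfall with one TaskManager down: {shortfall_one_down}")
```

Under these assumptions the cluster has exactly as many slots as the job needs and no spare capacity, so the job could not acquire its minimum resources whenever a TaskManager is missing. Whether that is what happens in your deployment is something the JobManager and TaskManager logs would have to confirm.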
Best,
Yangze Guo

On Thu, Nov 4, 2021 at 5:58 PM Deniz Koçak <lend...@gmail.com> wrote:
>
> Hi,
>
> We have been running our job on flink image
> 1.13.2-stream1-scala_2.12-java11. It's a standalone deployment on a
> Kubernetes cluster (EKS on AWS, which uses EC2 nodes as hosts and also
> depends on an auto-scaler to adjust the resources cluster-wide). After
> a few minutes (5-20) we see the exception below on the taskmanager(s).
> The job is quite busy, so we see backpressure on some tasks, though we
> weren't expecting such a problem under heavy load (we are OK with slow
> processing and backlog). Neither restarting the task nor increasing the
> resources solved the issue. We always get the exception below after a
> period of time, which makes the job unstable.
>
> ---------------------------------------------------------------------------------------------------
> java.util.concurrent.CompletionException: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not acquire the minimum required resources.
>     at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source)
>     at java.base/java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source)
>     at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown Source)
>     at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
>     at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
>     at org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolBridge$PendingRequest.failRequest(DeclarativeSlotPoolBridge.java:535)
>     ....
> Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not acquire the minimum required resources.
> ---------------------------------------------------------------------------------------------------
>
> We tried different configurations of the CPU/memory allocated to the
> task managers in the Flink configuration. We tried more CPU and memory
> after noticing the problem, though none of the increases actually
> solved it. Part of the config we have is below.
>
> taskmanager.numberOfTaskSlots: '4'
> kubernetes:
>   pods:
>     affinity: null
>     annotations:
>       prometheus.io.port: '9249'
>       prometheus.io.scrape: 'true'
>     labels: {}
>     nodeSelector: {}
>     securityContext: null
> logging:
>   log4jLoggers:
>     '': INFO
>   loggingProfile: default
> numberOfTaskManagers: 2
> parallelism: 8
> resources:
>   jobmanager:
>     cpu: 2
>     memory: 2G
>   taskmanager:
>     cpu: 2
>     memory: 8G
>
> Please find attached the configuration file we use at the moment.
>
> Thanks,