Re: NoResourceAvailableException on taskmanager(s)

Yangze Guo Thu, 04 Nov 2021 03:22:24 -0700

Hi, Deniz.

The exception implies that there are not enough slots in your
standalone cluster. You need to increase either
`taskmanager.numberOfTaskSlots` or `numberOfTaskManagers`.
To see how many slots your job needs, search the JobManager log for
"Received resource requirements from job".
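
As a rough sanity check, the slot arithmetic can be sketched in Python.
This is only an illustration, assuming default slot sharing (so the job
needs as many slots as its parallelism) and using the values from the
quoted config. The function names are hypothetical helpers, not part of
any Flink API, and the log filter matches only the phrase quoted above
because the rest of the log line format is not shown in this thread.

```python
# Phrase to look for in the JobManager log, as quoted in this thread.
PHRASE = "Received resource requirements from job"


def total_slots(num_task_managers, slots_per_task_manager):
    """Slots offered by the cluster when every TaskManager is registered."""
    return num_task_managers * slots_per_task_manager


def missing_slots(parallelism, num_task_managers, slots_per_task_manager):
    """Slots the job still lacks (0 if the cluster is large enough).

    Assumes default slot sharing: required slots == parallelism.
    """
    available = total_slots(num_task_managers, slots_per_task_manager)
    return max(0, parallelism - available)


def find_requirement_lines(lines):
    """Return the log lines that mention the resource requirements."""
    return [line for line in lines if PHRASE in line]


# Values from the quoted config: 2 TaskManagers, 4 slots each, parallelism 8.
print(total_slots(2, 4))       # 8
print(missing_slots(8, 2, 4))  # 0: fits only while both TaskManagers are up
print(missing_slots(8, 1, 4))  # 4: short by 4 slots with one TaskManager
```

To use the filter, pass it an open handle to the JobManager log, e.g.
`find_requirement_lines(open(path))`, where `path` depends on your
deployment.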


Best,
Yangze Guo

On Thu, Nov 4, 2021 at 5:58 PM Deniz Koçak <lend...@gmail.com> wrote:
>
> Hi,
>
> We have been running our job on the Flink image
> 1.13.2-stream1-scala_2.12-java11. It's a standalone deployment on a
> Kubernetes cluster (EKS on AWS, which uses EC2 nodes as hosts and also
> depends on an auto-scaler to adjust resources cluster-wide). After
> a few minutes (5-20) we see the exception below on the taskmanager(s). The
> job is quite busy, so we see backpressure on some tasks, though we weren't
> expecting such a problem under heavy load (we are OK with slow
> processing and a backlog). Neither restarting the task nor increasing the
> resources solved the issue. We always get the exception below
> after a period of time, which makes the job unstable.
>
> ---------------------------------------------------------------------------------------------------
> java.util.concurrent.CompletionException: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not acquire the minimum required resources.
> at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source)
> at java.base/java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source)
> at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown Source)
> at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
> at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
> at org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolBridge$PendingRequest.failRequest(DeclarativeSlotPoolBridge.java:535)
> ....
> Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not acquire the minimum required resources.
> ---------------------------------------------------------------------------------------------------
>
> We tried different configurations of the CPU and memory allocated to the
> task managers in the Flink configuration. After noticing the problem we
> tried more CPU and memory, but none of the increases actually solved it.
> Part of the config we have is below.
>
> taskmanager.numberOfTaskSlots: '4'
>       kubernetes:
>         pods:
>           affinity: null
>           annotations:
>             prometheus.io.port: '9249'
>             prometheus.io.scrape: 'true'
>           labels: {}
>           nodeSelector: {}
>           securityContext: null
>       logging:
>         log4jLoggers:
>           '': INFO
>         loggingProfile: default
>       numberOfTaskManagers: 2
>       parallelism: 8
>       resources:
>         jobmanager:
>           cpu: 2
>           memory: 2G
>         taskmanager:
>           cpu: 2
>           memory: 8G
>
>
> Please find attached the configuration file we use at the moment.
>
> Thanks,
