Could you please share the JobManager logs of the failed deployment? It would also help a lot if you could show the status of the pending pods via "kubectl describe pod <pod_name>".
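As a rough sanity check on the slot demand itself, here is a small Python sketch. It assumes every operator stays in Flink's default slot-sharing group, in which case a job needs as many slots as its highest operator parallelism. "estimate_slot_demand" and "SlotDemand" are hypothetical helpers written only for this illustration; they are not part of any Flink or Beam API.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SlotDemand:
    required_slots: int
    required_taskmanagers: int


def estimate_slot_demand(
    operator_parallelisms: Sequence[int], slots_per_taskmanager: int
) -> SlotDemand:
    """Hypothetical helper: estimate slots and TaskManagers for one job.

    Assumes all operators share Flink's default slot-sharing group, so the
    job needs as many slots as its highest operator parallelism.
    """
    if slots_per_taskmanager < 1:
        raise ValueError("slots_per_taskmanager must be >= 1")
    if not operator_parallelisms:
        raise ValueError("need at least one operator parallelism")
    required_slots = max(operator_parallelisms)
    # Each TaskManager offers taskmanager.numberOfTaskSlots slots.
    required_tms = math.ceil(required_slots / slots_per_taskmanager)
    return SlotDemand(required_slots, required_tms)


# Setup from the quoted message: two tasks at parallelism 1,
# taskmanager.numberOfTaskSlots set to 4.
demand = estimate_slot_demand([1, 1], slots_per_taskmanager=4)
print(demand)  # SlotDemand(required_slots=1, required_taskmanagers=1)
```

Under that assumption a single TaskManager pod should be enough, so a slot request timeout more likely means that pod never started or never registered with the JobManager than that the cluster has too few slots.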
Given that the current Flink Kubernetes Operator is built on top of the native K8s integration [1], the Flink ResourceManager should allocate enough TaskManager pods automatically. We need to find out what is wrong from the logs. It may be the service account, a node taint, or something else.

[1] https://flink.apache.org/2021/02/10/native-k8s-with-ha.html

Best,
Yang

Matt Casters <matt.cast...@neotechnology.com> wrote on Fri, Jun 24, 2022 at 23:48:

> Yes of course. I already feel a bit less intelligent for having asked the
> question ;-)
>
> The status now is that I managed to have it all puzzled together. Copying
> the files from s3 to an ephemeral volume takes all of 2 seconds so it's
> really not an issue. The cluster starts and our fat jar and Apache Hop
> MainBeam class is found and started.
>
> The only thing that remains is figuring out how to configure the Flink
> cluster itself. I have a couple of m5.large ec2 instances in a node group
> on EKS and I set taskmanager.numberOfTaskSlots to "4". However, the tasks
> in the pipeline can't seem to find resources to start.
>
> Caused by:
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
> Slot request bulk is not fulfillable! Could not allocate the required slot
> within slot request timeout
>
> Parallelism was set to 1 for the runner and there are only 2 tasks in my
> first Beam pipeline so it should be simple enough but it just times out.
>
> Next step for me is to document the result which will end up on
> hop.apache.org. I'll probably also want to demo this in Austin at the
> upcoming Beam summit.
>
> Thanks a lot for your time and help so far!
>
> Cheers,
> Matt