The problem was a misconfiguration of the initContainer that copies my artifacts from S3 to an ephemeral volume. Because of it, the TaskManager would start for a bit and then be shut down. It was hard to get logs about this, since the pods were gone before I could fetch them. I chalk all that up to me lacking a bit of experience with k8s.
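For readers reproducing this setup, here is a minimal sketch of the kind of manifest involved. It is built as a Python dict and printed as JSON, which `kubectl apply -f` accepts. It assumes the Flink Kubernetes Operator's `flink.apache.org/v1beta1` `FlinkDeployment` resource. The bucket, jar name, image tags, service account, resource sizes, init container name and the fully qualified `MainBeam` class name are all hypothetical placeholders, not the values from the documented Hop deployment. The part relevant to the problem above is that the initContainer and `flink-main-container` mount the same `emptyDir` volume at the same path, and that `jarURI` points into that path.

```python
import json

# Hypothetical placeholders for illustration only; substitute your own values.
S3_SOURCE = "s3://my-bucket/hop/"      # hypothetical bucket/prefix holding the fat jar etc.
SHARED_DIR = "/opt/hop-artifacts"      # path where BOTH containers mount the shared volume
VOLUME_NAME = "hop-artifacts"


def build_flink_deployment(name: str, task_slots: int = 4, parallelism: int = 1) -> dict:
    """Return a FlinkDeployment manifest whose pod template uses an initContainer
    to copy artifacts from S3 into an emptyDir volume shared with Flink."""
    shared_mount = {"name": VOLUME_NAME, "mountPath": SHARED_DIR}
    pod_template = {
        "apiVersion": "v1",
        "kind": "Pod",
        "spec": {
            # Ephemeral volume: lives as long as the pod, filled by the initContainer.
            "volumes": [{"name": VOLUME_NAME, "emptyDir": {}}],
            "initContainers": [
                {
                    "name": "fetch-artifacts",   # hypothetical name
                    "image": "amazon/aws-cli",   # assumed image; its entrypoint is `aws`
                    # Needs S3 read permissions (e.g. via the pod's service account).
                    "args": ["s3", "cp", S3_SOURCE, SHARED_DIR, "--recursive"],
                    "volumeMounts": [dict(shared_mount)],
                }
            ],
            # The operator merges this entry into its JobManager/TaskManager container.
            "containers": [
                {"name": "flink-main-container", "volumeMounts": [dict(shared_mount)]}
            ],
        },
    }
    return {
        "apiVersion": "flink.apache.org/v1beta1",
        "kind": "FlinkDeployment",
        "metadata": {"name": name},
        "spec": {
            "image": "flink:1.14",           # assumed Flink image/version
            "flinkVersion": "v1_14",
            "serviceAccount": "flink",       # assumed service account name
            "flinkConfiguration": {"taskmanager.numberOfTaskSlots": str(task_slots)},
            "jobManager": {"resource": {"memory": "2048m", "cpu": 1}},
            "taskManager": {"resource": {"memory": "2048m", "cpu": 1}},
            "podTemplate": pod_template,
            "job": {
                "jarURI": f"local://{SHARED_DIR}/hop-fat.jar",     # hypothetical jar name
                "entryClass": "org.apache.hop.beam.run.MainBeam",  # assumed package name
                "args": [],  # pipeline-specific arguments omitted
                "parallelism": parallelism,
                "upgradeMode": "stateless",
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_flink_deployment("hop-pipeline"), indent=2))
```

Usage: save the script as, say, `make_manifest.py` (hypothetical name) and run `python make_manifest.py > hop-flink.json && kubectl apply -f hop-flink.json`. When pods vanish too quickly to inspect, `kubectl logs <pod> -c fetch-artifacts` shows the init container's output and `kubectl describe pod <pod>` shows why it stopped, provided you catch the pod object while it still exists; `kubectl get events` keeps recent events around a bit longer.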
That being said... It's all working now, and I documented the deployment here:
https://hop.apache.org/manual/next/pipeline/beam/flink-k8s-operator-running-hop-pipeline.html

A big thank you to everyone who helped me out!

Cheers,
Matt

On Mon, Jun 27, 2022 at 4:59 AM Yang Wang <danrtsey...@gmail.com> wrote:

> Could you please share the JobManager logs of the failed deployment? It would
> also help a lot if you could show the pending pod status via "kubectl
> describe <pod_name>".
>
> Given that the current Flink Kubernetes Operator is built on top of the native
> K8s integration[1], the Flink ResourceManager should allocate enough
> TaskManager pods automatically.
> We need to find out what is wrong via the logs. Maybe it is the service
> account, a taint, or something else.
>
> [1]. https://flink.apache.org/2021/02/10/native-k8s-with-ha.html
>
> Best,
> Yang
>
> Matt Casters <matt.cast...@neotechnology.com> wrote on Fri, Jun 24, 2022 at 23:48:
>
>> Yes, of course. I already feel a bit less intelligent for having asked
>> the question ;-)
>>
>> The status now is that I managed to puzzle it all together.
>> Copying the files from S3 to an ephemeral volume takes all of 2 seconds, so
>> it's really not an issue. The cluster starts, and our fat jar and the Apache
>> Hop MainBeam class are found and started.
>>
>> The only thing that remains is figuring out how to configure the Flink
>> cluster itself. I have a couple of m5.large EC2 instances in a node group
>> on EKS, and I set taskmanager.numberOfTaskSlots to "4". However, the tasks
>> in the pipeline can't seem to find resources to start.
>>
>> Caused by:
>> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
>> Slot request bulk is not fulfillable! Could not allocate the required slot
>> within slot request timeout
>>
>> Parallelism was set to 1 for the runner, and there are only 2 tasks in my
>> first Beam pipeline, so it should be simple enough, but it just times out.
>>
>> Next step for me is to document the result, which will end up on
>> hop.apache.org. I'll probably also want to demo this in Austin at the
>> upcoming Beam Summit.
>>
>> Thanks a lot for your time and help so far!
>>
>> Cheers,
>> Matt
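To see why the slot timeout in the quoted message was surprising, here is a small sketch of the slot arithmetic under a simplified model of Flink's default slot sharing. With sharing, all tasks in one slot sharing group can run in the same slot. The group then needs as many slots as its highest task parallelism, and separate groups add up. The task names and group labels below are hypothetical, and the model ignores fine-grained resource management. It suggests that a 2-task pipeline at parallelism 1 needs a single slot, i.e. one TaskManager with `taskmanager.numberOfTaskSlots` set to 4 is more than enough. A `NoResourceAvailableException` in that situation therefore points to TaskManager pods never registering rather than to too few slots, which fits the initContainer problem described at the top of the thread.

```python
import math
from collections import defaultdict


def required_slots(tasks):
    """tasks: iterable of (name, parallelism, slot_sharing_group).

    Simplified model: tasks in the same slot sharing group share slots, so a
    group needs max(parallelism) slots; different groups cannot share."""
    per_group = defaultdict(int)
    for _name, parallelism, group in tasks:
        per_group[group] = max(per_group[group], parallelism)
    return sum(per_group.values())


def required_taskmanagers(tasks, slots_per_tm):
    """Number of TaskManagers needed if each offers `slots_per_tm` slots."""
    return math.ceil(required_slots(tasks) / slots_per_tm)


# Hypothetical 2-task Beam pipeline at parallelism 1, default sharing group.
pipeline = [("Read", 1, "default"), ("Write", 1, "default")]
print(required_slots(pipeline))                          # 1
print(required_taskmanagers(pipeline, slots_per_tm=4))   # 1
```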