Could you please share the JobManager logs of the failed deployment? It would
also help a lot if you could show the status of the pending pod via "kubectl
describe pod <pod_name>".

Given that the current Flink Kubernetes Operator is built on top of the native
K8s integration[1], the Flink ResourceManager should allocate enough
TaskManager pods automatically.
We need to find out from the logs what is going wrong. It may be the service
account, a taint, or something else.
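
For example, the information above could be collected roughly as follows. This
is only a sketch: the function names are made up for illustration, the
deployment, pod and namespace names are placeholders you pass in, and it
assumes the JobManager runs as a Kubernetes Deployment.

```shell
#!/bin/sh
# Sketch only: the function names and their arguments are placeholders
# introduced for illustration, not values taken from this thread.

# Print the JobManager logs. Assumes the JobManager runs as a Deployment.
# $1 = deployment name, $2 = namespace
jobmanager_logs() {
    kubectl logs "deployment/$1" --namespace "$2"
}

# List the pods that are stuck in the Pending phase.
# $1 = namespace
pending_pods() {
    kubectl get pods --namespace "$1" --field-selector=status.phase=Pending
}

# Show the status and recent events of a single pod.
# $1 = pod name, $2 = namespace
describe_pod() {
    kubectl describe pod "$1" --namespace "$2"
}
```

After sourcing the file, "pending_pods <namespace>" lists the candidates and
"describe_pod <pod_name> <namespace>" prints the details. The Events section at
the end of the describe output usually states why a pod could not be scheduled.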


[1]. https://flink.apache.org/2021/02/10/native-k8s-with-ha.html


Best,
Yang

On Fri, Jun 24, 2022 at 23:48, Matt Casters <matt.cast...@neotechnology.com> wrote:

> Yes of course.  I already feel a bit less intelligent for having asked the
> question ;-)
>
> The status now is that I managed to have it all puzzled together.  Copying
> the files from s3 to an ephemeral volume takes all of 2 seconds so it's
> really not an issue.  The cluster starts and our fat jar and Apache Hop
> MainBeam class is found and started.
>
> The only thing that remains is figuring out how to configure the Flink
> cluster itself.  I have a couple of m5.large ec2 instances in a node group
> on EKS and I set taskmanager.numberOfTaskSlots to "4".  However, the tasks
> in the pipeline can't seem to find resources to start.
>
> Caused by:
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
> Slot request bulk is not fulfillable! Could not allocate the required slot
> within slot request timeout
>
> Parallelism was set to 1 for the runner and there are only 2 tasks in my
> first Beam pipeline so it should be simple enough but it just times out.
>
> Next step for me is to document the result which will end up on
> hop.apache.org.   I'll probably also want to demo this in Austin at the
> upcoming Beam summit.
>
> Thanks a lot for your time and help so far!
>
> Cheers,
> Matt
>
>
