For 1. Thank you for pointing out that property. I had overlooked it.

For 2. We will try out the other options. The suggestion that seems to suit us best (we do not want to over-engineer on the init container side) is:

- configure the *metrics.internal.query-service.port* property to some fixed port (e.g. *6666*)
- modify the docker entrypoint script to first configure *taskmanager.host*

I think this is what you were referring to as a possible solution? The headless service would generally imply a single service for each TM, and that is not sustainable.

On Sat, Jan 26, 2019 at 1:37 PM Nagarjun Guraja <nagar...@gmail.com> wrote:

> For 1. you need to set up high-availability.jobmanager.port as a predefined
> port in your flink-conf.yaml and expose the port via job-manager-deployment
> and job-manager-service resources as well. That should do the trick.
>
> For 2. I am not sure of the timelines, but there are a few decent/not
> hacky workarounds to get around the problem, mentioned in the comments.
> Feel free to pick one to unblock yourselves.
>
> Regards,
> Nagarjun
>
> *Success is not final, failure is not fatal: it is the courage to continue
> that counts. *
> *- Winston Churchill - *
>
> On Sat, Jan 26, 2019 at 5:39 AM Vishal Santoshi <vishal.santo...@gmail.com>
> wrote:
>
>> There are two issues with the 1.7.1 "job as a cluster" setup that I need
>> guidance on.
>>
>> 1. In the HA setup, the TMs are not able to resolve the job manager's random
>> port through the jobmanager.rpc.port
>> <https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#jobmanager-rpc-port>
>> setting. The setting does work in non-HA mode (the containerPort/TCP
>> with the same port facilitates that), but then we lose the job if
>> the JM were to reboot. This is a high priority for us, and I am sure there is
>> a workaround, but I would rather ask the experts.
>>
>> 2. The metrics on the JM are not visible, possibly due to
>> https://issues.apache.org/jira/browse/FLINK-11127 . It is an open issue,
>> and both the service-per-TM and the stateful set approaches appear
>> non-production-ready (not scalable and kludgey). Do you have a timeline for
>> when these will be resolved?
>>
>> Thanks.
>>
>> Vishal