And both worked. I should have said: modify the docker entrypoint script to first configure *taskmanager.host* from status.podIP, either as an override or in flink-conf.yaml, before the process is launched through the entry script.
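A minimal sketch of that entrypoint step, written in Python for illustration only, since the real entrypoint is not shown in the thread. It assumes the pod spec maps status.podIP into an environment variable through the Kubernetes downward API. The variable name POD_IP and the helper names `set_flink_option` / `patch_taskmanager_host` are hypothetical. It takes the flink-conf.yaml route rather than the override route.

```python
import os
import re
from pathlib import Path
from typing import Mapping


def set_flink_option(conf_text: str, key: str, value: str) -> str:
    """Return conf_text with exactly one `key: value` line for `key`.

    Existing uncommented entries for the key are dropped and the new one is
    appended, so the file ends up with a single, unambiguous value.
    """
    pattern = re.compile(rf"^\s*{re.escape(key)}\s*:")
    kept = [line for line in conf_text.splitlines() if not pattern.match(line)]
    kept.append(f"{key}: {value}")
    return "\n".join(kept) + "\n"


def patch_taskmanager_host(conf_path: Path, environ: Mapping[str, str] = os.environ) -> str:
    """Pin taskmanager.host to the pod IP before the TaskManager starts.

    Assumes the pod spec exposes status.podIP as the env var POD_IP
    (hypothetical name) via the Kubernetes downward API.
    """
    pod_ip = environ.get("POD_IP")
    if not pod_ip:
        raise RuntimeError("POD_IP is not set; map status.podIP into it in the pod spec")
    conf_path.write_text(set_flink_option(conf_path.read_text(), "taskmanager.host", pod_ip))
    return pod_ip
```

The entrypoint would call something like `patch_taskmanager_host(Path(os.environ.get("FLINK_HOME", "/opt/flink")) / "conf" / "flink-conf.yaml")` just before launching the TaskManager. The /opt/flink default is an assumption about the image layout.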
Thank you all.

On Sat, Jan 26, 2019 at 4:11 PM Vishal Santoshi <vishal.santo...@gmail.com> wrote:

> For 1. Thank you for pointing out that property. I surely overlooked it.
> For 2. I will try out the other options. The suggestion that seems to best
> suit us (we do not want to over-engineer on the init container side) is:
>
> - configure the *metrics.internal.query-service.port* property to some
>   fixed port (e.g. *6666*)
> - modify the docker entrypoint script to first configure
>   *taskmanager.host*
>
> I think this is what you were referring to as a possible solution?
>
> The headless service would generally imply a single service for each TM,
> and that is not sustainable.
>
> On Sat, Jan 26, 2019 at 1:37 PM Nagarjun Guraja <nagar...@gmail.com> wrote:
>
>> For 1. you need to set up high-availability.jobmanager.port as a
>> predefined port in your flink-conf.yaml and expose the port via the
>> job-manager-deployment and job-manager-service resources as well. That
>> should do the trick.
>>
>> For 2. I am not sure of the timelines, but there are a few decent, not
>> hacky workarounds for the problem mentioned in the issue's comments.
>> Feel free to pick one to unblock yourselves.
>>
>> Regards,
>> Nagarjun
>>
>> *Success is not final, failure is not fatal: it is the courage to
>> continue that counts.*
>> *- Winston Churchill -*
>>
>> On Sat, Jan 26, 2019 at 5:39 AM Vishal Santoshi <vishal.santo...@gmail.com> wrote:
>>
>>> There are two issues with the 1.7.1 "job as a cluster" setup that I
>>> need guidance on.
>>>
>>> 1. In the HA setup, the TMs are not able to resolve the job manager's
>>> random port through the jobmanager.rpc.port
>>> <https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#jobmanager-rpc-port>
>>> setting. The setting does work in non-HA mode (a containerPort/TCP
>>> with the same port facilitates that), but then we lose the job if the
>>> JM reboots. This is a high priority for us, and I am sure there is a
>>> workaround, but I would rather ask the experts.
>>>
>>> 2. The metrics on the JM are not visible, possibly due to
>>> https://issues.apache.org/jira/browse/FLINK-11127 . It is an open
>>> issue, and both the service-per-TM and the stateful set approaches
>>> appear not to be production ready (not scalable and kludgey). Do you
>>> have a timeline for when these will be resolved?
>>>
>>> Thanks.
>>>
>>> Vishal
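Nagarjun's fix for issue 1 and the query-service option for issue 2 both come down to pinning a port in flink-conf.yaml and exposing the same number in the Kubernetes resources. Below is a minimal Python sketch that keeps each number in one place so the config and the manifests cannot drift apart. 6666 comes from the thread. The value 50010 for high-availability.jobmanager.port, the port names, and the `PinnedPort` / `flink_conf_lines` / `k8s_ports` helpers are hypothetical. Generating the manifest entries from Python is also an assumption about the deployment tooling.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PinnedPort:
    flink_key: str  # Flink configuration key that gets a fixed port
    port_name: str  # Kubernetes port name (<= 15 chars, lowercase, digits, '-')
    port: int


# 6666 is the example from the thread; 50010 is a hypothetical choice.
PINNED = [
    PinnedPort("high-availability.jobmanager.port", "ha-rpc", 50010),
    PinnedPort("metrics.internal.query-service.port", "query-service", 6666),
]


def flink_conf_lines(pinned: List[PinnedPort]) -> List[str]:
    """`key: value` lines to add to flink-conf.yaml."""
    return [f"{p.flink_key}: {p.port}" for p in pinned]


def k8s_ports(pinned: List[PinnedPort], service: bool = False) -> List[Dict[str, object]]:
    """Port entries for a container spec (containerPort) or a Service spec (port/targetPort)."""
    if service:
        return [
            {"name": p.port_name, "port": p.port, "targetPort": p.port, "protocol": "TCP"}
            for p in pinned
        ]
    return [{"name": p.port_name, "containerPort": p.port, "protocol": "TCP"} for p in pinned]
```

Following Nagarjun's advice, the HA port would go into both the job-manager deployment (`k8s_ports(PINNED)`) and the job-manager service (`k8s_ports(PINNED, service=True)`). Any fixed port works as long as it is free inside the pod.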