Sorry to ping my own thread, but has anyone else encountered this?

-Joey
> On Jul 30, 2018, at 11:10 AM, Joey Echeverria <jechever...@splunk.com> wrote:
>
> I’m running Flink 1.5.0 in Kubernetes with HA enabled, but with only a single
> JobManager running. I’m using ZooKeeper to store the fencing/leader
> information and S3 to store the JobManager state. We’ve been running around
> 250 streaming jobs, and we’ve noticed that if the JobManager pod is deleted,
> it takes 20-45 minutes for the JobManager’s REST endpoints and web UI to
> become available. Until they become available, we get a 503 response from
> the HTTP server with the message "Could not retrieve the redirect address of
> the current leader. Please try to refresh.".
>
> Has anyone else run into this?
>
> Are there any configuration settings I should be looking at to speed up the
> availability of the HTTP endpoints?
>
> Thanks!
>
> -Joey