Re: Delay in REST/UI readiness during JM recovery

Joey Echeverria Wed, 01 Aug 2018 12:40:13 -0700

Sorry to ping my own thread, but has anyone else encountered this?

-Joey


> On Jul 30, 2018, at 11:10 AM, Joey Echeverria <jechever...@splunk.com> wrote:
> 
> I’m running Flink 1.5.0 in Kubernetes with HA enabled, but with only a single 
> JobManager running. I’m using ZooKeeper to store the fencing/leader 
> information and S3 to store the JobManager state. We’ve been running around 
> 250 streaming jobs, and we’ve noticed that if the JobManager pod is deleted, 
> it takes roughly 20 to 45 minutes for the JobManager’s REST endpoints and web 
> UI to become available again. Until then, we get a 503 response from the HTTP 
> server with the message "Could not retrieve the redirect address of the 
> current leader. Please try to refresh."
> 
> Has anyone else run into this?
> 
> Are there any configuration settings I should be looking at to speed up the 
> availability of the HTTP endpoints?
> 
> Thanks!
> 
> -Joey
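
For anyone trying to reproduce or quantify the delay described above, a minimal sketch of a readiness probe follows. It polls a REST URL until it answers 200 and reports how long that took, treating 503 and connection failures as "not ready yet". The function names, the polling interval, and the example host are assumptions made for illustration; none of them come from the original report.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for a GET on url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers (such as the 503 seen during recovery) land here.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: the pod may still be starting.
        return None


def wait_until_ready(url, fetch=http_status, interval=10.0, max_wait=3600.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll url until it returns 200.

    Returns the elapsed seconds at the first 200 response, or None if
    max_wait seconds pass without one. fetch, clock, and sleep are
    injectable so the loop can be exercised without a real cluster.
    """
    start = clock()
    while True:
        status = fetch(url)
        elapsed = clock() - start
        if status == 200:
            return elapsed
        if elapsed >= max_wait:
            return None
        sleep(interval)
```

Assuming the REST endpoint is reachable on Flink's default port 8081 (the host name here is hypothetical), calling wait_until_ready("http://flink-jobmanager:8081/overview") right after deleting the pod would return the number of seconds until the endpoint responded normally, which gives a concrete figure to compare across configuration changes.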
