Hi Romi,

I've observed this many times as well. So much so that on some clusters I restart the workers every night in order to maintain these worker -> master connections.
I couldn't find an open SPARK ticket on it, so I filed https://issues.apache.org/jira/browse/SPARK-3736 with you and Piotr mentioned. Please discuss on that ticket what you think the proper fix should be!

Cheers,
Andrew

On Mon, Sep 29, 2014 at 4:36 AM, Romi Kuntsman <r...@totango.com> wrote:

> Hi all,
>
> Regarding a post here a few months ago:
> http://apache-spark-user-list.1001560.n3.nabble.com/Workers-disconnected-from-master-sometimes-and-never-reconnect-back-tp6240.html
>
> Is there an answer to this?
> I saw workers that were still active but did not reconnect after they lost
> connection to the master. Using Spark 1.1.0.
>
> What if a master server is restarted, should the worker retry registering
> with it?
>
> Greetings,
>
> --
> *Romi Kuntsman*, *Big Data Engineer*
> http://www.totango.com
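
To make the question above concrete (should a worker retry registering after it loses the master?), here is a minimal sketch of one possible approach: retry with capped exponential backoff and jitter. It is written in Python purely for illustration and is not Spark's actual Worker code. The names `register_with_retry` and `try_register`, as well as the attempt count and delay values, are hypothetical assumptions.

```python
import random
import time
from typing import Callable


def register_with_retry(
    try_register: Callable[[], bool],
    max_attempts: int = 6,
    base_delay_s: float = 1.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call try_register until it returns True or attempts run out.

    try_register is a hypothetical callback standing in for "send a
    registration request to the master and report whether it was accepted".
    """
    for attempt in range(max_attempts):
        if try_register():
            return True
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final failed attempt
        # Exponential backoff (1s, 2s, 4s, ...) capped at max_delay_s,
        # scaled by random jitter so workers do not all retry at once.
        delay = min(max_delay_s, base_delay_s * (2 ** attempt))
        sleep(delay * random.uniform(0.5, 1.0))
    return False
```

A worker that notices its master connection has dropped could call something like this with its own registration function. Whether Spark should behave this way, and with what limits, is the kind of thing to settle on SPARK-3736.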