> Failed to connect to master XXX:7077

Is the 'XXX' above the hostname for the new master?
Thanks

On Tue, Feb 2, 2016 at 1:48 AM, Anthony Tang <[email protected]> wrote:

> Hi -
>
> I'm running Spark 1.5.2 in standalone mode with multiple masters using
> zookeeper for failover. The master fails over correctly to the standby
> when it goes down, and running applications seem to continue to run, but in
> the new active master web UI, they are marked as "WAITING", and the workers
> have these entries in their logs:
>
> 16/01/30 00:51:13 ERROR Worker: Connection to master failed! Waiting for
> master to reconnect...
> 16/01/30 00:51:13 WARN Worker: Failed to connect to master XXX:7077
> akka.actor.ActorNotFound: Actor not found for:
> ActorSelection[Anchor(akka.tcp://sparkMaster@XXX:7077/),
> Path(/user/Master)]
>
> Should they be "RUNNING" still? One time, it looked like the job stopped
> functioning (This is a continuously running streaming job), but I haven't
> been able to reproduce it. FWIW, the driver that started it is still
> marked as "RUNNING".
>
> Thanks.
> - Anthony
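
If it helps to answer the hostname question across several workers, here is a minimal Python sketch that pulls the master host out of the "Failed to connect to master" warning quoted above and compares it with the host you believe is now active. The function names (`failed_masters`, `retrying_old_master`) and the example host `YYY` are hypothetical, and the pattern assumes the log line looks exactly like the one in the quoted message; this is not part of Spark.

```python
import re

# Assumed log shape, taken from the quoted worker log:
# "16/01/30 00:51:13 WARN Worker: Failed to connect to master XXX:7077"
FAILED_MASTER = re.compile(r"Failed to connect to master (\S+):(\d+)")


def failed_masters(log_lines):
    """Return the set of (host, port) pairs the worker reported it could not reach."""
    found = set()
    for line in log_lines:
        match = FAILED_MASTER.search(line)
        if match:
            found.add((match.group(1), int(match.group(2))))
    return found


def retrying_old_master(log_lines, active_master_host):
    """True if failures were logged and none of them target the active master host."""
    hosts = {host for host, _ in failed_masters(log_lines)}
    return bool(hosts) and active_master_host not in hosts


if __name__ == "__main__":
    sample = [
        "16/01/30 00:51:13 WARN Worker: Failed to connect to master XXX:7077",
    ]
    # "YYY" stands in for the hostname of the newly active master (placeholder).
    print(retrying_old_master(sample, "YYY"))
```

If this reports that the failures only name the old master, the warnings may simply be the worker retrying the master that went down; if they name the new active master, that would be worth a closer look.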
