Is there any way to make this URL more dynamic, so that a case such as you described, where I would need to add a new node, wouldn't require recompilation? For example, by using a DNS record, HAProxy, or some other software?

On Feb 23, 2014 3:51 AM, "Aaron Davidson" <ilike...@gmail.com> wrote:
> The current way of solving this problem is to list all three masters as
> your master URL, e.g.:
> spark://host1:port1,host2:port2,host3:port3
>
> This will try all three in parallel and use whichever one is currently the
> master. This should work as long as you don't have to introduce a new node
> as a backup master (due to one of the others failing permanently) -- in
> that case, you'd have to update the master URL to include the new node, in
> case it is elected leader, for all *newly created* clients/workers. Old
> clients are indifferent to the comings and goings of masters, as any new
> master will reconnect to all old clients and workers.
>
>
> On Sat, Feb 22, 2014 at 4:12 PM, Matan Shukry <matanshu...@gmail.com> wrote:
>
>> Lately I started messing around with Hadoop and Spark.
>>
>> I noticed Spark can leverage ZooKeeper in order to create
>> multiple "secondary" masters.
>>
>> I was wondering, however, how one would implement the client
>> in such a situation?
>>
>> That is, what should the Spark master URL be for a Spark client
>> application?
>>
>> Let's say, for example, I have 10 nodes, and 3 of them (1/3/5) are
>> masters.
>> I don't want to put in just one of the masters' URLs, since it may be
>> brought down.
>>
>> So, which master URL do I use? Or rather, how do I use one URL
>> which will change when a new master is chosen?
>>
>> Note:
>> I know I can simply have a list of masters, use try/catch to see which
>> one fails, and try the other ones - I was hoping for something "better",
>> performance-wise, and more dynamic as well.
>>
>> Yours, Jones.
>>
>
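
For reference, here is a minimal sketch of what Aaron describes from the client side: a Scala client that lists all three masters in one URL and lets Spark connect to whichever one ZooKeeper has elected as leader. The hostnames and the default standalone port 7077 are placeholders, not taken from the thread; substitute your actual master nodes.

    import org.apache.spark.{SparkConf, SparkContext}

    object MultiMasterClient {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("multi-master-example")
          // Comma-separated list of all standby-capable masters.
          // Spark tries them and uses the current ZooKeeper-elected leader.
          .setMaster("spark://host1:7077,host2:7077,host3:7077")

        val sc = new SparkContext(conf)
        try {
          // Trivial job just to confirm the connection works.
          println(sc.parallelize(1 to 100).sum())
        } finally {
          sc.stop()
        }
      }
    }

As noted above, if a brand-new node is later promoted to master, this hardcoded URL would have to be updated for newly created clients and workers; already-running clients keep working because the new leader reconnects to them.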