Is there any way to make this URL more dynamic, so that a case such as you described, where I would need to add a new node, wouldn't require recompilation? For example, by using a DNS record, HAProxy, or some other software?

On Feb 23, 2014 3:51 AM, "Aaron Davidson" <ilike...@gmail.com> wrote:
> The current way of solving this problem is to list all three masters as
> your master URL, e.g.:
> spark://host1:port1,host2:port2,host3:port3
>
> This will try all three in parallel and use whichever one is currently the
> master. This should work as long as you don't have to introduce a new node
> as a backup master (due to one of the others failing permanently) -- in
> that case, you'd have to update the master URL to include the new node, in
> case it is elected leader, for all *newly created* clients/workers. Old
> clients are indifferent to the comings and goings of masters, as any new
> master will reconnect to all old clients and workers.
>
>
> On Sat, Feb 22, 2014 at 4:12 PM, Matan Shukry <matanshu...@gmail.com> wrote:
>
>> Lately I started messing around with Hadoop and Spark.
>>
>> I noticed Spark can leverage ZooKeeper in order to create
>> multiple "secondary" masters.
>>
>> I was wondering, however, how one would implement the client
>> in such a situation?
>>
>> That is, what should the Spark master URL be for a Spark client
>> application?
>>
>> Let's say, for example, I have 10 nodes, and 3 of them (1/3/5) are
>> masters.
>> I don't want to put in just one of the masters' URLs, since it may be
>> brought down.
>>
>> So, which master URL do I use? Or rather, how do I use one URL
>> which will change when a new master is chosen?
>>
>> Note:
>> I know I can simply have a list of masters, use try/catch to see which
>> one fails, and try the other ones - I was hoping for something "better",
>> performance-wise, and more dynamic as well.
>>
>> Yours, Jones.
>>
>
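
For reference, here is a minimal sketch of what Aaron describes from the client side: a Scala client that lists all three masters in one URL and lets Spark connect to whichever one ZooKeeper has elected as leader. The hostnames and the default standalone port 7077 are placeholders, not taken from the thread; substitute your actual master nodes.

    import org.apache.spark.{SparkConf, SparkContext}

    object MultiMasterClient {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("multi-master-example")
          // Comma-separated list of all standby-capable masters.
          // Spark tries them and uses the current ZooKeeper-elected leader.
          .setMaster("spark://host1:7077,host2:7077,host3:7077")

        val sc = new SparkContext(conf)
        try {
          // Trivial job just to confirm the connection works.
          println(sc.parallelize(1 to 100).sum())
        } finally {
          sc.stop()
        }
      }
    }

As noted above, if a brand-new node is later promoted to master, this hardcoded URL would have to be updated for newly created clients and workers; already-running clients keep working because the new leader reconnects to them.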