Re: Which strategy is used for broadcast variables?

2015-03-11 Thread Mosharaf Chowdhury
up to 100 machines, and it does follow O(log N) scaling. -- Mosharaf Chowdhury http://www.mosharaf.com/ On Wed, Mar 11, 2015 at 3:11 PM, Tom Hubregtsen wrote: > Thanks Mosharaf, for the quick response! Can you maybe give me some > pointers to an explanation of this strategy? Or elaborate a

RE: Which strategy is used for broadcast variables?

2015-03-11 Thread Mosharaf Chowdhury
Subject: Which strategy is used for broadcast variables? In "Performance and Scalability of Broadcast in Spark" by Mosharaf Chowdhury I read that Spark uses HDFS for its broadcast variables. This seems highly inefficient. In the same paper alternatives are proposed, among which "Bittore

Re: How Broadcast variable scale?.

2015-02-23 Thread Mosharaf Chowdhury
Hi Guillermo, The current broadcast algorithm in Spark approximates the one described in Section 5 of this paper. It is expected to scale sub-linearly, i.e., O(log N), where N is the number of machines in your cluster. We eva

Re: Running the BroadcastTest.scala with TorrentBroadcastFactory in a standalone cluster

2014-07-03 Thread Mosharaf Chowdhury
t (>=1GB) or too many nodes (many 10s or 100s). Hope it helps, Mosharaf -- Mosharaf Chowdhury http://www.mosharaf.com/ On Thu, Jul 3, 2014 at 7:48 AM, jackxucs wrote: > Hello, > > I am running the BroadcastTest example in a standalone cluster using > spark-submit. I have 8 hos

Re: sync master with slaves with bittorrent?

2014-05-19 Thread Mosharaf Chowdhury
Good catch. In that case, using BitTornado/murder would be better. -- Mosharaf Chowdhury http://www.mosharaf.com/ On Mon, May 19, 2014 at 11:17 AM, Aaron Davidson wrote: > On the ec2 machines, you can update the slaves from the master using > something like "~/spark-ec2/copy-

Re: sync master with slaves with bittorrent?

2014-05-18 Thread Mosharaf Chowdhury
from the master, one can simply rsync from the master first to one slave; then use the two sources (master and the first slave) to rsync to two more; then four and so on. Might be a simpler solution without many changes. -- Mosharaf Chowdhury http://www.mosharaf.com/ On Sun, May 18, 2014 at 11
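
The doubling rsync idea in the 2014-05-18 message (master to one slave, then two sources to two more, then four, and so on) can be sketched as a schedule of copy rounds. This is only an illustrative sketch, not code from the thread or from spark-ec2: the function name rsync_doubling_schedule and the host names are hypothetical, and it only plans (source, destination) pairs rather than invoking rsync. Under these assumptions, the number of rounds grows roughly as log2 of the number of hosts.

```python
def rsync_doubling_schedule(master, slaves):
    """Plan copy rounds in which every host that already has the files
    serves exactly one new host, so the set of sources doubles each round.

    Hypothetical helper for illustration only; it returns a list of rounds,
    each a list of (source, destination) pairs, and runs no commands.
    """
    have = [master]          # hosts that already hold the files
    pending = list(slaves)   # hosts still waiting for a copy
    rounds = []
    while pending:
        pairs = []
        for src in have:
            if not pending:
                break
            pairs.append((src, pending.pop(0)))
        # destinations of this round become sources in the next round
        have = have + [dst for _, dst in pairs]
        rounds.append(pairs)
    return rounds


if __name__ == "__main__":
    # Hypothetical host names; 7 slaves are covered in 3 rounds (1 + 2 + 4).
    plan = rsync_doubling_schedule("master", [f"slave{i}" for i in range(7)])
    for number, pairs in enumerate(plan, start=1):
        for src, dst in pairs:
            print(f"round {number}: {src} -> {dst}")
```

Each round could run its copies in parallel, since no host appears as a source more than once per round; a plain loop from the master alone would instead need one copy per slave.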