Thanks, I'll read that paper. We haven't tried with a cluster that big, but I suppose we will in the future, and I was worried about it. I'll report back if we finally do, but it's not going to be tomorrow :)
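For a rough sense of what the O(log N) scaling mentioned in the quoted reply below would mean at 300 machines, here is a minimal back-of-the-envelope sketch. It assumes an idealized model that is not Spark's actual implementation: every machine that already holds the data forwards one full copy per round, so the number of holders doubles each round. The function names are hypothetical and only for illustration.

```python
def doubling_rounds(n_machines: int) -> int:
    """Rounds until n_machines workers plus the driver all hold the data,
    assuming (idealized) that every holder forwards one copy per round."""
    holders = 1  # the driver is the only holder at the start
    rounds = 0
    while holders < n_machines + 1:
        holders *= 2  # each current holder serves one new machine
        rounds += 1
    return rounds


def sequential_rounds(n_machines: int) -> int:
    """Rounds if the driver alone sends one copy per round (linear baseline)."""
    return n_machines


for n in (100, 300):
    print(n, doubling_rounds(n), sequential_rounds(n))
# prints:
# 100 7 100
# 300 9 300
```

Under these assumptions, going from 100 to 300 machines adds only two rounds, whereas a purely centralized sender would need three times as many. Real behaviour also depends on block size, network topology, and stragglers, so this is only an intuition aid.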
2015-02-23 17:38 GMT+01:00 Mosharaf Chowdhury <mosharafka...@gmail.com>:
> Hi Guillermo,
>
> The current broadcast algorithm in Spark approximates the one described in
> Section 5 of this paper.
> It is expected to scale sub-linearly; i.e., O(log N), where N is the number
> of machines in your cluster.
> We evaluated up to 100 machines, and it does follow O(log N) scaling.
>
> Have you tried it on your 300-machine cluster? I'm curious to know what
> happened.
>
> -Mosharaf
>
> On Mon, Feb 23, 2015 at 8:06 AM, Guillermo Ortiz <konstt2...@gmail.com>
> wrote:
>>
>> I'm looking for information on how broadcast variables scale in Spark and
>> which algorithm they use.
>>
>> I have found
>> http://www.cs.berkeley.edu/~agearh/cs267.sp10/files/mosharaf-spark-bc-report-spring10.pdf
>> I don't know if it describes the current version (1.2.1),
>> because the file was created in 2010.
>> I took a look at the documentation and API and read that there is a
>> TorrentFactory for broadcast variables.
>> Is that the one Spark uses right now? The article says that
>> Spark uses another one (Centralized HDFS Broadcast).
>>
>> How does the current algorithm scale on a big cluster (about 300 nodes)?
>> Is it linear? Are there options to choose other algorithms?
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>