Re: How Broadcast variable scale?
Thanks, I'll read that paper. We haven't tried a cluster that big yet, but I suppose we will in the future, and I was worried about it. I'll report back if we finally do, but it's not going to be tomorrow :)

In reply to Mosharaf Chowdhury mosharafka...@gmail.com, 2015-02-23 17:38 GMT+01:00.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: How Broadcast variable scale?
Hi Guillermo,

The current broadcast algorithm in Spark approximates the one described in Section 5 of this paper (http://www.mosharaf.com/wp-content/uploads/orchestra-sigcomm11.pdf). It is expected to scale sub-linearly, i.e., O(log N), where N is the number of machines in your cluster. We evaluated it on up to 100 machines, and it does follow O(log N) scaling.

Have you tried it on your 300-machine cluster? I'm curious to know what happened.

-Mosharaf

In reply to Guillermo Ortiz konstt2...@gmail.com, Mon, Feb 23, 2015 at 8:06 AM.
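To get a feel for what O(log N) means at the cluster sizes mentioned in this thread, here is a rough toy model in Python. It is only a sketch and not Spark's implementation: it assumes an idealized scheme in which, in every round, each node that already holds the data (the source included) sends one full copy to one node that lacks it, so the number of holders doubles per round. The function names `doubling_rounds` and `sequential_rounds` are made up for this illustration.

```python
def doubling_rounds(num_machines: int) -> int:
    """Rounds until all machines hold the data in the idealized doubling model.

    Assumption (not taken from Spark): every current holder, including the
    source, serves exactly one new machine per round.
    """
    if num_machines < 0:
        raise ValueError("num_machines must be non-negative")
    total_nodes = num_machines + 1  # the machines plus the single source
    holders = 1                     # only the source has the data at first
    rounds = 0
    while holders < total_nodes:
        holders = min(total_nodes, holders * 2)
        rounds += 1
    return rounds


def sequential_rounds(num_machines: int) -> int:
    """Rounds if a single source serves one machine per round (linear baseline)."""
    if num_machines < 0:
        raise ValueError("num_machines must be non-negative")
    return num_machines


for n in (100, 300):
    print(n, doubling_rounds(n), sequential_rounds(n))
# prints:
# 100 7 100
# 300 9 300
```

In this model, tripling the cluster from 100 to 300 machines adds only two rounds, whereas the single-source baseline triples. Real broadcast times also depend on block size, network topology, and stragglers, none of which this sketch models.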
How Broadcast variable scale?
I'm looking into how broadcast variables scale in Spark and which algorithm they use. I found http://www.cs.berkeley.edu/~agearh/cs267.sp10/files/mosharaf-spark-bc-report-spring10.pdf, but I don't know whether it describes the current version (1.2.1), because the file was created in 2010.

I took a look at the documentation and API and read that there is a TorrentFactory for broadcast variables. Is that the one Spark uses right now? The article says that Spark uses a different one (Centralized HDFS Broadcast).

How does the current algorithm scale on a big cluster (about 300 nodes)? Is it linear? Are there options for choosing other algorithms?