Re: How do broadcast variables scale?

2015-02-23 Thread Guillermo Ortiz
Thanks, I'll read that paper. We haven't tried a cluster that big yet,
but we will probably have to in the future, and I was worried about it.
I'll report back once we finally do, but it's not going to be
tomorrow :)

2015-02-23 17:38 GMT+01:00 Mosharaf Chowdhury mosharafka...@gmail.com:
 Hi Guillermo,

 The current broadcast algorithm in Spark approximates the one described in
 Section 5 of this paper
 http://www.mosharaf.com/wp-content/uploads/orchestra-sigcomm11.pdf.
 It is expected to scale sub-linearly; i.e., O(log N), where N is the number
 of machines in your cluster.
 We evaluated up to 100 machines, and it does follow O(log N) scaling.

 Have you tried it on your 300-machine cluster? I'm curious to know what
 happened.

 -Mosharaf

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: How do broadcast variables scale?

2015-02-23 Thread Mosharaf Chowdhury
Hi Guillermo,

The current broadcast algorithm in Spark approximates the one described in
Section 5 of this paper
http://www.mosharaf.com/wp-content/uploads/orchestra-sigcomm11.pdf.
It is expected to scale sub-linearly; i.e., O(log N), where N is the number
of machines in your cluster.
We evaluated up to 100 machines, and it does follow O(log N) scaling.
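
A rough way to see where the O(log N) comes from is the toy model below. It is only a sketch under simplifying assumptions, not Spark's actual implementation: the data is treated as one indivisible piece, every machine that already holds it serves exactly one other machine per round, and the function names are made up for illustration. The set of holders then doubles each round, whereas a single central source serving one machine at a time needs N - 1 rounds.

```python
def torrent_like_rounds(num_nodes: int) -> int:
    """Rounds until all nodes hold the data when every holder serves one peer per round."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    holders = 1  # assumption: only the driver holds the data at the start
    rounds = 0
    while holders < num_nodes:
        # every current holder hands the data to one node that lacks it
        holders = min(num_nodes, holders * 2)
        rounds += 1
    return rounds


def centralized_rounds(num_nodes: int) -> int:
    """Rounds needed when a single source serves one node per round."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return num_nodes - 1


if __name__ == "__main__":
    for n in (100, 300):
        print(n, torrent_like_rounds(n), centralized_rounds(n))
```

Under this model, going from 100 to 300 machines adds two rounds (7 to 9) for the peer-to-peer scheme, while the centralized count grows from 99 to 299. A real block-based broadcast pipelines many small pieces, so absolute times will differ; only the growth rate is the point here.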

Have you tried it on your 300-machine cluster? I'm curious to know what
happened.

-Mosharaf

On Mon, Feb 23, 2015 at 8:06 AM, Guillermo Ortiz konstt2...@gmail.com
wrote:

 I'm looking into how broadcast variables scale in Spark and which
 algorithm they use.

 I found
 http://www.cs.berkeley.edu/~agearh/cs267.sp10/files/mosharaf-spark-bc-report-spring10.pdf
 but I don't know whether it describes the current version (1.2.1),
 because the file was created in 2010.
 I looked at the documentation and the API and read that there is a
 TorrentBroadcastFactory for broadcast variables.
 Is that what Spark uses right now? The report says that
 Spark uses a different one (Centralized HDFS Broadcast).

 How does the current algorithm scale on a big cluster (about 300 nodes)?
 Is it linear? Are there options for choosing
 other algorithms?





How do broadcast variables scale?

2015-02-23 Thread Guillermo Ortiz
I'm looking into how broadcast variables scale in Spark and which
algorithm they use.

I found
http://www.cs.berkeley.edu/~agearh/cs267.sp10/files/mosharaf-spark-bc-report-spring10.pdf
but I don't know whether it describes the current version (1.2.1),
because the file was created in 2010.
I looked at the documentation and the API and read that there is a
TorrentBroadcastFactory for broadcast variables.
Is that what Spark uses right now? The report says that
Spark uses a different one (Centralized HDFS Broadcast).

How does the current algorithm scale on a big cluster (about 300 nodes)?
Is it linear? Are there options for choosing
other algorithms?
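
On the last question: in the Spark 1.x line the broadcast implementation is selected with the `spark.broadcast.factory` property, with `org.apache.spark.broadcast.TorrentBroadcastFactory` and `org.apache.spark.broadcast.HttpBroadcastFactory` as the shipped choices, and `spark.broadcast.blockSize` (in KB) controls the piece size for the torrent variant. The sketch below only assembles the matching `spark-submit` arguments; the helper name and its defaults are hypothetical, and the property names should be checked against the configuration page of the Spark version in use, since later releases dropped the factory switch.

```python
# Class names of the two broadcast factories shipped with Spark 1.x.
TORRENT_FACTORY = "org.apache.spark.broadcast.TorrentBroadcastFactory"
HTTP_FACTORY = "org.apache.spark.broadcast.HttpBroadcastFactory"


def broadcast_submit_args(factory: str = TORRENT_FACTORY,
                          block_size_kb: int = 4096) -> list:
    """Build `--conf key=value` pairs for spark-submit (hypothetical helper)."""
    if factory not in (TORRENT_FACTORY, HTTP_FACTORY):
        raise ValueError("unknown broadcast factory: " + factory)
    settings = {
        "spark.broadcast.factory": factory,
        # piece size in KB; only meaningful for the torrent implementation
        "spark.broadcast.blockSize": str(block_size_kb),
    }
    args = []
    for key in sorted(settings):
        args += ["--conf", key + "=" + settings[key]]
    return args


if __name__ == "__main__":
    print(" ".join(["spark-submit"] + broadcast_submit_args()))
```

The resulting list can be spliced into a `spark-submit` command line; the same keys can equally go into `spark-defaults.conf` or a `SparkConf` object.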
