Thanks, I'll read that paper. We haven't tried a cluster that big yet,
but I suppose we will in the future, and I was worried about it.
I'll report back if we finally do, but it's not going to be
tomorrow :)

2015-02-23 17:38 GMT+01:00 Mosharaf Chowdhury <mosharafka...@gmail.com>:
> Hi Guillermo,
>
> The current broadcast algorithm in Spark approximates the one described in
> Section 5 of this paper.
> It is expected to scale sub-linearly, i.e., O(log N), where N is the number
> of machines in your cluster.
> We evaluated it on up to 100 machines, and it does follow O(log N) scaling.
>
> Have you tried it on your 300-machine cluster? I'm curious to know what
> happened.
>
> -Mosharaf
>
> On Mon, Feb 23, 2015 at 8:06 AM, Guillermo Ortiz <konstt2...@gmail.com>
> wrote:
>>
>> I'm looking for information about how broadcast variables scale in Spark
>> and which algorithm they use.
>>
>> I have found
>> http://www.cs.berkeley.edu/~agearh/cs267.sp10/files/mosharaf-spark-bc-report-spring10.pdf
>> I don't know if it describes the current version (1.2.1),
>> because the file was created in 2010.
>> I took a look at the documentation and API and read that there is a
>> TorrentFactory for broadcast variables.
>> Is that the one Spark uses right now? The article says that
>> Spark uses a different one (Centralized HDFS Broadcast).
>>
>> How does the current algorithm scale on a big cluster (about 300 nodes)?
>> Is it linear? Are there options for choosing
>> other algorithms?
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>
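
To make the O(log N) claim in the quoted reply concrete, here is a minimal sketch of an idealized model. It is not Spark's actual broadcast implementation: it assumes every machine that already holds the data forwards a full copy to exactly one other machine per round, and it compares that with a centralized scheme where only the driver sends. The function names are hypothetical and exist only for this illustration.

```python
def centralized_rounds(num_workers: int) -> int:
    """Rounds needed when only the driver sends, one worker per round."""
    return num_workers


def doubling_rounds(num_workers: int) -> int:
    """Rounds needed when every machine that holds the data forwards it
    to one machine that does not, so the holder count doubles each round."""
    holders = 1  # the driver starts with the only copy
    rounds = 0
    # Keep going until the driver plus all workers hold a copy.
    while holders < num_workers + 1:
        holders *= 2
        rounds += 1
    return rounds


if __name__ == "__main__":
    for n in (10, 100, 300, 1000):
        print(f"{n:>5} workers: centralized={centralized_rounds(n):>5}  "
              f"doubling={doubling_rounds(n):>3}")
```

In this simplified model, going from 100 to 300 workers raises the doubling scheme from 7 to 9 rounds, while the centralized scheme triples from 100 to 300. Real clusters will differ because of block splitting, network contention, and peer selection.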
