If you're using PySpark, be aware that there are some known issues with
large broadcast variables:

https://spark-project.atlassian.net/browse/SPARK-1065
http://mail-archives.apache.org/mod_mbox/spark-user/201403.mbox/browser

-Brad


On Wed, Mar 12, 2014 at 10:15 AM, Guillaume Pitel <
guillaume.pi...@exensa.com> wrote:

>  In my experience, it shouldn't be a problem since 0.8.1 (before that,
> the Akka frame size was the limit). I've broadcast arrays of up to 1.4 GB
> so far.
>
> Keep in mind that the broadcast will be stored in spark.local.dir, so you
> must have enough room on disk.
>
> Guillaume
>
>   Hi,
>
>  I asked a similar question a while ago but didn't get any answers.
>
>  I'd like to share a 10 GB double array among 50 to 100 workers. Each
> worker has over 40 GB of physical memory, so the array fits in each
> worker's memory. I'm sharing this array because a cartesian operation is
> applied to it, and I want to avoid network shuffling.
>
>  1. Is Spark broadcast built for pushing variables of GB size? Does it
> need special configuration (e.g. Akka settings) to work under these
> conditions?
>
>  2. (Not directly related to Spark) Is there an upper limit for
> Scala/Java arrays other than physical memory? Do they stop working when
> the array's element count exceeds a certain number?
>
>
>
> --
>  *Guillaume PITEL, Président*
> +33(0)6 25 48 86 80
>
> eXenSa S.A.S. <http://www.exensa.com/>
>  41, rue Périer - 92120 Montrouge - FRANCE
> Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05
>
