Re: Unable to broadcast a very large variable

2019-04-12 Thread Dillon Dukek
That's fine. The other points that I mentioned still apply.



Re: Unable to broadcast a very large variable

2019-04-11 Thread V0lleyBallJunki3
I am not using pyspark. The job is written in Scala







Re: Unable to broadcast a very large variable

2019-04-10 Thread Dillon Dukek
You will probably need to do a couple of things. First, you will likely need to
increase the "spark.sql.broadcastTimeout" setting. Second, a broadcast variable
is replicated once per executor, not once per machine, so you will want larger
executors with more cores each, which reduces the number of copies held on each
machine. Finally, if you are using pyspark and touch this large variable from a
Python process (RDD functions, UDFs, etc.), the variable is transferred into
Python memory space once per spawned Python process, which means you could
ultimately end up with many more copies of it in memory at any given point in
time than you intended.
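
For example, something along these lines (a minimal sketch in Scala; the
timeout and executor sizes are illustrative values, not tuned recommendations):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .appName("LargeBroadcastJob")  // hypothetical app name
    // Give large broadcasts more time to ship (default is 300 seconds).
    .config("spark.sql.broadcastTimeout", "1200")
    // Fewer, larger executors mean fewer copies of the broadcast per machine.
    .config("spark.executor.memory", "200g")
    .config("spark.executor.cores", "16")
    .getOrCreate()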



Re: Unable to broadcast a very large variable

2019-04-10 Thread V0lleyBallJunki3
I am using spark.sparkContext.broadcast() to broadcast. Is it really true that
a 70 GB variable can't be broadcast even when each of our machines has 244 GB
of memory and the network is fast?
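
For reference, the call in question looks roughly like this (a sketch;
buildLookup() is a hypothetical stand-in for constructing the 70 GB value):

  // Broadcast a large read-only lookup; each executor keeps one deserialized copy.
  val largeLookup: Map[String, Double] = buildLookup()  // hypothetical builder
  val bc = spark.sparkContext.broadcast(largeLookup)
  val rdd = spark.sparkContext.parallelize(Seq("a", "b", "c"))
  val looked = rdd.map(k => bc.value.getOrElse(k, 0.0)).collect()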






Re: Unable to broadcast a very large variable

2019-04-10 Thread Ashic Mahtab
The default is 10 MB. What is sensible depends on the memory available and on
the cost of the network transfer. For Spark SQL you can raise the threshold via
spark.sql.autoBroadcastJoinThreshold, but you definitely shouldn't be
broadcasting gigabytes.
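
For instance (a sketch; the 100 MB threshold and the DataFrame names are
illustrative):

  // Raise the auto-broadcast threshold for Spark SQL joins (default is 10 MB).
  spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 100 * 1024 * 1024)

  // Or request a broadcast join explicitly for a specific small table:
  import org.apache.spark.sql.functions.broadcast
  val joined = bigDf.join(broadcast(smallDf), "id")  // bigDf/smallDf hypothetical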




Unable to broadcast a very large variable

2019-04-10 Thread V0lleyBallJunki3
Hello,
   I have a 110-node cluster in which each executor has 50 GB of memory and
each machine has 244 GB, and I want to broadcast a 70 GB variable. I am having
difficulty doing that. I was wondering at what size it becomes unwise to
broadcast a variable. Is there a general rule of thumb?
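
A rough back-of-envelope check of the sizes involved (a sketch; Spark keeps one
deserialized copy of a broadcast variable per executor):

  // Sketch: compare the broadcast size against a single executor's memory.
  val broadcastGb = 70
  val executorGb  = 50
  val machineGb   = 244
  // 70 GB > 50 GB: one copy does not fit in a single executor's memory,
  // no matter how much total memory (244 GB) the machine has.
  println(s"Fits in one executor: ${broadcastGb <= executorGb}")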


