Hi Venkat, do your executors have that much memory?
Regards,
Gourav Sengupta

On Tue, Oct 9, 2018 at 4:44 PM V0lleyBallJunki3 <[email protected]> wrote:
> Hello,
> I have set spark.sql.autoBroadcastJoinThreshold to a very high value of
> 20 GB. I am joining a table that I am sure is below this threshold, but
> Spark still does a SortMergeJoin. If I set a broadcast hint, Spark does a
> broadcast join and the job finishes much faster. However, when run in
> production on some large tables, I run into errors. Is there a way to see
> the actual size of the table being broadcast? I wrote the table being
> broadcast to disk and it took only 32 MB in Parquet. I tried to cache this
> table in Zeppelin and run a table.count() operation, but nothing showed up
> on the Storage tab of the Spark History Server. spark.util.SizeEstimator
> doesn't seem to give accurate numbers for this table either. Is there any
> way to figure out the size of this table being broadcast?
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: [email protected]
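For what it's worth, the number Spark compares against spark.sql.autoBroadcastJoinThreshold is Catalyst's own statistics-based size estimate, not the compressed on-disk Parquet size, which is why the 32 MB file can still miss a 20 GB threshold. A minimal sketch of how to read that estimate and force the broadcast, assuming an existing SparkSession `spark` and tables registered under the hypothetical names "small_table" and "big_table":

```scala
import org.apache.spark.sql.functions.broadcast

// Sketch only: assumes an existing SparkSession `spark` and tables
// registered as "small_table" and "big_table" (hypothetical names).
val small = spark.table("small_table")
val big   = spark.table("big_table")

// Catalyst's size estimate in bytes. This is the figure compared against
// spark.sql.autoBroadcastJoinThreshold, and it can differ greatly from
// the compressed Parquet size on disk (Parquet is columnar and encoded,
// while this estimate reflects the deserialized in-memory plan).
println(small.queryExecution.optimizedPlan.stats.sizeInBytes)

// Forcing the broadcast explicitly with a hint, as described above:
val joined = big.join(broadcast(small), "key")  // "key" is a placeholder column
```

If the printed estimate is far above the real data size (common when column statistics are missing), running ANALYZE TABLE small_table COMPUTE STATISTICS can bring the estimate closer to reality and let the automatic broadcast kick in.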
