Hi Venkat, do your executors have that much memory?
Regards,
Gourav Sengupta

On Tue, Oct 9, 2018 at 4:44 PM V0lleyBallJunki3 <[email protected]> wrote:
> Hello,
> I have set spark.sql.autoBroadcastJoinThreshold to a very high value of
> 20 GB. I am joining a table that I am sure is below this threshold, but
> Spark still does a SortMergeJoin. If I set a broadcast hint, Spark does a
> broadcast join and the job finishes much faster. However, when run in
> production on some large tables, I run into errors. Is there a way to see
> the actual size of the table being broadcast? I wrote the table being
> broadcast to disk and it took only 32 MB in Parquet. I tried to cache this
> table in Zeppelin and run a table.count() operation, but nothing showed up
> on the Storage tab of the Spark History Server. spark.util.SizeEstimator
> doesn't seem to give accurate numbers for this table either. Is there any
> way to figure out the size of this table being broadcast?
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: [email protected]
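For what it's worth, the number Spark compares against spark.sql.autoBroadcastJoinThreshold is Catalyst's own statistics-based size estimate, not the compressed on-disk Parquet size, which is why the 32 MB file can still miss a 20 GB threshold. A minimal sketch of how to read that estimate and force the broadcast, assuming an existing SparkSession `spark` and tables registered under the hypothetical names "small_table" and "big_table":

```scala
import org.apache.spark.sql.functions.broadcast

// Sketch only: assumes an existing SparkSession `spark` and tables
// registered as "small_table" and "big_table" (hypothetical names).
val small = spark.table("small_table")
val big   = spark.table("big_table")

// Catalyst's size estimate in bytes. This is the figure compared against
// spark.sql.autoBroadcastJoinThreshold, and it can differ greatly from
// the compressed Parquet size on disk (Parquet is columnar and encoded,
// while this estimate reflects the deserialized in-memory plan).
println(small.queryExecution.optimizedPlan.stats.sizeInBytes)

// Forcing the broadcast explicitly with a hint, as described above:
val joined = big.join(broadcast(small), "key")  // "key" is a placeholder column
```

If the printed estimate is far above the real data size (common when column statistics are missing), running ANALYZE TABLE small_table COMPUTE STATISTICS can bring the estimate closer to reality and let the automatic broadcast kick in.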
