So what I discovered is that if I write the table being joined to disk and then read it back, Spark correctly broadcasts it. I think this is because when Spark estimates the size of the smaller table, it overestimates it to be much bigger than it actually is, and hence decides to do a SortMergeJoin.
I have a small table, well below 50 MB, that I want to broadcast join with a larger table. However, even if I set spark.sql.autoBroadcastJoinThreshold to 100 MB, Spark still decides to do a SortMergeJoin instead of a broadcast join. I have to put an explicit broadcast hint on the table for it to do the broadcast join.