So what I discovered is that if I write the table being joined to disk and then read it back, Spark correctly broadcasts it. I think this is because when Spark estimates the size of the smaller table, it overestimates it to be much bigger than it actually is, and hence decides to do a SortMergeJoin.
I have a small table, well below 50 MB, that I want to broadcast join with a larger table. However, even if I set spark.sql.autoBroadcastJoinThreshold to 100 MB, Spark still decides to do a SortMergeJoin instead of a broadcast join. I have to put an explicit broadcast hint on the table for it to do the broadcast join.