What's the query?

On Tue, Jul 5, 2016 at 2:28 PM, Lalitha MV <lalitham...@gmail.com> wrote:
> It picks sort merge join when spark.sql.autoBroadcastJoinThreshold is
> set to -1, or when the size of the small table is more than
> spark.sql.autoBroadcastJoinThreshold.
>
> On Mon, Jul 4, 2016 at 10:17 PM, Takeshi Yamamuro <linguin....@gmail.com>
> wrote:
>
>> The join selection logic is described here:
>> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L92
>>
>> If you have join keys, you can set
>> `spark.sql.autoBroadcastJoinThreshold` to -1 to disable broadcast joins.
>> Then, hash joins are used in queries.
>>
>> // maropu
>>
>> On Tue, Jul 5, 2016 at 4:23 AM, Lalitha MV <lalitham...@gmail.com> wrote:
>>
>>> Hi maropu,
>>>
>>> Thanks for your reply.
>>>
>>> Would it be possible to write a rule for this that always picks
>>> shuffle hash join over the other join implementations (i.e. sort merge
>>> and broadcast)?
>>>
>>> Is there any documentation demonstrating rule-based transformation of
>>> physical plan trees?
>>>
>>> Thanks,
>>> Lalitha
>>>
>>> On Sat, Jul 2, 2016 at 12:58 AM, Takeshi Yamamuro <linguin....@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> No, Spark has no hint for the hash join.
>>>>
>>>> // maropu
>>>>
>>>> On Fri, Jul 1, 2016 at 4:56 PM, Lalitha MV <lalitham...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> In order to force broadcast hash join, we can set
>>>>> the spark.sql.autoBroadcastJoinThreshold config. Is there a way to
>>>>> enforce shuffle hash join in Spark SQL?
>>>>>
>>>>> Thanks,
>>>>> Lalitha
>>>>
>>>> --
>>>> ---
>>>> Takeshi Yamamuro
>>>
>>> --
>>> Regards,
>>> Lalitha
>>
>> --
>> ---
>> Takeshi Yamamuro
>
> --
> Regards,
> Lalitha

--
---
Takeshi Yamamuro
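The behaviour discussed in this thread can be made concrete with a simplified model of the equi-join branch of the join selection logic in the linked SparkStrategies.scala. This is plain Python, not a Spark API, and every name in it is hypothetical. It assumes the Spark 2.0-era conditions as I understand them: broadcast when a side's estimated size is at most `spark.sql.autoBroadcastJoinThreshold`; shuffle hash join only when the internal `spark.sql.join.preferSortMergeJoin` flag is false, the build side is smaller than `autoBroadcastJoinThreshold * spark.sql.shuffle.partitions`, and that side is at least 3x smaller than the other. It ignores join-type restrictions, the `broadcast()` marker and non-orderable keys, so treat it as a sketch for reasoning, not a description of every Spark version.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class JoinConf:
    """Hypothetical stand-in for the settings assumed to drive join selection."""
    auto_broadcast_join_threshold: int = 10 * MB  # spark.sql.autoBroadcastJoinThreshold (bytes)
    shuffle_partitions: int = 200                 # spark.sql.shuffle.partitions
    prefer_sort_merge_join: bool = True           # spark.sql.join.preferSortMergeJoin (assumed internal flag)


def can_broadcast(size: int, conf: JoinConf) -> bool:
    # No non-negative size is <= -1, so a threshold of -1 disables broadcast here.
    return size <= conf.auto_broadcast_join_threshold


def can_build_local_hash_map(size: int, conf: JoinConf) -> bool:
    # Roughly: each shuffle partition of the build side must fit under the threshold.
    # With a threshold of -1 the bound is negative, so this is always False.
    return size < conf.auto_broadcast_join_threshold * conf.shuffle_partitions


def much_smaller(a: int, b: int) -> bool:
    # Building a hash map is assumed to cost more than sorting, so require a 3x gap.
    return a * 3 <= b


def select_equi_join(left_size: int, right_size: int, conf: JoinConf) -> str:
    """Simplified model for an inner equi-join with orderable keys."""
    if can_broadcast(right_size, conf) or can_broadcast(left_size, conf):
        return "BroadcastHashJoin"
    # Try the right side as the build side first, then the left side.
    for build, other in ((right_size, left_size), (left_size, right_size)):
        if (not conf.prefer_sort_merge_join
                and can_build_local_hash_map(build, conf)
                and much_smaller(build, other)):
            return "ShuffledHashJoin"
    return "SortMergeJoin"


if __name__ == "__main__":
    print(select_equi_join(1000 * MB, 5 * MB, JoinConf()))  # BroadcastHashJoin
    print(select_equi_join(1000 * MB, 5 * MB, JoinConf(auto_broadcast_join_threshold=-1)))  # SortMergeJoin
    print(select_equi_join(1000 * MB, 5 * MB,
                           JoinConf(auto_broadcast_join_threshold=-1,
                                    prefer_sort_merge_join=False)))  # SortMergeJoin
    print(select_equi_join(1000 * MB, 50 * MB, JoinConf(prefer_sort_merge_join=False)))  # ShuffledHashJoin
```

Under these assumptions, a threshold of -1 rules out shuffle hash join as well, because the local-hash-map bound becomes negative. That would be consistent with the sort merge join Lalitha observed; in this model, shuffle hash join needs a positive threshold, the sort-merge preference turned off, and a build side that is too big to broadcast but much smaller than the other side.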