Re: Enforcing shuffle hash join

Takeshi Yamamuro Mon, 04 Jul 2016 22:30:38 -0700

What's the query?

On Tue, Jul 5, 2016 at 2:28 PM, Lalitha MV <lalitham...@gmail.com> wrote:


> It picks sort merge join when spark.sql.autoBroadcastJoinThreshold is
> set to -1, or when the size of the small table exceeds
> spark.sql.autoBroadcastJoinThreshold.
>
> On Mon, Jul 4, 2016 at 10:17 PM, Takeshi Yamamuro <linguin....@gmail.com>
> wrote:
>
>> The join selection is described in
>> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L92
>> .
>> If you have join keys, you can set
>> `spark.sql.autoBroadcastJoinThreshold` to -1 to disable broadcast joins.
>> Hash joins are then used in queries.
>>
>> // maropu
>>
>> On Tue, Jul 5, 2016 at 4:23 AM, Lalitha MV <lalitham...@gmail.com> wrote:
>>
>>> Hi maropu,
>>>
>>> Thanks for your reply.
>>>
>>> Would it be possible to write a rule for this, to make it always pick
>>> shuffle hash join over the other join implementations (i.e. sort merge and
>>> broadcast)?
>>>
>>> Is there any documentation demonstrating rule-based transformations of
>>> physical plan trees?
>>>
>>> Thanks,
>>> Lalitha
>>>
>>> On Sat, Jul 2, 2016 at 12:58 AM, Takeshi Yamamuro <linguin....@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> No, Spark has no hint for the hash join.
>>>>
>>>> // maropu
>>>>
>>>> On Fri, Jul 1, 2016 at 4:56 PM, Lalitha MV <lalitham...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> To force a broadcast hash join, we can set
>>>>> the spark.sql.autoBroadcastJoinThreshold config. Is there a way to enforce
>>>>> shuffle hash join in Spark SQL?
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Lalitha
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> ---
>>>> Takeshi Yamamuro
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Lalitha
>>>
>>
>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>
>
>
> --
> Regards,
> Lalitha
>



-- 
---
Takeshi Yamamuro
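
For reference, the setting discussed in this thread can be applied as in the
sketch below. It assumes a standard conf/spark-defaults.conf file; the same
key can also be passed on the command line as
--conf spark.sql.autoBroadcastJoinThreshold=-1. As reported in the thread,
the planner picked sort merge join with this value, so it disables broadcast
joins but does not by itself enforce shuffle hash join.

```
# conf/spark-defaults.conf (fragment; the file location is an assumption)
# -1 disables automatic broadcast joins. The default threshold is
# 10485760 bytes (10 MB).
spark.sql.autoBroadcastJoinThreshold   -1
```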
