will shuffle, and the following join COULD cause another shuffle.
So I am not sure if it is a smart way.

Yong

--
From: shyla deshpande <deshpandesh...@gmail.com>
Sent: Wednesday, March 29, 2017 12:33 PM
To: user
Subject: Re: Spark SQL, dataframe join questions.
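A toy sketch of the shuffle mechanics Yong describes, in plain Python rather than Spark (the `hash_partition` helper, the data, and the partition count are all made up for illustration, not Spark APIs): an explicit repartition hashes every row to a partition (one full shuffle), and the join can only be partition-local afterwards if the other side is hashed the same way, which may itself cost a second shuffle.

```python
# Toy model of hash-partitioned shuffles -- illustrative only, not
# Spark's implementation. 'hash_partition' is a made-up helper.

def hash_partition(rows, key_fn, num_partitions):
    """Assign each row to a partition by hashing its key: one full shuffle."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(key_fn(row)) % num_partitions].append(row)
    return parts

left = [("a", 1), ("b", 2), ("a", 3)]
right = [("a", "x"), ("b", "y")]

# Shuffle #1: explicit repartition of the left side on the join key.
left_parts = hash_partition(left, lambda r: r[0], 4)

# If the right side is NOT already partitioned the same way, the join
# must shuffle it too (shuffle #2) before it can match keys locally.
right_parts = hash_partition(right, lambda r: r[0], 4)

# Once both sides are co-partitioned, the join is partition-local:
# matching keys are guaranteed to sit in the same partition index.
joined = [
    (lk, lv, rv)
    for lp, rp in zip(left_parts, right_parts)
    for (lk, lv) in lp
    for (rk, rv) in rp
    if lk == rk
]
print(sorted(joined))
```

This is why a manual repartition before a join is not automatically a win: you pay the first shuffle up front, and only save the second if both sides end up co-partitioned.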
On Tue, Mar 28, 2017 at 2:57 PM, shyla deshpande wrote:
> Following are my questions. Thank you.
>
> 1. When joining dataframes, is it a good idea to repartition on the key
> column that is used in the join, or is the optimizer smart enough that we
> can forget it?
>
> 2. In an RDD join, wherever possible we do reduceByKey before the join to
> avoid a big shuffle of data. Do we need
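The RDD pattern in question 2 — aggregating with reduceByKey before joining so that fewer records travel through the join's shuffle — can be sketched in plain Python (the data and the `reduce_by_key` helper here are illustrative stand-ins, not Spark's API):

```python
# Plain-Python sketch of the 'reduceByKey before join' RDD pattern.
# Data and helper names are made up for illustration.

events = [("u1", 5), ("u1", 3), ("u2", 7), ("u2", 1), ("u1", 2)]
profiles = [("u1", "alice"), ("u2", "bob")]

def reduce_by_key(pairs, fn):
    """Combine all values for a key into one value, like RDD.reduceByKey.
    Afterwards each key ships exactly one record through the join shuffle."""
    acc = {}
    for k, v in pairs:
        acc[k] = fn(acc[k], v) if k in acc else v
    return list(acc.items())

# Without pre-aggregation, all 5 event records would be shuffled for
# the join; after reduceByKey only 2 (one per key) are.
reduced = reduce_by_key(events, lambda a, b: a + b)

# Join the pre-aggregated side against the profiles.
profile_map = dict(profiles)
joined = [(k, total, profile_map[k]) for k, total in reduced if k in profile_map]
print(sorted(joined))
```

The saving comes from shrinking each key's values to a single record before the shuffle; whether the DataFrame API needs the same manual step is exactly what the (truncated) question above asks.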