Re: Spark SQL, dataframe join questions.

2017-03-29 Thread vaquar khan
will shuffle, and following join COULD cause another shuffle.
>> So I am not sure if it is a smart way.
>>
>> Yong
>>
>> --
>> *From:* shyla deshpande <deshpandesh...@gmail.com>
>> *Sent:* Wednesday, March 29, 2017 12:33 PM

Re: Spark SQL, dataframe join questions.

2017-03-29 Thread Vidya Sujeet
it is a smart way.
>
> Yong
>
> --
> *From:* shyla deshpande <deshpandesh...@gmail.com>
> *Sent:* Wednesday, March 29, 2017 12:33 PM
> *To:* user
> *Subject:* Re: Spark SQL, dataframe join questions.
>
> On Tue, Mar 28, 2017 at 2:57 PM, shyla deshpande <deshpa

Re: Spark SQL, dataframe join questions.

2017-03-29 Thread Yong Zhang
join COULD cause another shuffle. So I am not sure if it is a smart way.
Yong
From: shyla deshpande <deshpandesh...@gmail.com>
Sent: Wednesday, March 29, 2017 12:33 PM
To: user
Subject: Re: Spark SQL, dataframe join questions.
On Tue, Mar 28, 2017 at 2

Re: Spark SQL, dataframe join questions.

2017-03-29 Thread shyla deshpande
On Tue, Mar 28, 2017 at 2:57 PM, shyla deshpande wrote:
> Following are my questions. Thank you.
>
> 1. When joining dataframes, is it a good idea to repartition on the key column
> that is used in the join, or is the optimizer smart enough that we can skip it?
>
> 2. In RDD

Re: dataframe join questions. Appreciate your input.

2017-03-29 Thread shyla deshpande
On Tue, Mar 28, 2017 at 2:57 PM, shyla deshpande wrote:
> Following are my questions. Thank you.
>
> 1. When joining dataframes, is it a good idea to repartition on the key column
> that is used in the join, or is the optimizer smart enough that we can skip it?
>
> 2. In RDD

dataframe join questions?

2017-03-28 Thread shyla deshpande
Following are my questions. Thank you.
1. When joining dataframes, is it a good idea to repartition on the key column that is used in the join, or is the optimizer smart enough that we can skip it?
2. In RDD join, wherever possible we do reduceByKey before the join to avoid a big shuffle of data. Do we need
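For readers unfamiliar with the pattern mentioned in question 2, here is a rough sketch of why reducing by key before a join can shrink the data that has to move. It is plain Python, not Spark: `reduce_by_key` and `join` are hypothetical helpers that only model the idea, and the `orders` and `users` records are made-up sample data. How Spark's optimizer treats the dataframe equivalent is not shown here.

```python
def reduce_by_key(pairs, combine):
    """Merge all values that share a key (a conceptual stand-in for reduceByKey)."""
    out = {}
    for key, value in pairs:
        out[key] = combine(out[key], value) if key in out else value
    return out


def join(left, right):
    """Inner join of two key -> value mappings on their common keys."""
    return {k: (left[k], right[k]) for k in left.keys() & right.keys()}


# Hypothetical sample data: several order amounts per user, one name per user.
orders = [("u1", 10), ("u1", 5), ("u2", 7), ("u1", 1), ("u3", 2)]
users = [("u1", "Ann"), ("u2", "Bob")]

# Aggregate first, so each key contributes a single record to the join.
totals = reduce_by_key(orders, lambda a, b: a + b)
joined = join(totals, dict(users))

records_before = len(orders)  # records entering the join without pre-aggregation
records_after = len(totals)   # records entering the join after pre-aggregation
```

In this toy case the join sees three records instead of five; on a real cluster the saving would depend on how many duplicate keys the larger side has.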