Hi,

I am trying something like this..

val sesDS:  Dataset[XXX] = hiveContext.sql(select).as[XXX]

The select statement is something like this : "select * from sometable ....
DISTRIBUTE by col1, col2, col3"

Then comes groupByKey...

val gpbyDS = sesDS .groupByKey(x => (x.col1, x.col2, x.col3))

As my select is already Distribute the data based on columns which are same
as what I used in groupByKey, Why does groupByKey  still doing the shuffle
? Is this an issue or I am missing something ?

Regards,
Dibyendu

Reply via email to