Re: Groupby in fast in Impala than spark sql - any suggestions

2017-03-28 Thread Ryan
And could you paste the stage and task information from the Spark UI?

On Wed, Mar 29, 2017 at 11:30 AM, Ryan wrote:
> How long does it take if you remove the repartition and just collect the
> result? I don't think repartition is needed here. There's already a shuffle
> for group by
>
> On Tue, Mar 28

Re: Groupby in fast in Impala than spark sql - any suggestions

2017-03-28 Thread Ryan
How long does it take if you remove the repartition and just collect the result? I don't think repartition is needed here. There's already a shuffle for group by.

On Tue, Mar 28, 2017 at 10:35 PM, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> wrote:
> Hi,
>
> I am working on requirement where i

Groupby in fast in Impala than spark sql - any suggestions

2017-03-28 Thread KhajaAsmath Mohammed
Hi,

I am working on a requirement where I need to join two tables and do a group by to get the max value on some fields.

Table1: 10 GB of data
Table2: 96 GB of data

The same query in Impala takes around 20 minutes, while it took almost 3 hours to run in Spark SQL. I have added repartition to dataframe, p
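
The point raised in the replies, that a group by already implies a shuffle so an explicit repartition beforehand adds work without changing the result, can be illustrated with a toy model. This is a plain-Python sketch, not Spark code: `hash_partition`, `round_robin_repartition`, and `group_max` are hypothetical helpers invented for illustration, and the sample rows are made up.

```python
def hash_partition(rows, num_partitions):
    """Toy shuffle: route each (key, value) row to a partition by key hash."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in rows:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def round_robin_repartition(rows, num_partitions):
    """Toy stand-in for an explicit repartition: spread rows evenly, ignoring keys."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


def group_max(partitions, num_partitions):
    """Toy group-by max: always re-shuffles by key, then keeps the max per key."""
    rows = [row for part in partitions for row in part]
    shuffled = hash_partition(rows, num_partitions)  # the shuffle group by needs anyway
    result = {}
    for part in shuffled:
        for key, value in part:
            if key not in result or value > result[key]:
                result[key] = value
    return result


# Made-up sample data: (group key, value)
rows = [("a", 3), ("b", 7), ("a", 9), ("c", 1), ("b", 2)]

direct = group_max([rows], 4)
with_repartition = group_max(round_robin_repartition(rows, 4), 4)

# Same answer either way; the extra repartition only moved data one more time.
assert direct == with_repartition
```

In this model the pre-repartitioned input is shuffled a second time inside `group_max`, which is the redundant data movement being questioned. Whether dropping the repartition accounts for the gap reported in this thread would still need to be confirmed from the stage and task timings in the Spark UI.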