And could you paste the stage and task information from the Spark UI?
On Wed, Mar 29, 2017 at 11:30 AM, Ryan wrote:
How long does it take if you remove the repartition and just collect the
result? I don't think the repartition is needed here; there's already a
shuffle for the group by.
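Ryan's point is that a group by already redistributes rows by key. The plain-Python model below is not Spark code, and `shuffle_by_key`, `max_per_key`, and `group_by_max` are hypothetical names. It sketches the idea under the assumption of hash partitioning on the grouping key: once the shuffle has placed every row for a given key in one partition, each partition can compute its maxima locally, so a separate repartition step is typically not needed for the aggregation itself.

```python
def shuffle_by_key(rows, num_partitions):
    """Hash-partition (key, value) rows so that all rows sharing a key
    land in the same partition, modelling the shuffle behind a group by."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in rows:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def max_per_key(partition):
    """Aggregate one partition locally; no further data movement is needed."""
    result = {}
    for key, value in partition:
        if key not in result or value > result[key]:
            result[key] = value
    return result


def group_by_max(rows, num_partitions=4):
    """Shuffle once by key, then merge the per-partition results."""
    merged = {}
    for part in shuffle_by_key(rows, num_partitions):
        partial = max_per_key(part)
        # A key never appears in two partitions, so a plain update is safe.
        merged.update(partial)
    return merged


rows = [("a", 3), ("b", 7), ("a", 9), ("c", 1), ("b", 2)]
print(sorted(group_by_max(rows).items()))  # [('a', 9), ('b', 7), ('c', 1)]
```

Because the merge step never has to reconcile the same key from two partitions, the single key-based shuffle is enough to produce correct per-key maxima.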
On Tue, Mar 28, 2017 at 10:35 PM, KhajaAsmath Mohammed <
mdkhajaasm...@gmail.com> wrote:
Hi,
I am working on a requirement where I need to join two tables and do a group
by to get the max value of some fields.
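To make the requirement concrete, here is a small plain-Python model of an inner join followed by a group-by max, roughly what `SELECT region, MAX(amount) FROM t2 JOIN t1 ON t2.id = t1.id GROUP BY region` expresses in SQL. The table contents and the column names `id`, `region`, and `amount` are hypothetical, as are the helper names `inner_join` and `group_max`; this is a sketch of the semantics, not of how Spark SQL or Impala execute the query.

```python
def inner_join(left, right, key):
    """Hash join: index the right table (ideally the smaller one) by key,
    then probe it with each row of the left table."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            yield {**lrow, **rrow}


def group_max(rows, group_col, value_col):
    """Keep the maximum value_col seen for each distinct group_col value."""
    result = {}
    for row in rows:
        g, v = row[group_col], row[value_col]
        if g not in result or v > result[g]:
            result[g] = v
    return result


# Hypothetical sample data standing in for Table1 (small) and Table2 (large).
table1 = [{"id": 1, "region": "east"}, {"id": 2, "region": "west"}]
table2 = [
    {"id": 1, "amount": 10},
    {"id": 1, "amount": 25},
    {"id": 2, "amount": 7},
    {"id": 3, "amount": 99},  # no match in table1, dropped by the inner join
]

joined = inner_join(table2, table1, key="id")
print(group_max(joined, "region", "amount"))  # {'east': 25, 'west': 7}
```

Building the hash index on the smaller side is the usual hash-join choice, since only that side has to be held in memory while the larger side is streamed past it.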
Table1: 10 GB of data
Table2: 96 GB of data
The same query takes around 20 minutes in Impala, but it took almost 3
hours to run in Spark SQL.
I have added a repartition to the DataFrame, p