Thank you Mostafa. My bad on mentioning the wrong version. We are using 2.7
and not 1.7. We have upgrade in our plans and actually waiting for Impala
2.12 as it has IMPALA-5058 fixes.

On Fri, Feb 23, 2018 at 6:18 PM, Mostafa Mokhtar <>

> AFAIK there is no such flag.
> You are more likely to get much higher gains if you upgrade to a more
> recent version of Impala.
> Thanks
> Mostafa
> On Feb 23, 2018, at 6:12 PM, Arya Goudarzi <> wrote:
> Hi Team,
> TL;DR; I am wondering if there is a way to instruct Impala to use shuffle
> by default for all join queries as my research didn't end anywhere so far.
> We have a multi PiB cluster with hundreds of thousand of partitions. We
> are using Impala 1.7 with HDFS. Due to our cluster size, compute_stats, and
> compute_incremental_stats are not feasible for us as compute_stats seems a
> heavy operation on a lot of our large tables and destabilizes the cluster,
> and with compute_incremental_stats we hit IMPALA-2648
> <>.
> Therefore, to optimize our queries we need to add [shuffle] hint to the
> queries with joins, and we have seen that this improves performance 3x on
> simple tests because the system doesn't have to stream too much data and
> dump it for broadcast join.
> We have a large team of analysts who are pushing tons of queries to the
> system. It is hard to enforce policy at the moment for them to remember to
> use shuffle hint so it doesn't take our system down.
> --
> Cheers,
> -Arya


Reply via email to