AFAIK there is no such flag.
You are more likely to get much higher gains if you upgrade to a more recent
version of Impala.
> On Feb 23, 2018, at 6:12 PM, Arya Goudarzi <gouda...@gmail.com> wrote:
> Hi Team,
> TL;DR; I am wondering if there is a way to instruct Impala to use shuffle by
> default for all join queries as my research didn't end anywhere so far.
> We have a multi PiB cluster with hundreds of thousand of partitions. We are
> using Impala 1.7 with HDFS. Due to our cluster size, compute_stats, and
> compute_incremental_stats are not feasible for us as compute_stats seems a
> heavy operation on a lot of our large tables and destabilizes the cluster,
> and with compute_incremental_stats we hit IMPALA-2648.
> Therefore, to optimize our queries we need to add [shuffle] hint to the
> queries with joins, and we have seen that this improves performance 3x on
> simple tests because the system doesn't have to stream too much data and dump
> it for broadcast join.
> We have a large team of analysts who are pushing tons of queries to the
> system. It is hard to enforce policy at the moment for them to remember to
> use shuffle hint so it doesn't take our system down.