AFAIK there is no such flag. You are more likely to get much higher gains if you upgrade to a more recent version of Impala.
https://www.slideshare.net/cloudera/performance-of-apache-impala Thanks Mostafa > On Feb 23, 2018, at 6:12 PM, Arya Goudarzi <gouda...@gmail.com> wrote: > > Hi Team, > > TL;DR; I am wondering if there is a way to instruct Impala to use shuffle by > default for all join queries as my research didn't end anywhere so far. > > We have a multi PiB cluster with hundreds of thousand of partitions. We are > using Impala 1.7 with HDFS. Due to our cluster size, compute_stats, and > compute_incremental_stats are not feasible for us as compute_stats seems a > heavy operation on a lot of our large tables and destabilizes the cluster, > and with compute_incremental_stats we hit IMPALA-2648. > > Therefore, to optimize our queries we need to add [shuffle] hint to the > queries with joins, and we have seen that this improves performance 3x on > simple tests because the system doesn't have to stream too much data and dump > it for broadcast join. > > We have a large team of analysts who are pushing tons of queries to the > system. It is hard to enforce policy at the moment for them to remember to > use shuffle hint so it doesn't take our system down. > > -- > Cheers, > -Arya