AFAIK there is no such flag. 
You are likely to see much larger gains by upgrading to a more recent 
version of Impala. 

https://www.slideshare.net/cloudera/performance-of-apache-impala

Thanks 
Mostafa

> On Feb 23, 2018, at 6:12 PM, Arya Goudarzi <gouda...@gmail.com> wrote:
> 
> Hi Team,
> 
> TL;DR: I am wondering if there is a way to instruct Impala to use shuffle 
> joins by default for all join queries, as my research has not turned up 
> anything so far. 
> 
> We have a multi-PiB cluster with hundreds of thousands of partitions. We are 
> using Impala 1.7 with HDFS. Due to our cluster size, compute_stats and 
> compute_incremental_stats are not feasible for us: compute_stats is a 
> heavy operation on many of our large tables and destabilizes the cluster, 
> and with compute_incremental_stats we hit IMPALA-2648.
> 
> Therefore, to optimize our queries we need to add the [shuffle] hint to 
> queries with joins. We have seen that this improves performance 3x on 
> simple tests because the system doesn't have to stream and dump so much 
> data for a broadcast join. 
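> 
> For reference, this is roughly what the hinted form looks like. It is only 
> a sketch: the table and column names (events, users, user_id) are made up 
> for illustration and are not from our schema. 
> 
> ```sql
> -- Hypothetical tables; the [SHUFFLE] hint goes right after the JOIN keyword
> -- and asks the planner for a partitioned (shuffle) join instead of a
> -- broadcast join for that particular join.
> SELECT e.user_id, COUNT(*) AS event_count
> FROM events e
> JOIN [SHUFFLE] users u ON e.user_id = u.user_id
> GROUP BY e.user_id;
> ```
> 
> The hint applies per join, so a query with several joins needs it on each 
> one. 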
> 
> We have a large team of analysts who are pushing tons of queries to the 
> system. At the moment it is hard to enforce a policy that they remember to 
> use the shuffle hint so that their queries don't take our system down. 
> 
> -- 
> Cheers,
> -Arya
