This improvement might help. It has been available since Impala 2.9:
https://issues.apache.org/jira/browse/IMPALA-5381
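
As far as I can tell, that change added the DEFAULT_JOIN_DISTRIBUTION_MODE query option, which controls the join strategy the planner picks when table statistics are missing. A minimal sketch, assuming that option name and the SHUFFLE value exist in your version (the table names are hypothetical):

```sql
-- Per session (e.g. in impala-shell): prefer shuffle (partitioned) joins
-- when the tables involved have no statistics.
-- Assumption: the option is named DEFAULT_JOIN_DISTRIBUTION_MODE.
SET DEFAULT_JOIN_DISTRIBUTION_MODE=SHUFFLE;

-- Hypothetical tables; no [shuffle] hint is written in the query itself.
SELECT f.id, d.name
FROM big_fact f
JOIN big_dim d ON f.dim_id = d.id;
```

To apply this for all analysts without relying on them to remember a hint, the option could presumably be set cluster-wide at impalad startup via --default_query_options=default_join_distribution_mode=shuffle. Please check the documentation for your release before relying on it.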



On Fri, Feb 23, 2018 at 6:40 PM, Arya Goudarzi <gouda...@gmail.com> wrote:

> Thank you Mostafa. My bad on mentioning the wrong version. We are using
> 2.7, not 1.7. We have an upgrade in our plans and are actually waiting for
> Impala 2.12, as it has the IMPALA-5058 fixes.
>
> On Fri, Feb 23, 2018 at 6:18 PM, Mostafa Mokhtar <mmokh...@cloudera.com>
> wrote:
>
>> AFAIK there is no such flag.
>> You are more likely to get much higher gains if you upgrade to a more
>> recent version of Impala.
>>
>> https://www.slideshare.net/cloudera/performance-of-apache-impala
>>
>> Thanks
>> Mostafa
>>
>> On Feb 23, 2018, at 6:12 PM, Arya Goudarzi <gouda...@gmail.com> wrote:
>>
>> Hi Team,
>>
>> TL;DR: I am wondering if there is a way to instruct Impala to use
>> shuffle by default for all join queries, as my research hasn't led
>> anywhere so far.
>>
>> We have a multi-PiB cluster with hundreds of thousands of partitions. We
>> are using Impala 1.7 with HDFS. Due to our cluster size, compute_stats and
>> compute_incremental_stats are not feasible for us: compute_stats is a
>> heavy operation on a lot of our large tables and destabilizes the cluster,
>> and with compute_incremental_stats we hit IMPALA-2648
>> <https://issues.apache.org/jira/browse/IMPALA-2648>.
>>
>> Therefore, to optimize our queries we need to add the [shuffle] hint to
>> queries with joins. We have seen that this improves performance 3x on
>> simple tests because the system doesn't have to stream too much data and
>> dump it for a broadcast join.
>>
>> We have a large team of analysts who are pushing tons of queries to the
>> system. At the moment it is hard to enforce a policy that they remember to
>> use the shuffle hint so that it doesn't take our system down.
>>
>> --
>> Cheers,
>> -Arya
>>
>>
>
>
> --
> Cheers,
> -Arya
>
