To be clearer, the main problem with that plan is that the join order is
bad. Broadcast vs. shuffle is a secondary issue. The query doesn't look that
complex, so with stats you should get a reasonable plan without hinting.
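
As a rough sketch of what computing stats and re-checking the plan could
look like (the table and column names here are placeholders I made up, since
the actual query isn't shown in this thread):

```sql
-- Hypothetical table names; substitute the tables used in your query.
COMPUTE STATS fact_events;
COMPUTE STATS dim_users;

-- Confirm stats exist: #Rows shows -1 for a table without stats.
SHOW TABLE STATS fact_events;
SHOW COLUMN STATS fact_events;

-- Re-check the plan with the join hint removed, so the planner
-- picks the join order and join mode from the stats.
EXPLAIN
SELECT e.user_id, u.country
FROM fact_events e
LEFT OUTER JOIN dim_users u ON e.user_id = u.user_id;
```

The -1 in the summary line quoted below looks like the estimated row count,
which would be consistent with missing stats, but I'm inferring that from a
single line of the summary.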

On 9 Feb. 2018 17:29, "Tim Armstrong" <tarmstr...@cloudera.com> wrote:

> Most of the intelligence in the planning process relies on having stats,
> including the BROADCAST/SHUFFLE join mode selection.
>
> If you compute stats you'll have a much better experience.
>
> On Fri, Feb 9, 2018 at 11:44 AM, Piyush Narang <p.nar...@criteo.com>
> wrote:
>
>> Actually, looking at this again, the hash join that is consuming 179 GB is
>> supposed to be partitioned, right? How would stats change that?
>>
>> I checked the query I kicked off and it does have "left outer join
>> /* +SHUFFLE */" in it. I think without the hint I end up with query failures.
>>
>> Is there something I’m missing?
>>
>> -- Piyush
>>
>> *From: *Tim Armstrong <tarmstr...@cloudera.com>
>> *Reply-To: *"user@impala.apache.org" <user@impala.apache.org>
>> *Date: *Friday, February 9, 2018 at 12:24 PM
>> *To: *"user@impala.apache.org" <user@impala.apache.org>
>> *Subject: *Re: Debugging Impala query that consistently hangs
>>
>> 07:HASH JOIN  1  0.000ns  0.000ns  0  -1  179.72 GB  2.00 GB  LEFT OUTER JOIN, PARTITIONED
>>
>
