Re: Debugging Impala query that consistently hangs

2018-02-09 Thread Tim Armstrong
I suspect it's busy building the hash tables in the join with id=7. If you drill down into the profile you'll probably see a bunch of time spent there. The top-level time counter isn't necessarily updated live while the hash tables are being built, but the fact that it's using 179GB of memory is ...
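A quick way to confirm that (a sketch, assuming the query is run from impala-shell; the host and port below are Impala defaults, not values from this thread):

    -- Immediately after the query finishes or is cancelled, in the same impala-shell session:
    SUMMARY;   -- condensed per-operator table: time, rows, peak memory per node
    PROFILE;   -- full runtime profile, including the hash-table build timers

    -- For a query that is still running, the same profile is exposed by the
    -- coordinator's web UI, by default at http://<coordinator-host>:25000/queries

The operator with id=7 appears in that output as something like "07:HASH JOIN", so its timers and peak memory can be matched against the 179GB figure.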

Re: Debugging Impala query that consistently hangs

2018-02-09 Thread Piyush Narang
Thanks Tim. I had issues running compute stats on some of our tables (calling alter table on Hive was failing and I wasn’t able to resolve it) and I think this was one of them. I’ll try switching over to a shuffle join and see if that helps. -- Piyush
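For reference, computing stats is a single statement per table; the table name below is a placeholder and this is only a sketch of the usual workflow:

    -- Run once per table involved in the join
    COMPUTE STATS my_fact_table;

    -- Verify the stats were persisted: #Rows should no longer be -1
    SHOW TABLE STATS my_fact_table;
    SHOW COLUMN STATS my_fact_table;

COMPUTE STATS persists its results to the Hive metastore, which may be why the ALTER TABLE failures on the Hive side got in the way here.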

Re: Debugging Impala query that consistently hangs

2018-02-09 Thread Piyush Narang
Actually, looking at this again, the hash join that is consuming 179GB is supposed to be partitioned, right? How would stats change that? I checked the query I kicked off and I have this there: “left outer join /* +SHUFFLE */”. I think without it I end up with query failures. Is there something I ...
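For context, the hint goes between the join keyword and the right-hand table; a /* +SHUFFLE */ hint forces a partitioned join, while /* +BROADCAST */ replicates the right-hand side to every node. Table and column names below are placeholders, not from the thread:

    SELECT a.id, b.val
    FROM big_table a
    LEFT OUTER JOIN /* +SHUFFLE */ other_table b
        ON a.id = b.id;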

Re: Debugging Impala query that consistently hangs

2018-02-09 Thread Tim Armstrong
Most of the intelligence in the planning process relies on having stats, including the BROADCAST/SHUFFLE join mode selection. If you compute stats you'll have a much better experience.
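One way to see what the planner picks with and without stats (a sketch; the explain level and query shape are just illustrative):

    SET EXPLAIN_LEVEL=2;   -- more detailed plan output

    EXPLAIN
    SELECT a.id, b.val
    FROM big_table a
    LEFT OUTER JOIN other_table b
        ON a.id = b.id;

The plan shows whether each join runs as BROADCAST or PARTITIONED and warns about tables with missing stats, so you can confirm the hint is no longer needed once stats are in place.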

Re: Debugging Impala query that consistently hangs

2018-02-09 Thread Tim Armstrong
To be clearer, the main problem with that plan is that the join order is bad. Broadcast vs shuffle is a secondary issue. The query doesn't look that complex so with stats you should get a reasonable plan without hinting.
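If stats genuinely cannot be computed, one fallback (a sketch only; STRAIGHT_JOIN is a general Impala feature, not something suggested in this thread) is to fix the join order by hand, listing the largest table first:

    SELECT STRAIGHT_JOIN a.id, b.val
    FROM largest_table a                        -- biggest table listed first on purpose
    LEFT OUTER JOIN /* +SHUFFLE */ smaller_table b
        ON a.id = b.id;

With STRAIGHT_JOIN the planner keeps the tables in the order they appear in the FROM clause, so a bad automatic join order can't put the huge table on the build side of the hash join.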