Hi Dongwon Thanks for the presentation! Very insightful. I just filed a bug for query72. Hive’s CBO seems to be selecting wrong join order. https://issues.apache.org/jira/browse/HIVE-15771
In the following link you can find a rewrite for the query which gives much better runtime (in my testing I was able to run in 130s on 1TB scale on 6 node LLAP cluster). I also disabled the date filter that turns into NULL > NULL + 5 expression in the queries. Ideally, we want CBO to pick up join order in the rewritten query (which should be fixed with HIVE-15771). https://transfer.sh/E0XfV/q72.zip The above link contains - original query - modified query - the explain output (txt and svg) for original and modified Thanks for reporting the issue and I hope this helps. Thanks Prasanth On Jan 30, 2017, at 9:39 PM, 김동원 <eastcirc...@gmail.com<mailto:eastcirc...@gmail.com>> wrote: gopal : In the attached gopal.tar.gz, I put two svg images and two text files after rerunning query72 with and without the following inequation: and d3.d_date > d1.d_date + 5 FYI, I already did Hive experiments with and without the inequation because Presto doesn't allow it at the time of query submission, but Hive's running times are not that different. <to_gopal.tar.gz> - Dongwon 2017. 1. 31. 오후 12:48, Gopal Vijayaraghavan <gop...@apache.org<mailto:gop...@apache.org>> 작성: Gopal : (yarn logs -application $APPID) doesn't contain a line containing HISTORY so it doesn't produce svg file. Should I turn on some option to get the lines containing HISTORY in yarn application log? There's a config option tez.am.log.level=INFO which controls who much data is written to the log there. I think there's an interval type clause in the 72 query, which might be a problem. and d3.d_date > d1.d_date + 5 That might be doing UDFToDouble(d_date) > UDFToDouble(d_date) + 5, which will evaluate into NULL > NULL + 5 Because UDFToDouble("1997-01-01") is NULL. So, seeing your explain would go a long way in finding out what's going on. The swimlane raw data is also somewhat interesting to me, because I also draw a differen t set of graphs from the same HISTORY data. http://people.apache.org/~gopalv/q21_suppliers_who_kept_orders_waiting.svg to locate bottlenecks in the system. Cheers, Gopal