[jira] [Commented] (HIVE-14362) Support explain analyze in Hive
[ https://issues.apache.org/jira/browse/HIVE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451321#comment-15451321 ]

Lefty Leverenz commented on HIVE-14362:
---

Doc note: EXPLAIN ANALYZE needs to be documented in the wiki for release 2.2.0.

* [LanguageManual -- Explain | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain]

Added a TODOC2.2 label.

> Support explain analyze in Hive
> ---
>
> Key: HIVE-14362
> URL: https://issues.apache.org/jira/browse/HIVE-14362
> Project: Hive
> Issue Type: New Feature
> Affects Versions: 2.1.0
> Reporter: Pengcheng Xiong
> Assignee: Pengcheng Xiong
> Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-14362.01.patch, HIVE-14362.02.patch,
> HIVE-14362.03.patch, HIVE-14362.05.patch, HIVE-14362.06.patch,
> compare_on_cluster.pdf
>
>
> Right now all the explain levels only support stats before query runs. We
> would like to have an explain analyze similar to Postgres for real stats
> after query runs. This will help to identify the major gap between
> estimated/real stats and make not only query optimization better but also
> query performance debugging easier.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-14362) Support explain analyze in Hive
[ https://issues.apache.org/jira/browse/HIVE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450103#comment-15450103 ]

Pengcheng Xiong commented on HIVE-14362:

Pushed to master. Thanks [~ashutoshc], [~gopalv] and [~gszadovszky] for the reviews! [~gopalv], I will open another JIRA to support abort stats.
[jira] [Commented] (HIVE-14362) Support explain analyze in Hive
[ https://issues.apache.org/jira/browse/HIVE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450022#comment-15450022 ]

Hive QA commented on HIVE-14362:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12826218/HIVE-14362.06.patch

{color:green}SUCCESS:{color} +1 due to 7 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10472 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1044/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1044/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1044/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12826218 - PreCommit-HIVE-MASTER-Build
[jira] [Commented] (HIVE-14362) Support explain analyze in Hive
[ https://issues.apache.org/jira/browse/HIVE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15447468#comment-15447468 ]

Ashutosh Chauhan commented on HIVE-14362:

+1
[jira] [Commented] (HIVE-14362) Support explain analyze in Hive
[ https://issues.apache.org/jira/browse/HIVE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15441398#comment-15441398 ]

Hive QA commented on HIVE-14362:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12825703/HIVE-14362.05.patch

{color:green}SUCCESS:{color} +1 due to 8 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10470 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[acid_bucket_pruning]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters1]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_0]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1019/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1019/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1019/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12825703 - PreCommit-HIVE-MASTER-Build
[jira] [Commented] (HIVE-14362) Support explain analyze in Hive
[ https://issues.apache.org/jira/browse/HIVE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435150#comment-15435150 ]

Pengcheng Xiong commented on HIVE-14362:

Thanks [~gopalv] for the detailed performance analysis. I have addressed the local file issue and the vectorization issue. I still have some other small issues to address before I submit another patch. Thanks.
[jira] [Commented] (HIVE-14362) Support explain analyze in Hive
[ https://issues.apache.org/jira/browse/HIVE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15433811#comment-15433811 ]

Gopal V commented on HIVE-14362:

bq. I assume that your major concern is performance difference, rather than functional, right?

[~pxiong]: Ran through my perf tests last night and this patch is nearly free - because there are no branches or virtual calls, the incq instruction doesn't have a CPU stall associated with it and is pretty much running with no additional perf impact (nothing measurable).

No perf concerns for this impl - the closeOp() is not a hot function, so it doesn't matter unless a user is explicitly running "explain analyze".
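The cost argument above can be illustrated with a minimal sketch. This is not the code from the patch: `CountingOperator`, `process`, `close` and the `collectRunTimeStats` flag are hypothetical names chosen for illustration. The idea, as described in the comment, is that the per-row path only increments a plain field, and everything more expensive happens once when the operator closes. Whether a JVM actually compiles the increment down to a single instruction depends on the JIT and is not guaranteed by this sketch.

```java
// Hypothetical sketch; class and method names are illustrative, not Hive's Operator API.
final class CountingOperator {
    private final String operatorId;
    // Plain long field: bumping it needs no branch and no virtual call.
    private long runTimeNumRows;

    CountingOperator(String operatorId) {
        this.operatorId = operatorId;
    }

    // Hot path: invoked once per row.
    void process(Object row) {
        runTimeNumRows++;
        // ... the operator's real per-row work would go here ...
    }

    // Cold path: invoked once at close. Stats are only reported when requested,
    // for example when the user runs "explain analyze".
    String close(boolean collectRunTimeStats) {
        if (!collectRunTimeStats) {
            return null;
        }
        return operatorId + "=" + runTimeNumRows;
    }

    public static void main(String[] args) {
        CountingOperator op = new CountingOperator("RS_8");
        for (int i = 0; i < 1000; i++) {
            op.process(i);
        }
        System.out.println(op.close(true));
    }
}
```

Keeping the reporting step in the close path means that queries not run under "explain analyze" pay only for the increment.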
[jira] [Commented] (HIVE-14362) Support explain analyze in Hive
[ https://issues.apache.org/jira/browse/HIVE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432092#comment-15432092 ]

Pengcheng Xiong commented on HIVE-14362:

Hi [~gopalv], it works for the q tests (see explain analyze 1-5). I also tested it on the cluster and it worked for some simple queries. Thanks for finding this out, and I agree with you. I think we really need to change that to an HDFS temp folder somewhere. We can improve that anyway. Btw, I assume that your major concern is performance difference, rather than functional, right? Thanks. :)
[jira] [Commented] (HIVE-14362) Support explain analyze in Hive
[ https://issues.apache.org/jira/browse/HIVE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432018#comment-15432018 ]

Gopal V commented on HIVE-14362:

The local path was traced down to config.setExplainRootPath(ctx.getLocalTmpPath()); in SemanticAnalyzer.

The path for collecting stats has to be in the Hive Session dir on HDFS. I'll try to patch this tomorrow and try running again.
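To make the distinction concrete, here is a small standalone sketch that classifies a stats directory by its URI scheme. It is only an illustration and not part of the patch: `StatsDirCheck` and `isLocalOnly` are hypothetical names, the example paths are made up, and the check uses plain `java.net.URI` rather than any Hadoop API. Paths without a scheme are deliberately not classified as local, since how they resolve depends on the configured default filesystem.

```java
// Hypothetical helper; not a Hive or Hadoop API.
final class StatsDirCheck {
    // True when the directory uses the "file" scheme, i.e. it lives on the local disk
    // of whichever machine writes it and is not visible to other nodes.
    static boolean isLocalOnly(String statsDir) {
        String scheme = java.net.URI.create(statsDir).getScheme();
        return "file".equalsIgnoreCase(scheme);
    }

    public static void main(String[] args) {
        // Illustrative paths only.
        System.out.println(isLocalOnly("file:/tmp/gopal/stats/RS_8"));
        System.out.println(isLocalOnly("hdfs://namenode:8020/tmp/hive/stats/RS_8"));
    }
}
```

A directory that this check flags as local-only would be written on the task's machine and never read back by the process that assembles the explain output, which matches the failure described in the thread.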
[jira] [Commented] (HIVE-14362) Support explain analyze in Hive
[ https://issues.apache.org/jira/browse/HIVE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431939#comment-15431939 ]

Gopal V commented on HIVE-14362:

[~pxiong]: tested this patch - running explain analyze seems to disable vectorization for all queries after that point.

{code}
+ HiveConf.setBoolVar(conf, HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED, false);
{code}

And explain analyze does not actually work.

{code}
2016-08-23T01:13:10,961 INFO [667a4e5f-6194-438f-85d6-339aca3ebecc main] physical.AnnotateRunTimeStatsOptimizer: setRuntimeStatsDir for RS_8
2016-08-23T01:13:10,962 INFO [667a4e5f-6194-438f-85d6-339aca3ebecc main] fs.FSStatsPublisher: created : file:/tmp/gopal/667a4e5f-6194-438f-85d6-339aca3ebecc/hive_2016-08-23_01-13-10_705_7555853843090786759-1/-local-1/RS_8
{code}

The paths for output are in local dirs, not the HDFS dirs - so the stats written on a machine are not making their way back to the HiveServer2 box.

{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 30002]: StatsPublisher cannot be connected to.There was a error while connecting to the StatsPublisher, and retrying might help. If you dont want the query to fail because accurate statistics could not be collected, set hive.stats.reliable=false
	at org.apache.hadoop.hive.ql.exec.Operator.publishRunTimeStats(Operator.java:1444)
	at org.apache.hadoop.hive.ql.exec.Operator.closeOp(Operator.java:723)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.closeOp(TableScanOperator.java:270)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:691)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:705)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:433)
{code}
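The vectorization report describes a setting that is switched off for one statement and then stays off for the rest of the session. One common way to avoid that kind of leak is to save the previous value and restore it in a `finally` block. The sketch below shows that pattern on a plain `Map`; it is an assumption-laden illustration, not how the issue was resolved in later patches (the thread does not say). `ScopedSetting` and `runWith` are hypothetical names, and the key string is what I believe `HIVE_VECTORIZATION_ENABLED` maps to in HiveConf.

```java
// Hypothetical sketch of a save-and-restore pattern; a Map stands in for the session config.
final class ScopedSetting {
    // Believed to be the property behind HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED.
    static final String VECTORIZATION_KEY = "hive.vectorized.execution.enabled";

    // Runs body with key forced to value, then restores the earlier state,
    // even if body throws.
    static void runWith(java.util.Map<String, String> conf, String key, String value, Runnable body) {
        boolean hadKey = conf.containsKey(key);
        String previous = conf.get(key);
        conf.put(key, value);
        try {
            body.run();
        } finally {
            if (hadKey) {
                conf.put(key, previous);
            } else {
                conf.remove(key);
            }
        }
    }

    public static void main(String[] args) {
        java.util.Map<String, String> conf = new java.util.HashMap<>();
        conf.put(VECTORIZATION_KEY, "true");
        runWith(conf, VECTORIZATION_KEY, "false",
                () -> System.out.println("during: " + conf.get(VECTORIZATION_KEY)));
        System.out.println("after: " + conf.get(VECTORIZATION_KEY));
    }
}
```

With this shape, later queries in the same session see the value that was in effect before the explain analyze statement ran.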
[jira] [Commented] (HIVE-14362) Support explain analyze in Hive
[ https://issues.apache.org/jira/browse/HIVE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429347#comment-15429347 ]

Hive QA commented on HIVE-14362:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12824671/HIVE-14362.02.patch

{color:green}SUCCESS:{color} +1 due to 7 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 10476 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[columnstats_partlvl]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[columnstats_partlvl_dp]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[columnstats_quoting]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[columnstats_tbllvl]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[compute_stats_date]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constant_prop_2]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[display_colstats_tbllvl]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dynpart_sort_optimization_acid]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exec_parallel_column_stats]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[temp_table_display_colstats_tbllvl]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[transform_ppr1]
org.apache.hadoop.hive.ql.exec.TestExplainTask.testExplainDoesSortMapValues
org.apache.hadoop.hive.ql.exec.TestExplainTask.testExplainDoesSortPathAsStrings
org.apache.hadoop.hive.ql.exec.TestExplainTask.testExplainDoesSortTopLevelMapEntries
org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/946/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/946/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-946/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 24 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12824671 - PreCommit-HIVE-MASTER-Build
[jira] [Commented] (HIVE-14362) Support explain analyze in Hive
[ https://issues.apache.org/jira/browse/HIVE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429171#comment-15429171 ]

Gopal V commented on HIVE-14362:

Thanks, [~pxiong], for running the benchmarks. I'll add this to Monday's build.
[jira] [Commented] (HIVE-14362) Support explain analyze in Hive
[ https://issues.apache.org/jira/browse/HIVE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429128#comment-15429128 ]

Pengcheng Xiong commented on HIVE-14362:

[~gopalv], the patch is ready. Could you please let us know if you are satisfied with the performance comparison results? Thanks.
[jira] [Commented] (HIVE-14362) Support explain analyze in Hive
[ https://issues.apache.org/jira/browse/HIVE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419783#comment-15419783 ]

Hive QA commented on HIVE-14362:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12823510/HIVE-14362.01.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 234 failed/errored test(s), 10471 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join32]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_smb_mapjoin_14]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_13]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_1]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_3]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_4]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_5]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_6]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_7]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_8]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_9]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_1]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_2]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_3]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_4]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_5]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_6]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_7]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_8]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketizedhiveinputformat_auto]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_2]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_6]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_7]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_8]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join1]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explainanalyze_0]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explainanalyze_1]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explainanalyze_2]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explainanalyze_3]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explainanalyze_4]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_filters]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_nulls]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_nullsafe]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_join]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_join_partition_key]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin9]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_11]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_12]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_13]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_14]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_15]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_16]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_17]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_1]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_2]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_3]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_4]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_5]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_6]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_7]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smblimit]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sort_merge_join_desc_1]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sort_merge_join_desc_2]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sort_merge_join_desc_3]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sort_merge_join_desc_5]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sort_merge_join_desc_8]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_auto_smb_mapjoin_14]
[jira] [Commented] (HIVE-14362) Support explain analyze in Hive
[ https://issues.apache.org/jira/browse/HIVE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419249#comment-15419249 ]

Gopal V commented on HIVE-14362:

[~pxiong]: this approach was abandoned earlier due to known performance issues - https://issues.apache.org/jira/browse/HIVE-4318?focusedCommentId=13629957&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13629957
[jira] [Commented] (HIVE-14362) Support explain analyze in Hive
[ https://issues.apache.org/jira/browse/HIVE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419242#comment-15419242 ]

Pengcheng Xiong commented on HIVE-14362:

ccing [~gopalv], will do a performance test soon.