[jira] [Commented] (HIVE-16412) Hive on Tez incorrect partition pruning ANALYZE TABLE
    [ https://issues.apache.org/jira/browse/HIVE-16412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970250#comment-15970250 ]

Amir Shenavandeh commented on HIVE-16412:
-----------------------------------------

An empty PruneExpression is passed:

2017-04-16T00:59:35,372 TRACE [main] ppr.PartitionPruner: Started pruning partiton
2017-04-16T00:59:35,372 TRACE [main] ppr.PartitionPruner: dbname = default
2017-04-16T00:59:35,372 TRACE [main] ppr.PartitionPruner: tabname = ext_data_part
2017-04-16T00:59:35,372 TRACE [main] ppr.PartitionPruner: prune Expression =

> Hive on Tez incorrect partition pruning ANALYZE TABLE
> -----------------------------------------------------
>
>                 Key: HIVE-16412
>                 URL: https://issues.apache.org/jira/browse/HIVE-16412
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.1.1
>         Environment: Hadoop2.7.3, Hive 2.1.1, Tez 0.8.5
>            Reporter: Amir Shenavandeh
>              Labels: Tez, hive, partition_pruner
>
> Hive on Tez, on partitioned tables ANALYZE TABLE T PARTITION (...) COMPUTE
> STATISTICS; will gather stats for all partitions from metastore even though
> partition spec only chooses a subset. Hive on MR runs efficiently.
> For example:
> ---
> analyze table ext_data_part partition(a=9957) compute statistics noscan
> ---
> Will cause:
> ---
> 2017-04-09T22:25:30,332 DEBUG [main] metastore.MetaStoreDirectSql: Direct SQL
> query in 12.30189ms + 0.037891ms, the query is [select "PARTITIONS"."PART_ID"
> from "PARTITIONS" inner join "TBLS" on "PARTITIONS"."TBL_ID" =
> "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ? inner join "DBS" on
> "TBLS"."DB_ID" = "DBS"."DB_ID"
> and "DBS"."NAME" = ?
> ]
> ---
> And:
> 2017-03-02T16:54:08,104 DEBUG [main([])]: log.PerfLogger (:()) - method=TezCompiler start=1488473648104 end=1488473648104 duration=0 from=org.apache.hadoop.hive.ql.parse.TezCompiler Setup dynamic partition pruning>
> 2017-03-02T16:54:08,104 DEBUG [main([])]: log.PerfLogger (:()) - method=TezCompiler from=org.apache.hadoop.hive.ql.parse.TezCompiler>
> 2017-03-02T16:54:08,110 DEBUG [main([])]: log.PerfLogger (:()) - method=partition-retrieving from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
> 2017-03-02T16:54:08,153 DEBUG [main([])]: log.PerfLogger (:()) - method=partition-retrieving start=1488473648110 end=1488473648153 duration=43 from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
> ---
> The stackTrace:
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2265)
>         - locked <0x0003de3798f0> (a org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler)
>         at com.sun.proxy.$Proxy21.listPartitions(Unknown Source)
>         at org.apache.hadoop.hive.ql.metadata.Hive.getAllPartitionsOf(Hive.java:2301)
>         at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.getAllPartitions(PartitionPruner.java:454)
>         at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.getAllPartsFromCacheOrServer(PartitionPruner.java:236)
>         at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:195)
>         at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:144)
>         at org.apache.hadoop.hive.ql.parse.ParseContext.getPrunedPartitions(ParseContext.java:511)
>         at org.apache.hadoop.hive.ql.parse.ParseContext.getPrunedPartitions(ParseContext.java:504)
>         at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:121)
>         at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>         at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:143)
>         at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:122)
>         at org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78)
>         at org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsAnnotation(TezCompiler.java:259)
>         at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:128)
>         at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:134)
>         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10947)
>         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10526)
>         at
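
The behaviour reported in HIVE-16412 can be pictured with a toy Python model. This is only a sketch and is not Hive's PartitionPruner code. It assumes that a pruner handed no predicate has nothing to filter on and so returns every partition, which would be consistent with the empty "prune Expression" in the trace log and the getAllPartitions frame in the stack trace. The names prune and spec_to_expr, and the dictionary representation of a partition, are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A partition is modelled as a mapping of partition column -> value (assumption).
Partition = Dict[str, str]
Predicate = Callable[[Partition], bool]


def prune(partitions: List[Partition],
          prune_expr: Optional[Predicate]) -> List[Partition]:
    """Toy pruner: filter partitions by a predicate, if one is given."""
    if prune_expr is None:
        # No predicate was passed, so there is nothing to filter on
        # and every partition is returned.
        return list(partitions)
    return [p for p in partitions if prune_expr(p)]


def spec_to_expr(spec: Dict[str, str]) -> Predicate:
    """Turn a partition spec such as {'a': '9957'} into a predicate."""
    return lambda p: all(p.get(col) == val for col, val in spec.items())


# Ten hypothetical partitions a=9950 .. a=9959.
all_parts = [{"a": str(n)} for n in range(9950, 9960)]

# Empty prune expression: all partitions come back.
unpruned = prune(all_parts, None)

# Predicate built from the PARTITION (a=9957) spec: only one comes back.
pruned = prune(all_parts, spec_to_expr({"a": "9957"}))

print(len(unpruned), len(pruned))
```

In this model the cost of the unpruned call grows with the total number of partitions in the table, while the pruned call touches only the partitions named in the spec.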
[jira] [Commented] (HIVE-13976) UNION ALL which takes actual source table in one side failed
    [ https://issues.apache.org/jira/browse/HIVE-13976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335834#comment-15335834 ]

Amir Shenavandeh commented on HIVE-13976:
-----------------------------------------

I should specifically add that, in my case, this fails only if the data is stored in compressed files (gzip, bzip2, etc.). With the same dataset uncompressed, the query runs perfectly.

--
Amir H Shenavandeh
EMail: shenavandeh {@} gmail {Dot} com

> UNION ALL which takes actual source table in one side failed
> ------------------------------------------------------------
>
>                 Key: HIVE-13976
>                 URL: https://issues.apache.org/jira/browse/HIVE-13976
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.13.0
>         Environment: Ubuntu 12.04, JDK 7
>            Reporter: Kai Sasaki
>
> UNION ALL must take an actual source table on both sides or on neither.
> * UNION ALL with an actual table on both sides -> Succeeds as expected
> {code}
> SELECT
>   1 AS id,
>   'Alice' AS name
> FROM
>   table1
> UNION ALL
> SELECT
>   2 AS id,
>   'Bob' AS name
> FROM
>   table2
> {code}
> * UNION ALL without an actual table on either side -> Succeeds as expected
> {code}
> SELECT
>   1 AS id,
>   'Alice' AS name
> UNION ALL
> SELECT
>   2 AS id,
>   'Bob' AS name
> {code}
> * UNION ALL with an actual table on one side -> Fails
> {code}
> SELECT
>   1 AS id,
>   'Alice' AS name
> UNION ALL
> SELECT
>   2 AS id,
>   'Bob' AS name
> FROM
>   some_table
> {code}
> The error message from the map task in the third case is:
> {code}
> Diagnostic Messages for this Task:
> Error: java.lang.IllegalArgumentException: Can not create a Path from an empty string
>         at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
>         at org.apache.hadoop.fs.Path.<init>(Path.java:135)
>         at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.getPath(HiveInputFormat.java:116)
>         at org.apache.hadoop.mapred.MapTask.updateJobWithSplit(MapTask.java:458)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-13976) UNION ALL which takes actual source table in one side failed
    [ https://issues.apache.org/jira/browse/HIVE-13976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335828#comment-15335828 ]

Amir Shenavandeh commented on HIVE-13976:
-----------------------------------------

Hello, I meant that this query runs perfectly in Hive version 2.0.0 and version 2.0.1. It fails on Hive version 1.12.0 and below. Sorry for the confusion.
[jira] [Commented] (HIVE-13976) UNION ALL which takes actual source table in one side failed
    [ https://issues.apache.org/jira/browse/HIVE-13976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332899#comment-15332899 ]

Amir Shenavandeh commented on HIVE-13976:
-----------------------------------------

This seems to be fixed in HIVE-2 and above.
[jira] [Comment Edited] (HIVE-13976) UNION ALL which takes actual source table in one side failed
    [ https://issues.apache.org/jira/browse/HIVE-13976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332899#comment-15332899 ]

Amir Shenavandeh edited comment on HIVE-13976 at 6/16/16 2:10 AM:
------------------------------------------------------------------

This seems to be fixed in HIVE-2 and above. It would still be interesting to see which patch fixed it.

was (Author: shenavandeh):
This seems to be fixed in HIVE-2 and above.