[jira] [Commented] (HIVE-16412) Hive on Tez incorrect partition pruning ANALYZE TABLE

2017-04-16 Thread Amir Shenavandeh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970250#comment-15970250
 ] 

Amir Shenavandeh commented on HIVE-16412:
-

An empty prune expression is passed:

2017-04-16T00:59:35,372 TRACE [main] ppr.PartitionPruner: Started pruning partiton
2017-04-16T00:59:35,372 TRACE [main] ppr.PartitionPruner: dbname = default
2017-04-16T00:59:35,372 TRACE [main] ppr.PartitionPruner: tabname = ext_data_part
2017-04-16T00:59:35,372 TRACE [main] ppr.PartitionPruner: prune Expression = 
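
As a rough illustration of why this matters, here is a toy Python model. It is not Hive's actual PartitionPruner code, and the function `prune` and its parameters are hypothetical names. A pruner that receives no expression has nothing to filter on and falls back to the full partition list:

```python
from typing import Callable, Dict, List, Optional

Partition = Dict[str, str]  # partition column name -> value (assumed shape)


def prune(
    all_partitions: List[Partition],
    prune_expr: Optional[Callable[[Partition], bool]],
) -> List[Partition]:
    """Toy model: return the partitions a statement would have to touch."""
    if prune_expr is None:
        # Empty prune expression: nothing to filter on, so every
        # partition of the table is returned.
        return list(all_partitions)
    # Otherwise keep only the partitions that satisfy the expression.
    return [p for p in all_partitions if prune_expr(p)]


if __name__ == "__main__":
    parts = [{"a": str(v)} for v in range(9955, 9960)]
    print(len(prune(parts, None)))
    print(prune(parts, lambda p: p["a"] == "9957"))
```

With partition(a=9957) one would expect the behaviour of the second call; the trace above corresponds to the first.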


> Hive on Tez incorrect partition pruning ANALYZE TABLE
> -
>
> Key: HIVE-16412
> URL: https://issues.apache.org/jira/browse/HIVE-16412
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 2.1.1
> Environment: Hadoop 2.7.3, Hive 2.1.1, Tez 0.8.5
> Reporter: Amir Shenavandeh
> Labels: Tez, hive, partition_pruner
>
> With Hive on Tez, running ANALYZE TABLE T PARTITION (...) COMPUTE STATISTICS; 
> on a partitioned table gathers stats for all partitions from the metastore, 
> even though the partition spec selects only a subset. Hive on MR handles the 
> same statement efficiently. 
> For example:
> ---
> analyze table ext_data_part partition(a=9957) compute statistics noscan
> ---
> Will cause:
> ---
> 2017-04-09T22:25:30,332 DEBUG [main] metastore.MetaStoreDirectSql: Direct SQL query in 12.30189ms + 0.037891ms, the query is [select "PARTITIONS"."PART_ID" from "PARTITIONS"  inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ?   inner join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID"  and "DBS"."NAME" = ? ]
> ---
> And:
> 2017-03-02T16:54:08,104 DEBUG [main([])]: log.PerfLogger (:()) - </PERFLOG method=TezCompiler start=1488473648104 end=1488473648104 duration=0 from=org.apache.hadoop.hive.ql.parse.TezCompiler Setup dynamic partition pruning>
> 2017-03-02T16:54:08,104 DEBUG [main([])]: log.PerfLogger (:()) - <PERFLOG method=TezCompiler from=org.apache.hadoop.hive.ql.parse.TezCompiler>
> 2017-03-02T16:54:08,110 DEBUG [main([])]: log.PerfLogger (:()) - <PERFLOG method=partition-retrieving from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
> 2017-03-02T16:54:08,153 DEBUG [main([])]: log.PerfLogger (:()) - </PERFLOG method=partition-retrieving start=1488473648110 end=1488473648153 duration=43 from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
> ---
> The stack trace:
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2265)
>   - locked <0x0003de3798f0> (a org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler)
>   at com.sun.proxy.$Proxy21.listPartitions(Unknown Source)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getAllPartitionsOf(Hive.java:2301)
>   at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.getAllPartitions(PartitionPruner.java:454)
>   at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.getAllPartsFromCacheOrServer(PartitionPruner.java:236)
>   at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:195)
>   at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:144)
>   at org.apache.hadoop.hive.ql.parse.ParseContext.getPrunedPartitions(ParseContext.java:511)
>   at org.apache.hadoop.hive.ql.parse.ParseContext.getPrunedPartitions(ParseContext.java:504)
>   at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:121)
>   at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>   at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:143)
>   at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:122)
>   at org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78)
>   at org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsAnnotation(TezCompiler.java:259)
>   at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:128)
>   at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:134)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10947)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10526)
>   at 
> 

[jira] [Commented] (HIVE-13976) UNION ALL which takes actual source table in one side failed

2016-06-17 Thread Amir Shenavandeh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335834#comment-15335834
 ] 

Amir Shenavandeh commented on HIVE-13976:
-

I should add that, in my case, this fails only if the data is stored in
compressed files (gzip, bzip2, etc.). With the same dataset uncompressed, the
query runs perfectly.




-- 
---
Amir H Shenavandeh
EMail: shenavandeh {@} gmail {Dot} com


> UNION ALL which takes actual source table in one side failed
> 
>
> Key: HIVE-13976
> URL: https://issues.apache.org/jira/browse/HIVE-13976
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.13.0
> Environment: Ubuntu 12.04, JDK 7
> Reporter: Kai Sasaki
>
> UNION ALL must take an actual source table on both sides or on neither.
> * UNION ALL with an actual table on both sides -> Succeeds as expected
> {code}
> SELECT 
>   1 AS id,
>   'Alice' AS name
> FROM
>   table1
> UNION ALL 
> SELECT 
>   2 AS id,
>   'Bob' AS name
> FROM
>   table2
> {code}
> * UNION ALL without an actual table on either side -> Succeeds as expected
> {code}
> SELECT 
>   1 AS id,
>   'Alice' AS name
> UNION ALL 
> SELECT 
>   2 AS id,
>   'Bob' AS name
> {code}
> * UNION ALL with an actual table on one side only -> Fails
> {code}
> SELECT 
>   1 AS id,
>   'Alice' AS name
> UNION ALL 
> SELECT 
>   2 AS id,
>   'Bob' AS name
> FROM
>   some_table
> {code}
> The error message from the map task in the third case is:
> {code}
> Diagnostic Messages for this Task:
> Error: java.lang.IllegalArgumentException: Can not create a Path from an empty string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:135)
>   at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.getPath(HiveInputFormat.java:116)
>   at org.apache.hadoop.mapred.MapTask.updateJobWithSplit(MapTask.java:458)
> {code}
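
The top two frames of the trace suggest that a split with an empty path string reaches the Path constructor. A minimal Python sketch of that kind of argument check follows. It is an illustration only: `check_path_arg` is a hypothetical stand-in modelled on the exception message quoted above, not Hadoop's code.

```python
from typing import Optional


def check_path_arg(path: Optional[str]) -> str:
    """Reject path strings that cannot name a file (toy stand-in)."""
    if path is None:
        raise ValueError("Can not create a Path from a null string")
    if len(path) == 0:
        # This is the condition the map task above appears to hit.
        raise ValueError("Can not create a Path from an empty string")
    return path


if __name__ == "__main__":
    # A split backed by a real file passes the check.
    print(check_path_arg("/warehouse/some_table/000000_0"))
    # A split with no backing file (empty path) is rejected.
    try:
        check_path_arg("")
    except ValueError as err:
        print(err)
```

Which branch of the query produces the empty path is not established in this thread; that it is the SELECT without a FROM clause is only an assumption.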



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13976) UNION ALL which takes actual source table in one side failed

2016-06-17 Thread Amir Shenavandeh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335828#comment-15335828
 ] 

Amir Shenavandeh commented on HIVE-13976:
-

Hello,
I meant that this query runs perfectly in Hive 2.0.0 and 2.0.1.
It fails on Hive 1.12.0 and below.

Sorry for the confusion.





-- 
---
Amir H Shenavandeh
EMail: shenavandeh {@} gmail {Dot} com







[jira] [Commented] (HIVE-13976) UNION ALL which takes actual source table in one side failed

2016-06-15 Thread Amir Shenavandeh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332899#comment-15332899
 ] 

Amir Shenavandeh commented on HIVE-13976:
-

This seems to be fixed in Hive 2 and above.







[jira] [Comment Edited] (HIVE-13976) UNION ALL which takes actual source table in one side failed

2016-06-15 Thread Amir Shenavandeh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332899#comment-15332899
 ] 

Amir Shenavandeh edited comment on HIVE-13976 at 6/16/16 2:10 AM:
--

This seems to be fixed in Hive 2 and above. It would still be interesting to 
see which patch fixed it. 




was (Author: shenavandeh):
This seems to be fixed in Hive 2 and above.




