Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

2015-07-12 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review91427
---

Ship it!


- chengxiang li


On July 8, 2015, 6:04 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34666/
 ---
 
 (Updated July 8, 2015, 6:04 p.m.)
 
 
 Review request for hive, chengxiang li and Xuefu Zhang.
 
 
 Bugs: HIVE-9152
 https://issues.apache.org/jira/browse/HIVE-9152
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Tez implemented dynamic partition pruning in HIVE-7826. This is a nice 
 optimization and we should implement the same in HOS.
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 27f68df
   itests/src/test/resources/testconfiguration.properties 4f2de12
   ql/if/queryplan.thrift c8dfa35
   ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java e18f935
   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java f58a10b
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java ca0ffb6
   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40
   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 2ff3951
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java 8546d21
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java a7cf8b7
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java ad47547
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 7992c88
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 7f2c079
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 3217df2
   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9e9a2a2
   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e
   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION
   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION
   ql/src/test/queries/clientpositive/spark_vectorized_dynamic_partition_pruning.q PRE-CREATION
   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION
   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION
   ql/src/test/results/clientpositive/spark/spark_vectorized_dynamic_partition_pruning.q.out PRE-CREATION
   ql/src/test/results/clientpositive/spark/vectorized_dynamic_partition_pruning.q.out PRE-CREATION
 
 Diff: https://reviews.apache.org/r/34666/diff/
 
 
 Testing
 ---
 
 spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both
 are cloned from Tez's tests.
 
 
 Thanks,
 
 Chao Sun
 




Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

2015-07-08 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/
---

(Updated July 8, 2015, 6:04 p.m.)


Review request for hive, chengxiang li and Xuefu Zhang.


Bugs: HIVE-9152
https://issues.apache.org/jira/browse/HIVE-9152


Repository: hive-git


Description
---

Tez implemented dynamic partition pruning in HIVE-7826. This is a nice 
optimization and we should implement the same in HOS.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 27f68df
  itests/src/test/resources/testconfiguration.properties 4f2de12
  ql/if/queryplan.thrift c8dfa35
  ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java e18f935
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java f58a10b
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java ca0ffb6
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 2ff3951
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java 8546d21
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java a7cf8b7
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java ad47547
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 7992c88
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 7f2c079
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 3217df2
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9e9a2a2
  ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e
  ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION
  ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION
  ql/src/test/queries/clientpositive/spark_vectorized_dynamic_partition_pruning.q PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_vectorized_dynamic_partition_pruning.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/vectorized_dynamic_partition_pruning.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/34666/diff/


Testing
---

spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both
are cloned from Tez's tests.


Thanks,

Chao Sun



Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

2015-07-07 Thread Chao Sun


 On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java,
   line 92
  https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line92
 
  Can we still get conflicts in the file name?
 
 Chao Sun wrote:
 It shouldn't - I think work ID and Random#nextInt() should both be 
 unique, right?
 
 Xuefu Zhang wrote:
 Random.nextInt() doesn't give uniqueness. If targetWorkID/sourceWorkID
 gives you uniqueness, then you don't need a random number, right? If
 targetWorkID/sourceWorkID doesn't give uniqueness, then adding a random
 number doesn't help much.

Yes, targetWorkID/sourceWorkID should be unique, but a single work can have
multiple tasks, and without the random number their results may overwrite each
other. We did the same thing for the hash table sink in Spark, and we haven't
seen any issues with that.
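
The naming concern in this exchange can be illustrated with a small sketch. It is not the code from SparkPartitionPruningSinkOperator: the class name, the helper methods, the work IDs, and the `<target>/<source>-<suffix>` layout are all hypothetical. It only shows that a name built from the work IDs alone is identical for every task of the same work, while a per-task suffix separates them. As Xuefu points out, a random suffix makes a collision unlikely rather than impossible.

```java
import java.util.Random;

final class PruningOutputNameSketch {
    // Hypothetical layout: the name depends only on the two work IDs.
    static String baseName(String targetWorkId, String sourceWorkId) {
        return targetWorkId + "/" + sourceWorkId;
    }

    // Hypothetical layout: same as above, plus a per-task suffix.
    static String withSuffix(String targetWorkId, String sourceWorkId, int suffix) {
        return baseName(targetWorkId, sourceWorkId) + "-" + suffix;
    }

    public static void main(String[] args) {
        // Two tasks of the same source work produce the same base name,
        // so the second writer would overwrite the first.
        String task1 = baseName("Map1", "Map4");
        String task2 = baseName("Map1", "Map4");
        System.out.println(task1.equals(task2));

        // Each task draws its own non-negative suffix. Distinct suffixes
        // give distinct names; equal draws are improbable but possible.
        Random random = new Random();
        System.out.println(withSuffix("Map1", "Map4", random.nextInt(Integer.MAX_VALUE)));
        System.out.println(withSuffix("Map1", "Map4", random.nextInt(Integer.MAX_VALUE)));
    }
}
```

If a guaranteed-unique component is available per task (for example a task attempt identifier, assuming one is accessible at that point), it would remove the residual collision risk; whether that applies here is not stated in the thread.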


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review89826
---


On July 3, 2015, 10:45 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34666/
 ---
 
 (Updated July 3, 2015, 10:45 p.m.)
 
 
 Review request for hive, chengxiang li and Xuefu Zhang.
 
 
 Bugs: HIVE-9152
 https://issues.apache.org/jira/browse/HIVE-9152
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Tez implemented dynamic partition pruning in HIVE-7826. This is a nice 
 optimization and we should implement the same in HOS.
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc
   itests/src/test/resources/testconfiguration.properties 2a5f7e3
   ql/if/queryplan.thrift c8dfa35
   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c
   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40
   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java 8546d21
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/RemoveDynamicPruningBySize.java 4803959
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841
   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9
   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e
   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION
   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION
   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c
   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855
   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f
   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61
   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab
   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8
   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a
   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b
   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d
   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7
   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f
   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION
   
 

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

2015-07-07 Thread Chao Sun


 On July 6, 2015, 10:26 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java,
   line 77
  https://reviews.apache.org/r/34666/diff/2/?file=999023#file999023line77
 
  I guess I don't know enough to comment on this, but looking at 
  VectorReduceSinkOperator and VectorAppMasterEventOperator I can see some 
  prominent differences:
  
  1. first batch detection and processing there
  2. VectorizedSerde logic here
  
  Probably a live review will help.

Chatted with Xuefu offline and we've cleared some doubts here.


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review90593
---


On July 3, 2015, 10:45 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34666/
 ---
 
 (Updated July 3, 2015, 10:45 p.m.)
 
 
 Review request for hive, chengxiang li and Xuefu Zhang.
 
 
 Bugs: HIVE-9152
 https://issues.apache.org/jira/browse/HIVE-9152
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Tez implemented dynamic partition pruning in HIVE-7826. This is a nice 
 optimization and we should implement the same in HOS.
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc
   itests/src/test/resources/testconfiguration.properties 2a5f7e3
   ql/if/queryplan.thrift c8dfa35
   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c
   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40
   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java 8546d21
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/RemoveDynamicPruningBySize.java 4803959
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841
   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9
   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e
   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION
   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION
   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c
   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855
   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f
   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61
   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab
   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8
   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a
   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b
   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d
   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7
   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f
   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION
   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION
   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d
   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679
   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f
   

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

2015-07-07 Thread Chao Sun


 On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
  ql/if/queryplan.thrift, line 60
  https://reviews.apache.org/r/34666/diff/1/?file=971689#file971689line60
 
  I'm not sure if it matters, but it's probably better if we add it as
  the last entry.
 
 Xuefu Zhang wrote:
 It still needs to be moved to the last position, as others also pointed out.

OK, fixed. Sorry I forgot last time.


 On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java, line 
  177
  https://reviews.apache.org/r/34666/diff/1/?file=971700#file971700line177
 
  Any chance that an op might be visited multiple times?
 
 Chao Sun wrote:
 It shouldn't - it's a tree traversal and every operator should only be
 added once.
 
 Xuefu Zhang wrote:
 Actually there could be a diamond shape in the operator graph, such as
 that formed by demux and mux operators. The join operator is another example.
 We should use graph traversal instead of tree traversal.

Yes, you're right. However, the only usage for this is in SplitOpTreeForDPP, in
which we pass a set as the parameter, so it should only contain unique operators.
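
To make the diamond case concrete, here is a minimal sketch, assuming a toy `Op` node type rather than Hive's real Operator class; all names in it are hypothetical. It contrasts a tree-style walk, which records a shared descendant once per path, with a graph-style walk that skips operators it has already seen.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class OperatorTraversalSketch {
    // Toy stand-in for an operator; equality is object identity.
    static final class Op {
        final String name;
        final List<Op> children = new ArrayList<>();
        Op(String name) { this.name = name; }
    }

    // Tree-style walk: no visited check, so a node reachable through
    // two parents is added (and its subtree re-walked) twice.
    static void collectAsTree(Op op, List<Op> out) {
        out.add(op);
        for (Op child : op.children) {
            collectAsTree(child, out);
        }
    }

    // Graph-style walk: Set.add returns false for an operator that was
    // already visited, so each operator is processed exactly once.
    static void collectAsGraph(Op op, Set<Op> seen) {
        if (!seen.add(op)) {
            return;
        }
        for (Op child : op.children) {
            collectAsGraph(child, seen);
        }
    }

    // Diamond: demux -> {left, right} -> mux.
    static Op diamond() {
        Op demux = new Op("demux");
        Op left = new Op("left");
        Op right = new Op("right");
        Op mux = new Op("mux");
        demux.children.add(left);
        demux.children.add(right);
        left.children.add(mux);
        right.children.add(mux);
        return demux;
    }

    public static void main(String[] args) {
        List<Op> asTree = new ArrayList<>();
        collectAsTree(diamond(), asTree);
        Set<Op> asGraph = new LinkedHashSet<>();
        collectAsGraph(diamond(), asGraph);
        System.out.println(asTree.size());  // 5: mux is recorded twice
        System.out.println(asGraph.size()); // 4: each operator once
    }
}
```

Collecting into a set, as described in the reply above, removes duplicates from the result even with the tree-style walk, but shared subtrees are still walked once per parent; the visited check avoids that repeated work as well.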


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review85230
---



Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

2015-07-07 Thread Chao Sun


 On July 2, 2015, 6:36 a.m., chengxiang li wrote:
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java,
   line 59
  https://reviews.apache.org/r/34666/diff/1/?file=971706#file971706line59
 
  The statistics should be quite inaccurate after filter and group, as
  they're computed from estimates at compile time. I think verifying the
  threshold against inaccurate data is unacceptable, as that means the
  threshold may not work at all.
  We could check this threshold in SparkPartitionPruningSinkOperator at
  runtime.
 
 Chao Sun wrote:
 Switching to runtime would be very different - here we want to check this
 threshold, and avoid generating the pruning task if possible.
 How inaccurate would the stats be? I'm fine if they're always more
 conservative.
 
 chengxiang li wrote:
 Take FilterOperator for example: in the worst case, it may just use half of
 the input rows as its statistics (you can find the rule for FilterOperator in
 FilterStatsRule). So the bad news is that the estimated statistics are not
 always conservative, which means the threshold sometimes won't work as
 expected. You may create a follow-up task for this if it requires a lot of
 changes.

OK, that makes sense. I think we can address this issue in the follow-up JIRA.
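
The risk raised in this exchange can be shown with a small arithmetic sketch. Everything in it is hypothetical: the class and method names, the row counts, the bytes-per-row figure, and the threshold value. The halving rule is taken only from the worst case described in the thread, not from the FilterStatsRule source. The point is that a compile-time estimate can sit below the threshold while the data actually produced at runtime exceeds it.

```java
final class PruningThresholdSketch {
    // Worst-case estimate as described in the thread: the filter's
    // statistics are simply half of the input rows.
    static long estimatedRowsAfterFilter(long inputRows) {
        return inputRows / 2;
    }

    // Hypothetical check: keep the pruning branch only if the data it
    // would emit is no larger than the configured maximum.
    static boolean keepPruning(long dataBytes, long maxBytes) {
        return dataBytes <= maxBytes;
    }

    public static void main(String[] args) {
        long inputRows = 1_000_000L;
        long bytesPerRow = 100L;
        long maxBytes = 60_000_000L;

        // Compile time: 500,000 estimated rows, 50,000,000 bytes.
        long estimatedBytes = estimatedRowsAfterFilter(inputRows) * bytesPerRow;

        // Runtime: suppose the filter actually keeps 90% of the rows,
        // i.e. 900,000 rows and 90,000,000 bytes.
        long actualBytes = 900_000L * bytesPerRow;

        System.out.println(keepPruning(estimatedBytes, maxBytes)); // true
        System.out.println(keepPruning(actualBytes, maxBytes));    // false
    }
}
```

A runtime check, as chengxiang suggests, would see the actual size, at the cost Chao mentions: the pruning task has already been generated by then.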


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review90197
---



Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

2015-07-07 Thread Chao Sun


 On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java,
   line 92
  https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line92
 
  Can we still get conflicts in the file name?
 
 Chao Sun wrote:
 It shouldn't - I think work ID and Random#nextInt() should both be 
 unique, right?
 
 Xuefu Zhang wrote:
 Random.nextInt() doesn't give uniqueness. If targetWorkID/sourceWorkID
 gives you uniqueness, then you don't need a random number, right? If
 targetWorkID/sourceWorkID doesn't give uniqueness, then adding a random
 number doesn't help much.
 
 Chao Sun wrote:
 Yes, targetWorkID/sourceWorkID should be unique, but a single work can
 have multiple tasks, and without the random number their results may
 overwrite each other. We did the same thing for the hash table sink in
 Spark, and we haven't seen any issues with that.

targetWorkID/sourceWorkID are unique. We need the random number because a
particular work can have multiple tasks, in which case they may overwrite
each other's files.


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review89826
---



Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

2015-07-06 Thread chengxiang li


 On July 2, 2015, 6:36 a.m., chengxiang li wrote:
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java,
   line 59
  https://reviews.apache.org/r/34666/diff/1/?file=971706#file971706line59
 
  The statistics should be quite inaccurate after filter and group, as
  they're computed from estimates at compile time. I think verifying the
  threshold against inaccurate data is unacceptable, as that means the
  threshold may not work at all.
  We could check this threshold in SparkPartitionPruningSinkOperator at
  runtime.
 
 Chao Sun wrote:
 Switching to runtime would be very different - here we want to check this 
 threshold, and avoid generating the pruning task if possible.
 How inaccurate would the stats be? I'm fine if they are always more
 conservative.

Take FilterOperator for example: in the worst case, it may simply take half of
the input rows as its statistic (you can find the rule for FilterOperator in
FilterStatsRule). So the bad news is that the estimated statistics are not
always conservative, which means the threshold will sometimes not work as
expected. You may create a follow-up task for this if addressing it requires a
lot of changes.
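
To make the concern concrete, here is a minimal sketch of how a "half the input" estimate can pass a compile-time threshold that the real data would fail. The class, method names, and numbers are hypothetical and are not taken from Hive or from the patch; only the halving rule comes from the comment above.

```java
// Hypothetical sketch: names and numbers are illustrative, not Hive's actual code.
final class PruningThresholdSketch {
    // Assumed compile-time rule for a filter of unknown selectivity: keep half the input.
    static long estimateAfterFilter(long inputBytes) {
        return inputBytes / 2;
    }

    // Threshold check: keep the pruning branch only if the data size fits the limit.
    static boolean keepPruning(long dataBytes, long maxBytes) {
        return dataBytes <= maxBytes;
    }

    public static void main(String[] args) {
        long inputBytes = 1_000_000L;
        long maxBytes = 600_000L;
        long estimated = estimateAfterFilter(inputBytes);
        long actual = 900_000L; // suppose the filter really lets 90% of the data through

        // The compile-time check uses the estimate; a runtime check would see the actual size.
        System.out.println("kept using estimate: " + keepPruning(estimated, maxBytes));
        System.out.println("kept using actual size: " + keepPruning(actual, maxBytes));
    }
}
```

With these made-up numbers the estimate passes the threshold while the actual size would not, which is the non-conservative case described above.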


- chengxiang


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review90197
---


On July 3, 2015, 10:45 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34666/
 ---
 
 (Updated July 3, 2015, 10:45 p.m.)
 
 
 Review request for hive, chengxiang li and Xuefu Zhang.
 
 
 Bugs: HIVE-9152
 https://issues.apache.org/jira/browse/HIVE-9152
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Tez implemented dynamic partition pruning in HIVE-7826. This is a nice 
 optimization and we should implement the same in HOS.
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
   ql/if/queryplan.thrift c8dfa35 
   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 
 21398d8 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 
 e6c845c 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 
 1de7e40 
   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java
  8546d21 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/RemoveDynamicPruningBySize.java
  4803959 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
 5f731d7 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
 447f104 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
 e27ce0d 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java
  f7586a4 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 19aae70 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 
 363e49e 
   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q 
 PRE-CREATION 
   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q 
 PRE-CREATION 
   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out 
 e95d2ab 
   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out 
 e38ccf8 
   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
   

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

2015-07-06 Thread Xuefu Zhang


 On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
  ql/if/queryplan.thrift, line 60
  https://reviews.apache.org/r/34666/diff/1/?file=971689#file971689line60
 
  I'm not sure if it matters, but it's probably better if we add it as 
  the last.

It still needs to move to the end, as others have also pointed out.


- Xuefu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review85230
---



Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

2015-07-06 Thread Gopal V

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review90598
---



ql/if/queryplan.thrift (line 60)
https://reviews.apache.org/r/34666/#comment143696

Enum ordering nit - this needs to move down to the end for backward compatibility.
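
A small sketch of why the position matters, assuming the enum's numeric values are assigned implicitly by declaration order. The operator names below are placeholders, not the real contents of queryplan.thrift, and the list index stands in for the assigned value.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: the names are placeholders, not the real enum members.
final class EnumOrderingSketch {
    // The index of each name stands in for its implicitly assigned numeric value.
    static final List<String> OLD = Arrays.asList("JOIN", "FILTER", "SELECT");
    static final List<String> INSERTED_IN_MIDDLE =
        Arrays.asList("JOIN", "NEW_SINK", "FILTER", "SELECT");
    static final List<String> APPENDED =
        Arrays.asList("JOIN", "FILTER", "SELECT", "NEW_SINK");

    // True if every old name keeps its old numeric value in the new definition.
    static boolean isBackwardCompatible(List<String> oldDef, List<String> newDef) {
        for (int i = 0; i < oldDef.size(); i++) {
            if (i >= newDef.size() || !oldDef.get(i).equals(newDef.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("inserted in middle: " + isBackwardCompatible(OLD, INSERTED_IN_MIDDLE));
        System.out.println("appended at end: " + isBackwardCompatible(OLD, APPENDED));
    }
}
```

Inserting in the middle shifts the value of every later member, so an older reader would decode them as different operators; appending leaves the existing values alone.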


- Gopal V



Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

2015-07-06 Thread Xuefu Zhang


 On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java, line 
  177
  https://reviews.apache.org/r/34666/diff/1/?file=971700#file971700line177
 
  Any chance that an op might be visited multiple times?
 
 Chao Sun wrote:
 It shouldn't - it's a tree traversal and every operator should only be
 added once.

Actually, there could be a diamond shape in the operator graph, such as that
formed by demux and mux operators. A join operator is another example. We should
use a graph traversal instead of a tree traversal.
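
A minimal sketch of the difference, using a hypothetical `Node` class in place of Hive's operators. On a diamond, the tree-style walk collects the shared node twice, while the graph-style walk with a visited set collects it once.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class GraphTraversalSketch {
    // Hypothetical stand-in for an operator; not Hive's Operator class.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Tree-style walk: no visited set, so a node with two parents is collected twice.
    static void collectAsTree(Node node, List<String> out) {
        out.add(node.name);
        for (Node child : node.children) {
            collectAsTree(child, out);
        }
    }

    // Graph-style walk: the visited set ensures each node is collected only once.
    static List<String> collectAsGraph(Node root) {
        List<String> out = new ArrayList<>();
        Set<Node> visited = new HashSet<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (!visited.add(node)) {
                continue; // already seen through another parent
            }
            out.add(node.name);
            for (Node child : node.children) {
                stack.push(child);
            }
        }
        return out;
    }

    // Builds a diamond: demux -> left, demux -> right, left -> mux, right -> mux.
    static Node diamond() {
        Node root = new Node("demux");
        Node left = new Node("left");
        Node right = new Node("right");
        Node sink = new Node("mux");
        root.children.add(left);
        root.children.add(right);
        left.children.add(sink);
        right.children.add(sink);
        return root;
    }

    public static void main(String[] args) {
        List<String> asTree = new ArrayList<>();
        collectAsTree(diamond(), asTree);
        System.out.println("tree walk visits: " + asTree.size());
        System.out.println("graph walk visits: " + collectAsGraph(diamond()).size());
    }
}
```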


 On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java, line 
  519
  https://reviews.apache.org/r/34666/diff/1/?file=971702#file971702line519
 
  numThread could be = 0?
 
 Chao Sun wrote:
 It could be equal to 0, since getInputPaths() could return 0. This would
 result in an IAE from newFixedThreadPool.

Maybe there is a problem as you described, but I think that's irrelevant to the
work here. Thus, we should create a separate JIRA to fix that instead of
including it here.
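
For whoever picks up that separate JIRA, a short sketch of the failure and one possible guard. `Executors.newFixedThreadPool` is the real JDK call. The `safePoolSize` helper and its parameters are hypothetical and not part of the patch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadPoolSizeSketch {
    // Returns true if creating a fixed pool of the given size is rejected.
    static boolean rejects(int numThreads) {
        try {
            ExecutorService pool = Executors.newFixedThreadPool(numThreads);
            pool.shutdown();
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // One possible guard (an assumption, not the patch's code): never ask for fewer than one thread.
    static int safePoolSize(int pathCount, int maxThreads) {
        return Math.max(1, Math.min(pathCount, maxThreads));
    }

    public static void main(String[] args) {
        System.out.println("size 0 rejected: " + rejects(0));
        System.out.println("guarded size for 0 paths: " + safePoolSize(0, 8));
    }
}
```

An alternative would be to return early when there are no input paths, which avoids creating a pool at all.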


- Xuefu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review85230
---



Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

2015-07-06 Thread Xuefu Zhang


 On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java,
   line 92
  https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line92
 
  Can we still get conflicts in the file name?
 
 Chao Sun wrote:
 It shouldn't - I think work ID and Random#nextInt() should both be 
 unique, right?

Random.nextInt() doesn't guarantee uniqueness. If targetWorkID/sourceWorkID gives
you uniqueness, then you don't need a random number, right? If
targetWorkID/sourceWorkID doesn't give uniqueness, then adding a random number
doesn't help much.
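
A small illustration of the point, with hypothetical names throughout. Two generators seeded alike produce the same "random" suffix, whereas a counter cannot repeat. Note that the counter shown here is only unique within a single JVM, so it is a sketch of the idea rather than a proposed fix.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

final class UniqueNameSketch {
    private static final AtomicLong COUNTER = new AtomicLong();

    // Hypothetical naming scheme: the two IDs plus a JVM-wide counter.
    static String nextName(String sourceWorkId, String targetWorkId) {
        return sourceWorkId + "-" + targetWorkId + "-" + COUNTER.getAndIncrement();
    }

    // Two generators created with the same seed return the same first value.
    static boolean sameSeedCollides(long seed) {
        return new Random(seed).nextInt() == new Random(seed).nextInt();
    }

    public static void main(String[] args) {
        Set<String> names = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            names.add(nextName("Map1", "Map2"));
        }
        System.out.println("distinct counter-based names: " + names.size());
        System.out.println("same seed collides: " + sameSeedCollides(42L));
    }
}
```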


 On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java,
   line 98
  https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line98
 
  Nit: Potential leak of BufferedOutputStream.
 
 Chao Sun wrote:
 Can you explain a little under which situation this would happen? And
 what is the better way to do this? Thanks.

fs.create() can succeed while either new BufferedOutputStream() or
new ObjectOutputStream() fails (by throwing an exception). In that case, the
stream returned by fs.create() will leak.

Java 7 introduced a new notation for automatic resource management
(try-with-resources). Refer to:
http://radar.oreilly.com/2011/09/java7-features.html
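
A minimal sketch of that notation applied to a chain of streams. `TrackingStream` is a hypothetical stand-in for the stream returned by fs.create(), used only so the example can show that the underlying stream is closed even when a later step throws.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;

final class StreamCloseSketch {
    // Hypothetical stand-in for the stream from fs.create(); records whether close() ran.
    static final class TrackingStream extends ByteArrayOutputStream {
        boolean closed = false;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Declaring each stream as its own resource means the base stream is closed
    // even if a wrapper's constructor or the write itself throws.
    static boolean write(TrackingStream raw, Object value) {
        try (OutputStream base = raw;
             BufferedOutputStream buffered = new BufferedOutputStream(base);
             ObjectOutputStream out = new ObjectOutputStream(buffered)) {
            out.writeObject(value);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        TrackingStream ok = new TrackingStream();
        System.out.println("write succeeded: " + write(ok, "ds=2015-07-06") + ", closed: " + ok.closed);

        TrackingStream bad = new TrackingStream();
        // java.lang.Object is not Serializable, so writeObject throws an IOException subclass.
        System.out.println("write succeeded: " + write(bad, new Object()) + ", closed: " + bad.closed);
    }
}
```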


- Xuefu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review89826
---



Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

2015-07-06 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review90593
---



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java
 (line 69)
https://reviews.apache.org/r/34666/#comment143691

Nit: remove the blank line.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java
 (line 77)
https://reviews.apache.org/r/34666/#comment143693

I guess I don't know enough to comment on this, but looking at 
VectorReduceSinkOperator and VectorAppMasterEventOperator I can see some 
prominent differences:

1. first batch detection and processing there
2. VectorizedSerde logic here

Probably a live review will help.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java
 (line 69)
https://reviews.apache.org/r/34666/#comment143690

Nit: remove the empty line.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java
 (line 75)
https://reviews.apache.org/r/34666/#comment143689

Isn't this the last operator in the operator graph? If so, we don't need
to call forward(), right?


- Xuefu Zhang



Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

2015-07-03 Thread Chao Sun


 On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
  itests/src/test/resources/testconfiguration.properties, line 894
  https://reviews.apache.org/r/34666/diff/1/?file=971683#file971683line894
 
  Are there more test cases that can be turned on?

Will turn on vectorized_dynamic_partition_pruning.q.


 On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java,
   line 74
  https://reviews.apache.org/r/34666/diff/1/?file=971705#file971705line74
 
  Is there anything specific to Spark? If not, we should probably reuse 
  rather than copying.

The only difference is the type of pruning sink added - we use 
SparkPartitionPruningSinkOp while Tez uses AppMasterEventOp.
OK, I'll reuse the existing class.


 On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java, line 
  589
  https://reviews.apache.org/r/34666/diff/1/?file=971711#file971711line589
 
  It seems that an operator might be visited multiple times.

Yeah, but I guess it doesn't matter here. We just use this to find the enclosing
work for an op, and we just need to find at least one root op.
Duplicates don't matter here, I think.


 On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java,
   line 107
  https://reviews.apache.org/r/34666/diff/1/?file=971714#file971714line107
 
  For the cloned tree, don't we need to remove the branches that are not
  connected to the pruning sink operator, i.e., RS-Join?

This is done before we clone the branch:

```
List<Operator<?>> savedChildOps = filterOp.getChildOperators();
filterOp.setChildOperators(Utilities.makeList(selOp));
```


 On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java,
   line 111
  https://reviews.apache.org/r/34666/diff/1/?file=971714#file971714line111
 
  This is not cloned as part of cloneOperatorTree()?

No - because it is a transient field.
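
A short sketch of the effect, assuming the clone works by serializing and deserializing the object. Plain Java serialization is used here as a stand-in for whatever mechanism cloneOperatorTree() actually uses, and the `Op` class and its fields are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class TransientCloneSketch {
    // Hypothetical operator-like class; not one of Hive's classes.
    static final class Op implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        transient String runtimeState; // skipped by serialization

        Op(String name, String runtimeState) {
            this.name = name;
            this.runtimeState = runtimeState;
        }
    }

    // Clones by writing the object to bytes and reading it back.
    static Op cloneViaSerialization(Op op) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(op);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Op) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Op copy = cloneViaSerialization(new Op("TS", "state"));
        System.out.println("name: " + copy.name + ", runtimeState: " + copy.runtimeState);
    }
}
```

The regular field survives the round trip while the transient one comes back as null, so it has to be set again on the clone by hand.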


 On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java,
   line 92
  https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line92
 
  Can we still get conflicts in the file name?

It shouldn't - I think work ID and Random#nextInt() should both be unique, 
right?


 On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java,
   line 68
  https://reviews.apache.org/r/34666/diff/1/?file=971701#file971701line68
 
  I think we should delegate the processing to the parent when processing 
  one row from the batch. Refer to VectorReduceSinkOperator for an example.

Not much we can do here, since the processing is more complicated. I
changed part of the code to call super.process().


 On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java,
   line 98
  https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line98
 
  Nit: Potential leak of BufferedOutputStream.

Can you explain a little under which situation this would happen? And what is
the better way to do this? Thanks.


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review89826
---


On May 26, 2015, 4:28 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34666/
 ---
 
 (Updated May 26, 2015, 4:28 p.m.)
 
 
 Review request for hive, chengxiang li and Xuefu Zhang.
 
 
 Bugs: HIVE-9152
 https://issues.apache.org/jira/browse/HIVE-9152
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Tez implemented dynamic partition pruning in HIVE-7826. This is a nice 
 optimization and we should implement the same in HOS.
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 0f86117 
   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp a0b34cb 
   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 55e0385 
   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 749c97a 
   metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 
 4cc54e8 
   ql/if/queryplan.thrift c8dfa35 
   ql/src/gen/thrift/gen-cpp/queryplan_types.h ac73bc5 
   ql/src/gen/thrift/gen-cpp/queryplan_types.cpp 19d4806 
   
 

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

2015-07-03 Thread Chao Sun


 On July 1, 2015, midnight, Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java,
   line 142
  https://reviews.apache.org/r/34666/diff/1/?file=971707#file971707line142
 
  Why do we need this now?

This is to prevent a newly generated task from being processed again. If the 
task contains local work, it may be overwritten. See HIVE-9424 for more details.
However, I just found out that this is also fixed as part of HIVE-9659, so I'll 
remove this code now.


 On July 1, 2015, midnight, Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java, line 
  227
  https://reviews.apache.org/r/34666/diff/1/?file=971711#file971711line227
 
  Why put the old work in the map?

This is because even though we are cloning the op tree, we are still retaining 
the old work. So, after we've created a new root op, we need to
update the rootToWorkMap, and map the cloned root op to the old work. This is 
later used in getEnclosingWork.
We also need to keep the old entry because in SparkPartitionPruningSink, it 
still stores the old TableScanOperator, and in processPartitionPruningSink it
will look up the op to get a corresponding target work.
Added more comments in the code.


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review89972
---



Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

2015-07-03 Thread Chao Sun


 On July 2, 2015, 6:36 a.m., chengxiang li wrote:
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java,
   line 59
  https://reviews.apache.org/r/34666/diff/1/?file=971706#file971706line59
 
  The statistics should be quite inaccurate after filter and group, as 
  they are computed from estimates at compile time. I think threshold 
  verification on inaccurate data is unacceptable, as it means the 
  threshold may not work at all.
  We could check this threshold in SparkPartitionPruningSinkOperator at 
  runtime.

Switching to runtime would be very different - here we want to check this 
threshold, and avoid generating the pruning task if possible.
How inaccurate would the stats be? I'm fine if it's always more conservative.


 On July 2, 2015, 6:36 a.m., chengxiang li wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java, line 
  396
  https://reviews.apache.org/r/34666/diff/1/?file=971711#file971711line396
 
  Why do we need a List for table/columnname/partkey here? Do we support 
  multiple PartitionPruningSinkOperators inside a single operator tree?

This is because a target work with a partitioned table could have multiple 
partition columns, which could come from multiple tables and/or partkeys.
You can check the test output files for some examples.


 On July 2, 2015, 6:36 a.m., chengxiang li wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java,
   line 61
  https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line61
 
  When the appended data exceeds its capacity, DataOutputBuffer expands 
  its byte array by creating a new array of twice the size and copying the 
  old one into it. An estimated initial byte array size should avoid most 
  of these array copies.

Yes, this would be an improvement. Xuefu and I talked about adding an extra 
parameter to control the generated file size. We plan to do that as a follow-up 
task.
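
The effect of the initial size can be sketched with a toy buffer. This is 
not Hadoop's DataOutputBuffer: GrowableBuffer and its copy counter are 
hypothetical, and the doubling policy is assumed from the comment above.

```java
import java.util.Arrays;

// Hypothetical stand-in for a doubling byte buffer; counts reallocations.
final class GrowableBuffer {
  private byte[] data;
  private int count;
  private int copies; // number of times the backing array was reallocated

  GrowableBuffer(int initialCapacity) {
    data = new byte[Math.max(1, initialCapacity)];
  }

  void write(byte[] src) {
    int needed = count + src.length;
    if (needed > data.length) {
      // Grow to at least twice the current size; copyOf copies the old contents.
      data = Arrays.copyOf(data, Math.max(data.length * 2, needed));
      copies++;
    }
    System.arraycopy(src, 0, data, count, src.length);
    count += src.length;
  }

  // Appends `chunks` chunks of `chunkSize` bytes and reports the reallocations.
  public static int copiesFor(int initialCapacity, int chunks, int chunkSize) {
    GrowableBuffer buf = new GrowableBuffer(initialCapacity);
    byte[] chunk = new byte[chunkSize];
    for (int i = 0; i < chunks; i++) {
      buf.write(chunk);
    }
    return buf.copies;
  }

  public static void main(String[] args) {
    System.out.println("initial 32 bytes: " + copiesFor(32, 1024, 16) + " copies");
    System.out.println("pre-sized 16 KiB: " + copiesFor(16 * 1024, 1024, 16) + " copies");
  }
}
```

With doubling, the number of copies grows only logarithmically with the 
final size, but each copy moves the whole buffer, so a reasonable initial 
estimate removes the largest copies.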


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review90197
---



Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

2015-07-03 Thread Chao Sun


 On July 2, 2015, 5:25 a.m., chengxiang li wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java,
   line 251
  https://reviews.apache.org/r/34666/diff/1/?file=971699#file971699line251
 
  Logging at error level should mean that an error happened and the process 
  will be interrupted. If we really expect a single field here, should we 
  throw an exception when there are more? Otherwise, we should downgrade the 
  log level to WARN with more precise information.

OK, I changed this to an assert, since it would be a bug if the number of 
fields is not 1 here.


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review90191
---


On May 26, 2015, 4:28 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34666/
 ---
 
 (Updated May 26, 2015, 4:28 p.m.)
 
 
 Review request for hive, chengxiang li and Xuefu Zhang.
 
 
 Bugs: HIVE-9152
 https://issues.apache.org/jira/browse/HIVE-9152
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Tez implemented dynamic partition pruning in HIVE-7826. This is a nice 
 optimization and we should implement the same in HOS.
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 0f86117 
   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp a0b34cb 
   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 55e0385 
   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 749c97a 
   metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 
 4cc54e8 
   ql/if/queryplan.thrift c8dfa35 
   ql/src/gen/thrift/gen-cpp/queryplan_types.h ac73bc5 
   ql/src/gen/thrift/gen-cpp/queryplan_types.cpp 19d4806 
   
 ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
  e18f935 
   ql/src/gen/thrift/gen-php/Types.php 7121ed4 
   ql/src/gen/thrift/gen-py/queryplan/ttypes.py 53c0106 
   ql/src/gen/thrift/gen-rb/queryplan_types.rb c2c4220 
   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 9867739 
   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 
 21398d8 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 
 e6c845c 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 
 1de7e40 
   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java
  8e56263 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
 5f731d7 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
 447f104 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
 e27ce0d 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java
  f7586a4 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 19aae70 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 
 363e49e 
   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q 
 PRE-CREATION 
   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q 
 PRE-CREATION 
   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out 
 

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

2015-07-03 Thread Chao Sun


 On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java, line 
  177
  https://reviews.apache.org/r/34666/diff/1/?file=971700#file971700line177
 
  Any chance that an op might be visited multiple times?

It shouldn't - it's a tree traversal and every operator should only be added 
once.


 On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java, line 
  519
  https://reviews.apache.org/r/34666/diff/1/?file=971702#file971702line519
 
  numThread could be = 0?

It could be equal to 0, since getInputPaths() could return 0. This would 
result in an IAE from newFixedThreadPool.
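
The failure mode and one possible guard can be sketched as follows. 
Executors.newFixedThreadPool is the real JDK call and rejects a size of 0 
with an IllegalArgumentException; the class, the helper names, and the cap of 
50 threads are hypothetical and not taken from CombineHiveInputFormat.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, not the code under review.
final class ThreadPoolGuardSketch {
  // Returns true if Executors.newFixedThreadPool rejects the given size.
  public static boolean rejects(int nThreads) {
    try {
      ExecutorService pool = Executors.newFixedThreadPool(nThreads);
      pool.shutdown();
      return false;
    } catch (IllegalArgumentException e) {
      return true;
    }
  }

  // One possible guard: never ask for fewer than one thread.
  public static int safeThreadCount(int numPaths, int maxThreads) {
    return Math.max(1, Math.min(numPaths, maxThreads));
  }

  public static void main(String[] args) {
    System.out.println(rejects(0));
    System.out.println(rejects(safeThreadCount(0, 50)));
  }
}
```

An alternative to clamping is to return early when there are no input paths, 
so that no pool is created at all.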


 On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java, line 164
  https://reviews.apache.org/r/34666/diff/1/?file=971704#file971704line164
 
  what's this change about?

This is to delay stats annotation until we've done the DPP optimization. The 
generated branches also need to be annotated with stats.


 On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
  ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out, line 1914
  https://reviews.apache.org/r/34666/diff/1/?file=971731#file971731line1914
 
  Why are the stats gone?

This is because we moved the stats annotation to SparkCompiler, and therefore 
in SimpleFetchOptimizer, the fetch task generated won't have the stats.
I don't see this as a big issue, and Tez does the same thing.


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review85230
---



Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

2015-07-03 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/
---

(Updated July 3, 2015, 10:45 p.m.)


Review request for hive, chengxiang li and Xuefu Zhang.


Bugs: HIVE-9152
https://issues.apache.org/jira/browse/HIVE-9152


Repository: hive-git


Description
---

Tez implemented dynamic partition pruning in HIVE-7826. This is a nice 
optimization and we should implement the same in HOS.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
  itests/src/test/resources/testconfiguration.properties 2a5f7e3 
  ql/if/queryplan.thrift c8dfa35 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 
21398d8 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java
 8546d21 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/RemoveDynamicPruningBySize.java 
4803959 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
5f731d7 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
447f104 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java 
f7586a4 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
  ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
  ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q 
PRE-CREATION 
  ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
  ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
  ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
  ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
  ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out 
e95d2ab 
  ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out 
e38ccf8 
  ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
  ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
  ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
  ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
  
ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out 
PRE-CREATION 
  
ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out
 PRE-CREATION 
  ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
  ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
  ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
  ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
  ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
  ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
  ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out 
bafd62f 
  ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
  ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
  
ql/src/test/results/clientpositive/spark/vectorized_dynamic_partition_pruning.q.out
 PRE-CREATION 
  ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
  ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out 
ef98ae9 

Diff: https://reviews.apache.org/r/34666/diff/


Testing
---

spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both 
are cloned from Tez's tests.


Thanks,

Chao Sun



Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

2015-07-02 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review90197
---



ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java
 (line 59)
https://reviews.apache.org/r/34666/#comment143202

The statistics should be quite inaccurate after filter and group, as 
they are computed from estimates at compile time. I think threshold 
verification on inaccurate data is unacceptable, as it means the 
threshold may not work at all.
We could check this threshold in SparkPartitionPruningSinkOperator at runtime.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java (line 396)
https://reviews.apache.org/r/34666/#comment143199

Why do we need a List for table/columnname/partkey here? Do we support 
multiple PartitionPruningSinkOperators inside a single operator tree?



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java
 (line 61)
https://reviews.apache.org/r/34666/#comment143203

When the appended data exceeds its capacity, DataOutputBuffer expands 
its byte array by creating a new array of twice the size and copying the 
old one into it. An estimated initial byte array size should avoid most 
of these array copies.


- chengxiang li



Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

2015-07-01 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review90191
---



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java
 (line 246)
https://reviews.apache.org/r/34666/#comment143192

Should be encapsulated with LOG.isDebugEnabled.
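
The point of the guard can be sketched like this. To stay self-contained, 
the sketch swaps in java.util.logging, where isLoggable(Level.FINE) plays the 
role of LOG.isDebugEnabled(); the class and the describe() helper are 
hypothetical and only stand in for an expensive log message.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch using java.util.logging instead of Hive's logger.
final class DebugGuardSketch {
  private static final Logger LOG = Logger.getLogger(DebugGuardSketch.class.getName());
  private static int describeCalls = 0;

  // Stands in for an expensive message, e.g. dumping a list of values.
  static String describe(Object value) {
    describeCalls++;
    return "value: " + value;
  }

  static void logGuarded(Object value) {
    // The message is only built when FINE (debug-level) logging is enabled.
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine(describe(value));
    }
  }

  static void logUnguarded(Object value) {
    // The argument is evaluated before the call, even if the record is dropped.
    LOG.fine(describe(value));
  }

  // Sets the logger level, logs once, and returns how often describe() ran.
  public static int describeCallsFor(Level level, boolean guarded) {
    LOG.setLevel(level);
    describeCalls = 0;
    if (guarded) {
      logGuarded("p=1");
    } else {
      logUnguarded("p=1");
    }
    return describeCalls;
  }

  public static void main(String[] args) {
    System.out.println("guarded at INFO: " + describeCallsFor(Level.INFO, true));
    System.out.println("unguarded at INFO: " + describeCallsFor(Level.INFO, false));
  }
}
```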



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java
 (line 251)
https://reviews.apache.org/r/34666/#comment143193

Logging at error level should mean that an error happened and the process 
will be interrupted. If we really expect a single field here, should we 
throw an exception when there are more? Otherwise, we should downgrade the 
log level to WARN with more precise information.


- chengxiang li


On May 26, 2015, 4:28 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34666/
 ---
 
 (Updated May 26, 2015, 4:28 p.m.)
 
 
 Review request for hive, chengxiang li and Xuefu Zhang.
 
 
 Bugs: HIVE-9152
 https://issues.apache.org/jira/browse/HIVE-9152
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Tez implemented dynamic partition pruning in HIVE-7826. This is a nice 
 optimization and we should implement the same in HOS.
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 0f86117 
   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp a0b34cb 
   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 55e0385 
   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 749c97a 
   metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 
 4cc54e8 
   ql/if/queryplan.thrift c8dfa35 
   ql/src/gen/thrift/gen-cpp/queryplan_types.h ac73bc5 
   ql/src/gen/thrift/gen-cpp/queryplan_types.cpp 19d4806 
   
 ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
  e18f935 
   ql/src/gen/thrift/gen-php/Types.php 7121ed4 
   ql/src/gen/thrift/gen-py/queryplan/ttypes.py 53c0106 
   ql/src/gen/thrift/gen-rb/queryplan_types.rb c2c4220 
   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 9867739 
   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 
 21398d8 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 
 e6c845c 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 
 1de7e40 
   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java
  8e56263 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
 5f731d7 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
 447f104 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
 e27ce0d 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java
  f7586a4 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 19aae70 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 
 363e49e 
   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q 
 PRE-CREATION 
   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q 
 PRE-CREATION 
   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
   

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

2015-06-30 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review89972
---



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java
 (line 142)
https://reviews.apache.org/r/34666/#comment142879

Why do we need this now?



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java (line 227)
https://reviews.apache.org/r/34666/#comment142880

Why are we putting the old work in the map?


- Xuefu Zhang


On May 26, 2015, 4:28 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34666/
 ---
 
 (Updated May 26, 2015, 4:28 p.m.)
 
 
 Review request for hive, chengxiang li and Xuefu Zhang.
 
 
 Bugs: HIVE-9152
 https://issues.apache.org/jira/browse/HIVE-9152
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Tez implemented dynamic partition pruning in HIVE-7826. This is a nice 
 optimization and we should implement the same in HOS.
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 0f86117 
   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp a0b34cb 
   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 55e0385 
   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 749c97a 
   metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 
 4cc54e8 
   ql/if/queryplan.thrift c8dfa35 
   ql/src/gen/thrift/gen-cpp/queryplan_types.h ac73bc5 
   ql/src/gen/thrift/gen-cpp/queryplan_types.cpp 19d4806 
   
 ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
  e18f935 
   ql/src/gen/thrift/gen-php/Types.php 7121ed4 
   ql/src/gen/thrift/gen-py/queryplan/ttypes.py 53c0106 
   ql/src/gen/thrift/gen-rb/queryplan_types.rb c2c4220 
   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 9867739 
   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 
 21398d8 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 
 e6c845c 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 
 1de7e40 
   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java
  8e56263 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
 5f731d7 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
 447f104 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
 e27ce0d 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java
  f7586a4 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 19aae70 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 
 363e49e 
   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q 
 PRE-CREATION 
   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q 
 PRE-CREATION 
   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out 
 e95d2ab 
   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out 
 e38ccf8 
   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
   

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

2015-06-30 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review89826
---



itests/src/test/resources/testconfiguration.properties (line 894)
https://reviews.apache.org/r/34666/#comment142628

Are there more test cases that can be turned on?



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java
 (line 68)
https://reviews.apache.org/r/34666/#comment142851

I think we should delegate the processing to the parent when processing one 
row from the batch. Refer to VectorReduceSinkOperator for an example.
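
A rough sketch of the delegation pattern, assuming a much simplified batch layout. Both classes are hypothetical stand-ins and do not reproduce VectorReduceSinkOperator or the operator under review.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row-mode operator; stands in for the non-vectorized parent class.
class RowSinkSketch {
  final List<Object[]> processed = new ArrayList<Object[]>();

  void process(Object[] row) {
    processed.add(row);
  }
}

// Hypothetical vectorized subclass: it only unpacks the batch and delegates.
class VectorRowSinkSketch extends RowSinkSketch {
  // columns[c][r] holds the value of column c at row r (a simplified layout).
  void processBatch(Object[][] columns, int size) {
    for (int r = 0; r < size; r++) {
      Object[] row = new Object[columns.length];
      for (int c = 0; c < columns.length; c++) {
        row[c] = columns[c][r];
      }
      super.process(row); // all per-row logic stays in the parent
    }
  }
}
```

Keeping the per-row logic in one place means the vectorized and row-mode paths cannot drift apart.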



ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java
 (line 74)
https://reviews.apache.org/r/34666/#comment142852

Is there anything specific to Spark? If not, we should probably reuse 
rather than copying.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java
 (line 45)
https://reviews.apache.org/r/34666/#comment142853

Same as above. We should probably reuse.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java (line 375)
https://reviews.apache.org/r/34666/#comment142860

Rather than throwing an AssertionError, should we use a condition assertion 
here?



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java (line 589)
https://reviews.apache.org/r/34666/#comment142870

It seems that an operator might be visited multiple times.
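
To illustrate the concern: in a plan shaped like a DAG, a node with two parents is reached twice unless the traversal remembers what it has seen. The sketch below uses a hypothetical minimal Op class, not Hive's Operator, and shows one common guard (an identity-based visited set).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class VisitOnceSketch {
  // Hypothetical minimal operator node.
  static final class Op {
    final String name;
    final List<Op> children = new ArrayList<Op>();
    Op(String name) { this.name = name; }
  }

  // Depth-first walk that handles each operator at most once.
  static List<String> collect(Op root) {
    List<String> order = new ArrayList<String>();
    Set<Op> visited = Collections.newSetFromMap(new IdentityHashMap<Op, Boolean>());
    Deque<Op> stack = new ArrayDeque<Op>();
    stack.push(root);
    while (!stack.isEmpty()) {
      Op op = stack.pop();
      if (!visited.add(op)) {
        continue; // already handled via another parent
      }
      order.add(op.name);
      for (Op child : op.children) {
        stack.push(child);
      }
    }
    return order;
  }
}
```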



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java (line 219)
https://reviews.apache.org/r/34666/#comment142758

The comment here is a little confusing. Breaking the op tree seems to have 
already happened above.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java (line 224)
https://reviews.apache.org/r/34666/#comment142759

Nit: add comments here, like regenerate task dependency.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java (line 262)
https://reviews.apache.org/r/34666/#comment142756

Rename generateWorkTree() to generateTaskTreeHelper() or something like 
that.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java
 (line 71)
https://reviews.apache.org/r/34666/#comment142757

Rename the class to something like OperatorTreeSplitterForPPD().



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java
 (line 81)
https://reviews.apache.org/r/34666/#comment142760

Nit: Split this into two lines instead.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java
 (line 107)
https://reviews.apache.org/r/34666/#comment142764

For the cloned tree, don't we need to remove the branches that are not 
connected to the pruning sink operator, i.e., RS-Join?



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java
 (line 111)
https://reviews.apache.org/r/34666/#comment142768

This is not cloned as part of cloneOperatorTree()?



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java
 (line 69)
https://reviews.apache.org/r/34666/#comment142765

Nit: remove the blank line.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java
 (line 92)
https://reviews.apache.org/r/34666/#comment142766

Can we still get conflicts in the file name?



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java
 (line 98)
https://reviews.apache.org/r/34666/#comment142767

Nit: Potential leak of BufferedOutputStream.
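
A sketch of one way to avoid this kind of leak, assuming Java 7+ try-with-resources is available. java.nio.file.Files stands in for whatever file system API the operator uses, and the class and method names are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class StreamCloseSketch {
  static void writeValues(Path target, byte[] payload) throws IOException {
    // Both resources are closed in reverse order when the block exits,
    // including when write() throws; closing the buffer also flushes it.
    try (OutputStream raw = Files.newOutputStream(target);
         BufferedOutputStream out = new BufferedOutputStream(raw)) {
      out.write(payload);
    }
  }
}
```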


- Xuefu Zhang


On May 26, 2015, 4:28 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34666/
 ---
 
 (Updated May 26, 2015, 4:28 p.m.)
 
 
 Review request for hive, chengxiang li and Xuefu Zhang.
 
 
 Bugs: HIVE-9152
 https://issues.apache.org/jira/browse/HIVE-9152
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Tez implemented dynamic partition pruning in HIVE-7826. This is a nice 
 optimization and we should implement the same in HOS.
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 0f86117 
   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp a0b34cb 
   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 55e0385 
   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 749c97a 
   metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 
 4cc54e8 
   ql/if/queryplan.thrift c8dfa35 
   ql/src/gen/thrift/gen-cpp/queryplan_types.h ac73bc5 
   

Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]

2015-05-27 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review85230
---


This is a big patch for a big feature, and it's hard to review offline. Here I 
have commented on the things that are obvious. For better understanding, I think 
an in-person review would be more effective.


ql/if/queryplan.thrift
https://reviews.apache.org/r/34666/#comment136752

I'm not sure if it matters, but it's probably better if we add it as the 
last.



ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
https://reviews.apache.org/r/34666/#comment136753

Did you make any changes in this file? If not, let's leave it as it is.



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java
https://reviews.apache.org/r/34666/#comment136942

The file descriptor needs to be closed in a finally block. In addition, closing 
"in" is not sufficient, as "in" might be null while fs.open(fstatus.getPath()) 
returns a non-null stream.
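
The failure mode can be sketched as follows, under assumptions: the wrapper is taken to be an ObjectInputStream, whose constructor reads a stream header and can throw, and a ByteArrayInputStream stands in for the stream returned by fs.open(). The helper names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;

class FinallyCloseSketch {
  // Helper for the sketch: records whether close() was called.
  static final class TrackingStream extends ByteArrayInputStream {
    boolean closed = false;
    TrackingStream(byte[] data) { super(data); }
    @Override public void close() throws IOException {
      closed = true;
      super.close();
    }
  }

  static Object readObject(InputStream raw) throws IOException, ClassNotFoundException {
    ObjectInputStream in = null;
    try {
      in = new ObjectInputStream(raw); // may throw, leaving "in" null
      return in.readObject();
    } finally {
      try {
        if (in != null) {
          in.close();
        }
      } finally {
        raw.close(); // always release the underlying stream as well
      }
    }
  }
}
```

Closing the raw stream unconditionally covers the case where the wrapper was never constructed.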



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java
https://reviews.apache.org/r/34666/#comment136943

Any chance that an op might be visited multiple times?



ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
https://reviews.apache.org/r/34666/#comment136946

Could numThread be 0?
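
For context on why a zero count matters: Executors.newFixedThreadPool rejects a thread count of 0 or less with an IllegalArgumentException. The sketch below assumes the count feeds such a pool and clamps it to at least one; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadCountSketch {
  // Never hand a non-positive size to the pool factory.
  static int effectiveThreads(int configured) {
    return Math.max(1, configured);
  }

  static int doubleAndSum(List<Integer> values, int numThreads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(effectiveThreads(numThreads));
    try {
      List<Future<Integer>> futures = new ArrayList<Future<Integer>>();
      for (final Integer v : values) {
        Callable<Integer> task = () -> v * 2;
        futures.add(pool.submit(task));
      }
      int total = 0;
      for (Future<Integer> f : futures) {
        total += f.get();
      }
      return total;
    } finally {
      pool.shutdown();
    }
  }
}
```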



ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
https://reviews.apache.org/r/34666/#comment136948

What's this change about?



ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out
https://reviews.apache.org/r/34666/#comment136976

Why are the stats gone?


- Xuefu Zhang


On May 26, 2015, 4:28 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34666/
 ---
 
 (Updated May 26, 2015, 4:28 p.m.)
 
 
 Review request for hive, chengxiang li and Xuefu Zhang.
 
 
 Bugs: HIVE-9152
 https://issues.apache.org/jira/browse/HIVE-9152
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Tez implemented dynamic partition pruning in HIVE-7826. This is a nice 
 optimization and we should implement the same in HOS.
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 0f86117 
   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp a0b34cb 
   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 55e0385 
   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 749c97a 
   metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 
 4cc54e8 
   ql/if/queryplan.thrift c8dfa35 
   ql/src/gen/thrift/gen-cpp/queryplan_types.h ac73bc5 
   ql/src/gen/thrift/gen-cpp/queryplan_types.cpp 19d4806 
   
 ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
  e18f935 
   ql/src/gen/thrift/gen-php/Types.php 7121ed4 
   ql/src/gen/thrift/gen-py/queryplan/ttypes.py 53c0106 
   ql/src/gen/thrift/gen-rb/queryplan_types.rb c2c4220 
   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 9867739 
   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 
 21398d8 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 
 e6c845c 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 
 1de7e40 
   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java
  8e56263 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
 5f731d7 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
 447f104 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
 e27ce0d 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java
  f7586a4 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 19aae70