Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]
On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
itests/src/test/resources/testconfiguration.properties, line 894
https://reviews.apache.org/r/34666/diff/1/?file=971683#file971683line894
Are there more test cases that can be turned on?

Will turn on vectorized_dynamic_partition_pruning.q.

On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java, line 74
https://reviews.apache.org/r/34666/diff/1/?file=971705#file971705line74
Is there anything specific to Spark? If not, we should probably reuse the existing class rather than copy it.

The only difference is the type of pruning sink that gets added: we use SparkPartitionPruningSinkOp, while Tez uses AppMasterEventOp. OK, I'll reuse the existing class.

On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java, line 589
https://reviews.apache.org/r/34666/diff/1/?file=971711#file971711line589
It seems that an operator might be visited multiple times.

Yes, but I don't think it matters here. We only use this to find the enclosing work for an op, and we just need to find at least one root op, so I think duplicates don't matter here.

On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java, line 107
https://reviews.apache.org/r/34666/diff/1/?file=971714#file971714line107
For the cloned tree, don't we need to remove the branches that are not connected to the pruning sink operator, i.e., RS-Join?

This is done before we clone the branch:

```java
List<Operator<?>> savedChildOps = filterOp.getChildOperators();
filterOp.setChildOperators(Utilities.makeList(selOp));
```

On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java, line 111
https://reviews.apache.org/r/34666/diff/1/?file=971714#file971714line111
This is not cloned as part of cloneOperatorTree()?
No, because it is a transient field.

On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java, line 92
https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line92
Can we still get conflicts in the file name?

It shouldn't happen. I think the work ID and Random#nextInt() should both be unique, right?

On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java, line 68
https://reviews.apache.org/r/34666/diff/1/?file=971701#file971701line68
I think we should delegate the processing to the parent when processing one row from the batch. Refer to VectorReduceSinkOperator for an example.

Not much we can do here, since the processing here is more complicated. I changed part of the code to call super.process().

On June 30, 2015, 8:55 p.m., Xuefu Zhang wrote:
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java, line 98
https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line98
Nit: Potential leak of BufferedOutputStream.

Can you explain a little under which situation this would happen, and what the better way to do this is? Thanks.

- Chao

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34666/#review89826 ---

On May 26, 2015, 4:28 p.m., Chao Sun wrote:

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34666/ ---

(Updated May 26, 2015, 4:28 p.m.)

Review request for hive, chengxiang li and Xuefu Zhang.

Bugs: HIVE-9152 https://issues.apache.org/jira/browse/HIVE-9152

Repository: hive-git

Description: Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in Hive on Spark (HOS).
Diffs:
- common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc
- itests/src/test/resources/testconfiguration.properties 2a5f7e3
- metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 0f86117
- metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp a0b34cb
- metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 55e0385
- metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 749c97a
- metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 4cc54e8
- ql/if/queryplan.thrift c8dfa35
- ql/src/gen/thrift/gen-cpp/queryplan_types.h ac73bc5
- ql/src/gen/thrift/gen-cpp/queryplan_types.cpp 19d4806
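Chao's answers above describe two properties of the branch cloning: the filter's other children (the RS-Join side) are detached before cloning so only the pruning branch is copied, and a transient field is not carried over by cloneOperatorTree(). The following is a minimal sketch of both, assuming a serialization-based deep copy. ToyOp and BranchCloner are hypothetical stand-ins, and java.io serialization substitutes for whatever mechanism Hive's cloneOperatorTree() actually uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical toy operator: a name, child operators, and one transient field.
class ToyOp implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    List<ToyOp> children = new ArrayList<>();
    // Transient fields are skipped by serialization, so a clone sees null here.
    transient Object runtimeState;

    ToyOp(String name) {
        this.name = name;
    }
}

class BranchCloner {
    // Deep-copies the tree under root by serializing it and reading it back.
    static ToyOp deepCopy(ToyOp root) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(root);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ToyOp) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("clone failed", e);
        }
    }

    // Clones the tree under root while filterOp temporarily has keepChild as its
    // only child, so sibling branches (e.g. RS -> Join) are not copied.
    static ToyOp cloneKeepingOnly(ToyOp root, ToyOp filterOp, ToyOp keepChild) {
        List<ToyOp> savedChildOps = filterOp.children;
        filterOp.children = new ArrayList<>(Collections.singletonList(keepChild));
        try {
            return deepCopy(root);
        } finally {
            // Restore the original children so the source plan is unchanged.
            filterOp.children = savedChildOps;
        }
    }
}
```

Restoring the children in a finally block keeps the original plan intact even if the copy fails.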
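On the BufferedOutputStream nit: a leak can happen if an exception is thrown after the stream is opened but before close() is reached, which leaves the underlying file handle open and any buffered bytes unflushed. A common remedy is try-with-resources. The sketch below assumes a simple length-prefixed format and writes to a local java.nio Path; the real operator writes through a Hadoop FileSystem, and PruningValuesFile is a hypothetical name.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a length-prefixed list of strings in one file.
class PruningValuesFile {
    static void write(Path path, List<String> values) throws IOException {
        // Both resources are closed in reverse order when the block exits,
        // whether normally or because a write threw. Closing `out` flushes the
        // buffer into `raw`; closing `raw` again afterwards has no effect.
        try (OutputStream raw = Files.newOutputStream(path);
             DataOutputStream out = new DataOutputStream(new BufferedOutputStream(raw))) {
            out.writeInt(values.size());
            for (String value : values) {
                out.writeUTF(value);
            }
        }
    }

    static List<String> read(Path path) throws IOException {
        try (InputStream raw = Files.newInputStream(path);
             DataInputStream in = new DataInputStream(new BufferedInputStream(raw))) {
            int n = in.readInt();
            List<String> values = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                values.add(in.readUTF());
            }
            return values;
        }
    }
}
```

Declaring the raw stream as its own resource means it is still closed even if wrapping it were to fail.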
Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]
On July 1, 2015, midnight, Xuefu Zhang wrote:
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java, line 142
https://reviews.apache.org/r/34666/diff/1/?file=971707#file971707line142
Why do we need this now?

This prevents a newly generated task from being processed again: if the task contains local work, that local work may be overwritten. See HIVE-9424 for more details. However, I just found out that this is also fixed as part of HIVE-9659, so I'll remove this code now.

On July 1, 2015, midnight, Xuefu Zhang wrote:
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java, line 227
https://reviews.apache.org/r/34666/diff/1/?file=971711#file971711line227
Why put the old work in the map?

Even though we clone the op tree, we still retain the old work. So, after we create a new root op, we need to update rootToWorkMap and map the cloned root op to the old work; this mapping is later used in getEnclosingWork. We also need to keep the old entry, because SparkPartitionPruningSink still stores the old TableScanOperator, and processPartitionPruningSink looks up that op to get the corresponding target work. I added more comments in the code.

- Chao

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34666/#review89972 ---
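The rootToWorkMap explanation above can be illustrated with a small sketch: both the original and the cloned root operator map to the same work object, so a lookup that walks up from an operator in the cloned branch and a lookup that starts from the original TableScanOperator both find it. Op, Work, registerClone, and getEnclosingWork here are hypothetical simplifications that assume a single parent per operator; Hive's real operators can have several parents.

```java
import java.util.Map;

class RootToWorkSketch {
    // Hypothetical operator with a single parent (null for a root).
    static final class Op {
        final String name;
        final Op parent;

        Op(String name, Op parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    // Hypothetical unit of work that owns one or more root operators.
    static final class Work {
        final String name;

        Work(String name) {
            this.name = name;
        }
    }

    // After cloning, map the cloned root to the existing work while keeping
    // the original root's entry in place.
    static void registerClone(Map<Op, Work> rootToWork, Op originalRoot, Op clonedRoot) {
        rootToWork.put(clonedRoot, rootToWork.get(originalRoot));
    }

    // Walks up to the root operator and returns the work registered for it.
    // Op does not override equals/hashCode, so map lookups are by identity.
    static Work getEnclosingWork(Op op, Map<Op, Work> rootToWork) {
        Op current = op;
        while (current.parent != null) {
            current = current.parent;
        }
        return rootToWork.get(current);
    }
}
```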
Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]
On July 2, 2015, 6:36 a.m., chengxiang li wrote:
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java, line 59
https://reviews.apache.org/r/34666/diff/1/?file=971706#file971706line59
The statistics should be quite inaccurate after filter and group-by, as they are computed from estimates at compile time. I think a threshold check on inaccurate data is unacceptable, since it means the threshold may not work at all. We could check this threshold in SparkPartitionPruningSinkOperator at runtime instead.

Switching to a runtime check would be very different: here we want to check the threshold and avoid generating the pruning task if possible. How inaccurate would the stats be? I'm fine with it as long as the estimate is always on the conservative side.

On July 2, 2015, 6:36 a.m., chengxiang li wrote:
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java, line 396
https://reviews.apache.org/r/34666/diff/1/?file=971711#file971711line396
Why do we need a List for table/column name/partition key here? Do we support multiple PartitionPruningSinkOperators inside a single operator tree?

This is because a target work with a partitioned table could have multiple partition columns, which could come from multiple tables and/or partition keys. You can check the test output files for some examples.

On July 2, 2015, 6:36 a.m., chengxiang li wrote:
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java, line 61
https://reviews.apache.org/r/34666/diff/1/?file=971715#file971715line61
When the appended data exceeds its capacity, DataOutputBuffer grows its byte array by allocating a new array of twice the size and copying the old array into it. An estimated initial array size should avoid most of these copies.

Yes, this would be an improvement. Xuefu and I talked about adding an extra parameter to control the generated file size. We plan to do that as a follow-up task.

- Chao

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34666/#review90197 ---
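To put the DataOutputBuffer point above in numbers, here is a small model that counts how many grow-and-copy steps a buffer performs when it doubles its capacity each time it fills up, as the comment describes. It is a simplified model (it assumes the buffer is full when it grows and that capacity exactly doubles until it covers the final size), not Hadoop's DataOutputBuffer code.

```java
class BufferGrowthModel {
    private static void checkPositive(int initialCapacity) {
        if (initialCapacity <= 0) {
            throw new IllegalArgumentException("initialCapacity must be positive");
        }
    }

    // Number of reallocations (each one copies the old array) a doubling
    // buffer needs before its capacity reaches totalBytes.
    static int reallocations(int initialCapacity, long totalBytes) {
        checkPositive(initialCapacity);
        long capacity = initialCapacity;
        int count = 0;
        while (capacity < totalBytes) {
            capacity *= 2;
            count++;
        }
        return count;
    }

    // Total bytes copied across those reallocations, assuming each copy
    // moves the full old capacity into the new array.
    static long bytesCopied(int initialCapacity, long totalBytes) {
        checkPositive(initialCapacity);
        long capacity = initialCapacity;
        long copied = 0;
        while (capacity < totalBytes) {
            copied += capacity;
            capacity *= 2;
        }
        return copied;
    }
}
```

For example, starting from 32 bytes (the default initial size of java.io.ByteArrayOutputStream) and writing about 1,000,000 bytes takes 15 reallocations in this model, while an initial size of 1 << 20 takes none.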
Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]
On July 2, 2015, 5:25 a.m., chengxiang li wrote:
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java, line 251
https://reviews.apache.org/r/34666/diff/1/?file=971699#file971699line251
Logging at ERROR level should mean that an error has happened and the process will be interrupted. If we really expect a single field here, should we throw an exception when there are more? Otherwise, we should downgrade the log level to WARN and give more precise information.

OK, I changed this to an assert, since it would be a bug if the number of fields is not 1 here.

- Chao

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34666/#review90191 ---
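One detail worth keeping in mind for the assert change above: Java assertions are disabled by default and only run when the JVM is started with -ea (or -enableassertions), so in a production task an assert-only check is skipped. The sketch contrasts the two options; the method names are hypothetical, and the list of field names stands in for whatever the pruner actually inspects.

```java
import java.util.List;

class SingleFieldCheck {
    // The size check runs only when assertions are enabled (java -ea);
    // otherwise the size is not checked at all.
    static String onlyFieldWithAssert(List<String> fieldNames) {
        assert fieldNames.size() == 1 : "expected 1 field, got " + fieldNames.size();
        return fieldNames.get(0);
    }

    // The size check always runs, regardless of JVM flags.
    static String onlyFieldWithException(List<String> fieldNames) {
        if (fieldNames.size() != 1) {
            throw new IllegalStateException("expected 1 field, got " + fieldNames.size());
        }
        return fieldNames.get(0);
    }
}
```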
Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]
On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java, line 177
https://reviews.apache.org/r/34666/diff/1/?file=971700#file971700line177
Any chance that an op might be visited multiple times?

It shouldn't happen: it's a tree traversal, and every operator should only be added once.

On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java, line 519
https://reviews.apache.org/r/34666/diff/1/?file=971702#file971702line519
numThread could be 0?

It could be 0, since getInputPaths() could return no paths. This would result in an IllegalArgumentException (IAE) from newFixedThreadPool.

On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java, line 164
https://reviews.apache.org/r/34666/diff/1/?file=971704#file971704line164
What's this change about?

This delays stats annotation until after the DPP optimization, because the generated branches also need to be annotated with stats.

On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out, line 1914
https://reviews.apache.org/r/34666/diff/1/?file=971731#file971731line1914
Why are the stats gone?

This is because we moved the stats annotation to SparkCompiler, so the fetch task generated in SimpleFetchOptimizer won't have the stats. I don't see this as a big issue, and Tez does the same thing as well.

- Chao

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34666/#review85230 ---
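The numThread discussion above hinges on the fact that Executors.newFixedThreadPool(n) throws IllegalArgumentException when n <= 0. A minimal guard, assuming the thread count is derived from the number of input paths and capped at some maximum, looks like this; the per-path task and the cap are hypothetical, not CombineHiveInputFormat's actual logic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerPathProcessor {
    // Thread count derived from the input size; this is 0 when there are no paths.
    static int numThreads(int numPaths, int maxThreads) {
        if (maxThreads <= 0) {
            throw new IllegalArgumentException("maxThreads must be positive");
        }
        return Math.min(numPaths, maxThreads);
    }

    // Runs a hypothetical per-path task in parallel. Returning early for an
    // empty input avoids newFixedThreadPool(0), which would throw.
    static List<String> processAll(List<String> paths, int maxThreads)
            throws InterruptedException, ExecutionException {
        int numThreads = numThreads(paths.size(), maxThreads);
        if (numThreads == 0) {
            return Collections.emptyList();
        }
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String path : paths) {
                futures.add(pool.submit(() -> "checked:" + path));
            }
            // Collect results in submission order.
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```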
Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34666/ ---

(Updated July 3, 2015, 10:45 p.m.)

Review request for hive, chengxiang li and Xuefu Zhang.

Bugs: HIVE-9152 https://issues.apache.org/jira/browse/HIVE-9152

Repository: hive-git

Description: Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in Hive on Spark (HOS).

Diffs (updated):
- common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc
- itests/src/test/resources/testconfiguration.properties 2a5f7e3
- ql/if/queryplan.thrift c8dfa35
- ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02
- ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8
- ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION
- ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c
- ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION
- ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40
- ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d
- ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java 8546d21
- ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5
- ql/src/java/org/apache/hadoop/hive/ql/optimizer/RemoveDynamicPruningBySize.java 4803959
- ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7
- ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION
- ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104
- ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d
- ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4
- ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70
- ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java PRE-CREATION
- ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java PRE-CREATION
- ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841
- ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9
- ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e
- ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION
- ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION
- ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c
- ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855
- ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f
- ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61
- ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab
- ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8
- ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a
- ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b
- ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d
- ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7
- ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f
- ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION
- ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION
- ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d
- ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679
- ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f
- ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40
- ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15
- ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538
- ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f
- ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f
- ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3
- ql/src/test/results/clientpositive/spark/vectorized_dynamic_partition_pruning.q.out PRE-CREATION
- ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9
- ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9

Diff: https://reviews.apache.org/r/34666/diff/

Testing: spark_dynamic_partition_pruning.q and spark_dynamic_partition_pruning_2.q, both cloned from Tez's tests.

Thanks,
Chao Sun
Re: Review Request 36156: HIVE-11053: Add more tests for HIVE-10844[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36156/#review90314 ---

ql/src/test/queries/clientpositive/dynamic_rdd_cache.q (line 78)
https://reviews.apache.org/r/36156/#comment143335
This query is almost the same as the previous one; we should only need one of them.

- chengxiang li

On July 3, 2015, 7:34 a.m., lun gao wrote:

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36156/ ---

(Updated July 3, 2015, 7:34 a.m.)

Review request for hive and chengxiang li.

Bugs: HIVE-11053 https://issues.apache.org/jira/browse/HIVE-11053

Repository: hive-git

Description: Add some test cases for self-union, self-join, CWE, and repeated sub-queries to verify the combining of equivalent works in HIVE-10844.

Diffs:
- ql/src/test/queries/clientpositive/dynamic_rdd_cache.q PRE-CREATION
- ql/src/test/results/clientpositive/spark/dynamic_rdd_cache.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/36156/diff/

Thanks,
lun gao
Re: Review Request 36156: HIVE-11053: Add more tests for HIVE-10844[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36156/ ---

(Updated July 3, 2015, 7:34 a.m.)

Review request for hive and chengxiang li.

Changes: Add new test.

Bugs: HIVE-11053 https://issues.apache.org/jira/browse/HIVE-11053

Repository: hive-git

Description: Add some test cases for self-union, self-join, CWE, and repeated sub-queries to verify the combining of equivalent works in HIVE-10844.

Diffs (updated):
- ql/src/test/queries/clientpositive/dynamic_rdd_cache.q PRE-CREATION
- ql/src/test/results/clientpositive/spark/dynamic_rdd_cache.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/36156/diff/

Thanks,
lun gao
Re: Review Request 36156: HIVE-11053: Add more tests for HIVE-10844[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36156/#review90312 ---

ql/src/test/queries/clientpositive/dynamic_rdd_cache.q (line 102)
https://reviews.apache.org/r/36156/#comment143334
Drop the temp table at the end.

- chengxiang li
[jira] [Created] (HIVE-11180) Enable native vectorized map join for spark [Spark Branch]
Rui Li created HIVE-11180:

Summary: Enable native vectorized map join for spark [Spark Branch]
Key: HIVE-11180
URL: https://issues.apache.org/jira/browse/HIVE-11180
Project: Hive
Issue Type: Sub-task
Reporter: Rui Li
Assignee: Rui Li

The improvement was introduced in HIVE-9824. Let's use this task to track how we can enable it for Spark.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)