Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review92321 --- ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java (line 235) https://reviews.apache.org/r/34059/#comment146398 This will likely fail when we have a vectorized shuffle join. Can you add a note about needing to fix this in case we add a vectorized shuffle join. - Vikram Dixit Kumaraswamy On July 10, 2015, 10:17 p.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated July 10, 2015, 10:17 p.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 39477d6 itests/src/test/resources/testconfiguration.properties 97715fc ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java 7b57550 ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 15cafdd ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java 7d79e87 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java e9bd44a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java 4c8c4b1 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 5a87bd6 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java bca91dd ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java f474eae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 93ad145 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6b3e19d ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java fa697ef ql/src/java/org/apache/hadoop/hive/ql/plan/CommonMergeJoinDesc.java f9c34cb ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceWork.java a78a92e ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated July 10, 2015, 10:17 p.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Changes --- - Rebase with trunk, looks like some methods in GenTezUtils were converted to static - When selecting distributed hash join, the join operator should get OpTraits/stats set - For the issue regarding the flattened expressions in the vectorized rowObjectInspector, change the workaround to un-flatten the object inspector during JoinUtil.getObjectInspectorsFromEvaluators(). This is still a bit of a workaround, but only requires a change in 1 place, rather than the 2 changes needed in the previous solution (having to modify the column names during vectorized MapJoinOperator, as well as when generating the vectorized rowObjectInspector in VectorizedBatchUtil) - In the reducer, only the big table's input source should be vectorized Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 39477d6 itests/src/test/resources/testconfiguration.properties 97715fc ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java 7b57550 ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 15cafdd ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java 7d79e87 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java e9bd44a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java 4c8c4b1 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 5a87bd6 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java bca91dd ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java f474eae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 93ad145 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6b3e19d ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java fa697ef ql/src/java/org/apache/hadoop/hive/ql/plan/CommonMergeJoinDesc.java f9c34cb ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceWork.java a78a92e ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated July 2, 2015, 7:46 p.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Changes --- Fixing failure in tez_smb_1.q - the big table position in CommonMergeJoinOperator and the ReduceWork were different, they need to be consistent for the merge join to work properly. Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6d0cf15 itests/src/test/resources/testconfiguration.properties 441b278 ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 15cafdd ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java 7d79e87 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java e9bd44a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedBatchUtil.java 3780113 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java 4c8c4b1 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 5a87bd6 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java bca91dd ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 11c1df6 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/CommonMergeJoinDesc.java f9c34cb ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceWork.java a78a92e ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated July 1, 2015, 9:52 a.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Changes --- patch v7 - update unit tests Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6d0cf15 itests/src/test/resources/testconfiguration.properties 441b278 ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 15cafdd ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java 7d79e87 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java e9bd44a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedBatchUtil.java 3780113 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java 4c8c4b1 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 5a87bd6 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java bca91dd ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 11c1df6 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceWork.java a78a92e ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
On June 25, 2015, 11:38 p.m., Vikram Dixit Kumaraswamy wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java, line 672 https://reviews.apache.org/r/34059/diff/1/?file=955661#file955661line672 Move this to ExprNodeUtils? I can move both versions of flattenExpr() to ExprNodeDescUtils, but I think flattenExprsIfNecessary() contains details specific to the MapJoinOperator On June 25, 2015, 11:38 p.m., Vikram Dixit Kumaraswamy wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java, line 669 https://reviews.apache.org/r/34059/diff/5/?file=989880#file989880line669 Nit: typo initializion - initialization will fix On June 25, 2015, 11:38 p.m., Vikram Dixit Kumaraswamy wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java, line 691 https://reviews.apache.org/r/34059/diff/5/?file=989880#file989880line691 May want to remove this comment in case we don't need it. will fix On June 25, 2015, 11:38 p.m., Vikram Dixit Kumaraswamy wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java, line 705 https://reviews.apache.org/r/34059/diff/5/?file=989880#file989880line705 Incomplete comment needs fixing. This comment is unnecessary, looks like remnants of an old version of ExprNodeDescUtils.resolveJoinKeysAsRSColumns() which I cut/pasted to create the initial version of flattenExpr(). I'll remove the comment. On June 25, 2015, 11:38 p.m., Vikram Dixit Kumaraswamy wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java, line 721 https://reviews.apache.org/r/34059/diff/5/?file=989880#file989880line721 Can you add comments as to why we need to do that? Adding comments to flattenExpr() On June 25, 2015, 11:38 p.m., Vikram Dixit Kumaraswamy wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java, line 26 https://reviews.apache.org/r/34059/diff/5/?file=989883#file989883line26 Can you add comments for describing the need/use of this class? Adding comment: /** * Provides a key/values (note the plural values) interface out of a Key/Value reader, * needed by ReduceRecordSource when reading input from a key/value source. */ On June 25, 2015, 11:38 p.m., Vikram Dixit Kumaraswamy wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java, line 24 https://reviews.apache.org/r/34059/diff/5/?file=989884#file989884line24 Comments please. will add comment On June 25, 2015, 11:38 p.m., Vikram Dixit Kumaraswamy wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java, line 186 https://reviews.apache.org/r/34059/diff/5/?file=989886#file989886line186 Not sure if this is correct? Shouldn't this be true only if this is dynamically partitioned hash join? Not totally sure, but I'm thinking this should always be done for vectorized rowObjectInspector. I think if any other reduce-side operation had to do any task-time initialization involving the vectorized row Object Inspectors, they would also hit the same issue hit in this patch, where there is a mismatch between the compile-time rowObjectInspector and the query-time rowObjectInspector (for vectorized queries only). Another way to avoid this workaround (plus flattenExprs()) would be to simply bypass all of this unnecessary initialization in the vectorized MapJoin operator, since according to Matt McCline many of the object inspectors that are set up during MapJoin initialization are not actually used in the vectorized version. But right now I have no idea what the unnecessary parts are. On June 25, 2015, 11:38 p.m., Vikram Dixit Kumaraswamy wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java, line 657 https://reviews.apache.org/r/34059/diff/5/?file=989889#file989889line657 Is there in the code here? No, it's HashSet Operator? On June 25, 2015, 11:38 p.m., Vikram Dixit Kumaraswamy wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java, line 695 https://reviews.apache.org/r/34059/diff/5/?file=989880#file989880line695 Rename function to flattenExprList makes code easier to read. will change On June 25, 2015, 11:38 p.m., Vikram Dixit Kumaraswamy wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java, line 216 https://reviews.apache.org/r/34059/diff/1/?file=955672#file955672line216 Minreducers for conservative estimation There is no MINREDUCERS config parameter. Discussed offline with Vikram, a more appropriate value is ReduceSinkDesc.getNumReducers(). - Jason --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83846 --- On June 22, 2015, 7:30 p.m., Jason Dere wrote:
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated July 1, 2015, 1:46 a.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Changes --- Patch v6 - incorporating Vikram's review feedback Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6d0cf15 itests/src/test/resources/testconfiguration.properties 441b278 ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 15cafdd ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java 7d79e87 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java e9bd44a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedBatchUtil.java 3780113 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java 4c8c4b1 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 5a87bd6 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java bca91dd ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 11c1df6 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83846 --- ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java (line 672) https://reviews.apache.org/r/34059/#comment134936 Move this to ExprNodeUtils? ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java (line 210) https://reviews.apache.org/r/34059/#comment134932 Minreducers for conservative estimation ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java (line 669) https://reviews.apache.org/r/34059/#comment141831 Nit: typo initializion - initialization ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java (line 691) https://reviews.apache.org/r/34059/#comment141832 May want to remove this comment in case we don't need it. ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java (line 695) https://reviews.apache.org/r/34059/#comment141833 Rename function to flattenExprList makes code easier to read. ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java (line 705) https://reviews.apache.org/r/34059/#comment141834 Incomplete comment needs fixing. ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java (line 721) https://reviews.apache.org/r/34059/#comment141835 Can you add comments as to why we need to do that? ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java (line 26) https://reviews.apache.org/r/34059/#comment141840 Can you add comments for describing the need/use of this class? ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java (line 24) https://reviews.apache.org/r/34059/#comment141841 Comments please. ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java (line 186) https://reviews.apache.org/r/34059/#comment141870 Not sure if this is correct? Shouldn't this be true only if this is dynamically partitioned hash join? ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java (line 652) https://reviews.apache.org/r/34059/#comment141884 Is there in the code here? - Vikram Dixit Kumaraswamy On June 22, 2015, 7:30 p.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated June 22, 2015, 7:30 p.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 27f68df itests/src/test/resources/testconfiguration.properties 7b7559a ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 15cafdd ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java cdabe3a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java e9bd44a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java 4c8c4b1 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 5a87bd6 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java bca91dd ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 11c1df6 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated June 22, 2015, 7:30 p.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Changes --- rebasing patch with trunk again Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 27f68df itests/src/test/resources/testconfiguration.properties 7b7559a ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 15cafdd ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java cdabe3a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java e9bd44a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java 4c8c4b1 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 5a87bd6 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java bca91dd ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 11c1df6 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated June 5, 2015, 9:47 p.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Changes --- Patch v4 - fixed rebase Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a724fd1 itests/src/test/resources/testconfiguration.properties 47a1107 ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 15cafdd ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java cdabe3a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java e9bd44a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java 4c8c4b1 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java bcffdbc ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java bca91dd ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 0edfc5d ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
On May 12, 2015, 6:26 a.m., Alexander Pivovarov wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java, line 107 https://reviews.apache.org/r/34059/diff/1/?file=955672#file955672line107 ReduceSinkOperator uses Object.hashCode() and equals() methods. HashSet algo relies on hashCode/equals methods Jason Dere wrote: So that means equals() only works if it is the exact same ReduceSinkOperator object. This should be ok for our usage, if we are referring to the same ReduceSinkOperator, we should be using that exact same object. Alexander Pivovarov wrote: Do you want to use IdentityHashMap then? This class implements the Map interface with a hash table, using reference-equality in place of object-equality when comparing keys (and values). In other words, in an IdentityHashMap, two keys k1 and k2 are considered equal if and only if (k1==k2) Jason Dere wrote: We're using a Set here as opposed to a Map. I'll change to use Sets.newIdentityHashSet() from Guava. Alexander Pivovarov wrote: IdentityHashMap contains private KeySet class already to get its instance you can call keySet() method e.g. IdentityHashMapInteger, Object rsMap = new IdentityHashMapInteger, Object(); rsMap.put(1, null); rsMap.put(2, null); rsMap.put(3, null); SetInteger rsSet = rsMap.keySet(); System.out.println(rsSet); [3, 1, 2] I do not see a need to change this further - it has already been changed to use an Identity-based set - Jason --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83362 --- On May 15, 2015, 10:02 p.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated May 15, 2015, 10:02 p.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java eff4d30 itests/src/test/resources/testconfiguration.properties c79c36c ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java b1352f3 ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java cdabe3a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java e9bd44a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java af78776 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java bcffdbc ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java f7e1dbc ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 0edfc5d ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated May 15, 2015, 10:02 p.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Changes --- rebase patch with trunk Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java eff4d30 itests/src/test/resources/testconfiguration.properties c79c36c ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java b1352f3 ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java cdabe3a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java e9bd44a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java af78776 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java bcffdbc ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java f7e1dbc ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 0edfc5d ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
On May 12, 2015, 6:04 a.m., Alexander Pivovarov wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java, line 33 https://reviews.apache.org/r/34059/diff/1/?file=955664#file955664line33 booleans in java are false by default I find this provides better readability. Are there any negatives to having the initial value set here? On May 12, 2015, 6:04 a.m., Alexander Pivovarov wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java, line 50 https://reviews.apache.org/r/34059/diff/1/?file=955664#file955664line50 It is not necessary but I do not see a reason why the visibility of this method should be reduced. Should it be public as all others? The public functionality we need from that class is provided by the Iterator/Iterable interfaces, I didn't think it would be necessary to expose reset() since it is only really being used by the outer class. - Jason --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83359 --- On May 11, 2015, 9:48 p.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated May 11, 2015, 9:48 p.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java eff4d30 itests/src/test/resources/testconfiguration.properties eeb46cc ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java b1352f3 ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java cdabe3a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 15c747e ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java a9082eb ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java d42b643 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java f7e1dbc ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 241e9d7 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
On May 12, 2015, 6:42 a.m., Alexander Pivovarov wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java, line 439 https://reviews.apache.org/r/34059/diff/1/?file=955675#file955675line439 trailing space will fix On May 12, 2015, 6:42 a.m., Alexander Pivovarov wrote: ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java, line 315 https://reviews.apache.org/r/34059/diff/1/?file=955677#file955677line315 Remove this line and add String type declaration 3 lines below. Do not confuse GC. will fix - Jason --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83371 --- On May 11, 2015, 9:48 p.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated May 11, 2015, 9:48 p.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java eff4d30 itests/src/test/resources/testconfiguration.properties eeb46cc ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java b1352f3 ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java cdabe3a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 15c747e ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java a9082eb ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java d42b643 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java f7e1dbc ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 241e9d7 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
On May 12, 2015, 5:51 a.m., Alexander Pivovarov wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java, line 674 https://reviews.apache.org/r/34059/diff/1/?file=955661#file955661line674 I think it's better to use Map.Entry here to avoid unnecessary lookup get(pos) Map.Entry provides getKey, getValue, setValue methods. will fix On May 12, 2015, 5:51 a.m., Alexander Pivovarov wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java, line 679 https://reviews.apache.org/r/34059/diff/1/?file=955661#file955661line679 the same recommendation as avove will fix On May 12, 2015, 5:51 a.m., Alexander Pivovarov wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java, line 715 https://reviews.apache.org/r/34059/diff/1/?file=955661#file955661line715 Using replace(char, char) is faster than replace(CharSequence target, CharSequence replacement) because it is not using Pattern.compile().matcher().replaceAll API Can you use replace('.', '_') instead of replace(., _)? will fix - Jason --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83356 --- On May 11, 2015, 9:48 p.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated May 11, 2015, 9:48 p.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java eff4d30 itests/src/test/resources/testconfiguration.properties eeb46cc ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java b1352f3 ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java cdabe3a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 15c747e ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java a9082eb ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java d42b643 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java f7e1dbc ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 241e9d7 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
On May 12, 2015, 6:26 a.m., Alexander Pivovarov wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java, line 95 https://reviews.apache.org/r/34059/diff/1/?file=955671#file955671line95 usually static Log should be private because superclass static methods should use their own static Log to avoid confusion. will change to private On May 12, 2015, 6:26 a.m., Alexander Pivovarov wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java, line 1094 https://reviews.apache.org/r/34059/diff/1/?file=955671#file955671line1094 Can you use Map.Entry to avoid unnecesary lookup 3 lines below? will fix On May 12, 2015, 6:26 a.m., Alexander Pivovarov wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java, line 107 https://reviews.apache.org/r/34059/diff/1/?file=955672#file955672line107 ReduceSinkOperator uses Object.hashCode() and equals() methods. HashSet algo relies on hashCode/equals methods So that means equals() only works if it is the exact same ReduceSinkOperator object. This should be ok for our usage, if we are referring to the same ReduceSinkOperator, we should be using that exact same object. - Jason --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83362 --- On May 11, 2015, 9:48 p.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated May 11, 2015, 9:48 p.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java eff4d30 itests/src/test/resources/testconfiguration.properties eeb46cc ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java b1352f3 ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java cdabe3a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 15c747e ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java a9082eb ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java d42b643 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java f7e1dbc ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 241e9d7 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
On May 12, 2015, 6:35 a.m., Alexander Pivovarov wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java, line 109 https://reviews.apache.org/r/34059/diff/1/?file=955673#file955673line109 trailing space will fix On May 12, 2015, 6:35 a.m., Alexander Pivovarov wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java, line 423 https://reviews.apache.org/r/34059/diff/1/?file=955675#file955675line423 Why calling getEntry(key) two times consequently? containsKey() and get() call getEntry internally Just call get(rs) one time, check thet result is not null and remove the second get(rs) will fix - Jason --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83367 --- On May 11, 2015, 9:48 p.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated May 11, 2015, 9:48 p.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java eff4d30 itests/src/test/resources/testconfiguration.properties eeb46cc ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java b1352f3 ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java cdabe3a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 15c747e ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java a9082eb ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java d42b643 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java f7e1dbc ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 241e9d7 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
On May 12, 2015, 6:26 a.m., Alexander Pivovarov wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java, line 107 https://reviews.apache.org/r/34059/diff/1/?file=955672#file955672line107 ReduceSinkOperator uses Object.hashCode() and equals() methods. HashSet algo relies on hashCode/equals methods Jason Dere wrote: So that means equals() only works if it is the exact same ReduceSinkOperator object. This should be ok for our usage, if we are referring to the same ReduceSinkOperator, we should be using that exact same object. Do you want to use IdentityHashMap then? This class implements the Map interface with a hash table, using reference-equality in place of object-equality when comparing keys (and values). In other words, in an IdentityHashMap, two keys k1 and k2 are considered equal if and only if (k1==k2) - Alexander --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83362 --- On May 11, 2015, 9:48 p.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated May 11, 2015, 9:48 p.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java eff4d30 itests/src/test/resources/testconfiguration.properties eeb46cc ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java b1352f3 ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java cdabe3a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 15c747e ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java a9082eb ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java d42b643 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java f7e1dbc ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 241e9d7 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
On May 12, 2015, 6:26 a.m., Alexander Pivovarov wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java, line 107 https://reviews.apache.org/r/34059/diff/1/?file=955672#file955672line107 ReduceSinkOperator uses Object.hashCode() and equals() methods. HashSet algo relies on hashCode/equals methods Jason Dere wrote: So that means equals() only works if it is the exact same ReduceSinkOperator object. This should be ok for our usage, if we are referring to the same ReduceSinkOperator, we should be using that exact same object. Alexander Pivovarov wrote: Do you want to use IdentityHashMap then? This class implements the Map interface with a hash table, using reference-equality in place of object-equality when comparing keys (and values). In other words, in an IdentityHashMap, two keys k1 and k2 are considered equal if and only if (k1==k2) Jason Dere wrote: We're using a Set here as opposed to a Map. I'll change to use Sets.newIdentityHashSet() from Guava. IdentityHashMap contains private KeySet class already to get its instance you can call keySet() method e.g. IdentityHashMapInteger, Object rsMap = new IdentityHashMapInteger, Object(); rsMap.put(1, null); rsMap.put(2, null); rsMap.put(3, null); SetInteger rsSet = rsMap.keySet(); System.out.println(rsSet); [3, 1, 2] - Alexander --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83362 --- On May 15, 2015, 1:02 a.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated May 15, 2015, 1:02 a.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java eff4d30 itests/src/test/resources/testconfiguration.properties f9c9351 ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java b1352f3 ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java cdabe3a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java e9bd44a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java a9082eb ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java d42b643 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java f7e1dbc ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 241e9d7 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
On May 12, 2015, 6:26 a.m., Alexander Pivovarov wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java, line 107 https://reviews.apache.org/r/34059/diff/1/?file=955672#file955672line107 ReduceSinkOperator uses Object.hashCode() and equals() methods. HashSet algo relies on hashCode/equals methods Jason Dere wrote: So that means equals() only works if it is the exact same ReduceSinkOperator object. This should be ok for our usage, if we are referring to the same ReduceSinkOperator, we should be using that exact same object. Alexander Pivovarov wrote: Do you want to use IdentityHashMap then? This class implements the Map interface with a hash table, using reference-equality in place of object-equality when comparing keys (and values). In other words, in an IdentityHashMap, two keys k1 and k2 are considered equal if and only if (k1==k2) We're using a Set here as opposed to a Map. I'll change to use Sets.newIdentityHashSet() from Guava. - Jason --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83362 --- On May 11, 2015, 9:48 p.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated May 11, 2015, 9:48 p.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java eff4d30 itests/src/test/resources/testconfiguration.properties eeb46cc ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java b1352f3 ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java cdabe3a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 15c747e ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java a9082eb ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java d42b643 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java f7e1dbc ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 241e9d7 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated May 15, 2015, 1:02 a.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Changes --- Addressing RB feedback from apivovarov Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java eff4d30 itests/src/test/resources/testconfiguration.properties f9c9351 ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java b1352f3 ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java cdabe3a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java e9bd44a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java a9082eb ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java d42b643 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java f7e1dbc ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 241e9d7 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83359 --- ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java https://reviews.apache.org/r/34059/#comment134334 booleans in java are false by default ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java https://reviews.apache.org/r/34059/#comment134335 Objects are null by default in Java ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java https://reviews.apache.org/r/34059/#comment134336 It is not necessary but I do not see a reason why the visibility of this method should be reduced. Should it be public as all others? - Alexander Pivovarov On May 11, 2015, 9:48 p.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated May 11, 2015, 9:48 p.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java eff4d30 itests/src/test/resources/testconfiguration.properties eeb46cc ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java b1352f3 ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java cdabe3a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 15c747e ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java a9082eb ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java d42b643 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java f7e1dbc ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 241e9d7 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83362 --- ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java https://reviews.apache.org/r/34059/#comment134342 usually static Log should be private because superclass static methods should use their own static Log to avoid confusion. ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java https://reviews.apache.org/r/34059/#comment134340 Can you use Map.Entry to avoid unnecesary lookup 3 lines below? ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java https://reviews.apache.org/r/34059/#comment134343 ReduceSinkOperator uses Object.hashCode() and equals() methods. HashSet algo relies on hashCode/equals methods - Alexander Pivovarov On May 11, 2015, 9:48 p.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated May 11, 2015, 9:48 p.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java eff4d30 itests/src/test/resources/testconfiguration.properties eeb46cc ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java b1352f3 ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java cdabe3a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 15c747e ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java a9082eb ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java d42b643 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java f7e1dbc ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 241e9d7 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83367 --- ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java https://reviews.apache.org/r/34059/#comment134344 trailing space ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java https://reviews.apache.org/r/34059/#comment134347 Why calling getEntry(key) two times consequently? containsKey() and get() call getEntry internally Just call get(rs) one time, check thet result is not null and remove the second get(rs) - Alexander Pivovarov On May 11, 2015, 9:48 p.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated May 11, 2015, 9:48 p.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java eff4d30 itests/src/test/resources/testconfiguration.properties eeb46cc ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java b1352f3 ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java cdabe3a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 15c747e ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java a9082eb ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java d42b643 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java f7e1dbc ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 241e9d7 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83371 --- ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java https://reviews.apache.org/r/34059/#comment134348 trailing space ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java https://reviews.apache.org/r/34059/#comment134349 Java will set it to 0 in constructor anyway. ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java https://reviews.apache.org/r/34059/#comment134350 Remove this line and add String type declaration 3 lines below. Do not confuse GC. ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java https://reviews.apache.org/r/34059/#comment134351 it will be false by default - Alexander Pivovarov On May 11, 2015, 9:48 p.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated May 11, 2015, 9:48 p.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java eff4d30 itests/src/test/resources/testconfiguration.properties eeb46cc ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java b1352f3 ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java cdabe3a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 15c747e ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java a9082eb ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java d42b643 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java f7e1dbc ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 241e9d7 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- Review request for hive and Vikram Dixit Kumaraswamy. Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java eff4d30 itests/src/test/resources/testconfiguration.properties eeb46cc ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java b1352f3 ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java cdabe3a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 15c747e ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java a9082eb ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java d42b643 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java f7e1dbc ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 241e9d7 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83356 --- ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java https://reviews.apache.org/r/34059/#comment134330 I think it's better to use Map.Entry here to avoid unnecessary lookup get(pos) Map.Entry provides getKey, getValue, setValue methods. ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java https://reviews.apache.org/r/34059/#comment134331 the same recommendation as avove ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java https://reviews.apache.org/r/34059/#comment134332 Using replace(char, char) is faster than replace(CharSequence target, CharSequence replacement) because it is not using Pattern.compile().matcher().replaceAll API Can you use replace('.', '_') instead of replace(., _)? - Alexander Pivovarov On May 11, 2015, 9:48 p.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ --- (Updated May 11, 2015, 9:48 p.m.) Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy. Bugs: HIVE-10673 https://issues.apache.org/jira/browse/HIVE-10673 Repository: hive-git Description --- Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java eff4d30 itests/src/test/resources/testconfiguration.properties eeb46cc ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java b1352f3 ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java cdabe3a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 15c747e ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java a9082eb ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java d42b643 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java f7e1dbc ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 241e9d7 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34059/diff/ Testing --- q-file tests added Thanks, Jason Dere