Re: Review Request 13059: HIVE-4850 Implement vector mode map join

2013-11-11 Thread Eric Hanson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13059/#review28671
---



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java
https://reviews.apache.org/r/13059/#comment55588

Nice to see good comments



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssign.java
https://reviews.apache.org/r/13059/#comment55589

A comment that explains at a high level where and how this interface is 
used would be helpful.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
https://reviews.apache.org/r/13059/#comment55593

should these fields be marked private?



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
https://reviews.apache.org/r/13059/#comment55594

Please add a comment that explains this method.

Should this be a new method on BytesColumnVector or can you use 
BytesColumnVector.setVal()?

setVal automatically extends the internal buffer if needed.
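
To illustrate the buffer-extension behavior referred to above, here is a 
minimal sketch. SketchBytesColumn and all of its members are hypothetical 
stand-ins written for this note; it is not Hive's BytesColumnVector, and it 
only assumes that values are copied into one shared buffer that grows on 
demand.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical, simplified stand-in for a bytes column; NOT Hive's BytesColumnVector.
final class SketchBytesColumn {
  private byte[] buffer = new byte[16]; // shared backing store for copied values
  private int nextFree = 0;             // first unused position in buffer
  final int[] start;                    // start[i]: offset of row i's value in buffer
  final int[] length;                   // length[i]: byte length of row i's value

  SketchBytesColumn(int rows) {
    start = new int[rows];
    length = new int[rows];
  }

  // Copy src[srcStart, srcStart + len) into the shared buffer,
  // growing the buffer first if the value does not fit.
  void setVal(int row, byte[] src, int srcStart, int len) {
    if (nextFree + len > buffer.length) {
      int newSize = Math.max(buffer.length * 2, nextFree + len);
      buffer = Arrays.copyOf(buffer, newSize); // earlier values keep their offsets
    }
    System.arraycopy(src, srcStart, buffer, nextFree, len);
    start[row] = nextFree;
    length[row] = len;
    nextFree += len;
  }

  // Decode the value stored for a row (UTF-8 assumed for this sketch).
  String getString(int row) {
    return new String(buffer, start[row], length[row], StandardCharsets.UTF_8);
  }

  int capacity() {
    return buffer.length;
  }
}
```

With a helper like this the caller never sizes the buffer itself, which is 
the reason for preferring an existing setVal-style method over a new one.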



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
https://reviews.apache.org/r/13059/#comment55612

Please add a comment to explain the purpose of this method




ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
https://reviews.apache.org/r/13059/#comment55617

conventions are to put blanks before and after operators =,  etc.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
https://reviews.apache.org/r/13059/#comment55616

The Sun Java coding style conventions that are used for Hive say to use 
this style:

} else [if (...)] {
  ...
}
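
A small compilable illustration of the two conventions above (blanks around 
operators, and "else" on the same line as the closing brace). StyleSketch 
and classify are hypothetical names used only for this example; they are 
not taken from the patch.

```java
// Hypothetical helper, used only to illustrate the formatting conventions.
final class StyleSketch {
  static String classify(int value) {
    String label;
    if (value < 0) {          // blanks around '<'
      label = "negative";     // blanks around '='
    } else if (value > 0) {   // "else if" follows the closing brace on the same line
      label = "positive";
    } else {
      label = "zero";
    }
    return label;
  }
}
```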



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
https://reviews.apache.org/r/13059/#comment55618

Please add a descriptive comment for this method



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
https://reviews.apache.org/r/13059/#comment55622

and the what?



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java
https://reviews.apache.org/r/13059/#comment55627

remove blank comment?



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java
https://reviews.apache.org/r/13059/#comment55629

correct spelling of Vectorization



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java
https://reviews.apache.org/r/13059/#comment55632

supper - super?
Please explain what out-of-band params are.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java
https://reviews.apache.org/r/13059/#comment55635

colon should be surrounded by blanks



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java
https://reviews.apache.org/r/13059/#comment55653

Please add a comment that describes what this method does.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
https://reviews.apache.org/r/13059/#comment55654

Please add a comment



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
https://reviews.apache.org/r/13059/#comment55649

Please add a comment explaining what's done by this method



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
https://reviews.apache.org/r/13059/#comment55650

Please add a descriptive comment for this method


This looks good. I made a bunch of stylistic comments. Could you also add a 
page or so of design description to the design document for vectorized query 
execution attached to HIVE-4160? Thanks Remus. -Eric

- Eric Hanson


On Oct. 12, 2013, 9:51 p.m., Remus Rusanu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/13059/
 ---
 
 (Updated Oct. 12, 2013, 9:51 p.m.)
 
 
 Review request for hive, Eric Hanson and Jitendra Pandey.
 
 
 Bugs: HIVE-4850
 https://issues.apache.org/jira/browse/HIVE-4850
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 This is a working implementation based on current trunk. It is simpler than 
 the .1 patch in that it delegates the JOIN entirely to the row-mode 
 MapJoinOperator. The vectorized operator literally calls the row-mode 
 implementation for each row in the input batch and collects the rows 
 forwarded by the row-mode operator into the output batch. This is not as bad 
 as it seems because the JOIN operator has to resort to row-mode operations 
 anyway, due to the small tables (hashtables) being row-mode (objects and 
 object-inspectors). By delegating the entire join logic to the row mode we 
 piggyback on the correctness of the existing implementation. I do plan to 
 come up with a fully vectorized implementation, but that would 

RE: Review Request 13059: HIVE-4850 Implement vector mode map join

2013-11-11 Thread Remus Rusanu
Thanks for the review!
I will probably have to add a JIRA to address them, otherwise I don't have a 
vehicle for submitting the patch :)

-Original Message-
From: Eric Hanson [mailto:nore...@reviews.apache.org] On Behalf Of Eric Hanson
Sent: Monday, November 11, 2013 10:09 PM
To: Jitendra Pandey; Eric Hanson (SQL SERVER)
Cc: Remus Rusanu; hive
Subject: Re: Review Request 13059: HIVE-4850 Implement vector mode map join



Re: Review Request 13059: HIVE-4850 Implement vector mode map join

2013-10-12 Thread Remus Rusanu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13059/
---

(Updated Oct. 12, 2013, 9:51 p.m.)


Review request for hive, Eric Hanson and Jitendra Pandey.


Bugs: HIVE-4850
https://issues.apache.org/jira/browse/HIVE-4850


Repository: hive-git


Description
---

This is a working implementation based on current trunk. It is simpler than the 
.1 patch in that it delegates the JOIN entirely to the row-mode MapJoinOperator. 
The vectorized operator literally calls the row-mode implementation for each 
row in the input batch and collects the rows forwarded by the row-mode operator 
into the output batch. This is not as bad as it seems because the JOIN operator 
has to resort to row-mode operations anyway, due to the small tables 
(hashtables) being row-mode (objects and object-inspectors). By delegating the 
entire join logic to the row mode we piggyback on the correctness of the 
existing implementation. I do plan to come up with a fully vectorized 
implementation, but that would require changes to the hash table 
creation-serialization. Note that the filtering and key evaluation of the big 
table do use vectorized operators. The row mode applies only to the key HT 
lookup and to the JOIN logic.
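
The delegation described above can be sketched roughly as follows. This is 
an illustration under assumptions, not the code in the patch: every class 
and method name here (RowDelegationSketch, Batch, RowModeJoin, RowSink, 
Output, processBatch) is hypothetical, the batch has just two long columns, 
and the output is collected into growable column lists rather than a 
fixed-size batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// All names are hypothetical; this is NOT Hive's VectorMapJoinOperator.
final class RowDelegationSketch {

  // Column-oriented input batch: a join key column and a payload column.
  static final class Batch {
    final long[] key;
    final long[] value;
    int size;

    Batch(int capacity) {
      key = new long[capacity];
      value = new long[capacity];
    }
  }

  // Receives each joined row that the row-mode join forwards.
  interface RowSink {
    void forward(Object[] joinedRow);
  }

  // Stand-in for the row-mode join: probes a row-mode hash table
  // and forwards one joined row per matching small-table value.
  static final class RowModeJoin {
    private final Map<Long, List<String>> smallTable;

    RowModeJoin(Map<Long, List<String>> smallTable) {
      this.smallTable = smallTable;
    }

    void processRow(Object[] row, RowSink sink) {
      List<String> matches = smallTable.get((Long) row[0]);
      if (matches == null) {
        return; // inner join: no match, nothing forwarded
      }
      for (String match : matches) {
        sink.forward(new Object[] {row[0], row[1], match});
      }
    }
  }

  // Column-oriented output, one list per output column.
  static final class Output {
    final List<Long> key = new ArrayList<>();
    final List<Long> value = new ArrayList<>();
    final List<String> smallValue = new ArrayList<>();

    int size() {
      return key.size();
    }
  }

  // Vectorized wrapper: turn each batch row into a row-mode row, call the
  // row-mode join, and collect whatever it forwards into output columns.
  static Output processBatch(Batch in, RowModeJoin join) {
    Output out = new Output();
    RowSink collect = joined -> {
      out.key.add((Long) joined[0]);
      out.value.add((Long) joined[1]);
      out.smallValue.add((String) joined[2]);
    };
    Object[] row = new Object[2]; // reused for every input row
    for (int i = 0; i < in.size; i++) {
      row[0] = in.key[i];
      row[1] = in.value[i];
      join.processRow(row, collect);
    }
    return out;
  }
}
```

The join logic itself lives entirely in the row-mode class; the wrapper only 
converts between the column layout and rows.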


Diffs (updated)
---

  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java d320b47 
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java 86db044 
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java fa9ee35 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 153b8ea 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 54f2644 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java cde1a59 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java 8b4c615 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssign.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapperBatch.java 9955d09 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorReduceSinkOperator.java 6df3551 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSelectOperator.java 0fb763a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 8f10644 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java ff13f89 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpressionWriterFactory.java 9e189c9 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 02c32cb 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java a72ec8b 
  ql/src/test/queries/clientpositive/vectorized_mapjoin.q PRE-CREATION 
  ql/src/test/results/clientpositive/vectorized_mapjoin.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/13059/diff/


Testing
---

Manually ran some join queries on alltypes_orc table.


Thanks,

Remus Rusanu



Re: Review Request 13059: HIVE-4850 Implement vector mode map join

2013-10-09 Thread Remus Rusanu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13059/
---

(Updated Oct. 9, 2013, 1:50 p.m.)


Review request for hive, Eric Hanson and Jitendra Pandey.


Bugs: HIVE-4850
https://issues.apache.org/jira/browse/HIVE-4850


Repository: hive-git


Description
---

This is a working implementation based on current trunk. It is simpler than the 
.1 patch in that it delegates the JOIN entirely to the row-mode MapJoinOperator. 
The vectorized operator literally calls the row-mode implementation for each 
row in the input batch and collects the rows forwarded by the row-mode operator 
into the output batch. This is not as bad as it seems because the JOIN operator 
has to resort to row-mode operations anyway, due to the small tables 
(hashtables) being row-mode (objects and object-inspectors). By delegating the 
entire join logic to the row mode we piggyback on the correctness of the 
existing implementation. I do plan to come up with a fully vectorized 
implementation, but that would require changes to the hash table 
creation-serialization. Note that the filtering and key evaluation of the big 
table do use vectorized operators. The row mode applies only to the key HT 
lookup and to the JOIN logic.


Diffs (updated)
---

  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java d320b47 
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java 86db044 
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java fa9ee35 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 153b8ea 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 8ab5395 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java cde1a59 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java 8b4c615 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssign.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapperBatch.java 9955d09 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorReduceSinkOperator.java 6df3551 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSelectOperator.java 0fb763a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java bd0955e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java ff13f89 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpressionWriterFactory.java 9e189c9 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java df1c5a6 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java a72ec8b 
  ql/src/test/queries/clientpositive/vectorized_mapjoin.q PRE-CREATION 
  ql/src/test/results/clientpositive/vectorized_mapjoin.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/13059/diff/


Testing
---

Manually ran some join queries on alltypes_orc table.


Thanks,

Remus Rusanu



Re: Review Request 13059: HIVE-4850 Implement vector mode map join

2013-10-03 Thread Remus Rusanu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13059/
---

(Updated Oct. 3, 2013, 2:17 p.m.)


Review request for hive, Eric Hanson and Jitendra Pandey.


Bugs: HIVE-4850
https://issues.apache.org/jira/browse/HIVE-4850


Repository: hive-git


Description
---

This is not the final iteration, but I thought it would be easier to discuss it 
with a review.
This implementation works and handles multiple aliases and multiple values per 
key. The implementation uses the existing hash tables saved by the local task 
for the map join, which are row-mode hash tables (they have row-mode keys and 
store row-mode writable object values). Going forward we should avoid the 
size-of-big-table conversions of big table keys to row mode and the conversion 
of small table values to vector data. This would require either converting the 
hash tables on the fly to vector-friendly ones (when loaded) or changing the 
local task hashtable sink to create a vectorization-friendly hash table. The 
first approach may have memory consumption problems (potentially two hash 
tables end up in memory; we would have to stream the transformation or 
transform while reading from the serialized format... nasty).
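
As a rough illustration of the two per-row conversions mentioned above, 
here is a minimal sketch. ConversionSketch and probe are hypothetical names, 
the key column is assumed to hold longs and the small-table values doubles, 
and the row-mode hash table is modeled as a plain Map from a boxed key to a 
list of boxed values; none of this is taken from the patch.

```java
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; NOT the classes from the patch.
final class ConversionSketch {

  // Probes a row-mode hash table for each key of a long key column and writes
  // the matches into output columns. Returns the number of output rows.
  // The caller is assumed to size outKey/outVal large enough for all matches.
  static int probe(long[] bigKeys, int size, Map<Object, List<Object>> table,
      long[] outKey, double[] outVal) {
    int n = 0;
    for (int i = 0; i < size; i++) {
      // Conversion 1: vector-mode key -> row-mode (boxed) key, once per big-table row.
      Object rowModeKey = Long.valueOf(bigKeys[i]);
      List<Object> values = table.get(rowModeKey);
      if (values == null) {
        continue; // no match for this key
      }
      // A key may map to several small-table values; emit one output row for each.
      for (Object v : values) {
        outKey[n] = bigKeys[i];
        // Conversion 2: row-mode (boxed) value -> vector data.
        outVal[n] = ((Double) v).doubleValue();
        n++;
      }
    }
    return n;
  }
}
```

Both conversions run once per probed row or per match, which is why their 
cost scales with the size of the big table.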


Diffs (updated)
---

  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java d320b47 
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java 86db044 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 153b8ea 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 8ab5395 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java cde1a59 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java 8b4c615 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssign.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapperBatch.java 9955d09 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorReduceSinkOperator.java 6df3551 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 02ebe14 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java ff13f89 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpressionWriterFactory.java 9e189c9 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java df1c5a6 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java a72ec8b 

Diff: https://reviews.apache.org/r/13059/diff/


Testing
---

Manually ran some join queries on alltypes_orc table.


Thanks,

Remus Rusanu



Re: Review Request 13059: HIVE-4850 Implement vector mode map join

2013-10-03 Thread Remus Rusanu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13059/
---

(Updated Oct. 3, 2013, 2:20 p.m.)


Review request for hive, Eric Hanson and Jitendra Pandey.


Bugs: HIVE-4850
https://issues.apache.org/jira/browse/HIVE-4850


Repository: hive-git


Description (updated)
---

This is a working implementation based on current trunk. It is simpler than the 
.1 patch in that it delegates the JOIN entirely to the row-mode MapJoinOperator. 
The vectorized operator literally calls the row-mode implementation for each 
row in the input batch and collects the rows forwarded by the row-mode operator 
into the output batch. This is not as bad as it seems because the JOIN operator 
has to resort to row-mode operations anyway, due to the small tables 
(hashtables) being row-mode (objects and object-inspectors). By delegating the 
entire join logic to the row mode we piggyback on the correctness of the 
existing implementation. I do plan to come up with a fully vectorized 
implementation, but that would require changes to the hash table 
creation-serialization. Note that the filtering and key evaluation of the big 
table do use vectorized operators. The row mode applies only to the key HT 
lookup and to the JOIN logic.


Diffs
---

  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java d320b47 
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java 86db044 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 153b8ea 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 8ab5395 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java cde1a59 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java 8b4c615 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssign.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapperBatch.java 9955d09 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorReduceSinkOperator.java 6df3551 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 02ebe14 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java ff13f89 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpressionWriterFactory.java 9e189c9 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java df1c5a6 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java a72ec8b 

Diff: https://reviews.apache.org/r/13059/diff/


Testing
---

Manually ran some join queries on alltypes_orc table.


Thanks,

Remus Rusanu



Review Request 13059: HIVE-4850 Implement vector mode map join

2013-07-30 Thread Remus Rusanu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13059/
---

Review request for hive, Eric Hanson and Jitendra Pandey.


Bugs: HIVE-4850
https://issues.apache.org/jira/browse/HIVE-4850


Repository: hive-git


Description
---

This is not the final iteration, but I thought it would be easier to discuss it 
with a review.
This implementation works and handles multiple aliases and multiple values per 
key. The implementation uses the existing hash tables saved by the local task 
for the map join, which are row-mode hash tables (they have row-mode keys and 
store row-mode writable object values). Going forward we should avoid the 
size-of-big-table conversions of big table keys to row mode and the conversion 
of small table values to vector data. This would require either converting the 
hash tables on the fly to vector-friendly ones (when loaded) or changing the 
local task hashtable sink to create a vectorization-friendly hash table. The 
first approach may have memory consumption problems (potentially two hash 
tables end up in memory; we would have to stream the transformation or 
transform while reading from the serialized format... nasty).


Diffs
---

  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java 82d4b93 
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java 31dbf41 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 4da1be8 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 29de38d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java e579c00 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinDoubleKeys.java d774226 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectKey.java 791bb3f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectValue.java 58a9dc0 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinSingleKey.java 4bff936 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java 8b4c615 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssign.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorExecMapper.java 083b9b9 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapOperator.java 41d2001 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 9c90230 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java ff13f89 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpressionWriterFactory.java 9e189c9 
  ql/src/java/org/apache/hadoop/hive/ql/plan/HashTableDummyDesc.java f15ce48 

Diff: https://reviews.apache.org/r/13059/diff/


Testing
---

Manually ran some join queries on alltypes_orc table.


Thanks,

Remus Rusanu