[GitHub] drill pull request #1057: DRILL-5993 Append Row Method For VectorContainer (...

2017-11-29 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1057#discussion_r153958856
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/record/TestVectorContainer.java
 ---
@@ -124,4 +132,52 @@ public void testContainerMerge() {
 leftIndirect.clear();
 right.clear();
   }
+
+  @Test
+  public void testAppendRow()
+  {
+MaterializedField colA = MaterializedField.create("colA", 
Types.required(TypeProtos.MinorType.INT));
+MaterializedField colB = MaterializedField.create("colB", 
Types.required(TypeProtos.MinorType.INT));
--- End diff --

Added more interesting types. Currently RowSet classes don't support the 
Map data type. Paul asked me to look into adding support for this a while ago 
for DRILL-5870 . I'll update the test framework to support that in the next PR.


---


[GitHub] drill pull request #1057: DRILL-5993 Append Row Method For VectorContainer (...

2017-11-29 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1057#discussion_r153934729
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/VectorContainer.java 
---
@@ -353,6 +353,23 @@ public int getRecordCount() {
 
   public boolean hasRecordCount() { return recordCount != -1; }
 
+  /**
+   * This works with non-hyper {@link VectorContainer}s which have no 
selection vectors.
+   * Appends a row taken from a source {@link VectorContainer} to this 
{@link VectorContainer}.
+   * @param srcContainer The {@link VectorContainer} to copy a row from.
+   * @param srcIndex The index of the row to copy from the source {@link 
VectorContainer}.
+   */
+  public void appendRow(VectorContainer srcContainer, int srcIndex) {
+for (int vectorIndex = 0; vectorIndex < wrappers.size(); 
vectorIndex++) {
+  ValueVector destVector = wrappers.get(vectorIndex).getValueVector();
+  ValueVector srcVector = 
srcContainer.wrappers.get(vectorIndex).getValueVector();
+
+  destVector.copyEntry(recordCount, srcVector, srcIndex);
+}
+
+recordCount++;
--- End diff --

 The immediate need for appendRow() is to distribute rows from a single 
incoming batch into multiple other batches (for the Hash Join internal 
partitioning), based on the hash value of the key columns at each row. This 
would not work well with the second suggestion (vectorizing - column 1, column 
2, etc.)  


---


[GitHub] drill pull request #1057: DRILL-5993 Append Row Method For VectorContainer (...

2017-11-29 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1057#discussion_r153935071
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/record/TestVectorContainer.java
 ---
@@ -124,4 +132,52 @@ public void testContainerMerge() {
 leftIndirect.clear();
 right.clear();
   }
+
+  @Test
+  public void testAppendRow()
+  {
+MaterializedField colA = MaterializedField.create("colA", 
Types.required(TypeProtos.MinorType.INT));
+MaterializedField colB = MaterializedField.create("colB", 
Types.required(TypeProtos.MinorType.INT));
--- End diff --

 Maybe add some "interesting" datatypes ?  Testing integers only may miss 
some issue. 



---


[GitHub] drill pull request #1057: DRILL-5993 Append Row Method For VectorContainer (...

2017-11-29 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1057#discussion_r153926390
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/VectorContainer.java 
---
@@ -353,6 +353,23 @@ public int getRecordCount() {
 
   public boolean hasRecordCount() { return recordCount != -1; }
 
+  /**
+   * This works with non-hyper {@link VectorContainer}s which have no 
selection vectors.
+   * Appends a row taken from a source {@link VectorContainer} to this 
{@link VectorContainer}.
+   * @param srcContainer The {@link VectorContainer} to copy a row from.
+   * @param srcIndex The index of the row to copy from the source {@link 
VectorContainer}.
+   */
+  public void appendRow(VectorContainer srcContainer, int srcIndex) {
+for (int vectorIndex = 0; vectorIndex < wrappers.size(); 
vectorIndex++) {
+  ValueVector destVector = wrappers.get(vectorIndex).getValueVector();
+  ValueVector srcVector = 
srcContainer.wrappers.get(vectorIndex).getValueVector();
+
+  destVector.copyEntry(recordCount, srcVector, srcIndex);
+}
+
+recordCount++;
--- End diff --

This is OK for a row-by-row copy. But, you'll get better performance if you 
optimize for the entire batch. Because you have no SV4, the source and dest 
batches are the same so the vectors can be preloaded into an array of vectors 
to avoid the vector wrapper lookup per column.

Plus, if the code is written per batch, you can go a step further: 
vectorize the operation. Copy all values for column 1, then all for column 2, 
and so on. (In this case, you only get each vector once, so sticking with the 
wrappers is fine.) By vectorizing, you may get the vectorized cache-locality 
benefit that Drill promises from its operations. Worth a try to see if you get 
any speed-up.


---


[GitHub] drill pull request #1057: DRILL-5993 Append Row Method For VectorContainer (...

2017-11-29 Thread ilooner
GitHub user ilooner opened a pull request:

https://github.com/apache/drill/pull/1057

DRILL-5993 Append Row Method For VectorContainer (WIP)

## Motivation

HashJoin requires a method that can take a row from a VectorContainer and 
append it to a destination VectorContainer. This is a WIP and this PR is mainly 
opened to improve my understanding.

## Implementation

This is an initial implementation that works with simple VectorContainers 
that are not hyper batches and do not have selection vectors. It is also 
assumed that the user called **SchemaUtil.coerceContainer** on the source 
VectorContainer before using the newly added **appendRow** method.

## Questions

- Do we have to worry about selection vectors in the source container?
- Do we have to think about support hyper batches in the destination 
container?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ilooner/drill DRILL-5993

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1057.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1057


commit ee43d6808562a1ff60c17fa7622b8358b63c7276
Author: Timothy Farkas 
Date:   2017-11-29T20:38:41Z

 - Initial Implementation of append row for a vector container




---