[GitHub] drill pull request #1057: DRILL-5993 VectorContainer Append and Generic Copi...

2017-12-20 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1057#discussion_r158179222
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/VectorContainer.java 
---
@@ -353,6 +353,23 @@ public int getRecordCount() {
 
   public boolean hasRecordCount() { return recordCount != -1; }
 
+  /**
+   * This works with non-hyper {@link VectorContainer}s which have no 
selection vectors.
+   * Appends a row taken from a source {@link VectorContainer} to this 
{@link VectorContainer}.
+   * @param srcContainer The {@link VectorContainer} to copy a row from.
+   * @param srcIndex The index of the row to copy from the source {@link 
VectorContainer}.
+   */
+  public void appendRow(VectorContainer srcContainer, int srcIndex) {
+for (int vectorIndex = 0; vectorIndex < wrappers.size(); 
vectorIndex++) {
+  ValueVector destVector = wrappers.get(vectorIndex).getValueVector();
+  ValueVector srcVector = 
srcContainer.wrappers.get(vectorIndex).getValueVector();
+
+  destVector.copyEntry(recordCount, srcVector, srcIndex);
+}
+
+recordCount++;
--- End diff --

I would not advise changing the meaning of -1; that is baked into the code 
in many places. Instead, before adding the first row, simply set the count to 0 
when the target container is created.


---


[GitHub] drill pull request #1057: DRILL-5993 VectorContainer Append and Generic Copi...

2017-12-20 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1057#discussion_r158179561
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/svremover/GenericCopierTest.java
 ---
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.svremover;
+
+import org.apache.drill.exec.memory.RootAllocator;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.test.rowSet.RowSet;
+import org.apache.drill.test.rowSet.RowSetBuilder;
+
+public class GenericCopierTest extends AbstractGenericCopierTest {
--- End diff --

One reason I was a bit hesitant to add this support earlier is that we need 
this to work for ALL Drill's data types, including the obscure ones. Clearly, 
this mechanism works for scalars. But, what about:

* Arrays of scalars
* Maps
* Arrays of maps
* Unions
* Lists of unions
* Lists of scalars
* Lists of maps
* Repeated lists of scalars
* Repeated lists of maps
* Repeated lists of repeated lists
* Repeated lists of unions

There are probably a few more that I'm overlooking. Until we test ALL of 
these, we can't be sure if the new code works as well as the old code.


---


[GitHub] drill pull request #1057: DRILL-5993 VectorContainer Append and Generic Copi...

2017-12-19 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1057#discussion_r157894540
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/Copier.java
 ---
@@ -19,13 +19,15 @@
 
 import org.apache.drill.exec.compile.TemplateClassDefinition;
 import org.apache.drill.exec.exception.SchemaChangeException;
-import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.VectorContainer;
 
 public interface Copier {
-  public static TemplateClassDefinition TEMPLATE_DEFINITION2 = new 
TemplateClassDefinition(Copier.class, CopierTemplate2.class);
-  public static TemplateClassDefinition TEMPLATE_DEFINITION4 = new 
TemplateClassDefinition(Copier.class, CopierTemplate4.class);
+  TemplateClassDefinition TEMPLATE_DEFINITION2 = new 
TemplateClassDefinition(Copier.class, CopierTemplate2.class);
+  TemplateClassDefinition TEMPLATE_DEFINITION4 = new 
TemplateClassDefinition(Copier.class, CopierTemplate4.class);
--- End diff --

I will create a separate PR to do that since that change is unrelated to 
this PR


---


[GitHub] drill pull request #1057: DRILL-5993 VectorContainer Append and Generic Copi...

2017-12-19 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1057#discussion_r157894423
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/CopierTemplate2.java
 ---
@@ -53,17 +51,32 @@ public int copyRecords(int index, int recordCount) 
throws SchemaChangeException
   }
 }
 
-int outgoingPosition = 0;
+return insertRecords(0, index, recordCount);
+  }
+
+  @Override
+  public int appendRecord(int index) throws SchemaChangeException {
+return appendRecords(index, 1);
+  }
+
+  @Override
+  public int appendRecords(int index, int recordCount) throws 
SchemaChangeException {
+return insertRecords(outgoing.getRecordCount(), index, recordCount);
+  }
+
+  private int insertRecords(int outgoingPosition, int index, int 
recordCount) throws SchemaChangeException {
+final int endIndex = index + recordCount;
 
-for(int svIndex = index; svIndex < index + recordCount; svIndex++, 
outgoingPosition++){
+for(int svIndex = index; svIndex < endIndex; svIndex++, 
outgoingPosition++){
   doEval(sv2.getIndex(svIndex), outgoingPosition);
 }
+
+outgoing.setRecordCount(outgoingPosition);
 return outgoingPosition;
   }
 
-  public abstract void doSetup(@Named("context") FragmentContext context,
-   @Named("incoming") RecordBatch incoming,
-   @Named("outgoing") RecordBatch outgoing)
+  public abstract void doSetup(@Named("incoming") RecordBatch incoming,
+   @Named("outgoing") VectorContainer outgoing)
--- End diff --

The copiers are only used in the SVRemover and TopN operator. I have 
replaced the code generated copiers in both now to use the GenericCopiers.


---


[GitHub] drill pull request #1057: DRILL-5993 VectorContainer Append and Generic Copi...

2017-12-19 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1057#discussion_r157853697
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/CopierTemplate4.java
 ---
@@ -54,17 +52,33 @@ public int copyRecords(int index, int recordCount) 
throws SchemaChangeException
   }
 }
 
-int outgoingPosition = 0;
-for(int svIndex = index; svIndex < index + recordCount; svIndex++, 
outgoingPosition++){
+return insertRecords(0, index, recordCount);
+  }
+
+  @Override
+  public int appendRecord(int index) throws SchemaChangeException {
+return appendRecords(index, 1);
+  }
--- End diff --

Updated the code and made an implementation of appendRecord which doesn't 
use a for loop


---


[GitHub] drill pull request #1057: DRILL-5993 VectorContainer Append and Generic Copi...

2017-12-18 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1057#discussion_r157616077
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/Copier.java
 ---
@@ -19,13 +19,15 @@
 
 import org.apache.drill.exec.compile.TemplateClassDefinition;
 import org.apache.drill.exec.exception.SchemaChangeException;
-import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.VectorContainer;
 
 public interface Copier {
-  public static TemplateClassDefinition TEMPLATE_DEFINITION2 = new 
TemplateClassDefinition(Copier.class, CopierTemplate2.class);
-  public static TemplateClassDefinition TEMPLATE_DEFINITION4 = new 
TemplateClassDefinition(Copier.class, CopierTemplate4.class);
+  TemplateClassDefinition TEMPLATE_DEFINITION2 = new 
TemplateClassDefinition(Copier.class, CopierTemplate2.class);
+  TemplateClassDefinition TEMPLATE_DEFINITION4 = new 
TemplateClassDefinition(Copier.class, CopierTemplate4.class);
--- End diff --

Just for cleanliness, should do the same (remove all the 
public/static/abstract/final ) in **filter/Filterer.java** as well ! 


---


[GitHub] drill pull request #1057: DRILL-5993 VectorContainer Append and Generic Copi...

2017-12-18 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1057#discussion_r157646058
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/CopierTemplate2.java
 ---
@@ -53,17 +51,32 @@ public int copyRecords(int index, int recordCount) 
throws SchemaChangeException
   }
 }
 
-int outgoingPosition = 0;
+return insertRecords(0, index, recordCount);
+  }
+
+  @Override
+  public int appendRecord(int index) throws SchemaChangeException {
+return appendRecords(index, 1);
+  }
+
+  @Override
+  public int appendRecords(int index, int recordCount) throws 
SchemaChangeException {
+return insertRecords(outgoing.getRecordCount(), index, recordCount);
+  }
+
+  private int insertRecords(int outgoingPosition, int index, int 
recordCount) throws SchemaChangeException {
+final int endIndex = index + recordCount;
 
-for(int svIndex = index; svIndex < index + recordCount; svIndex++, 
outgoingPosition++){
+for(int svIndex = index; svIndex < endIndex; svIndex++, 
outgoingPosition++){
   doEval(sv2.getIndex(svIndex), outgoingPosition);
 }
+
+outgoing.setRecordCount(outgoingPosition);
 return outgoingPosition;
   }
 
-  public abstract void doSetup(@Named("context") FragmentContext context,
-   @Named("incoming") RecordBatch incoming,
-   @Named("outgoing") RecordBatch outgoing)
+  public abstract void doSetup(@Named("incoming") RecordBatch incoming,
+   @Named("outgoing") VectorContainer outgoing)
--- End diff --

As the "generated code" methods are replaced with "in line" methods, we 
should remove these abstract signatures (both *doSetup* and *doEval*) to avoid 
generating code, and to distinguish these methods. Maybe we should _rename_ 
them as well to make the distinction clear.  Then probably also eliminate the 
"code generation" code in **getGenerated2Copier()** and 
**getGenerated4Copier()**.  


---


[GitHub] drill pull request #1057: DRILL-5993 VectorContainer Append and Generic Copi...

2017-12-18 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1057#discussion_r157612792
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/VectorContainer.java 
---
@@ -353,6 +353,23 @@ public int getRecordCount() {
 
   public boolean hasRecordCount() { return recordCount != -1; }
 
+  /**
+   * This works with non-hyper {@link VectorContainer}s which have no 
selection vectors.
+   * Appends a row taken from a source {@link VectorContainer} to this 
{@link VectorContainer}.
+   * @param srcContainer The {@link VectorContainer} to copy a row from.
+   * @param srcIndex The index of the row to copy from the source {@link 
VectorContainer}.
+   */
+  public void appendRow(VectorContainer srcContainer, int srcIndex) {
+for (int vectorIndex = 0; vectorIndex < wrappers.size(); 
vectorIndex++) {
+  ValueVector destVector = wrappers.get(vectorIndex).getValueVector();
+  ValueVector srcVector = 
srcContainer.wrappers.get(vectorIndex).getValueVector();
+
+  destVector.copyEntry(recordCount, srcVector, srcIndex);
+}
+
+recordCount++;
--- End diff --

(1) Need to change the initial value of *recordCount* from -1 to 0 ;
(2) That "-1" is used to note "uninitialized". So need to add a boolean 
flag for that purpose.
(3) Please change the *appendRow()* signature to return *recordCount*. 
(E.g., so the caller would know when the target container is full).
 


---


[GitHub] drill pull request #1057: DRILL-5993 VectorContainer Append and Generic Copi...

2017-12-18 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1057#discussion_r157646636
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/CopierTemplate4.java
 ---
@@ -54,17 +52,33 @@ public int copyRecords(int index, int recordCount) 
throws SchemaChangeException
   }
 }
 
-int outgoingPosition = 0;
-for(int svIndex = index; svIndex < index + recordCount; svIndex++, 
outgoingPosition++){
+return insertRecords(0, index, recordCount);
+  }
+
+  @Override
+  public int appendRecord(int index) throws SchemaChangeException {
+return appendRecords(index, 1);
+  }
--- End diff --

Minor performance comment: If *appendRecord()* is to be used heavily, then 
the overhead of calling the for() loop (in insertRecords() ) many times for 
only a size of 1 may be significant. 


---