[GitHub] [drill] paul-rogers commented on a change in pull request #1843: DRILL-7350: Move RowSet related classes from test folder

2019-08-17 Thread GitBox
paul-rogers commented on a change in pull request #1843: DRILL-7350: Move 
RowSet related classes from test folder
URL: https://github.com/apache/drill/pull/1843#discussion_r314967436
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/RowSetWriterImpl.java
 ##
 @@ -129,7 +129,7 @@ public RowSetWriter addRow(Object...values) {
 
   @Override
   public RowSetWriter addSingleCol(Object value) {
-return addRow(new Object[] {value});
+return addRow(value);
 
 Review comment:
   Thanks for the explanation, makes sense.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1843: DRILL-7350: Move RowSet related classes from test folder

2019-08-17 Thread GitBox
paul-rogers commented on a change in pull request #1843: DRILL-7350: Move 
RowSet related classes from test folder
URL: https://github.com/apache/drill/pull/1843#discussion_r314967421
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/log/TestLogReader.java
 ##
 @@ -781,9 +781,19 @@ public void testPluginSerialization() throws IOException {
 
   @Test
   public void testFirewallSchema() throws RpcException {
-String sql = "SELECT * FROM cp.`regex/firewall.ssdlog`";
+String sql = "SELECT * FROM cp.`regex/firewall.ssdlog` limit 0";
 RowSet result = client.queryBuilder().sql(sql).rowSet();
-result.print();
+TupleMetadata expectedSchema = new SchemaBuilder()
 
 Review comment:
   Thanks for trying to fix this. Doing the change as another PR sounds good.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1843: DRILL-7350: Move RowSet related classes from test folder

2019-08-16 Thread GitBox
paul-rogers commented on a change in pull request #1843: DRILL-7350: Move 
RowSet related classes from test folder
URL: https://github.com/apache/drill/pull/1843#discussion_r314816402
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/RowSetWriterImpl.java
 ##
 @@ -129,7 +129,7 @@ public RowSetWriter addRow(Object...values) {
 
   @Override
   public RowSetWriter addSingleCol(Object value) {
-return addRow(new Object[] {value});
+return addRow(value);
 
 Review comment:
   Ah! But I don't want to do the cast; that is the purpose of this method. I 
want:
   
   ```
 ...addRow(mapValue(10, "foo"), 20); // convenience
 ...addRow(new Object[] {10, "foo"}, 20); // low-level
   ...
 ...addSingleCol(mapValue(10, "foo"); // convenience
... addSingleCol(new Object[] {10, "foo"}); // low-level
   ```
   
   Unless we wrap that single column in another array, the second form is the 
same as:
   
   ```
... addRow(new Object[] {10, "foo"});
   ```
   
   Which will be interpreted as a set of two columns, not as a single column 
with a map-value.
   
   I could be wrong, and I'd have to test actual code to be sure. In fact, I'll 
do that test this weekend.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1843: DRILL-7350: Move RowSet related classes from test folder

2019-08-16 Thread GitBox
paul-rogers commented on a change in pull request #1843: DRILL-7350: Move 
RowSet related classes from test folder
URL: https://github.com/apache/drill/pull/1843#discussion_r314788921
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/log/TestLogReader.java
 ##
 @@ -781,9 +781,19 @@ public void testPluginSerialization() throws IOException {
 
   @Test
   public void testFirewallSchema() throws RpcException {
-String sql = "SELECT * FROM cp.`regex/firewall.ssdlog`";
+String sql = "SELECT * FROM cp.`regex/firewall.ssdlog` limit 0";
 RowSet result = client.queryBuilder().sql(sql).rowSet();
-result.print();
+TupleMetadata expectedSchema = new SchemaBuilder()
 
 Review comment:
   Another change you might consider is renaming either this, or the other 
`SchemaBuilder`. Since this is the one we want to use moving forward, maybe 
rename the other to `BatchSchemaBuilder` so that future folks don't have to 
figure out the correct import when the reference the class.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1843: DRILL-7350: Move RowSet related classes from test folder

2019-08-16 Thread GitBox
paul-rogers commented on a change in pull request #1843: DRILL-7350: Move 
RowSet related classes from test folder
URL: https://github.com/apache/drill/pull/1843#discussion_r314787920
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/test/QueryRowSetIterator.java
 ##
 @@ -101,7 +103,7 @@ public DirectRowSet next() {
 
   public void printAll() {
 for (DirectRowSet rowSet : this) {
-  rowSet.print();
+  new RowSetStringBuilder(rowSet).writeTo(new 
OutputStreamWriter(System.out));
 
 Review comment:
   This is *not* a simplification; it exposes too much detail. Better:
   
   ```
   RowSetFormatter.print(rowSet);
   ```
   
   Of course, if we go this route, we might as well leave the `print()` method 
in `RowSet` and have it call the above code.
   
   Just to be clear, the purpose of `print()` is to help us developers 
visualize what's going on in Drill; it is not for production use. This is 
important because, if you can't visualize a batch, you'll make many mistakes 
because you can't tell what's what.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1843: DRILL-7350: Move RowSet related classes from test folder

2019-08-16 Thread GitBox
paul-rogers commented on a change in pull request #1843: DRILL-7350: Move 
RowSet related classes from test folder
URL: https://github.com/apache/drill/pull/1843#discussion_r314786038
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/package-info.java
 ##
 @@ -32,6 +32,22 @@
  * batch" or "record batch." (But, in Drill, the term "record batch" also
  * usually means an operator on that set of records. Here, a row set is
  * just the rows  separate from operations on that data.
+ * SingleRowSet (abstract)
 
 Review comment:
   Thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1843: DRILL-7350: Move RowSet related classes from test folder

2019-08-16 Thread GitBox
paul-rogers commented on a change in pull request #1843: DRILL-7350: Move 
RowSet related classes from test folder
URL: https://github.com/apache/drill/pull/1843#discussion_r314783903
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/AbstractSingleRowSet.java
 ##
 @@ -15,37 +15,27 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-package org.apache.drill.test.rowSet;
+package org.apache.drill.exec.physical.rowSet;
 
 import org.apache.drill.exec.record.RecordBatchSizer;
 import org.apache.drill.exec.physical.rowSet.model.ReaderIndex;
 import 
org.apache.drill.exec.physical.rowSet.model.MetadataProvider.MetadataRetrieval;
 import org.apache.drill.exec.physical.rowSet.model.single.BaseReaderBuilder;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.metadata.TupleMetadata;
-import org.apache.drill.test.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.exec.physical.rowSet.RowSet.SingleRowSet;
 
 /**
  * Base class for row sets backed by a single record batch.
  */
 
 public abstract class AbstractSingleRowSet extends AbstractRowSet implements 
SingleRowSet {
 
-  public static class RowSetReaderBuilder extends BaseReaderBuilder {
-
-public RowSetReader buildReader(AbstractSingleRowSet rowSet, ReaderIndex 
rowIndex) {
-  TupleMetadata schema = rowSet.schema();
-  return new RowSetReaderImpl(schema, rowIndex,
-  buildContainerChildren(rowSet.container(),
-  new MetadataRetrieval(schema)));
-}
-  }
-
-  public AbstractSingleRowSet(AbstractSingleRowSet rowSet) {
+  protected AbstractSingleRowSet(AbstractSingleRowSet rowSet) {
 
 Review comment:
   I can live with this, though the nested-classes at top is a more common 
pattern in the Java community. The general rule is that constants and nested 
classes at top, then fields, then methods. Original Java formatters tended to 
put fields at the bottom. Some projects mix nested classes with methods: the 
class is defined adjacent to the method that uses it.
   
   I also recall that Arina once suggested removing an anonymous inner class in 
favor of a declared class. This is, again, a style issue since anonymous inner 
classes are a very common style, especially in UI and functional programming.
   
   In general, we need to establish rules for the project, else each person 
that touches the file will reformat code to their preferred style. We should 
discuss those rules on the dev list before we start enforcing them.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1843: DRILL-7350: Move RowSet related classes from test folder

2019-08-16 Thread GitBox
paul-rogers commented on a change in pull request #1843: DRILL-7350: Move 
RowSet related classes from test folder
URL: https://github.com/apache/drill/pull/1843#discussion_r314789387
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/test/QueryRowSetIterator.java
 ##
 @@ -17,24 +17,26 @@
  */
 package org.apache.drill.test;
 
+import java.io.OutputStreamWriter;
 import java.util.Iterator;
 
 import org.apache.drill.exec.exception.SchemaChangeException;
 import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.physical.rowSet.RowSetStringBuilder;
 import org.apache.drill.exec.proto.UserBitShared.QueryId;
 import org.apache.drill.exec.proto.UserBitShared.QueryResult.QueryState;
 import org.apache.drill.exec.proto.helper.QueryIdHelper;
 import org.apache.drill.exec.record.RecordBatchLoader;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.rpc.user.QueryDataBatch;
 import org.apache.drill.test.BufferingQueryEventListener.QueryEvent;
-import org.apache.drill.test.rowSet.DirectRowSet;
+import org.apache.drill.exec.physical.rowSet.DirectRowSet;
 
 public class QueryRowSetIterator implements Iterator, 
Iterable {
   private final BufferingQueryEventListener listener;
   private int recordCount = 0;
   private int batchCount = 0;
-  QueryId queryId = null;
+  private QueryId queryId = null;
 
 Review comment:
   As long as you are touching this code, please remove the do-nothing 
initializers:
   
   ```
   private int recordCount;
   private int batchCount;
   private QueryId queryId;
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1843: DRILL-7350: Move RowSet related classes from test folder

2019-08-16 Thread GitBox
paul-rogers commented on a change in pull request #1843: DRILL-7350: Move 
RowSet related classes from test folder
URL: https://github.com/apache/drill/pull/1843#discussion_r314785834
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/RowSetStringBuilder.java
 ##
 @@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.rowSet;
+
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+
+/**
+ * Helper class to obtain string representation of RowSet.
+ * Example of the output:
+ * 
+ *   #: year, month, day
+ *   0: 2017, 12, 17
+ *   1: 2017, 12, 18
+ *   2: 2017, 12, 19
+ * 
+ */
+public class RowSetStringBuilder {
 
 Review comment:
   The class name is still "RowSetStringBuilder" Better to rename it to, say, 
"RowSetFormatter". The constructor should take both a row set and a writer. We 
can then provide static methods for the two most common cases: print to stdout 
(which I use all the time when creating tests) and to string (which you seem to 
prefer):
   
   ```
   public static void print(RowSet rowSet) {
 new RowSetFormatter(rowSet, System.out).write();
   }
   
   public static String toString(RowSet rowSet) {
 StringWriter out = new StringWriter();
 new RowSetFormatter(rowSet, out).write();
 return out.toString();
   }
   ```
   
   The constructor allows other options, such as writing to a file. Sadly, it 
won't work to write to a log; we'd have to materialize to a string...


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1843: DRILL-7350: Move RowSet related classes from test folder

2019-08-16 Thread GitBox
paul-rogers commented on a change in pull request #1843: DRILL-7350: Move 
RowSet related classes from test folder
URL: https://github.com/apache/drill/pull/1843#discussion_r314787032
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/log/TestLogReader.java
 ##
 @@ -781,9 +781,19 @@ public void testPluginSerialization() throws IOException {
 
   @Test
   public void testFirewallSchema() throws RpcException {
-String sql = "SELECT * FROM cp.`regex/firewall.ssdlog`";
+String sql = "SELECT * FROM cp.`regex/firewall.ssdlog` limit 0";
 RowSet result = client.queryBuilder().sql(sql).rowSet();
-result.print();
+TupleMetadata expectedSchema = new SchemaBuilder()
+.add("eventDate", TypeProtos.MinorType.TIMESTAMP, 
TypeProtos.DataMode.OPTIONAL)
 
 Review comment:
   Simpler: `addNullable("eventDate", MinorType.TIMESTAMP)`
   
   Import `MinorType`, use the `addNullable` method rather than the more 
verbose form of using the `DataMode`. That is, express intent, not 
implementation.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1843: DRILL-7350: Move RowSet related classes from test folder

2019-08-16 Thread GitBox
paul-rogers commented on a change in pull request #1843: DRILL-7350: Move 
RowSet related classes from test folder
URL: https://github.com/apache/drill/pull/1843#discussion_r314780985
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/RowSetWriterImpl.java
 ##
 @@ -129,7 +129,7 @@ public RowSetWriter addRow(Object...values) {
 
   @Override
   public RowSetWriter addSingleCol(Object value) {
-return addRow(new Object[] {value});
+return addRow(value);
 
 Review comment:
   While this would seem to be true, it won't work for the case of a map:
   
   addRow(new Object[]{"foo", 10})
   
   Is the above two column values, or a single column that happens to be an 
object?
   
   There is actually no need to debate this, however. Without this feature, 
some unit tests will fail. (I know that because I added this method directly in 
response to those test failures...)
   
   Still, if you can get those tests to pass, even with the map value 
ambiguity, then my concern will be addressed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1843: DRILL-7350: Move RowSet related classes from test folder

2019-08-15 Thread GitBox
paul-rogers commented on a change in pull request #1843: DRILL-7350: Move 
RowSet related classes from test folder
URL: https://github.com/apache/drill/pull/1843#discussion_r314396758
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/AbstractSingleRowSet.java
 ##
 @@ -66,4 +57,14 @@ public long size() {
   protected RowSetReader buildReader(ReaderIndex rowIndex) {
 return new RowSetReaderBuilder().buildReader(this, rowIndex);
   }
+
+  public static class RowSetReaderBuilder extends BaseReaderBuilder {
 
 Review comment:
   See note above.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1843: DRILL-7350: Move RowSet related classes from test folder

2019-08-15 Thread GitBox
paul-rogers commented on a change in pull request #1843: DRILL-7350: Move 
RowSet related classes from test folder
URL: https://github.com/apache/drill/pull/1843#discussion_r314396624
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/AbstractSingleRowSet.java
 ##
 @@ -15,37 +15,27 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-package org.apache.drill.test.rowSet;
+package org.apache.drill.exec.physical.rowSet;
 
 import org.apache.drill.exec.record.RecordBatchSizer;
 import org.apache.drill.exec.physical.rowSet.model.ReaderIndex;
 import 
org.apache.drill.exec.physical.rowSet.model.MetadataProvider.MetadataRetrieval;
 import org.apache.drill.exec.physical.rowSet.model.single.BaseReaderBuilder;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.metadata.TupleMetadata;
-import org.apache.drill.test.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.exec.physical.rowSet.RowSet.SingleRowSet;
 
 /**
  * Base class for row sets backed by a single record batch.
  */
 
 public abstract class AbstractSingleRowSet extends AbstractRowSet implements 
SingleRowSet {
 
-  public static class RowSetReaderBuilder extends BaseReaderBuilder {
-
-public RowSetReader buildReader(AbstractSingleRowSet rowSet, ReaderIndex 
rowIndex) {
-  TupleMetadata schema = rowSet.schema();
-  return new RowSetReaderImpl(schema, rowIndex,
-  buildContainerChildren(rowSet.container(),
-  new MetadataRetrieval(schema)));
-}
-  }
-
-  public AbstractSingleRowSet(AbstractSingleRowSet rowSet) {
+  protected AbstractSingleRowSet(AbstractSingleRowSet rowSet) {
 
 Review comment:
   Why remove nested classes? These are perfectly normal part of Java that 
allows small, single-purpose classes to be associated with the one and only 
class that uses them. Again, if we want to change the Drill style guidelines, 
let's have the discussion on dev list first.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1843: DRILL-7350: Move RowSet related classes from test folder

2019-08-15 Thread GitBox
paul-rogers commented on a change in pull request #1843: DRILL-7350: Move 
RowSet related classes from test folder
URL: https://github.com/apache/drill/pull/1843#discussion_r314395723
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/AbstractRowSet.java
 ##
 @@ -41,13 +38,19 @@ public AbstractRowSet(VectorContainer container, 
TupleMetadata schema) {
   }
 
   @Override
-  public VectorAccessible vectorAccessible() { return container(); }
+  public VectorAccessible vectorAccessible() {
 
 Review comment:
   Why change formatting? Drill has long used single-line methods for 
conciseness. Is this an artifact of some code formatter? I didn't see 
discussion on the dev list about changing our coding standards...


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1843: DRILL-7350: Move RowSet related classes from test folder

2019-08-15 Thread GitBox
paul-rogers commented on a change in pull request #1843: DRILL-7350: Move 
RowSet related classes from test folder
URL: https://github.com/apache/drill/pull/1843#discussion_r314398737
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/RowSetWriterImpl.java
 ##
 @@ -129,7 +129,7 @@ public RowSetWriter addRow(Object...values) {
 
   @Override
   public RowSetWriter addSingleCol(Object value) {
-return addRow(new Object[] {value});
+return addRow(value);
 
 Review comment:
   Won't work. Must be an array. The idea is that addRow() takes an array of 
columns. But, if the row has a single column, and that one column is repeated, 
the Java code is ambiguous. The original (correct) implementation handles this 
case.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1843: DRILL-7350: Move RowSet related classes from test folder

2019-08-15 Thread GitBox
paul-rogers commented on a change in pull request #1843: DRILL-7350: Move 
RowSet related classes from test folder
URL: https://github.com/apache/drill/pull/1843#discussion_r314397062
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/DirectRowSet.java
 ##
 @@ -45,21 +45,10 @@
 
 public class DirectRowSet extends AbstractSingleRowSet implements 
ExtendableRowSet {
 
-  public static class RowSetWriterBuilder extends BaseWriterBuilder {
-
-public RowSetWriterBuilder(ColumnConversionFactory conversionFactory) {
-  super(conversionFactory);
-}
-
-public RowSetWriter buildWriter(DirectRowSet rowSet) {
-  WriterIndexImpl index = new WriterIndexImpl();
-  TupleMetadata schema = rowSet.schema();
-  RowSetWriterImpl writer = new RowSetWriterImpl(rowSet, schema, index,
-  buildContainerChildren(rowSet.container(),
-  new MetadataRetrieval(schema)));
-  return writer;
-}
-  }
+  /**
+   * Initial row count, used for preliminary memory allocation.
+   */
+  public static final int INITIAL_ROW_COUNT = 10;
 
 Review comment:
   See note above about removal of nested class.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1843: DRILL-7350: Move RowSet related classes from test folder

2019-08-15 Thread GitBox
paul-rogers commented on a change in pull request #1843: DRILL-7350: Move 
RowSet related classes from test folder
URL: https://github.com/apache/drill/pull/1843#discussion_r314398293
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/RowSetStringBuilder.java
 ##
 @@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.rowSet;
+
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+
+/**
+ * Helper class to obtain string representation of RowSet.
+ * Example of the output:
+ * 
+ *   #: year, month, day
+ *   0: 2017, 12, 17
+ *   1: 2017, 12, 18
+ *   2: 2017, 12, 19
+ * 
+ */
+public class RowSetStringBuilder {
 
 Review comment:
   The problem with this approach is that row sets can be large: creating a 
string to print causes memory pressure.
   
   Consider creating a row set builder that takes a stream (or Writer). The 
Writer can be a String writer. That way, when used with a large data set, the 
writer can be to stdout, or whatever.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1843: DRILL-7350: Move RowSet related classes from test folder

2019-08-15 Thread GitBox
paul-rogers commented on a change in pull request #1843: DRILL-7350: Move 
RowSet related classes from test folder
URL: https://github.com/apache/drill/pull/1843#discussion_r314400808
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/package-info.java
 ##
 @@ -117,12 +134,14 @@
  * are added while loading, their index is always at the end of the existing
  * columns.
  * Writing Data to the Batch
- * Each batch is delimited by a call to {@link #startBatch()} and a call to
- * {@link #harvestWithLookAhead()} to obtain the completed batch. Note that 
readers do not
+ * Each batch is delimited by a call to {@link 
org.apache.drill.exec.physical.rowSet.ResultSetLoader#startBatch()}
+ * and a call to {@link 
org.apache.drill.exec.physical.rowSet.impl.VectorState#harvestWithLookAhead()}
+ * to obtain the completed batch. Note that readers do not
  * call these methods; the scan operator does this work.
  * 
- * Each row is delimited by a call to {@link #startValue()} and a call to
- * {@link #saveRow()}. startRow() performs initialization necessary
+ * Each row is delimited by a call to {@link 
org.apache.drill.exec.vector.accessor.writer.WriterEvents#startRow()}
 
 Review comment:
   WriterEvents is meant to be entirely internal; it should not be part of the 
API. WriterEvents is the internal implementation of the public API described 
here.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1843: DRILL-7350: Move RowSet related classes from test folder

2019-08-15 Thread GitBox
paul-rogers commented on a change in pull request #1843: DRILL-7350: Move 
RowSet related classes from test folder
URL: https://github.com/apache/drill/pull/1843#discussion_r314395350
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/AbstractRowSet.java
 ##
 @@ -15,23 +15,20 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-package org.apache.drill.test.rowSet;
+package org.apache.drill.exec.physical.rowSet;
 
 import org.apache.drill.exec.memory.BufferAllocator;
 import org.apache.drill.exec.record.BatchSchema;
 import org.apache.drill.exec.record.VectorAccessible;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.metadata.TupleMetadata;
-import org.apache.drill.exec.vector.SchemaChangeCallBack;
 
 /**
  * Basic implementation of a row set for both the single and multiple
- * (hyper) varieties, both the fixed and extendible varieties.
+ * (hyper) varieties, both the fixed and extendable varieties.
 
 Review comment:
   extensible


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1843: DRILL-7350: Move RowSet related classes from test folder

2019-08-15 Thread GitBox
paul-rogers commented on a change in pull request #1843: DRILL-7350: Move 
RowSet related classes from test folder
URL: https://github.com/apache/drill/pull/1843#discussion_r314400385
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/package-info.java
 ##
 @@ -32,6 +32,22 @@
  * batch" or "record batch." (But, in Drill, the term "record batch" also
  * usually means an operator on that set of records. Here, a row set is
  * just the rows  separate from operations on that data.
+ * SingleRowSet (abstract)
 
 Review comment:
   See general comment. Please do not combine the RowSet and ResultSetLoader 
classes into a single package: they present very different models used for 
different purposes.
   
   Yes, it is confusing because there is no ResultSetReader at present, but 
there should be to handle schema changes, etc. across batches. RowSet handles 
only a single row set such as in tests. It was deliberately designed to NOT be 
ResultSetReader solution.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1843: DRILL-7350: Move RowSet related classes from test folder

2019-08-15 Thread GitBox
paul-rogers commented on a change in pull request #1843: DRILL-7350: Move 
RowSet related classes from test folder
URL: https://github.com/apache/drill/pull/1843#discussion_r314401892
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/test/QueryRowSetIterator.java
 ##
 @@ -99,13 +99,6 @@ public DirectRowSet next() {
 }
   }
 
-  public void printAll() {
 
 Review comment:
   Please either leave this, or move it to a utility class. This turns out to 
be a very handy way to validate or debug a unit test: add a call while working 
on the test to visualize the results, remove the call when done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1843: DRILL-7350: Move RowSet related classes from test folder

2019-08-15 Thread GitBox
paul-rogers commented on a change in pull request #1843: DRILL-7350: Move 
RowSet related classes from test folder
URL: https://github.com/apache/drill/pull/1843#discussion_r314397262
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/DirectRowSet.java
 ##
 @@ -141,5 +136,22 @@ public SingleRowSet toIndirect(Set skipIndices) {
   }
 
   @Override
-  public SelectionVector2 getSv2() { return null; }
+  public SelectionVector2 getSv2() {
+return null;
+  }
+
+  public static class RowSetWriterBuilder extends BaseWriterBuilder {
 
 Review comment:
   See note above about cluttering the project with many small classes used in 
only one location.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1843: DRILL-7350: Move RowSet related classes from test folder

2019-08-15 Thread GitBox
paul-rogers commented on a change in pull request #1843: DRILL-7350: Move 
RowSet related classes from test folder
URL: https://github.com/apache/drill/pull/1843#discussion_r314399600
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/package-info.java
 ##
 @@ -32,6 +32,22 @@
  * batch" or "record batch." (But, in Drill, the term "record batch" also
  * usually means an operator on that set of records. Here, a row set is
  * just the rows  separate from operations on that data.
+ * SingleRowSet (abstract)
+ * Represents a row set that contains a single record batch (the typical
+ * case.
+ * DirectRowSet
+ * A read-only single row set without a selection vector.
+ * IndirectRowSet
+ * A read-only, single row set with an SV2. Note that the SV2 itself is
+ * writable (such as for sorting.)
+ * ExtendableRowSet
+ * A write-only, single row set used to create a new row set. Because of
+ * the way Drill sets row counts, an extendable row set cannot be read; instead
+ * at the completion of the write the extendable row set becomes a direct or
+ * indirect row set.
+ * HyperRowSet
+ * A read-only row set made up of a collection of record batches, indexed 
via an
+ * SV4. As with the SV2, the SV4 itself is writable.
 
 Review comment:
   The SV is not writable within a RowSet. Changing the SV will mess up 
indexing.
   
   The way to use that is to have a DirectRowSet on the original batch. Create 
the SV. Then, use the DirectRowSet and the SV to create an indirect row set.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services