subject:"\[GitHub\] drill pull request #984\: DRILL\-5783 Made a unit test for generated Priority ..."

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-11-14 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/984


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-11-09 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r15009
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetComparison.java 
---
@@ -255,4 +257,39 @@ private void verifyArray(String colLabel, ArrayReader 
ea,
   }
 }
   }
+
+  // TODO make a native RowSetComparison comparator
+  public static class ObjectComparator implements Comparator {
--- End diff --

This is used in the DrillTestWrapper to verify the ordering of results. I 
agree this is not suitable for equality tests, but it's intended to be used 
only for ordering tests. I didn't add support for all the supported RowSet 
types because we would first have to move DrillTestWrapper to use RowSets 
(currently it uses Maps and Lists to represent data). Currently it is not used 
by RowSets, but the intention is to move DrillTestWrapper to use RowSets and 
then make this comparator operate on RowSets, but that will be an incremental 
process.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-11-09 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r150097140
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSet.java ---
@@ -85,8 +85,7 @@
* new row set with the updated columns, then merge the new
* and old row sets to create a new immutable row set.
*/
-
-  public interface RowSetWriter extends TupleWriter {
+  interface RowSetWriter extends TupleWriter {
--- End diff --

Ah, forgot that the file defines an interface, not a class. (The situation 
I described was an interface nested inside a class.) So, you're good.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-11-09 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r150096444
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/file/JsonFileBuilder.java
 ---
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test.rowSet.file;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.accessor.ColumnAccessor;
+import org.apache.drill.exec.vector.accessor.ColumnReader;
+import org.apache.drill.test.rowSet.RowSet;
+
+import java.io.BufferedOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+public class JsonFileBuilder
+{
+  public static final String DEFAULT_DOUBLE_FORMATTER = "%f";
+  public static final String DEFAULT_INTEGER_FORMATTER = "%d";
+  public static final String DEFAULT_LONG_FORMATTER = "%d";
+  public static final String DEFAULT_STRING_FORMATTER = "\"%s\"";
+  public static final String DEFAULT_DECIMAL_FORMATTER = "%s";
+  public static final String DEFAULT_PERIOD_FORMATTER = "%s";
+
+  public static final Map DEFAULT_FORMATTERS = new 
ImmutableMap.Builder()
+.put(ColumnAccessor.ValueType.DOUBLE, DEFAULT_DOUBLE_FORMATTER)
+.put(ColumnAccessor.ValueType.INTEGER, DEFAULT_INTEGER_FORMATTER)
+.put(ColumnAccessor.ValueType.LONG, DEFAULT_LONG_FORMATTER)
+.put(ColumnAccessor.ValueType.STRING, DEFAULT_STRING_FORMATTER)
+.put(ColumnAccessor.ValueType.DECIMAL, DEFAULT_DECIMAL_FORMATTER)
+.put(ColumnAccessor.ValueType.PERIOD, DEFAULT_PERIOD_FORMATTER)
+.build();
+
+  private final RowSet rowSet;
+  private final Map customFormatters = Maps.newHashMap();
+
+  public JsonFileBuilder(RowSet rowSet) {
+this.rowSet = Preconditions.checkNotNull(rowSet);
+Preconditions.checkArgument(rowSet.rowCount() > 0, "The given rowset 
is empty.");
+  }
+
+  public JsonFileBuilder setCustomFormatter(final String columnName, final 
String columnFormatter) {
+Preconditions.checkNotNull(columnName);
+Preconditions.checkNotNull(columnFormatter);
+
+Iterator fields = rowSet
+  .schema()
+  .batch()
+  .iterator();
+
+boolean hasColumn = false;
+
+while (!hasColumn && fields.hasNext()) {
+  hasColumn = fields.next()
+.getName()
+.equals(columnName);
+}
+
+final String message = String.format("(%s) is not a valid column", 
columnName);
+Preconditions.checkArgument(hasColumn, message);
+
+customFormatters.put(columnName, columnFormatter);
+
+return this;
+  }
+
+  public void build(File tableFile) throws IOException {
--- End diff --

Sounds Good


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-11-09 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r150096261
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSet.java ---
@@ -85,8 +85,7 @@
* new row set with the updated columns, then merge the new
* and old row sets to create a new immutable row set.
*/
-
-  public interface RowSetWriter extends TupleWriter {
+  interface RowSetWriter extends TupleWriter {
--- End diff --

IntelliJ gave a warning that the modifier is redundant. Also an interface 
nested inside another interface is public by default.

https://beginnersbook.com/2016/03/nested-or-inner-interfaces-in-java/


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-11-09 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r150073673
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetComparison.java 
---
@@ -255,4 +257,39 @@ private void verifyArray(String colLabel, ArrayReader 
ea,
   }
 }
   }
+
+  // TODO make a native RowSetComparison comparator
+  public static class ObjectComparator implements Comparator {
--- End diff --

Defined here, but not used in this file. Does not include all types that 
Drill supports (via the RowSet): Date, byte arrays, BigDecimal, etc. Does not 
allow for ranges for floats & doubles as does JUnit. (Two floats are seldom 
exactly equal.)


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-11-09 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r150072992
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/file/JsonFileBuilder.java
 ---
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test.rowSet.file;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.accessor.ColumnAccessor;
+import org.apache.drill.exec.vector.accessor.ColumnReader;
+import org.apache.drill.test.rowSet.RowSet;
+
+import java.io.BufferedOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+public class JsonFileBuilder
+{
+  public static final String DEFAULT_DOUBLE_FORMATTER = "%f";
+  public static final String DEFAULT_INTEGER_FORMATTER = "%d";
+  public static final String DEFAULT_LONG_FORMATTER = "%d";
+  public static final String DEFAULT_STRING_FORMATTER = "\"%s\"";
+  public static final String DEFAULT_DECIMAL_FORMATTER = "%s";
+  public static final String DEFAULT_PERIOD_FORMATTER = "%s";
+
+  public static final Map DEFAULT_FORMATTERS = new 
ImmutableMap.Builder()
+.put(ColumnAccessor.ValueType.DOUBLE, DEFAULT_DOUBLE_FORMATTER)
+.put(ColumnAccessor.ValueType.INTEGER, DEFAULT_INTEGER_FORMATTER)
+.put(ColumnAccessor.ValueType.LONG, DEFAULT_LONG_FORMATTER)
+.put(ColumnAccessor.ValueType.STRING, DEFAULT_STRING_FORMATTER)
+.put(ColumnAccessor.ValueType.DECIMAL, DEFAULT_DECIMAL_FORMATTER)
+.put(ColumnAccessor.ValueType.PERIOD, DEFAULT_PERIOD_FORMATTER)
+.build();
+
+  private final RowSet rowSet;
+  private final Map customFormatters = Maps.newHashMap();
+
+  public JsonFileBuilder(RowSet rowSet) {
+this.rowSet = Preconditions.checkNotNull(rowSet);
+Preconditions.checkArgument(rowSet.rowCount() > 0, "The given rowset 
is empty.");
+  }
+
+  public JsonFileBuilder setCustomFormatter(final String columnName, final 
String columnFormatter) {
+Preconditions.checkNotNull(columnName);
+Preconditions.checkNotNull(columnFormatter);
+
+Iterator fields = rowSet
+  .schema()
+  .batch()
+  .iterator();
+
+boolean hasColumn = false;
+
+while (!hasColumn && fields.hasNext()) {
+  hasColumn = fields.next()
+.getName()
+.equals(columnName);
+}
+
+final String message = String.format("(%s) is not a valid column", 
columnName);
+Preconditions.checkArgument(hasColumn, message);
+
+customFormatters.put(columnName, columnFormatter);
+
+return this;
+  }
+
+  public void build(File tableFile) throws IOException {
--- End diff --

Great! This does not yet handle nested tuples or arrays; in part because 
the row set work for that is still sitting in PR #914. You can update this to 
be aware of maps and map arrays once that PR is committed.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-11-09 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r150073945
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSet.java ---
@@ -85,8 +85,7 @@
* new row set with the updated columns, then merge the new
* and old row sets to create a new immutable row set.
*/
-
-  public interface RowSetWriter extends TupleWriter {
+  interface RowSetWriter extends TupleWriter {
--- End diff --

Aren't nested interfaces `protected` by default? Just had to change one 
from default to `public` so I could use it in another package...


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-11-01 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r148396417
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/TableFileBuilder.java ---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+import java.io.BufferedOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.util.List;
+
+public class TableFileBuilder
+{
+  private final List columnNames;
+  private final List columnFormatters;
+  private final List rows = Lists.newArrayList();
+  private final String rowString;
+
+  public TableFileBuilder(List columnNames, List 
columnFormatters) {
--- End diff --

I've switched this around a bit. The general flow is now the following:

  1. A **RowSet** is constructed with a **RowSetBuilder**
  1. A **JsonFileBuilder** is created and configured to use the **RowSet** 
that was just created.
  1. The **JsonFileBuilder** reads the **RowSet** and writes it out to a 
file in a json format.



---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-11-01 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r148395420
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestCorruptParquetDateCorrection.java
 ---
@@ -377,21 +386,21 @@ public void 
testReadOldMetadataCacheFileOverrideCorrection() throws Exception {
 
   @Test
   public void testReadNewMetadataCacheFileOverOldAndNewFiles() throws 
Exception {
-String table = format("dfs.`%s`", new 
Path(getDfsTestTmpSchemaLocation(), 
MIXED_CORRUPTED_AND_CORRECT_PARTITIONED_FOLDER));
-copyMetaDataCacheToTempReplacingInternalPaths(
+File meta = dirTestWatcher.copyResourceToRoot(
 
"parquet/4203_corrupt_dates/mixed_version_partitioned_metadata.requires_replace.txt",
-MIXED_CORRUPTED_AND_CORRECT_PARTITIONED_FOLDER, 
Metadata.METADATA_FILENAME);
+Paths.get(MIXED_CORRUPTED_AND_CORRECT_PARTITIONED_FOLDER, 
Metadata.METADATA_FILENAME).toString());
--- End diff --

I've cleaned this up a bit. Of the two patterns for building paths and 
files I prefer pattern **A** because it is cleaner.

**Pattern A:**
```
myPath.resolve("subDir1")
  .resolve("subDir2")
  .toFile();
```

**Pattern B:**
```
new File(new File(myFile, "subDir1"), "subDir2")
```


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-11-01 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r148394268
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/TestExternalSort.java
 ---
@@ -138,34 +141,34 @@ public void testNewColumnsManaged() throws Exception {
 testNewColumns(false);
   }
 
-
   @Test
   public void testNewColumnsLegacy() throws Exception {
 testNewColumns(true);
   }
 
   private void testNewColumns(boolean testLegacy) throws Exception {
 final int record_count = 1;
-String dfs_temp = getDfsTestTmpSchemaLocation();
-System.out.println(dfs_temp);
-File table_dir = new File(dfs_temp, "newColumns");
-table_dir.mkdir();
-try (BufferedOutputStream os = new BufferedOutputStream(new 
FileOutputStream(new File(table_dir, "a.json" {
-  String format = "{ a : %d, b : %d }%n";
-  for (int i = 0; i <= record_count; i += 2) {
-os.write(String.format(format, i, i).getBytes());
-  }
+final String tableDirName = "newColumns";
+
+TableFileBuilder tableA = new TableFileBuilder(Lists.newArrayList("a", 
"b"), Lists.newArrayList("%d", "%d"));
--- End diff --

I've made the JsonFileBuilder, which takes a RowSet and writes it out to a 
file as json.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-11-01 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r148393893
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/BatchUtils.java ---
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.drill.exec.vector.ValueVector;
+import org.junit.Assert;
+
+import java.io.UnsupportedEncodingException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+
+public class BatchUtils {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchUtils.class);
+
+  public static Map 
containerToObjects(VectorContainer vectorContainer) {
--- End diff --

Removed BatchUtils


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-31 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r148157850
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/BatchUtils.java ---
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.drill.exec.vector.ValueVector;
+import org.junit.Assert;
+
+import java.io.UnsupportedEncodingException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+
+public class BatchUtils {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchUtils.class);
+
+  public static Map 
containerToObjects(VectorContainer vectorContainer) {
+Map rows = Maps.newHashMap();
+int numCols = vectorContainer.getNumberOfColumns();
+int recordCount = vectorContainer.getRecordCount();
+
+for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
+  String columnName = 
vectorContainer.getSchema().getColumn(columnIndex).getName();
+  List data = Lists.newArrayList();
+
+  ValueVector.Accessor valueVectorAccessor = vectorContainer
+.getValueVector(columnIndex)
+.getValueVector()
+.getAccessor();
+
+  for (int recordIndex = 0; recordIndex < recordCount; recordIndex++) {
+data.add(valueVectorAccessor.getObject(recordIndex));
+  }
+
+  rows.put(columnName, data);
+}
+
+return rows;
+  }
+
+  public static Map 
hyperBatchAndSelectorToObjects(VectorContainer vectorContainer, 
SelectionVector4 selectionVector4) {
+Map rows = Maps.newHashMap();
+int numCols = vectorContainer.getNumberOfColumns();
+int numIndices = selectionVector4.getCount();
+
+for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
+  String columnName = 
vectorContainer.getSchema().getColumn(columnIndex).getName();
+  List data = Lists.newArrayList();
+
+  VectorWrapper vectorWrapper = 
vectorContainer.getValueVector(columnIndex);
+
+  for (int indexIndex = 0; indexIndex < numIndices; indexIndex++) {
+int sv4Index = selectionVector4.get(indexIndex);
+int batchIndex = SelectionVector4.getBatchIndex(sv4Index);
+int recordIndex = SelectionVector4.getRecordIndex(sv4Index);
+
+ValueVector valueVector = 
vectorWrapper.getValueVectors()[batchIndex];
+Object columnValue = 
valueVector.getAccessor().getObject(recordIndex);
+data.add(columnValue);
+  }
+
+  rows.put(columnName, data);
+}
+
+return rows;
+  }
+
+  public static String toString(Map table) {
+if (table.isEmpty()) {
+  return "[ empty table ]";
+}
+
+List columnNames = Lists.newArrayList(table.keySet());
+Collections.sort(columnNames);
+int numRecords = table.get(columnNames.get(0)).size();
+
+StringBuilder sb = new StringBuilder();
+
+{
+  sb.append("[ ");
+  String separator = "";
+
+  for (String columnName : columnNames) {
+sb.append(separator);
+separator = ", ";
+sb.append(columnName);
+  }
+
+  sb.append(" ]\n");
+}
+
+for (int recordIndex = 0; recordIndex < numRecords; recordIndex++) {
+  sb.append("{");
+  String separator = "";
+
+  for (String columnName : columnNames) {
+sb.append(separator);
+separator = ", ";

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-31 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r148157170
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/BatchUtils.java ---
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.drill.exec.vector.ValueVector;
+import org.junit.Assert;
+
+import java.io.UnsupportedEncodingException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+
+public class BatchUtils {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchUtils.class);
+
+  public static Map 
containerToObjects(VectorContainer vectorContainer) {
+Map rows = Maps.newHashMap();
+int numCols = vectorContainer.getNumberOfColumns();
+int recordCount = vectorContainer.getRecordCount();
+
+for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
+  String columnName = 
vectorContainer.getSchema().getColumn(columnIndex).getName();
+  List data = Lists.newArrayList();
+
+  ValueVector.Accessor valueVectorAccessor = vectorContainer
+.getValueVector(columnIndex)
+.getValueVector()
+.getAccessor();
+
+  for (int recordIndex = 0; recordIndex < recordCount; recordIndex++) {
+data.add(valueVectorAccessor.getObject(recordIndex));
+  }
+
+  rows.put(columnName, data);
+}
+
+return rows;
+  }
+
+  public static Map 
hyperBatchAndSelectorToObjects(VectorContainer vectorContainer, 
SelectionVector4 selectionVector4) {
+Map rows = Maps.newHashMap();
+int numCols = vectorContainer.getNumberOfColumns();
+int numIndices = selectionVector4.getCount();
+
+for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
+  String columnName = 
vectorContainer.getSchema().getColumn(columnIndex).getName();
+  List data = Lists.newArrayList();
+
+  VectorWrapper vectorWrapper = 
vectorContainer.getValueVector(columnIndex);
+
+  for (int indexIndex = 0; indexIndex < numIndices; indexIndex++) {
+int sv4Index = selectionVector4.get(indexIndex);
+int batchIndex = SelectionVector4.getBatchIndex(sv4Index);
+int recordIndex = SelectionVector4.getRecordIndex(sv4Index);
+
+ValueVector valueVector = 
vectorWrapper.getValueVectors()[batchIndex];
+Object columnValue = 
valueVector.getAccessor().getObject(recordIndex);
+data.add(columnValue);
+  }
+
+  rows.put(columnName, data);
+}
+
+return rows;
+  }
+
+  public static String toString(Map table) {
+if (table.isEmpty()) {
+  return "[ empty table ]";
+}
+
+List columnNames = Lists.newArrayList(table.keySet());
+Collections.sort(columnNames);
+int numRecords = table.get(columnNames.get(0)).size();
+
+StringBuilder sb = new StringBuilder();
+
+{
+  sb.append("[ ");
+  String separator = "";
+
+  for (String columnName : columnNames) {
+sb.append(separator);
+separator = ", ";
+sb.append(columnName);
+  }
+
+  sb.append(" ]\n");
+}
+
+for (int recordIndex = 0; recordIndex < numRecords; recordIndex++) {
+  sb.append("{");
+  String separator = "";
+
+  for (String columnName : columnNames) {
+sb.append(separator);
+separator = ", ";

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-31 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r148157140
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/BatchUtils.java ---
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.drill.exec.vector.ValueVector;
+import org.junit.Assert;
+
+import java.io.UnsupportedEncodingException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+
+public class BatchUtils {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchUtils.class);
+
+  public static Map 
containerToObjects(VectorContainer vectorContainer) {
+Map rows = Maps.newHashMap();
+int numCols = vectorContainer.getNumberOfColumns();
+int recordCount = vectorContainer.getRecordCount();
+
+for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
+  String columnName = 
vectorContainer.getSchema().getColumn(columnIndex).getName();
+  List data = Lists.newArrayList();
+
+  ValueVector.Accessor valueVectorAccessor = vectorContainer
+.getValueVector(columnIndex)
+.getValueVector()
+.getAccessor();
+
+  for (int recordIndex = 0; recordIndex < recordCount; recordIndex++) {
+data.add(valueVectorAccessor.getObject(recordIndex));
+  }
+
+  rows.put(columnName, data);
+}
+
+return rows;
+  }
+
+  public static Map 
hyperBatchAndSelectorToObjects(VectorContainer vectorContainer, 
SelectionVector4 selectionVector4) {
--- End diff --

Done


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-31 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r148144235
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/BaseDirTestWatcher.java ---
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.base.Charsets;
+import org.apache.commons.io.FileUtils;
+import org.apache.drill.common.util.TestTools;
+import org.apache.drill.exec.util.TestUtilities;
+import org.junit.runner.Description;
+
+import java.io.File;
+import java.io.IOException;
+
+public class BaseDirTestWatcher extends DirTestWatcher {
+  public enum DirType {
+ROOT,
+TEST_TMP;
+  }
+
+  private File tmpDir;
+  private File storeDir;
+  private File dfsTestTmpParentDir;
+  private File dfsTestTmpDir;
+  private File rootDir;
+
+  public BaseDirTestWatcher() {
+super();
+  }
+
+  @Override
+  protected void starting(Description description) {
+super.starting(description);
+
+rootDir = makeSubDir("root");
+tmpDir = new File(rootDir, "tmp");
+storeDir = new File(rootDir, "store");
+dfsTestTmpParentDir = new File(rootDir, "dfsTestTmp");
+
+tmpDir.mkdirs();
+storeDir.mkdirs();
+dfsTestTmpParentDir.mkdirs();
+  }
+
+  public File getTmpDir() {
+return tmpDir;
+  }
+
+  public File getStoreDir() {
+return storeDir;
+  }
+
+  public File getDfsTestTmpParentDir() {
+return dfsTestTmpParentDir;
+  }
+
+  public File getDfsTestTmpDir() {
+return dfsTestTmpDir;
+  }
+
+  public String getDfsTestTmpDirPath() {
+return dfsTestTmpDir.getAbsolutePath().replaceAll("/\\./", "/");
+  }
+
+  public File getRootDir() {
+return rootDir;
+  }
+
+  public String getRootDirPath() {
+return rootDir.getAbsolutePath().replaceAll("/\\./", "/");
+  }
+
+  public void newDfsTestTmpDir() {
+dfsTestTmpDir = 
TestUtilities.createTempDir(BaseTestQuery.dirTestWatcher.getDfsTestTmpParentDir());
+  }
+
+  private File getDir(DirType type) {
+switch (type) {
+  case ROOT:
+return rootDir;
+  case TEST_TMP:
+return dfsTestTmpDir;
+  default:
+throw new IllegalArgumentException(String.format("Unsupported type 
%s", type));
+}
+  }
+
+  public File makeRootSubDir(String relPath) {
+return makeSubDir(relPath, DirType.ROOT);
+  }
+
+  public File makeTestTmpSubDir(String relPath) {
+return makeSubDir(relPath, DirType.TEST_TMP);
+  }
+
+  private File makeSubDir(String relPath, DirType type) {
+File subDir = new File(getDir(type), relPath);
+subDir.mkdirs();
+return subDir;
+  }
+
+  /**
+   * This preserves the relative path of the directory in root
+   * @param relPath
+   * @return
+   */
+  public File copyResourceToRoot(String relPath) {
+return copyTo(relPath, relPath, TestTools.DataType.RESOURCE, 
DirType.ROOT);
+  }
+
+  public File copyFileToRoot(String relPath) {
+return copyTo(relPath, relPath, TestTools.DataType.PROJECT, 
DirType.ROOT);
+  }
+
+  public File copyResourceToRoot(String relPath, String destPath) {
+return copyTo(relPath, destPath, TestTools.DataType.RESOURCE, 
DirType.ROOT);
+  }
+
+  public File copyResourceToTestTmp(String relPath, String destPath) {
+return copyTo(relPath, destPath, TestTools.DataType.RESOURCE, 
DirType.TEST_TMP);
+  }
+
+  private File copyTo(String relPath, String destPath, TestTools.DataType 
dataType, DirType dirType) {
+File file = TestTools.getFile(relPath, dataType);
+
+if (file.isDirectory()) {
+  File subDir = makeSubDir(destPath, dirType);
+  TestTools.copyDirToDest(relPath, subDir,

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-26 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147262655
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/ExampleTest.java ---
@@ -59,6 +59,9 @@
 @Ignore
 public class ExampleTest {
 
+  @Rule
+  public final DirTestWatcher dirTestWatcher = new DirTestWatcher();
--- End diff --

I added java doc, and added example of how it's used in secondTest


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-26 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147261203
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java ---
@@ -226,43 +217,16 @@ private void createConfig(FixtureBuilder builder) 
throws Exception {
 
   serviceSet = null;
   usesZk = true;
-  isLocal = false;
 } else {
   // Embedded Drillbit.
 
   serviceSet = RemoteServiceSet.getLocalServiceSet();
-  isLocal = true;
 }
   }
 
-  private void startDrillbits(FixtureBuilder builder) throws Exception {
-//// Ensure that Drill uses the log directory determined here rather 
than
-//// it's hard-coded defaults. WIP: seems to be needed some times but
-//// not others.
-//
-//String logDir = null;
-//if (builder.tempDir != null) {
-//  logDir = builder.tempDir.getAbsolutePath();
-//}
-//if (logDir == null) {
-//  logDir = config.getString(ExecConstants.DRILL_TMP_DIR);
-//  if (logDir != null) {
-//logDir += "/drill/log";
-//  }
-//}
-//if (logDir == null) {
-//  logDir = "/tmp/drill";
-//}
-//new File(logDir).mkdirs();
-//System.setProperty("drill.log-dir", logDir);
-
-dfsTestTempDir = makeTempDir("dfs-test");
-
-// Clean up any files that may have been left from the
-// last run.
-
-preserveLocalFiles = builder.preserveLocalFiles;
-removeLocalFiles();
+  private void startDrillbits() throws Exception {
+dfsTestTempDir = new File(getRootDir(), "dfs-test");
+dfsTestTempDir.mkdirs();
--- End diff --

This is possible through the BaseDirTestWatcher. It is configured through 
the BaseDireTestWatcher's constructor. If you pass it false it will not delete 
your directories at the end of each test.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-26 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147258230
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java ---
@@ -213,7 +204,7 @@ private void configureZk(FixtureBuilder builder) {
 }
   }
 
-  private void createConfig(FixtureBuilder builder) throws Exception {
+  private void createConfig() throws Exception {
--- End diff --

They should be. The only change I made is that I didn't require passing the 
FixtureBuilder as an arg, because the class is already holding a reference to 
it.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-26 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147257244
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/BaseDirTestWatcher.java ---
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.base.Charsets;
+import org.apache.commons.io.FileUtils;
+import org.apache.drill.common.util.TestTools;
+import org.apache.drill.exec.util.TestUtilities;
+import org.junit.runner.Description;
+
+import java.io.File;
+import java.io.IOException;
+
+public class BaseDirTestWatcher extends DirTestWatcher {
+  public enum DirType {
+ROOT,
+TEST_TMP;
+  }
+
+  private File tmpDir;
+  private File storeDir;
+  private File dfsTestTmpParentDir;
+  private File dfsTestTmpDir;
+  private File rootDir;
+
+  public BaseDirTestWatcher() {
+super();
+  }
+
+  @Override
+  protected void starting(Description description) {
+super.starting(description);
+
+rootDir = makeSubDir("root");
+tmpDir = new File(rootDir, "tmp");
+storeDir = new File(rootDir, "store");
+dfsTestTmpParentDir = new File(rootDir, "dfsTestTmp");
+
+tmpDir.mkdirs();
+storeDir.mkdirs();
+dfsTestTmpParentDir.mkdirs();
+  }
+
+  public File getTmpDir() {
+return tmpDir;
+  }
+
+  public File getStoreDir() {
+return storeDir;
+  }
+
+  public File getDfsTestTmpParentDir() {
+return dfsTestTmpParentDir;
+  }
+
+  public File getDfsTestTmpDir() {
+return dfsTestTmpDir;
+  }
+
+  public String getDfsTestTmpDirPath() {
+return dfsTestTmpDir.getAbsolutePath().replaceAll("/\\./", "/");
+  }
+
+  public File getRootDir() {
+return rootDir;
+  }
+
+  public String getRootDirPath() {
+return rootDir.getAbsolutePath().replaceAll("/\\./", "/");
+  }
+
+  public void newDfsTestTmpDir() {
+dfsTestTmpDir = 
TestUtilities.createTempDir(BaseTestQuery.dirTestWatcher.getDfsTestTmpParentDir());
+  }
+
+  private File getDir(DirType type) {
+switch (type) {
+  case ROOT:
+return rootDir;
+  case TEST_TMP:
+return dfsTestTmpDir;
+  default:
+throw new IllegalArgumentException(String.format("Unsupported type 
%s", type));
+}
+  }
+
+  public File makeRootSubDir(String relPath) {
--- End diff --

Fair enough. I would prefer to use File objects to represent concrete files 
or directories. Java.nio.Path objects are useful because they can represent 
relative paths, and provide a bunch of methods for manipulating paths. I'll 
update the methods I introduced to use jva.io.Files and java.nio.Paths


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-26 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147256621
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/TestSimpleExternalSort.java
 ---
@@ -147,7 +152,7 @@ public void sortOneKeyDescendingExternalSortLegacy() 
throws Throwable {
   }
 
   private void sortOneKeyDescendingExternalSort(boolean testLegacy) throws 
Throwable {
-FixtureBuilder builder = ClusterFixture.builder()
+FixtureBuilder builder = ClusterFixture.builder(dirTestWatcher)
--- End diff --

Is there a case where the tmp directories are optional? The Drillbit will 
always have to be configured with a drill.tmp-dir at minimum, so I think 
it should be required to use a BaseDirTestWatcher with a ClusterFixture. I 
could move passing the dirTestWatcher to the build method instead of the 
constructor. Let me know what you think.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-26 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147255211
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestAltSortQueries.java ---
@@ -19,24 +19,33 @@
 
 import org.apache.drill.categories.OperatorTest;
 import org.apache.drill.categories.SqlTest;
+import org.apache.drill.test.BaseTestQuery;
+import org.junit.BeforeClass;
 import org.junit.Test;
 import org.junit.experimental.categories.Category;
 
 @Category({SqlTest.class, OperatorTest.class})
-public class TestAltSortQueries extends BaseTestQuery{
+public class TestAltSortQueries extends BaseTestQuery {
   static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(TestAltSortQueries.class);
 
+  @BeforeClass
+  public static void setupTestFiles() {
+dirTestWatcher.copyFileToRoot("sample-data/region.parquet");
+dirTestWatcher.copyFileToRoot("sample-data/regionsSF");
+dirTestWatcher.copyFileToRoot("sample-data/nation.parquet");
+  }
+
   @Test
   public void testOrderBy() throws Exception{
 test("select R_REGIONKEY " +
- "from dfs_test.`[WORKING_PATH]/../../sample-data/region.parquet` 
" +
+ "from dfs_test.`/sample-data/region.parquet` " +
--- End diff --

Updated package-info.java and ExampleTest.java



---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-26 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147229671
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestCorruptParquetDateCorrection.java
 ---
@@ -76,43 +78,50 @@
   //- detecting corrupt values must be deferred to actual data 
page reading
   //- one from 1.4, where there is a proper created-by, but the 
corruption is present
   private static final String MIXED_CORRUPTED_AND_CORRECT_DATES_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/mixed_drill_versions";
+  "/parquet/4203_corrupt_dates/mixed_drill_versions";
   // partitioned with 1.2.0, no certain metadata that these were written 
with Drill
   // the value will be checked to see that they look corrupt and they will 
be corrected
   // by default. Users can use the format plugin option 
autoCorrectCorruptDates to disable
   // this behavior if they have foreign parquet files with valid rare date 
values that are
   // in the similar range as Drill's corrupt values
+  private static final String PARTITIONED_1_2_FOLDER = 
"partitioned_with_corruption_4203_1_2";
   private static final String CORRUPTED_PARTITIONED_DATES_1_2_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203_1_2";
+  Paths.get("/parquet/4203_corrupt_dates", 
PARTITIONED_1_2_FOLDER).toString();
   // partitioned with 1.4.0, no certain metadata regarding the date 
corruption status.
   // The same detection approach of the corrupt date values as for the 
files partitioned with 1.2.0
+  private static final String PARTITIONED_1_4_FOLDER = 
"partitioned_with_corruption_4203";
   private static final String CORRUPTED_PARTITIONED_DATES_1_4_0_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203";
+  Paths.get("/parquet/4203_corrupt_dates", 
PARTITIONED_1_4_FOLDER).toString();
   private static final String PARQUET_DATE_FILE_WITH_NULL_FILLED_COLS =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/null_date_cols_with_corruption_4203.parquet";
+  
"/parquet/4203_corrupt_dates/null_date_cols_with_corruption_4203.parquet";
+  private static final String PARTITIONED_1_9_FOLDER = 
"1_9_0_partitioned_no_corruption";
   private static final String CORRECT_PARTITIONED_DATES_1_9_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/1_9_0_partitioned_no_corruption";
+  Paths.get("/parquet/4203_corrupt_dates", 
PARTITIONED_1_9_FOLDER).toString();
   private static final String VARCHAR_PARTITIONED =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition";
+  "/parquet/4203_corrupt_dates/fewtypes_varcharpartition";
   private static final String DATE_PARTITIONED =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition";
+  "/parquet/4203_corrupt_dates/fewtypes_datepartition";
   private static final String EXCEPTION_WHILE_PARSING_CREATED_BY_META =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/hive1dot2_fewtypes_null";
+  "/parquet/4203_corrupt_dates/hive1dot2_fewtypes_null";
   private static final String CORRECT_DATES_1_6_0_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/correct_dates_and_old_drill_parquet_writer.parquet";
-  private static final String PARTITIONED_1_2_FOLDER = 
"partitioned_with_corruption_4203_1_2";
+  
"/parquet/4203_corrupt_dates/correct_dates_and_old_drill_parquet_writer.parquet";
   private static final String 
MIXED_CORRUPTED_AND_CORRECT_PARTITIONED_FOLDER = "mixed_partitioned";
 
-
   @BeforeClass
   public static void initFs() throws Exception {
+dirTestWatcher.copyResourceToRoot("/parquet/4203_corrupt_dates");
--- End diff --

fixed


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-26 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147229627
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestCorruptParquetDateCorrection.java
 ---
@@ -76,43 +78,50 @@
   //- detecting corrupt values must be deferred to actual data 
page reading
   //- one from 1.4, where there is a proper created-by, but the 
corruption is present
   private static final String MIXED_CORRUPTED_AND_CORRECT_DATES_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/mixed_drill_versions";
+  "/parquet/4203_corrupt_dates/mixed_drill_versions";
   // partitioned with 1.2.0, no certain metadata that these were written 
with Drill
   // the value will be checked to see that they look corrupt and they will 
be corrected
   // by default. Users can use the format plugin option 
autoCorrectCorruptDates to disable
   // this behavior if they have foreign parquet files with valid rare date 
values that are
   // in the similar range as Drill's corrupt values
+  private static final String PARTITIONED_1_2_FOLDER = 
"partitioned_with_corruption_4203_1_2";
   private static final String CORRUPTED_PARTITIONED_DATES_1_2_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203_1_2";
+  Paths.get("/parquet/4203_corrupt_dates", 
PARTITIONED_1_2_FOLDER).toString();
--- End diff --

Updated all paths to be relative in the tests


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-26 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147229478
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/TestBugFixes.java 
---
@@ -33,8 +35,11 @@
 @Category(UnlikelyTest.class)
 public class TestBugFixes extends BaseTestQuery {
   private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(TestBugFixes.class);
-  private static final String WORKING_PATH = TestTools.getWorkingPath();
-  private static final String TEST_RES_PATH = WORKING_PATH + 
"/src/test/resources";
+
+  @BeforeClass
+  public static void setupTestFiles() {
+dirTestWatcher.copyResourceToRoot("/bugs/DRILL-4192");
--- End diff --

I have made all the path references relative in the tests now



---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-26 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147229235
  
--- Diff: common/src/test/java/org/apache/drill/test/SubDirTestWatcher.java 
---
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+import org.junit.rules.TestWatcher;
+import org.junit.runner.Description;
+
+import java.io.File;
+import java.util.List;
+
+public class SubDirTestWatcher extends TestWatcher {
--- End diff --

Done


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-25 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147009092
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TopN/TopNBatchTest.java
 ---
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.TopN;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.Random;
+
+import com.google.common.collect.Lists;
+import org.apache.drill.test.TestBuilder;
+import org.apache.drill.categories.OperatorTest;
+import org.apache.drill.common.config.DrillConfig;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.logical.data.Order;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.compile.ClassBuilder;
+import org.apache.drill.exec.compile.CodeCompiler;
+import org.apache.drill.exec.expr.fn.FunctionImplementationRegistry;
+import org.apache.drill.exec.memory.RootAllocator;
+import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
+import org.apache.drill.exec.pop.PopUnitTestBase;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.ExpandableHyperContainer;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.server.options.OptionSet;
+import org.apache.drill.test.ClientFixture;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.FixtureBuilder;
+import org.apache.drill.test.OperatorFixture;
+import org.apache.drill.test.BatchUtils;
+import org.apache.drill.test.DirTestWatcher;
+import org.apache.drill.test.rowSet.RowSetBuilder;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+@Category(OperatorTest.class)
+public class TopNBatchTest extends PopUnitTestBase {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(TopNBatchTest.class);
+
+  // Allows you to look at generated code after tests execute
+  @Rule
+  public DirTestWatcher dirTestWatcher = new DirTestWatcher(false);
+
+  /**
+   * Priority queue unit test.
+   * @throws Exception
+   */
+  @Test
+  public void priorityQueueOrderingTest() throws Exception {
+Properties properties = new Properties();
+properties.setProperty(ClassBuilder.CODE_DIR_OPTION, 
dirTestWatcher.getDirPath());
+
+DrillConfig drillConfig = DrillConfig.create(properties);
+OptionSet optionSet = new OperatorFixture.TestOptionSet();
+
+FieldReference expr = FieldReference.getWithQuotedRef("colA");
+Order.Ordering ordering = new 
Order.Ordering(Order.Ordering.ORDER_DESC, expr, Order.Ordering.NULLS_FIRST);
+List orderings = Lists.newArrayList(ordering);
+
+MaterializedField colA = MaterializedField.create("colA", 
Types.required(TypeProtos.MinorType.INT));
+MaterializedField colB = MaterializedField.create("colB", 
Types.required(TypeProtos.MinorType.INT));
+
+List cols = Lists.newArrayList(colA, colB);
+BatchSchema batchSchema = new 
BatchSchema(BatchSchema.SelectionVectorMode.NONE, cols);
+
+try (RootAllocator allocator = new RootAllocator(100_000_000)) {
+  VectorContainer expectedVectors = new RowSetBuilder(allocator, 
batchSchema)
+.add(110, 10)
+.add(109, 9)
+.add(108, 8)
+.add(107, 7)
+.add(106, 6)
+.add(105, 5)
+.add(104, 4)
+.add(103, 3)
+.add(102, 2)
+.add(101, 1)
+.build()
+.container();
+
+  Map expectedTable = 
BatchUtils.containerToObjects(expectedVectors);
+  expectedVectors.clear();
+
+  PriorityQueue queue;

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-25 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147008923
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestCTASPartitionFilter.java ---
@@ -59,48 +58,48 @@ public void withDistribution() throws Exception {
 test("alter session set `planner.slice_target` = 1");
 test("alter session set `store.partition.hash_distribute` = true");
 test("use dfs_test.tmp");
-test(String.format("create table orders_distribution partition by 
(o_orderpriority) as select * from dfs_test.`%s/multilevel/parquet`", 
TEST_RES_PATH));
+test("create table orders_distribution partition by (o_orderpriority) 
as select * from dfs_test.`/multilevel/parquet`");
 String query = "select * from orders_distribution where 
o_orderpriority = '1-URGENT'";
-testExcludeFilter(query, 1, "Filter", 24);
+testExcludeFilter(query, 1, "Filter\\(", 24);
--- End diff --

It is no longer sufficient to match "Filter" because the test class name 
contains "Filter" and the test class name is used to create the tmp directory. 
And the fully qualified path of a queried file is included in the plan. We want 
to only match the Filter steps generated in the plan, not the 
Filters in our file paths. In order to do this I tell it to match 
"Filter(" which corresponds to a filter step in the plan.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-25 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147007945
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestAltSortQueries.java ---
@@ -64,9 +73,9 @@ public void testJoinWithLimit() throws Exception{
 "  nations.N_NAME,\n" +
 "  regions.R_NAME\n" +
 "FROM\n" +
-"  dfs_test.`[WORKING_PATH]/../../sample-data/nation.parquet` 
nations\n" +
+"  dfs.`/sample-data/nation.parquet` nations\n" +
--- End diff --

Just mentioned this above but will repeat here.I have now removed 
**dfs_test** completely. There was no reason for it to be added and it was 
inconsistently being mixed with **dfs**. If you want to query a file on the 
local filesystem that is not on the classpath just using **dfs** will be 
sufficient now.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-25 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147007663
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestAltSortQueries.java ---
@@ -19,24 +19,33 @@
 
 import org.apache.drill.categories.OperatorTest;
 import org.apache.drill.categories.SqlTest;
+import org.apache.drill.test.BaseTestQuery;
+import org.junit.BeforeClass;
 import org.junit.Test;
 import org.junit.experimental.categories.Category;
 
 @Category({SqlTest.class, OperatorTest.class})
-public class TestAltSortQueries extends BaseTestQuery{
+public class TestAltSortQueries extends BaseTestQuery {
   static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(TestAltSortQueries.class);
 
+  @BeforeClass
+  public static void setupTestFiles() {
+dirTestWatcher.copyFileToRoot("sample-data/region.parquet");
+dirTestWatcher.copyFileToRoot("sample-data/regionsSF");
+dirTestWatcher.copyFileToRoot("sample-data/nation.parquet");
+  }
+
   @Test
   public void testOrderBy() throws Exception{
 test("select R_REGIONKEY " +
- "from dfs_test.`[WORKING_PATH]/../../sample-data/region.parquet` 
" +
+ "from dfs_test.`/sample-data/region.parquet` " +
--- End diff --

I have now removed **dfs_test** completely. There was no reason for it to 
be added and it was inconsistently being mixed with **dfs**. The **dfs** 
workspaces are automatically mapped to the correct temp directories for you 
provided that you use **BaseTestQuery** or the **ClusterFixture**. I will 
update **org.apache.drill.test.package-info.java** with the theory of how this 
works and will add a simple example to **ExampleTest.java**


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-19 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145827440
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestAltSortQueries.java ---
@@ -64,9 +73,9 @@ public void testJoinWithLimit() throws Exception{
 "  nations.N_NAME,\n" +
 "  regions.R_NAME\n" +
 "FROM\n" +
-"  dfs_test.`[WORKING_PATH]/../../sample-data/nation.parquet` 
nations\n" +
+"  dfs.`/sample-data/nation.parquet` nations\n" +
--- End diff --

I accidentally changed it to dfs, will switch it back to dfs_test for 
consistency. dfs and dfs_test>/b> are configured to point to the same 
temp directories so they are in fact interchangeable. This configuration 
happens in the openClient method of BaseTestQuery and in the 
configureStoragePlugins method of the ClusterFixture. 

Throughout making this change I was scratching my head as to why 
dfs_test is used instead of dfs. After going through the code I 
don't see any reason as to why dfs_test was created in the first place. I think 
we should actually remove dfs_test and just use dfs everywhere 
for uniformity. I'll go ahead and do that now.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-19 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145815670
  
--- Diff: exec/java-exec/src/test/resources/topN/one_key_sort.json ---
@@ -12,11 +12,11 @@
 pop:"mock-scan",
 url: "http://apache.org;,
 entries:[
-{records: 1000, types: [
+{records: 10, types: [
--- End diff --

The original test was checking to make sure the the Top N results returned 
are ordered. It was never really necessary to generate that much data. The 
benefit of reducing the amount of data generated is that the test runs faster.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-19 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145790506
  
--- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
@@ -908,6 +908,15 @@ public void generateTestData(int count) {
 }
 
   <#else> <#-- type.width <= 8 -->
+@Override
+public void add(Object value) {
+  int index = accessor.getValueCount();
+  int valueCount = index + 1;
+  setValueCount(valueCount);
+
+  set(index, (${type.javaType}) value);
--- End diff --

All the issues with this make sense. I should have deleted this when I 
switched to using the RowSet classes, but forgot too :). Removing this now.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145575975
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestAggNullable.java ---
@@ -21,29 +21,25 @@
 
 import org.apache.drill.categories.OperatorTest;
 import org.apache.drill.common.util.TestTools;
+import org.apache.drill.test.BaseTestQuery;
 import org.junit.Test;
 import org.junit.experimental.categories.Category;
 
 @Category(OperatorTest.class)
-public class TestAggNullable extends BaseTestQuery{
+public class TestAggNullable extends BaseTestQuery {
   static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(TestAggNullable.class);
 
-  static final String WORKING_PATH = TestTools.getWorkingPath();
-  static final String TEST_RES_PATH = WORKING_PATH + "/src/test/resources";
-
   private static void enableAggr(boolean ha, boolean sa) throws Exception {
 
-test(String.format("alter session set `planner.enable_hashagg` = %s", 
ha ? "true":"false"));
-test(String.format("alter session set `planner.enable_streamagg` = 
%s", sa ? "true":"false"));
+test("alter session set `planner.enable_hashagg` = %s", ha);
+test("alter session set `planner.enable_streamagg` = %s", sa);
--- End diff --

We can update these after the other PR goes in then.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145575781
  
--- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
@@ -247,7 +248,7 @@ drill.exec: {
 }
   },
   sort: {
-purge.threshold : 10,
+purge.threshold : 1000,
--- End diff --

Oh my. Thanks for catching that!


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145575588
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/selection/SelectionVector4.java
 ---
@@ -116,6 +116,14 @@ public void clear() {
 }
   }
 
+  public static int getBatchIndex(int sv4Index) {
+return (sv4Index >> 16) & 0x;
+  }
+
+  public static int getRecordIndex(int sv4Index) {
+return (sv4Index) & 0x;
+  }
--- End diff --

Didn't do any performance testing. I mainly intended to use these methods 
for testing purposes. For example I use them in 
org.apache.drill.test.BatchUtils .


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145575310
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java
 ---
@@ -335,20 +336,42 @@ private void purge() throws SchemaChangeException {
 logger.debug("Took {} us to purge", 
watch.elapsed(TimeUnit.MICROSECONDS));
   }
 
-  public PriorityQueue createNewPriorityQueue(FragmentContext context, 
List orderings,
- VectorAccessible 
batch, MappingSet mainMapping, MappingSet leftMapping, MappingSet rightMapping)
-  throws ClassTransformationException, IOException, 
SchemaChangeException{
-CodeGenerator cg = 
CodeGenerator.get(PriorityQueue.TEMPLATE_DEFINITION, 
context.getFunctionRegistry(), context.getOptions());
+  private PriorityQueue createNewPriorityQueue(VectorAccessible batch, int 
limit)
+throws SchemaChangeException, ClassTransformationException, 
IOException {
+return createNewPriorityQueue(
+  mainMapping, leftMapping, rightMapping, context.getOptionSet(), 
context.getFunctionRegistry(), context.getDrillbitContext().getCompiler(),
+  config.getOrderings(), batch, unionTypeEnabled, codegenDump, limit, 
oContext.getAllocator(), schema.getSelectionVectorMode());
+  }
+
+  public static MappingSet createMainMappingSet() {
+return new MappingSet((String) null, null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+  }
+
+  public static MappingSet createLeftMappingSet() {
+return new MappingSet("leftIndex", null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+  }
+
+  public static MappingSet createRightMappingSet() {
+return new MappingSet("rightIndex", null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+  }
+
+  public static PriorityQueue createNewPriorityQueue(
+MappingSet mainMapping, MappingSet leftMapping, MappingSet 
rightMapping,
--- End diff --

Agreed. Pritesh has asked me to look at Codegen improvements in the long 
term, so I will work on this down the line.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145574915
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java
 ---
@@ -335,20 +336,42 @@ private void purge() throws SchemaChangeException {
 logger.debug("Took {} us to purge", 
watch.elapsed(TimeUnit.MICROSECONDS));
   }
 
-  public PriorityQueue createNewPriorityQueue(FragmentContext context, 
List orderings,
- VectorAccessible 
batch, MappingSet mainMapping, MappingSet leftMapping, MappingSet rightMapping)
-  throws ClassTransformationException, IOException, 
SchemaChangeException{
-CodeGenerator cg = 
CodeGenerator.get(PriorityQueue.TEMPLATE_DEFINITION, 
context.getFunctionRegistry(), context.getOptions());
+  private PriorityQueue createNewPriorityQueue(VectorAccessible batch, int 
limit)
+throws SchemaChangeException, ClassTransformationException, 
IOException {
+return createNewPriorityQueue(
+  mainMapping, leftMapping, rightMapping, context.getOptionSet(), 
context.getFunctionRegistry(), context.getDrillbitContext().getCompiler(),
+  config.getOrderings(), batch, unionTypeEnabled, codegenDump, limit, 
oContext.getAllocator(), schema.getSelectionVectorMode());
+  }
+
+  public static MappingSet createMainMappingSet() {
+return new MappingSet((String) null, null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+  }
+
+  public static MappingSet createLeftMappingSet() {
+return new MappingSet("leftIndex", null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+  }
+
+  public static MappingSet createRightMappingSet() {
+return new MappingSet("rightIndex", null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+  }
+
+  public static PriorityQueue createNewPriorityQueue(
+MappingSet mainMapping, MappingSet leftMapping, MappingSet 
rightMapping,
--- End diff --

We cannot declare the MappingSets as constants because they contain some 
internal state (the mappingIndex field). If they were constants the state could 
become corrupt in the case where multiple queries run and create PriorityQueues 
on the same node simultaneously, so we have to create new MappingSets each time 
we create a new priority queue.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145574026
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/PriorityQueue.java
 ---
@@ -20,22 +20,58 @@
 import org.apache.drill.exec.compile.TemplateClassDefinition;
 import org.apache.drill.exec.exception.SchemaChangeException;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 
 public interface PriorityQueue {
-  public void add(FragmentContext context, RecordBatchData batch) throws 
SchemaChangeException;
-  public void init(int limit, FragmentContext context, BufferAllocator 
allocator, boolean hasSv2) throws SchemaChangeException;
-  public void generate() throws SchemaChangeException;
-  public VectorContainer getHyperBatch();
-  public SelectionVector4 getHeapSv4();
-  public SelectionVector4 getFinalSv4();
-  public boolean validate();
-  public void resetQueue(VectorContainer container, SelectionVector4 
vector4) throws SchemaChangeException;
-  public void cleanup();
-
-  public static TemplateClassDefinition TEMPLATE_DEFINITION 
= new TemplateClassDefinition(PriorityQueue.class, 
PriorityQueueTemplate.class);
+  /**
+   * The elements in the given batch are added to the priority queue. Note 
that the priority queue
+   * only retains the top elements that fit within the size specified by 
the {@link #init(int, BufferAllocator, boolean)}
+   * method.
+   * @param batch The batch containing elements we want to add.
+   * @throws SchemaChangeException
+   */
+  void add(RecordBatchData batch) throws SchemaChangeException;
 
+  /**
+   * Initializes the priority queue. This method must be called before any 
other methods on the priority
+   * queue are called.
+   * @param limit The size of the priority queue.
+   * @param allocator The {@link BufferAllocator} to use when creating the 
priority queue.
+   * @param hasSv2 True when incoming batches have v2 selection vectors. 
False otherwise.
+   * @throws SchemaChangeException
+   */
+  void init(int limit, BufferAllocator allocator, boolean hasSv2) throws 
SchemaChangeException;
+
+  /**
+   * This method must be called before fetching the final heap hyper batch 
and final Sv4 vector.
+   * @throws SchemaChangeException
+   */
+  void generate() throws SchemaChangeException;
+
+  /**
+   * Retrieves the final priority queue HyperBatch containing the results. 
Note: this should be called
+   * after {@link #generate()}.
+   * @return The final priority queue HyperBatch containing the results.
+   */
+  VectorContainer getHyperBatch();
+
+  SelectionVector4 getSv4();
+
+  /**
+   * Retrieves the selection vector used to select the elements in the 
priority queue from the hyper batch
+   * provided by the {@link #getHyperBatch()} method. Note: this 
should be called after {@link #generate()}.
+   * @return The selection vector used to select the elements in the 
priority queue.
+   */
+  SelectionVector4 getFinalSv4();
--- End diff --

The code already works that way. There is a config option 
drill.exec.sort.purge.threshold which controls the maximum number of 
batches allowed in the hyper batch. Once the threshold is exceeded the top N 
values are consolidated into a single batch and the process is repeated. There 
is an issue in the case where the limit is large. Ex. 100,000,000 . In this 
case the operator will keep all the records in memory and die. There is a jira 
created to address this issue: 
[https://issues.apache.org/jira/browse/DRILL-5823]


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread ilooner-mapr

Github user ilooner-mapr commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145572262
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/PriorityQueue.java
 ---
@@ -20,22 +20,58 @@
 import org.apache.drill.exec.compile.TemplateClassDefinition;
 import org.apache.drill.exec.exception.SchemaChangeException;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 
 public interface PriorityQueue {
-  public void add(FragmentContext context, RecordBatchData batch) throws 
SchemaChangeException;
-  public void init(int limit, FragmentContext context, BufferAllocator 
allocator, boolean hasSv2) throws SchemaChangeException;
-  public void generate() throws SchemaChangeException;
-  public VectorContainer getHyperBatch();
-  public SelectionVector4 getHeapSv4();
-  public SelectionVector4 getFinalSv4();
-  public boolean validate();
-  public void resetQueue(VectorContainer container, SelectionVector4 
vector4) throws SchemaChangeException;
-  public void cleanup();
-
-  public static TemplateClassDefinition TEMPLATE_DEFINITION 
= new TemplateClassDefinition(PriorityQueue.class, 
PriorityQueueTemplate.class);
+  /**
--- End diff --

I think the priority queue is generic and not specific to TopN. I agree we 
should make an effort to consolidate the code generated data structures.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread ilooner-mapr

Github user ilooner-mapr commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145571158
  
--- Diff: common/src/test/java/org/apache/drill/test/DirTestWatcher.java ---
@@ -32,23 +32,50 @@
 public class DirTestWatcher extends TestWatcher {
   private String dirPath;
   private File dir;
+  private boolean deleteDirAtEnd = true;
--- End diff --

Done


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread julianhyde

Github user julianhyde commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145561518
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java ---
@@ -584,11 +492,14 @@ public static void defineWorkspace(Drillbit drillbit, 
String pluginName,
   public static final String EXPLAIN_PLAN_TEXT = "text";
   public static final String EXPLAIN_PLAN_JSON = "json";
 
-  public static FixtureBuilder builder() {
-FixtureBuilder builder = new FixtureBuilder()
+  public static FixtureBuilder builder(DirTestWatcher dirTestWatcher) {
--- End diff --

Jeez Paul, please start a review rather than making single review comments. 
I just got 39 emails from you, and so did everyone else on dev@drill.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread ilooner-mapr

Github user ilooner-mapr commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145561373
  
--- Diff: common/src/main/java/org/apache/drill/common/util/TestTools.java 
---
@@ -17,15 +17,28 @@
  */
 package org.apache.drill.common.util;
 
+import java.io.File;
+import java.io.IOException;
 import java.nio.file.Paths;
 
+import org.apache.commons.io.FileUtils;
 import org.junit.rules.TestName;
 import org.junit.rules.TestRule;
 import org.junit.rules.Timeout;
 
+import static org.apache.drill.common.util.TestTools.DataType.PROJECT;
+import static org.apache.drill.common.util.TestTools.DataType.RESOURCE;
+
 public class TestTools {
   // private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(TestTools.class);
 
+  public enum DataType {
--- End diff --

Done changed to FileSource


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145522320
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/ExampleTest.java ---
@@ -59,6 +59,9 @@
 @Ignore
 public class ExampleTest {
 
+  @Rule
+  public final DirTestWatcher dirTestWatcher = new DirTestWatcher();
--- End diff --

This is meant as an example, so perhaps include some Javadoc to explain 
what this is and how to use it. Maybe explain how to use the temp directories, 
and include a new example test below that illustrates the main points. People 
are busy, they don't often don't have time to read documentation, but they 
often will copy & paste a good example...


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145516665
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/BatchUtils.java ---
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.drill.exec.vector.ValueVector;
+import org.junit.Assert;
+
+import java.io.UnsupportedEncodingException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+
+public class BatchUtils {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchUtils.class);
+
+  public static Map 
containerToObjects(VectorContainer vectorContainer) {
--- End diff --

Do we really want to create yet another way to muck with data? Any reason 
we can't use the `RowSet` concepts here? I fear that if we have multiple ways 
to do the same thing, we won't achieve critical mass on any of them and we'll 
have a bunch of half-baked solutions. I'd rather have one fully-baked solution 
we use over and over. Allows new tests to get done easily because the required 
tools already exist. This, in turn, encourages testing.

What are we doing here that `RowSet` can't (yet) do?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145522737
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/TableFileBuilder.java ---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+import java.io.BufferedOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.util.List;
+
+public class TableFileBuilder
+{
+  private final List columnNames;
+  private final List columnFormatters;
+  private final List rows = Lists.newArrayList();
+  private final String rowString;
+
+  public TableFileBuilder(List columnNames, List 
columnFormatters) {
--- End diff --

See prior notes. We now have three ways to build a collection of rows...


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145505524
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestCorruptParquetDateCorrection.java
 ---
@@ -76,43 +78,50 @@
   //- detecting corrupt values must be deferred to actual data 
page reading
   //- one from 1.4, where there is a proper created-by, but the 
corruption is present
   private static final String MIXED_CORRUPTED_AND_CORRECT_DATES_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/mixed_drill_versions";
+  "/parquet/4203_corrupt_dates/mixed_drill_versions";
   // partitioned with 1.2.0, no certain metadata that these were written 
with Drill
   // the value will be checked to see that they look corrupt and they will 
be corrected
   // by default. Users can use the format plugin option 
autoCorrectCorruptDates to disable
   // this behavior if they have foreign parquet files with valid rare date 
values that are
   // in the similar range as Drill's corrupt values
+  private static final String PARTITIONED_1_2_FOLDER = 
"partitioned_with_corruption_4203_1_2";
   private static final String CORRUPTED_PARTITIONED_DATES_1_2_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203_1_2";
+  Paths.get("/parquet/4203_corrupt_dates", 
PARTITIONED_1_2_FOLDER).toString();
--- End diff --

Again, do we want an absolute-looking path when we actually have a relative 
path?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145515947
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/BaseDirTestWatcher.java ---
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.base.Charsets;
+import org.apache.commons.io.FileUtils;
+import org.apache.drill.common.util.TestTools;
+import org.apache.drill.exec.util.TestUtilities;
+import org.junit.runner.Description;
+
+import java.io.File;
+import java.io.IOException;
+
+public class BaseDirTestWatcher extends DirTestWatcher {
+  public enum DirType {
+ROOT,
+TEST_TMP;
+  }
+
+  private File tmpDir;
+  private File storeDir;
+  private File dfsTestTmpParentDir;
+  private File dfsTestTmpDir;
+  private File rootDir;
+
+  public BaseDirTestWatcher() {
+super();
+  }
+
+  @Override
+  protected void starting(Description description) {
+super.starting(description);
+
+rootDir = makeSubDir("root");
+tmpDir = new File(rootDir, "tmp");
+storeDir = new File(rootDir, "store");
+dfsTestTmpParentDir = new File(rootDir, "dfsTestTmp");
+
+tmpDir.mkdirs();
+storeDir.mkdirs();
+dfsTestTmpParentDir.mkdirs();
+  }
+
+  public File getTmpDir() {
+return tmpDir;
+  }
+
+  public File getStoreDir() {
+return storeDir;
+  }
+
+  public File getDfsTestTmpParentDir() {
+return dfsTestTmpParentDir;
+  }
+
+  public File getDfsTestTmpDir() {
+return dfsTestTmpDir;
+  }
+
+  public String getDfsTestTmpDirPath() {
+return dfsTestTmpDir.getAbsolutePath().replaceAll("/\\./", "/");
+  }
+
+  public File getRootDir() {
+return rootDir;
+  }
+
+  public String getRootDirPath() {
+return rootDir.getAbsolutePath().replaceAll("/\\./", "/");
+  }
+
+  public void newDfsTestTmpDir() {
+dfsTestTmpDir = 
TestUtilities.createTempDir(BaseTestQuery.dirTestWatcher.getDfsTestTmpParentDir());
+  }
+
+  private File getDir(DirType type) {
+switch (type) {
+  case ROOT:
+return rootDir;
+  case TEST_TMP:
+return dfsTestTmpDir;
+  default:
+throw new IllegalArgumentException(String.format("Unsupported type 
%s", type));
+}
+  }
+
+  public File makeRootSubDir(String relPath) {
--- End diff --

Here is where we might have a bit of a discussion. Should we represent 
directory paths as:

* Strings (as done here)
* Java NIO `Path` objects (which Java seems to want to be the new standard)
* Hadoop `Path` objects (since that is what the Drill file system uses)
* Good old fashioned Java `File` objects (which, while old, works well in 
common cases)

I would suggest we use `File` for references to file system paths:

* The Java `Path` has the same name as the Hadoop `Path`, causing endless 
confusion.
* Hadoop `Path` is overkill for simple uses in tests.
* `String` encourages people to write their own code to join paths by 
concatenating.

The use of `File` keeps things simple and standard. Concatenation is simple.

Other solutions are, of course, possible. We could use `String` and 
replicate the methods on `File` or either of the `Path` classes, say. The point 
is, let's pick a standard and use it everywhere.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145516134
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/BaseDirTestWatcher.java ---
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.base.Charsets;
+import org.apache.commons.io.FileUtils;
+import org.apache.drill.common.util.TestTools;
+import org.apache.drill.exec.util.TestUtilities;
+import org.junit.runner.Description;
+
+import java.io.File;
+import java.io.IOException;
+
+public class BaseDirTestWatcher extends DirTestWatcher {
+  public enum DirType {
+ROOT,
+TEST_TMP;
+  }
+
+  private File tmpDir;
+  private File storeDir;
+  private File dfsTestTmpParentDir;
+  private File dfsTestTmpDir;
+  private File rootDir;
+
+  public BaseDirTestWatcher() {
+super();
+  }
+
+  @Override
+  protected void starting(Description description) {
+super.starting(description);
+
+rootDir = makeSubDir("root");
+tmpDir = new File(rootDir, "tmp");
+storeDir = new File(rootDir, "store");
+dfsTestTmpParentDir = new File(rootDir, "dfsTestTmp");
+
+tmpDir.mkdirs();
+storeDir.mkdirs();
+dfsTestTmpParentDir.mkdirs();
+  }
+
+  public File getTmpDir() {
+return tmpDir;
+  }
+
+  public File getStoreDir() {
+return storeDir;
+  }
+
+  public File getDfsTestTmpParentDir() {
+return dfsTestTmpParentDir;
+  }
+
+  public File getDfsTestTmpDir() {
+return dfsTestTmpDir;
+  }
+
+  public String getDfsTestTmpDirPath() {
+return dfsTestTmpDir.getAbsolutePath().replaceAll("/\\./", "/");
+  }
+
+  public File getRootDir() {
+return rootDir;
+  }
+
+  public String getRootDirPath() {
+return rootDir.getAbsolutePath().replaceAll("/\\./", "/");
+  }
+
+  public void newDfsTestTmpDir() {
+dfsTestTmpDir = 
TestUtilities.createTempDir(BaseTestQuery.dirTestWatcher.getDfsTestTmpParentDir());
+  }
+
+  private File getDir(DirType type) {
+switch (type) {
+  case ROOT:
+return rootDir;
+  case TEST_TMP:
+return dfsTestTmpDir;
+  default:
+throw new IllegalArgumentException(String.format("Unsupported type 
%s", type));
+}
+  }
+
+  public File makeRootSubDir(String relPath) {
+return makeSubDir(relPath, DirType.ROOT);
+  }
+
+  public File makeTestTmpSubDir(String relPath) {
+return makeSubDir(relPath, DirType.TEST_TMP);
+  }
+
+  private File makeSubDir(String relPath, DirType type) {
+File subDir = new File(getDir(type), relPath);
+subDir.mkdirs();
+return subDir;
+  }
+
+  /**
+   * This preserves the relative path of the directory in root
+   * @param relPath
+   * @return
+   */
+  public File copyResourceToRoot(String relPath) {
+return copyTo(relPath, relPath, TestTools.DataType.RESOURCE, 
DirType.ROOT);
+  }
+
+  public File copyFileToRoot(String relPath) {
+return copyTo(relPath, relPath, TestTools.DataType.PROJECT, 
DirType.ROOT);
+  }
+
+  public File copyResourceToRoot(String relPath, String destPath) {
+return copyTo(relPath, destPath, TestTools.DataType.RESOURCE, 
DirType.ROOT);
+  }
+
+  public File copyResourceToTestTmp(String relPath, String destPath) {
+return copyTo(relPath, destPath, TestTools.DataType.RESOURCE, 
DirType.TEST_TMP);
+  }
+
+  private File copyTo(String relPath, String destPath, TestTools.DataType 
dataType, DirType dirType) {
+File file = TestTools.getFile(relPath, dataType);
+
+if (file.isDirectory()) {
+  File subDir = makeSubDir(destPath, dirType);
+  TestTools.copyDirToDest(relPath, subDir,

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145523267
  
--- Diff: exec/java-exec/src/test/resources/topN/one_key_sort.json ---
@@ -12,11 +12,11 @@
 pop:"mock-scan",
 url: "http://apache.org;,
 entries:[
-{records: 1000, types: [
+{records: 10, types: [
--- End diff --

This change reduces the record count by two orders of magnitude. Do we know 
if this results in testing the same functionality as the original test?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145509549
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/TestSimpleExternalSort.java
 ---
@@ -147,7 +152,7 @@ public void sortOneKeyDescendingExternalSortLegacy() 
throws Throwable {
   }
 
   private void sortOneKeyDescendingExternalSort(boolean testLegacy) throws 
Throwable {
-FixtureBuilder builder = ClusterFixture.builder()
+FixtureBuilder builder = ClusterFixture.builder(dirTestWatcher)
--- End diff --

Nice, but I might move this to a method:

```
FixtureBuilder builder = ClusterFixture.builder()
.withTempDir(dirTestWatcher)
.configProperty(...) ...
```

Reason: if this has to go into the constructor, then every constructor use 
must change, and tests must create a temp directory even when not needed. But, 
if passed in using a builder method, then we can use it regardless of how we 
create the builder object, and the directory is optional.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145519174
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java ---
@@ -213,7 +204,7 @@ private void configureZk(FixtureBuilder builder) {
 }
   }
 
-  private void createConfig(FixtureBuilder builder) throws Exception {
+  private void createConfig() throws Exception {
--- End diff --

There are actually three ZK cases:

* Does not use ZK at all.
* Uses ZK, but it is local (mini ZK started within test JDK)
* Uses ZK, but ZK is remote. (Seldom used in tests, needed for an embedded 
Drillbit.)

Are these semantics preserved with the present changes?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145516858
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/BatchUtils.java ---
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.drill.exec.vector.ValueVector;
+import org.junit.Assert;
+
+import java.io.UnsupportedEncodingException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+
+public class BatchUtils {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchUtils.class);
+
+  public static Map 
containerToObjects(VectorContainer vectorContainer) {
+Map rows = Maps.newHashMap();
+int numCols = vectorContainer.getNumberOfColumns();
+int recordCount = vectorContainer.getRecordCount();
+
+for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
+  String columnName = 
vectorContainer.getSchema().getColumn(columnIndex).getName();
+  List data = Lists.newArrayList();
+
+  ValueVector.Accessor valueVectorAccessor = vectorContainer
+.getValueVector(columnIndex)
+.getValueVector()
+.getAccessor();
+
+  for (int recordIndex = 0; recordIndex < recordCount; recordIndex++) {
+data.add(valueVectorAccessor.getObject(recordIndex));
+  }
+
+  rows.put(columnName, data);
+}
+
+return rows;
+  }
+
+  public static Map 
hyperBatchAndSelectorToObjects(VectorContainer vectorContainer, 
SelectionVector4 selectionVector4) {
--- End diff --

See `HyperRowSet`


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145521268
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java ---
@@ -584,11 +492,14 @@ public static void defineWorkspace(Drillbit drillbit, 
String pluginName,
   public static final String EXPLAIN_PLAN_TEXT = "text";
   public static final String EXPLAIN_PLAN_JSON = "json";
 
-  public static FixtureBuilder builder() {
-FixtureBuilder builder = new FixtureBuilder()
+  public static FixtureBuilder builder(DirTestWatcher dirTestWatcher) {
--- End diff --

See note above about passing in the test watcher via a method instead of 
the constructor.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145297931
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TopN/TopNBatchTest.java
 ---
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.TopN;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.Random;
+
+import com.google.common.collect.Lists;
+import org.apache.drill.test.TestBuilder;
+import org.apache.drill.categories.OperatorTest;
+import org.apache.drill.common.config.DrillConfig;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.logical.data.Order;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.compile.ClassBuilder;
+import org.apache.drill.exec.compile.CodeCompiler;
+import org.apache.drill.exec.expr.fn.FunctionImplementationRegistry;
+import org.apache.drill.exec.memory.RootAllocator;
+import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
+import org.apache.drill.exec.pop.PopUnitTestBase;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.ExpandableHyperContainer;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.server.options.OptionSet;
+import org.apache.drill.test.ClientFixture;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.FixtureBuilder;
+import org.apache.drill.test.OperatorFixture;
+import org.apache.drill.test.BatchUtils;
+import org.apache.drill.test.DirTestWatcher;
+import org.apache.drill.test.rowSet.RowSetBuilder;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+@Category(OperatorTest.class)
+public class TopNBatchTest extends PopUnitTestBase {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(TopNBatchTest.class);
+
+  // Allows you to look at generated code after tests execute
+  @Rule
+  public DirTestWatcher dirTestWatcher = new DirTestWatcher(false);
+
+  /**
+   * Priority queue unit test.
+   * @throws Exception
+   */
+  @Test
+  public void priorityQueueOrderingTest() throws Exception {
+Properties properties = new Properties();
+properties.setProperty(ClassBuilder.CODE_DIR_OPTION, 
dirTestWatcher.getDirPath());
+
+DrillConfig drillConfig = DrillConfig.create(properties);
+OptionSet optionSet = new OperatorFixture.TestOptionSet();
+
+FieldReference expr = FieldReference.getWithQuotedRef("colA");
+Order.Ordering ordering = new 
Order.Ordering(Order.Ordering.ORDER_DESC, expr, Order.Ordering.NULLS_FIRST);
+List orderings = Lists.newArrayList(ordering);
+
+MaterializedField colA = MaterializedField.create("colA", 
Types.required(TypeProtos.MinorType.INT));
+MaterializedField colB = MaterializedField.create("colB", 
Types.required(TypeProtos.MinorType.INT));
+
+List cols = Lists.newArrayList(colA, colB);
+BatchSchema batchSchema = new 
BatchSchema(BatchSchema.SelectionVectorMode.NONE, cols);
+
+try (RootAllocator allocator = new RootAllocator(100_000_000)) {
+  VectorContainer expectedVectors = new RowSetBuilder(allocator, 
batchSchema)
+.add(110, 10)
+.add(109, 9)
+.add(108, 8)
+.add(107, 7)
+.add(106, 6)
+.add(105, 5)
+.add(104, 4)
+.add(103, 3)
+.add(102, 2)
+.add(101, 1)
+.build()
+.container();
+
+  Map expectedTable = 
BatchUtils.containerToObjects(expectedVectors);
+  expectedVectors.clear();
+
+  PriorityQueue queue;

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145520129
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java ---
@@ -226,43 +217,16 @@ private void createConfig(FixtureBuilder builder) 
throws Exception {
 
   serviceSet = null;
   usesZk = true;
-  isLocal = false;
 } else {
   // Embedded Drillbit.
 
   serviceSet = RemoteServiceSet.getLocalServiceSet();
-  isLocal = true;
 }
   }
 
-  private void startDrillbits(FixtureBuilder builder) throws Exception {
-//// Ensure that Drill uses the log directory determined here rather 
than
-//// it's hard-coded defaults. WIP: seems to be needed some times but
-//// not others.
-//
-//String logDir = null;
-//if (builder.tempDir != null) {
-//  logDir = builder.tempDir.getAbsolutePath();
-//}
-//if (logDir == null) {
-//  logDir = config.getString(ExecConstants.DRILL_TMP_DIR);
-//  if (logDir != null) {
-//logDir += "/drill/log";
-//  }
-//}
-//if (logDir == null) {
-//  logDir = "/tmp/drill";
-//}
-//new File(logDir).mkdirs();
-//System.setProperty("drill.log-dir", logDir);
-
-dfsTestTempDir = makeTempDir("dfs-test");
-
-// Clean up any files that may have been left from the
-// last run.
-
-preserveLocalFiles = builder.preserveLocalFiles;
-removeLocalFiles();
+  private void startDrillbits() throws Exception {
+dfsTestTempDir = new File(getRootDir(), "dfs-test");
+dfsTestTempDir.mkdirs();
--- End diff --

Is the "preserve local files" behavior maintained? Let me explain.

In automated tests, files are created, used, and discarded.

But, we also use this framework for ad-hoc tests when debugging. In this 
case, we create files, end the test, and examine the files by hand. Examples: 
logs, output files, query profiles, etc.

So, for this test fixture (but not for `BaseTestQuery`), we want two modes:

* Normal mode: as you have implemented.
* "Preserve local files" mode: files are written to `/tmp/drill` (or some 
other fixed, well-known location, can be within the `target` folder) and 
preserved past test exit.

A method on the `FixtureBuilder` identifies which mode is desired. We turn 
on the "preserve" mode for ad-hoc tests.

For more info on this framework, see [this 
description](https://github.com/paul-rogers/drill/wiki/Cluster-Fixture-Framework).


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145508960
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/TestExternalSort.java
 ---
@@ -138,34 +141,34 @@ public void testNewColumnsManaged() throws Exception {
 testNewColumns(false);
   }
 
-
   @Test
   public void testNewColumnsLegacy() throws Exception {
 testNewColumns(true);
   }
 
   private void testNewColumns(boolean testLegacy) throws Exception {
 final int record_count = 1;
-String dfs_temp = getDfsTestTmpSchemaLocation();
-System.out.println(dfs_temp);
-File table_dir = new File(dfs_temp, "newColumns");
-table_dir.mkdir();
-try (BufferedOutputStream os = new BufferedOutputStream(new 
FileOutputStream(new File(table_dir, "a.json" {
-  String format = "{ a : %d, b : %d }%n";
-  for (int i = 0; i <= record_count; i += 2) {
-os.write(String.format(format, i, i).getBytes());
-  }
+final String tableDirName = "newColumns";
+
+TableFileBuilder tableA = new TableFileBuilder(Lists.newArrayList("a", 
"b"), Lists.newArrayList("%d", "%d"));
--- End diff --

Clever -- but how do we handle types? The original code created JSON of the 
form:

```
{ a : 10, b : 20 }
```

Aside from the fact that the labels are not true JSON (not quoted), the 
mechanism does not work for strings, which need to be quoted. The mechanism 
here assumes numbers. But, of course, if we assume numbers, we don't need the 
second argument, the "%d".

If you look in the `ClusterFixture` code, you'll see a method 
`ClusterFixture.stringify()` that converts an Object to a SQL-compatible 
string. We could create a similar one for JSON.

But, if we take a step back, we are creating JSON. Perfectly fine JSON 
builder classes exist that can be used to build up type-aware JSON and render 
the result as a string. The one limitation is that these classes are for single 
documents; they can't handle the non-standard list-of-objects format that Drill 
users. Still, we can use the per-object classes to build up the list of objects.

Yet another solution is to use the `RowSet` classes which are type aware. 
Yes, they build value vectors, but that is not important here. What is 
important is that they use the full schema power of Drill: including data type, 
repeated, nullable, etc. Then, we just create a RowSet-to-JSON converter. In 
fact, I may have something like that in the "Jig" project I did earlier; I'll 
rummage around and see if I can find it. 


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145556819
  
--- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
@@ -908,6 +908,15 @@ public void generateTestData(int count) {
 }
 
   <#else> <#-- type.width <= 8 -->
+@Override
+public void add(Object value) {
+  int index = accessor.getValueCount();
+  int valueCount = index + 1;
+  setValueCount(valueCount);
+
+  set(index, (${type.javaType}) value);
--- End diff --

I don't think this will work as you hope it will -- for obscure reasons 
that we've all learned the hard way...

* The accessor does not keep an ongoing count of values. Rather, the value 
count comes from the writer index which is set only at the end of a write.
* This method sets the value count, but doing so is, in general, expensive.
* Vectors in a batch must be kept in sync. This adds a value to one vector, 
but does not coordinate across a set of vectors.
* Not all values can be set by casting. Works for some numeric values, but 
does not handle, say, `add(10)` when the item is a byte or a short because of 
the need for a two-step cast: `(byte)(int) value`.
* Does not handle conversions for date or decimal.

Yes, this is all a colossal pain in the neck. But, it is the reason that 
the `RowSet` classes were created.

We can talk in person about how to move this idea forward effectively.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145518566
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/BatchUtils.java ---
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.drill.exec.vector.ValueVector;
+import org.junit.Assert;
+
+import java.io.UnsupportedEncodingException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+
+public class BatchUtils {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchUtils.class);
+
+  public static Map 
containerToObjects(VectorContainer vectorContainer) {
+Map rows = Maps.newHashMap();
+int numCols = vectorContainer.getNumberOfColumns();
+int recordCount = vectorContainer.getRecordCount();
+
+for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
+  String columnName = 
vectorContainer.getSchema().getColumn(columnIndex).getName();
+  List data = Lists.newArrayList();
+
+  ValueVector.Accessor valueVectorAccessor = vectorContainer
+.getValueVector(columnIndex)
+.getValueVector()
+.getAccessor();
+
+  for (int recordIndex = 0; recordIndex < recordCount; recordIndex++) {
+data.add(valueVectorAccessor.getObject(recordIndex));
+  }
+
+  rows.put(columnName, data);
+}
+
+return rows;
+  }
+
+  public static Map 
hyperBatchAndSelectorToObjects(VectorContainer vectorContainer, 
SelectionVector4 selectionVector4) {
+Map rows = Maps.newHashMap();
+int numCols = vectorContainer.getNumberOfColumns();
+int numIndices = selectionVector4.getCount();
+
+for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
+  String columnName = 
vectorContainer.getSchema().getColumn(columnIndex).getName();
+  List data = Lists.newArrayList();
+
+  VectorWrapper vectorWrapper = 
vectorContainer.getValueVector(columnIndex);
+
+  for (int indexIndex = 0; indexIndex < numIndices; indexIndex++) {
+int sv4Index = selectionVector4.get(indexIndex);
+int batchIndex = SelectionVector4.getBatchIndex(sv4Index);
+int recordIndex = SelectionVector4.getRecordIndex(sv4Index);
+
+ValueVector valueVector = 
vectorWrapper.getValueVectors()[batchIndex];
+Object columnValue = 
valueVector.getAccessor().getObject(recordIndex);
+data.add(columnValue);
+  }
+
+  rows.put(columnName, data);
+}
+
+return rows;
+  }
+
+  public static String toString(Map table) {
+if (table.isEmpty()) {
+  return "[ empty table ]";
+}
+
+List columnNames = Lists.newArrayList(table.keySet());
+Collections.sort(columnNames);
+int numRecords = table.get(columnNames.get(0)).size();
+
+StringBuilder sb = new StringBuilder();
+
+{
+  sb.append("[ ");
+  String separator = "";
+
+  for (String columnName : columnNames) {
+sb.append(separator);
+separator = ", ";
+sb.append(columnName);
+  }
+
+  sb.append(" ]\n");
+}
+
+for (int recordIndex = 0; recordIndex < numRecords; recordIndex++) {
+  sb.append("{");
+  String separator = "";
+
+  for (String columnName : columnNames) {
+sb.append(separator);
+separator = ", ";

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145505747
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestCorruptParquetDateCorrection.java
 ---
@@ -76,43 +78,50 @@
   //- detecting corrupt values must be deferred to actual data 
page reading
   //- one from 1.4, where there is a proper created-by, but the 
corruption is present
   private static final String MIXED_CORRUPTED_AND_CORRECT_DATES_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/mixed_drill_versions";
+  "/parquet/4203_corrupt_dates/mixed_drill_versions";
   // partitioned with 1.2.0, no certain metadata that these were written 
with Drill
   // the value will be checked to see that they look corrupt and they will 
be corrected
   // by default. Users can use the format plugin option 
autoCorrectCorruptDates to disable
   // this behavior if they have foreign parquet files with valid rare date 
values that are
   // in the similar range as Drill's corrupt values
+  private static final String PARTITIONED_1_2_FOLDER = 
"partitioned_with_corruption_4203_1_2";
   private static final String CORRUPTED_PARTITIONED_DATES_1_2_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203_1_2";
+  Paths.get("/parquet/4203_corrupt_dates", 
PARTITIONED_1_2_FOLDER).toString();
   // partitioned with 1.4.0, no certain metadata regarding the date 
corruption status.
   // The same detection approach of the corrupt date values as for the 
files partitioned with 1.2.0
+  private static final String PARTITIONED_1_4_FOLDER = 
"partitioned_with_corruption_4203";
   private static final String CORRUPTED_PARTITIONED_DATES_1_4_0_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203";
+  Paths.get("/parquet/4203_corrupt_dates", 
PARTITIONED_1_4_FOLDER).toString();
   private static final String PARQUET_DATE_FILE_WITH_NULL_FILLED_COLS =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/null_date_cols_with_corruption_4203.parquet";
+  
"/parquet/4203_corrupt_dates/null_date_cols_with_corruption_4203.parquet";
+  private static final String PARTITIONED_1_9_FOLDER = 
"1_9_0_partitioned_no_corruption";
   private static final String CORRECT_PARTITIONED_DATES_1_9_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/1_9_0_partitioned_no_corruption";
+  Paths.get("/parquet/4203_corrupt_dates", 
PARTITIONED_1_9_FOLDER).toString();
   private static final String VARCHAR_PARTITIONED =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition";
+  "/parquet/4203_corrupt_dates/fewtypes_varcharpartition";
   private static final String DATE_PARTITIONED =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition";
+  "/parquet/4203_corrupt_dates/fewtypes_datepartition";
   private static final String EXCEPTION_WHILE_PARSING_CREATED_BY_META =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/hive1dot2_fewtypes_null";
+  "/parquet/4203_corrupt_dates/hive1dot2_fewtypes_null";
   private static final String CORRECT_DATES_1_6_0_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/correct_dates_and_old_drill_parquet_writer.parquet";
-  private static final String PARTITIONED_1_2_FOLDER = 
"partitioned_with_corruption_4203_1_2";
+  
"/parquet/4203_corrupt_dates/correct_dates_and_old_drill_parquet_writer.parquet";
   private static final String 
MIXED_CORRUPTED_AND_CORRECT_PARTITIONED_FOLDER = "mixed_partitioned";
 
-
   @BeforeClass
   public static void initFs() throws Exception {
+dirTestWatcher.copyResourceToRoot("/parquet/4203_corrupt_dates");
--- End diff --

Each time I see this I have to remember that this is an absolute path 
relative to some unstated root. That is, although it looks absolute, it is 
actually relative...


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145298677
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/agg/TestHashAggrSpill.java
 ---
@@ -43,146 +44,122 @@
 import static org.junit.Assert.assertTrue;
 
 /**
- *  Test spilling for the Hash Aggr operator (using the mock reader)
+ * Test spilling for the Hash Aggr operator (using the mock reader)
  */
 @Category({SlowTest.class, OperatorTest.class})
 public class TestHashAggrSpill {
 
-private void runAndDump(ClientFixture client, String sql, long 
expectedRows, long spillCycle, long fromSpilledPartitions, long 
toSpilledPartitions) throws Exception {
-String plan = client.queryBuilder().sql(sql).explainJson();
-
-QueryBuilder.QuerySummary summary = 
client.queryBuilder().sql(sql).run();
-if ( expectedRows > 0 ) {
-assertEquals(expectedRows, summary.recordCount());
-}
-// System.out.println(String.format(" \n Results: %,d 
records, %d batches, %,d ms\n ", summary.recordCount(), 
summary.batchCount(), summary.runTimeMs() ) );
-
-//System.out.println("Query ID: " + summary.queryIdString());
-ProfileParser profile = 
client.parseProfile(summary.queryIdString());
-//profile.print();
-List ops = 
profile.getOpsOfType(UserBitShared.CoreOperatorType.HASH_AGGREGATE_VALUE);
-
-assertTrue( ! ops.isEmpty() );
-// check for the first op only
-ProfileParser.OperatorProfile hag0 = ops.get(0);
-long opCycle = 
hag0.getMetric(HashAggTemplate.Metric.SPILL_CYCLE.ordinal());
-assertEquals(spillCycle, opCycle);
-long op_spilled_partitions = 
hag0.getMetric(HashAggTemplate.Metric.SPILLED_PARTITIONS.ordinal());
-assertTrue( op_spilled_partitions >= fromSpilledPartitions && 
op_spilled_partitions <= toSpilledPartitions );
-/* assertEquals(3, ops.size());
-for ( int i = 0; i < ops.size(); i++ ) {
-ProfileParser.OperatorProfile hag = ops.get(i);
-long cycle = 
hag.getMetric(HashAggTemplate.Metric.SPILL_CYCLE.ordinal());
-long num_partitions = 
hag.getMetric(HashAggTemplate.Metric.NUM_PARTITIONS.ordinal());
-long spilled_partitions = 
hag.getMetric(HashAggTemplate.Metric.SPILLED_PARTITIONS.ordinal());
-long mb_spilled = 
hag.getMetric(HashAggTemplate.Metric.SPILL_MB.ordinal());
-System.out.println(String.format("(%d) Spill cycle: %d, num 
partitions: %d, spilled partitions: %d, MB spilled: %d", i,cycle, 
num_partitions, spilled_partitions,
-mb_spilled));
-} */
-}
+  @Rule
+  public final DirTestWatcher dirTestWatcher = new DirTestWatcher();
 
-/**
- *  A template for Hash Aggr spilling tests
- *
- * @throws Exception
- */
-private void testSpill(long maxMem, long numPartitions, long 
minBatches, int maxParallel, boolean fallback ,boolean predict,
-   String sql, long expectedRows, int cycle, int 
fromPart, int toPart) throws Exception {
-LogFixture.LogFixtureBuilder logBuilder = LogFixture.builder()
-  .toConsole()
-  //.logger("org.apache.drill.exec.physical.impl.aggregate", 
Level.INFO)
-  .logger("org.apache.drill", Level.WARN)
-  ;
-
-FixtureBuilder builder = ClusterFixture.builder()
-  .sessionOption(ExecConstants.HASHAGG_MAX_MEMORY_KEY,maxMem)
-  
.sessionOption(ExecConstants.HASHAGG_NUM_PARTITIONS_KEY,numPartitions)
-  
.sessionOption(ExecConstants.HASHAGG_MIN_BATCHES_PER_PARTITION_KEY,minBatches)
-  
.configProperty(ExecConstants.SYS_STORE_PROVIDER_LOCAL_ENABLE_WRITE, false)
-  .sessionOption(PlannerSettings.FORCE_2PHASE_AGGR_KEY,true)
-  // .sessionOption(PlannerSettings.EXCHANGE.getOptionName(), true)
-  .sessionOption(ExecConstants.HASHAGG_FALLBACK_ENABLED_KEY, 
fallback)
-  
.sessionOption(ExecConstants.HASHAGG_USE_MEMORY_PREDICTION_KEY,predict)
-
-  .maxParallelization(maxParallel)
-  .saveProfiles()
-  //.keepLocalFiles()
-  ;
-String sqlStr = sql != null ? sql :  // if null then use this 
default query
-  "SELECT empid_s17, dept_i, branch_i, AVG(salary_i) FROM 
`mock`.`employee_1200K` GROUP BY empid_s17, dept_i, branch_i";
-
-try (LogFixture logs = logBuilder.build();
- ClusterFixture cluster = builder.build();
- ClientFixture client = cluster.clientFixture()) {
-runAndDump(client, sqlStr, expectedRows, cycle, 
fromPart,toPart);

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145292655
  
--- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
@@ -247,7 +248,7 @@ drill.exec: {
 }
   },
   sort: {
-purge.threshold : 10,
+purge.threshold : 1000,
--- End diff --

Are you reverting your own change here?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144943191
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/CodeGenerator.java ---
@@ -90,12 +89,11 @@
   private String generatedCode;
   private String generifiedCode;
 
-  CodeGenerator(TemplateClassDefinition definition, 
FunctionImplementationRegistry funcRegistry, OptionSet optionManager) {
-this(ClassGenerator.getDefaultMapping(), definition, funcRegistry, 
optionManager);
+  CodeGenerator(TemplateClassDefinition definition, OptionSet 
optionManager) {
+this(ClassGenerator.getDefaultMapping(), definition, optionManager);
--- End diff --

Good analysis. It was a real pain to lug the function registry around. 
Presumably the registry is used elsewhere in code gen; perhaps in the 
expression materializer.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145292776
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestAggNullable.java ---
@@ -21,29 +21,25 @@
 
 import org.apache.drill.categories.OperatorTest;
 import org.apache.drill.common.util.TestTools;
+import org.apache.drill.test.BaseTestQuery;
 import org.junit.Test;
 import org.junit.experimental.categories.Category;
 
 @Category(OperatorTest.class)
-public class TestAggNullable extends BaseTestQuery{
+public class TestAggNullable extends BaseTestQuery {
   static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(TestAggNullable.class);
 
-  static final String WORKING_PATH = TestTools.getWorkingPath();
-  static final String TEST_RES_PATH = WORKING_PATH + "/src/test/resources";
-
   private static void enableAggr(boolean ha, boolean sa) throws Exception {
 
-test(String.format("alter session set `planner.enable_hashagg` = %s", 
ha ? "true":"false"));
-test(String.format("alter session set `planner.enable_streamagg` = 
%s", sa ? "true":"false"));
+test("alter session set `planner.enable_hashagg` = %s", ha);
+test("alter session set `planner.enable_streamagg` = %s", sa);
--- End diff --

I was about to suggest using the functions created for this purpose. But, 
then I realized those changes are in a PR that has not yet been reviewed...


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145294641
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/TestBugFixes.java 
---
@@ -33,8 +35,11 @@
 @Category(UnlikelyTest.class)
 public class TestBugFixes extends BaseTestQuery {
   private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(TestBugFixes.class);
-  private static final String WORKING_PATH = TestTools.getWorkingPath();
-  private static final String TEST_RES_PATH = WORKING_PATH + 
"/src/test/resources";
+
+  @BeforeClass
+  public static void setupTestFiles() {
+dirTestWatcher.copyResourceToRoot("/bugs/DRILL-4192");
--- End diff --

Small nit. Presumably `/bugs/DRILL-4192` is a relative path. Should we 
express it in the normal Unix fashion as `bugs/DRILL-4192`? That way, in 
time-honored fashion, a leading slash tells us the path is absolute.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144943882
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/PriorityQueue.java
 ---
@@ -20,22 +20,58 @@
 import org.apache.drill.exec.compile.TemplateClassDefinition;
 import org.apache.drill.exec.exception.SchemaChangeException;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 
 public interface PriorityQueue {
-  public void add(FragmentContext context, RecordBatchData batch) throws 
SchemaChangeException;
-  public void init(int limit, FragmentContext context, BufferAllocator 
allocator, boolean hasSv2) throws SchemaChangeException;
-  public void generate() throws SchemaChangeException;
-  public VectorContainer getHyperBatch();
-  public SelectionVector4 getHeapSv4();
-  public SelectionVector4 getFinalSv4();
-  public boolean validate();
-  public void resetQueue(VectorContainer container, SelectionVector4 
vector4) throws SchemaChangeException;
-  public void cleanup();
-
-  public static TemplateClassDefinition TEMPLATE_DEFINITION 
= new TemplateClassDefinition(PriorityQueue.class, 
PriorityQueueTemplate.class);
+  /**
+   * The elements in the given batch are added to the priority queue. Note 
that the priority queue
+   * only retains the top elements that fit within the size specified by 
the {@link #init(int, BufferAllocator, boolean)}
+   * method.
+   * @param batch The batch containing elements we want to add.
+   * @throws SchemaChangeException
+   */
+  void add(RecordBatchData batch) throws SchemaChangeException;
 
+  /**
+   * Initializes the priority queue. This method must be called before any 
other methods on the priority
+   * queue are called.
+   * @param limit The size of the priority queue.
+   * @param allocator The {@link BufferAllocator} to use when creating the 
priority queue.
+   * @param hasSv2 True when incoming batches have v2 selection vectors. 
False otherwise.
--- End diff --

Actutally, the name "sv2" refers to a 2 -byte Selection vector. Welcome to 
Drill's cryptic naming...


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144946834
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java
 ---
@@ -335,20 +336,42 @@ private void purge() throws SchemaChangeException {
 logger.debug("Took {} us to purge", 
watch.elapsed(TimeUnit.MICROSECONDS));
   }
 
-  public PriorityQueue createNewPriorityQueue(FragmentContext context, 
List orderings,
- VectorAccessible 
batch, MappingSet mainMapping, MappingSet leftMapping, MappingSet rightMapping)
-  throws ClassTransformationException, IOException, 
SchemaChangeException{
-CodeGenerator cg = 
CodeGenerator.get(PriorityQueue.TEMPLATE_DEFINITION, 
context.getFunctionRegistry(), context.getOptions());
+  private PriorityQueue createNewPriorityQueue(VectorAccessible batch, int 
limit)
+throws SchemaChangeException, ClassTransformationException, 
IOException {
+return createNewPriorityQueue(
+  mainMapping, leftMapping, rightMapping, context.getOptionSet(), 
context.getFunctionRegistry(), context.getDrillbitContext().getCompiler(),
+  config.getOrderings(), batch, unionTypeEnabled, codegenDump, limit, 
oContext.getAllocator(), schema.getSelectionVectorMode());
+  }
+
+  public static MappingSet createMainMappingSet() {
+return new MappingSet((String) null, null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+  }
+
+  public static MappingSet createLeftMappingSet() {
+return new MappingSet("leftIndex", null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+  }
+
+  public static MappingSet createRightMappingSet() {
+return new MappingSet("rightIndex", null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+  }
+
+  public static PriorityQueue createNewPriorityQueue(
+MappingSet mainMapping, MappingSet leftMapping, MappingSet 
rightMapping,
--- End diff --

Do we ever use more than one mapping set? If not, can we just refer to the 
constants inside this function?

I see that a simpler form exists. Just curious why we also need the complex 
form. For better testing?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145517411
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/BatchUtils.java ---
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.drill.exec.vector.ValueVector;
+import org.junit.Assert;
+
+import java.io.UnsupportedEncodingException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+
+public class BatchUtils {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchUtils.class);
+
+  public static Map 
containerToObjects(VectorContainer vectorContainer) {
+Map rows = Maps.newHashMap();
+int numCols = vectorContainer.getNumberOfColumns();
+int recordCount = vectorContainer.getRecordCount();
+
+for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
+  String columnName = 
vectorContainer.getSchema().getColumn(columnIndex).getName();
+  List data = Lists.newArrayList();
+
+  ValueVector.Accessor valueVectorAccessor = vectorContainer
+.getValueVector(columnIndex)
+.getValueVector()
+.getAccessor();
+
+  for (int recordIndex = 0; recordIndex < recordCount; recordIndex++) {
+data.add(valueVectorAccessor.getObject(recordIndex));
+  }
+
+  rows.put(columnName, data);
+}
+
+return rows;
+  }
+
+  public static Map 
hyperBatchAndSelectorToObjects(VectorContainer vectorContainer, 
SelectionVector4 selectionVector4) {
+Map rows = Maps.newHashMap();
+int numCols = vectorContainer.getNumberOfColumns();
+int numIndices = selectionVector4.getCount();
+
+for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
+  String columnName = 
vectorContainer.getSchema().getColumn(columnIndex).getName();
+  List data = Lists.newArrayList();
+
+  VectorWrapper vectorWrapper = 
vectorContainer.getValueVector(columnIndex);
+
+  for (int indexIndex = 0; indexIndex < numIndices; indexIndex++) {
+int sv4Index = selectionVector4.get(indexIndex);
+int batchIndex = SelectionVector4.getBatchIndex(sv4Index);
+int recordIndex = SelectionVector4.getRecordIndex(sv4Index);
+
+ValueVector valueVector = 
vectorWrapper.getValueVectors()[batchIndex];
+Object columnValue = 
valueVector.getAccessor().getObject(recordIndex);
+data.add(columnValue);
+  }
+
+  rows.put(columnName, data);
+}
+
+return rows;
+  }
+
+  public static String toString(Map table) {
+if (table.isEmpty()) {
+  return "[ empty table ]";
+}
+
+List columnNames = Lists.newArrayList(table.keySet());
+Collections.sort(columnNames);
+int numRecords = table.get(columnNames.get(0)).size();
+
+StringBuilder sb = new StringBuilder();
+
+{
+  sb.append("[ ");
+  String separator = "";
+
+  for (String columnName : columnNames) {
+sb.append(separator);
+separator = ", ";
+sb.append(columnName);
+  }
+
+  sb.append(" ]\n");
+}
+
+for (int recordIndex = 0; recordIndex < numRecords; recordIndex++) {
+  sb.append("{");
+  String separator = "";
+
+  for (String columnName : columnNames) {
+sb.append(separator);
+separator = ", ";

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145292056
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java
 ---
@@ -335,20 +336,42 @@ private void purge() throws SchemaChangeException {
 logger.debug("Took {} us to purge", 
watch.elapsed(TimeUnit.MICROSECONDS));
   }
 
-  public PriorityQueue createNewPriorityQueue(FragmentContext context, 
List orderings,
- VectorAccessible 
batch, MappingSet mainMapping, MappingSet leftMapping, MappingSet rightMapping)
-  throws ClassTransformationException, IOException, 
SchemaChangeException{
-CodeGenerator cg = 
CodeGenerator.get(PriorityQueue.TEMPLATE_DEFINITION, 
context.getFunctionRegistry(), context.getOptions());
+  private PriorityQueue createNewPriorityQueue(VectorAccessible batch, int 
limit)
+throws SchemaChangeException, ClassTransformationException, 
IOException {
+return createNewPriorityQueue(
+  mainMapping, leftMapping, rightMapping, context.getOptionSet(), 
context.getFunctionRegistry(), context.getDrillbitContext().getCompiler(),
+  config.getOrderings(), batch, unionTypeEnabled, codegenDump, limit, 
oContext.getAllocator(), schema.getSelectionVectorMode());
+  }
+
+  public static MappingSet createMainMappingSet() {
+return new MappingSet((String) null, null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+  }
+
+  public static MappingSet createLeftMappingSet() {
+return new MappingSet("leftIndex", null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+  }
+
+  public static MappingSet createRightMappingSet() {
+return new MappingSet("rightIndex", null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+  }
+
+  public static PriorityQueue createNewPriorityQueue(
+MappingSet mainMapping, MappingSet leftMapping, MappingSet 
rightMapping,
--- End diff --

One thought worth sharing here. Today, our code cache uses source code as 
the key to look for an existing compiled version of a class. But, this is 
inefficient: we have to generate the source code, transform it, and store two 
copies to act as keys.

Better would be to have a "class definition" object that holds all 
parameters needed to generate the code. Store these in the cache. Likely, these 
will be much smaller than the generated code. Using these avoids the need to 
generate the code to see if we already have a copy of that code.

The class definition idea works as long as the definition is a closed set: 
everything needed to generate the code is in that one definition object.

Your function here is close: it passes in a closed set of items. The open 
bits are the expression and system/session options. For options, we can copy 
the few values we need into our definition. The expression might require a bit 
more thought to capture.

So, if we can evolve your idea further, we can get a potential large 
per-schema performance improvement by avoiding code generation when doing a 
repeat query.

Too much to fix here; but something to think about moving forward.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145294064
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestAltSortQueries.java ---
@@ -19,24 +19,33 @@
 
 import org.apache.drill.categories.OperatorTest;
 import org.apache.drill.categories.SqlTest;
+import org.apache.drill.test.BaseTestQuery;
+import org.junit.BeforeClass;
 import org.junit.Test;
 import org.junit.experimental.categories.Category;
 
 @Category({SqlTest.class, OperatorTest.class})
-public class TestAltSortQueries extends BaseTestQuery{
+public class TestAltSortQueries extends BaseTestQuery {
   static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(TestAltSortQueries.class);
 
+  @BeforeClass
+  public static void setupTestFiles() {
+dirTestWatcher.copyFileToRoot("sample-data/region.parquet");
+dirTestWatcher.copyFileToRoot("sample-data/regionsSF");
+dirTestWatcher.copyFileToRoot("sample-data/nation.parquet");
+  }
+
   @Test
   public void testOrderBy() throws Exception{
 test("select R_REGIONKEY " +
- "from dfs_test.`[WORKING_PATH]/../../sample-data/region.parquet` 
" +
+ "from dfs_test.`/sample-data/region.parquet` " +
--- End diff --

I like this improvement. Here we created files just for this one test. You 
are using a workspace to point to those files. Do so avoids the need to doctor 
up the strings.

Is `dfs_test` setup to point to the per-query root directory? Seems to be.

Since I have to ask these questions, would be good to explain in the 
`...drill.test` `package-info.java` file how to set up temporary files and the 
mapping of workspaces to these directories. That will allow others to readily 
use the mechanism. Otherwise, people will guess, and will go back to using what 
is familiar. See `ExampleTest.java` for an example of a "starter" file to help 
people get started.

In fact, maybe we would add a new example test to that file that shows how 
to create a test-specific data file and query that file. (That would sure help 
me...)


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144944018
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/PriorityQueue.java
 ---
@@ -20,22 +20,58 @@
 import org.apache.drill.exec.compile.TemplateClassDefinition;
 import org.apache.drill.exec.exception.SchemaChangeException;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 
 public interface PriorityQueue {
-  public void add(FragmentContext context, RecordBatchData batch) throws 
SchemaChangeException;
-  public void init(int limit, FragmentContext context, BufferAllocator 
allocator, boolean hasSv2) throws SchemaChangeException;
-  public void generate() throws SchemaChangeException;
-  public VectorContainer getHyperBatch();
-  public SelectionVector4 getHeapSv4();
-  public SelectionVector4 getFinalSv4();
-  public boolean validate();
-  public void resetQueue(VectorContainer container, SelectionVector4 
vector4) throws SchemaChangeException;
-  public void cleanup();
-
-  public static TemplateClassDefinition TEMPLATE_DEFINITION 
= new TemplateClassDefinition(PriorityQueue.class, 
PriorityQueueTemplate.class);
+  /**
+   * The elements in the given batch are added to the priority queue. Note 
that the priority queue
+   * only retains the top elements that fit within the size specified by 
the {@link #init(int, BufferAllocator, boolean)}
+   * method.
+   * @param batch The batch containing elements we want to add.
+   * @throws SchemaChangeException
+   */
+  void add(RecordBatchData batch) throws SchemaChangeException;
 
+  /**
+   * Initializes the priority queue. This method must be called before any 
other methods on the priority
+   * queue are called.
+   * @param limit The size of the priority queue.
+   * @param allocator The {@link BufferAllocator} to use when creating the 
priority queue.
+   * @param hasSv2 True when incoming batches have v2 selection vectors. 
False otherwise.
+   * @throws SchemaChangeException
+   */
+  void init(int limit, BufferAllocator allocator, boolean hasSv2) throws 
SchemaChangeException;
+
+  /**
+   * This method must be called before fetching the final heap hyper batch 
and final Sv4 vector.
--- End diff --

Ah, I see. Thanks.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145293197
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestAggNullable.java ---
@@ -61,10 +57,8 @@ public void testHashAggNullableColumns() throws 
Exception {
 
   @Test  // StreamingAgg on nullable columns
   public void testStreamAggNullableColumns() throws Exception {
-String query1 = String.format("select t2.b2 from 
dfs_test.`%s/jsoninput/nullable2.json` t2 " +
-" group by t2.b2", TEST_RES_PATH);
-String query2 = String.format("select t2.a2, t2.b2 from 
dfs_test.`%s/jsoninput/nullable2.json` t2 " +
-" group by t2.a2, t2.b2", TEST_RES_PATH);
+String query1 = "select t2.b2 from cp.`/jsoninput/nullable2.json` t2 
group by t2.b2";
+String query2 = "select t2.a2, t2.b2 from 
cp.`/jsoninput/nullable2.json` t2 group by t2.a2, t2.b2";
--- End diff --

Very nice improvement; this has been bugging me for months. Why doctor up 
the query string when we have perfectly fine workspace mechanism.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145292505
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/selection/SelectionVector4.java
 ---
@@ -116,6 +116,14 @@ public void clear() {
 }
   }
 
+  public static int getBatchIndex(int sv4Index) {
+return (sv4Index >> 16) & 0x;
+  }
+
+  public static int getRecordIndex(int sv4Index) {
+return (sv4Index) & 0x;
+  }
--- End diff --

I see you gave into the temptation to wrap these magic expressions in 
methods rather than repeating them over and over in generated code. The worry 
probably was that the overhead of a method call per value would be expensive. 
Did you do a bit of performance testing to demonstrate that Java inlines these 
methods so we get the same performance with this code as the original?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145297229
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/TestWindowFunctions.java ---
@@ -559,7 +563,7 @@ public void testRankWithGroupBy() throws Exception {
 
   @Test // DRILL-3404
   public void testWindowSumAggIsNotNull() throws Exception {
-String query = String.format("select count(*) cnt from (select sum ( 
c1 ) over ( partition by c2 order by c1 asc nulls first ) w_sum from 
dfs.`%s/window/table_with_nulls.parquet` ) sub_query where w_sum is not null", 
TEST_RES_PATH);
+String query = "select count(*) cnt from (select sum ( c1 ) over ( 
partition by c2 order by c1 asc nulls first ) w_sum from 
dfs.`/window/table_with_nulls.parquet` ) sub_query where w_sum is not null";
--- End diff --

Right about now I have to give you a huge THANKS! This is a huge amount of 
clean-up you've done. These are really good improvements. Really appreciate the 
tedious work to get this all done.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144945381
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/PriorityQueue.java
 ---
@@ -20,22 +20,58 @@
 import org.apache.drill.exec.compile.TemplateClassDefinition;
 import org.apache.drill.exec.exception.SchemaChangeException;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 
 public interface PriorityQueue {
-  public void add(FragmentContext context, RecordBatchData batch) throws 
SchemaChangeException;
-  public void init(int limit, FragmentContext context, BufferAllocator 
allocator, boolean hasSv2) throws SchemaChangeException;
-  public void generate() throws SchemaChangeException;
-  public VectorContainer getHyperBatch();
-  public SelectionVector4 getHeapSv4();
-  public SelectionVector4 getFinalSv4();
-  public boolean validate();
-  public void resetQueue(VectorContainer container, SelectionVector4 
vector4) throws SchemaChangeException;
-  public void cleanup();
-
-  public static TemplateClassDefinition TEMPLATE_DEFINITION 
= new TemplateClassDefinition(PriorityQueue.class, 
PriorityQueueTemplate.class);
+  /**
+   * The elements in the given batch are added to the priority queue. Note 
that the priority queue
+   * only retains the top elements that fit within the size specified by 
the {@link #init(int, BufferAllocator, boolean)}
+   * method.
+   * @param batch The batch containing elements we want to add.
+   * @throws SchemaChangeException
+   */
+  void add(RecordBatchData batch) throws SchemaChangeException;
 
+  /**
+   * Initializes the priority queue. This method must be called before any 
other methods on the priority
+   * queue are called.
+   * @param limit The size of the priority queue.
+   * @param allocator The {@link BufferAllocator} to use when creating the 
priority queue.
+   * @param hasSv2 True when incoming batches have v2 selection vectors. 
False otherwise.
+   * @throws SchemaChangeException
+   */
+  void init(int limit, BufferAllocator allocator, boolean hasSv2) throws 
SchemaChangeException;
+
+  /**
+   * This method must be called before fetching the final heap hyper batch 
and final Sv4 vector.
+   * @throws SchemaChangeException
+   */
+  void generate() throws SchemaChangeException;
+
+  /**
+   * Retrieves the final priority queue HyperBatch containing the results. 
Note: this should be called
+   * after {@link #generate()}.
+   * @return The final priority queue HyperBatch containing the results.
+   */
+  VectorContainer getHyperBatch();
+
+  SelectionVector4 getSv4();
+
+  /**
+   * Retrieves the selection vector used to select the elements in the 
priority queue from the hyper batch
+   * provided by the {@link #getHyperBatch()} method. Note: this 
should be called after {@link #generate()}.
+   * @return The selection vector used to select the elements in the 
priority queue.
+   */
+  SelectionVector4 getFinalSv4();
--- End diff --

Taking a step back... Suppose I do a top 100. Also, suppose my batches are 
big, say 100 MB each. Also, suppose I have fiendishly organized by data so that 
the top 100 values are the first values of 100 batches.

If the priority queue uses an SV4, then it means that the queue holds onto 
the entire batch that holds the set of top values. In my case, since the top 
100 values occur across 100 batches, I'm holding onto 100 of batches of 100 MB 
each for a total of 10 GB of memory.

Is this how it works?

Do we need to think about compaction at some point? Once we have buffered, 
say, five batches, we copy the values to a new, much smaller, batch and discard 
the inputs. We repeat this over and over to keep the number of buffered batches 
under control.

Once we control batch size (in bytes; we already control records), we'll 
have the means to set a memory budget for TopN, then use consolidation to stay 
within that budget.

Or, is this how the code already works?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145294835
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestCTASPartitionFilter.java ---
@@ -59,48 +58,48 @@ public void withDistribution() throws Exception {
 test("alter session set `planner.slice_target` = 1");
 test("alter session set `store.partition.hash_distribute` = true");
 test("use dfs_test.tmp");
-test(String.format("create table orders_distribution partition by 
(o_orderpriority) as select * from dfs_test.`%s/multilevel/parquet`", 
TEST_RES_PATH));
+test("create table orders_distribution partition by (o_orderpriority) 
as select * from dfs_test.`/multilevel/parquet`");
 String query = "select * from orders_distribution where 
o_orderpriority = '1-URGENT'";
-testExcludeFilter(query, 1, "Filter", 24);
+testExcludeFilter(query, 1, "Filter\\(", 24);
--- End diff --

Why the extra characters?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144936994
  
--- Diff: common/src/test/java/org/apache/drill/test/DirTestWatcher.java ---
@@ -32,23 +32,50 @@
 public class DirTestWatcher extends TestWatcher {
   private String dirPath;
   private File dir;
+  private boolean deleteDirAtEnd = true;
--- End diff --

Thanks for the Javadoc added previously. Perhaps add a section on how to 
use this class. If I need a test directory, do I create an instance of this 
class? Or, does JUnit do it for me? If done automagically, how do I get a copy 
of the directory?

If this is explained in JUnit, perhaps just reference that information. 
Even better would be a short example:

> To get a test directory:
```
// Do something here
File myDir = // do something
```


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144936263
  
--- Diff: common/src/test/java/org/apache/drill/test/SubDirTestWatcher.java 
---
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+import org.junit.rules.TestWatcher;
+import org.junit.runner.Description;
+
+import java.io.File;
+import java.util.List;
+
+public class SubDirTestWatcher extends TestWatcher {
--- End diff --

Maybe a Javadoc comment to explain what this does? From the name, it must 
have something to do with a temporary sub dir... But, how does this relate to 
the DirTestWatcher?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-18 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144935089
  
--- Diff: common/src/main/java/org/apache/drill/common/util/TestTools.java 
---
@@ -17,15 +17,28 @@
  */
 package org.apache.drill.common.util;
 
+import java.io.File;
+import java.io.IOException;
 import java.nio.file.Paths;
 
+import org.apache.commons.io.FileUtils;
 import org.junit.rules.TestName;
 import org.junit.rules.TestRule;
 import org.junit.rules.Timeout;
 
+import static org.apache.drill.common.util.TestTools.DataType.PROJECT;
+import static org.apache.drill.common.util.TestTools.DataType.RESOURCE;
+
 public class TestTools {
   // private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(TestTools.class);
 
+  public enum DataType {
--- End diff --

`ResourceType`? `FileType`? `FileSource`? `ResourceScope`?

This is a symbolic reference to a resource root, not really a kind of a 
data item...


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-12 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144205146
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java
 ---
@@ -335,20 +333,32 @@ private void purge() throws SchemaChangeException {
 logger.debug("Took {} us to purge", 
watch.elapsed(TimeUnit.MICROSECONDS));
   }
 
-  public PriorityQueue createNewPriorityQueue(FragmentContext context, 
List orderings,
- VectorAccessible 
batch, MappingSet mainMapping, MappingSet leftMapping, MappingSet rightMapping)
-  throws ClassTransformationException, IOException, 
SchemaChangeException{
-CodeGenerator cg = 
CodeGenerator.get(PriorityQueue.TEMPLATE_DEFINITION, 
context.getFunctionRegistry(), context.getOptions());
+  private PriorityQueue createNewPriorityQueue(VectorAccessible batch, int 
limit)
+throws SchemaChangeException, ClassTransformationException, 
IOException {
+return createNewPriorityQueue(context.getOptionSet(), 
context.getFunctionRegistry(), context.getDrillbitContext().getCompiler(),
+  config.getOrderings(), batch, unionTypeEnabled, codegenDump, limit, 
oContext.getAllocator(), schema.getSelectionVectorMode());
+  }
+
+  public static PriorityQueue createNewPriorityQueue(
+OptionSet optionSet, FunctionLookupContext functionLookupContext, 
CodeCompiler codeCompiler,
+List orderings, VectorAccessible batch, boolean 
unionTypeEnabled, boolean codegenDump,
+int limit, BufferAllocator allocator, SelectionVectorMode mode)
+  throws ClassTransformationException, IOException, 
SchemaChangeException {
+final MappingSet mainMapping = new MappingSet((String) null, null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+final MappingSet leftMapping = new MappingSet("leftIndex", null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+final MappingSet rightMapping = new MappingSet("rightIndex", null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
--- End diff --

Reverted it back to the way it was


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-11 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144195098
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java
 ---
@@ -95,7 +91,9 @@ public TopNBatch(TopN popConfig, FragmentContext context, 
RecordBatch incoming)
 super(popConfig, context);
 this.incoming = incoming;
 this.config = popConfig;
-batchPurgeThreshold = 
context.getConfig().getInt(ExecConstants.BATCH_PURGE_THRESHOLD);
+DrillConfig drillConfig = context.getConfig();
+batchPurgeThreshold = 
drillConfig.getInt(ExecConstants.BATCH_PURGE_THRESHOLD);
+codegenDump = 
drillConfig.getBoolean(CodeCompiler.ENABLE_SAVE_CODE_FOR_DEBUG);
--- End diff --

I will rename the option to drill.debug.codegen.TopN


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-11 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144194940
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/compile/CodeCompiler.java ---
@@ -110,6 +109,11 @@ public CodeGenCompiler(final DrillConfig config, final 
OptionSet optionManager)
   public static final String DISABLE_CACHE_CONFIG = COMPILE_BASE + 
".disable_cache";
 
   /**
+   * Enables saving generated code for debugging
+   */
+  public static final String ENABLE_SAVE_CODE_FOR_DEBUG = COMPILE_BASE + 
".codegen.dump";
--- End diff --

This is the name of the option which holds the true/false flag to enable 
saving code for debugging. The destination directory is governed by the 
ClassBuilder.CODE_DIR_OPTION property.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-11 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144194617
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/PriorityQueue.java
 ---
@@ -20,22 +20,58 @@
 import org.apache.drill.exec.compile.TemplateClassDefinition;
 import org.apache.drill.exec.exception.SchemaChangeException;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 
 public interface PriorityQueue {
-  public void add(FragmentContext context, RecordBatchData batch) throws 
SchemaChangeException;
-  public void init(int limit, FragmentContext context, BufferAllocator 
allocator, boolean hasSv2) throws SchemaChangeException;
-  public void generate() throws SchemaChangeException;
-  public VectorContainer getHyperBatch();
-  public SelectionVector4 getHeapSv4();
-  public SelectionVector4 getFinalSv4();
-  public boolean validate();
-  public void resetQueue(VectorContainer container, SelectionVector4 
vector4) throws SchemaChangeException;
-  public void cleanup();
-
-  public static TemplateClassDefinition TEMPLATE_DEFINITION 
= new TemplateClassDefinition(PriorityQueue.class, 
PriorityQueueTemplate.class);
+  /**
+   * The elements in the given batch are added to the priority queue. Note 
that the priority queue
+   * only retains the top elements that fit within the size specified by 
the {@link #init(int, BufferAllocator, boolean)}
+   * method.
+   * @param batch The batch containing elements we want to add.
+   * @throws SchemaChangeException
+   */
+  void add(RecordBatchData batch) throws SchemaChangeException;
 
+  /**
+   * Initializes the priority queue. This method must be called before any 
other methods on the priority
+   * queue are called.
+   * @param limit The size of the priority queue.
+   * @param allocator The {@link BufferAllocator} to use when creating the 
priority queue.
+   * @param hasSv2 True when incoming batches have v2 selection vectors. 
False otherwise.
+   * @throws SchemaChangeException
+   */
+  void init(int limit, BufferAllocator allocator, boolean hasSv2) throws 
SchemaChangeException;
+
+  /**
+   * This method must be called before fetching the final heap hyper batch 
and final Sv4 vector.
--- End diff --

Old habit, I'm used to PriorityQueues being called Heaps. I agree it's 
confusing in this context I will fix it.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-11 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144194241
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/CodeGenerator.java ---
@@ -90,12 +89,11 @@
   private String generatedCode;
   private String generifiedCode;
 
-  CodeGenerator(TemplateClassDefinition definition, 
FunctionImplementationRegistry funcRegistry, OptionSet optionManager) {
-this(ClassGenerator.getDefaultMapping(), definition, funcRegistry, 
optionManager);
+  CodeGenerator(TemplateClassDefinition definition, OptionSet 
optionManager) {
+this(ClassGenerator.getDefaultMapping(), definition, optionManager);
--- End diff --

I traced through the code and asked IntelliJ where the variable is used. 
This is what I found:

1. CodeGenerator accepts it as an argument to its constructor
1. It is then passed to the constructor of the EvaluationVisitor
1. A private field is set with the value in the constructor of 
EvaluationVisitor
1. This private field of EvaluationVisitor is then unused.

If your not convinced let's walk through the code when I'm in the office 
next week.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-11 Thread ilooner

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144193392
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/compile/CodeCompiler.java ---
@@ -110,6 +109,11 @@ public CodeGenCompiler(final DrillConfig config, final 
OptionSet optionManager)
   public static final String DISABLE_CACHE_CONFIG = COMPILE_BASE + 
".disable_cache";
 
   /**
+   * Enables saving generated code for debugging
+   */
+  public static final String ENABLE_SAVE_CODE_FOR_DEBUG = COMPILE_BASE + 
".codegen.dump";
--- End diff --

The directory that generated code is dumped into is determined by the 
ClassBuilder.CODE_DIR_OPTION property. The DirTestWatcher createw a directory 
of the following form:

```
/target//
```

So if the ClassBuilder.CODE_DIR_OPTION property is configured to use the 
directory generated by the DirTestWatcher, the code is dumped into the same 
well defined location every test run. Also the generated class file has names 
like:

```
Gen.java
```


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-11 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144147623
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java
 ---
@@ -335,20 +333,32 @@ private void purge() throws SchemaChangeException {
 logger.debug("Took {} us to purge", 
watch.elapsed(TimeUnit.MICROSECONDS));
   }
 
-  public PriorityQueue createNewPriorityQueue(FragmentContext context, 
List orderings,
- VectorAccessible 
batch, MappingSet mainMapping, MappingSet leftMapping, MappingSet rightMapping)
-  throws ClassTransformationException, IOException, 
SchemaChangeException{
-CodeGenerator cg = 
CodeGenerator.get(PriorityQueue.TEMPLATE_DEFINITION, 
context.getFunctionRegistry(), context.getOptions());
+  private PriorityQueue createNewPriorityQueue(VectorAccessible batch, int 
limit)
+throws SchemaChangeException, ClassTransformationException, 
IOException {
+return createNewPriorityQueue(context.getOptionSet(), 
context.getFunctionRegistry(), context.getDrillbitContext().getCompiler(),
+  config.getOrderings(), batch, unionTypeEnabled, codegenDump, limit, 
oContext.getAllocator(), schema.getSelectionVectorMode());
+  }
+
+  public static PriorityQueue createNewPriorityQueue(
+OptionSet optionSet, FunctionLookupContext functionLookupContext, 
CodeCompiler codeCompiler,
+List orderings, VectorAccessible batch, boolean 
unionTypeEnabled, boolean codegenDump,
+int limit, BufferAllocator allocator, SelectionVectorMode mode)
+  throws ClassTransformationException, IOException, 
SchemaChangeException {
+final MappingSet mainMapping = new MappingSet((String) null, null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+final MappingSet leftMapping = new MappingSet("leftIndex", null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+final MappingSet rightMapping = new MappingSet("rightIndex", null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
--- End diff --

Not sure these should be inside the method. Can we still grab hold of the 
variable for testing or debugging? Might we want to leave this in the class 
name space?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-11 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144132366
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/CodeGenerator.java ---
@@ -90,12 +89,11 @@
   private String generatedCode;
   private String generifiedCode;
 
-  CodeGenerator(TemplateClassDefinition definition, 
FunctionImplementationRegistry funcRegistry, OptionSet optionManager) {
-this(ClassGenerator.getDefaultMapping(), definition, funcRegistry, 
optionManager);
+  CodeGenerator(TemplateClassDefinition definition, OptionSet 
optionManager) {
+this(ClassGenerator.getDefaultMapping(), definition, optionManager);
--- End diff --

Does the code actually work without this item? Isn't the registry where we 
store UDF functions? And, doesn't the expression materializer make use of this 
info to copy UDFs inline into the generated code?

If we can remove this, that is a big convenience. But, I'm skeptical that 
it is, in fact, not needed.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-11 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144131729
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/compile/CodeCompiler.java ---
@@ -110,6 +109,11 @@ public CodeGenCompiler(final DrillConfig config, final 
OptionSet optionManager)
   public static final String DISABLE_CACHE_CONFIG = COMPILE_BASE + 
".disable_cache";
 
   /**
+   * Enables saving generated code for debugging
+   */
+  public static final String ENABLE_SAVE_CODE_FOR_DEBUG = COMPILE_BASE + 
".codegen.dump";
--- End diff --

When saving generated code, I have to point to the directory from Eclipse. 
This means two things:

1. It works best if the same directory is used from one run to the next. 
(That is, no randomly generated temp directory.)
2. The directory has to be visible to the Eclipse UI. (That is, no hidden 
directories starting with dots.)

Can we make the directory in a known location, and can we use a nice simple 
name like "codegen"?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-11 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144142548
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/PriorityQueue.java
 ---
@@ -20,22 +20,58 @@
 import org.apache.drill.exec.compile.TemplateClassDefinition;
 import org.apache.drill.exec.exception.SchemaChangeException;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 
 public interface PriorityQueue {
-  public void add(FragmentContext context, RecordBatchData batch) throws 
SchemaChangeException;
-  public void init(int limit, FragmentContext context, BufferAllocator 
allocator, boolean hasSv2) throws SchemaChangeException;
-  public void generate() throws SchemaChangeException;
-  public VectorContainer getHyperBatch();
-  public SelectionVector4 getHeapSv4();
-  public SelectionVector4 getFinalSv4();
-  public boolean validate();
-  public void resetQueue(VectorContainer container, SelectionVector4 
vector4) throws SchemaChangeException;
-  public void cleanup();
-
-  public static TemplateClassDefinition TEMPLATE_DEFINITION 
= new TemplateClassDefinition(PriorityQueue.class, 
PriorityQueueTemplate.class);
+  /**
+   * The elements in the given batch are added to the priority queue. Note 
that the priority queue
+   * only retains the top elements that fit within the size specified by 
the {@link #init(int, BufferAllocator, boolean)}
+   * method.
+   * @param batch The batch containing elements we want to add.
+   * @throws SchemaChangeException
+   */
+  void add(RecordBatchData batch) throws SchemaChangeException;
 
+  /**
+   * Initializes the priority queue. This method must be called before any 
other methods on the priority
+   * queue are called.
+   * @param limit The size of the priority queue.
+   * @param allocator The {@link BufferAllocator} to use when creating the 
priority queue.
+   * @param hasSv2 True when incoming batches have v2 selection vectors. 
False otherwise.
+   * @throws SchemaChangeException
+   */
+  void init(int limit, BufferAllocator allocator, boolean hasSv2) throws 
SchemaChangeException;
+
+  /**
+   * This method must be called before fetching the final heap hyper batch 
and final Sv4 vector.
--- End diff --

"heap hyper batch" seems an oxymoron. All batches, including hyper-batches, 
exist in direct memory. What is meant by "heap"?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-11 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144147991
  
--- Diff: 
common/src/test/java/org/apache/drill/testutils/SubDirTestWatcher.java ---
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.testutils;
--- End diff --

You were a brave soul and moved the `DrillTest` and `BaseTestQuery` classes 
into `org.apache.drill.test`. Do we want to move `testutils` under `test` as 
well?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-11 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144131909
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/compile/CodeCompiler.java ---
@@ -110,6 +109,11 @@ public CodeGenCompiler(final DrillConfig config, final 
OptionSet optionManager)
   public static final String DISABLE_CACHE_CONFIG = COMPILE_BASE + 
".disable_cache";
 
   /**
+   * Enables saving generated code for debugging
+   */
+  public static final String ENABLE_SAVE_CODE_FOR_DEBUG = COMPILE_BASE + 
".codegen.dump";
--- End diff --

Also, how does a static string enable code generation? Isn't this just the 
destination folder once saving code is enabled?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-11 Thread paul-rogers

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144147334
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java
 ---
@@ -95,7 +91,9 @@ public TopNBatch(TopN popConfig, FragmentContext context, 
RecordBatch incoming)
 super(popConfig, context);
 this.incoming = incoming;
 this.config = popConfig;
-batchPurgeThreshold = 
context.getConfig().getInt(ExecConstants.BATCH_PURGE_THRESHOLD);
+DrillConfig drillConfig = context.getConfig();
+batchPurgeThreshold = 
drillConfig.getInt(ExecConstants.BATCH_PURGE_THRESHOLD);
+codegenDump = 
drillConfig.getBoolean(CodeCompiler.ENABLE_SAVE_CODE_FOR_DEBUG);
--- End diff --

Not sure we want to enable this in this way. This would globally enable all 
code to be dumped. In practice, we want to enable module by module depending on 
what we are debugging.

I've been doing this the crude-but-effective way: just uncommenting the 
magic line of code. This usually works because I have to be in the debugger 
anyway to step through the code. But, if we wanted to be fancier, we could have 
config settings for each package. So, here, maybe you would enable 
`drill.debug.codegen.org.apache.drill.exec.physical.impl.TopN`. Or, we could be 
a bit lighter weight and just assign names: `drill.debug.codegen.TopN`.

Because a typical run will generate zillions of files, reducing the number 
of saved files to those of interest will help to keep things tidy.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-10-10 Thread ilooner

GitHub user ilooner opened a pull request:

https://github.com/apache/drill/pull/984

DRILL-5783 Made a unit test for generated Priority Queue. DRILL-5841 Fix 
tmp folder errors.

## DRILL-5783

- A unit test is created for the priority queue in the TopN operator
- The code generation classes passed around a completely unused function 
registry reference in some places so I removed it.
- The priority queue had unused parameters for some of its methods so I 
removed them.
- I added an add method to value vectors so that value vectors can easily 
be generated in unit tests.

## DRILL-5841

- There were many many ways in which temporary folders were created in unit 
tests. I have unified the way these folders are created with the 
DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. All the unit tests 
have been updated to use these. The test watchers create temp directories in 
./target//. So all the files 
generated and used in the context of a test can easily be found in the same 
consistent location.
- This change should fix the sporadic hashagg test failures, as well as 
failures caused by stray files in /tmp

Misc

- Added a TableBuilder so we don't have to becom human json parsers to 
generate test data.
- A Pstore test could fail sporadically
- General code cleanup. 
- There are many places where String.format is used unnecessarily. The test 
builder methods already use String.format for you when you pass them args. I 
cleaned some of these up.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ilooner/drill DRILL-5783

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/984.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #984


commit c9c554340a4a2ca8bab257a478d8201cd5f4c92a
Author: Timothy Farkas 
Date:   2017-09-18T23:21:44Z

 - Added property for codegen dump.
 - Removed unnecessary FunctionRegistry argument to the code generator.

commit 68f77a2e9a904bcd5a808fefd824f2f108134aec
Author: Timothy Farkas 
Date:   2017-09-18T23:31:38Z

 - Removed unused imports

commit eff908b4b18fe5ccb67480f6e0c36a61e27d12e2
Author: Timothy Farkas 
Date:   2017-09-18T23:34:33Z

 - Removed unnecessary modifiers from the priority queue interface.

commit 319fae58faaf78f807d8d6f9c0928f801ec7e352
Author: Timothy Farkas 
Date:   2017-09-19T00:28:58Z

 - Made creating a priority queue more modular by refactoring the 
createNewPriorityQueue method

commit e82e316a628e3c2f3a1f9ef7005299065eb4379d
Author: Timothy Farkas 
Date:   2017-09-19T21:08:19Z

 - Have some codegen dump test code done

commit e2057d6c9abc301889ab46a12848cee7506f45d5
Author: Timothy Farkas 
Date:   2017-09-21T22:07:52Z

 - Made fix

commit 6289cff376f37c8d4e2984b309d7f6eb13703f2f
Author: Timothy Farkas 
Date:   2017-09-28T23:24:38Z

 - Removed unused imports
 - Removed commented out code
 - Formatting fixes

commit cb80f885c86a00cda82e114b3aec212890ce6d90
Author: Timothy Farkas 
Date:   2017-09-29T23:03:49Z

 - Added the option to the DirTestWatcher to not delete the tmp dir at the 
end of the test
 - Added a TableFileBuilder to be used in testing
 - Changed the External sort test to use the TableFileBuilder

commit bb58b21ad16d6adbeb11c66d5a38d11c20a0b05d
Author: Timothy Farkas 
Date:   2017-09-29T23:06:15Z

 - Delete the directory at the beginning of a test in DirTestWatcher

commit f0d89abf2dc3422081bcd49183b8bf0ae3742996
Author: Timothy Farkas 
Date:   2017-09-30T01:51:13Z

 - Used existing test framework for the TopNBatchTest

commit 7f8d978c12d0e2b2efd3b3c64f5450ca76f0ef0e
Author: Timothy Farkas 
Date:   2017-10-05T06:27:55Z

 - Removed passing around contexts unnecessarily in the TopNBatch operator.
 - Fixed some javadocs

commit c539ebd20c421d8672440ad3bb863c446f39e775
Author: Timothy Farkas 
Date:   2017-10-06T03:22:04Z

- Added a record batch builder, and added an add method to value vector 
mutators.

commit 89a5f8576bd2f50ce540b633a28ace55b31f08e6
Author: Timothy Farkas 
Date:   2017-10-06T05:46:48Z

 - Added Javadoc to the PriorityQueue.
 - Removed an unnecessary method
 - Got the unit test to work

commit 540715d250601590367d7bc71614ab52e891c6ec
Author: Timothy Farkas 
Date:   2017-10-06T10:27:46Z

 - Added utilities for comparing vector batch results
 - Removed redundant unit tests

commit

93 matches

Mail list logo