[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2020-01-02 Thread Bohdan Kazydub (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006926#comment-17006926
 ] 

Bohdan Kazydub commented on DRILL-7359:
---

Merged into Apache master with commit id 
[7c813ff6440a118de15f552145b40eb07bb8e7a2|https://github.com/apache/drill/commit/7c813ff6440a118de15f552145b40eb07bb8e7a2].

> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006917#comment-17006917
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

asfgit commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17005258#comment-17005258
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on issue #1870: DRILL-7359: Add support for DICT type in 
RowSet Framework
URL: https://github.com/apache/drill/pull/1870#issuecomment-569647510
 
 
   @paul-rogers, I've updated the PR: added tests for `key` and `value` 
overflow for single and `DICT` array, introduced separate `VectorState`s for 
single `DICT` and `DICT` array, performed minor refacttoring. Please take a 
look.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.18.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17004016#comment-17004016
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r361611436
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ColumnBuilder.java
 ##
 @@ -179,6 +189,19 @@ private ColumnState buildPrimitive(ContainerState parent, 
ColumnReadProjection c
 vectorState);
   }
 
+  /**
+   * Check if this is a special case when vector, writer and column state 
should be
+   * created for a primitive field though the field itself is not projected. 
This is
+   * needed because {@code DICT}'s {@code keys} field is not projected but is 
needed
+   * to be initialized to ensure the dict vector is constructed properly.
 
 Review comment:
   Dict works the same as arrays regarding projection: if the user refers to 
any `DICT` entry, the whole `DICT` is projected and finding the needed entry is 
to be handled by some other operator. This entry is not filtered in the Scan.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.18.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003980#comment-17003980
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r361606350
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/resultSet/impl/TestResultSetLoaderDictArray.java
 ##
 @@ -0,0 +1,263 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet.impl;
+
+import static org.apache.drill.test.rowSet.RowSetUtilities.map;
+import static org.apache.drill.test.rowSet.RowSetUtilities.objArray;
+import static org.apache.drill.test.rowSet.RowSetUtilities.strArray;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertSame;
+import static org.junit.Assert.assertTrue;
+
+import java.util.Iterator;
+
+import org.apache.drill.categories.RowSetTests;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.DictWriter;
+import org.apache.drill.exec.vector.complex.DictVector;
+import org.apache.drill.exec.vector.complex.RepeatedDictVector;
+import org.apache.drill.test.SubOperatorTest;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.test.rowSet.RowSetUtilities;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+/**
+ * Test dict array support in the result set loader.
+ */
+@Category(RowSetTests.class)
+public class TestResultSetLoaderDictArray extends SubOperatorTest {
 
 Review comment:
   Thanks for the excellent and complete tests. The one addition I might 
suggest is to add an overflow test for both the array and non-array cases. 
Basically, you want to make the vector exceed 16 GB. I do this by using big 
strings (~ 512 bytes in size.) Use an odd size: 523 or some such, to that 
overflow occurs in an other than a even row count.
   
   Then, use big keys to trigger overflow of the key vector. Check the 
resulting output batches. Then, repeat for the value vector.
   
   Overflow is the most complex part of the result set loader. If you look at 
examples for VarChar fields and arrays, you might see what was done before. 
(Avoid the "Torture" test: that one is very complex because it tries to set up 
near-random, yet predictable, overflow patterns.)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.18.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003981#comment-17003981
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r361604989
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ColumnBuilder.java
 ##
 @@ -179,6 +189,19 @@ private ColumnState buildPrimitive(ContainerState parent, 
ColumnReadProjection c
 vectorState);
   }
 
+  /**
+   * Check if this is a special case when vector, writer and column state 
should be
+   * created for a primitive field though the field itself is not projected. 
This is
+   * needed because {@code DICT}'s {@code keys} field is not projected but is 
needed
+   * to be initialized to ensure the dict vector is constructed properly.
 
 Review comment:
   We have something similar with arrays:
   
   ```
   SELECT a[1], a[3] FROM ...
   ```
   
   In this case, only two (of possibly many) array positions are indexed.
   
   The rule adopted in the code is that if *any* index is projected, then the 
entire array is projected in the Scan operator. The Project operator will 
fiddle around with the array and pick out the two needed indexes.
   
   So, can we do the same here? If the user refers to any DICT entry, the Scan 
projects the entire DICT and some other operator worries about the details.
   
   This *seems* to make sense because I can't see of any reasonable way that 
within-array or within-DICT selection could be done in the Scan.
   
   (That said, I do hope, at some point, we can emulate Impala: push the filter 
into the scan. The ResultSetLoader is designed for this, but only at the row 
level. I suppose we could do it at the array element level with a bit of work.)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.18.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003982#comment-17003982
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r361605652
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/TupleState.java
 ##
 @@ -567,4 +574,187 @@ public void dump(HierarchicalFormatter format) {
   .endArray()
   .endObject();
   }
+
+  public static class DictColumnState extends BaseContainerColumnState {
+protected final DictState dictState;
+protected boolean isVersioned;
+protected final ColumnMetadata outputSchema;
+// Indicates if value is accessed by key
+protected final boolean valueProjected;
+
+public DictColumnState(DictState dictState,
+  AbstractObjectWriter writer,
+  VectorState vectorState,
+  boolean isVersioned,
+  boolean valueProjected) {
+  super(dictState.loader(), writer, vectorState);
+  this.dictState = dictState;
+  dictState.bindColumnState(this);
+  this.isVersioned = isVersioned;
+  if (isVersioned) {
+outputSchema = schema().cloneEmpty();
+  } else {
+outputSchema = schema();
+  }
+  dictState.bindOutputSchema(outputSchema.tupleSchema());
+  this.valueProjected = valueProjected;
+}
+
+@Override
+public void buildOutput(TupleState tupleState) {
+  outputIndex = tupleState.addOutputColumn(vector(), outputSchema());
+}
+
+public DictState dictState() {
+  return dictState;
+}
+
+@Override
+public ContainerState container() {
+  return dictState;
+}
+
+@Override
+public boolean isProjected() {
+  return dictState.hasProjections();
+}
+
+public boolean isVersioned() {
+  return isVersioned;
+}
+
+@Override
+public ColumnMetadata outputSchema() { return outputSchema; }
+  }
+
+  public static abstract class DictState extends MapState {
+
+public DictState(LoaderInternals events,
+ResultVectorCache vectorCache,
+ProjectionSet projectionSet) {
+  super(events, vectorCache, projectionSet);
+}
+
+public void bindColumnState(ColumnState colState) {
+  super.bindColumnState(colState);
+  writer().bindListener(this);
+}
+
+@Override
+public boolean isDict() {
+  return true;
+}
+
+@Override
+protected boolean isVersioned() {
+  return ((DictColumnState) parentColumn).isVersioned();
+}
+
+@Override
+public void dump(HierarchicalFormatter format) {
+  format.startObject(this)
+  .attribute("column", parentColumn.schema().name())
+  .attribute("cardinality", innerCardinality())
+  .endObject();
+}
+  }
+
+  public static class SingleDictState extends DictState {
+
+public SingleDictState(LoaderInternals events,
+  ResultVectorCache vectorCache,
+  ProjectionSet projectionSet) {
+  super(events, vectorCache, projectionSet);
+}
+
+@Override
+public AbstractTupleWriter writer() {
+  return (AbstractTupleWriter) parentColumn.writer().dict().tuple();
+}
+  }
+
+  public static class DictArrayState extends DictState {
+
+public DictArrayState(LoaderInternals events,
+ ResultVectorCache vectorCache,
+ ProjectionSet projectionSet) {
+  super(events, vectorCache, projectionSet);
+}
+
+@Override
+public int addOutputColumn(ValueVector vector, ColumnMetadata colSchema) {
+  final RepeatedDictVector repeatedDictVector = parentColumn.vector();
+  DictVector dictVector = (DictVector) repeatedDictVector.getDataVector();
+  if (isVersioned()) {
+dictVector.putChild(colSchema.name(), vector);
+  }
+  final int index = outputSchema.addColumn(colSchema);
+  assert dictVector.size() == outputSchema.size();
+  assert dictVector.getField().getChildren().size() == outputSchema.size();
+  return index;
+}
+
+@Override
+public AbstractTupleWriter writer() {
+  return (AbstractTupleWriter) 
parentColumn.writer().array().dict().tuple();
+}
+  }
+
+  public static class DictVectorState implements VectorState {
+
+private final ValueVector vector;
 
 Review comment:
   I see the challenge. Might be better to use two distinct classes to model 
the different implementations. The two "state" classes can have the same API to 
hide the differences from other code.
   
   This class is responsible for translating Result Set Loader events into "do 
the right thing" actions on the underlying vectors.
   
   For 

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003972#comment-17003972
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r361603427
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ColumnBuilder.java
 ##
 @@ -179,6 +189,19 @@ private ColumnState buildPrimitive(ContainerState parent, 
ColumnReadProjection c
 vectorState);
   }
 
+  /**
+   * Check if this is a special case when vector, writer and column state 
should be
+   * created for a primitive field though the field itself is not projected. 
This is
+   * needed because {@code DICT}'s {@code keys} field is not projected but is 
needed
+   * to be initialized to ensure the dict vector is constructed properly.
 
 Review comment:
   Oops, my bad - hasn't mentioned (just a kind of implicit hint in the 
`@return`) in the javadoc body (do not know what is the text before `@param`, 
`@return` etc. is called) that this affects creation of `DICT`'s `key` when 
`value` is accessed by key only, i.e. `col['someKey']` with `col` being a 
`DICT` column with `String` (`VARCHAR`) `key`; `value` in this case will be 
projected.
   If the `DICT` is not projected, this won't create any vector: neither 
`DictVector` nor vector for `key` (or `value`) - special "dummy" writer for 
`DICT` with "dummy" writers for its children, `key` and `value`, will be used, 
as is done for other vectors.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.18.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003953#comment-17003953
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r361603427
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ColumnBuilder.java
 ##
 @@ -179,6 +189,19 @@ private ColumnState buildPrimitive(ContainerState parent, 
ColumnReadProjection c
 vectorState);
   }
 
+  /**
+   * Check if this is a special case when vector, writer and column state 
should be
+   * created for a primitive field though the field itself is not projected. 
This is
+   * needed because {@code DICT}'s {@code keys} field is not projected but is 
needed
+   * to be initialized to ensure the dict vector is constructed properly.
 
 Review comment:
   Oops, my bad - hasn't mentioned (just a kind of implicit hint in the 
`@return`) in the javadoc body (do not know what is the text before `@param`, 
`@return` etc. is called) that this affects creation of `DICT`'s `key` when 
`value` is accessed by key only, i.e. `col['someKey']` with `col` being a 
`DICT` column with `String` (`VARCHAR`) `key`; `value` in this case will be 
projected.
   If the `DICT` is not projected, this won't create any vector: neither 
`DictVector` nor vector for `key` (or `value`) - special "dummy" vector for 
`DICT` will be used, as is done for other vectors.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.18.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003946#comment-17003946
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r361600935
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ContainerState.java
 ##
 @@ -172,4 +172,8 @@ public void close() {
   colState.close();
 }
   }
+
+  public boolean isDict() {
+return false;
+  }
 
 Review comment:
   It is better, this was removed, yes. I'll try to eliminate this.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.18.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003945#comment-17003945
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r361600501
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/TupleState.java
 ##
 @@ -567,4 +574,187 @@ public void dump(HierarchicalFormatter format) {
   .endArray()
   .endObject();
   }
+
+  public static class DictColumnState extends BaseContainerColumnState {
+protected final DictState dictState;
+protected boolean isVersioned;
+protected final ColumnMetadata outputSchema;
+// Indicates if value is accessed by key
+protected final boolean valueProjected;
+
+public DictColumnState(DictState dictState,
+  AbstractObjectWriter writer,
+  VectorState vectorState,
+  boolean isVersioned,
+  boolean valueProjected) {
+  super(dictState.loader(), writer, vectorState);
+  this.dictState = dictState;
+  dictState.bindColumnState(this);
+  this.isVersioned = isVersioned;
+  if (isVersioned) {
+outputSchema = schema().cloneEmpty();
+  } else {
+outputSchema = schema();
+  }
+  dictState.bindOutputSchema(outputSchema.tupleSchema());
+  this.valueProjected = valueProjected;
+}
+
+@Override
+public void buildOutput(TupleState tupleState) {
+  outputIndex = tupleState.addOutputColumn(vector(), outputSchema());
+}
+
+public DictState dictState() {
+  return dictState;
+}
+
+@Override
+public ContainerState container() {
+  return dictState;
+}
+
+@Override
+public boolean isProjected() {
+  return dictState.hasProjections();
+}
+
+public boolean isVersioned() {
+  return isVersioned;
+}
+
+@Override
+public ColumnMetadata outputSchema() { return outputSchema; }
+  }
+
+  public static abstract class DictState extends MapState {
+
+public DictState(LoaderInternals events,
+ResultVectorCache vectorCache,
+ProjectionSet projectionSet) {
+  super(events, vectorCache, projectionSet);
+}
+
+public void bindColumnState(ColumnState colState) {
+  super.bindColumnState(colState);
+  writer().bindListener(this);
+}
+
+@Override
+public boolean isDict() {
+  return true;
+}
+
+@Override
+protected boolean isVersioned() {
+  return ((DictColumnState) parentColumn).isVersioned();
+}
+
+@Override
+public void dump(HierarchicalFormatter format) {
+  format.startObject(this)
+  .attribute("column", parentColumn.schema().name())
+  .attribute("cardinality", innerCardinality())
+  .endObject();
+}
+  }
+
+  public static class SingleDictState extends DictState {
+
+public SingleDictState(LoaderInternals events,
+  ResultVectorCache vectorCache,
+  ProjectionSet projectionSet) {
+  super(events, vectorCache, projectionSet);
+}
+
+@Override
+public AbstractTupleWriter writer() {
+  return (AbstractTupleWriter) parentColumn.writer().dict().tuple();
+}
+  }
+
+  public static class DictArrayState extends DictState {
+
+public DictArrayState(LoaderInternals events,
+ ResultVectorCache vectorCache,
+ ProjectionSet projectionSet) {
+  super(events, vectorCache, projectionSet);
+}
+
+@Override
+public int addOutputColumn(ValueVector vector, ColumnMetadata colSchema) {
+  final RepeatedDictVector repeatedDictVector = parentColumn.vector();
+  DictVector dictVector = (DictVector) repeatedDictVector.getDataVector();
+  if (isVersioned()) {
+dictVector.putChild(colSchema.name(), vector);
+  }
+  final int index = outputSchema.addColumn(colSchema);
+  assert dictVector.size() == outputSchema.size();
+  assert dictVector.getField().getChildren().size() == outputSchema.size();
+  return index;
+}
+
+@Override
+public AbstractTupleWriter writer() {
+  return (AbstractTupleWriter) 
parentColumn.writer().array().dict().tuple();
+}
+  }
+
+  public static class DictVectorState implements VectorState {
+
+private final ValueVector vector;
 
 Review comment:
   This class is used for both `DictVector` and `RepeatedDictVector`. Unlike 
for `MAP`, they do not have a common parent class (other than 
`RepeatedValueVector` interface). Should this be changed to 
`RepeatedValueVector` or should I create another `VectorState` for 
`RepeatedDictVector`?
 

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003845#comment-17003845
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r361567680
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/AbstractKeyAccessor.java
 ##
 @@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.accessor;
+
+import org.joda.time.Instant;
+import org.joda.time.LocalDate;
+import org.joda.time.LocalTime;
+import org.joda.time.Period;
+
+import java.math.BigDecimal;
+
+public class AbstractKeyAccessor implements KeyAccessor {
+
+  protected final DictReader dictReader;
+  protected final ScalarReader keyReader;
+
+  protected AbstractKeyAccessor(DictReader dictReader, ScalarReader keyReader) 
{
+this.dictReader = dictReader;
+this.keyReader = keyReader;
+  }
+
+  @Override
+  public boolean find(boolean key) {
+throw new UnsupportedOperationException(this.getClass().getName() + " does 
not support boolean key.");
 
 Review comment:
   Nit:
   
   ```
 throw unsupported("boolean");
   ...
   
   private UnsupportedOperationException unsupported(String type) {
 return new UnsupportedOperationException(String.format(
 "%s does not support a key of type %s.",
 getClass().getName(), type));
   }
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.18.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003846#comment-17003846
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r361551286
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ColumnBuilder.java
 ##
 @@ -179,6 +189,19 @@ private ColumnState buildPrimitive(ContainerState parent, 
ColumnReadProjection c
 vectorState);
   }
 
+  /**
+   * Check if this is a special case when vector, writer and column state 
should be
+   * created for a primitive field though the field itself is not projected. 
This is
+   * needed because {@code DICT}'s {@code keys} field is not projected but is 
needed
+   * to be initialized to ensure the dict vector is constructed properly.
 
 Review comment:
   Interesting. I'm a bit confused, however. Help me understand what's 
happening here.
   
   If a column X is unprojected, this means that, in the batch coming out of 
the Scan, say, we don't want column X to appear at all. This why, for all other 
projected columns, we do not create an actual value vector.
   
   From the perspective of the reader using the Result Set Loader, the column 
does exist: I can write to it. So, we create a "dummy" writer: one which simply 
ignores the values given it.
   
   Let's imagine how that would work for a DICT. My input file has a DICT 
field. But, that DICT is not projected. Either way, I may find it easier to 
read the DICT whether it is projected or not.
   
   Though a DICT represents a map conceptually (and is implemented as a 
different kind of map internally), it is really just a correlated array at the 
data level. So, I just want to write key/value pairs to the reader. A previous 
comment sketched out how I might want to do that. (I've not yet gotten to the 
new code where that part is implemented.)
   
   Unless I'm missing something obvious, I think the above reasoning means 
that, if a DICT is not projected, we want a dummy DICT writer (using dummy 
writers for the keys and values). But, I can't see why we would create the 
actual DICT vector.
   
   Can you help me see what I'm missing?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.18.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003844#comment-17003844
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r361565139
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ContainerState.java
 ##
 @@ -172,4 +172,8 @@ public void close() {
   colState.close();
 }
   }
+
+  public boolean isDict() {
+return false;
+  }
 
 Review comment:
   Should clients of this class care what kind of container this is? Shouldn't 
all DICT-specific code be in the DICT related classes?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.18.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003847#comment-17003847
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r361566453
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/TupleState.java
 ##
 @@ -567,4 +574,187 @@ public void dump(HierarchicalFormatter format) {
   .endArray()
   .endObject();
   }
+
+  public static class DictColumnState extends BaseContainerColumnState {
+protected final DictState dictState;
+protected boolean isVersioned;
+protected final ColumnMetadata outputSchema;
+// Indicates if value is accessed by key
+protected final boolean valueProjected;
+
+public DictColumnState(DictState dictState,
+  AbstractObjectWriter writer,
+  VectorState vectorState,
+  boolean isVersioned,
+  boolean valueProjected) {
+  super(dictState.loader(), writer, vectorState);
+  this.dictState = dictState;
+  dictState.bindColumnState(this);
+  this.isVersioned = isVersioned;
+  if (isVersioned) {
+outputSchema = schema().cloneEmpty();
+  } else {
+outputSchema = schema();
+  }
+  dictState.bindOutputSchema(outputSchema.tupleSchema());
+  this.valueProjected = valueProjected;
+}
+
+@Override
+public void buildOutput(TupleState tupleState) {
+  outputIndex = tupleState.addOutputColumn(vector(), outputSchema());
+}
+
+public DictState dictState() {
+  return dictState;
+}
+
+@Override
+public ContainerState container() {
+  return dictState;
+}
+
+@Override
+public boolean isProjected() {
+  return dictState.hasProjections();
+}
+
+public boolean isVersioned() {
+  return isVersioned;
+}
+
+@Override
+public ColumnMetadata outputSchema() { return outputSchema; }
+  }
+
+  public static abstract class DictState extends MapState {
+
+public DictState(LoaderInternals events,
+ResultVectorCache vectorCache,
+ProjectionSet projectionSet) {
+  super(events, vectorCache, projectionSet);
+}
+
+public void bindColumnState(ColumnState colState) {
+  super.bindColumnState(colState);
+  writer().bindListener(this);
+}
+
+@Override
+public boolean isDict() {
+  return true;
+}
+
+@Override
+protected boolean isVersioned() {
+  return ((DictColumnState) parentColumn).isVersioned();
+}
+
+@Override
+public void dump(HierarchicalFormatter format) {
+  format.startObject(this)
+  .attribute("column", parentColumn.schema().name())
+  .attribute("cardinality", innerCardinality())
+  .endObject();
+}
+  }
+
+  public static class SingleDictState extends DictState {
+
+public SingleDictState(LoaderInternals events,
+  ResultVectorCache vectorCache,
+  ProjectionSet projectionSet) {
+  super(events, vectorCache, projectionSet);
+}
+
+@Override
+public AbstractTupleWriter writer() {
+  return (AbstractTupleWriter) parentColumn.writer().dict().tuple();
+}
+  }
+
+  public static class DictArrayState extends DictState {
+
+public DictArrayState(LoaderInternals events,
+ ResultVectorCache vectorCache,
+ ProjectionSet projectionSet) {
+  super(events, vectorCache, projectionSet);
+}
+
+@Override
+public int addOutputColumn(ValueVector vector, ColumnMetadata colSchema) {
+  final RepeatedDictVector repeatedDictVector = parentColumn.vector();
+  DictVector dictVector = (DictVector) repeatedDictVector.getDataVector();
+  if (isVersioned()) {
+dictVector.putChild(colSchema.name(), vector);
+  }
+  final int index = outputSchema.addColumn(colSchema);
+  assert dictVector.size() == outputSchema.size();
+  assert dictVector.getField().getChildren().size() == outputSchema.size();
+  return index;
+}
+
+@Override
+public AbstractTupleWriter writer() {
+  return (AbstractTupleWriter) 
parentColumn.writer().array().dict().tuple();
+}
+  }
+
+  public static class DictVectorState implements VectorState {
+
+private final ValueVector vector;
 
 Review comment:
   Part of the goal of this class is to provide the actual vector class. Should 
this be `DictVector`? Same with `vector()`?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific 

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003757#comment-17003757
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on issue #1870: DRILL-7359: Add support for DICT type in 
RowSet Framework
URL: https://github.com/apache/drill/pull/1870#issuecomment-569130340
 
 
   @paul-rogers, I've updated the PR, please have a look again.
   @arina-ielchiieva this is now fixed, thanks for pointing this out!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.18.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003320#comment-17003320
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

arina-ielchiieva commented on issue #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#issuecomment-568912112
 
 
   @KazydubB I see you have pushed new commit, is PR ready for review? Please 
tag Paul if yes.
   Also please check if `org.apache.drill.exec.record.metadata.DictBuilder` 
method `public DictBuilder resumeMap()` was fixed.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.18.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998007#comment-16998007
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on issue #1870: DRILL-7359: Add support for DICT type in 
RowSet Framework
URL: https://github.com/apache/drill/pull/1870#issuecomment-566448821
 
 
   @paul-rogers, I'm sorry for taking so long. I'm planning to finish this in a 
few days.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.18.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997606#comment-16997606
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on issue #1870: DRILL-7359: Add support for DICT type in 
RowSet Framework
URL: https://github.com/apache/drill/pull/1870#issuecomment-566219911
 
 
   Would be great if we could push this over the finish line. 
   
   There will be a conflict with DRILL-7486.
   
   I made some suggestions about finding DICT values and handling missing 
values. Let me know if I can help with that work. If we're not sure what we 
want to do in this area, then, as noted earlier, might be good to commit what 
we have; then we can refine it over time as we try the code.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.18.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997428#comment-16997428
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

arina-ielchiieva commented on issue #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#issuecomment-566130764
 
 
   Please fix bug in `org.apache.drill.exec.record.metadata.DictBuilder`
   ```
 public DictBuilder resumeMap() {
   build();
   return (DictBuilder) parent;
 }
   ```
   Resume map should return map builder.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.18.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989080#comment-16989080
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r354501095
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java
 ##
 @@ -268,4 +274,51 @@ public String getAsString() {
 }
 return requireReader(type).getAsString();
   }
+
+  private UnionReaderImpl getNullReader() {
+AbstractObjectReader[] nullVariants = new 
AbstractObjectReader[variants.length];
+for (int i = 0; i < variants.length; i++) {
+  nullVariants[i] = variants[i].createNullReader();
+}
+return new NullUnionReader(schema(), unionAccessor, nullVariants);
+  }
+
+  private static class NullUnionReader extends UnionReaderImpl {
+
+private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, 
AbstractObjectReader[] variants) {
 
 Review comment:
   @KazydubB, first, thanks for understanding my hastily-written example; I've 
edited it to be accurate.
   
   We are defining an API. APIs exist to support use cases, so it is very 
helpful to think about those. (My bad for not doing that earlier; I was focused 
more on the plumbing aspects early on. This is, in fact, why I still like 
design docs, even in the era of Agile, so we can visualize these issues early 
on.)
   
   In my view, there are two key use cases: tests and production.
   
   In tests, we simply want to verify that, say, a reader correctly created (or 
filtered, or altered) a `DICT`. If a key exists, we want to verify its value. 
If a key does not exist, we simply need to verify that fact. So, having a 
generic `boolean findKey(Object key)` is perfectly fine. The overhead of doing 
a type-based switch statement on every call is fine. (This is why tests often 
use the `addRow()` methods which, internally, have a big, slow `if` statement 
based on the type of the value.)
   
   The second use case is production. Here, we can assume our goal is maximum 
performance via generated code.  I would argue that we do not want any extra 
code in the access path. Instead, the code should be specific to each type. 
This is why the readers are designed to go directly from, say, `getString()` 
into the code that grabs the bytes from the underlying value vector.
   
   For production code, we'd want a type-aware "find" operation. There are a 
couple of ways to do that. One is by adding a "key accessor":
   
   ```
   class DictReader ...
  KeyAccessor keyAccessor();
 ...
   }
   
   interface KeyAccessor {
 boolean findInt(int key);
 boolean findString(String key);
 ...
   }
   ```
   
   The accessor code gen could then generate type-specific search code by 
creating a key accessor class for each supported key type. This would be done 
in the build-time code gen, using Freemarker, as part of the existing 
`columnAccessors.java` template and code gen.
   
   Now things get interesting. We need to generate code for each SQL operation 
that uses a `DICT`. Let's make up an example:
   
   ```
   SELECT ... FROM myTableWithDict WHERE d['foo'] = 20
   ```
   
   Here, the story splits into two paths. Today, code gen works directly 
against value vectors: you will need to generate the find code inline for each 
operator. Longer term, it is my hope that we can shift to using the column 
readers. (Doing so will simplify code gen, and puts us in a position to use 
multiple value vector format, such as Arrow, if we wanted.)
   
   So, let's imagine we are using your `DICT` readers in runtime code 
generation. We want to generate code for the above example. 
   
   In either case (old-school or reader-based), we don't know the type until 
the first row appears. At that point, we generate run-time code that, in our 
example, uses a `VARCHAR` key to lookup an `INT` value and send it to the 
equality function. Very crudely:
   
   ```
   DictReader dr;
   KeyAccessor ka;
   ScalarReader vr;
   String constKey;
   String constArg;
   
   void doSetup() {
 dr = rowReader.dict("d");
 ka = dr.keyAccessor();
 vr = dr.value().scalar();
 constKey = "foo";
 constArg = "bar";
   }
   
   void doEval() {
 boolean isEquals;
 boolean found = ka.findString(constKey);
 if (found) {
   isEquals = strEquals(vr.getString(), constArg);
 } else {
   isEquals = false;
 }
 // do something with isEquals...
   }
   ```
   
   The above is written in "code gen style", but greatly simplified.
   
   So, we've looked at two use cases: test and production. In neither did we 
need to work with a missing value. Instead, we observed the rules that a 
non-existent key has no value (rather than a non-existent key 

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989053#comment-16989053
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r353522274
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java
 ##
 @@ -268,4 +274,51 @@ public String getAsString() {
 }
 return requireReader(type).getAsString();
   }
+
+  private UnionReaderImpl getNullReader() {
+AbstractObjectReader[] nullVariants = new 
AbstractObjectReader[variants.length];
+for (int i = 0; i < variants.length; i++) {
+  nullVariants[i] = variants[i].createNullReader();
+}
+return new NullUnionReader(schema(), unionAccessor, nullVariants);
+  }
+
+  private static class NullUnionReader extends UnionReaderImpl {
+
+private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, 
AbstractObjectReader[] variants) {
 
 Review comment:
   @KazydubB, thanks for the explanation. It clarifies something I wondered 
about: the role of the null state readers. Let's talk a bit about the intended 
semantics for the readers.
   
   I should be able to write code that caches the readers:
   
   ```
   DictReader dr = rowReader.dict("myDict");
   ScalarReader kw = dr.key().scalar();
   MapReader vr = dr.value.tuple();
   ```
   
   That is, I should be able to use the same key reader (`kr`) and value reader 
(`vr`) for all members. Then, I should either be able to iterate over the 
values:
   
   ```
   while (dr.next()) {
 System.out.println(kr.getString());
 System.out.println(vr.getAsString());
   }
   ```
   
   If things work that way, then we don't need a special reader for the null 
state: the value reader (here, `vr`) would simply return `true` from `isNull()`:
   
   ```
   while (dr.next()) {
 System.out.println(kr.getString());
 if (vr.isNull()) {
   System.out.println("NULL");
 } else {
   System.out.println(vr.getAsString());
 }
   }
   ```
   
   (Ignoring, for the moment, that `getAsString()` already does a null check.)
   
   Note that, if we read past the end, both the key and value writers will 
point to nothing. I think that, for the array reader, accessing the value one 
past the end is an error. For `DICT`, we could simply declare that the value is 
NULL.
   
   Then, I was going to sketch out the same behavior if we search for a key and 
found...nothing. Seems that the `DictReader` has no way to search for a key. If 
it did, it would not be clear what type to use for the key. So, that is a bit 
of a mess.
   
   Let's assume we did have something to find a specific value, like we do for 
the arrays where we have `setPosn(int index)`. Maybe we have `findStringKey()` 
etc. For testing, we could just have `findKey(Object key)` and do a switch on 
type as we do for the other readers. So:
   
   ```
 dr.findStringKey("foo");
 if (vr.isNull()) {
   System.out.println("NULL");
 } else {
   System.out.println(vr.getAsString());
 }
   ```
   
   Or, even:
   
   ```
 if (dr.findStringKey("foo")) {
   System.out.println(vr.getAsString());
 }
   ```
   
   The only way to find the value is to do a linear search. If we do a linear 
search, and no value exists, the index will point one past the end of the dict. 
And, above, we suggested that the key and value readers could return `true` for 
`isNull()` in those cases.
   
   Given this, unless I'm missing something, I don't see that we need any kind 
of null value reader at all. In fact, I'd even say that we don't want such a 
reader if we'd have to do:
   
   ```
   ObjectReader kr = dr.findStringKey("foo");
   if (kr.isNull) {
 // kr is a special null reader
   } else {
 // kr is a Map reader
   }
   ```
   
   The above clutters the code and makes it impossible to cache the value 
reader, which will slow inner-most loops that want to use the reader 
abstractions.
   
   What do you think?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.18.0
>
>
> Add support for new DICT data type (see 

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988896#comment-16988896
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r354275004
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java
 ##
 @@ -268,4 +274,51 @@ public String getAsString() {
 }
 return requireReader(type).getAsString();
   }
+
+  private UnionReaderImpl getNullReader() {
+AbstractObjectReader[] nullVariants = new 
AbstractObjectReader[variants.length];
+for (int i = 0; i < variants.length; i++) {
+  nullVariants[i] = variants[i].createNullReader();
+}
+return new NullUnionReader(schema(), unionAccessor, nullVariants);
+  }
+
+  private static class NullUnionReader extends UnionReaderImpl {
+
+private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, 
AbstractObjectReader[] variants) {
 
 Review comment:
   @paul-rogers, first of all, thank you for your thoughts - they are very 
useful.
   
   Regarding the first case you described (I suppose you meant `ScalarReader 
kw` and `MapReader vr` in the first code snippet):
   one is able to write code with cached readers currently, if needed for such 
iterations over key-value pairs in the form you wrote it
   ```
   while (dr.next()) {
 System.out.println(kr.getString());
 System.out.println(vr.getAsString());
   }
   ```
   with the behaviour you expect (I need to emphasize, that special 
`Null*Reader`s appear _only_ when finding value `ObjectReader` by certain `key` 
and the given `key` does not exist for the row). But there is no need for the 
case (here and below) to add a special `NULL`-`NULL` entry past the end, as 
iteration should be over _actual_ collection of entries, which were added to 
the `Dict` (through `DictWriter`).
   
   Regarding finding specific values:
   there is a `DictReader#getValueReader(Object key)` method (with return type 
`ObjectReader`), which allows to obtain `value`'s reader with its position set 
to the one corresponding to needed `key`. If there is no one for the `key` - 
the special `Null*Reader` (depends on the type of `value`) is returned. So, 
this approach enforces non-caching of the value reader (as you have pointed 
out). The idea was to make `null` `value`s to be handled the same, indepently 
whether the `null` is from the actual value (i.e., from some entry with some 
`key` and `null` `value` `"someKey" -> null`) or from absent `key`, e.g. your 
example:
   ```
   ObjectReader kr = dr.findStringKey("foo");
   if (kr.isNull) {
 // kr is a special null reader
   } else {
 // kr is a Map reader
   }
   ```
   is likely should be done, because of the nature of complex types in Drill - 
they are generally non-nullable (at least the ones in standard, pre-EVF, 
Drill's framework) leading to no distinction between, say, empty or `null` 
`MAP`. Also, worth noting, that `vr` (defined as `ObjectReader vr = 
dictReader.getValueReader("foo")`) will have correct behavior by itself as 
opposed to `NULL` emulation on `DictReader` level:
   ```
   if (dr.findStringKey("foo")) {
 System.out.println(vr.getAsString());
   }
   ```
   will likely need an `else` case also which should be emulated too (I mean, 
the `ValueReader` itself does not know if it points to `NULL` value within a 
dict as this convention is done on the dict-level; some value reader wrapper 
can be introduced of course, but that is another layer of complexity).
   
   But good point about poorer perfomance (with the current approach).
   
   I like your idea with `boolean findKey(...)` - something similar is done in 
standard framework as well. (In this case, your suggestion about treating 
entries by the end as if they were `null` does make sense). Perhaps, I will 
stick to it after it is clear this is definetely better.
   
   About designated method for each key type - I assume it is redundant, as 
`key` type validation is better to be performed once prior to obtaining the 
data, and general case of `Object key` should do, shouldn't it? 
   
   (Just want to note, that array-like behaviour is useful in testing 
primarily, the "real" use-case is to obtain elements by key).
   
   P.s. Sorry, missed the comment yesterday.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: 

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988878#comment-16988878
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r354303721
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java
 ##
 @@ -268,4 +274,51 @@ public String getAsString() {
 }
 return requireReader(type).getAsString();
   }
+
+  private UnionReaderImpl getNullReader() {
+AbstractObjectReader[] nullVariants = new 
AbstractObjectReader[variants.length];
+for (int i = 0; i < variants.length; i++) {
+  nullVariants[i] = variants[i].createNullReader();
+}
+return new NullUnionReader(schema(), unionAccessor, nullVariants);
+  }
+
+  private static class NullUnionReader extends UnionReaderImpl {
+
+private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, 
AbstractObjectReader[] variants) {
 
 Review comment:
   Yes, there is a need to have multiple methods for each key type to avoid 
invocation of generalized `keyReader.getObject()` which contains `switch` 
checking each type (really bad in production) - will fix it.
   
   *EDIT*: actually, there is another, rather more elegant, way to hadle this 
without the need to duplicate code/add many methods:
   an instance variable `private final Supplier keyObjectExtractor` can 
be defined in `DictReaderImpl` which will be initializated during creation as 
   ```
   private Supplier getKeyObjectExtractor() {
   switch (keyColumnType()) {
 case BOOLEAN:
   return keyReader::getBoolean;
 case BYTES:
   return keyReader::getBytes;
 ...
   }
 }
   ```
   This makes possible to use `Object` in `find(Object key)` because we can use 
`keyObjectExtractor.get()` whenever we need key `Object` value, thus making 
code cleaner.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.18.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988874#comment-16988874
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r354303721
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java
 ##
 @@ -268,4 +274,51 @@ public String getAsString() {
 }
 return requireReader(type).getAsString();
   }
+
+  private UnionReaderImpl getNullReader() {
+AbstractObjectReader[] nullVariants = new 
AbstractObjectReader[variants.length];
+for (int i = 0; i < variants.length; i++) {
+  nullVariants[i] = variants[i].createNullReader();
+}
+return new NullUnionReader(schema(), unionAccessor, nullVariants);
+  }
+
+  private static class NullUnionReader extends UnionReaderImpl {
+
+private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, 
AbstractObjectReader[] variants) {
 
 Review comment:
   Yes, there is a need to have multiple methods for each key type to avoid 
invocation of generalized `keyReader.getObject()` which contains `switch` 
checking each type (really bad in production) - will fix it.
   
   *EDIT*: actually, there is another, rather more elegant, way to hadle this 
without the need to duplicate code/add many methods:
   an instance variable `private final Supplier keyObjectExtractor` can 
be defined in `DictReaderImpl` which will be initiated during creation as 
   ```
   private Supplier getKeyObjectExtractor() {
   switch (keyColumnType()) {
 case BOOLEAN:
   return keyReader::getBoolean;
 case BYTES:
   return keyReader::getBytes;
 ...
   }
 }
   ```
   This makes possible to use `Object` in `find(Object key)` because we can use 
`keyObjectExtractor.get()` whenever we need key `Object` value, thus making 
code cleaner.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.18.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988871#comment-16988871
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r354303721
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java
 ##
 @@ -268,4 +274,51 @@ public String getAsString() {
 }
 return requireReader(type).getAsString();
   }
+
+  private UnionReaderImpl getNullReader() {
+AbstractObjectReader[] nullVariants = new 
AbstractObjectReader[variants.length];
+for (int i = 0; i < variants.length; i++) {
+  nullVariants[i] = variants[i].createNullReader();
+}
+return new NullUnionReader(schema(), unionAccessor, nullVariants);
+  }
+
+  private static class NullUnionReader extends UnionReaderImpl {
+
+private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, 
AbstractObjectReader[] variants) {
 
 Review comment:
   Yes, there is a need to have multiple methods for each key type to avoid 
invocation of generalized `keyReader.getObject()` which contains `switch` 
checking each type (really bad in production) - will fix it.
   
   *EDIT*: actually, there is another, rather more elegant, way to hadle this 
without the need to duplicate code/add many methods:
   an instance variable `private final Supplier keyObjectExtractor` can 
be defined in `DictReaderImpl` which will be initiated during creation as 
   ```
   private Supplier getKeyObjectExtractor() {
   switch (keyColumnType()) {
 case BOOLEAN:
   return keyReader::getBoolean;
 case BYTES:
   return keyReader::getBytes;
 ...
   }
 }
   ```
   This makes possible to use `Object` in `find(Object key)` thus making code 
cleaner.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.18.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988793#comment-16988793
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r354303721
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java
 ##
 @@ -268,4 +274,51 @@ public String getAsString() {
 }
 return requireReader(type).getAsString();
   }
+
+  private UnionReaderImpl getNullReader() {
+AbstractObjectReader[] nullVariants = new 
AbstractObjectReader[variants.length];
+for (int i = 0; i < variants.length; i++) {
+  nullVariants[i] = variants[i].createNullReader();
+}
+return new NullUnionReader(schema(), unionAccessor, nullVariants);
+  }
+
+  private static class NullUnionReader extends UnionReaderImpl {
+
+private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, 
AbstractObjectReader[] variants) {
 
 Review comment:
   Yes, there is a need to have multiple methods for each key type to avoid 
invocation of generalized `keyReader.getObject()` which contains `switch` 
checking each type (really bad in production) - will fix it.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.18.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988754#comment-16988754
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r354275004
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java
 ##
 @@ -268,4 +274,51 @@ public String getAsString() {
 }
 return requireReader(type).getAsString();
   }
+
+  private UnionReaderImpl getNullReader() {
+AbstractObjectReader[] nullVariants = new 
AbstractObjectReader[variants.length];
+for (int i = 0; i < variants.length; i++) {
+  nullVariants[i] = variants[i].createNullReader();
+}
+return new NullUnionReader(schema(), unionAccessor, nullVariants);
+  }
+
+  private static class NullUnionReader extends UnionReaderImpl {
+
+private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, 
AbstractObjectReader[] variants) {
 
 Review comment:
   @paul-rogers, first of all, thank you for your thoughts - they are very 
useful.
   
   Regarding the first case you described (I suppose you meant `ScalarReader 
kw` and `MapReader vr` in the first code snippet):
   one is able to write code with cached readers currently, if needed for such 
iterations over key-value pairs in the form you wrote it
   ```
   while (dr.next()) {
 System.out.println(kr.getString());
 System.out.println(vr.getAsString());
   }
   ```
   with the behaviour you expect (without a special `Null*Reader`s). But there 
is no need for the case (here and below) to add a special `NULL`-`NULL` entry 
past the end, as iteration should be over _actual_ collection of entries, which 
were added to the `Dict` (through `DictWriter`).
   
   Regarding finding specific values:
   there is a `DictReader#getValueReader(Object key)` method (with return type 
`ObjectReader`), which allows to obtain `value`'s reader with its position set 
to the one corresponding to needed `key`. If there is no one for the `key` - 
the special `Null*Reader` (depends on the type of `value`) is returned. So, 
this approach enforces non-caching of the value reader (as you have pointed 
out). The idea was to make `null` `value`s to be handled the same, indepently 
whether the `null` is from the actual value (i.e., from some entry with some 
`key` and `null` `value` `"someKey" -> null`) or from absent `key`, e.g. your 
example:
   ```
   ObjectReader kr = dr.findStringKey("foo");
   if (kr.isNull) {
 // kr is a special null reader
   } else {
 // kr is a Map reader
   }
   ```
   is likely should be done, because of the nature of complex types in Drill - 
they are generally non-nullable (at least the ones in standard, pre-EVF, 
Drill's framework) leading to no distinction between, say, empty or `null` 
`MAP`. Also, worth noting, that `vr` (defined as `ObjectReader vr = 
dictReader.getValueReader("foo")`) will have correct behavior by itself as 
opposed to `NULL` emulation on `DictReader` level:
   ```
   if (dr.findStringKey("foo")) {
 System.out.println(vr.getAsString());
   }
   ```
   will likely need an `else` case also which should be emulated too (I mean, 
the `ValueReader` itself does not know if it points to `NULL` value within a 
dict as this convention is done on the dict-level; some value reader wrapper 
can be introduced of course, but that is another layer of complexity).
   
   But good point about poorer perfomance (with the current approach).
   
   I like your idea with `boolean findKey(...)` - something similar is done in 
standard framework as well. (In this case, your suggestion about treating 
entries by the end as if they were `null` does make sense). Perhaps, I will 
stick to it after it is clear this is definetely better.
   
   About designated method for each key type - I assume it is redundant, as 
`key` type validation is better to be performed once prior to obtaining the 
data, and general case of `Object key` should do, shouldn't it? 
   
   (Just want to note, that array-like behaviour is useful in testing 
primarily, the "real" use-case is to obtain elements by key).
   
   P.s. Sorry, missed the comment yesterday.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>   

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988729#comment-16988729
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r354275004
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java
 ##
 @@ -268,4 +274,51 @@ public String getAsString() {
 }
 return requireReader(type).getAsString();
   }
+
+  private UnionReaderImpl getNullReader() {
+AbstractObjectReader[] nullVariants = new 
AbstractObjectReader[variants.length];
+for (int i = 0; i < variants.length; i++) {
+  nullVariants[i] = variants[i].createNullReader();
+}
+return new NullUnionReader(schema(), unionAccessor, nullVariants);
+  }
+
+  private static class NullUnionReader extends UnionReaderImpl {
+
+private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, 
AbstractObjectReader[] variants) {
 
 Review comment:
   @paul-rogers, first of all, thank you for your thoughts - they are very 
useful.
   
   Regarding the first case you described (I suppose you meant `ScalarReader 
kw` and `MapReader vr` in the first code snippet):
   one is able to write code with cached readers currently, if needed for such 
iterations over key-value pairs in the form you wrote it
   ```
   while (dr.next()) {
 System.out.println(kr.getString());
 System.out.println(vr.getAsString());
   }
   ```
   with the behaviour you expect (without a special `Null*Reader`s). But there 
is no need for the case (here and below) to add a special `NULL`-`NULL` entry 
past the end, as iteration should be over _true_ collection of entries, which 
were added to the `Dict` (through `DictWriter`).
   
   Regarding finding specific values:
   there is a `DictReader#getValueReader(Object key)` method (with return type 
`ObjectReader`), which allows to obtain `value`'s reader with its position set 
to the one corresponding to needed `key`. If there is no one for the `key` - 
the special `Null*Reader` (depends on the type of `value`) is returned. So, 
this approach enforces non-caching of the value reader (as you have pointed 
out). The idea was to make `null` `value`s to be handled the same, indepently 
whether the `null` is from the actual value (i.e., from some entry with some 
`key` and `null` `value` `"someKey" -> null`) or from absent `key`, e.g. your 
example:
   ```
   ObjectReader kr = dr.findStringKey("foo");
   if (kr.isNull) {
 // kr is a special null reader
   } else {
 // kr is a Map reader
   }
   ```
   is likely should be done, because of the nature of complex types in Drill - 
they are generally non-nullable (at least the ones in standard, pre-EVF, 
Drill's framework) leading to no distinction between, say, empty or `null` 
`MAP`.
   
   But good point about poorer perfomance (with the current approach).
   
   I like your idea with `boolean findKey(...)` - something similar is done in 
standard framework as well. Perhaps, I will stick to it. (In this case, your 
suggestion about treating entries by the end as if they were `null` does make 
sense).
   
   About designated method for each key type - I assume it is redundant, as 
`key` type validation is better to be performed once prior to obtaining the 
data, and general case of `Object key` should do, shouldn't it? 
   
   (Just want to note, that array-like behaviour is useful in testing 
primarily, the "real" use-case is to obtain elements by key).
   
   P.s. Sorry, missed the comment yesterday.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.18.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987453#comment-16987453
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on issue #1870: DRILL-7359: Add support for DICT type in 
RowSet Framework
URL: https://github.com/apache/drill/pull/1870#issuecomment-561448583
 
 
   On `ProjectionType`, if the `SchemaPath` is, say, `foo`, then the 
`ProjectionType` will be `SCALAR`. I can see that this name is misleading. 
Perhaps `SIMPLE` is better. We don't know that the column **is** scalar, we 
just know that it is consistent with a scalar. It could be a map, array or dict 
as well.
   
   But, if you do add a `SchemaPath` representation for `foo['bar']`, then, 
yes, that can only be a `DICT`, so we can introduce a `DICT` projection type.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987446#comment-16987446
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r353522274
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java
 ##
 @@ -268,4 +274,51 @@ public String getAsString() {
 }
 return requireReader(type).getAsString();
   }
+
+  private UnionReaderImpl getNullReader() {
+AbstractObjectReader[] nullVariants = new 
AbstractObjectReader[variants.length];
+for (int i = 0; i < variants.length; i++) {
+  nullVariants[i] = variants[i].createNullReader();
+}
+return new NullUnionReader(schema(), unionAccessor, nullVariants);
+  }
+
+  private static class NullUnionReader extends UnionReaderImpl {
+
+private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, 
AbstractObjectReader[] variants) {
 
 Review comment:
   @KazydubB, thanks for the explanation. It clarifies something I wondered 
about: the role of the null state readers. Let's talk a bit about the intended 
semantics for the readers.
   
   I should be able to write code that caches the readers:
   
   ```
   DictReader dr = rowReader.dict("myDict");
   ScalarWriter kw = dr.key().scalar();
   MapWriter vr = dr.value.tuple();
   ```
   
   That is, I should be able to use the same key reader (`kr`) and value reader 
(`vr`) for all members. Then, I should either be able to iterate over the 
values:
   
   ```
   while (dr.next()) {
 System.out.println(kr.getString());
 System.out.println(vr.getAsString());
   }
   ```
   
   If things work that way, then we don't need a special reader for the null 
state: the value reader (here, `vr`) would simply return `true` from `isNull()`:
   
   ```
   while (dr.next()) {
 System.out.println(kr.getString());
 if (vr.isNull()) {
   System.out.println("NULL");
 } else {
   System.out.println(vr.getAsString());
 }
   }
   ```
   
   (Ignoring, for the moment, that `getAsString()` already does a null check.)
   
   Note that, if we read past the end, both the key and value writers will 
point to nothing. I think that, for the array reader, accessing the value one 
past the end is an error. For `DICT`, we could simply declare that the value is 
NULL.
   
   Then, I was going to sketch out the same behavior if we search for a key and 
found...nothing. Seems that the `DictReader` has no way to search for a key. If 
it did, it would not be clear what type to use for the key. So, that is a bit 
of a mess.
   
   Let's assume we did have something to find a specific value, like we do for 
the arrays where we have `setPosn(int index)`. Maybe we have `findStringKey()` 
etc. For testing, we could just have `findKey(Object key)` and do a switch on 
type as we do for the other readers. So:
   
   ```
 dr.findStringKey("foo");
 if (vr.isNull()) {
   System.out.println("NULL");
 } else {
   System.out.println(vr.getAsString());
 }
   ```
   
   Or, even:
   
   ```
 if (dr.findStringKey("foo")) {
   System.out.println(vr.getAsString());
 }
   ```
   
   The only way to find the value is to do a linear search. If we do a linear 
search, and no value exists, the index will point one past the end of the dict. 
And, above, we suggested that the key and value readers could return `true` for 
`isNull()` in those cases.
   
   Given this, unless I'm missing something, I don't see that we need any kind 
of null value reader at all. In fact, I'd even say that we don't want such a 
reader if we'd have to do:
   
   ```
   ObjectReader kr = dr.findStringKey("foo");
   if (kr.isNull) {
 // kr is a special null reader
   } else {
 // kr is a Map reader
   }
   ```
   
   The above clutters the code and makes it impossible to cache the value 
reader, which will slow inner-most loops that want to use the reader 
abstractions.
   
   What do you think?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see 

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986826#comment-16986826
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on issue #1870: DRILL-7359: Add support for DICT type in 
RowSet Framework
URL: https://github.com/apache/drill/pull/1870#issuecomment-561135023
 
 
   @paul-rogers, `ProjectionType` now has no notion of `MinorType#DICT` and 
treats it as `ProjectionType#SCALAR`. And yes, we do need to allow Python-like 
syntax for the type. (My bad, haven't made changes to this).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986824#comment-16986824
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r353132642
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java
 ##
 @@ -268,4 +274,51 @@ public String getAsString() {
 }
 return requireReader(type).getAsString();
   }
+
+  private UnionReaderImpl getNullReader() {
+AbstractObjectReader[] nullVariants = new 
AbstractObjectReader[variants.length];
+for (int i = 0; i < variants.length; i++) {
+  nullVariants[i] = variants[i].createNullReader();
+}
+return new NullUnionReader(schema(), unionAccessor, nullVariants);
+  }
+
+  private static class NullUnionReader extends UnionReaderImpl {
+
+private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, 
AbstractObjectReader[] variants) {
 
 Review comment:
   Regarding points of your suggestion:
   - `DictReaderImpl` has `NullStateReaders.REQUIRED_STATE_READER` 
`NullStateReader`, i.e. it can't be `NULL` (like `MapReader`).
   - `DICT` can't have `NULL` key as it is `REQUIRED`.
   - If the `DICT` has a key, but the value is `NULL`, then `isNull()` for the 
value returns `true` - yes.
   - in the (tricky) case of a missing key from the `DICT`, the latter approach 
was taken - there is no differentiation between missing key and `NULL` value. 
(The new `Null*Reader`s are introduced to be used in the case of a missing key 
- their `isNull()` always returns `true`; note, that in case when a key is 
present and the value is `NULL`, current (non-null) readers' `isNull()` will 
return `true` for that particular position, as corresponding value reader keeps 
track of nullability). The former approach was discarded to ensure uniform 
behavior without a chance of getting `NPE` or making `null`-checks.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986227#comment-16986227
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r352737013
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java
 ##
 @@ -268,4 +274,51 @@ public String getAsString() {
 }
 return requireReader(type).getAsString();
   }
+
+  private UnionReaderImpl getNullReader() {
+AbstractObjectReader[] nullVariants = new 
AbstractObjectReader[variants.length];
+for (int i = 0; i < variants.length; i++) {
+  nullVariants[i] = variants[i].createNullReader();
+}
+return new NullUnionReader(schema(), unionAccessor, nullVariants);
+  }
+
+  private static class NullUnionReader extends UnionReaderImpl {
+
+private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, 
AbstractObjectReader[] variants) {
 
 Review comment:
   We may want to define our semantics here. If I understand your description, 
we have:
   
   * The entire `DICT` is `NULL` (is this allowed)?
   * The `DICT` has an entry for key `x`, but the value is null.
   * The `DICT` has an entry for the `NULL` key.
   * The `DICT` has no entry for the key `x`, meaning that the value is 
implicitly `NULL`.
   
   We have similar situations with the `UNION` type, perhaps we can learn from 
that:
   
   * The entire `UNION` can be `NULL`.
   * The `UNION` can have an entry for, say, `VARCHAR`, but the `VARCHAR` is 
`NULL`.
   * The `UNION` does not have an entry for a given type (say, `INT`), so the 
value is implicitly `NULL`. (Although we don't allow access to such a type.)
   
   My suggestion would be that we use the `isNull()` method for all the `NULL` 
cases:
   
   * If the `DICT` is `NULL` for a row, then `isNull()` returns true for the 
`DICT` reader.
   * If the `DICT` has a `NULL` key, then the key `isNull()` for the key 
returns true.
   * If the `DICT` has a key, but the value is `NULL`, then `isNull()` for the 
value returns true.
   
   Now the tricky case: a key is missing from the `DICT`:
   
   * If the `DICT` does not have a key, then we may want a `hasKey()` method, 
and the value reader should be a Java `null` because it does not exist.
   
   Or:
   
   * If the `DICT` does not have a key, then the value reader's `isNull()` 
returns true. (This means we cannot differentiate the has-key-with-null-value 
vs. does-not-have-a-key cases.)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986226#comment-16986226
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r352737013
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java
 ##
 @@ -268,4 +274,51 @@ public String getAsString() {
 }
 return requireReader(type).getAsString();
   }
+
+  private UnionReaderImpl getNullReader() {
+AbstractObjectReader[] nullVariants = new 
AbstractObjectReader[variants.length];
+for (int i = 0; i < variants.length; i++) {
+  nullVariants[i] = variants[i].createNullReader();
+}
+return new NullUnionReader(schema(), unionAccessor, nullVariants);
+  }
+
+  private static class NullUnionReader extends UnionReaderImpl {
+
+private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, 
AbstractObjectReader[] variants) {
 
 Review comment:
   We may want to define our semantics here. If I understand your description, 
we have:
   
   * The entire `DICT` is `NULL` (is this allowed)?
   * The `DICT` has an entry for key `x`, but the value is null.
   * The `DICT` has an entry for the `NULL` key.
   * The `DICT` has no entry for the key `x`, meaning that the value is 
implicitly `NULL`.
   
   We have similar situations with the `UNION` type, perhaps we can learn from 
that:
   
   * The entire `UNION` can be `NULL`.
   * The `UNION` can have an entry for, say, `VARCHAR`, but the `VARCHAR` is 
`NULL`.
   * The `UNION` does not have an entry for a given type (say, `INT`), so the 
value is implicitly `NULL`. (Although we don't allow access to such a type.)
   
   My suggestion would be that we use the `isNull()` method for all the `NULL` 
cases:
   
   * If the `DICT` is `NULL` for a row, then `isNull()` returns true for the 
`DICT` reader.
   * If the `DICT` has a `NULL` key, then the key `isNull()` for the key 
returns true.
   * If the `DICT` has a key, but the value is `NULL`, then `isNull()` for the 
value returns true.
   
   Not the tricky case:
   
   * If the `DICT` does not have a key, then we may want a `hasKey()` method, 
and the value reader should be a Java `null` because it does not exist.
   
   Or:
   
   * If the `DICT` does not have a key, then the value reader's `isNull()` 
returns true. (This means we cannot differentiate the has-key-with-null-value 
vs. does-not-have-a-key cases.)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986215#comment-16986215
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on issue #1870: DRILL-7359: Add support for DICT type in 
RowSet Framework
URL: https://github.com/apache/drill/pull/1870#issuecomment-560498165
 
 
   @arina-ielchiieva, @KazydubB, it turns out that `ProjectionType` does 
support `DICT`. This is a subtle topic. The projection type encodes information 
we can glean from the `SELECT` list, as encoded in the `SchemaPath`: `a`, 
`a[3]`, `a.b`. As a result, `DICT` is already covered: a reference will be 
either to the whole dict: `d`, or to a member: `d.m`.
   
   We would change the `ProjectionType` only if we enhance the `SchemaPath` to, 
say, allow Python-like syntax: `d[`foo`]. Did we make any such changes?  
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986129#comment-16986129
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

arina-ielchiieva commented on issue #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#issuecomment-560440925
 
 
   @KazydubB looks like 
`org.apache.drill.exec.physical.resultSet.project.ProjectionType` does not 
support Dict type, could you please check?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986067#comment-16986067
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r352608433
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java
 ##
 @@ -268,4 +274,51 @@ public String getAsString() {
 }
 return requireReader(type).getAsString();
   }
+
+  private UnionReaderImpl getNullReader() {
+AbstractObjectReader[] nullVariants = new 
AbstractObjectReader[variants.length];
+for (int i = 0; i < variants.length; i++) {
+  nullVariants[i] = variants[i].createNullReader();
+}
+return new NullUnionReader(schema(), unionAccessor, nullVariants);
+  }
+
+  private static class NullUnionReader extends UnionReaderImpl {
+
+private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, 
AbstractObjectReader[] variants) {
 
 Review comment:
   Yes, this is for the case "where the `value` of a `key`/`value` pair is 
null". This `value` can be `null` for a given `key` if the `key` is not present 
in the dict for a given row (e.g. if you have a `Map` with 
entry `"a" : 1` and you try to `Map#get("b")`, with `Map` being a Java map in 
the example). That is, the value may be non-nullable, but when you fetch a 
`value` by `key` and there is no entry for the given key (in a row) then this 
`Null*Reader` is used. The `Null*Reader`s are created in 
https://github.com/apache/drill/pull/1870/files#diff-da401fee5976aec49192b5fb6e22de3eR83.
 This structures are handled outside (of particular `DictReader`) so the 
structure is retained for uniformity. 
   
   The "dummy" readers are not projected (merely a placeholder, aren't they?), 
so this is not the way to use them(?).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-12-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986066#comment-16986066
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r352608433
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java
 ##
 @@ -268,4 +274,51 @@ public String getAsString() {
 }
 return requireReader(type).getAsString();
   }
+
+  private UnionReaderImpl getNullReader() {
+AbstractObjectReader[] nullVariants = new 
AbstractObjectReader[variants.length];
+for (int i = 0; i < variants.length; i++) {
+  nullVariants[i] = variants[i].createNullReader();
+}
+return new NullUnionReader(schema(), unionAccessor, nullVariants);
+  }
+
+  private static class NullUnionReader extends UnionReaderImpl {
+
+private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, 
AbstractObjectReader[] variants) {
 
 Review comment:
   Yes, this is for the case "where the `value` of a `key`/`value` pair is 
null". This `value` can be `null` for a given `key` if the `key` is not present 
in the dict for a given row (e.g. if you have a `Map` with 
entry `"a" : 1` and you try to `Map#get("b")`, with `Map` being a Java map in 
the example). That is, the value may be non-nullable, but when you fetch a 
`value` by `key` and there is no entry for the given key (in a row) then this 
`Null*Reader` is used. The `Null*Reader`s are created in 
https://github.com/apache/drill/pull/1870/files#diff-da401fee5976aec49192b5fb6e22de3eR83.
 This structures are handled outside (of particular `DictReader`) so the 
structure is retained for uniformity. 
   The "dummy" readers are not projected (merely a placeholder, aren't they?), 
so this is not the way to use them(?).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985210#comment-16985210
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r352252495
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java
 ##
 @@ -268,4 +274,51 @@ public String getAsString() {
 }
 return requireReader(type).getAsString();
   }
+
+  private UnionReaderImpl getNullReader() {
+AbstractObjectReader[] nullVariants = new 
AbstractObjectReader[variants.length];
+for (int i = 0; i < variants.length; i++) {
+  nullVariants[i] = variants[i].createNullReader();
+}
+return new NullUnionReader(schema(), unionAccessor, nullVariants);
+  }
+
+  private static class NullUnionReader extends UnionReaderImpl {
+
+private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, 
AbstractObjectReader[] variants) {
 
 Review comment:
   This approach is a bit better than the big null reader we had before. I 
wonder, however, why we need such a reader? Null (AKA "dummy") readers exist, 
but generally for specialized cases (such as a List that has no type.) I 
wonder, is this for the case where the value of a key/value pair is null?
   
   If so, then two thoughts. First, the right way to handle that is to:
   
   * Use the `isNull()` method on the `ColumnReader` class:
   
   ```
   if (myDictReader.value().isNull()) {
 // Handle null value
   } else {
 int x = myDictReader.value().scalar().getInt();
   }
   ```
   
   The idea is that null values are null: they have no structure. The approach 
here seems to say that null values have structure (such as a null union), so we 
can ask what types the null has and so on. But, this is, perhaps, too much of a 
fantasy.
   
   The other suggestion would be to suggest that values cannot be null, 
especially if they are of structured types.
   
   Also, note that, for unions, we already have multiple levels of nulls: The 
union itself can be null. When used in a list, the list entry can be null. The 
union can have nullable types, so that the union might have an int, but the int 
is null. This is already a mess, we probably don't want to make it even more 
complex.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984558#comment-16984558
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on issue #1870: DRILL-7359: Add support for DICT type in 
RowSet Framework
URL: https://github.com/apache/drill/pull/1870#issuecomment-559567933
 
 
   @paul-rogers, I've updated the PR, please look again
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978532#comment-16978532
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r348557875
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/NullReader.java
 ##
 @@ -0,0 +1,369 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.accessor.reader;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.record.metadata.VariantMetadata;
+import org.apache.drill.exec.vector.accessor.ArrayReader;
+import org.apache.drill.exec.vector.accessor.DictReader;
+import org.apache.drill.exec.vector.accessor.ObjectReader;
+import org.apache.drill.exec.vector.accessor.ObjectType;
+import org.apache.drill.exec.vector.accessor.ScalarReader;
+import org.apache.drill.exec.vector.accessor.TupleReader;
+import org.apache.drill.exec.vector.accessor.ValueType;
+import org.apache.drill.exec.vector.accessor.VariantReader;
+import org.joda.time.Instant;
+import org.joda.time.LocalDate;
+import org.joda.time.LocalTime;
+import org.joda.time.Period;
+
+import java.math.BigDecimal;
+
+/**
+ * Dummy reader which returns {@code null} for scalar types and itself for 
complex types.
+ */
+public class NullReader implements ScalarReader, ArrayReader, TupleReader, 
VariantReader, ObjectReader, DictReader {
 
 Review comment:
   I agree with your thoughts: will use existing 
`AbstractScalarReader.NullReader` in case of scalar values and will change the 
`NullReader` for complex case in a similar (to 
`AbstractScalarReader.NullReader`) maner.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978436#comment-16978436
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r348488284
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/writer/ObjectDictWriter.java
 ##
 @@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.accessor.writer;
+
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.vector.UInt4Vector;
+import org.apache.drill.exec.vector.accessor.DictWriter;
+import org.apache.drill.exec.vector.accessor.ObjectType;
+import org.apache.drill.exec.vector.accessor.ObjectWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.ValueType;
+import org.apache.drill.exec.vector.accessor.impl.HierarchicalFormatter;
+import org.apache.drill.exec.vector.complex.DictVector;
+import org.apache.drill.exec.vector.complex.RepeatedDictVector;
+
+import java.lang.reflect.Array;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * The implementation represents the writer as an array writer
+ * with special dict entry writer as its element writer.
+ */
+public class ObjectDictWriter extends ObjectArrayWriter implements DictWriter {
+
+  public static class DictObjectWriter extends 
AbstractArrayWriter.ArrayObjectWriter {
+
+public DictObjectWriter(ObjectDictWriter dictWriter) {
+  super(dictWriter);
+}
+
+@Override
+public DictWriter dict() {
+  return (DictWriter) arrayWriter;
+}
+
+@Override
+public void dump(HierarchicalFormatter format) {
+  format.startObject(this)
+  .attribute("dictWriter");
+  arrayWriter.dump(format);
+  format.endObject();
+}
+  }
+
+  public static ObjectDictWriter.DictObjectWriter buildDict(ColumnMetadata 
metadata, DictVector vector,
+
List keyValueWriters) {
+DictEntryWriter.DictEntryObjectWriter entryObjectWriter = 
DictEntryWriter.buildDictEntryWriter(metadata, keyValueWriters);
+ObjectDictWriter objectDictWriter = new ObjectDictWriter(metadata, 
vector.getOffsetVector(), entryObjectWriter);
+return new ObjectDictWriter.DictObjectWriter(objectDictWriter);
+  }
+
+  public static ArrayObjectWriter buildDictArray(RepeatedDictVector vector, 
ColumnMetadata metadata,
+ List 
keyValueWriters) {
+final DictVector dataVector = (DictVector) vector.getDataVector();
+ObjectDictWriter.DictObjectWriter dictWriter = buildDict(metadata, 
dataVector, keyValueWriters);
+AbstractArrayWriter arrayWriter = new ObjectArrayWriter(metadata, 
vector.getOffsetVector(), dictWriter);
+return new ArrayObjectWriter(arrayWriter);
+  }
+
+  private static final int FIELD_KEY_ORDINAL = 0;
+  private static final int FIELD_VALUE_ORDINAL = 1;
+
+  private final DictEntryWriter.DictEntryObjectWriter entryObjectWriter;
+
+  private ObjectDictWriter(ColumnMetadata schema, UInt4Vector offsetVector,
+   DictEntryWriter.DictEntryObjectWriter 
entryObjectWriter) {
+super(schema, offsetVector, entryObjectWriter);
+this.entryObjectWriter = entryObjectWriter;
+  }
+
+  @Override
+  public ValueType keyType() {
+return tuple().scalar(FIELD_KEY_ORDINAL).valueType();
+  }
+
+  @Override
+  public ObjectType valueType() {
+return tuple().type(FIELD_VALUE_ORDINAL);
+  }
+
+  @Override
+  public ScalarWriter keyWriter() {
+return tuple().scalar(FIELD_KEY_ORDINAL);
+  }
+
+  @Override
+  public ObjectWriter valueWriter() {
+return tuple().column(FIELD_VALUE_ORDINAL);
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public void setObject(Object object) {
+if (object instanceof Map) {
+  Map map = (Map) object;
+  for (Map.Entry entry : map.entrySet()) {
+entryObjectWriter.setObject(0, 

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978434#comment-16978434
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r348486117
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/NullReader.java
 ##
 @@ -0,0 +1,377 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.accessor.reader;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.record.metadata.VariantMetadata;
+import org.apache.drill.exec.vector.accessor.ArrayReader;
+import org.apache.drill.exec.vector.accessor.DictReader;
+import org.apache.drill.exec.vector.accessor.ObjectReader;
+import org.apache.drill.exec.vector.accessor.ObjectType;
+import org.apache.drill.exec.vector.accessor.ScalarReader;
+import org.apache.drill.exec.vector.accessor.TupleReader;
+import org.apache.drill.exec.vector.accessor.ValueType;
+import org.apache.drill.exec.vector.accessor.VariantReader;
+import org.joda.time.Instant;
+import org.joda.time.LocalDate;
+import org.joda.time.LocalTime;
+import org.joda.time.Period;
+
+import java.math.BigDecimal;
+
+/**
+ * Dummy reader which returns {@code null} on access to getters for scalar 
types
+ * ({@link #getString()}, {@link #getBytes()} etc.) and itself on access to 
reader types
+ * ({@link #scalar()}, {@link #tuple()} etc.). Used when accessing {@link 
DictReader} by non-existent key.
+ * {@link #isNull()} returns {@code true}.
+ *
+ * Does not preserve metadata structure. Other methods throw {@link 
UnsupportedOperationException}. todo:
+ *
+ * @see DictReader
+ */
+public class NullReader implements ScalarReader, ArrayReader, TupleReader, 
VariantReader, ObjectReader, DictReader {
+
+  private static final NullReader INSTANCE = new NullReader();
+
+  public static NullReader instance() {
+return INSTANCE;
+  }
+
+  @Override
+  public int size() {
+notSupported("size()");
+return 0;
+  }
+
+  @Override
+  public ObjectType entryType() {
+notSupported("entryType()");
+return null;
+  }
+
+  @Override
+  public ObjectReader entry() {
+notSupported("entry()");
+return null;
+  }
+
+  @Override
+  public ScalarReader scalar() {
+return this;
+  }
+
+  @Override
+  public TupleReader tuple() {
+return this;
+  }
+
+  @Override
+  public ArrayReader array() {
+return this;
+  }
+
+  @Override
+  public VariantReader variant() {
+return this;
+  }
+
+  @Override
+  public DictReader dict() {
+return this;
+  }
+
+  @Override
+  public void setPosn(int index) {
+notSupported("setPosn(int)");
+  }
+
+  @Override
+  public void rewind() {
+notSupported("rewind()");
+  }
+
+  @Override
+  public boolean next() {
+return false;
+  }
+
+  @Override
+  public ValueType valueType() {
+notSupported("valueType()");
+return null;
+  }
+
+  @Override
+  public ValueType extendedType() {
+notSupported("extendedType()");
+return null;
+  }
+
+  @Override
+  public int getInt() {
+notSupported("getInt()");
+return 0;
+  }
+
+  @Override
+  public boolean getBoolean() {
+notSupported("getBoolean()");
+return false;
+  }
+
+  @Override
+  public long getLong() {
+notSupported("getLong()");
+return 0;
+  }
+
+  @Override
+  public double getDouble() {
+notSupported("getDouble()");
+return 0;
+  }
+
+  @Override
+  public String getString() {
+return null;
+  }
+
+  @Override
+  public byte[] getBytes() {
+return null;
+  }
+
+  @Override
+  public BigDecimal getDecimal() {
+return null;
+  }
+
+  @Override
+  public Period getPeriod() {
+return null;
+  }
+
+  @Override
+  public LocalDate getDate() {
+return null;
+  }
+
+  @Override
+  public LocalTime getTime() {
+return null;
+  }
+
+  @Override
+  public Instant getTimestamp() {
+

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978432#comment-16978432
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r348485350
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/NullReader.java
 ##
 @@ -0,0 +1,377 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.accessor.reader;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.record.metadata.VariantMetadata;
+import org.apache.drill.exec.vector.accessor.ArrayReader;
+import org.apache.drill.exec.vector.accessor.DictReader;
+import org.apache.drill.exec.vector.accessor.ObjectReader;
+import org.apache.drill.exec.vector.accessor.ObjectType;
+import org.apache.drill.exec.vector.accessor.ScalarReader;
+import org.apache.drill.exec.vector.accessor.TupleReader;
+import org.apache.drill.exec.vector.accessor.ValueType;
+import org.apache.drill.exec.vector.accessor.VariantReader;
+import org.joda.time.Instant;
+import org.joda.time.LocalDate;
+import org.joda.time.LocalTime;
+import org.joda.time.Period;
+
+import java.math.BigDecimal;
+
+/**
+ * Dummy reader which returns {@code null} on access to getters for scalar 
types
+ * ({@link #getString()}, {@link #getBytes()} etc.) and itself on access to 
reader types
+ * ({@link #scalar()}, {@link #tuple()} etc.). Used when accessing {@link 
DictReader} by non-existent key.
+ * {@link #isNull()} returns {@code true}.
+ *
+ * Does not preserve metadata structure. Other methods throw {@link 
UnsupportedOperationException}. todo:
+ *
+ * @see DictReader
+ */
+public class NullReader implements ScalarReader, ArrayReader, TupleReader, 
VariantReader, ObjectReader, DictReader {
+
+  private static final NullReader INSTANCE = new NullReader();
+
+  public static NullReader instance() {
+return INSTANCE;
+  }
+
+  @Override
+  public int size() {
+notSupported("size()");
+return 0;
+  }
+
+  @Override
+  public ObjectType entryType() {
+notSupported("entryType()");
+return null;
+  }
+
+  @Override
+  public ObjectReader entry() {
+notSupported("entry()");
+return null;
+  }
+
+  @Override
+  public ScalarReader scalar() {
+return this;
+  }
+
+  @Override
+  public TupleReader tuple() {
+return this;
+  }
+
+  @Override
+  public ArrayReader array() {
+return this;
+  }
+
+  @Override
+  public VariantReader variant() {
+return this;
+  }
+
+  @Override
+  public DictReader dict() {
+return this;
+  }
+
+  @Override
+  public void setPosn(int index) {
+notSupported("setPosn(int)");
+  }
+
+  @Override
+  public void rewind() {
+notSupported("rewind()");
+  }
+
+  @Override
+  public boolean next() {
+return false;
+  }
+
+  @Override
+  public ValueType valueType() {
+notSupported("valueType()");
+return null;
+  }
+
+  @Override
+  public ValueType extendedType() {
+notSupported("extendedType()");
+return null;
+  }
+
+  @Override
+  public int getInt() {
+notSupported("getInt()");
+return 0;
+  }
+
+  @Override
+  public boolean getBoolean() {
+notSupported("getBoolean()");
+return false;
+  }
+
+  @Override
+  public long getLong() {
+notSupported("getLong()");
+return 0;
+  }
+
+  @Override
+  public double getDouble() {
+notSupported("getDouble()");
+return 0;
+  }
+
+  @Override
+  public String getString() {
+return null;
+  }
+
+  @Override
+  public byte[] getBytes() {
+return null;
+  }
+
+  @Override
+  public BigDecimal getDecimal() {
+return null;
+  }
+
+  @Override
+  public Period getPeriod() {
+return null;
+  }
+
+  @Override
+  public LocalDate getDate() {
+return null;
+  }
+
+  @Override
+  public LocalTime getTime() {
+return null;
+  }
+
+  @Override
+  public Instant getTimestamp() {
+

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978425#comment-16978425
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r348479761
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/TestRowSet.java
 ##
 @@ -586,6 +595,490 @@ public void testRepeatedMapStructure() {
 RowSetUtilities.verify(expected, actual);
   }
 
+  @Test
+  public void testDictStructure() {
+final String dictName = "d";
+
+final TupleMetadata schema = new SchemaBuilder()
+.add("id", MinorType.INT)
+.addDict(dictName, MinorType.INT)
+.value(MinorType.VARCHAR) // required int
+.resumeSchema()
+.buildSchema();
+final ExtendableRowSet rowSet = fixture.rowSet(schema);
+final RowSetWriter writer = rowSet.writer();
+
+// Dict
+// Pick out components and lightly test. (Assumes structure
+// tested earlier is still valid, so no need to exhaustively
+// test again.)
+
+assertEquals(ObjectType.ARRAY, writer.column(dictName).type());
+assertTrue(writer.column(dictName).schema().isDict());
+
+final ScalarWriter idWriter = writer.column(0).scalar();
+final DictWriter dictWriter = writer.column(1).dict();
+
+assertEquals(ValueType.INTEGER, dictWriter.keyType());
+assertEquals(ObjectType.SCALAR, dictWriter.valueType());
+
+final ScalarWriter keyWriter = dictWriter.keyWriter();
+final ScalarWriter valueWriter = dictWriter.valueWriter().scalar();
+
+assertEquals(ValueType.INTEGER, keyWriter.valueType());
+assertEquals(ValueType.STRING, valueWriter.valueType());
+
+// Write data
+idWriter.setInt(1);
+
+keyWriter.setInt(11);
+valueWriter.setString("a");
+dictWriter.save(); // Advance to next entry position
+keyWriter.setInt(12);
+valueWriter.setString("b");
+dictWriter.save();
+writer.save();
+
+idWriter.setInt(2);
+
+keyWriter.setInt(21);
+valueWriter.setString("c");
+dictWriter.save();
+writer.save();
+
+idWriter.setInt(3);
+
+keyWriter.setInt(31);
+valueWriter.setString("d");
+dictWriter.save();
+keyWriter.setInt(32);
+valueWriter.setString("e");
+dictWriter.save();
+writer.save();
+
+// Finish the row set and get a reader.
+
+final SingleRowSet actual = writer.done();
+final RowSetReader reader = actual.reader();
+
+// Verify reader structure
+
+assertEquals(ObjectType.ARRAY, reader.column(dictName).type());
+
+final DictReader dictReader = reader.dict(1);
+assertEquals(ObjectType.ARRAY, dictReader.type());
+
+assertEquals(ValueType.INTEGER, dictReader.keyColumnType());
+assertEquals(ObjectType.SCALAR, dictReader.valueColumnType());
+
+// Row 1: get value reader with its position set to entry corresponding to 
a key
+
+assertTrue(reader.next());
+assertFalse(dictReader.isNull()); // dict itself is not null
+
+dictReader.getAsString();
+assertEquals("b", dictReader.getValueReader(12).scalar().getString());
+assertEquals("a", dictReader.getValueReader(11).scalar().getString());
+
+// compare entire dict
+Map map = map(11, "a", 12, "b");
+assertEquals(map, dictReader.getObject());
+
+// Row 2: get Object representation of value directly
+
+assertTrue(reader.next());
+assertEquals("c", dictReader.get(21));
+assertTrue(dictReader.getValueReader(22).isNull()); // the dict does not 
contain an entry with the key
+
+map = map(21, "c");
+assertEquals(map, dictReader.getObject());
+
+// Row 3
+
+assertTrue(reader.next());
+
+assertEquals("d", dictReader.get(31));
+assertEquals("e", dictReader.get(32));
+
+map = map(31, "d", 32, "e");
+assertEquals(map, dictReader.getObject());
+
+assertFalse(reader.next());
+
+// Verify that the dict accessor's value count was set.
+
+final DictVector dictVector = (DictVector) 
actual.container().getValueVector(1).getValueVector();
+assertEquals(3, dictVector.getAccessor().getValueCount());
+
+final SingleRowSet expected = fixture.rowSetBuilder(schema)
+.addRow(1, objArray(11, "a", 12, "b"))
+.addRow(2, objArray(21, "c"))
+.addRow(3, objArray(31, "d", 32, "e"))
+.build();
+RowSetUtilities.verify(expected, actual);
+  }
+
+  /**
+   * Utility method to bootstrap a map object easily.
+   *
+   * @param entry key-value sequence
+   * @return map containing key-value pairs from passed sequence
+   */
+  private Map map(Object... entry) {
 
 Review comment:
   I've tried, but there was an issue with entry ordering (used `HashMap`). 
Should work with `LinkedHashMap`, I'll try (oops, haven't thought of it back 
then).
 

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978407#comment-16978407
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r348472429
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/model/single/BaseReaderBuilder.java
 ##
 @@ -77,22 +81,48 @@ protected AbstractObjectReader 
buildVectorReader(ValueVector vector, VectorDescr
 final MajorType type = va.type();
 
 switch(type.getMinorType()) {
-case MAP:
-  return buildMap((AbstractMapVector) vector, va, type.getMode(), descrip);
-case UNION:
-  return buildUnion((UnionVector) vector, va, descrip);
-case LIST:
-  return buildList(vector, va, descrip);
-case LATE:
-
-  // Occurs for a list with no type: a list of nulls.
-
-  return AbstractScalarReader.nullReader(descrip.metadata);
-default:
-  return buildScalarReader(va, descrip.metadata);
+  case DICT:
+return buildDict(vector, va, descrip);
+  case MAP:
+return buildMap((AbstractMapVector) vector, va, type.getMode(), 
descrip);
+  case UNION:
+return buildUnion((UnionVector) vector, va, descrip);
+  case LIST:
+return buildList(vector, va, descrip);
+  case LATE:
+
+// Occurs for a list with no type: a list of nulls.
+
+return AbstractScalarReader.nullReader(descrip.metadata);
+  default:
+return buildScalarReader(va, descrip.metadata);
 }
   }
 
+  private AbstractObjectReader buildDict(ValueVector vector, VectorAccessor 
va, VectorDescrip descrip) {
+
+boolean isArray = descrip.metadata.isArray();
+
+DictVector dictVector;
+VectorAccessor dictAccessor;
+if (isArray) {
 
 Review comment:
   To clarify, this builds reader for `REPEATED DICT` and `DICT`. Its fields 
(`key` and `value`) are handled in `buildMapMembers(AbstractMapVector, 
MetadataProvider)` which was written for handling fields for `MAP` and 
`REPEATED MAP` and now for both cases of `DICT`. (Yeah, the method probably 
should've been renamed taking into account its usage for `DICT`).
   
   Is the above which concerns you or have I unterstood you incorrectly?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975891#comment-16975891
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r347115215
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/TestRowSet.java
 ##
 @@ -586,6 +595,490 @@ public void testRepeatedMapStructure() {
 RowSetUtilities.verify(expected, actual);
   }
 
+  @Test
+  public void testDictStructure() {
+final String dictName = "d";
+
+final TupleMetadata schema = new SchemaBuilder()
+.add("id", MinorType.INT)
+.addDict(dictName, MinorType.INT)
+.value(MinorType.VARCHAR) // required int
+.resumeSchema()
+.buildSchema();
+final ExtendableRowSet rowSet = fixture.rowSet(schema);
+final RowSetWriter writer = rowSet.writer();
+
+// Dict
+// Pick out components and lightly test. (Assumes structure
+// tested earlier is still valid, so no need to exhaustively
+// test again.)
+
+assertEquals(ObjectType.ARRAY, writer.column(dictName).type());
+assertTrue(writer.column(dictName).schema().isDict());
+
+final ScalarWriter idWriter = writer.column(0).scalar();
+final DictWriter dictWriter = writer.column(1).dict();
+
+assertEquals(ValueType.INTEGER, dictWriter.keyType());
+assertEquals(ObjectType.SCALAR, dictWriter.valueType());
+
+final ScalarWriter keyWriter = dictWriter.keyWriter();
+final ScalarWriter valueWriter = dictWriter.valueWriter().scalar();
+
+assertEquals(ValueType.INTEGER, keyWriter.valueType());
+assertEquals(ValueType.STRING, valueWriter.valueType());
+
+// Write data
+idWriter.setInt(1);
+
+keyWriter.setInt(11);
+valueWriter.setString("a");
+dictWriter.save(); // Advance to next entry position
+keyWriter.setInt(12);
+valueWriter.setString("b");
+dictWriter.save();
+writer.save();
+
+idWriter.setInt(2);
+
+keyWriter.setInt(21);
+valueWriter.setString("c");
+dictWriter.save();
+writer.save();
+
+idWriter.setInt(3);
+
+keyWriter.setInt(31);
+valueWriter.setString("d");
+dictWriter.save();
+keyWriter.setInt(32);
+valueWriter.setString("e");
+dictWriter.save();
+writer.save();
+
+// Finish the row set and get a reader.
+
+final SingleRowSet actual = writer.done();
+final RowSetReader reader = actual.reader();
+
+// Verify reader structure
+
+assertEquals(ObjectType.ARRAY, reader.column(dictName).type());
+
+final DictReader dictReader = reader.dict(1);
+assertEquals(ObjectType.ARRAY, dictReader.type());
+
+assertEquals(ValueType.INTEGER, dictReader.keyColumnType());
+assertEquals(ObjectType.SCALAR, dictReader.valueColumnType());
+
+// Row 1: get value reader with its position set to entry corresponding to 
a key
+
+assertTrue(reader.next());
+assertFalse(dictReader.isNull()); // dict itself is not null
+
+dictReader.getAsString();
+assertEquals("b", dictReader.getValueReader(12).scalar().getString());
+assertEquals("a", dictReader.getValueReader(11).scalar().getString());
+
+// compare entire dict
+Map map = map(11, "a", 12, "b");
+assertEquals(map, dictReader.getObject());
+
+// Row 2: get Object representation of value directly
+
+assertTrue(reader.next());
+assertEquals("c", dictReader.get(21));
+assertTrue(dictReader.getValueReader(22).isNull()); // the dict does not 
contain an entry with the key
+
+map = map(21, "c");
+assertEquals(map, dictReader.getObject());
+
+// Row 3
+
+assertTrue(reader.next());
+
+assertEquals("d", dictReader.get(31));
+assertEquals("e", dictReader.get(32));
+
+map = map(31, "d", 32, "e");
+assertEquals(map, dictReader.getObject());
+
+assertFalse(reader.next());
+
+// Verify that the dict accessor's value count was set.
+
+final DictVector dictVector = (DictVector) 
actual.container().getValueVector(1).getValueVector();
+assertEquals(3, dictVector.getAccessor().getValueCount());
+
+final SingleRowSet expected = fixture.rowSetBuilder(schema)
+.addRow(1, objArray(11, "a", 12, "b"))
+.addRow(2, objArray(21, "c"))
+.addRow(3, objArray(31, "d", 32, "e"))
+.build();
+RowSetUtilities.verify(expected, actual);
+  }
+
+  /**
+   * Utility method to bootstrap a map object easily.
+   *
+   * @param entry key-value sequence
+   * @return map containing key-value pairs from passed sequence
+   */
+  private Map map(Object... entry) {
+if (entry.length % 2 == 1) {
+  throw new IllegalArgumentException("Array length should be even.");
+}
+
+Map map = new HashMap<>();
+for (int i = 0; i < entry.length; i += 2) {
+ 

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975883#comment-16975883
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r347115016
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/TestRowSet.java
 ##
 @@ -586,6 +595,490 @@ public void testRepeatedMapStructure() {
 RowSetUtilities.verify(expected, actual);
   }
 
+  @Test
+  public void testDictStructure() {
+final String dictName = "d";
+
+final TupleMetadata schema = new SchemaBuilder()
+.add("id", MinorType.INT)
+.addDict(dictName, MinorType.INT)
+.value(MinorType.VARCHAR) // required int
+.resumeSchema()
 
 Review comment:
   Nit: perhaps indent the above two lines to show that they form a nested 
structure.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975896#comment-16975896
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r347115462
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/NullReader.java
 ##
 @@ -0,0 +1,377 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.accessor.reader;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.record.metadata.VariantMetadata;
+import org.apache.drill.exec.vector.accessor.ArrayReader;
+import org.apache.drill.exec.vector.accessor.DictReader;
+import org.apache.drill.exec.vector.accessor.ObjectReader;
+import org.apache.drill.exec.vector.accessor.ObjectType;
+import org.apache.drill.exec.vector.accessor.ScalarReader;
+import org.apache.drill.exec.vector.accessor.TupleReader;
+import org.apache.drill.exec.vector.accessor.ValueType;
+import org.apache.drill.exec.vector.accessor.VariantReader;
+import org.joda.time.Instant;
+import org.joda.time.LocalDate;
+import org.joda.time.LocalTime;
+import org.joda.time.Period;
+
+import java.math.BigDecimal;
+
+/**
+ * Dummy reader which returns {@code null} on access to getters for scalar 
types
+ * ({@link #getString()}, {@link #getBytes()} etc.) and itself on access to 
reader types
+ * ({@link #scalar()}, {@link #tuple()} etc.). Used when accessing {@link 
DictReader} by non-existent key.
+ * {@link #isNull()} returns {@code true}.
+ *
+ * Does not preserve metadata structure. Other methods throw {@link 
UnsupportedOperationException}. todo:
+ *
+ * @see DictReader
+ */
+public class NullReader implements ScalarReader, ArrayReader, TupleReader, 
VariantReader, ObjectReader, DictReader {
+
+  private static final NullReader INSTANCE = new NullReader();
+
+  public static NullReader instance() {
+return INSTANCE;
+  }
+
+  @Override
+  public int size() {
+notSupported("size()");
+return 0;
+  }
+
+  @Override
+  public ObjectType entryType() {
+notSupported("entryType()");
+return null;
+  }
+
+  @Override
+  public ObjectReader entry() {
+notSupported("entry()");
+return null;
+  }
+
+  @Override
+  public ScalarReader scalar() {
+return this;
+  }
+
+  @Override
+  public TupleReader tuple() {
+return this;
+  }
+
+  @Override
+  public ArrayReader array() {
+return this;
+  }
+
+  @Override
+  public VariantReader variant() {
+return this;
+  }
+
+  @Override
+  public DictReader dict() {
+return this;
+  }
+
+  @Override
+  public void setPosn(int index) {
+notSupported("setPosn(int)");
+  }
+
+  @Override
+  public void rewind() {
+notSupported("rewind()");
+  }
+
+  @Override
+  public boolean next() {
+return false;
+  }
+
+  @Override
+  public ValueType valueType() {
+notSupported("valueType()");
+return null;
+  }
+
+  @Override
+  public ValueType extendedType() {
+notSupported("extendedType()");
+return null;
+  }
+
+  @Override
+  public int getInt() {
+notSupported("getInt()");
+return 0;
+  }
+
+  @Override
+  public boolean getBoolean() {
+notSupported("getBoolean()");
+return false;
+  }
+
+  @Override
+  public long getLong() {
+notSupported("getLong()");
+return 0;
+  }
+
+  @Override
+  public double getDouble() {
+notSupported("getDouble()");
+return 0;
+  }
+
+  @Override
+  public String getString() {
+return null;
+  }
+
+  @Override
+  public byte[] getBytes() {
+return null;
+  }
+
+  @Override
+  public BigDecimal getDecimal() {
+return null;
+  }
+
+  @Override
+  public Period getPeriod() {
+return null;
+  }
+
+  @Override
+  public LocalDate getDate() {
+return null;
+  }
+
+  @Override
+  public LocalTime getTime() {
+return null;
+  }
+
+  @Override
+  public Instant getTimestamp() {
+ 

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975882#comment-16975882
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r347114952
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/model/single/BuildVectorsFromMetadata.java
 ##
 @@ -69,7 +71,11 @@ public VectorContainer build(TupleMetadata schema) {
   private ValueVector buildVector(ColumnMetadata metadata) {
 switch (metadata.structureType()) {
 case TUPLE:
-  return buildMap(metadata);
+  if (!metadata.isDict()) {
 
 Review comment:
   This can be simplified a bit. A Dict should not be a tuple; we should define 
a new structure type to indicate that this is a DICT, not a Tuple that happens 
to act like a DICT.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975888#comment-16975888
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r347115104
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/TestRowSet.java
 ##
 @@ -586,6 +595,490 @@ public void testRepeatedMapStructure() {
 RowSetUtilities.verify(expected, actual);
   }
 
+  @Test
+  public void testDictStructure() {
+final String dictName = "d";
+
+final TupleMetadata schema = new SchemaBuilder()
+.add("id", MinorType.INT)
+.addDict(dictName, MinorType.INT)
+.value(MinorType.VARCHAR) // required int
+.resumeSchema()
+.buildSchema();
+final ExtendableRowSet rowSet = fixture.rowSet(schema);
+final RowSetWriter writer = rowSet.writer();
+
+// Dict
+// Pick out components and lightly test. (Assumes structure
+// tested earlier is still valid, so no need to exhaustively
+// test again.)
+
+assertEquals(ObjectType.ARRAY, writer.column(dictName).type());
+assertTrue(writer.column(dictName).schema().isDict());
+
+final ScalarWriter idWriter = writer.column(0).scalar();
+final DictWriter dictWriter = writer.column(1).dict();
+
+assertEquals(ValueType.INTEGER, dictWriter.keyType());
+assertEquals(ObjectType.SCALAR, dictWriter.valueType());
+
+final ScalarWriter keyWriter = dictWriter.keyWriter();
+final ScalarWriter valueWriter = dictWriter.valueWriter().scalar();
+
+assertEquals(ValueType.INTEGER, keyWriter.valueType());
+assertEquals(ValueType.STRING, valueWriter.valueType());
+
+// Write data
+idWriter.setInt(1);
+
+keyWriter.setInt(11);
+valueWriter.setString("a");
+dictWriter.save(); // Advance to next entry position
+keyWriter.setInt(12);
+valueWriter.setString("b");
+dictWriter.save();
+writer.save();
+
+idWriter.setInt(2);
+
+keyWriter.setInt(21);
+valueWriter.setString("c");
+dictWriter.save();
+writer.save();
+
+idWriter.setInt(3);
+
+keyWriter.setInt(31);
+valueWriter.setString("d");
+dictWriter.save();
+keyWriter.setInt(32);
+valueWriter.setString("e");
+dictWriter.save();
+writer.save();
+
+// Finish the row set and get a reader.
+
+final SingleRowSet actual = writer.done();
+final RowSetReader reader = actual.reader();
+
+// Verify reader structure
+
+assertEquals(ObjectType.ARRAY, reader.column(dictName).type());
+
+final DictReader dictReader = reader.dict(1);
+assertEquals(ObjectType.ARRAY, dictReader.type());
+
+assertEquals(ValueType.INTEGER, dictReader.keyColumnType());
+assertEquals(ObjectType.SCALAR, dictReader.valueColumnType());
+
+// Row 1: get value reader with its position set to entry corresponding to 
a key
+
+assertTrue(reader.next());
+assertFalse(dictReader.isNull()); // dict itself is not null
+
+dictReader.getAsString();
+assertEquals("b", dictReader.getValueReader(12).scalar().getString());
+assertEquals("a", dictReader.getValueReader(11).scalar().getString());
+
+// compare entire dict
+Map map = map(11, "a", 12, "b");
+assertEquals(map, dictReader.getObject());
+
+// Row 2: get Object representation of value directly
+
+assertTrue(reader.next());
+assertEquals("c", dictReader.get(21));
+assertTrue(dictReader.getValueReader(22).isNull()); // the dict does not 
contain an entry with the key
+
+map = map(21, "c");
+assertEquals(map, dictReader.getObject());
+
+// Row 3
+
+assertTrue(reader.next());
+
+assertEquals("d", dictReader.get(31));
+assertEquals("e", dictReader.get(32));
+
+map = map(31, "d", 32, "e");
+assertEquals(map, dictReader.getObject());
+
+assertFalse(reader.next());
+
+// Verify that the dict accessor's value count was set.
+
+final DictVector dictVector = (DictVector) 
actual.container().getValueVector(1).getValueVector();
+assertEquals(3, dictVector.getAccessor().getValueCount());
+
+final SingleRowSet expected = fixture.rowSetBuilder(schema)
+.addRow(1, objArray(11, "a", 12, "b"))
+.addRow(2, objArray(21, "c"))
+.addRow(3, objArray(31, "d", 32, "e"))
 
 Review comment:
   Using `objArray` to create DICT values is not entirely intuitive. But, I 
could not think of any other one-liner way to do it, so I think you've found 
the simplest solution.
   
   Have not yet reviewed the `setObject()` code, but would be handy to also 
take a `Map` and split that into its key/value pairs in the Dict. Also, 
`getObject()` should return a `Map` if it doesn't already. 
 

This is an 

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975893#comment-16975893
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r347115621
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/writer/DictEntryWriter.java
 ##
 @@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.accessor.writer;
+
+import java.util.List;
+import java.util.Map;
+
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.vector.accessor.ColumnWriterIndex;
+import org.apache.drill.exec.vector.accessor.impl.HierarchicalFormatter;
+
+/**
+ * Writer for a Dict entry. The entry is a special kind of tuple.
+ */
+public class DictEntryWriter extends AbstractTupleWriter {
+
+  public static class DictEntryObjectWriter extends TupleObjectWriter {
+
+public DictEntryObjectWriter(DictEntryWriter entryWriter) {
+  super(entryWriter);
+}
+
+public void setObject(int col, Object value) {
+  tupleWriter.set(col, value);
+}
+
+@Override
+public void dump(HierarchicalFormatter format) {
+  format.startObject(this)
+  .attribute("dictEntryWriter");
+  tupleWriter.dump(format);
+  format.endObject();
+}
+  }
+
+  public static DictEntryObjectWriter buildDictEntryWriter(ColumnMetadata 
schema,
+   
List keyValueWriters) {
+assert keyValueWriters.size() == 2;
+DictEntryWriter dictEntryWriter = new DictEntryWriter(schema, 
keyValueWriters);
+return new DictEntryObjectWriter(dictEntryWriter);
+  }
+
+  private final ColumnMetadata dictColumnSchema;
+
+  public DictEntryWriter(ColumnMetadata schema, List 
writers) {
+super(schema.tupleSchema(), writers);
+dictColumnSchema = schema;
+  }
+
+  @Override
+  public void bindIndex(ColumnWriterIndex index) {
+
+// Similarly to a repeated map, the provided index is an array element
+// index. Convert this to an index that will not increment the element
+// index on each write so that a dict with key and value members won't
+// increment the index for each member. Rather, the index must be
+// incremented at the array level.
+
+bindIndex(index, new MemberWriterIndex(index));
+  }
+
+  @Override
+  public boolean isProjected() {
+return true;
+  }
 
 Review comment:
   Can't remember how this is supposed to work; it could be it ended up not be 
being used.
   
   But, this does suggest a missing test case: in the result set loader tests, 
test the case in which a reader can provide a DICT, but the client does not 
project that. That is, if a reader has a DICT `d`, along with other columns 
`a`, `b` and `c`, we want to make sure this works:
   
   ```
   SELECT a, c FROM myTable
   ```
   
   Then, we write to the Dict. In other cases, we create dummy writers which 
accept the unwanted values, but then just throw them away. That way, the (data) 
reader code need not special case whether the column is projected or not.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975895#comment-16975895
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r347115742
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/writer/ObjectDictWriter.java
 ##
 @@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.accessor.writer;
+
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.vector.UInt4Vector;
+import org.apache.drill.exec.vector.accessor.DictWriter;
+import org.apache.drill.exec.vector.accessor.ObjectType;
+import org.apache.drill.exec.vector.accessor.ObjectWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.ValueType;
+import org.apache.drill.exec.vector.accessor.impl.HierarchicalFormatter;
+import org.apache.drill.exec.vector.complex.DictVector;
+import org.apache.drill.exec.vector.complex.RepeatedDictVector;
+
+import java.lang.reflect.Array;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * The implementation represents the writer as an array writer
+ * with special dict entry writer as its element writer.
+ */
+public class ObjectDictWriter extends ObjectArrayWriter implements DictWriter {
+
+  public static class DictObjectWriter extends 
AbstractArrayWriter.ArrayObjectWriter {
+
+public DictObjectWriter(ObjectDictWriter dictWriter) {
+  super(dictWriter);
+}
+
+@Override
+public DictWriter dict() {
+  return (DictWriter) arrayWriter;
+}
+
+@Override
+public void dump(HierarchicalFormatter format) {
+  format.startObject(this)
+  .attribute("dictWriter");
+  arrayWriter.dump(format);
+  format.endObject();
+}
+  }
+
+  public static ObjectDictWriter.DictObjectWriter buildDict(ColumnMetadata 
metadata, DictVector vector,
+
List keyValueWriters) {
+DictEntryWriter.DictEntryObjectWriter entryObjectWriter = 
DictEntryWriter.buildDictEntryWriter(metadata, keyValueWriters);
+ObjectDictWriter objectDictWriter = new ObjectDictWriter(metadata, 
vector.getOffsetVector(), entryObjectWriter);
+return new ObjectDictWriter.DictObjectWriter(objectDictWriter);
+  }
+
+  public static ArrayObjectWriter buildDictArray(RepeatedDictVector vector, 
ColumnMetadata metadata,
+ List 
keyValueWriters) {
+final DictVector dataVector = (DictVector) vector.getDataVector();
+ObjectDictWriter.DictObjectWriter dictWriter = buildDict(metadata, 
dataVector, keyValueWriters);
+AbstractArrayWriter arrayWriter = new ObjectArrayWriter(metadata, 
vector.getOffsetVector(), dictWriter);
+return new ArrayObjectWriter(arrayWriter);
+  }
+
+  private static final int FIELD_KEY_ORDINAL = 0;
+  private static final int FIELD_VALUE_ORDINAL = 1;
+
+  private final DictEntryWriter.DictEntryObjectWriter entryObjectWriter;
+
+  private ObjectDictWriter(ColumnMetadata schema, UInt4Vector offsetVector,
+   DictEntryWriter.DictEntryObjectWriter 
entryObjectWriter) {
+super(schema, offsetVector, entryObjectWriter);
+this.entryObjectWriter = entryObjectWriter;
+  }
+
+  @Override
+  public ValueType keyType() {
+return tuple().scalar(FIELD_KEY_ORDINAL).valueType();
+  }
+
+  @Override
+  public ObjectType valueType() {
+return tuple().type(FIELD_VALUE_ORDINAL);
+  }
+
+  @Override
+  public ScalarWriter keyWriter() {
+return tuple().scalar(FIELD_KEY_ORDINAL);
+  }
+
+  @Override
+  public ObjectWriter valueWriter() {
+return tuple().column(FIELD_VALUE_ORDINAL);
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public void setObject(Object object) {
+if (object instanceof Map) {
+  Map map = (Map) object;
+  for (Map.Entry entry : map.entrySet()) {
+entryObjectWriter.setObject(0, 

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975886#comment-16975886
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r347115170
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/TestRowSet.java
 ##
 @@ -586,6 +595,490 @@ public void testRepeatedMapStructure() {
 RowSetUtilities.verify(expected, actual);
   }
 
+  @Test
+  public void testDictStructure() {
+final String dictName = "d";
+
+final TupleMetadata schema = new SchemaBuilder()
+.add("id", MinorType.INT)
+.addDict(dictName, MinorType.INT)
+.value(MinorType.VARCHAR) // required int
+.resumeSchema()
+.buildSchema();
+final ExtendableRowSet rowSet = fixture.rowSet(schema);
+final RowSetWriter writer = rowSet.writer();
+
+// Dict
+// Pick out components and lightly test. (Assumes structure
+// tested earlier is still valid, so no need to exhaustively
+// test again.)
+
+assertEquals(ObjectType.ARRAY, writer.column(dictName).type());
+assertTrue(writer.column(dictName).schema().isDict());
+
+final ScalarWriter idWriter = writer.column(0).scalar();
+final DictWriter dictWriter = writer.column(1).dict();
+
+assertEquals(ValueType.INTEGER, dictWriter.keyType());
+assertEquals(ObjectType.SCALAR, dictWriter.valueType());
+
+final ScalarWriter keyWriter = dictWriter.keyWriter();
+final ScalarWriter valueWriter = dictWriter.valueWriter().scalar();
+
+assertEquals(ValueType.INTEGER, keyWriter.valueType());
+assertEquals(ValueType.STRING, valueWriter.valueType());
+
+// Write data
+idWriter.setInt(1);
+
+keyWriter.setInt(11);
+valueWriter.setString("a");
+dictWriter.save(); // Advance to next entry position
+keyWriter.setInt(12);
+valueWriter.setString("b");
+dictWriter.save();
+writer.save();
+
+idWriter.setInt(2);
+
+keyWriter.setInt(21);
+valueWriter.setString("c");
+dictWriter.save();
+writer.save();
+
+idWriter.setInt(3);
+
+keyWriter.setInt(31);
+valueWriter.setString("d");
+dictWriter.save();
+keyWriter.setInt(32);
+valueWriter.setString("e");
+dictWriter.save();
+writer.save();
+
+// Finish the row set and get a reader.
+
+final SingleRowSet actual = writer.done();
+final RowSetReader reader = actual.reader();
+
+// Verify reader structure
+
+assertEquals(ObjectType.ARRAY, reader.column(dictName).type());
+
+final DictReader dictReader = reader.dict(1);
+assertEquals(ObjectType.ARRAY, dictReader.type());
+
+assertEquals(ValueType.INTEGER, dictReader.keyColumnType());
+assertEquals(ObjectType.SCALAR, dictReader.valueColumnType());
+
+// Row 1: get value reader with its position set to entry corresponding to 
a key
+
+assertTrue(reader.next());
+assertFalse(dictReader.isNull()); // dict itself is not null
+
+dictReader.getAsString();
+assertEquals("b", dictReader.getValueReader(12).scalar().getString());
+assertEquals("a", dictReader.getValueReader(11).scalar().getString());
+
+// compare entire dict
+Map map = map(11, "a", 12, "b");
+assertEquals(map, dictReader.getObject());
+
+// Row 2: get Object representation of value directly
+
+assertTrue(reader.next());
+assertEquals("c", dictReader.get(21));
+assertTrue(dictReader.getValueReader(22).isNull()); // the dict does not 
contain an entry with the key
+
+map = map(21, "c");
+assertEquals(map, dictReader.getObject());
+
+// Row 3
+
+assertTrue(reader.next());
+
+assertEquals("d", dictReader.get(31));
+assertEquals("e", dictReader.get(32));
+
+map = map(31, "d", 32, "e");
+assertEquals(map, dictReader.getObject());
+
+assertFalse(reader.next());
+
+// Verify that the dict accessor's value count was set.
+
+final DictVector dictVector = (DictVector) 
actual.container().getValueVector(1).getValueVector();
+assertEquals(3, dictVector.getAccessor().getValueCount());
+
+final SingleRowSet expected = fixture.rowSetBuilder(schema)
+.addRow(1, objArray(11, "a", 12, "b"))
+.addRow(2, objArray(21, "c"))
+.addRow(3, objArray(31, "d", 32, "e"))
+.build();
+RowSetUtilities.verify(expected, actual);
+  }
+
+  /**
+   * Utility method to bootstrap a map object easily.
+   *
+   * @param entry key-value sequence
+   * @return map containing key-value pairs from passed sequence
+   */
+  private Map map(Object... entry) {
+if (entry.length % 2 == 1) {
+  throw new IllegalArgumentException("Array length should be even.");
+}
+
+Map map = new HashMap<>();
+for (int i = 0; i < entry.length; i += 2) {
+ 

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975881#comment-16975881
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r347114856
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/model/single/BaseReaderBuilder.java
 ##
 @@ -77,22 +81,48 @@ protected AbstractObjectReader 
buildVectorReader(ValueVector vector, VectorDescr
 final MajorType type = va.type();
 
 switch(type.getMinorType()) {
-case MAP:
-  return buildMap((AbstractMapVector) vector, va, type.getMode(), descrip);
-case UNION:
-  return buildUnion((UnionVector) vector, va, descrip);
-case LIST:
-  return buildList(vector, va, descrip);
-case LATE:
-
-  // Occurs for a list with no type: a list of nulls.
-
-  return AbstractScalarReader.nullReader(descrip.metadata);
-default:
-  return buildScalarReader(va, descrip.metadata);
+  case DICT:
+return buildDict(vector, va, descrip);
+  case MAP:
+return buildMap((AbstractMapVector) vector, va, type.getMode(), 
descrip);
+  case UNION:
+return buildUnion((UnionVector) vector, va, descrip);
+  case LIST:
+return buildList(vector, va, descrip);
+  case LATE:
+
+// Occurs for a list with no type: a list of nulls.
+
+return AbstractScalarReader.nullReader(descrip.metadata);
+  default:
+return buildScalarReader(va, descrip.metadata);
 }
   }
 
+  private AbstractObjectReader buildDict(ValueVector vector, VectorAccessor 
va, VectorDescrip descrip) {
+
+boolean isArray = descrip.metadata.isArray();
+
+DictVector dictVector;
+VectorAccessor dictAccessor;
+if (isArray) {
 
 Review comment:
   It *should* be possible to let the column builder recursively build the 
readers for each of the two component fields, handling the details of arrays, 
lists. repeated lists and so on. That this code works is probably due to 
insufficient tests: do we have tests where the value is, say, a union, or a 
list (of unions), etc.?
   
   This is a very tricky bit of the code, took me forever to work out how to 
properly build nested structures.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975880#comment-16975880
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r347114932
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/model/single/BaseWriterBuilder.java
 ##
 @@ -75,19 +78,32 @@ protected BaseWriterBuilder(ColumnConversionFactory 
conversionFactory) {
   private AbstractObjectWriter buildVectorWriter(ValueVector vector, 
VectorDescrip descrip) {
 final MajorType type = vector.getField().getType();
 switch (type.getMinorType()) {
-case MAP:
-  return MapWriter.buildMapWriter(descrip.metadata,
-  (AbstractMapVector) vector,
-  buildMap((AbstractMapVector) vector, descrip));
+  case DICT:
+return buildDict(vector, descrip);
+  case MAP:
+return MapWriter.buildMapWriter(descrip.metadata,
+(AbstractMapVector) vector,
+buildMap((AbstractMapVector) vector, descrip));
 
-case UNION:
-  return buildUnion((UnionVector) vector, descrip);
+  case UNION:
+return buildUnion((UnionVector) vector, descrip);
 
-case LIST:
-  return buildList(vector, descrip);
+  case LIST:
+return buildList(vector, descrip);
 
-default:
-  return ColumnWriterFactory.buildColumnWriter(descrip.metadata, 
conversionFactory, vector);
+  default:
+return ColumnWriterFactory.buildColumnWriter(descrip.metadata, 
conversionFactory, vector);
+}
+  }
+
+  private AbstractObjectWriter buildDict(ValueVector vector, VectorDescrip 
descrip) {
+if (vector.getField().getType().getMode() == DataMode.REPEATED) {
 
 Review comment:
   Again, it *should* be possible to simply call `buildVectorWriter` and handle 
the key and value vectors recursively so you don't have to repeat the type 
checks in this method.
   
   If the recursive construction does not work, it is likely because of some 
details that we can figure out. (As noted above, this code is tricky because of 
multiple constraints imposed by the `MaterializedField`, `VaueVector` and 
writer classes.)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975894#comment-16975894
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r347115550
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/NullReader.java
 ##
 @@ -0,0 +1,377 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.accessor.reader;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.record.metadata.VariantMetadata;
+import org.apache.drill.exec.vector.accessor.ArrayReader;
+import org.apache.drill.exec.vector.accessor.DictReader;
+import org.apache.drill.exec.vector.accessor.ObjectReader;
+import org.apache.drill.exec.vector.accessor.ObjectType;
+import org.apache.drill.exec.vector.accessor.ScalarReader;
+import org.apache.drill.exec.vector.accessor.TupleReader;
+import org.apache.drill.exec.vector.accessor.ValueType;
+import org.apache.drill.exec.vector.accessor.VariantReader;
+import org.joda.time.Instant;
+import org.joda.time.LocalDate;
+import org.joda.time.LocalTime;
+import org.joda.time.Period;
+
+import java.math.BigDecimal;
+
+/**
+ * Dummy reader which returns {@code null} on access to getters for scalar 
types
+ * ({@link #getString()}, {@link #getBytes()} etc.) and itself on access to 
reader types
+ * ({@link #scalar()}, {@link #tuple()} etc.). Used when accessing {@link 
DictReader} by non-existent key.
+ * {@link #isNull()} returns {@code true}.
+ *
+ * Does not preserve metadata structure. Other methods throw {@link 
UnsupportedOperationException}. todo:
+ *
+ * @see DictReader
+ */
+public class NullReader implements ScalarReader, ArrayReader, TupleReader, 
VariantReader, ObjectReader, DictReader {
+
+  private static final NullReader INSTANCE = new NullReader();
+
+  public static NullReader instance() {
+return INSTANCE;
+  }
+
+  @Override
+  public int size() {
+notSupported("size()");
+return 0;
+  }
+
+  @Override
+  public ObjectType entryType() {
+notSupported("entryType()");
+return null;
+  }
+
+  @Override
+  public ObjectReader entry() {
+notSupported("entry()");
+return null;
+  }
+
+  @Override
+  public ScalarReader scalar() {
+return this;
+  }
+
+  @Override
+  public TupleReader tuple() {
+return this;
+  }
+
+  @Override
+  public ArrayReader array() {
+return this;
+  }
+
+  @Override
+  public VariantReader variant() {
+return this;
+  }
+
+  @Override
+  public DictReader dict() {
+return this;
+  }
+
+  @Override
+  public void setPosn(int index) {
+notSupported("setPosn(int)");
+  }
+
+  @Override
+  public void rewind() {
+notSupported("rewind()");
+  }
 
 Review comment:
   This, and the above, are likely to be a problem. The readers recursively 
call down the reader tree on operations such as set position or rewind. If this 
reader appears in the tree, the whole operation will fail, which is not quite 
what we want.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975892#comment-16975892
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r347115224
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/TestSchemaBuilder.java
 ##
 @@ -280,6 +280,40 @@ public void testRepeatedListInRow() {
 assertEquals(DataMode.REPEATED, child.mode());
   }
 
+  /**
+   * Tests creating a dict within a row.
+   * Also the basic dict add key and value columns methods.
+   */
+  @Test
+  public void testDictInRow() {
+TupleMetadata schema = new SchemaBuilder()
+.addDict("d", MinorType.VARCHAR)
+.nullableValue(MinorType.FLOAT8)
+.resumeSchema()
 
 Review comment:
   Nit: Indent value structure
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975889#comment-16975889
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r347115511
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/NullReader.java
 ##
 @@ -0,0 +1,369 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.accessor.reader;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.record.metadata.VariantMetadata;
+import org.apache.drill.exec.vector.accessor.ArrayReader;
+import org.apache.drill.exec.vector.accessor.DictReader;
+import org.apache.drill.exec.vector.accessor.ObjectReader;
+import org.apache.drill.exec.vector.accessor.ObjectType;
+import org.apache.drill.exec.vector.accessor.ScalarReader;
+import org.apache.drill.exec.vector.accessor.TupleReader;
+import org.apache.drill.exec.vector.accessor.ValueType;
+import org.apache.drill.exec.vector.accessor.VariantReader;
+import org.joda.time.Instant;
+import org.joda.time.LocalDate;
+import org.joda.time.LocalTime;
+import org.joda.time.Period;
+
+import java.math.BigDecimal;
+
+/**
+ * Dummy reader which returns {@code null} for scalar types and itself for 
complex types.
+ */
+public class NullReader implements ScalarReader, ArrayReader, TupleReader, 
VariantReader, ObjectReader, DictReader {
 
 Review comment:
   See comments below. I think this is just a bit confusing; it is not clear 
when we'd use it and how clients would use it.
   
   As explained below, I *think* that if you need a reader for a missing 
element, you can use the existing scalar `NullReader`: just define a missing 
reader as a reader of scalar nulls.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975887#comment-16975887
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r347115126
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/TestRowSet.java
 ##
 @@ -586,6 +595,490 @@ public void testRepeatedMapStructure() {
 RowSetUtilities.verify(expected, actual);
   }
 
+  @Test
+  public void testDictStructure() {
+final String dictName = "d";
+
+final TupleMetadata schema = new SchemaBuilder()
+.add("id", MinorType.INT)
+.addDict(dictName, MinorType.INT)
+.value(MinorType.VARCHAR) // required int
+.resumeSchema()
+.buildSchema();
+final ExtendableRowSet rowSet = fixture.rowSet(schema);
+final RowSetWriter writer = rowSet.writer();
+
+// Dict
+// Pick out components and lightly test. (Assumes structure
+// tested earlier is still valid, so no need to exhaustively
+// test again.)
+
+assertEquals(ObjectType.ARRAY, writer.column(dictName).type());
+assertTrue(writer.column(dictName).schema().isDict());
+
+final ScalarWriter idWriter = writer.column(0).scalar();
+final DictWriter dictWriter = writer.column(1).dict();
+
+assertEquals(ValueType.INTEGER, dictWriter.keyType());
+assertEquals(ObjectType.SCALAR, dictWriter.valueType());
+
+final ScalarWriter keyWriter = dictWriter.keyWriter();
+final ScalarWriter valueWriter = dictWriter.valueWriter().scalar();
+
+assertEquals(ValueType.INTEGER, keyWriter.valueType());
+assertEquals(ValueType.STRING, valueWriter.valueType());
+
+// Write data
+idWriter.setInt(1);
+
+keyWriter.setInt(11);
+valueWriter.setString("a");
+dictWriter.save(); // Advance to next entry position
+keyWriter.setInt(12);
+valueWriter.setString("b");
+dictWriter.save();
+writer.save();
+
+idWriter.setInt(2);
+
+keyWriter.setInt(21);
+valueWriter.setString("c");
+dictWriter.save();
+writer.save();
+
+idWriter.setInt(3);
+
+keyWriter.setInt(31);
+valueWriter.setString("d");
+dictWriter.save();
+keyWriter.setInt(32);
+valueWriter.setString("e");
+dictWriter.save();
+writer.save();
+
+// Finish the row set and get a reader.
+
+final SingleRowSet actual = writer.done();
+final RowSetReader reader = actual.reader();
+
+// Verify reader structure
+
+assertEquals(ObjectType.ARRAY, reader.column(dictName).type());
+
+final DictReader dictReader = reader.dict(1);
+assertEquals(ObjectType.ARRAY, dictReader.type());
+
+assertEquals(ValueType.INTEGER, dictReader.keyColumnType());
+assertEquals(ObjectType.SCALAR, dictReader.valueColumnType());
+
+// Row 1: get value reader with its position set to entry corresponding to 
a key
+
+assertTrue(reader.next());
+assertFalse(dictReader.isNull()); // dict itself is not null
+
+dictReader.getAsString();
+assertEquals("b", dictReader.getValueReader(12).scalar().getString());
+assertEquals("a", dictReader.getValueReader(11).scalar().getString());
+
+// compare entire dict
+Map map = map(11, "a", 12, "b");
+assertEquals(map, dictReader.getObject());
+
+// Row 2: get Object representation of value directly
+
+assertTrue(reader.next());
+assertEquals("c", dictReader.get(21));
+assertTrue(dictReader.getValueReader(22).isNull()); // the dict does not 
contain an entry with the key
+
+map = map(21, "c");
+assertEquals(map, dictReader.getObject());
+
+// Row 3
+
+assertTrue(reader.next());
+
+assertEquals("d", dictReader.get(31));
+assertEquals("e", dictReader.get(32));
+
+map = map(31, "d", 32, "e");
+assertEquals(map, dictReader.getObject());
+
+assertFalse(reader.next());
+
+// Verify that the dict accessor's value count was set.
+
+final DictVector dictVector = (DictVector) 
actual.container().getValueVector(1).getValueVector();
+assertEquals(3, dictVector.getAccessor().getValueCount());
+
+final SingleRowSet expected = fixture.rowSetBuilder(schema)
+.addRow(1, objArray(11, "a", 12, "b"))
+.addRow(2, objArray(21, "c"))
+.addRow(3, objArray(31, "d", 32, "e"))
+.build();
+RowSetUtilities.verify(expected, actual);
+  }
+
+  /**
+   * Utility method to bootstrap a map object easily.
+   *
+   * @param entry key-value sequence
+   * @return map containing key-value pairs from passed sequence
+   */
+  private Map map(Object... entry) {
 
 Review comment:
   Maybe move this into `RowSetUtilities` so we can use it elsewhere.
   
   I wonder, why wasn't this used in the test above where we used `objArray`?
 

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975890#comment-16975890
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r347115442
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/NullReader.java
 ##
 @@ -0,0 +1,377 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.accessor.reader;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.record.metadata.VariantMetadata;
+import org.apache.drill.exec.vector.accessor.ArrayReader;
+import org.apache.drill.exec.vector.accessor.DictReader;
+import org.apache.drill.exec.vector.accessor.ObjectReader;
+import org.apache.drill.exec.vector.accessor.ObjectType;
+import org.apache.drill.exec.vector.accessor.ScalarReader;
+import org.apache.drill.exec.vector.accessor.TupleReader;
+import org.apache.drill.exec.vector.accessor.ValueType;
+import org.apache.drill.exec.vector.accessor.VariantReader;
+import org.joda.time.Instant;
+import org.joda.time.LocalDate;
+import org.joda.time.LocalTime;
+import org.joda.time.Period;
+
+import java.math.BigDecimal;
+
+/**
+ * Dummy reader which returns {@code null} on access to getters for scalar 
types
+ * ({@link #getString()}, {@link #getBytes()} etc.) and itself on access to 
reader types
+ * ({@link #scalar()}, {@link #tuple()} etc.). Used when accessing {@link 
DictReader} by non-existent key.
+ * {@link #isNull()} returns {@code true}.
+ *
+ * Does not preserve metadata structure. Other methods throw {@link 
UnsupportedOperationException}. todo:
+ *
+ * @see DictReader
+ */
+public class NullReader implements ScalarReader, ArrayReader, TupleReader, 
VariantReader, ObjectReader, DictReader {
+
+  private static final NullReader INSTANCE = new NullReader();
+
+  public static NullReader instance() {
+return INSTANCE;
+  }
+
+  @Override
+  public int size() {
+notSupported("size()");
+return 0;
+  }
+
+  @Override
+  public ObjectType entryType() {
+notSupported("entryType()");
+return null;
+  }
+
+  @Override
+  public ObjectReader entry() {
+notSupported("entry()");
+return null;
+  }
+
+  @Override
+  public ScalarReader scalar() {
+return this;
+  }
+
+  @Override
+  public TupleReader tuple() {
+return this;
+  }
+
+  @Override
+  public ArrayReader array() {
+return this;
+  }
+
+  @Override
+  public VariantReader variant() {
+return this;
+  }
+
+  @Override
+  public DictReader dict() {
+return this;
+  }
+
+  @Override
+  public void setPosn(int index) {
+notSupported("setPosn(int)");
+  }
+
+  @Override
+  public void rewind() {
+notSupported("rewind()");
+  }
+
+  @Override
+  public boolean next() {
+return false;
+  }
+
+  @Override
+  public ValueType valueType() {
+notSupported("valueType()");
+return null;
+  }
+
+  @Override
+  public ValueType extendedType() {
+notSupported("extendedType()");
+return null;
+  }
+
+  @Override
+  public int getInt() {
+notSupported("getInt()");
+return 0;
+  }
+
+  @Override
+  public boolean getBoolean() {
+notSupported("getBoolean()");
+return false;
+  }
+
+  @Override
+  public long getLong() {
+notSupported("getLong()");
+return 0;
+  }
+
+  @Override
+  public double getDouble() {
+notSupported("getDouble()");
+return 0;
+  }
+
+  @Override
+  public String getString() {
+return null;
+  }
+
+  @Override
+  public byte[] getBytes() {
+return null;
+  }
+
+  @Override
+  public BigDecimal getDecimal() {
+return null;
+  }
+
+  @Override
+  public Period getPeriod() {
+return null;
+  }
+
+  @Override
+  public LocalDate getDate() {
+return null;
+  }
+
+  @Override
+  public LocalTime getTime() {
+return null;
+  }
+
+  @Override
+  public Instant getTimestamp() {
+ 

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975885#comment-16975885
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r347115387
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/NullReader.java
 ##
 @@ -0,0 +1,377 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.accessor.reader;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.record.metadata.VariantMetadata;
+import org.apache.drill.exec.vector.accessor.ArrayReader;
+import org.apache.drill.exec.vector.accessor.DictReader;
+import org.apache.drill.exec.vector.accessor.ObjectReader;
+import org.apache.drill.exec.vector.accessor.ObjectType;
+import org.apache.drill.exec.vector.accessor.ScalarReader;
+import org.apache.drill.exec.vector.accessor.TupleReader;
+import org.apache.drill.exec.vector.accessor.ValueType;
+import org.apache.drill.exec.vector.accessor.VariantReader;
+import org.joda.time.Instant;
+import org.joda.time.LocalDate;
+import org.joda.time.LocalTime;
+import org.joda.time.Period;
+
+import java.math.BigDecimal;
+
+/**
+ * Dummy reader which returns {@code null} on access to getters for scalar 
types
+ * ({@link #getString()}, {@link #getBytes()} etc.) and itself on access to 
reader types
+ * ({@link #scalar()}, {@link #tuple()} etc.). Used when accessing {@link 
DictReader} by non-existent key.
+ * {@link #isNull()} returns {@code true}.
+ *
+ * Does not preserve metadata structure. Other methods throw {@link 
UnsupportedOperationException}. todo:
+ *
+ * @see DictReader
+ */
+public class NullReader implements ScalarReader, ArrayReader, TupleReader, 
VariantReader, ObjectReader, DictReader {
+
+  private static final NullReader INSTANCE = new NullReader();
+
+  public static NullReader instance() {
+return INSTANCE;
+  }
+
+  @Override
+  public int size() {
+notSupported("size()");
+return 0;
+  }
+
+  @Override
+  public ObjectType entryType() {
+notSupported("entryType()");
+return null;
+  }
+
+  @Override
+  public ObjectReader entry() {
+notSupported("entry()");
+return null;
+  }
+
+  @Override
+  public ScalarReader scalar() {
+return this;
+  }
+
+  @Override
+  public TupleReader tuple() {
+return this;
+  }
+
+  @Override
+  public ArrayReader array() {
+return this;
+  }
+
+  @Override
+  public VariantReader variant() {
+return this;
+  }
+
+  @Override
+  public DictReader dict() {
+return this;
+  }
+
+  @Override
+  public void setPosn(int index) {
+notSupported("setPosn(int)");
+  }
+
+  @Override
+  public void rewind() {
+notSupported("rewind()");
+  }
+
+  @Override
+  public boolean next() {
+return false;
+  }
+
+  @Override
+  public ValueType valueType() {
+notSupported("valueType()");
+return null;
+  }
+
+  @Override
+  public ValueType extendedType() {
+notSupported("extendedType()");
+return null;
+  }
+
+  @Override
+  public int getInt() {
+notSupported("getInt()");
+return 0;
+  }
+
+  @Override
+  public boolean getBoolean() {
+notSupported("getBoolean()");
+return false;
+  }
+
+  @Override
+  public long getLong() {
+notSupported("getLong()");
+return 0;
+  }
+
+  @Override
+  public double getDouble() {
+notSupported("getDouble()");
+return 0;
+  }
+
+  @Override
+  public String getString() {
+return null;
+  }
+
+  @Override
+  public byte[] getBytes() {
+return null;
+  }
+
+  @Override
+  public BigDecimal getDecimal() {
+return null;
+  }
+
+  @Override
+  public Period getPeriod() {
+return null;
+  }
+
+  @Override
+  public LocalDate getDate() {
+return null;
+  }
+
+  @Override
+  public LocalTime getTime() {
+return null;
+  }
+
+  @Override
+  public Instant getTimestamp() {
+ 

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975884#comment-16975884
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r347115045
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/TestRowSet.java
 ##
 @@ -586,6 +595,490 @@ public void testRepeatedMapStructure() {
 RowSetUtilities.verify(expected, actual);
   }
 
+  @Test
+  public void testDictStructure() {
+final String dictName = "d";
+
+final TupleMetadata schema = new SchemaBuilder()
+.add("id", MinorType.INT)
+.addDict(dictName, MinorType.INT)
+.value(MinorType.VARCHAR) // required int
+.resumeSchema()
+.buildSchema();
+final ExtendableRowSet rowSet = fixture.rowSet(schema);
+final RowSetWriter writer = rowSet.writer();
+
+// Dict
+// Pick out components and lightly test. (Assumes structure
+// tested earlier is still valid, so no need to exhaustively
+// test again.)
+
+assertEquals(ObjectType.ARRAY, writer.column(dictName).type());
+assertTrue(writer.column(dictName).schema().isDict());
+
+final ScalarWriter idWriter = writer.column(0).scalar();
+final DictWriter dictWriter = writer.column(1).dict();
+
+assertEquals(ValueType.INTEGER, dictWriter.keyType());
+assertEquals(ObjectType.SCALAR, dictWriter.valueType());
+
+final ScalarWriter keyWriter = dictWriter.keyWriter();
+final ScalarWriter valueWriter = dictWriter.valueWriter().scalar();
+
+assertEquals(ValueType.INTEGER, keyWriter.valueType());
+assertEquals(ValueType.STRING, valueWriter.valueType());
+
+// Write data
+idWriter.setInt(1);
+
+keyWriter.setInt(11);
+valueWriter.setString("a");
+dictWriter.save(); // Advance to next entry position
+keyWriter.setInt(12);
+valueWriter.setString("b");
+dictWriter.save();
 
 Review comment:
   I like how writing to the DICT works: it is about as simple as you can make 
it, yet it handles the multiple value types. Nicely done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975288#comment-16975288
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on issue #1870: DRILL-7359: Add support for DICT type in 
RowSet Framework
URL: https://github.com/apache/drill/pull/1870#issuecomment-554465413
 
 
   @paul-rogers I've updated the PR, please take a look again.
   Thanks for your insight!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974397#comment-16974397
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r346409969
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java
 ##
 @@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.complex.DictVector;
+
+public class DictBuilder implements SchemaContainer {
 
 Review comment:
   Yes, `Dict` is defined as `DICT` to represent mapping (like in 
Java's Map) from `String` to `int`. 
   `Dict` does not try to be a `Union` for the value type. If one wants the 
`Dict` to have `Union` value, one must define it as `DICT>` 
(the `key` type is going to be `STRING` and the `value` is going to be a union 
of some types; like union with all problems it has in Drill :) )
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974395#comment-16974395
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r346407038
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java
 ##
 @@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.complex.DictVector;
+
+public class DictBuilder implements SchemaContainer {
+
+  private final SchemaContainer parent;
+  private final TupleBuilder tupleBuilder = new TupleBuilder();
+  private final String memberName;
+  private final TypeProtos.DataMode mode;
+
+  public DictBuilder(String memberName, TypeProtos.DataMode mode) {
+this(null, memberName, mode);
+  }
+
+  public DictBuilder(SchemaContainer parent, String memberName, 
TypeProtos.DataMode mode) {
+this.parent = parent;
+this.memberName = memberName;
+this.mode = mode;
+  }
+
+  @Override
+  public void addColumn(ColumnMetadata column) {
 
 Review comment:
   I suppose, this will do for now (at least I see it OK). Changed the 
validation to check if the added field is `value` as `key` can't be complex and 
the method does not make sense for the case.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974393#comment-16974393
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r346405704
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/DictWriter.java
 ##
 @@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.accessor;
+
+/**
+ * Dict writer is represented as an array of Maps (Array with Tuple entry).
 
 Review comment:
   I wanted to say the dict writer is an array of dict `entries` (you were 
talking about below), just used the `map` somewhat in another context, like 
some type of tuple.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974390#comment-16974390
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r346403272
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/DictWriter.java
 ##
 @@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.accessor;
+
+/**
+ * Dict writer is represented as an array of Maps (Array with Tuple entry).
+ */
+public interface DictWriter extends ArrayWriter {
+  ValueType keyType();
+  ObjectType valueType();
+  ScalarWriter keyWriter();
+  ObjectWriter valueWriter();
 
 Review comment:
   My initial intention was to make it something like you proposed in your 
second option, though was done so implicitly. Made it the way you described 
with the exception that `DictWriter` actually encapsulates '`DictEntry`' (no 
need to perform `DictWriter dw = aw.dict()` rather use `DictWriter` itself to 
write the pairs; `aw` in this case is also `DictWriter`).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974387#comment-16974387
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r346400903
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/NullReader.java
 ##
 @@ -0,0 +1,369 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.accessor.reader;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.record.metadata.VariantMetadata;
+import org.apache.drill.exec.vector.accessor.ArrayReader;
+import org.apache.drill.exec.vector.accessor.DictReader;
+import org.apache.drill.exec.vector.accessor.ObjectReader;
+import org.apache.drill.exec.vector.accessor.ObjectType;
+import org.apache.drill.exec.vector.accessor.ScalarReader;
+import org.apache.drill.exec.vector.accessor.TupleReader;
+import org.apache.drill.exec.vector.accessor.ValueType;
+import org.apache.drill.exec.vector.accessor.VariantReader;
+import org.joda.time.Instant;
+import org.joda.time.LocalDate;
+import org.joda.time.LocalTime;
+import org.joda.time.Period;
+
+import java.math.BigDecimal;
+
+/**
+ * Dummy reader which returns {@code null} for scalar types and itself for 
complex types.
+ */
+public class NullReader implements ScalarReader, ArrayReader, TupleReader, 
VariantReader, ObjectReader, DictReader {
 
 Review comment:
   It does not follow the actual metadata structure, I suppose general case 
will do.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974385#comment-16974385
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r346400409
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/MetadataUtils.java
 ##
 @@ -205,4 +205,10 @@ private static ColumnMetadata newDecimal(String name, 
MinorType type, DataMode m
 return new PrimitiveColumnMetadata(field);
   }
 
+  private static boolean isScalar(MinorType type) {
+return type != MinorType.MAP
+&& type != MinorType.UNION
+&& type != MinorType.LIST
+&& type != MinorType.DICT;
+  }
 
 Review comment:
   Yeah, much negation... Changed the body of the method - didn't want to 
introduce the new `isComplex()`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974382#comment-16974382
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r346399315
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/MapBuilder.java
 ##
 @@ -144,6 +144,14 @@ public RepeatedListBuilder addRepeatedList(String name) {
 return tupleBuilder.addRepeatedList(this, name);
   }
 
+  public DictBuilder addDict(String name) {
+return tupleBuilder.addDict(this, name);
+  }
 
 Review comment:
   Good point. Yes, `value` can be nullable (added explicit comment to 
`DictVector`).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972137#comment-16972137
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on issue #1870: DRILL-7359: Add support for DICT type in 
RowSet Framework
URL: https://github.com/apache/drill/pull/1870#issuecomment-552769421
 
 
   The schema builder issue has been nagging at me. I observed above that the 
current way of handling nested types is getting too complex, with all the 
special cases and "resumeFoo" methods.
   
   The current design was originally designed to build a `BatchSchema` with 
`MaterializedField`s,. Those classes require a top-down construction. (That is 
also why the final `build()` method returns a `BatchSchema` and why we must 
call `buildSchema()` to get a `TupleMetadata`.)
   
   It turns out that the `SchemaMetadata` classes *do not* have the same 
constraints. They can actually be built bottom-up. It never occurred to me to 
clean up the old-style of building when we repurposed `SchemaBuilder` to build 
metadata objects.
   
   So, since we can change the structure, we could borrow a technique from, I 
think, certain JSON builders. Rather than having the schema builder create the 
nested type builder, ask the user to do so explicitly:
   
   ```
   TupleMetadata schema = new SchemaBuilder()
 .addMap(new MapBuilder("myMap")
   .add("name", MinorType.VARCHAR)
   .addMap(new MapBuilder("nested")
 .add("id", MinorType.INT)
 .build())
   .build())
 .add("last", MinorType.INT)
 .build();
   ```
   
   Adopting this model has additional benefits. It is currently a bit awkward 
to create a bare `ColumnMetadata`. We could create helper methods to create 
each kind of column schema. Something like:
   
   ```
   foo = SchemaBuilder.column("foo", MinorType.INT, DataMode.REQUIRED);
   bar = SchemaBuilder.dict("bar", MinorType.VARCHAR)
 .value(SchemaBuilder.map(...))
 .build();
   ```
   
   Lots of things we can play with.
   
   Given all of this, my advice for the `DICT` type is to figure out something 
that works with the structure we already have. Then, as a separate PR, we can 
clean up the `SchemaBuilder` to make it easier to use for the many complex 
types we now support. That way, you don't have to solve the schema builder 
problem in this PR.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972136#comment-16972136
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on issue #1870: DRILL-7359: Add support for DICT type in 
RowSet Framework
URL: https://github.com/apache/drill/pull/1870#issuecomment-552769421
 
 
   The schema builder issue has been nagging at me. I observed above that the 
current way of handling nested types is getting too complex, with all the 
special cases and "resumeFoo" methods.
   
   The current design was originally designed to build a `BatchSchema` with 
`MaterializedField`s,. Those classes require a top-down construction. (That is 
also why the final `build()` method returns a `BatchSchema` and why we must 
call `buildSchema()` to get a `TupleMetadata`.)
   
   It turns out that the `SchemaMetadata` classes *do not* have the same 
constraints. They can actually be built bottom-up. It never occurred to me to 
clean up the old-style of building when we repurposed `SchemaBuilder` to build 
metadata objects.
   
   So, since we can change the structure, we could borrow a technique from, I 
think, certain JSON builders. Rather than having the schema builder create the 
nested type builder, ask the user to do so explicitly:
   
   ```
   TupleMetadata schema = new SchemaBuilder()
 .addMap(new MapBuilder("myMap")
   .add("name", MinorType.VARCHAR)
   .addMap(new MapBuilder("nested")
 .add("id", MinorType.INT)
 .build())
   .build())
 .add("last", MinorType.INT)
 .build();
   ```
   
   Adopting this model as additional benefits. It is currently a bit awkward to 
create a bare `ColumnMetadata`. We could create helper methods to create each 
kind of column schema. Something like:
   
   ```
   foo = SchemaBuilder.column("foo", MinorType.INT, DataMode.REQUIRED);
   bar = SchemaBuilder.dict("bar", MinorType.VARCHAR)
 .value(SchemaBuilder.map(...))
 .build();
   ```
   
   Lots of things we can play with.
   
   Given all of this, my advice for the `DICT` type is to figure out something 
that works with the structure we already have. Then, as a separate PR, we can 
clean up the `SchemaBuilder` to make it easier to use for the many complex 
types we now support. That way, you don't have to solve the schema builder 
problem in this PR.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966333#comment-16966333
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on issue #1870: DRILL-7359: Add support for DICT type in 
RowSet Framework
URL: https://github.com/apache/drill/pull/1870#issuecomment-549198995
 
 
   @KazydubB, thanks much for the changes. This is a complex area. It looks 
like you are getting a good understanding.
   
   My focus on this review is to help you define the simplest implementation 
for the schema builder. Maybe we can get that part finished up.
   
   Then, I made a suggestion about how to structure the Dict writer. The new 
design here is much improved. Still, I think we can improve it a bit more.
   
   Everything said for the writer applies to the reader, so I didn't repeat the 
comments there.
   
   Once we get the API correct, then I'll dive into the implementation details. 
That should be a quick review if the we agree on the API, and if you have 
complete unit tests.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966328#comment-16966328
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r341884369
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/MetadataUtils.java
 ##
 @@ -205,4 +205,10 @@ private static ColumnMetadata newDecimal(String name, 
MinorType type, DataMode m
 return new PrimitiveColumnMetadata(field);
   }
 
+  private static boolean isScalar(MinorType type) {
+return type != MinorType.MAP
+&& type != MinorType.UNION
+&& type != MinorType.LIST
+&& type != MinorType.DICT;
+  }
 
 Review comment:
   Maybe change this to `isComplex`, and define `isScalar()` as `! isComplex()`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966330#comment-16966330
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r341884303
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java
 ##
 @@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.complex.DictVector;
+
+public class DictBuilder implements SchemaContainer {
+
+  private final SchemaContainer parent;
+  private final TupleBuilder tupleBuilder = new TupleBuilder();
+  private final String memberName;
+  private final TypeProtos.DataMode mode;
+
+  public DictBuilder(String memberName, TypeProtos.DataMode mode) {
+this(null, memberName, mode);
+  }
+
+  public DictBuilder(SchemaContainer parent, String memberName, 
TypeProtos.DataMode mode) {
+this.parent = parent;
+this.memberName = memberName;
+this.mode = mode;
+  }
+
+  @Override
+  public void addColumn(ColumnMetadata column) {
 
 Review comment:
   As a side comment, I will take the blame for the mess we are creating in the 
schema builder. When I first wrote it, I was mostly concerned with the 
top-level row and with maps. Then came repeated lists, unions and now Dict.
   
   Is there a cleaner way to handle the nesting? Something clever with 
parameterized types to avoid all the 'resumeFoo' methods? Feel free to be 
creative here.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966329#comment-16966329
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r341884201
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java
 ##
 @@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.complex.DictVector;
+
+public class DictBuilder implements SchemaContainer {
 
 Review comment:
   Follow-up question. For the value, is a Dict defined as, say, `DICT` so that all values are of the same type? If so, that is the easy case.
   
   Or, can I define a Dict as `DICT>`, 
meaning that key values can be any one of the three types? Note that, in Drill, 
we use the (barely-working) Union vector to handle this situation.
   
   Or, is the Dict trying, itself, to be a Union for the value type?
   
   Just read the (very nice) comments in `DictVector`, so it looks like, if I 
want my Dict to hold three types, I make it a `DICT`. Correct?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966324#comment-16966324
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r341882388
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/DictWriter.java
 ##
 @@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.accessor;
+
+/**
+ * Dict writer is represented as an array of Maps (Array with Tuple entry).
+ */
+public interface DictWriter extends ArrayWriter {
+  ValueType keyType();
+  ObjectType valueType();
+  ScalarWriter keyWriter();
+  ObjectWriter valueWriter();
 
 Review comment:
   The above is getting close, but it does introduce an ambiguity. The array 
writer already has the following members:
   
   ```
ObjectType entryType();
ObjectWriter entry();
   ```
   
   Along with some other helpers. In the `DictWriter` case, it is not clear 
that I could use this object a `ArrayWriter`. Which of above does the entry 
correspond to?
   
   Might it be better to have `DictWriter` inherit directly from `ColumnWriter` 
so you can define clean semantics?
   
   If it is worth having common functionality (only `save()` would be common), 
we could define a `SequenceWriter` which is the base of `ArrayWriter` and 
`DictWriter`.
   
   Or, you could change `ArrayWriter` to add a new element writer; a 
`DictEntry` which would provide the methods above. That is:
   
   ```
 ArrayWriter aw = reader.get('myDict').array();
 DictWriter dw = aw.dict();
 for (int i = 0; i < kvCount; i++) {
   dw.keyWriter().setString("key + " i);
   dw.valueWriter().union().setInt(i);
aw.save();
 }
   ```
   
   That is, a Dict is an array. The entries in the array are key/value pairs. 
The key is a scalar, the value is of some type (above I suggested a Union -- 
with all the problems that Unions bring.)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966326#comment-16966326
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r341884645
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/MapBuilder.java
 ##
 @@ -144,6 +144,14 @@ public RepeatedListBuilder addRepeatedList(String name) {
 return tupleBuilder.addRepeatedList(this, name);
   }
 
+  public DictBuilder addDict(String name) {
+return tupleBuilder.addDict(this, name);
+  }
 
 Review comment:
   Maybe `addDict(String name, MinorType keyType)`. Works because the key must 
be a scalar and is required. Then the `DictBuilder` simply builds the value 
type:
   
   ```
   scalarValue(MinorType type);
   MapBuilder mapValue();
   MapBuilder mapArrayValue();
   UnionBuilder unionValue();
   Foo resumeFoo();
   ```
   Can the value be nullable? (The comment in `DictVector` doesn't say yes or 
no.) If nullable is allowed, then only the scalar can be nullable, so maybe add:
   
   ```
   nullableValue(MinorType type)
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966325#comment-16966325
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r341881937
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/DictWriter.java
 ##
 @@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.accessor;
+
+/**
+ * Dict writer is represented as an array of Maps (Array with Tuple entry).
 
 Review comment:
   Not sure I follow. A Dict is an array of key/value pairs, correct? It is a 
Map turned sideways: the key/value pairs are rows in an array, not columns in a 
map (as in the Map) type. So, a Dict is not "an array of Maps" is it?
   
   If we want a Dict to allow heterogeneous value members, then we'd have an 
array of key/Union pairs, not key/Map pairs. Or, am I missing something?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-11-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966327#comment-16966327
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r341883957
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/NullReader.java
 ##
 @@ -0,0 +1,369 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.accessor.reader;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.record.metadata.VariantMetadata;
+import org.apache.drill.exec.vector.accessor.ArrayReader;
+import org.apache.drill.exec.vector.accessor.DictReader;
+import org.apache.drill.exec.vector.accessor.ObjectReader;
+import org.apache.drill.exec.vector.accessor.ObjectType;
+import org.apache.drill.exec.vector.accessor.ScalarReader;
+import org.apache.drill.exec.vector.accessor.TupleReader;
+import org.apache.drill.exec.vector.accessor.ValueType;
+import org.apache.drill.exec.vector.accessor.VariantReader;
+import org.joda.time.Instant;
+import org.joda.time.LocalDate;
+import org.joda.time.LocalTime;
+import org.joda.time.Period;
+
+import java.math.BigDecimal;
+
+/**
+ * Dummy reader which returns {@code null} for scalar types and itself for 
complex types.
+ */
+public class NullReader implements ScalarReader, ArrayReader, TupleReader, 
VariantReader, ObjectReader, DictReader {
 
 Review comment:
   Good idea. But, is this too general? Does it follow the actual metadata 
structure?
   
   The writer has a `NullScalarWriter` and a few other components that try to 
allow only structurally-legal operations. I suppose, however, that is less 
important when reading: being lenient won't break anything.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-10-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964061#comment-16964061
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on issue #1870: DRILL-7359: Add support for DICT type in 
RowSet Framework
URL: https://github.com/apache/drill/pull/1870#issuecomment-548392780
 
 
   @paul-rogers thank you for review. I've made some changes, please take a 
look.
   
   There is a (interface) `DictWriter` similar to what you proposed with the 
exception it `extends` `ArrayWriter`, i.e. when writing a `dict` is assumed to 
be an array of tuples (with `key` and `value` fields) other than a pair of 
arrays. (`DictWriter`'s implementation, `ObjectDictWriter`, actually `extends 
ObjectArrayWriter`).
   
   For reading, there is a (interface) `DictReader` (`extends ColumnReader`) 
introduces 5 new methods:
   ```
   ObjectReader getValueReader(Object key);
   Object get(Object key);
   int size();
   ValueType keyColumnType();
   ObjectType valueColumnType();
   ```
   
   For key/value lookup `getValueReader(Object key)` is used. When a `key` is 
found, `value`'s reader is returned with its position set correspondingly to 
the `key`. In a case there is no such `key` - a special type of `ObjectReader` 
- `org.apache.drill.exec.vector.accessor.reader.NullReader` will be returned.
   
   I am not sure that array access is needed - can't see any use case yet.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-10-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964014#comment-16964014
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r341136660
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java
 ##
 @@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.complex.DictVector;
+
+public class DictBuilder implements SchemaContainer {
+
+  private final SchemaContainer parent;
+  private final TupleBuilder tupleBuilder = new TupleBuilder();
+  private final String memberName;
+  private final TypeProtos.DataMode mode;
+
+  public DictBuilder(String memberName, TypeProtos.DataMode mode) {
+this(null, memberName, mode);
+  }
+
+  public DictBuilder(SchemaContainer parent, String memberName, 
TypeProtos.DataMode mode) {
+this.parent = parent;
+this.memberName = memberName;
+this.mode = mode;
+  }
+
+  @Override
+  public void addColumn(ColumnMetadata column) {
+assert DictVector.fieldNames.contains(column.name());
+tupleBuilder.addColumn(column);
+  }
+
+  public DictBuilder addKey(TypeProtos.MajorType type) {
+return add(MaterializedField.create(DictVector.FIELD_KEY_NAME, type));
+  }
+
+  public DictBuilder addValue(TypeProtos.MajorType type) {
+return add(MaterializedField.create(DictVector.FIELD_VALUE_NAME, type));
+  }
+
+  public DictBuilder add(MaterializedField col) {
+assert DictVector.fieldNames.contains(col.getName());
+tupleBuilder.add(col);
+return this;
+  }
+
+  public DictBuilder addKey(TypeProtos.MinorType type, TypeProtos.DataMode 
mode) {
+tupleBuilder.add(DictVector.FIELD_KEY_NAME, type, mode);
+return this;
+  }
+
+  public DictBuilder addValue(TypeProtos.MinorType type, TypeProtos.DataMode 
mode) {
+tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type, mode);
+return this;
+  }
+
+  public DictBuilder addKey(TypeProtos.MinorType type) {
+tupleBuilder.add(DictVector.FIELD_KEY_NAME, type);
+return this;
+  }
+
+  public DictBuilder addValue(TypeProtos.MinorType type) {
+tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type);
+return this;
+  }
+
+  public DictBuilder addKey(TypeProtos.MinorType type, int width) {
+tupleBuilder.add(DictVector.FIELD_KEY_NAME, type, width);
+return this;
+  }
+
+  public DictBuilder addValue(TypeProtos.MinorType type, int width) {
+tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type, width);
+return this;
+  }
+
+  public DictBuilder addKey(TypeProtos.MinorType type, int precision, int 
scale) {
+return addDecimal(DictVector.FIELD_KEY_NAME, type, 
TypeProtos.DataMode.REQUIRED, precision, scale);
+  }
+
+  public DictBuilder addValue(TypeProtos.MinorType type, int precision, int 
scale) {
+return addDecimal(DictVector.FIELD_VALUE_NAME, type, 
TypeProtos.DataMode.REQUIRED, precision, scale);
+  }
+
+  public DictBuilder addNullable(String name, TypeProtos.MinorType type) {
+tupleBuilder.addNullable(name,  type);
+return this;
+  }
+
+  public DictBuilder addNullable(String name, TypeProtos.MinorType type, int 
width) {
+tupleBuilder.addNullable(name, type, width);
+return this;
+  }
+
+  public DictBuilder addNullable(String name, TypeProtos.MinorType type, int 
precision, int scale) {
+return addDecimal(name, type, TypeProtos.DataMode.OPTIONAL, precision, 
scale);
+  }
+
+  public DictBuilder addArrayValue(TypeProtos.MinorType type) {
+tupleBuilder.addArray(DictVector.FIELD_VALUE_NAME, type);
+return this;
+  }
+
+  public DictBuilder addArrayValue(TypeProtos.MinorType type, int dims) {
+tupleBuilder.addArray(DictVector.FIELD_VALUE_NAME, type, dims);
+return this;
+  }
+
+  public DictBuilder addArrayValue(TypeProtos.MinorType type, int precision, 
int 

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-10-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964004#comment-16964004
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r341133570
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java
 ##
 @@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.complex.DictVector;
+
+public class DictBuilder implements SchemaContainer {
+
+  private final SchemaContainer parent;
+  private final TupleBuilder tupleBuilder = new TupleBuilder();
+  private final String memberName;
+  private final TypeProtos.DataMode mode;
+
+  public DictBuilder(String memberName, TypeProtos.DataMode mode) {
+this(null, memberName, mode);
+  }
+
+  public DictBuilder(SchemaContainer parent, String memberName, 
TypeProtos.DataMode mode) {
+this.parent = parent;
+this.memberName = memberName;
+this.mode = mode;
+  }
+
+  @Override
+  public void addColumn(ColumnMetadata column) {
+assert DictVector.fieldNames.contains(column.name());
+tupleBuilder.addColumn(column);
+  }
+
+  public DictBuilder addKey(TypeProtos.MajorType type) {
+return add(MaterializedField.create(DictVector.FIELD_KEY_NAME, type));
+  }
+
+  public DictBuilder addValue(TypeProtos.MajorType type) {
+return add(MaterializedField.create(DictVector.FIELD_VALUE_NAME, type));
+  }
+
+  public DictBuilder add(MaterializedField col) {
+assert DictVector.fieldNames.contains(col.getName());
+tupleBuilder.add(col);
+return this;
+  }
+
+  public DictBuilder addKey(TypeProtos.MinorType type, TypeProtos.DataMode 
mode) {
+tupleBuilder.add(DictVector.FIELD_KEY_NAME, type, mode);
+return this;
+  }
+
+  public DictBuilder addValue(TypeProtos.MinorType type, TypeProtos.DataMode 
mode) {
+tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type, mode);
+return this;
+  }
+
+  public DictBuilder addKey(TypeProtos.MinorType type) {
+tupleBuilder.add(DictVector.FIELD_KEY_NAME, type);
+return this;
+  }
+
+  public DictBuilder addValue(TypeProtos.MinorType type) {
+tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type);
+return this;
+  }
+
+  public DictBuilder addKey(TypeProtos.MinorType type, int width) {
+tupleBuilder.add(DictVector.FIELD_KEY_NAME, type, width);
+return this;
+  }
+
+  public DictBuilder addValue(TypeProtos.MinorType type, int width) {
+tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type, width);
+return this;
+  }
+
+  public DictBuilder addKey(TypeProtos.MinorType type, int precision, int 
scale) {
 
 Review comment:
   Yes, `key` can be a `DECIMAL`. Added `key(MajorType)` method to allow 
creation for the case (the methods validates allowed `key` types).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-10-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963999#comment-16963999
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r341132825
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java
 ##
 @@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.complex.DictVector;
+
+public class DictBuilder implements SchemaContainer {
+
+  private final SchemaContainer parent;
+  private final TupleBuilder tupleBuilder = new TupleBuilder();
+  private final String memberName;
+  private final TypeProtos.DataMode mode;
+
+  public DictBuilder(String memberName, TypeProtos.DataMode mode) {
+this(null, memberName, mode);
+  }
+
+  public DictBuilder(SchemaContainer parent, String memberName, 
TypeProtos.DataMode mode) {
+this.parent = parent;
+this.memberName = memberName;
+this.mode = mode;
+  }
+
+  @Override
+  public void addColumn(ColumnMetadata column) {
+assert DictVector.fieldNames.contains(column.name());
+tupleBuilder.addColumn(column);
+  }
+
+  public DictBuilder addKey(TypeProtos.MajorType type) {
 
 Review comment:
   Changed the name to the `key()` and added validations, thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-10-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963996#comment-16963996
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r341132301
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java
 ##
 @@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.complex.DictVector;
+
+public class DictBuilder implements SchemaContainer {
 
 Review comment:
   The `key` can be of any `REQUIRED` primitive type and `value` can be of any 
primitive or complex types. Dict's key is not limited to be a string.
   Yes, your assumption is correct.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-10-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963997#comment-16963997
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r341132415
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java
 ##
 @@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.complex.DictVector;
+
+public class DictBuilder implements SchemaContainer {
+
+  private final SchemaContainer parent;
+  private final TupleBuilder tupleBuilder = new TupleBuilder();
+  private final String memberName;
+  private final TypeProtos.DataMode mode;
+
+  public DictBuilder(String memberName, TypeProtos.DataMode mode) {
+this(null, memberName, mode);
+  }
+
+  public DictBuilder(SchemaContainer parent, String memberName, 
TypeProtos.DataMode mode) {
+this.parent = parent;
+this.memberName = memberName;
+this.mode = mode;
+  }
+
+  @Override
+  public void addColumn(ColumnMetadata column) {
 
 Review comment:
   Yes, naming of the method can be confusing in this case (the method comes 
from `SchemaContainer`). Added validation to make sure only `key` and `value` 
fields may be added if they were not previously added.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-10-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950411#comment-16950411
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on issue #1870: DRILL-7359: Add support for DICT type in 
RowSet Framework
URL: https://github.com/apache/drill/pull/1870#issuecomment-541442192
 
 
   On, read, we can access the `DICT` as a dictionary and as an array. This 
means we need to be a bit clever about designing the `DictReader()` interface.
   
   We can get our `DictReader` similar to the write case:
   
   ```
  DictReader dictReader = rowReader.dict("myDict");
   ```
   
   Now, we have two ways to proceed. Either we want key/value lookups, or array 
access.
   
   For array access, we want an iterator, much like the `ArrayReader`, but that 
gives us access to both the key and value. Something like:
   
   ```
   interface DictReader ...
  ScalarReader key();
  ObjectReader value();
  boolean next();
   ```
   
   For key/value, we have a challenge. We need a way to specify the key that is 
type-specific. Then, we need to access the key in a type-specific way. This is 
the first time we've had such an issue.
   
   Maybe we can build this on the array iterator. In addition to the `next()` 
method, we can provide a `findFoo()` method, with one for each supported key 
type.
   
   ```
  int findString(String key);
  int findInt(int key);
  ...
   ```
   
   The semantics might be that the array index is set to the (first) index of 
the requested value, and that index is returned.
   
   If not found, the return value is -1 and the array index is one past the end 
of the array.
   
   ```
  DictReader dictReader = ...
  if (dictReader.findString("barney")) {
value = dictReader.value().scalar().getDouble();
  }
   ```
   
   There are probably other variations. Think how you expect to use the reader 
(for testing, yes, but what other operations)?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-10-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950407#comment-16950407
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on issue #1870: DRILL-7359: Add support for DICT type in 
RowSet Framework
URL: https://github.com/apache/drill/pull/1870#issuecomment-541440586
 
 
   Let's continue to assume that `DICT` is, essentially, `DICT` and that we can think of the `DICT`, when writing, as a pair of 
arrays: one for keys, one for values. (Because vectors are write-once, we have 
to add (key, value) pairs one by one. If so, then we need a new form of writer, 
a `DictWriter` that has semantics such as:
   
   ```
 ObjectWriter key();
 ObjectWriter value();
 void save();
   ```
   
   The `key()` gives us an object writer that lets us access the key writer. If 
we restrict keys to be scalars, then we can just do:
   
   ```
 ScalarWriter key();
   ```
   
   Values can, I imagine, be of any type. So, the `ObjectWriter` lets us work 
with them generically. Suppose we had a `DICT`, we could do:
   
   ```
  DictWriter dictWriter = rowWriter.dict("myDict");
  dictWriter.key().setString("fred");
  dictWriter.value().scalar().setDouble(123.45)
  dictWriter.save();
  dictWriter.key().setString("barney");
  dictWriter.value().scalar().setDouble(98.76)
  dictWriter.save();
   ```
   
   This means that, in the `ObjectWriter()`, we need to add a new method, 
`dict()`, which will return a `DictWriter`, and we need to define the 
`DictWriter` interface.
   
   There is quite a bit of commentary in the column accessor package that 
(tries) to explain the structure behind these writers. Perhaps that might help 
explain the ideas here.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-10-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950397#comment-16950397
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r334287580
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java
 ##
 @@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.complex.DictVector;
+
+public class DictBuilder implements SchemaContainer {
+
+  private final SchemaContainer parent;
+  private final TupleBuilder tupleBuilder = new TupleBuilder();
+  private final String memberName;
+  private final TypeProtos.DataMode mode;
+
+  public DictBuilder(String memberName, TypeProtos.DataMode mode) {
+this(null, memberName, mode);
+  }
+
+  public DictBuilder(SchemaContainer parent, String memberName, 
TypeProtos.DataMode mode) {
+this.parent = parent;
+this.memberName = memberName;
+this.mode = mode;
+  }
+
+  @Override
+  public void addColumn(ColumnMetadata column) {
+assert DictVector.fieldNames.contains(column.name());
+tupleBuilder.addColumn(column);
+  }
+
+  public DictBuilder addKey(TypeProtos.MajorType type) {
+return add(MaterializedField.create(DictVector.FIELD_KEY_NAME, type));
+  }
+
+  public DictBuilder addValue(TypeProtos.MajorType type) {
+return add(MaterializedField.create(DictVector.FIELD_VALUE_NAME, type));
+  }
+
+  public DictBuilder add(MaterializedField col) {
 
 Review comment:
   Probably no longer need this form. It is in the code to ease conversion from 
old code, but does not actually seem to have been used.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-10-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950405#comment-16950405
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r334287763
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java
 ##
 @@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.complex.DictVector;
+
+public class DictBuilder implements SchemaContainer {
+
+  private final SchemaContainer parent;
+  private final TupleBuilder tupleBuilder = new TupleBuilder();
+  private final String memberName;
+  private final TypeProtos.DataMode mode;
+
+  public DictBuilder(String memberName, TypeProtos.DataMode mode) {
+this(null, memberName, mode);
+  }
+
+  public DictBuilder(SchemaContainer parent, String memberName, 
TypeProtos.DataMode mode) {
+this.parent = parent;
+this.memberName = memberName;
+this.mode = mode;
+  }
+
+  @Override
+  public void addColumn(ColumnMetadata column) {
+assert DictVector.fieldNames.contains(column.name());
+tupleBuilder.addColumn(column);
+  }
+
+  public DictBuilder addKey(TypeProtos.MajorType type) {
+return add(MaterializedField.create(DictVector.FIELD_KEY_NAME, type));
+  }
+
+  public DictBuilder addValue(TypeProtos.MajorType type) {
+return add(MaterializedField.create(DictVector.FIELD_VALUE_NAME, type));
+  }
+
+  public DictBuilder add(MaterializedField col) {
+assert DictVector.fieldNames.contains(col.getName());
+tupleBuilder.add(col);
+return this;
+  }
+
+  public DictBuilder addKey(TypeProtos.MinorType type, TypeProtos.DataMode 
mode) {
+tupleBuilder.add(DictVector.FIELD_KEY_NAME, type, mode);
+return this;
+  }
+
+  public DictBuilder addValue(TypeProtos.MinorType type, TypeProtos.DataMode 
mode) {
+tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type, mode);
+return this;
+  }
+
+  public DictBuilder addKey(TypeProtos.MinorType type) {
+tupleBuilder.add(DictVector.FIELD_KEY_NAME, type);
+return this;
+  }
+
+  public DictBuilder addValue(TypeProtos.MinorType type) {
+tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type);
+return this;
+  }
+
+  public DictBuilder addKey(TypeProtos.MinorType type, int width) {
+tupleBuilder.add(DictVector.FIELD_KEY_NAME, type, width);
+return this;
+  }
+
+  public DictBuilder addValue(TypeProtos.MinorType type, int width) {
+tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type, width);
+return this;
+  }
+
+  public DictBuilder addKey(TypeProtos.MinorType type, int precision, int 
scale) {
+return addDecimal(DictVector.FIELD_KEY_NAME, type, 
TypeProtos.DataMode.REQUIRED, precision, scale);
+  }
+
+  public DictBuilder addValue(TypeProtos.MinorType type, int precision, int 
scale) {
+return addDecimal(DictVector.FIELD_VALUE_NAME, type, 
TypeProtos.DataMode.REQUIRED, precision, scale);
+  }
+
+  public DictBuilder addNullable(String name, TypeProtos.MinorType type) {
 
 Review comment:
   Yeah, all these forms make sense when only working with the top-level row, 
but become tedious when repeated or all these specialized builders...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: 

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-10-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950400#comment-16950400
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r334287912
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java
 ##
 @@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.complex.DictVector;
+
+public class DictBuilder implements SchemaContainer {
+
+  private final SchemaContainer parent;
+  private final TupleBuilder tupleBuilder = new TupleBuilder();
+  private final String memberName;
+  private final TypeProtos.DataMode mode;
+
+  public DictBuilder(String memberName, TypeProtos.DataMode mode) {
+this(null, memberName, mode);
+  }
+
+  public DictBuilder(SchemaContainer parent, String memberName, 
TypeProtos.DataMode mode) {
+this.parent = parent;
+this.memberName = memberName;
+this.mode = mode;
+  }
+
+  @Override
+  public void addColumn(ColumnMetadata column) {
+assert DictVector.fieldNames.contains(column.name());
+tupleBuilder.addColumn(column);
+  }
+
+  public DictBuilder addKey(TypeProtos.MajorType type) {
+return add(MaterializedField.create(DictVector.FIELD_KEY_NAME, type));
+  }
+
+  public DictBuilder addValue(TypeProtos.MajorType type) {
+return add(MaterializedField.create(DictVector.FIELD_VALUE_NAME, type));
+  }
+
+  public DictBuilder add(MaterializedField col) {
+assert DictVector.fieldNames.contains(col.getName());
+tupleBuilder.add(col);
+return this;
+  }
+
+  public DictBuilder addKey(TypeProtos.MinorType type, TypeProtos.DataMode 
mode) {
+tupleBuilder.add(DictVector.FIELD_KEY_NAME, type, mode);
+return this;
+  }
+
+  public DictBuilder addValue(TypeProtos.MinorType type, TypeProtos.DataMode 
mode) {
+tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type, mode);
+return this;
+  }
+
+  public DictBuilder addKey(TypeProtos.MinorType type) {
+tupleBuilder.add(DictVector.FIELD_KEY_NAME, type);
+return this;
+  }
+
+  public DictBuilder addValue(TypeProtos.MinorType type) {
+tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type);
+return this;
+  }
+
+  public DictBuilder addKey(TypeProtos.MinorType type, int width) {
+tupleBuilder.add(DictVector.FIELD_KEY_NAME, type, width);
+return this;
+  }
+
+  public DictBuilder addValue(TypeProtos.MinorType type, int width) {
+tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type, width);
+return this;
+  }
+
+  public DictBuilder addKey(TypeProtos.MinorType type, int precision, int 
scale) {
+return addDecimal(DictVector.FIELD_KEY_NAME, type, 
TypeProtos.DataMode.REQUIRED, precision, scale);
+  }
+
+  public DictBuilder addValue(TypeProtos.MinorType type, int precision, int 
scale) {
+return addDecimal(DictVector.FIELD_VALUE_NAME, type, 
TypeProtos.DataMode.REQUIRED, precision, scale);
+  }
+
+  public DictBuilder addNullable(String name, TypeProtos.MinorType type) {
+tupleBuilder.addNullable(name,  type);
+return this;
+  }
+
+  public DictBuilder addNullable(String name, TypeProtos.MinorType type, int 
width) {
+tupleBuilder.addNullable(name, type, width);
+return this;
+  }
+
+  public DictBuilder addNullable(String name, TypeProtos.MinorType type, int 
precision, int scale) {
+return addDecimal(name, type, TypeProtos.DataMode.OPTIONAL, precision, 
scale);
+  }
+
+  public DictBuilder addArrayValue(TypeProtos.MinorType type) {
+tupleBuilder.addArray(DictVector.FIELD_VALUE_NAME, type);
+return this;
+  }
+
+  public DictBuilder addArrayValue(TypeProtos.MinorType type, int dims) {
+tupleBuilder.addArray(DictVector.FIELD_VALUE_NAME, type, dims);
+return this;
+  }
+
+  public DictBuilder addArrayValue(TypeProtos.MinorType type, int precision, 

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-10-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950404#comment-16950404
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r334287471
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java
 ##
 @@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.complex.DictVector;
+
+public class DictBuilder implements SchemaContainer {
+
+  private final SchemaContainer parent;
+  private final TupleBuilder tupleBuilder = new TupleBuilder();
+  private final String memberName;
+  private final TypeProtos.DataMode mode;
+
+  public DictBuilder(String memberName, TypeProtos.DataMode mode) {
+this(null, memberName, mode);
+  }
+
+  public DictBuilder(SchemaContainer parent, String memberName, 
TypeProtos.DataMode mode) {
+this.parent = parent;
+this.memberName = memberName;
+this.mode = mode;
+  }
+
+  @Override
+  public void addColumn(ColumnMetadata column) {
 
 Review comment:
   If we agree DICT is DICT then we might want the 
building of a DICT to be something like:
   
   ```
   TupleMetadata = new SchemaBuilder()
 .addDict("myDict")
   .key(MinorType.VARCHAR)
   .valueArray(MinorType.DOUBLE)
   .resumeSchema()
 ...
   ```
   
   That is, we need to add one key. Can keys only be scalars? Only 
non-nullable? If so, then the single `key()` method is all that is needed. This 
returns the same builder.
   
   Can the value be of any type? Then we need the full set of type builders: 
scalar, array, map and so on.
   
   The current form does not make this clear: what does it mean to add a 
column? This isn't a map...
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-10-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950403#comment-16950403
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r334287737
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java
 ##
 @@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.complex.DictVector;
+
+public class DictBuilder implements SchemaContainer {
+
+  private final SchemaContainer parent;
+  private final TupleBuilder tupleBuilder = new TupleBuilder();
+  private final String memberName;
+  private final TypeProtos.DataMode mode;
+
+  public DictBuilder(String memberName, TypeProtos.DataMode mode) {
+this(null, memberName, mode);
+  }
+
+  public DictBuilder(SchemaContainer parent, String memberName, 
TypeProtos.DataMode mode) {
+this.parent = parent;
+this.memberName = memberName;
+this.mode = mode;
+  }
+
+  @Override
+  public void addColumn(ColumnMetadata column) {
+assert DictVector.fieldNames.contains(column.name());
+tupleBuilder.addColumn(column);
+  }
+
+  public DictBuilder addKey(TypeProtos.MajorType type) {
+return add(MaterializedField.create(DictVector.FIELD_KEY_NAME, type));
+  }
+
+  public DictBuilder addValue(TypeProtos.MajorType type) {
+return add(MaterializedField.create(DictVector.FIELD_VALUE_NAME, type));
+  }
+
+  public DictBuilder add(MaterializedField col) {
+assert DictVector.fieldNames.contains(col.getName());
+tupleBuilder.add(col);
+return this;
+  }
+
+  public DictBuilder addKey(TypeProtos.MinorType type, TypeProtos.DataMode 
mode) {
+tupleBuilder.add(DictVector.FIELD_KEY_NAME, type, mode);
+return this;
+  }
+
+  public DictBuilder addValue(TypeProtos.MinorType type, TypeProtos.DataMode 
mode) {
+tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type, mode);
+return this;
+  }
+
+  public DictBuilder addKey(TypeProtos.MinorType type) {
+tupleBuilder.add(DictVector.FIELD_KEY_NAME, type);
+return this;
+  }
+
+  public DictBuilder addValue(TypeProtos.MinorType type) {
+tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type);
+return this;
+  }
+
+  public DictBuilder addKey(TypeProtos.MinorType type, int width) {
+tupleBuilder.add(DictVector.FIELD_KEY_NAME, type, width);
+return this;
+  }
+
+  public DictBuilder addValue(TypeProtos.MinorType type, int width) {
+tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type, width);
+return this;
+  }
+
+  public DictBuilder addKey(TypeProtos.MinorType type, int precision, int 
scale) {
 
 Review comment:
   This form is used for `DECIMAL`. Does it make sense to allow a `DECIMAL` for 
a key? Maybe here, and in the width case above, just allow a `MajorType`.
   
   Maybe, in general, the `SchemaBuilder` should have relied on the `Types` 
methods to build up such types rather than having methods for each...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-10-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950396#comment-16950396
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on issue #1870: DRILL-7359: Add support for DICT type in 
RowSet Framework
URL: https://github.com/apache/drill/pull/1870#issuecomment-541439848
 
 
   Going to review this in multiple stages. First, let's pin down the desired 
semantics of the DICT type. See code comments.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-10-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950402#comment-16950402
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r334287531
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java
 ##
 @@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.complex.DictVector;
+
+public class DictBuilder implements SchemaContainer {
+
+  private final SchemaContainer parent;
+  private final TupleBuilder tupleBuilder = new TupleBuilder();
+  private final String memberName;
+  private final TypeProtos.DataMode mode;
+
+  public DictBuilder(String memberName, TypeProtos.DataMode mode) {
+this(null, memberName, mode);
+  }
+
+  public DictBuilder(SchemaContainer parent, String memberName, 
TypeProtos.DataMode mode) {
+this.parent = parent;
+this.memberName = memberName;
+this.mode = mode;
+  }
+
+  @Override
+  public void addColumn(ColumnMetadata column) {
+assert DictVector.fieldNames.contains(column.name());
+tupleBuilder.addColumn(column);
+  }
+
+  public DictBuilder addKey(TypeProtos.MajorType type) {
 
 Review comment:
   Good, but we allow only one key, do we not? If so, then rather than 
`addKey()` (which suggests I can add multiple), a simple `key()` would be 
simpler. Also, we should check if the key type was already set and not allow 
setting it a second time.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-10-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950399#comment-16950399
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r334287620
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java
 ##
 @@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.complex.DictVector;
+
+public class DictBuilder implements SchemaContainer {
+
+  private final SchemaContainer parent;
+  private final TupleBuilder tupleBuilder = new TupleBuilder();
+  private final String memberName;
+  private final TypeProtos.DataMode mode;
+
+  public DictBuilder(String memberName, TypeProtos.DataMode mode) {
+this(null, memberName, mode);
+  }
+
+  public DictBuilder(SchemaContainer parent, String memberName, 
TypeProtos.DataMode mode) {
+this.parent = parent;
+this.memberName = memberName;
+this.mode = mode;
+  }
+
+  @Override
+  public void addColumn(ColumnMetadata column) {
+assert DictVector.fieldNames.contains(column.name());
+tupleBuilder.addColumn(column);
+  }
+
+  public DictBuilder addKey(TypeProtos.MajorType type) {
+return add(MaterializedField.create(DictVector.FIELD_KEY_NAME, type));
+  }
+
+  public DictBuilder addValue(TypeProtos.MajorType type) {
+return add(MaterializedField.create(DictVector.FIELD_VALUE_NAME, type));
+  }
+
+  public DictBuilder add(MaterializedField col) {
+assert DictVector.fieldNames.contains(col.getName());
+tupleBuilder.add(col);
+return this;
+  }
+
+  public DictBuilder addKey(TypeProtos.MinorType type, TypeProtos.DataMode 
mode) {
 
 Review comment:
   Does it make sense for a key to be repeated or nullable? Isn't non-nullable 
the only valid key mode?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-10-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950401#comment-16950401
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r334287799
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java
 ##
 @@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.complex.DictVector;
+
+public class DictBuilder implements SchemaContainer {
+
+  private final SchemaContainer parent;
+  private final TupleBuilder tupleBuilder = new TupleBuilder();
+  private final String memberName;
+  private final TypeProtos.DataMode mode;
+
+  public DictBuilder(String memberName, TypeProtos.DataMode mode) {
+this(null, memberName, mode);
+  }
+
+  public DictBuilder(SchemaContainer parent, String memberName, 
TypeProtos.DataMode mode) {
+this.parent = parent;
+this.memberName = memberName;
+this.mode = mode;
+  }
+
+  @Override
+  public void addColumn(ColumnMetadata column) {
+assert DictVector.fieldNames.contains(column.name());
+tupleBuilder.addColumn(column);
+  }
+
+  public DictBuilder addKey(TypeProtos.MajorType type) {
+return add(MaterializedField.create(DictVector.FIELD_KEY_NAME, type));
+  }
+
+  public DictBuilder addValue(TypeProtos.MajorType type) {
+return add(MaterializedField.create(DictVector.FIELD_VALUE_NAME, type));
+  }
+
+  public DictBuilder add(MaterializedField col) {
+assert DictVector.fieldNames.contains(col.getName());
+tupleBuilder.add(col);
+return this;
+  }
+
+  public DictBuilder addKey(TypeProtos.MinorType type, TypeProtos.DataMode 
mode) {
+tupleBuilder.add(DictVector.FIELD_KEY_NAME, type, mode);
+return this;
+  }
+
+  public DictBuilder addValue(TypeProtos.MinorType type, TypeProtos.DataMode 
mode) {
+tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type, mode);
+return this;
+  }
+
+  public DictBuilder addKey(TypeProtos.MinorType type) {
+tupleBuilder.add(DictVector.FIELD_KEY_NAME, type);
+return this;
+  }
+
+  public DictBuilder addValue(TypeProtos.MinorType type) {
+tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type);
+return this;
+  }
+
+  public DictBuilder addKey(TypeProtos.MinorType type, int width) {
+tupleBuilder.add(DictVector.FIELD_KEY_NAME, type, width);
+return this;
+  }
+
+  public DictBuilder addValue(TypeProtos.MinorType type, int width) {
+tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type, width);
+return this;
+  }
+
+  public DictBuilder addKey(TypeProtos.MinorType type, int precision, int 
scale) {
+return addDecimal(DictVector.FIELD_KEY_NAME, type, 
TypeProtos.DataMode.REQUIRED, precision, scale);
+  }
+
+  public DictBuilder addValue(TypeProtos.MinorType type, int precision, int 
scale) {
+return addDecimal(DictVector.FIELD_VALUE_NAME, type, 
TypeProtos.DataMode.REQUIRED, precision, scale);
+  }
+
+  public DictBuilder addNullable(String name, TypeProtos.MinorType type) {
+tupleBuilder.addNullable(name,  type);
+return this;
+  }
+
+  public DictBuilder addNullable(String name, TypeProtos.MinorType type, int 
width) {
+tupleBuilder.addNullable(name, type, width);
+return this;
+  }
+
+  public DictBuilder addNullable(String name, TypeProtos.MinorType type, int 
precision, int scale) {
+return addDecimal(name, type, TypeProtos.DataMode.OPTIONAL, precision, 
scale);
+  }
+
+  public DictBuilder addArrayValue(TypeProtos.MinorType type) {
+tupleBuilder.addArray(DictVector.FIELD_VALUE_NAME, type);
+return this;
+  }
+
+  public DictBuilder addArrayValue(TypeProtos.MinorType type, int dims) {
+tupleBuilder.addArray(DictVector.FIELD_VALUE_NAME, type, dims);
+return this;
+  }
+
+  public DictBuilder addArrayValue(TypeProtos.MinorType type, int precision, 

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-10-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950398#comment-16950398
 ] 

ASF GitHub Bot commented on DRILL-7359:
---

paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT 
type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r334287167
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java
 ##
 @@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.complex.DictVector;
+
+public class DictBuilder implements SchemaContainer {
 
 Review comment:
   Would be great to add a few comments to explain the DICT structure. Is it of 
the form DICT, where the key is of one type, the value of another? 
Or, can keys and values be of different types?
   
   Since a MAP is a collection of (key, value) pairs in which the keys are 
declared, the keys are strings, and the values can each be of a distinct type, 
it is not likely that DICT does the same.
   
   I will assume the DICT format in my comments here.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


  1   2   >