[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006926#comment-17006926 ] Bohdan Kazydub commented on DRILL-7359: --- Merged into Apache master with commit id [7c813ff6440a118de15f552145b40eb07bb8e7a2|https://github.com/apache/drill/commit/7c813ff6440a118de15f552145b40eb07bb8e7a2]. > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Labels: ready-to-commit > Fix For: 1.18.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006917#comment-17006917 ] ASF GitHub Bot commented on DRILL-7359: --- asfgit commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Labels: ready-to-commit > Fix For: 1.18.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17005258#comment-17005258 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on issue #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#issuecomment-569647510 @paul-rogers, I've updated the PR: added tests for `key` and `value` overflow for single and `DICT` array, introduced separate `VectorState`s for single `DICT` and `DICT` array, performed minor refacttoring. Please take a look. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.18.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17004016#comment-17004016 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r361611436 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ColumnBuilder.java ## @@ -179,6 +189,19 @@ private ColumnState buildPrimitive(ContainerState parent, ColumnReadProjection c vectorState); } + /** + * Check if this is a special case when vector, writer and column state should be + * created for a primitive field though the field itself is not projected. This is + * needed because {@code DICT}'s {@code keys} field is not projected but is needed + * to be initialized to ensure the dict vector is constructed properly. Review comment: Dict works the same as arrays regarding projection: if the user refers to any `DICT` entry, the whole `DICT` is projected and finding the needed entry is to be handled by some other operator. This entry is not filtered in the Scan. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.18.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003980#comment-17003980 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r361606350 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/resultSet/impl/TestResultSetLoaderDictArray.java ## @@ -0,0 +1,263 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.physical.resultSet.impl; + +import static org.apache.drill.test.rowSet.RowSetUtilities.map; +import static org.apache.drill.test.rowSet.RowSetUtilities.objArray; +import static org.apache.drill.test.rowSet.RowSetUtilities.strArray; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertSame; +import static org.junit.Assert.assertTrue; + +import java.util.Iterator; + +import org.apache.drill.categories.RowSetTests; +import org.apache.drill.common.types.TypeProtos.MinorType; +import org.apache.drill.exec.physical.resultSet.ResultSetLoader; +import org.apache.drill.exec.physical.resultSet.RowSetLoader; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.record.metadata.SchemaBuilder; +import org.apache.drill.exec.record.metadata.TupleMetadata; +import org.apache.drill.exec.vector.accessor.DictWriter; +import org.apache.drill.exec.vector.complex.DictVector; +import org.apache.drill.exec.vector.complex.RepeatedDictVector; +import org.apache.drill.test.SubOperatorTest; +import org.apache.drill.exec.physical.rowSet.RowSet; +import org.apache.drill.exec.physical.rowSet.RowSet.SingleRowSet; +import org.apache.drill.test.rowSet.RowSetUtilities; +import org.junit.Test; +import org.junit.experimental.categories.Category; + +/** + * Test dict array support in the result set loader. + */ +@Category(RowSetTests.class) +public class TestResultSetLoaderDictArray extends SubOperatorTest { Review comment: Thanks for the excellent and complete tests. The one addition I might suggest is to add an overflow test for both the array and non-array cases. Basically, you want to make the vector exceed 16 GB. I do this by using big strings (~ 512 bytes in size.) Use an odd size: 523 or some such, to that overflow occurs in an other than a even row count. Then, use big keys to trigger overflow of the key vector. Check the resulting output batches. Then, repeat for the value vector. Overflow is the most complex part of the result set loader. If you look at examples for VarChar fields and arrays, you might see what was done before. (Avoid the "Torture" test: that one is very complex because it tries to set up near-random, yet predictable, overflow patterns.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.18.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003981#comment-17003981 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r361604989 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ColumnBuilder.java ## @@ -179,6 +189,19 @@ private ColumnState buildPrimitive(ContainerState parent, ColumnReadProjection c vectorState); } + /** + * Check if this is a special case when vector, writer and column state should be + * created for a primitive field though the field itself is not projected. This is + * needed because {@code DICT}'s {@code keys} field is not projected but is needed + * to be initialized to ensure the dict vector is constructed properly. Review comment: We have something similar with arrays: ``` SELECT a[1], a[3] FROM ... ``` In this case, only two (of possibly many) array positions are indexed. The rule adopted in the code is that if *any* index is projected, then the entire array is projected in the Scan operator. The Project operator will fiddle around with the array and pick out the two needed indexes. So, can we do the same here? If the user refers to any DICT entry, the Scan projects the entire DICT and some other operator worries about the details. This *seems* to make sense because I can't see of any reasonable way that within-array or within-DICT selection could be done in the Scan. (That said, I do hope, at some point, we can emulate Impala: push the filter into the scan. The ResultSetLoader is designed for this, but only at the row level. I suppose we could do it at the array element level with a bit of work.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.18.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003982#comment-17003982 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r361605652 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/TupleState.java ## @@ -567,4 +574,187 @@ public void dump(HierarchicalFormatter format) { .endArray() .endObject(); } + + public static class DictColumnState extends BaseContainerColumnState { +protected final DictState dictState; +protected boolean isVersioned; +protected final ColumnMetadata outputSchema; +// Indicates if value is accessed by key +protected final boolean valueProjected; + +public DictColumnState(DictState dictState, + AbstractObjectWriter writer, + VectorState vectorState, + boolean isVersioned, + boolean valueProjected) { + super(dictState.loader(), writer, vectorState); + this.dictState = dictState; + dictState.bindColumnState(this); + this.isVersioned = isVersioned; + if (isVersioned) { +outputSchema = schema().cloneEmpty(); + } else { +outputSchema = schema(); + } + dictState.bindOutputSchema(outputSchema.tupleSchema()); + this.valueProjected = valueProjected; +} + +@Override +public void buildOutput(TupleState tupleState) { + outputIndex = tupleState.addOutputColumn(vector(), outputSchema()); +} + +public DictState dictState() { + return dictState; +} + +@Override +public ContainerState container() { + return dictState; +} + +@Override +public boolean isProjected() { + return dictState.hasProjections(); +} + +public boolean isVersioned() { + return isVersioned; +} + +@Override +public ColumnMetadata outputSchema() { return outputSchema; } + } + + public static abstract class DictState extends MapState { + +public DictState(LoaderInternals events, +ResultVectorCache vectorCache, +ProjectionSet projectionSet) { + super(events, vectorCache, projectionSet); +} + +public void bindColumnState(ColumnState colState) { + super.bindColumnState(colState); + writer().bindListener(this); +} + +@Override +public boolean isDict() { + return true; +} + +@Override +protected boolean isVersioned() { + return ((DictColumnState) parentColumn).isVersioned(); +} + +@Override +public void dump(HierarchicalFormatter format) { + format.startObject(this) + .attribute("column", parentColumn.schema().name()) + .attribute("cardinality", innerCardinality()) + .endObject(); +} + } + + public static class SingleDictState extends DictState { + +public SingleDictState(LoaderInternals events, + ResultVectorCache vectorCache, + ProjectionSet projectionSet) { + super(events, vectorCache, projectionSet); +} + +@Override +public AbstractTupleWriter writer() { + return (AbstractTupleWriter) parentColumn.writer().dict().tuple(); +} + } + + public static class DictArrayState extends DictState { + +public DictArrayState(LoaderInternals events, + ResultVectorCache vectorCache, + ProjectionSet projectionSet) { + super(events, vectorCache, projectionSet); +} + +@Override +public int addOutputColumn(ValueVector vector, ColumnMetadata colSchema) { + final RepeatedDictVector repeatedDictVector = parentColumn.vector(); + DictVector dictVector = (DictVector) repeatedDictVector.getDataVector(); + if (isVersioned()) { +dictVector.putChild(colSchema.name(), vector); + } + final int index = outputSchema.addColumn(colSchema); + assert dictVector.size() == outputSchema.size(); + assert dictVector.getField().getChildren().size() == outputSchema.size(); + return index; +} + +@Override +public AbstractTupleWriter writer() { + return (AbstractTupleWriter) parentColumn.writer().array().dict().tuple(); +} + } + + public static class DictVectorState implements VectorState { + +private final ValueVector vector; Review comment: I see the challenge. Might be better to use two distinct classes to model the different implementations. The two "state" classes can have the same API to hide the differences from other code. This class is responsible for translating Result Set Loader events into "do the right thing" actions on the underlying vectors. For
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003972#comment-17003972 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r361603427 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ColumnBuilder.java ## @@ -179,6 +189,19 @@ private ColumnState buildPrimitive(ContainerState parent, ColumnReadProjection c vectorState); } + /** + * Check if this is a special case when vector, writer and column state should be + * created for a primitive field though the field itself is not projected. This is + * needed because {@code DICT}'s {@code keys} field is not projected but is needed + * to be initialized to ensure the dict vector is constructed properly. Review comment: Oops, my bad - hasn't mentioned (just a kind of implicit hint in the `@return`) in the javadoc body (do not know what is the text before `@param`, `@return` etc. is called) that this affects creation of `DICT`'s `key` when `value` is accessed by key only, i.e. `col['someKey']` with `col` being a `DICT` column with `String` (`VARCHAR`) `key`; `value` in this case will be projected. If the `DICT` is not projected, this won't create any vector: neither `DictVector` nor vector for `key` (or `value`) - special "dummy" writer for `DICT` with "dummy" writers for its children, `key` and `value`, will be used, as is done for other vectors. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.18.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003953#comment-17003953 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r361603427 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ColumnBuilder.java ## @@ -179,6 +189,19 @@ private ColumnState buildPrimitive(ContainerState parent, ColumnReadProjection c vectorState); } + /** + * Check if this is a special case when vector, writer and column state should be + * created for a primitive field though the field itself is not projected. This is + * needed because {@code DICT}'s {@code keys} field is not projected but is needed + * to be initialized to ensure the dict vector is constructed properly. Review comment: Oops, my bad - hasn't mentioned (just a kind of implicit hint in the `@return`) in the javadoc body (do not know what is the text before `@param`, `@return` etc. is called) that this affects creation of `DICT`'s `key` when `value` is accessed by key only, i.e. `col['someKey']` with `col` being a `DICT` column with `String` (`VARCHAR`) `key`; `value` in this case will be projected. If the `DICT` is not projected, this won't create any vector: neither `DictVector` nor vector for `key` (or `value`) - special "dummy" vector for `DICT` will be used, as is done for other vectors. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.18.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003946#comment-17003946 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r361600935 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ContainerState.java ## @@ -172,4 +172,8 @@ public void close() { colState.close(); } } + + public boolean isDict() { +return false; + } Review comment: It is better, this was removed, yes. I'll try to eliminate this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.18.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003945#comment-17003945 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r361600501 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/TupleState.java ## @@ -567,4 +574,187 @@ public void dump(HierarchicalFormatter format) { .endArray() .endObject(); } + + public static class DictColumnState extends BaseContainerColumnState { +protected final DictState dictState; +protected boolean isVersioned; +protected final ColumnMetadata outputSchema; +// Indicates if value is accessed by key +protected final boolean valueProjected; + +public DictColumnState(DictState dictState, + AbstractObjectWriter writer, + VectorState vectorState, + boolean isVersioned, + boolean valueProjected) { + super(dictState.loader(), writer, vectorState); + this.dictState = dictState; + dictState.bindColumnState(this); + this.isVersioned = isVersioned; + if (isVersioned) { +outputSchema = schema().cloneEmpty(); + } else { +outputSchema = schema(); + } + dictState.bindOutputSchema(outputSchema.tupleSchema()); + this.valueProjected = valueProjected; +} + +@Override +public void buildOutput(TupleState tupleState) { + outputIndex = tupleState.addOutputColumn(vector(), outputSchema()); +} + +public DictState dictState() { + return dictState; +} + +@Override +public ContainerState container() { + return dictState; +} + +@Override +public boolean isProjected() { + return dictState.hasProjections(); +} + +public boolean isVersioned() { + return isVersioned; +} + +@Override +public ColumnMetadata outputSchema() { return outputSchema; } + } + + public static abstract class DictState extends MapState { + +public DictState(LoaderInternals events, +ResultVectorCache vectorCache, +ProjectionSet projectionSet) { + super(events, vectorCache, projectionSet); +} + +public void bindColumnState(ColumnState colState) { + super.bindColumnState(colState); + writer().bindListener(this); +} + +@Override +public boolean isDict() { + return true; +} + +@Override +protected boolean isVersioned() { + return ((DictColumnState) parentColumn).isVersioned(); +} + +@Override +public void dump(HierarchicalFormatter format) { + format.startObject(this) + .attribute("column", parentColumn.schema().name()) + .attribute("cardinality", innerCardinality()) + .endObject(); +} + } + + public static class SingleDictState extends DictState { + +public SingleDictState(LoaderInternals events, + ResultVectorCache vectorCache, + ProjectionSet projectionSet) { + super(events, vectorCache, projectionSet); +} + +@Override +public AbstractTupleWriter writer() { + return (AbstractTupleWriter) parentColumn.writer().dict().tuple(); +} + } + + public static class DictArrayState extends DictState { + +public DictArrayState(LoaderInternals events, + ResultVectorCache vectorCache, + ProjectionSet projectionSet) { + super(events, vectorCache, projectionSet); +} + +@Override +public int addOutputColumn(ValueVector vector, ColumnMetadata colSchema) { + final RepeatedDictVector repeatedDictVector = parentColumn.vector(); + DictVector dictVector = (DictVector) repeatedDictVector.getDataVector(); + if (isVersioned()) { +dictVector.putChild(colSchema.name(), vector); + } + final int index = outputSchema.addColumn(colSchema); + assert dictVector.size() == outputSchema.size(); + assert dictVector.getField().getChildren().size() == outputSchema.size(); + return index; +} + +@Override +public AbstractTupleWriter writer() { + return (AbstractTupleWriter) parentColumn.writer().array().dict().tuple(); +} + } + + public static class DictVectorState implements VectorState { + +private final ValueVector vector; Review comment: This class is used for both `DictVector` and `RepeatedDictVector`. Unlike for `MAP`, they do not have a common parent class (other than `RepeatedValueVector` interface). Should this be changed to `RepeatedValueVector` or should I create another `VectorState` for `RepeatedDictVector`?
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003845#comment-17003845 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r361567680 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/AbstractKeyAccessor.java ## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector.accessor; + +import org.joda.time.Instant; +import org.joda.time.LocalDate; +import org.joda.time.LocalTime; +import org.joda.time.Period; + +import java.math.BigDecimal; + +public class AbstractKeyAccessor implements KeyAccessor { + + protected final DictReader dictReader; + protected final ScalarReader keyReader; + + protected AbstractKeyAccessor(DictReader dictReader, ScalarReader keyReader) { +this.dictReader = dictReader; +this.keyReader = keyReader; + } + + @Override + public boolean find(boolean key) { +throw new UnsupportedOperationException(this.getClass().getName() + " does not support boolean key."); Review comment: Nit: ``` throw unsupported("boolean"); ... private UnsupportedOperationException unsupported(String type) { return new UnsupportedOperationException(String.format( "%s does not support a key of type %s.", getClass().getName(), type)); } ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.18.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003846#comment-17003846 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r361551286 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ColumnBuilder.java ## @@ -179,6 +189,19 @@ private ColumnState buildPrimitive(ContainerState parent, ColumnReadProjection c vectorState); } + /** + * Check if this is a special case when vector, writer and column state should be + * created for a primitive field though the field itself is not projected. This is + * needed because {@code DICT}'s {@code keys} field is not projected but is needed + * to be initialized to ensure the dict vector is constructed properly. Review comment: Interesting. I'm a bit confused, however. Help me understand what's happening here. If a column X is unprojected, this means that, in the batch coming out of the Scan, say, we don't want column X to appear at all. This why, for all other projected columns, we do not create an actual value vector. From the perspective of the reader using the Result Set Loader, the column does exist: I can write to it. So, we create a "dummy" writer: one which simply ignores the values given it. Let's imagine how that would work for a DICT. My input file has a DICT field. But, that DICT is not projected. Either way, I may find it easier to read the DICT whether it is projected or not. Though a DICT represents a map conceptually (and is implemented as a different kind of map internally), it is really just a correlated array at the data level. So, I just want to write key/value pairs to the reader. A previous comment sketched out how I might want to do that. (I've not yet gotten to the new code where that part is implemented.) Unless I'm missing something obvious, I think the above reasoning means that, if a DICT is not projected, we want a dummy DICT writer (using dummy writers for the keys and values). But, I can't see why we would create the actual DICT vector. Can you help me see what I'm missing? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.18.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003844#comment-17003844 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r361565139 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ContainerState.java ## @@ -172,4 +172,8 @@ public void close() { colState.close(); } } + + public boolean isDict() { +return false; + } Review comment: Should clients of this class care what kind of container this is? Shouldn't all DICT-specific code be in the DICT related classes? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.18.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003847#comment-17003847 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r361566453 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/TupleState.java ## @@ -567,4 +574,187 @@ public void dump(HierarchicalFormatter format) { .endArray() .endObject(); } + + public static class DictColumnState extends BaseContainerColumnState { +protected final DictState dictState; +protected boolean isVersioned; +protected final ColumnMetadata outputSchema; +// Indicates if value is accessed by key +protected final boolean valueProjected; + +public DictColumnState(DictState dictState, + AbstractObjectWriter writer, + VectorState vectorState, + boolean isVersioned, + boolean valueProjected) { + super(dictState.loader(), writer, vectorState); + this.dictState = dictState; + dictState.bindColumnState(this); + this.isVersioned = isVersioned; + if (isVersioned) { +outputSchema = schema().cloneEmpty(); + } else { +outputSchema = schema(); + } + dictState.bindOutputSchema(outputSchema.tupleSchema()); + this.valueProjected = valueProjected; +} + +@Override +public void buildOutput(TupleState tupleState) { + outputIndex = tupleState.addOutputColumn(vector(), outputSchema()); +} + +public DictState dictState() { + return dictState; +} + +@Override +public ContainerState container() { + return dictState; +} + +@Override +public boolean isProjected() { + return dictState.hasProjections(); +} + +public boolean isVersioned() { + return isVersioned; +} + +@Override +public ColumnMetadata outputSchema() { return outputSchema; } + } + + public static abstract class DictState extends MapState { + +public DictState(LoaderInternals events, +ResultVectorCache vectorCache, +ProjectionSet projectionSet) { + super(events, vectorCache, projectionSet); +} + +public void bindColumnState(ColumnState colState) { + super.bindColumnState(colState); + writer().bindListener(this); +} + +@Override +public boolean isDict() { + return true; +} + +@Override +protected boolean isVersioned() { + return ((DictColumnState) parentColumn).isVersioned(); +} + +@Override +public void dump(HierarchicalFormatter format) { + format.startObject(this) + .attribute("column", parentColumn.schema().name()) + .attribute("cardinality", innerCardinality()) + .endObject(); +} + } + + public static class SingleDictState extends DictState { + +public SingleDictState(LoaderInternals events, + ResultVectorCache vectorCache, + ProjectionSet projectionSet) { + super(events, vectorCache, projectionSet); +} + +@Override +public AbstractTupleWriter writer() { + return (AbstractTupleWriter) parentColumn.writer().dict().tuple(); +} + } + + public static class DictArrayState extends DictState { + +public DictArrayState(LoaderInternals events, + ResultVectorCache vectorCache, + ProjectionSet projectionSet) { + super(events, vectorCache, projectionSet); +} + +@Override +public int addOutputColumn(ValueVector vector, ColumnMetadata colSchema) { + final RepeatedDictVector repeatedDictVector = parentColumn.vector(); + DictVector dictVector = (DictVector) repeatedDictVector.getDataVector(); + if (isVersioned()) { +dictVector.putChild(colSchema.name(), vector); + } + final int index = outputSchema.addColumn(colSchema); + assert dictVector.size() == outputSchema.size(); + assert dictVector.getField().getChildren().size() == outputSchema.size(); + return index; +} + +@Override +public AbstractTupleWriter writer() { + return (AbstractTupleWriter) parentColumn.writer().array().dict().tuple(); +} + } + + public static class DictVectorState implements VectorState { + +private final ValueVector vector; Review comment: Part of the goal of this class is to provide the actual vector class. Should this be `DictVector`? Same with `vector()`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003757#comment-17003757 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on issue #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#issuecomment-569130340 @paul-rogers, I've updated the PR, please have a look again. @arina-ielchiieva this is now fixed, thanks for pointing this out! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.18.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003320#comment-17003320 ] ASF GitHub Bot commented on DRILL-7359: --- arina-ielchiieva commented on issue #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#issuecomment-568912112 @KazydubB I see you have pushed new commit, is PR ready for review? Please tag Paul if yes. Also please check if `org.apache.drill.exec.record.metadata.DictBuilder` method `public DictBuilder resumeMap()` was fixed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.18.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998007#comment-16998007 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on issue #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#issuecomment-566448821 @paul-rogers, I'm sorry for taking so long. I'm planning to finish this in a few days. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.18.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997606#comment-16997606 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on issue #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#issuecomment-566219911 Would be great if we could push this over the finish line. There will be a conflict with DRILL-7486. I made some suggestions about finding DICT values and handling missing values. Let me know if I can help with that work. If we're not sure what we want to do in this area, then, as noted earlier, might be good to commit what we have; then we can refine it over time as we try the code. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.18.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997428#comment-16997428 ] ASF GitHub Bot commented on DRILL-7359: --- arina-ielchiieva commented on issue #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#issuecomment-566130764 Please fix bug in `org.apache.drill.exec.record.metadata.DictBuilder` ``` public DictBuilder resumeMap() { build(); return (DictBuilder) parent; } ``` Resume map should return map builder. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.18.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989080#comment-16989080 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r354501095 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java ## @@ -268,4 +274,51 @@ public String getAsString() { } return requireReader(type).getAsString(); } + + private UnionReaderImpl getNullReader() { +AbstractObjectReader[] nullVariants = new AbstractObjectReader[variants.length]; +for (int i = 0; i < variants.length; i++) { + nullVariants[i] = variants[i].createNullReader(); +} +return new NullUnionReader(schema(), unionAccessor, nullVariants); + } + + private static class NullUnionReader extends UnionReaderImpl { + +private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, AbstractObjectReader[] variants) { Review comment: @KazydubB, first, thanks for understanding my hastily-written example; I've edited it to be accurate. We are defining an API. APIs exist to support use cases, so it is very helpful to think about those. (My bad for not doing that earlier; I was focused more on the plumbing aspects early on. This is, in fact, why I still like design docs, even in the era of Agile, so we can visualize these issues early on.) In my view, there are two key use cases: tests and production. In tests, we simply want to verify that, say, a reader correctly created (or filtered, or altered) a `DICT`. If a key exists, we want to verify its value. If a key does not exist, we simply need to verify that fact. So, having a generic `boolean findKey(Object key)` is perfectly fine. The overhead of doing a type-based switch statement on every call is fine. (This is why tests often use the `addRow()` methods which, internally, have a big, slow `if` statement based on the type of the value.) The second use case is production. Here, we can assume our goal is maximum performance via generated code. I would argue that we do not want any extra code in the access path. Instead, the code should be specific to each type. This is why the readers are designed to go directly from, say, `getString()` into the code that grabs the bytes from the underlying value vector. For production code, we'd want a type-aware "find" operation. There are a couple of ways to do that. One is by adding a "key accessor": ``` class DictReader ... KeyAccessor keyAccessor(); ... } interface KeyAccessor { boolean findInt(int key); boolean findString(String key); ... } ``` The accessor code gen could then generate type-specific search code by creating a key accessor class for each supported key type. This would be done in the build-time code gen, using Freemarker, as part of the existing `columnAccessors.java` template and code gen. Now things get interesting. We need to generate code for each SQL operation that uses a `DICT`. Let's make up an example: ``` SELECT ... FROM myTableWithDict WHERE d['foo'] = 20 ``` Here, the story splits into two paths. Today, code gen works directly against value vectors: you will need to generate the find code inline for each operator. Longer term, it is my hope that we can shift to using the column readers. (Doing so will simplify code gen, and puts us in a position to use multiple value vector format, such as Arrow, if we wanted.) So, let's imagine we are using your `DICT` readers in runtime code generation. We want to generate code for the above example. In either case (old-school or reader-based), we don't know the type until the first row appears. At that point, we generate run-time code that, in our example, uses a `VARCHAR` key to lookup an `INT` value and send it to the equality function. Very crudely: ``` DictReader dr; KeyAccessor ka; ScalarReader vr; String constKey; String constArg; void doSetup() { dr = rowReader.dict("d"); ka = dr.keyAccessor(); vr = dr.value().scalar(); constKey = "foo"; constArg = "bar"; } void doEval() { boolean isEquals; boolean found = ka.findString(constKey); if (found) { isEquals = strEquals(vr.getString(), constArg); } else { isEquals = false; } // do something with isEquals... } ``` The above is written in "code gen style", but greatly simplified. So, we've looked at two use cases: test and production. In neither did we need to work with a missing value. Instead, we observed the rules that a non-existent key has no value (rather than a non-existent key
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989053#comment-16989053 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r353522274 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java ## @@ -268,4 +274,51 @@ public String getAsString() { } return requireReader(type).getAsString(); } + + private UnionReaderImpl getNullReader() { +AbstractObjectReader[] nullVariants = new AbstractObjectReader[variants.length]; +for (int i = 0; i < variants.length; i++) { + nullVariants[i] = variants[i].createNullReader(); +} +return new NullUnionReader(schema(), unionAccessor, nullVariants); + } + + private static class NullUnionReader extends UnionReaderImpl { + +private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, AbstractObjectReader[] variants) { Review comment: @KazydubB, thanks for the explanation. It clarifies something I wondered about: the role of the null state readers. Let's talk a bit about the intended semantics for the readers. I should be able to write code that caches the readers: ``` DictReader dr = rowReader.dict("myDict"); ScalarReader kw = dr.key().scalar(); MapReader vr = dr.value.tuple(); ``` That is, I should be able to use the same key reader (`kr`) and value reader (`vr`) for all members. Then, I should either be able to iterate over the values: ``` while (dr.next()) { System.out.println(kr.getString()); System.out.println(vr.getAsString()); } ``` If things work that way, then we don't need a special reader for the null state: the value reader (here, `vr`) would simply return `true` from `isNull()`: ``` while (dr.next()) { System.out.println(kr.getString()); if (vr.isNull()) { System.out.println("NULL"); } else { System.out.println(vr.getAsString()); } } ``` (Ignoring, for the moment, that `getAsString()` already does a null check.) Note that, if we read past the end, both the key and value writers will point to nothing. I think that, for the array reader, accessing the value one past the end is an error. For `DICT`, we could simply declare that the value is NULL. Then, I was going to sketch out the same behavior if we search for a key and found...nothing. Seems that the `DictReader` has no way to search for a key. If it did, it would not be clear what type to use for the key. So, that is a bit of a mess. Let's assume we did have something to find a specific value, like we do for the arrays where we have `setPosn(int index)`. Maybe we have `findStringKey()` etc. For testing, we could just have `findKey(Object key)` and do a switch on type as we do for the other readers. So: ``` dr.findStringKey("foo"); if (vr.isNull()) { System.out.println("NULL"); } else { System.out.println(vr.getAsString()); } ``` Or, even: ``` if (dr.findStringKey("foo")) { System.out.println(vr.getAsString()); } ``` The only way to find the value is to do a linear search. If we do a linear search, and no value exists, the index will point one past the end of the dict. And, above, we suggested that the key and value readers could return `true` for `isNull()` in those cases. Given this, unless I'm missing something, I don't see that we need any kind of null value reader at all. In fact, I'd even say that we don't want such a reader if we'd have to do: ``` ObjectReader kr = dr.findStringKey("foo"); if (kr.isNull) { // kr is a special null reader } else { // kr is a Map reader } ``` The above clutters the code and makes it impossible to cache the value reader, which will slow inner-most loops that want to use the reader abstractions. What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.18.0 > > > Add support for new DICT data type (see
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988896#comment-16988896 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r354275004 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java ## @@ -268,4 +274,51 @@ public String getAsString() { } return requireReader(type).getAsString(); } + + private UnionReaderImpl getNullReader() { +AbstractObjectReader[] nullVariants = new AbstractObjectReader[variants.length]; +for (int i = 0; i < variants.length; i++) { + nullVariants[i] = variants[i].createNullReader(); +} +return new NullUnionReader(schema(), unionAccessor, nullVariants); + } + + private static class NullUnionReader extends UnionReaderImpl { + +private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, AbstractObjectReader[] variants) { Review comment: @paul-rogers, first of all, thank you for your thoughts - they are very useful. Regarding the first case you described (I suppose you meant `ScalarReader kw` and `MapReader vr` in the first code snippet): one is able to write code with cached readers currently, if needed for such iterations over key-value pairs in the form you wrote it ``` while (dr.next()) { System.out.println(kr.getString()); System.out.println(vr.getAsString()); } ``` with the behaviour you expect (I need to emphasize, that special `Null*Reader`s appear _only_ when finding value `ObjectReader` by certain `key` and the given `key` does not exist for the row). But there is no need for the case (here and below) to add a special `NULL`-`NULL` entry past the end, as iteration should be over _actual_ collection of entries, which were added to the `Dict` (through `DictWriter`). Regarding finding specific values: there is a `DictReader#getValueReader(Object key)` method (with return type `ObjectReader`), which allows to obtain `value`'s reader with its position set to the one corresponding to needed `key`. If there is no one for the `key` - the special `Null*Reader` (depends on the type of `value`) is returned. So, this approach enforces non-caching of the value reader (as you have pointed out). The idea was to make `null` `value`s to be handled the same, indepently whether the `null` is from the actual value (i.e., from some entry with some `key` and `null` `value` `"someKey" -> null`) or from absent `key`, e.g. your example: ``` ObjectReader kr = dr.findStringKey("foo"); if (kr.isNull) { // kr is a special null reader } else { // kr is a Map reader } ``` is likely should be done, because of the nature of complex types in Drill - they are generally non-nullable (at least the ones in standard, pre-EVF, Drill's framework) leading to no distinction between, say, empty or `null` `MAP`. Also, worth noting, that `vr` (defined as `ObjectReader vr = dictReader.getValueReader("foo")`) will have correct behavior by itself as opposed to `NULL` emulation on `DictReader` level: ``` if (dr.findStringKey("foo")) { System.out.println(vr.getAsString()); } ``` will likely need an `else` case also which should be emulated too (I mean, the `ValueReader` itself does not know if it points to `NULL` value within a dict as this convention is done on the dict-level; some value reader wrapper can be introduced of course, but that is another layer of complexity). But good point about poorer perfomance (with the current approach). I like your idea with `boolean findKey(...)` - something similar is done in standard framework as well. (In this case, your suggestion about treating entries by the end as if they were `null` does make sense). Perhaps, I will stick to it after it is clear this is definetely better. About designated method for each key type - I assume it is redundant, as `key` type validation is better to be performed once prior to obtaining the data, and general case of `Object key` should do, shouldn't it? (Just want to note, that array-like behaviour is useful in testing primarily, the "real" use-case is to obtain elements by key). P.s. Sorry, missed the comment yesterday. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL:
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988878#comment-16988878 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r354303721 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java ## @@ -268,4 +274,51 @@ public String getAsString() { } return requireReader(type).getAsString(); } + + private UnionReaderImpl getNullReader() { +AbstractObjectReader[] nullVariants = new AbstractObjectReader[variants.length]; +for (int i = 0; i < variants.length; i++) { + nullVariants[i] = variants[i].createNullReader(); +} +return new NullUnionReader(schema(), unionAccessor, nullVariants); + } + + private static class NullUnionReader extends UnionReaderImpl { + +private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, AbstractObjectReader[] variants) { Review comment: Yes, there is a need to have multiple methods for each key type to avoid invocation of generalized `keyReader.getObject()` which contains `switch` checking each type (really bad in production) - will fix it. *EDIT*: actually, there is another, rather more elegant, way to hadle this without the need to duplicate code/add many methods: an instance variable `private final Supplier keyObjectExtractor` can be defined in `DictReaderImpl` which will be initializated during creation as ``` private Supplier getKeyObjectExtractor() { switch (keyColumnType()) { case BOOLEAN: return keyReader::getBoolean; case BYTES: return keyReader::getBytes; ... } } ``` This makes possible to use `Object` in `find(Object key)` because we can use `keyObjectExtractor.get()` whenever we need key `Object` value, thus making code cleaner. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.18.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988874#comment-16988874 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r354303721 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java ## @@ -268,4 +274,51 @@ public String getAsString() { } return requireReader(type).getAsString(); } + + private UnionReaderImpl getNullReader() { +AbstractObjectReader[] nullVariants = new AbstractObjectReader[variants.length]; +for (int i = 0; i < variants.length; i++) { + nullVariants[i] = variants[i].createNullReader(); +} +return new NullUnionReader(schema(), unionAccessor, nullVariants); + } + + private static class NullUnionReader extends UnionReaderImpl { + +private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, AbstractObjectReader[] variants) { Review comment: Yes, there is a need to have multiple methods for each key type to avoid invocation of generalized `keyReader.getObject()` which contains `switch` checking each type (really bad in production) - will fix it. *EDIT*: actually, there is another, rather more elegant, way to hadle this without the need to duplicate code/add many methods: an instance variable `private final Supplier keyObjectExtractor` can be defined in `DictReaderImpl` which will be initiated during creation as ``` private Supplier getKeyObjectExtractor() { switch (keyColumnType()) { case BOOLEAN: return keyReader::getBoolean; case BYTES: return keyReader::getBytes; ... } } ``` This makes possible to use `Object` in `find(Object key)` because we can use `keyObjectExtractor.get()` whenever we need key `Object` value, thus making code cleaner. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.18.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988871#comment-16988871 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r354303721 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java ## @@ -268,4 +274,51 @@ public String getAsString() { } return requireReader(type).getAsString(); } + + private UnionReaderImpl getNullReader() { +AbstractObjectReader[] nullVariants = new AbstractObjectReader[variants.length]; +for (int i = 0; i < variants.length; i++) { + nullVariants[i] = variants[i].createNullReader(); +} +return new NullUnionReader(schema(), unionAccessor, nullVariants); + } + + private static class NullUnionReader extends UnionReaderImpl { + +private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, AbstractObjectReader[] variants) { Review comment: Yes, there is a need to have multiple methods for each key type to avoid invocation of generalized `keyReader.getObject()` which contains `switch` checking each type (really bad in production) - will fix it. *EDIT*: actually, there is another, rather more elegant, way to hadle this without the need to duplicate code/add many methods: an instance variable `private final Supplier keyObjectExtractor` can be defined in `DictReaderImpl` which will be initiated during creation as ``` private Supplier getKeyObjectExtractor() { switch (keyColumnType()) { case BOOLEAN: return keyReader::getBoolean; case BYTES: return keyReader::getBytes; ... } } ``` This makes possible to use `Object` in `find(Object key)` thus making code cleaner. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.18.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988793#comment-16988793 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r354303721 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java ## @@ -268,4 +274,51 @@ public String getAsString() { } return requireReader(type).getAsString(); } + + private UnionReaderImpl getNullReader() { +AbstractObjectReader[] nullVariants = new AbstractObjectReader[variants.length]; +for (int i = 0; i < variants.length; i++) { + nullVariants[i] = variants[i].createNullReader(); +} +return new NullUnionReader(schema(), unionAccessor, nullVariants); + } + + private static class NullUnionReader extends UnionReaderImpl { + +private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, AbstractObjectReader[] variants) { Review comment: Yes, there is a need to have multiple methods for each key type to avoid invocation of generalized `keyReader.getObject()` which contains `switch` checking each type (really bad in production) - will fix it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.18.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988754#comment-16988754 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r354275004 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java ## @@ -268,4 +274,51 @@ public String getAsString() { } return requireReader(type).getAsString(); } + + private UnionReaderImpl getNullReader() { +AbstractObjectReader[] nullVariants = new AbstractObjectReader[variants.length]; +for (int i = 0; i < variants.length; i++) { + nullVariants[i] = variants[i].createNullReader(); +} +return new NullUnionReader(schema(), unionAccessor, nullVariants); + } + + private static class NullUnionReader extends UnionReaderImpl { + +private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, AbstractObjectReader[] variants) { Review comment: @paul-rogers, first of all, thank you for your thoughts - they are very useful. Regarding the first case you described (I suppose you meant `ScalarReader kw` and `MapReader vr` in the first code snippet): one is able to write code with cached readers currently, if needed for such iterations over key-value pairs in the form you wrote it ``` while (dr.next()) { System.out.println(kr.getString()); System.out.println(vr.getAsString()); } ``` with the behaviour you expect (without a special `Null*Reader`s). But there is no need for the case (here and below) to add a special `NULL`-`NULL` entry past the end, as iteration should be over _actual_ collection of entries, which were added to the `Dict` (through `DictWriter`). Regarding finding specific values: there is a `DictReader#getValueReader(Object key)` method (with return type `ObjectReader`), which allows to obtain `value`'s reader with its position set to the one corresponding to needed `key`. If there is no one for the `key` - the special `Null*Reader` (depends on the type of `value`) is returned. So, this approach enforces non-caching of the value reader (as you have pointed out). The idea was to make `null` `value`s to be handled the same, indepently whether the `null` is from the actual value (i.e., from some entry with some `key` and `null` `value` `"someKey" -> null`) or from absent `key`, e.g. your example: ``` ObjectReader kr = dr.findStringKey("foo"); if (kr.isNull) { // kr is a special null reader } else { // kr is a Map reader } ``` is likely should be done, because of the nature of complex types in Drill - they are generally non-nullable (at least the ones in standard, pre-EVF, Drill's framework) leading to no distinction between, say, empty or `null` `MAP`. Also, worth noting, that `vr` (defined as `ObjectReader vr = dictReader.getValueReader("foo")`) will have correct behavior by itself as opposed to `NULL` emulation on `DictReader` level: ``` if (dr.findStringKey("foo")) { System.out.println(vr.getAsString()); } ``` will likely need an `else` case also which should be emulated too (I mean, the `ValueReader` itself does not know if it points to `NULL` value within a dict as this convention is done on the dict-level; some value reader wrapper can be introduced of course, but that is another layer of complexity). But good point about poorer perfomance (with the current approach). I like your idea with `boolean findKey(...)` - something similar is done in standard framework as well. (In this case, your suggestion about treating entries by the end as if they were `null` does make sense). Perhaps, I will stick to it after it is clear this is definetely better. About designated method for each key type - I assume it is redundant, as `key` type validation is better to be performed once prior to obtaining the data, and general case of `Object key` should do, shouldn't it? (Just want to note, that array-like behaviour is useful in testing primarily, the "real" use-case is to obtain elements by key). P.s. Sorry, missed the comment yesterday. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988729#comment-16988729 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r354275004 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java ## @@ -268,4 +274,51 @@ public String getAsString() { } return requireReader(type).getAsString(); } + + private UnionReaderImpl getNullReader() { +AbstractObjectReader[] nullVariants = new AbstractObjectReader[variants.length]; +for (int i = 0; i < variants.length; i++) { + nullVariants[i] = variants[i].createNullReader(); +} +return new NullUnionReader(schema(), unionAccessor, nullVariants); + } + + private static class NullUnionReader extends UnionReaderImpl { + +private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, AbstractObjectReader[] variants) { Review comment: @paul-rogers, first of all, thank you for your thoughts - they are very useful. Regarding the first case you described (I suppose you meant `ScalarReader kw` and `MapReader vr` in the first code snippet): one is able to write code with cached readers currently, if needed for such iterations over key-value pairs in the form you wrote it ``` while (dr.next()) { System.out.println(kr.getString()); System.out.println(vr.getAsString()); } ``` with the behaviour you expect (without a special `Null*Reader`s). But there is no need for the case (here and below) to add a special `NULL`-`NULL` entry past the end, as iteration should be over _true_ collection of entries, which were added to the `Dict` (through `DictWriter`). Regarding finding specific values: there is a `DictReader#getValueReader(Object key)` method (with return type `ObjectReader`), which allows to obtain `value`'s reader with its position set to the one corresponding to needed `key`. If there is no one for the `key` - the special `Null*Reader` (depends on the type of `value`) is returned. So, this approach enforces non-caching of the value reader (as you have pointed out). The idea was to make `null` `value`s to be handled the same, indepently whether the `null` is from the actual value (i.e., from some entry with some `key` and `null` `value` `"someKey" -> null`) or from absent `key`, e.g. your example: ``` ObjectReader kr = dr.findStringKey("foo"); if (kr.isNull) { // kr is a special null reader } else { // kr is a Map reader } ``` is likely should be done, because of the nature of complex types in Drill - they are generally non-nullable (at least the ones in standard, pre-EVF, Drill's framework) leading to no distinction between, say, empty or `null` `MAP`. But good point about poorer perfomance (with the current approach). I like your idea with `boolean findKey(...)` - something similar is done in standard framework as well. Perhaps, I will stick to it. (In this case, your suggestion about treating entries by the end as if they were `null` does make sense). About designated method for each key type - I assume it is redundant, as `key` type validation is better to be performed once prior to obtaining the data, and general case of `Object key` should do, shouldn't it? (Just want to note, that array-like behaviour is useful in testing primarily, the "real" use-case is to obtain elements by key). P.s. Sorry, missed the comment yesterday. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.18.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987453#comment-16987453 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on issue #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#issuecomment-561448583 On `ProjectionType`, if the `SchemaPath` is, say, `foo`, then the `ProjectionType` will be `SCALAR`. I can see that this name is misleading. Perhaps `SIMPLE` is better. We don't know that the column **is** scalar, we just know that it is consistent with a scalar. It could be a map, array or dict as well. But, if you do add a `SchemaPath` representation for `foo['bar']`, then, yes, that can only be a `DICT`, so we can introduce a `DICT` projection type. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987446#comment-16987446 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r353522274 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java ## @@ -268,4 +274,51 @@ public String getAsString() { } return requireReader(type).getAsString(); } + + private UnionReaderImpl getNullReader() { +AbstractObjectReader[] nullVariants = new AbstractObjectReader[variants.length]; +for (int i = 0; i < variants.length; i++) { + nullVariants[i] = variants[i].createNullReader(); +} +return new NullUnionReader(schema(), unionAccessor, nullVariants); + } + + private static class NullUnionReader extends UnionReaderImpl { + +private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, AbstractObjectReader[] variants) { Review comment: @KazydubB, thanks for the explanation. It clarifies something I wondered about: the role of the null state readers. Let's talk a bit about the intended semantics for the readers. I should be able to write code that caches the readers: ``` DictReader dr = rowReader.dict("myDict"); ScalarWriter kw = dr.key().scalar(); MapWriter vr = dr.value.tuple(); ``` That is, I should be able to use the same key reader (`kr`) and value reader (`vr`) for all members. Then, I should either be able to iterate over the values: ``` while (dr.next()) { System.out.println(kr.getString()); System.out.println(vr.getAsString()); } ``` If things work that way, then we don't need a special reader for the null state: the value reader (here, `vr`) would simply return `true` from `isNull()`: ``` while (dr.next()) { System.out.println(kr.getString()); if (vr.isNull()) { System.out.println("NULL"); } else { System.out.println(vr.getAsString()); } } ``` (Ignoring, for the moment, that `getAsString()` already does a null check.) Note that, if we read past the end, both the key and value writers will point to nothing. I think that, for the array reader, accessing the value one past the end is an error. For `DICT`, we could simply declare that the value is NULL. Then, I was going to sketch out the same behavior if we search for a key and found...nothing. Seems that the `DictReader` has no way to search for a key. If it did, it would not be clear what type to use for the key. So, that is a bit of a mess. Let's assume we did have something to find a specific value, like we do for the arrays where we have `setPosn(int index)`. Maybe we have `findStringKey()` etc. For testing, we could just have `findKey(Object key)` and do a switch on type as we do for the other readers. So: ``` dr.findStringKey("foo"); if (vr.isNull()) { System.out.println("NULL"); } else { System.out.println(vr.getAsString()); } ``` Or, even: ``` if (dr.findStringKey("foo")) { System.out.println(vr.getAsString()); } ``` The only way to find the value is to do a linear search. If we do a linear search, and no value exists, the index will point one past the end of the dict. And, above, we suggested that the key and value readers could return `true` for `isNull()` in those cases. Given this, unless I'm missing something, I don't see that we need any kind of null value reader at all. In fact, I'd even say that we don't want such a reader if we'd have to do: ``` ObjectReader kr = dr.findStringKey("foo"); if (kr.isNull) { // kr is a special null reader } else { // kr is a Map reader } ``` The above clutters the code and makes it impossible to cache the value reader, which will slow inner-most loops that want to use the reader abstractions. What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986826#comment-16986826 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on issue #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#issuecomment-561135023 @paul-rogers, `ProjectionType` now has no notion of `MinorType#DICT` and treats it as `ProjectionType#SCALAR`. And yes, we do need to allow Python-like syntax for the type. (My bad, haven't made changes to this). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986824#comment-16986824 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r353132642 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java ## @@ -268,4 +274,51 @@ public String getAsString() { } return requireReader(type).getAsString(); } + + private UnionReaderImpl getNullReader() { +AbstractObjectReader[] nullVariants = new AbstractObjectReader[variants.length]; +for (int i = 0; i < variants.length; i++) { + nullVariants[i] = variants[i].createNullReader(); +} +return new NullUnionReader(schema(), unionAccessor, nullVariants); + } + + private static class NullUnionReader extends UnionReaderImpl { + +private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, AbstractObjectReader[] variants) { Review comment: Regarding points of your suggestion: - `DictReaderImpl` has `NullStateReaders.REQUIRED_STATE_READER` `NullStateReader`, i.e. it can't be `NULL` (like `MapReader`). - `DICT` can't have `NULL` key as it is `REQUIRED`. - If the `DICT` has a key, but the value is `NULL`, then `isNull()` for the value returns `true` - yes. - in the (tricky) case of a missing key from the `DICT`, the latter approach was taken - there is no differentiation between missing key and `NULL` value. (The new `Null*Reader`s are introduced to be used in the case of a missing key - their `isNull()` always returns `true`; note, that in case when a key is present and the value is `NULL`, current (non-null) readers' `isNull()` will return `true` for that particular position, as corresponding value reader keeps track of nullability). The former approach was discarded to ensure uniform behavior without a chance of getting `NPE` or making `null`-checks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986227#comment-16986227 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r352737013 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java ## @@ -268,4 +274,51 @@ public String getAsString() { } return requireReader(type).getAsString(); } + + private UnionReaderImpl getNullReader() { +AbstractObjectReader[] nullVariants = new AbstractObjectReader[variants.length]; +for (int i = 0; i < variants.length; i++) { + nullVariants[i] = variants[i].createNullReader(); +} +return new NullUnionReader(schema(), unionAccessor, nullVariants); + } + + private static class NullUnionReader extends UnionReaderImpl { + +private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, AbstractObjectReader[] variants) { Review comment: We may want to define our semantics here. If I understand your description, we have: * The entire `DICT` is `NULL` (is this allowed)? * The `DICT` has an entry for key `x`, but the value is null. * The `DICT` has an entry for the `NULL` key. * The `DICT` has no entry for the key `x`, meaning that the value is implicitly `NULL`. We have similar situations with the `UNION` type, perhaps we can learn from that: * The entire `UNION` can be `NULL`. * The `UNION` can have an entry for, say, `VARCHAR`, but the `VARCHAR` is `NULL`. * The `UNION` does not have an entry for a given type (say, `INT`), so the value is implicitly `NULL`. (Although we don't allow access to such a type.) My suggestion would be that we use the `isNull()` method for all the `NULL` cases: * If the `DICT` is `NULL` for a row, then `isNull()` returns true for the `DICT` reader. * If the `DICT` has a `NULL` key, then the key `isNull()` for the key returns true. * If the `DICT` has a key, but the value is `NULL`, then `isNull()` for the value returns true. Now the tricky case: a key is missing from the `DICT`: * If the `DICT` does not have a key, then we may want a `hasKey()` method, and the value reader should be a Java `null` because it does not exist. Or: * If the `DICT` does not have a key, then the value reader's `isNull()` returns true. (This means we cannot differentiate the has-key-with-null-value vs. does-not-have-a-key cases.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986226#comment-16986226 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r352737013 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java ## @@ -268,4 +274,51 @@ public String getAsString() { } return requireReader(type).getAsString(); } + + private UnionReaderImpl getNullReader() { +AbstractObjectReader[] nullVariants = new AbstractObjectReader[variants.length]; +for (int i = 0; i < variants.length; i++) { + nullVariants[i] = variants[i].createNullReader(); +} +return new NullUnionReader(schema(), unionAccessor, nullVariants); + } + + private static class NullUnionReader extends UnionReaderImpl { + +private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, AbstractObjectReader[] variants) { Review comment: We may want to define our semantics here. If I understand your description, we have: * The entire `DICT` is `NULL` (is this allowed)? * The `DICT` has an entry for key `x`, but the value is null. * The `DICT` has an entry for the `NULL` key. * The `DICT` has no entry for the key `x`, meaning that the value is implicitly `NULL`. We have similar situations with the `UNION` type, perhaps we can learn from that: * The entire `UNION` can be `NULL`. * The `UNION` can have an entry for, say, `VARCHAR`, but the `VARCHAR` is `NULL`. * The `UNION` does not have an entry for a given type (say, `INT`), so the value is implicitly `NULL`. (Although we don't allow access to such a type.) My suggestion would be that we use the `isNull()` method for all the `NULL` cases: * If the `DICT` is `NULL` for a row, then `isNull()` returns true for the `DICT` reader. * If the `DICT` has a `NULL` key, then the key `isNull()` for the key returns true. * If the `DICT` has a key, but the value is `NULL`, then `isNull()` for the value returns true. Not the tricky case: * If the `DICT` does not have a key, then we may want a `hasKey()` method, and the value reader should be a Java `null` because it does not exist. Or: * If the `DICT` does not have a key, then the value reader's `isNull()` returns true. (This means we cannot differentiate the has-key-with-null-value vs. does-not-have-a-key cases.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986215#comment-16986215 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on issue #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#issuecomment-560498165 @arina-ielchiieva, @KazydubB, it turns out that `ProjectionType` does support `DICT`. This is a subtle topic. The projection type encodes information we can glean from the `SELECT` list, as encoded in the `SchemaPath`: `a`, `a[3]`, `a.b`. As a result, `DICT` is already covered: a reference will be either to the whole dict: `d`, or to a member: `d.m`. We would change the `ProjectionType` only if we enhance the `SchemaPath` to, say, allow Python-like syntax: `d[`foo`]. Did we make any such changes? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986129#comment-16986129 ] ASF GitHub Bot commented on DRILL-7359: --- arina-ielchiieva commented on issue #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#issuecomment-560440925 @KazydubB looks like `org.apache.drill.exec.physical.resultSet.project.ProjectionType` does not support Dict type, could you please check? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986067#comment-16986067 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r352608433 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java ## @@ -268,4 +274,51 @@ public String getAsString() { } return requireReader(type).getAsString(); } + + private UnionReaderImpl getNullReader() { +AbstractObjectReader[] nullVariants = new AbstractObjectReader[variants.length]; +for (int i = 0; i < variants.length; i++) { + nullVariants[i] = variants[i].createNullReader(); +} +return new NullUnionReader(schema(), unionAccessor, nullVariants); + } + + private static class NullUnionReader extends UnionReaderImpl { + +private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, AbstractObjectReader[] variants) { Review comment: Yes, this is for the case "where the `value` of a `key`/`value` pair is null". This `value` can be `null` for a given `key` if the `key` is not present in the dict for a given row (e.g. if you have a `Map` with entry `"a" : 1` and you try to `Map#get("b")`, with `Map` being a Java map in the example). That is, the value may be non-nullable, but when you fetch a `value` by `key` and there is no entry for the given key (in a row) then this `Null*Reader` is used. The `Null*Reader`s are created in https://github.com/apache/drill/pull/1870/files#diff-da401fee5976aec49192b5fb6e22de3eR83. This structures are handled outside (of particular `DictReader`) so the structure is retained for uniformity. The "dummy" readers are not projected (merely a placeholder, aren't they?), so this is not the way to use them(?). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986066#comment-16986066 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r352608433 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java ## @@ -268,4 +274,51 @@ public String getAsString() { } return requireReader(type).getAsString(); } + + private UnionReaderImpl getNullReader() { +AbstractObjectReader[] nullVariants = new AbstractObjectReader[variants.length]; +for (int i = 0; i < variants.length; i++) { + nullVariants[i] = variants[i].createNullReader(); +} +return new NullUnionReader(schema(), unionAccessor, nullVariants); + } + + private static class NullUnionReader extends UnionReaderImpl { + +private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, AbstractObjectReader[] variants) { Review comment: Yes, this is for the case "where the `value` of a `key`/`value` pair is null". This `value` can be `null` for a given `key` if the `key` is not present in the dict for a given row (e.g. if you have a `Map` with entry `"a" : 1` and you try to `Map#get("b")`, with `Map` being a Java map in the example). That is, the value may be non-nullable, but when you fetch a `value` by `key` and there is no entry for the given key (in a row) then this `Null*Reader` is used. The `Null*Reader`s are created in https://github.com/apache/drill/pull/1870/files#diff-da401fee5976aec49192b5fb6e22de3eR83. This structures are handled outside (of particular `DictReader`) so the structure is retained for uniformity. The "dummy" readers are not projected (merely a placeholder, aren't they?), so this is not the way to use them(?). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985210#comment-16985210 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r352252495 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java ## @@ -268,4 +274,51 @@ public String getAsString() { } return requireReader(type).getAsString(); } + + private UnionReaderImpl getNullReader() { +AbstractObjectReader[] nullVariants = new AbstractObjectReader[variants.length]; +for (int i = 0; i < variants.length; i++) { + nullVariants[i] = variants[i].createNullReader(); +} +return new NullUnionReader(schema(), unionAccessor, nullVariants); + } + + private static class NullUnionReader extends UnionReaderImpl { + +private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, AbstractObjectReader[] variants) { Review comment: This approach is a bit better than the big null reader we had before. I wonder, however, why we need such a reader? Null (AKA "dummy") readers exist, but generally for specialized cases (such as a List that has no type.) I wonder, is this for the case where the value of a key/value pair is null? If so, then two thoughts. First, the right way to handle that is to: * Use the `isNull()` method on the `ColumnReader` class: ``` if (myDictReader.value().isNull()) { // Handle null value } else { int x = myDictReader.value().scalar().getInt(); } ``` The idea is that null values are null: they have no structure. The approach here seems to say that null values have structure (such as a null union), so we can ask what types the null has and so on. But, this is, perhaps, too much of a fantasy. The other suggestion would be to suggest that values cannot be null, especially if they are of structured types. Also, note that, for unions, we already have multiple levels of nulls: The union itself can be null. When used in a list, the list entry can be null. The union can have nullable types, so that the union might have an int, but the int is null. This is already a mess, we probably don't want to make it even more complex. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984558#comment-16984558 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on issue #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#issuecomment-559567933 @paul-rogers, I've updated the PR, please look again This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978532#comment-16978532 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r348557875 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/NullReader.java ## @@ -0,0 +1,369 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector.accessor.reader; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.metadata.ColumnMetadata; +import org.apache.drill.exec.record.metadata.TupleMetadata; +import org.apache.drill.exec.record.metadata.VariantMetadata; +import org.apache.drill.exec.vector.accessor.ArrayReader; +import org.apache.drill.exec.vector.accessor.DictReader; +import org.apache.drill.exec.vector.accessor.ObjectReader; +import org.apache.drill.exec.vector.accessor.ObjectType; +import org.apache.drill.exec.vector.accessor.ScalarReader; +import org.apache.drill.exec.vector.accessor.TupleReader; +import org.apache.drill.exec.vector.accessor.ValueType; +import org.apache.drill.exec.vector.accessor.VariantReader; +import org.joda.time.Instant; +import org.joda.time.LocalDate; +import org.joda.time.LocalTime; +import org.joda.time.Period; + +import java.math.BigDecimal; + +/** + * Dummy reader which returns {@code null} for scalar types and itself for complex types. + */ +public class NullReader implements ScalarReader, ArrayReader, TupleReader, VariantReader, ObjectReader, DictReader { Review comment: I agree with your thoughts: will use existing `AbstractScalarReader.NullReader` in case of scalar values and will change the `NullReader` for complex case in a similar (to `AbstractScalarReader.NullReader`) maner. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978436#comment-16978436 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r348488284 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/writer/ObjectDictWriter.java ## @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector.accessor.writer; + +import org.apache.drill.exec.record.metadata.ColumnMetadata; +import org.apache.drill.exec.vector.UInt4Vector; +import org.apache.drill.exec.vector.accessor.DictWriter; +import org.apache.drill.exec.vector.accessor.ObjectType; +import org.apache.drill.exec.vector.accessor.ObjectWriter; +import org.apache.drill.exec.vector.accessor.ScalarWriter; +import org.apache.drill.exec.vector.accessor.ValueType; +import org.apache.drill.exec.vector.accessor.impl.HierarchicalFormatter; +import org.apache.drill.exec.vector.complex.DictVector; +import org.apache.drill.exec.vector.complex.RepeatedDictVector; + +import java.lang.reflect.Array; +import java.util.List; +import java.util.Map; + +/** + * The implementation represents the writer as an array writer + * with special dict entry writer as its element writer. + */ +public class ObjectDictWriter extends ObjectArrayWriter implements DictWriter { + + public static class DictObjectWriter extends AbstractArrayWriter.ArrayObjectWriter { + +public DictObjectWriter(ObjectDictWriter dictWriter) { + super(dictWriter); +} + +@Override +public DictWriter dict() { + return (DictWriter) arrayWriter; +} + +@Override +public void dump(HierarchicalFormatter format) { + format.startObject(this) + .attribute("dictWriter"); + arrayWriter.dump(format); + format.endObject(); +} + } + + public static ObjectDictWriter.DictObjectWriter buildDict(ColumnMetadata metadata, DictVector vector, + List keyValueWriters) { +DictEntryWriter.DictEntryObjectWriter entryObjectWriter = DictEntryWriter.buildDictEntryWriter(metadata, keyValueWriters); +ObjectDictWriter objectDictWriter = new ObjectDictWriter(metadata, vector.getOffsetVector(), entryObjectWriter); +return new ObjectDictWriter.DictObjectWriter(objectDictWriter); + } + + public static ArrayObjectWriter buildDictArray(RepeatedDictVector vector, ColumnMetadata metadata, + List keyValueWriters) { +final DictVector dataVector = (DictVector) vector.getDataVector(); +ObjectDictWriter.DictObjectWriter dictWriter = buildDict(metadata, dataVector, keyValueWriters); +AbstractArrayWriter arrayWriter = new ObjectArrayWriter(metadata, vector.getOffsetVector(), dictWriter); +return new ArrayObjectWriter(arrayWriter); + } + + private static final int FIELD_KEY_ORDINAL = 0; + private static final int FIELD_VALUE_ORDINAL = 1; + + private final DictEntryWriter.DictEntryObjectWriter entryObjectWriter; + + private ObjectDictWriter(ColumnMetadata schema, UInt4Vector offsetVector, + DictEntryWriter.DictEntryObjectWriter entryObjectWriter) { +super(schema, offsetVector, entryObjectWriter); +this.entryObjectWriter = entryObjectWriter; + } + + @Override + public ValueType keyType() { +return tuple().scalar(FIELD_KEY_ORDINAL).valueType(); + } + + @Override + public ObjectType valueType() { +return tuple().type(FIELD_VALUE_ORDINAL); + } + + @Override + public ScalarWriter keyWriter() { +return tuple().scalar(FIELD_KEY_ORDINAL); + } + + @Override + public ObjectWriter valueWriter() { +return tuple().column(FIELD_VALUE_ORDINAL); + } + + @Override + @SuppressWarnings("unchecked") + public void setObject(Object object) { +if (object instanceof Map) { + Map map = (Map) object; + for (Map.Entry entry : map.entrySet()) { +entryObjectWriter.setObject(0,
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978434#comment-16978434 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r348486117 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/NullReader.java ## @@ -0,0 +1,377 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector.accessor.reader; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.metadata.ColumnMetadata; +import org.apache.drill.exec.record.metadata.TupleMetadata; +import org.apache.drill.exec.record.metadata.VariantMetadata; +import org.apache.drill.exec.vector.accessor.ArrayReader; +import org.apache.drill.exec.vector.accessor.DictReader; +import org.apache.drill.exec.vector.accessor.ObjectReader; +import org.apache.drill.exec.vector.accessor.ObjectType; +import org.apache.drill.exec.vector.accessor.ScalarReader; +import org.apache.drill.exec.vector.accessor.TupleReader; +import org.apache.drill.exec.vector.accessor.ValueType; +import org.apache.drill.exec.vector.accessor.VariantReader; +import org.joda.time.Instant; +import org.joda.time.LocalDate; +import org.joda.time.LocalTime; +import org.joda.time.Period; + +import java.math.BigDecimal; + +/** + * Dummy reader which returns {@code null} on access to getters for scalar types + * ({@link #getString()}, {@link #getBytes()} etc.) and itself on access to reader types + * ({@link #scalar()}, {@link #tuple()} etc.). Used when accessing {@link DictReader} by non-existent key. + * {@link #isNull()} returns {@code true}. + * + * Does not preserve metadata structure. Other methods throw {@link UnsupportedOperationException}. todo: + * + * @see DictReader + */ +public class NullReader implements ScalarReader, ArrayReader, TupleReader, VariantReader, ObjectReader, DictReader { + + private static final NullReader INSTANCE = new NullReader(); + + public static NullReader instance() { +return INSTANCE; + } + + @Override + public int size() { +notSupported("size()"); +return 0; + } + + @Override + public ObjectType entryType() { +notSupported("entryType()"); +return null; + } + + @Override + public ObjectReader entry() { +notSupported("entry()"); +return null; + } + + @Override + public ScalarReader scalar() { +return this; + } + + @Override + public TupleReader tuple() { +return this; + } + + @Override + public ArrayReader array() { +return this; + } + + @Override + public VariantReader variant() { +return this; + } + + @Override + public DictReader dict() { +return this; + } + + @Override + public void setPosn(int index) { +notSupported("setPosn(int)"); + } + + @Override + public void rewind() { +notSupported("rewind()"); + } + + @Override + public boolean next() { +return false; + } + + @Override + public ValueType valueType() { +notSupported("valueType()"); +return null; + } + + @Override + public ValueType extendedType() { +notSupported("extendedType()"); +return null; + } + + @Override + public int getInt() { +notSupported("getInt()"); +return 0; + } + + @Override + public boolean getBoolean() { +notSupported("getBoolean()"); +return false; + } + + @Override + public long getLong() { +notSupported("getLong()"); +return 0; + } + + @Override + public double getDouble() { +notSupported("getDouble()"); +return 0; + } + + @Override + public String getString() { +return null; + } + + @Override + public byte[] getBytes() { +return null; + } + + @Override + public BigDecimal getDecimal() { +return null; + } + + @Override + public Period getPeriod() { +return null; + } + + @Override + public LocalDate getDate() { +return null; + } + + @Override + public LocalTime getTime() { +return null; + } + + @Override + public Instant getTimestamp() { +
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978432#comment-16978432 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r348485350 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/NullReader.java ## @@ -0,0 +1,377 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector.accessor.reader; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.metadata.ColumnMetadata; +import org.apache.drill.exec.record.metadata.TupleMetadata; +import org.apache.drill.exec.record.metadata.VariantMetadata; +import org.apache.drill.exec.vector.accessor.ArrayReader; +import org.apache.drill.exec.vector.accessor.DictReader; +import org.apache.drill.exec.vector.accessor.ObjectReader; +import org.apache.drill.exec.vector.accessor.ObjectType; +import org.apache.drill.exec.vector.accessor.ScalarReader; +import org.apache.drill.exec.vector.accessor.TupleReader; +import org.apache.drill.exec.vector.accessor.ValueType; +import org.apache.drill.exec.vector.accessor.VariantReader; +import org.joda.time.Instant; +import org.joda.time.LocalDate; +import org.joda.time.LocalTime; +import org.joda.time.Period; + +import java.math.BigDecimal; + +/** + * Dummy reader which returns {@code null} on access to getters for scalar types + * ({@link #getString()}, {@link #getBytes()} etc.) and itself on access to reader types + * ({@link #scalar()}, {@link #tuple()} etc.). Used when accessing {@link DictReader} by non-existent key. + * {@link #isNull()} returns {@code true}. + * + * Does not preserve metadata structure. Other methods throw {@link UnsupportedOperationException}. todo: + * + * @see DictReader + */ +public class NullReader implements ScalarReader, ArrayReader, TupleReader, VariantReader, ObjectReader, DictReader { + + private static final NullReader INSTANCE = new NullReader(); + + public static NullReader instance() { +return INSTANCE; + } + + @Override + public int size() { +notSupported("size()"); +return 0; + } + + @Override + public ObjectType entryType() { +notSupported("entryType()"); +return null; + } + + @Override + public ObjectReader entry() { +notSupported("entry()"); +return null; + } + + @Override + public ScalarReader scalar() { +return this; + } + + @Override + public TupleReader tuple() { +return this; + } + + @Override + public ArrayReader array() { +return this; + } + + @Override + public VariantReader variant() { +return this; + } + + @Override + public DictReader dict() { +return this; + } + + @Override + public void setPosn(int index) { +notSupported("setPosn(int)"); + } + + @Override + public void rewind() { +notSupported("rewind()"); + } + + @Override + public boolean next() { +return false; + } + + @Override + public ValueType valueType() { +notSupported("valueType()"); +return null; + } + + @Override + public ValueType extendedType() { +notSupported("extendedType()"); +return null; + } + + @Override + public int getInt() { +notSupported("getInt()"); +return 0; + } + + @Override + public boolean getBoolean() { +notSupported("getBoolean()"); +return false; + } + + @Override + public long getLong() { +notSupported("getLong()"); +return 0; + } + + @Override + public double getDouble() { +notSupported("getDouble()"); +return 0; + } + + @Override + public String getString() { +return null; + } + + @Override + public byte[] getBytes() { +return null; + } + + @Override + public BigDecimal getDecimal() { +return null; + } + + @Override + public Period getPeriod() { +return null; + } + + @Override + public LocalDate getDate() { +return null; + } + + @Override + public LocalTime getTime() { +return null; + } + + @Override + public Instant getTimestamp() { +
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978425#comment-16978425 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r348479761 ## File path: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/TestRowSet.java ## @@ -586,6 +595,490 @@ public void testRepeatedMapStructure() { RowSetUtilities.verify(expected, actual); } + @Test + public void testDictStructure() { +final String dictName = "d"; + +final TupleMetadata schema = new SchemaBuilder() +.add("id", MinorType.INT) +.addDict(dictName, MinorType.INT) +.value(MinorType.VARCHAR) // required int +.resumeSchema() +.buildSchema(); +final ExtendableRowSet rowSet = fixture.rowSet(schema); +final RowSetWriter writer = rowSet.writer(); + +// Dict +// Pick out components and lightly test. (Assumes structure +// tested earlier is still valid, so no need to exhaustively +// test again.) + +assertEquals(ObjectType.ARRAY, writer.column(dictName).type()); +assertTrue(writer.column(dictName).schema().isDict()); + +final ScalarWriter idWriter = writer.column(0).scalar(); +final DictWriter dictWriter = writer.column(1).dict(); + +assertEquals(ValueType.INTEGER, dictWriter.keyType()); +assertEquals(ObjectType.SCALAR, dictWriter.valueType()); + +final ScalarWriter keyWriter = dictWriter.keyWriter(); +final ScalarWriter valueWriter = dictWriter.valueWriter().scalar(); + +assertEquals(ValueType.INTEGER, keyWriter.valueType()); +assertEquals(ValueType.STRING, valueWriter.valueType()); + +// Write data +idWriter.setInt(1); + +keyWriter.setInt(11); +valueWriter.setString("a"); +dictWriter.save(); // Advance to next entry position +keyWriter.setInt(12); +valueWriter.setString("b"); +dictWriter.save(); +writer.save(); + +idWriter.setInt(2); + +keyWriter.setInt(21); +valueWriter.setString("c"); +dictWriter.save(); +writer.save(); + +idWriter.setInt(3); + +keyWriter.setInt(31); +valueWriter.setString("d"); +dictWriter.save(); +keyWriter.setInt(32); +valueWriter.setString("e"); +dictWriter.save(); +writer.save(); + +// Finish the row set and get a reader. + +final SingleRowSet actual = writer.done(); +final RowSetReader reader = actual.reader(); + +// Verify reader structure + +assertEquals(ObjectType.ARRAY, reader.column(dictName).type()); + +final DictReader dictReader = reader.dict(1); +assertEquals(ObjectType.ARRAY, dictReader.type()); + +assertEquals(ValueType.INTEGER, dictReader.keyColumnType()); +assertEquals(ObjectType.SCALAR, dictReader.valueColumnType()); + +// Row 1: get value reader with its position set to entry corresponding to a key + +assertTrue(reader.next()); +assertFalse(dictReader.isNull()); // dict itself is not null + +dictReader.getAsString(); +assertEquals("b", dictReader.getValueReader(12).scalar().getString()); +assertEquals("a", dictReader.getValueReader(11).scalar().getString()); + +// compare entire dict +Map map = map(11, "a", 12, "b"); +assertEquals(map, dictReader.getObject()); + +// Row 2: get Object representation of value directly + +assertTrue(reader.next()); +assertEquals("c", dictReader.get(21)); +assertTrue(dictReader.getValueReader(22).isNull()); // the dict does not contain an entry with the key + +map = map(21, "c"); +assertEquals(map, dictReader.getObject()); + +// Row 3 + +assertTrue(reader.next()); + +assertEquals("d", dictReader.get(31)); +assertEquals("e", dictReader.get(32)); + +map = map(31, "d", 32, "e"); +assertEquals(map, dictReader.getObject()); + +assertFalse(reader.next()); + +// Verify that the dict accessor's value count was set. + +final DictVector dictVector = (DictVector) actual.container().getValueVector(1).getValueVector(); +assertEquals(3, dictVector.getAccessor().getValueCount()); + +final SingleRowSet expected = fixture.rowSetBuilder(schema) +.addRow(1, objArray(11, "a", 12, "b")) +.addRow(2, objArray(21, "c")) +.addRow(3, objArray(31, "d", 32, "e")) +.build(); +RowSetUtilities.verify(expected, actual); + } + + /** + * Utility method to bootstrap a map object easily. + * + * @param entry key-value sequence + * @return map containing key-value pairs from passed sequence + */ + private Map map(Object... entry) { Review comment: I've tried, but there was an issue with entry ordering (used `HashMap`). Should work with `LinkedHashMap`, I'll try (oops, haven't thought of it back then).
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978407#comment-16978407 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r348472429 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/model/single/BaseReaderBuilder.java ## @@ -77,22 +81,48 @@ protected AbstractObjectReader buildVectorReader(ValueVector vector, VectorDescr final MajorType type = va.type(); switch(type.getMinorType()) { -case MAP: - return buildMap((AbstractMapVector) vector, va, type.getMode(), descrip); -case UNION: - return buildUnion((UnionVector) vector, va, descrip); -case LIST: - return buildList(vector, va, descrip); -case LATE: - - // Occurs for a list with no type: a list of nulls. - - return AbstractScalarReader.nullReader(descrip.metadata); -default: - return buildScalarReader(va, descrip.metadata); + case DICT: +return buildDict(vector, va, descrip); + case MAP: +return buildMap((AbstractMapVector) vector, va, type.getMode(), descrip); + case UNION: +return buildUnion((UnionVector) vector, va, descrip); + case LIST: +return buildList(vector, va, descrip); + case LATE: + +// Occurs for a list with no type: a list of nulls. + +return AbstractScalarReader.nullReader(descrip.metadata); + default: +return buildScalarReader(va, descrip.metadata); } } + private AbstractObjectReader buildDict(ValueVector vector, VectorAccessor va, VectorDescrip descrip) { + +boolean isArray = descrip.metadata.isArray(); + +DictVector dictVector; +VectorAccessor dictAccessor; +if (isArray) { Review comment: To clarify, this builds reader for `REPEATED DICT` and `DICT`. Its fields (`key` and `value`) are handled in `buildMapMembers(AbstractMapVector, MetadataProvider)` which was written for handling fields for `MAP` and `REPEATED MAP` and now for both cases of `DICT`. (Yeah, the method probably should've been renamed taking into account its usage for `DICT`). Is the above which concerns you or have I unterstood you incorrectly? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975891#comment-16975891 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r347115215 ## File path: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/TestRowSet.java ## @@ -586,6 +595,490 @@ public void testRepeatedMapStructure() { RowSetUtilities.verify(expected, actual); } + @Test + public void testDictStructure() { +final String dictName = "d"; + +final TupleMetadata schema = new SchemaBuilder() +.add("id", MinorType.INT) +.addDict(dictName, MinorType.INT) +.value(MinorType.VARCHAR) // required int +.resumeSchema() +.buildSchema(); +final ExtendableRowSet rowSet = fixture.rowSet(schema); +final RowSetWriter writer = rowSet.writer(); + +// Dict +// Pick out components and lightly test. (Assumes structure +// tested earlier is still valid, so no need to exhaustively +// test again.) + +assertEquals(ObjectType.ARRAY, writer.column(dictName).type()); +assertTrue(writer.column(dictName).schema().isDict()); + +final ScalarWriter idWriter = writer.column(0).scalar(); +final DictWriter dictWriter = writer.column(1).dict(); + +assertEquals(ValueType.INTEGER, dictWriter.keyType()); +assertEquals(ObjectType.SCALAR, dictWriter.valueType()); + +final ScalarWriter keyWriter = dictWriter.keyWriter(); +final ScalarWriter valueWriter = dictWriter.valueWriter().scalar(); + +assertEquals(ValueType.INTEGER, keyWriter.valueType()); +assertEquals(ValueType.STRING, valueWriter.valueType()); + +// Write data +idWriter.setInt(1); + +keyWriter.setInt(11); +valueWriter.setString("a"); +dictWriter.save(); // Advance to next entry position +keyWriter.setInt(12); +valueWriter.setString("b"); +dictWriter.save(); +writer.save(); + +idWriter.setInt(2); + +keyWriter.setInt(21); +valueWriter.setString("c"); +dictWriter.save(); +writer.save(); + +idWriter.setInt(3); + +keyWriter.setInt(31); +valueWriter.setString("d"); +dictWriter.save(); +keyWriter.setInt(32); +valueWriter.setString("e"); +dictWriter.save(); +writer.save(); + +// Finish the row set and get a reader. + +final SingleRowSet actual = writer.done(); +final RowSetReader reader = actual.reader(); + +// Verify reader structure + +assertEquals(ObjectType.ARRAY, reader.column(dictName).type()); + +final DictReader dictReader = reader.dict(1); +assertEquals(ObjectType.ARRAY, dictReader.type()); + +assertEquals(ValueType.INTEGER, dictReader.keyColumnType()); +assertEquals(ObjectType.SCALAR, dictReader.valueColumnType()); + +// Row 1: get value reader with its position set to entry corresponding to a key + +assertTrue(reader.next()); +assertFalse(dictReader.isNull()); // dict itself is not null + +dictReader.getAsString(); +assertEquals("b", dictReader.getValueReader(12).scalar().getString()); +assertEquals("a", dictReader.getValueReader(11).scalar().getString()); + +// compare entire dict +Map map = map(11, "a", 12, "b"); +assertEquals(map, dictReader.getObject()); + +// Row 2: get Object representation of value directly + +assertTrue(reader.next()); +assertEquals("c", dictReader.get(21)); +assertTrue(dictReader.getValueReader(22).isNull()); // the dict does not contain an entry with the key + +map = map(21, "c"); +assertEquals(map, dictReader.getObject()); + +// Row 3 + +assertTrue(reader.next()); + +assertEquals("d", dictReader.get(31)); +assertEquals("e", dictReader.get(32)); + +map = map(31, "d", 32, "e"); +assertEquals(map, dictReader.getObject()); + +assertFalse(reader.next()); + +// Verify that the dict accessor's value count was set. + +final DictVector dictVector = (DictVector) actual.container().getValueVector(1).getValueVector(); +assertEquals(3, dictVector.getAccessor().getValueCount()); + +final SingleRowSet expected = fixture.rowSetBuilder(schema) +.addRow(1, objArray(11, "a", 12, "b")) +.addRow(2, objArray(21, "c")) +.addRow(3, objArray(31, "d", 32, "e")) +.build(); +RowSetUtilities.verify(expected, actual); + } + + /** + * Utility method to bootstrap a map object easily. + * + * @param entry key-value sequence + * @return map containing key-value pairs from passed sequence + */ + private Map map(Object... entry) { +if (entry.length % 2 == 1) { + throw new IllegalArgumentException("Array length should be even."); +} + +Map map = new HashMap<>(); +for (int i = 0; i < entry.length; i += 2) { +
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975883#comment-16975883 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r347115016 ## File path: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/TestRowSet.java ## @@ -586,6 +595,490 @@ public void testRepeatedMapStructure() { RowSetUtilities.verify(expected, actual); } + @Test + public void testDictStructure() { +final String dictName = "d"; + +final TupleMetadata schema = new SchemaBuilder() +.add("id", MinorType.INT) +.addDict(dictName, MinorType.INT) +.value(MinorType.VARCHAR) // required int +.resumeSchema() Review comment: Nit: perhaps indent the above two lines to show that they form a nested structure. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975896#comment-16975896 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r347115462 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/NullReader.java ## @@ -0,0 +1,377 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector.accessor.reader; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.metadata.ColumnMetadata; +import org.apache.drill.exec.record.metadata.TupleMetadata; +import org.apache.drill.exec.record.metadata.VariantMetadata; +import org.apache.drill.exec.vector.accessor.ArrayReader; +import org.apache.drill.exec.vector.accessor.DictReader; +import org.apache.drill.exec.vector.accessor.ObjectReader; +import org.apache.drill.exec.vector.accessor.ObjectType; +import org.apache.drill.exec.vector.accessor.ScalarReader; +import org.apache.drill.exec.vector.accessor.TupleReader; +import org.apache.drill.exec.vector.accessor.ValueType; +import org.apache.drill.exec.vector.accessor.VariantReader; +import org.joda.time.Instant; +import org.joda.time.LocalDate; +import org.joda.time.LocalTime; +import org.joda.time.Period; + +import java.math.BigDecimal; + +/** + * Dummy reader which returns {@code null} on access to getters for scalar types + * ({@link #getString()}, {@link #getBytes()} etc.) and itself on access to reader types + * ({@link #scalar()}, {@link #tuple()} etc.). Used when accessing {@link DictReader} by non-existent key. + * {@link #isNull()} returns {@code true}. + * + * Does not preserve metadata structure. Other methods throw {@link UnsupportedOperationException}. todo: + * + * @see DictReader + */ +public class NullReader implements ScalarReader, ArrayReader, TupleReader, VariantReader, ObjectReader, DictReader { + + private static final NullReader INSTANCE = new NullReader(); + + public static NullReader instance() { +return INSTANCE; + } + + @Override + public int size() { +notSupported("size()"); +return 0; + } + + @Override + public ObjectType entryType() { +notSupported("entryType()"); +return null; + } + + @Override + public ObjectReader entry() { +notSupported("entry()"); +return null; + } + + @Override + public ScalarReader scalar() { +return this; + } + + @Override + public TupleReader tuple() { +return this; + } + + @Override + public ArrayReader array() { +return this; + } + + @Override + public VariantReader variant() { +return this; + } + + @Override + public DictReader dict() { +return this; + } + + @Override + public void setPosn(int index) { +notSupported("setPosn(int)"); + } + + @Override + public void rewind() { +notSupported("rewind()"); + } + + @Override + public boolean next() { +return false; + } + + @Override + public ValueType valueType() { +notSupported("valueType()"); +return null; + } + + @Override + public ValueType extendedType() { +notSupported("extendedType()"); +return null; + } + + @Override + public int getInt() { +notSupported("getInt()"); +return 0; + } + + @Override + public boolean getBoolean() { +notSupported("getBoolean()"); +return false; + } + + @Override + public long getLong() { +notSupported("getLong()"); +return 0; + } + + @Override + public double getDouble() { +notSupported("getDouble()"); +return 0; + } + + @Override + public String getString() { +return null; + } + + @Override + public byte[] getBytes() { +return null; + } + + @Override + public BigDecimal getDecimal() { +return null; + } + + @Override + public Period getPeriod() { +return null; + } + + @Override + public LocalDate getDate() { +return null; + } + + @Override + public LocalTime getTime() { +return null; + } + + @Override + public Instant getTimestamp() { +
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975882#comment-16975882 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r347114952 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/model/single/BuildVectorsFromMetadata.java ## @@ -69,7 +71,11 @@ public VectorContainer build(TupleMetadata schema) { private ValueVector buildVector(ColumnMetadata metadata) { switch (metadata.structureType()) { case TUPLE: - return buildMap(metadata); + if (!metadata.isDict()) { Review comment: This can be simplified a bit. A Dict should not be a tuple; we should define a new structure type to indicate that this is a DICT, not a Tuple that happens to act like a DICT. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975888#comment-16975888 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r347115104 ## File path: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/TestRowSet.java ## @@ -586,6 +595,490 @@ public void testRepeatedMapStructure() { RowSetUtilities.verify(expected, actual); } + @Test + public void testDictStructure() { +final String dictName = "d"; + +final TupleMetadata schema = new SchemaBuilder() +.add("id", MinorType.INT) +.addDict(dictName, MinorType.INT) +.value(MinorType.VARCHAR) // required int +.resumeSchema() +.buildSchema(); +final ExtendableRowSet rowSet = fixture.rowSet(schema); +final RowSetWriter writer = rowSet.writer(); + +// Dict +// Pick out components and lightly test. (Assumes structure +// tested earlier is still valid, so no need to exhaustively +// test again.) + +assertEquals(ObjectType.ARRAY, writer.column(dictName).type()); +assertTrue(writer.column(dictName).schema().isDict()); + +final ScalarWriter idWriter = writer.column(0).scalar(); +final DictWriter dictWriter = writer.column(1).dict(); + +assertEquals(ValueType.INTEGER, dictWriter.keyType()); +assertEquals(ObjectType.SCALAR, dictWriter.valueType()); + +final ScalarWriter keyWriter = dictWriter.keyWriter(); +final ScalarWriter valueWriter = dictWriter.valueWriter().scalar(); + +assertEquals(ValueType.INTEGER, keyWriter.valueType()); +assertEquals(ValueType.STRING, valueWriter.valueType()); + +// Write data +idWriter.setInt(1); + +keyWriter.setInt(11); +valueWriter.setString("a"); +dictWriter.save(); // Advance to next entry position +keyWriter.setInt(12); +valueWriter.setString("b"); +dictWriter.save(); +writer.save(); + +idWriter.setInt(2); + +keyWriter.setInt(21); +valueWriter.setString("c"); +dictWriter.save(); +writer.save(); + +idWriter.setInt(3); + +keyWriter.setInt(31); +valueWriter.setString("d"); +dictWriter.save(); +keyWriter.setInt(32); +valueWriter.setString("e"); +dictWriter.save(); +writer.save(); + +// Finish the row set and get a reader. + +final SingleRowSet actual = writer.done(); +final RowSetReader reader = actual.reader(); + +// Verify reader structure + +assertEquals(ObjectType.ARRAY, reader.column(dictName).type()); + +final DictReader dictReader = reader.dict(1); +assertEquals(ObjectType.ARRAY, dictReader.type()); + +assertEquals(ValueType.INTEGER, dictReader.keyColumnType()); +assertEquals(ObjectType.SCALAR, dictReader.valueColumnType()); + +// Row 1: get value reader with its position set to entry corresponding to a key + +assertTrue(reader.next()); +assertFalse(dictReader.isNull()); // dict itself is not null + +dictReader.getAsString(); +assertEquals("b", dictReader.getValueReader(12).scalar().getString()); +assertEquals("a", dictReader.getValueReader(11).scalar().getString()); + +// compare entire dict +Map map = map(11, "a", 12, "b"); +assertEquals(map, dictReader.getObject()); + +// Row 2: get Object representation of value directly + +assertTrue(reader.next()); +assertEquals("c", dictReader.get(21)); +assertTrue(dictReader.getValueReader(22).isNull()); // the dict does not contain an entry with the key + +map = map(21, "c"); +assertEquals(map, dictReader.getObject()); + +// Row 3 + +assertTrue(reader.next()); + +assertEquals("d", dictReader.get(31)); +assertEquals("e", dictReader.get(32)); + +map = map(31, "d", 32, "e"); +assertEquals(map, dictReader.getObject()); + +assertFalse(reader.next()); + +// Verify that the dict accessor's value count was set. + +final DictVector dictVector = (DictVector) actual.container().getValueVector(1).getValueVector(); +assertEquals(3, dictVector.getAccessor().getValueCount()); + +final SingleRowSet expected = fixture.rowSetBuilder(schema) +.addRow(1, objArray(11, "a", 12, "b")) +.addRow(2, objArray(21, "c")) +.addRow(3, objArray(31, "d", 32, "e")) Review comment: Using `objArray` to create DICT values is not entirely intuitive. But, I could not think of any other one-liner way to do it, so I think you've found the simplest solution. Have not yet reviewed the `setObject()` code, but would be handy to also take a `Map` and split that into its key/value pairs in the Dict. Also, `getObject()` should return a `Map` if it doesn't already. This is an
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975893#comment-16975893 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r347115621 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/writer/DictEntryWriter.java ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector.accessor.writer; + +import java.util.List; +import java.util.Map; + +import org.apache.drill.exec.record.metadata.ColumnMetadata; +import org.apache.drill.exec.vector.accessor.ColumnWriterIndex; +import org.apache.drill.exec.vector.accessor.impl.HierarchicalFormatter; + +/** + * Writer for a Dict entry. The entry is a special kind of tuple. + */ +public class DictEntryWriter extends AbstractTupleWriter { + + public static class DictEntryObjectWriter extends TupleObjectWriter { + +public DictEntryObjectWriter(DictEntryWriter entryWriter) { + super(entryWriter); +} + +public void setObject(int col, Object value) { + tupleWriter.set(col, value); +} + +@Override +public void dump(HierarchicalFormatter format) { + format.startObject(this) + .attribute("dictEntryWriter"); + tupleWriter.dump(format); + format.endObject(); +} + } + + public static DictEntryObjectWriter buildDictEntryWriter(ColumnMetadata schema, + List keyValueWriters) { +assert keyValueWriters.size() == 2; +DictEntryWriter dictEntryWriter = new DictEntryWriter(schema, keyValueWriters); +return new DictEntryObjectWriter(dictEntryWriter); + } + + private final ColumnMetadata dictColumnSchema; + + public DictEntryWriter(ColumnMetadata schema, List writers) { +super(schema.tupleSchema(), writers); +dictColumnSchema = schema; + } + + @Override + public void bindIndex(ColumnWriterIndex index) { + +// Similarly to a repeated map, the provided index is an array element +// index. Convert this to an index that will not increment the element +// index on each write so that a dict with key and value members won't +// increment the index for each member. Rather, the index must be +// incremented at the array level. + +bindIndex(index, new MemberWriterIndex(index)); + } + + @Override + public boolean isProjected() { +return true; + } Review comment: Can't remember how this is supposed to work; it could be it ended up not be being used. But, this does suggest a missing test case: in the result set loader tests, test the case in which a reader can provide a DICT, but the client does not project that. That is, if a reader has a DICT `d`, along with other columns `a`, `b` and `c`, we want to make sure this works: ``` SELECT a, c FROM myTable ``` Then, we write to the Dict. In other cases, we create dummy writers which accept the unwanted values, but then just throw them away. That way, the (data) reader code need not special case whether the column is projected or not. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975895#comment-16975895 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r347115742 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/writer/ObjectDictWriter.java ## @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector.accessor.writer; + +import org.apache.drill.exec.record.metadata.ColumnMetadata; +import org.apache.drill.exec.vector.UInt4Vector; +import org.apache.drill.exec.vector.accessor.DictWriter; +import org.apache.drill.exec.vector.accessor.ObjectType; +import org.apache.drill.exec.vector.accessor.ObjectWriter; +import org.apache.drill.exec.vector.accessor.ScalarWriter; +import org.apache.drill.exec.vector.accessor.ValueType; +import org.apache.drill.exec.vector.accessor.impl.HierarchicalFormatter; +import org.apache.drill.exec.vector.complex.DictVector; +import org.apache.drill.exec.vector.complex.RepeatedDictVector; + +import java.lang.reflect.Array; +import java.util.List; +import java.util.Map; + +/** + * The implementation represents the writer as an array writer + * with special dict entry writer as its element writer. + */ +public class ObjectDictWriter extends ObjectArrayWriter implements DictWriter { + + public static class DictObjectWriter extends AbstractArrayWriter.ArrayObjectWriter { + +public DictObjectWriter(ObjectDictWriter dictWriter) { + super(dictWriter); +} + +@Override +public DictWriter dict() { + return (DictWriter) arrayWriter; +} + +@Override +public void dump(HierarchicalFormatter format) { + format.startObject(this) + .attribute("dictWriter"); + arrayWriter.dump(format); + format.endObject(); +} + } + + public static ObjectDictWriter.DictObjectWriter buildDict(ColumnMetadata metadata, DictVector vector, + List keyValueWriters) { +DictEntryWriter.DictEntryObjectWriter entryObjectWriter = DictEntryWriter.buildDictEntryWriter(metadata, keyValueWriters); +ObjectDictWriter objectDictWriter = new ObjectDictWriter(metadata, vector.getOffsetVector(), entryObjectWriter); +return new ObjectDictWriter.DictObjectWriter(objectDictWriter); + } + + public static ArrayObjectWriter buildDictArray(RepeatedDictVector vector, ColumnMetadata metadata, + List keyValueWriters) { +final DictVector dataVector = (DictVector) vector.getDataVector(); +ObjectDictWriter.DictObjectWriter dictWriter = buildDict(metadata, dataVector, keyValueWriters); +AbstractArrayWriter arrayWriter = new ObjectArrayWriter(metadata, vector.getOffsetVector(), dictWriter); +return new ArrayObjectWriter(arrayWriter); + } + + private static final int FIELD_KEY_ORDINAL = 0; + private static final int FIELD_VALUE_ORDINAL = 1; + + private final DictEntryWriter.DictEntryObjectWriter entryObjectWriter; + + private ObjectDictWriter(ColumnMetadata schema, UInt4Vector offsetVector, + DictEntryWriter.DictEntryObjectWriter entryObjectWriter) { +super(schema, offsetVector, entryObjectWriter); +this.entryObjectWriter = entryObjectWriter; + } + + @Override + public ValueType keyType() { +return tuple().scalar(FIELD_KEY_ORDINAL).valueType(); + } + + @Override + public ObjectType valueType() { +return tuple().type(FIELD_VALUE_ORDINAL); + } + + @Override + public ScalarWriter keyWriter() { +return tuple().scalar(FIELD_KEY_ORDINAL); + } + + @Override + public ObjectWriter valueWriter() { +return tuple().column(FIELD_VALUE_ORDINAL); + } + + @Override + @SuppressWarnings("unchecked") + public void setObject(Object object) { +if (object instanceof Map) { + Map map = (Map) object; + for (Map.Entry entry : map.entrySet()) { +entryObjectWriter.setObject(0,
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975886#comment-16975886 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r347115170 ## File path: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/TestRowSet.java ## @@ -586,6 +595,490 @@ public void testRepeatedMapStructure() { RowSetUtilities.verify(expected, actual); } + @Test + public void testDictStructure() { +final String dictName = "d"; + +final TupleMetadata schema = new SchemaBuilder() +.add("id", MinorType.INT) +.addDict(dictName, MinorType.INT) +.value(MinorType.VARCHAR) // required int +.resumeSchema() +.buildSchema(); +final ExtendableRowSet rowSet = fixture.rowSet(schema); +final RowSetWriter writer = rowSet.writer(); + +// Dict +// Pick out components and lightly test. (Assumes structure +// tested earlier is still valid, so no need to exhaustively +// test again.) + +assertEquals(ObjectType.ARRAY, writer.column(dictName).type()); +assertTrue(writer.column(dictName).schema().isDict()); + +final ScalarWriter idWriter = writer.column(0).scalar(); +final DictWriter dictWriter = writer.column(1).dict(); + +assertEquals(ValueType.INTEGER, dictWriter.keyType()); +assertEquals(ObjectType.SCALAR, dictWriter.valueType()); + +final ScalarWriter keyWriter = dictWriter.keyWriter(); +final ScalarWriter valueWriter = dictWriter.valueWriter().scalar(); + +assertEquals(ValueType.INTEGER, keyWriter.valueType()); +assertEquals(ValueType.STRING, valueWriter.valueType()); + +// Write data +idWriter.setInt(1); + +keyWriter.setInt(11); +valueWriter.setString("a"); +dictWriter.save(); // Advance to next entry position +keyWriter.setInt(12); +valueWriter.setString("b"); +dictWriter.save(); +writer.save(); + +idWriter.setInt(2); + +keyWriter.setInt(21); +valueWriter.setString("c"); +dictWriter.save(); +writer.save(); + +idWriter.setInt(3); + +keyWriter.setInt(31); +valueWriter.setString("d"); +dictWriter.save(); +keyWriter.setInt(32); +valueWriter.setString("e"); +dictWriter.save(); +writer.save(); + +// Finish the row set and get a reader. + +final SingleRowSet actual = writer.done(); +final RowSetReader reader = actual.reader(); + +// Verify reader structure + +assertEquals(ObjectType.ARRAY, reader.column(dictName).type()); + +final DictReader dictReader = reader.dict(1); +assertEquals(ObjectType.ARRAY, dictReader.type()); + +assertEquals(ValueType.INTEGER, dictReader.keyColumnType()); +assertEquals(ObjectType.SCALAR, dictReader.valueColumnType()); + +// Row 1: get value reader with its position set to entry corresponding to a key + +assertTrue(reader.next()); +assertFalse(dictReader.isNull()); // dict itself is not null + +dictReader.getAsString(); +assertEquals("b", dictReader.getValueReader(12).scalar().getString()); +assertEquals("a", dictReader.getValueReader(11).scalar().getString()); + +// compare entire dict +Map map = map(11, "a", 12, "b"); +assertEquals(map, dictReader.getObject()); + +// Row 2: get Object representation of value directly + +assertTrue(reader.next()); +assertEquals("c", dictReader.get(21)); +assertTrue(dictReader.getValueReader(22).isNull()); // the dict does not contain an entry with the key + +map = map(21, "c"); +assertEquals(map, dictReader.getObject()); + +// Row 3 + +assertTrue(reader.next()); + +assertEquals("d", dictReader.get(31)); +assertEquals("e", dictReader.get(32)); + +map = map(31, "d", 32, "e"); +assertEquals(map, dictReader.getObject()); + +assertFalse(reader.next()); + +// Verify that the dict accessor's value count was set. + +final DictVector dictVector = (DictVector) actual.container().getValueVector(1).getValueVector(); +assertEquals(3, dictVector.getAccessor().getValueCount()); + +final SingleRowSet expected = fixture.rowSetBuilder(schema) +.addRow(1, objArray(11, "a", 12, "b")) +.addRow(2, objArray(21, "c")) +.addRow(3, objArray(31, "d", 32, "e")) +.build(); +RowSetUtilities.verify(expected, actual); + } + + /** + * Utility method to bootstrap a map object easily. + * + * @param entry key-value sequence + * @return map containing key-value pairs from passed sequence + */ + private Map map(Object... entry) { +if (entry.length % 2 == 1) { + throw new IllegalArgumentException("Array length should be even."); +} + +Map map = new HashMap<>(); +for (int i = 0; i < entry.length; i += 2) { +
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975881#comment-16975881 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r347114856 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/model/single/BaseReaderBuilder.java ## @@ -77,22 +81,48 @@ protected AbstractObjectReader buildVectorReader(ValueVector vector, VectorDescr final MajorType type = va.type(); switch(type.getMinorType()) { -case MAP: - return buildMap((AbstractMapVector) vector, va, type.getMode(), descrip); -case UNION: - return buildUnion((UnionVector) vector, va, descrip); -case LIST: - return buildList(vector, va, descrip); -case LATE: - - // Occurs for a list with no type: a list of nulls. - - return AbstractScalarReader.nullReader(descrip.metadata); -default: - return buildScalarReader(va, descrip.metadata); + case DICT: +return buildDict(vector, va, descrip); + case MAP: +return buildMap((AbstractMapVector) vector, va, type.getMode(), descrip); + case UNION: +return buildUnion((UnionVector) vector, va, descrip); + case LIST: +return buildList(vector, va, descrip); + case LATE: + +// Occurs for a list with no type: a list of nulls. + +return AbstractScalarReader.nullReader(descrip.metadata); + default: +return buildScalarReader(va, descrip.metadata); } } + private AbstractObjectReader buildDict(ValueVector vector, VectorAccessor va, VectorDescrip descrip) { + +boolean isArray = descrip.metadata.isArray(); + +DictVector dictVector; +VectorAccessor dictAccessor; +if (isArray) { Review comment: It *should* be possible to let the column builder recursively build the readers for each of the two component fields, handling the details of arrays, lists. repeated lists and so on. That this code works is probably due to insufficient tests: do we have tests where the value is, say, a union, or a list (of unions), etc.? This is a very tricky bit of the code, took me forever to work out how to properly build nested structures. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975880#comment-16975880 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r347114932 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/model/single/BaseWriterBuilder.java ## @@ -75,19 +78,32 @@ protected BaseWriterBuilder(ColumnConversionFactory conversionFactory) { private AbstractObjectWriter buildVectorWriter(ValueVector vector, VectorDescrip descrip) { final MajorType type = vector.getField().getType(); switch (type.getMinorType()) { -case MAP: - return MapWriter.buildMapWriter(descrip.metadata, - (AbstractMapVector) vector, - buildMap((AbstractMapVector) vector, descrip)); + case DICT: +return buildDict(vector, descrip); + case MAP: +return MapWriter.buildMapWriter(descrip.metadata, +(AbstractMapVector) vector, +buildMap((AbstractMapVector) vector, descrip)); -case UNION: - return buildUnion((UnionVector) vector, descrip); + case UNION: +return buildUnion((UnionVector) vector, descrip); -case LIST: - return buildList(vector, descrip); + case LIST: +return buildList(vector, descrip); -default: - return ColumnWriterFactory.buildColumnWriter(descrip.metadata, conversionFactory, vector); + default: +return ColumnWriterFactory.buildColumnWriter(descrip.metadata, conversionFactory, vector); +} + } + + private AbstractObjectWriter buildDict(ValueVector vector, VectorDescrip descrip) { +if (vector.getField().getType().getMode() == DataMode.REPEATED) { Review comment: Again, it *should* be possible to simply call `buildVectorWriter` and handle the key and value vectors recursively so you don't have to repeat the type checks in this method. If the recursive construction does not work, it is likely because of some details that we can figure out. (As noted above, this code is tricky because of multiple constraints imposed by the `MaterializedField`, `VaueVector` and writer classes.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975894#comment-16975894 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r347115550 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/NullReader.java ## @@ -0,0 +1,377 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector.accessor.reader; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.metadata.ColumnMetadata; +import org.apache.drill.exec.record.metadata.TupleMetadata; +import org.apache.drill.exec.record.metadata.VariantMetadata; +import org.apache.drill.exec.vector.accessor.ArrayReader; +import org.apache.drill.exec.vector.accessor.DictReader; +import org.apache.drill.exec.vector.accessor.ObjectReader; +import org.apache.drill.exec.vector.accessor.ObjectType; +import org.apache.drill.exec.vector.accessor.ScalarReader; +import org.apache.drill.exec.vector.accessor.TupleReader; +import org.apache.drill.exec.vector.accessor.ValueType; +import org.apache.drill.exec.vector.accessor.VariantReader; +import org.joda.time.Instant; +import org.joda.time.LocalDate; +import org.joda.time.LocalTime; +import org.joda.time.Period; + +import java.math.BigDecimal; + +/** + * Dummy reader which returns {@code null} on access to getters for scalar types + * ({@link #getString()}, {@link #getBytes()} etc.) and itself on access to reader types + * ({@link #scalar()}, {@link #tuple()} etc.). Used when accessing {@link DictReader} by non-existent key. + * {@link #isNull()} returns {@code true}. + * + * Does not preserve metadata structure. Other methods throw {@link UnsupportedOperationException}. todo: + * + * @see DictReader + */ +public class NullReader implements ScalarReader, ArrayReader, TupleReader, VariantReader, ObjectReader, DictReader { + + private static final NullReader INSTANCE = new NullReader(); + + public static NullReader instance() { +return INSTANCE; + } + + @Override + public int size() { +notSupported("size()"); +return 0; + } + + @Override + public ObjectType entryType() { +notSupported("entryType()"); +return null; + } + + @Override + public ObjectReader entry() { +notSupported("entry()"); +return null; + } + + @Override + public ScalarReader scalar() { +return this; + } + + @Override + public TupleReader tuple() { +return this; + } + + @Override + public ArrayReader array() { +return this; + } + + @Override + public VariantReader variant() { +return this; + } + + @Override + public DictReader dict() { +return this; + } + + @Override + public void setPosn(int index) { +notSupported("setPosn(int)"); + } + + @Override + public void rewind() { +notSupported("rewind()"); + } Review comment: This, and the above, are likely to be a problem. The readers recursively call down the reader tree on operations such as set position or rewind. If this reader appears in the tree, the whole operation will fail, which is not quite what we want. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975892#comment-16975892 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r347115224 ## File path: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/TestSchemaBuilder.java ## @@ -280,6 +280,40 @@ public void testRepeatedListInRow() { assertEquals(DataMode.REPEATED, child.mode()); } + /** + * Tests creating a dict within a row. + * Also the basic dict add key and value columns methods. + */ + @Test + public void testDictInRow() { +TupleMetadata schema = new SchemaBuilder() +.addDict("d", MinorType.VARCHAR) +.nullableValue(MinorType.FLOAT8) +.resumeSchema() Review comment: Nit: Indent value structure This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975889#comment-16975889 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r347115511 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/NullReader.java ## @@ -0,0 +1,369 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector.accessor.reader; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.metadata.ColumnMetadata; +import org.apache.drill.exec.record.metadata.TupleMetadata; +import org.apache.drill.exec.record.metadata.VariantMetadata; +import org.apache.drill.exec.vector.accessor.ArrayReader; +import org.apache.drill.exec.vector.accessor.DictReader; +import org.apache.drill.exec.vector.accessor.ObjectReader; +import org.apache.drill.exec.vector.accessor.ObjectType; +import org.apache.drill.exec.vector.accessor.ScalarReader; +import org.apache.drill.exec.vector.accessor.TupleReader; +import org.apache.drill.exec.vector.accessor.ValueType; +import org.apache.drill.exec.vector.accessor.VariantReader; +import org.joda.time.Instant; +import org.joda.time.LocalDate; +import org.joda.time.LocalTime; +import org.joda.time.Period; + +import java.math.BigDecimal; + +/** + * Dummy reader which returns {@code null} for scalar types and itself for complex types. + */ +public class NullReader implements ScalarReader, ArrayReader, TupleReader, VariantReader, ObjectReader, DictReader { Review comment: See comments below. I think this is just a bit confusing; it is not clear when we'd use it and how clients would use it. As explained below, I *think* that if you need a reader for a missing element, you can use the existing scalar `NullReader`: just define a missing reader as a reader of scalar nulls. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975887#comment-16975887 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r347115126 ## File path: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/TestRowSet.java ## @@ -586,6 +595,490 @@ public void testRepeatedMapStructure() { RowSetUtilities.verify(expected, actual); } + @Test + public void testDictStructure() { +final String dictName = "d"; + +final TupleMetadata schema = new SchemaBuilder() +.add("id", MinorType.INT) +.addDict(dictName, MinorType.INT) +.value(MinorType.VARCHAR) // required int +.resumeSchema() +.buildSchema(); +final ExtendableRowSet rowSet = fixture.rowSet(schema); +final RowSetWriter writer = rowSet.writer(); + +// Dict +// Pick out components and lightly test. (Assumes structure +// tested earlier is still valid, so no need to exhaustively +// test again.) + +assertEquals(ObjectType.ARRAY, writer.column(dictName).type()); +assertTrue(writer.column(dictName).schema().isDict()); + +final ScalarWriter idWriter = writer.column(0).scalar(); +final DictWriter dictWriter = writer.column(1).dict(); + +assertEquals(ValueType.INTEGER, dictWriter.keyType()); +assertEquals(ObjectType.SCALAR, dictWriter.valueType()); + +final ScalarWriter keyWriter = dictWriter.keyWriter(); +final ScalarWriter valueWriter = dictWriter.valueWriter().scalar(); + +assertEquals(ValueType.INTEGER, keyWriter.valueType()); +assertEquals(ValueType.STRING, valueWriter.valueType()); + +// Write data +idWriter.setInt(1); + +keyWriter.setInt(11); +valueWriter.setString("a"); +dictWriter.save(); // Advance to next entry position +keyWriter.setInt(12); +valueWriter.setString("b"); +dictWriter.save(); +writer.save(); + +idWriter.setInt(2); + +keyWriter.setInt(21); +valueWriter.setString("c"); +dictWriter.save(); +writer.save(); + +idWriter.setInt(3); + +keyWriter.setInt(31); +valueWriter.setString("d"); +dictWriter.save(); +keyWriter.setInt(32); +valueWriter.setString("e"); +dictWriter.save(); +writer.save(); + +// Finish the row set and get a reader. + +final SingleRowSet actual = writer.done(); +final RowSetReader reader = actual.reader(); + +// Verify reader structure + +assertEquals(ObjectType.ARRAY, reader.column(dictName).type()); + +final DictReader dictReader = reader.dict(1); +assertEquals(ObjectType.ARRAY, dictReader.type()); + +assertEquals(ValueType.INTEGER, dictReader.keyColumnType()); +assertEquals(ObjectType.SCALAR, dictReader.valueColumnType()); + +// Row 1: get value reader with its position set to entry corresponding to a key + +assertTrue(reader.next()); +assertFalse(dictReader.isNull()); // dict itself is not null + +dictReader.getAsString(); +assertEquals("b", dictReader.getValueReader(12).scalar().getString()); +assertEquals("a", dictReader.getValueReader(11).scalar().getString()); + +// compare entire dict +Map map = map(11, "a", 12, "b"); +assertEquals(map, dictReader.getObject()); + +// Row 2: get Object representation of value directly + +assertTrue(reader.next()); +assertEquals("c", dictReader.get(21)); +assertTrue(dictReader.getValueReader(22).isNull()); // the dict does not contain an entry with the key + +map = map(21, "c"); +assertEquals(map, dictReader.getObject()); + +// Row 3 + +assertTrue(reader.next()); + +assertEquals("d", dictReader.get(31)); +assertEquals("e", dictReader.get(32)); + +map = map(31, "d", 32, "e"); +assertEquals(map, dictReader.getObject()); + +assertFalse(reader.next()); + +// Verify that the dict accessor's value count was set. + +final DictVector dictVector = (DictVector) actual.container().getValueVector(1).getValueVector(); +assertEquals(3, dictVector.getAccessor().getValueCount()); + +final SingleRowSet expected = fixture.rowSetBuilder(schema) +.addRow(1, objArray(11, "a", 12, "b")) +.addRow(2, objArray(21, "c")) +.addRow(3, objArray(31, "d", 32, "e")) +.build(); +RowSetUtilities.verify(expected, actual); + } + + /** + * Utility method to bootstrap a map object easily. + * + * @param entry key-value sequence + * @return map containing key-value pairs from passed sequence + */ + private Map map(Object... entry) { Review comment: Maybe move this into `RowSetUtilities` so we can use it elsewhere. I wonder, why wasn't this used in the test above where we used `objArray`?
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975890#comment-16975890 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r347115442 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/NullReader.java ## @@ -0,0 +1,377 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector.accessor.reader; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.metadata.ColumnMetadata; +import org.apache.drill.exec.record.metadata.TupleMetadata; +import org.apache.drill.exec.record.metadata.VariantMetadata; +import org.apache.drill.exec.vector.accessor.ArrayReader; +import org.apache.drill.exec.vector.accessor.DictReader; +import org.apache.drill.exec.vector.accessor.ObjectReader; +import org.apache.drill.exec.vector.accessor.ObjectType; +import org.apache.drill.exec.vector.accessor.ScalarReader; +import org.apache.drill.exec.vector.accessor.TupleReader; +import org.apache.drill.exec.vector.accessor.ValueType; +import org.apache.drill.exec.vector.accessor.VariantReader; +import org.joda.time.Instant; +import org.joda.time.LocalDate; +import org.joda.time.LocalTime; +import org.joda.time.Period; + +import java.math.BigDecimal; + +/** + * Dummy reader which returns {@code null} on access to getters for scalar types + * ({@link #getString()}, {@link #getBytes()} etc.) and itself on access to reader types + * ({@link #scalar()}, {@link #tuple()} etc.). Used when accessing {@link DictReader} by non-existent key. + * {@link #isNull()} returns {@code true}. + * + * Does not preserve metadata structure. Other methods throw {@link UnsupportedOperationException}. todo: + * + * @see DictReader + */ +public class NullReader implements ScalarReader, ArrayReader, TupleReader, VariantReader, ObjectReader, DictReader { + + private static final NullReader INSTANCE = new NullReader(); + + public static NullReader instance() { +return INSTANCE; + } + + @Override + public int size() { +notSupported("size()"); +return 0; + } + + @Override + public ObjectType entryType() { +notSupported("entryType()"); +return null; + } + + @Override + public ObjectReader entry() { +notSupported("entry()"); +return null; + } + + @Override + public ScalarReader scalar() { +return this; + } + + @Override + public TupleReader tuple() { +return this; + } + + @Override + public ArrayReader array() { +return this; + } + + @Override + public VariantReader variant() { +return this; + } + + @Override + public DictReader dict() { +return this; + } + + @Override + public void setPosn(int index) { +notSupported("setPosn(int)"); + } + + @Override + public void rewind() { +notSupported("rewind()"); + } + + @Override + public boolean next() { +return false; + } + + @Override + public ValueType valueType() { +notSupported("valueType()"); +return null; + } + + @Override + public ValueType extendedType() { +notSupported("extendedType()"); +return null; + } + + @Override + public int getInt() { +notSupported("getInt()"); +return 0; + } + + @Override + public boolean getBoolean() { +notSupported("getBoolean()"); +return false; + } + + @Override + public long getLong() { +notSupported("getLong()"); +return 0; + } + + @Override + public double getDouble() { +notSupported("getDouble()"); +return 0; + } + + @Override + public String getString() { +return null; + } + + @Override + public byte[] getBytes() { +return null; + } + + @Override + public BigDecimal getDecimal() { +return null; + } + + @Override + public Period getPeriod() { +return null; + } + + @Override + public LocalDate getDate() { +return null; + } + + @Override + public LocalTime getTime() { +return null; + } + + @Override + public Instant getTimestamp() { +
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975885#comment-16975885 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r347115387 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/NullReader.java ## @@ -0,0 +1,377 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector.accessor.reader; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.metadata.ColumnMetadata; +import org.apache.drill.exec.record.metadata.TupleMetadata; +import org.apache.drill.exec.record.metadata.VariantMetadata; +import org.apache.drill.exec.vector.accessor.ArrayReader; +import org.apache.drill.exec.vector.accessor.DictReader; +import org.apache.drill.exec.vector.accessor.ObjectReader; +import org.apache.drill.exec.vector.accessor.ObjectType; +import org.apache.drill.exec.vector.accessor.ScalarReader; +import org.apache.drill.exec.vector.accessor.TupleReader; +import org.apache.drill.exec.vector.accessor.ValueType; +import org.apache.drill.exec.vector.accessor.VariantReader; +import org.joda.time.Instant; +import org.joda.time.LocalDate; +import org.joda.time.LocalTime; +import org.joda.time.Period; + +import java.math.BigDecimal; + +/** + * Dummy reader which returns {@code null} on access to getters for scalar types + * ({@link #getString()}, {@link #getBytes()} etc.) and itself on access to reader types + * ({@link #scalar()}, {@link #tuple()} etc.). Used when accessing {@link DictReader} by non-existent key. + * {@link #isNull()} returns {@code true}. + * + * Does not preserve metadata structure. Other methods throw {@link UnsupportedOperationException}. todo: + * + * @see DictReader + */ +public class NullReader implements ScalarReader, ArrayReader, TupleReader, VariantReader, ObjectReader, DictReader { + + private static final NullReader INSTANCE = new NullReader(); + + public static NullReader instance() { +return INSTANCE; + } + + @Override + public int size() { +notSupported("size()"); +return 0; + } + + @Override + public ObjectType entryType() { +notSupported("entryType()"); +return null; + } + + @Override + public ObjectReader entry() { +notSupported("entry()"); +return null; + } + + @Override + public ScalarReader scalar() { +return this; + } + + @Override + public TupleReader tuple() { +return this; + } + + @Override + public ArrayReader array() { +return this; + } + + @Override + public VariantReader variant() { +return this; + } + + @Override + public DictReader dict() { +return this; + } + + @Override + public void setPosn(int index) { +notSupported("setPosn(int)"); + } + + @Override + public void rewind() { +notSupported("rewind()"); + } + + @Override + public boolean next() { +return false; + } + + @Override + public ValueType valueType() { +notSupported("valueType()"); +return null; + } + + @Override + public ValueType extendedType() { +notSupported("extendedType()"); +return null; + } + + @Override + public int getInt() { +notSupported("getInt()"); +return 0; + } + + @Override + public boolean getBoolean() { +notSupported("getBoolean()"); +return false; + } + + @Override + public long getLong() { +notSupported("getLong()"); +return 0; + } + + @Override + public double getDouble() { +notSupported("getDouble()"); +return 0; + } + + @Override + public String getString() { +return null; + } + + @Override + public byte[] getBytes() { +return null; + } + + @Override + public BigDecimal getDecimal() { +return null; + } + + @Override + public Period getPeriod() { +return null; + } + + @Override + public LocalDate getDate() { +return null; + } + + @Override + public LocalTime getTime() { +return null; + } + + @Override + public Instant getTimestamp() { +
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975884#comment-16975884 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r347115045 ## File path: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/TestRowSet.java ## @@ -586,6 +595,490 @@ public void testRepeatedMapStructure() { RowSetUtilities.verify(expected, actual); } + @Test + public void testDictStructure() { +final String dictName = "d"; + +final TupleMetadata schema = new SchemaBuilder() +.add("id", MinorType.INT) +.addDict(dictName, MinorType.INT) +.value(MinorType.VARCHAR) // required int +.resumeSchema() +.buildSchema(); +final ExtendableRowSet rowSet = fixture.rowSet(schema); +final RowSetWriter writer = rowSet.writer(); + +// Dict +// Pick out components and lightly test. (Assumes structure +// tested earlier is still valid, so no need to exhaustively +// test again.) + +assertEquals(ObjectType.ARRAY, writer.column(dictName).type()); +assertTrue(writer.column(dictName).schema().isDict()); + +final ScalarWriter idWriter = writer.column(0).scalar(); +final DictWriter dictWriter = writer.column(1).dict(); + +assertEquals(ValueType.INTEGER, dictWriter.keyType()); +assertEquals(ObjectType.SCALAR, dictWriter.valueType()); + +final ScalarWriter keyWriter = dictWriter.keyWriter(); +final ScalarWriter valueWriter = dictWriter.valueWriter().scalar(); + +assertEquals(ValueType.INTEGER, keyWriter.valueType()); +assertEquals(ValueType.STRING, valueWriter.valueType()); + +// Write data +idWriter.setInt(1); + +keyWriter.setInt(11); +valueWriter.setString("a"); +dictWriter.save(); // Advance to next entry position +keyWriter.setInt(12); +valueWriter.setString("b"); +dictWriter.save(); Review comment: I like how writing to the DICT works: it is about as simple as you can make it, yet it handles the multiple value types. Nicely done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975288#comment-16975288 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on issue #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#issuecomment-554465413 @paul-rogers I've updated the PR, please take a look again. Thanks for your insight! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974397#comment-16974397 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r346409969 ## File path: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.record.metadata; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.vector.complex.DictVector; + +public class DictBuilder implements SchemaContainer { Review comment: Yes, `Dict` is defined as `DICT` to represent mapping (like in Java's Map) from `String` to `int`. `Dict` does not try to be a `Union` for the value type. If one wants the `Dict` to have `Union` value, one must define it as `DICT>` (the `key` type is going to be `STRING` and the `value` is going to be a union of some types; like union with all problems it has in Drill :) ) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974395#comment-16974395 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r346407038 ## File path: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.record.metadata; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.vector.complex.DictVector; + +public class DictBuilder implements SchemaContainer { + + private final SchemaContainer parent; + private final TupleBuilder tupleBuilder = new TupleBuilder(); + private final String memberName; + private final TypeProtos.DataMode mode; + + public DictBuilder(String memberName, TypeProtos.DataMode mode) { +this(null, memberName, mode); + } + + public DictBuilder(SchemaContainer parent, String memberName, TypeProtos.DataMode mode) { +this.parent = parent; +this.memberName = memberName; +this.mode = mode; + } + + @Override + public void addColumn(ColumnMetadata column) { Review comment: I suppose, this will do for now (at least I see it OK). Changed the validation to check if the added field is `value` as `key` can't be complex and the method does not make sense for the case. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974393#comment-16974393 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r346405704 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/DictWriter.java ## @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector.accessor; + +/** + * Dict writer is represented as an array of Maps (Array with Tuple entry). Review comment: I wanted to say the dict writer is an array of dict `entries` (you were talking about below), just used the `map` somewhat in another context, like some type of tuple. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974390#comment-16974390 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r346403272 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/DictWriter.java ## @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector.accessor; + +/** + * Dict writer is represented as an array of Maps (Array with Tuple entry). + */ +public interface DictWriter extends ArrayWriter { + ValueType keyType(); + ObjectType valueType(); + ScalarWriter keyWriter(); + ObjectWriter valueWriter(); Review comment: My initial intention was to make it something like you proposed in your second option, though was done so implicitly. Made it the way you described with the exception that `DictWriter` actually encapsulates '`DictEntry`' (no need to perform `DictWriter dw = aw.dict()` rather use `DictWriter` itself to write the pairs; `aw` in this case is also `DictWriter`). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974387#comment-16974387 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r346400903 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/NullReader.java ## @@ -0,0 +1,369 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector.accessor.reader; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.metadata.ColumnMetadata; +import org.apache.drill.exec.record.metadata.TupleMetadata; +import org.apache.drill.exec.record.metadata.VariantMetadata; +import org.apache.drill.exec.vector.accessor.ArrayReader; +import org.apache.drill.exec.vector.accessor.DictReader; +import org.apache.drill.exec.vector.accessor.ObjectReader; +import org.apache.drill.exec.vector.accessor.ObjectType; +import org.apache.drill.exec.vector.accessor.ScalarReader; +import org.apache.drill.exec.vector.accessor.TupleReader; +import org.apache.drill.exec.vector.accessor.ValueType; +import org.apache.drill.exec.vector.accessor.VariantReader; +import org.joda.time.Instant; +import org.joda.time.LocalDate; +import org.joda.time.LocalTime; +import org.joda.time.Period; + +import java.math.BigDecimal; + +/** + * Dummy reader which returns {@code null} for scalar types and itself for complex types. + */ +public class NullReader implements ScalarReader, ArrayReader, TupleReader, VariantReader, ObjectReader, DictReader { Review comment: It does not follow the actual metadata structure, I suppose general case will do. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974385#comment-16974385 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r346400409 ## File path: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/MetadataUtils.java ## @@ -205,4 +205,10 @@ private static ColumnMetadata newDecimal(String name, MinorType type, DataMode m return new PrimitiveColumnMetadata(field); } + private static boolean isScalar(MinorType type) { +return type != MinorType.MAP +&& type != MinorType.UNION +&& type != MinorType.LIST +&& type != MinorType.DICT; + } Review comment: Yeah, much negation... Changed the body of the method - didn't want to introduce the new `isComplex()`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974382#comment-16974382 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r346399315 ## File path: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/MapBuilder.java ## @@ -144,6 +144,14 @@ public RepeatedListBuilder addRepeatedList(String name) { return tupleBuilder.addRepeatedList(this, name); } + public DictBuilder addDict(String name) { +return tupleBuilder.addDict(this, name); + } Review comment: Good point. Yes, `value` can be nullable (added explicit comment to `DictVector`). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972137#comment-16972137 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on issue #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#issuecomment-552769421 The schema builder issue has been nagging at me. I observed above that the current way of handling nested types is getting too complex, with all the special cases and "resumeFoo" methods. The current design was originally designed to build a `BatchSchema` with `MaterializedField`s,. Those classes require a top-down construction. (That is also why the final `build()` method returns a `BatchSchema` and why we must call `buildSchema()` to get a `TupleMetadata`.) It turns out that the `SchemaMetadata` classes *do not* have the same constraints. They can actually be built bottom-up. It never occurred to me to clean up the old-style of building when we repurposed `SchemaBuilder` to build metadata objects. So, since we can change the structure, we could borrow a technique from, I think, certain JSON builders. Rather than having the schema builder create the nested type builder, ask the user to do so explicitly: ``` TupleMetadata schema = new SchemaBuilder() .addMap(new MapBuilder("myMap") .add("name", MinorType.VARCHAR) .addMap(new MapBuilder("nested") .add("id", MinorType.INT) .build()) .build()) .add("last", MinorType.INT) .build(); ``` Adopting this model has additional benefits. It is currently a bit awkward to create a bare `ColumnMetadata`. We could create helper methods to create each kind of column schema. Something like: ``` foo = SchemaBuilder.column("foo", MinorType.INT, DataMode.REQUIRED); bar = SchemaBuilder.dict("bar", MinorType.VARCHAR) .value(SchemaBuilder.map(...)) .build(); ``` Lots of things we can play with. Given all of this, my advice for the `DICT` type is to figure out something that works with the structure we already have. Then, as a separate PR, we can clean up the `SchemaBuilder` to make it easier to use for the many complex types we now support. That way, you don't have to solve the schema builder problem in this PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972136#comment-16972136 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on issue #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#issuecomment-552769421 The schema builder issue has been nagging at me. I observed above that the current way of handling nested types is getting too complex, with all the special cases and "resumeFoo" methods. The current design was originally designed to build a `BatchSchema` with `MaterializedField`s,. Those classes require a top-down construction. (That is also why the final `build()` method returns a `BatchSchema` and why we must call `buildSchema()` to get a `TupleMetadata`.) It turns out that the `SchemaMetadata` classes *do not* have the same constraints. They can actually be built bottom-up. It never occurred to me to clean up the old-style of building when we repurposed `SchemaBuilder` to build metadata objects. So, since we can change the structure, we could borrow a technique from, I think, certain JSON builders. Rather than having the schema builder create the nested type builder, ask the user to do so explicitly: ``` TupleMetadata schema = new SchemaBuilder() .addMap(new MapBuilder("myMap") .add("name", MinorType.VARCHAR) .addMap(new MapBuilder("nested") .add("id", MinorType.INT) .build()) .build()) .add("last", MinorType.INT) .build(); ``` Adopting this model as additional benefits. It is currently a bit awkward to create a bare `ColumnMetadata`. We could create helper methods to create each kind of column schema. Something like: ``` foo = SchemaBuilder.column("foo", MinorType.INT, DataMode.REQUIRED); bar = SchemaBuilder.dict("bar", MinorType.VARCHAR) .value(SchemaBuilder.map(...)) .build(); ``` Lots of things we can play with. Given all of this, my advice for the `DICT` type is to figure out something that works with the structure we already have. Then, as a separate PR, we can clean up the `SchemaBuilder` to make it easier to use for the many complex types we now support. That way, you don't have to solve the schema builder problem in this PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966333#comment-16966333 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on issue #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#issuecomment-549198995 @KazydubB, thanks much for the changes. This is a complex area. It looks like you are getting a good understanding. My focus on this review is to help you define the simplest implementation for the schema builder. Maybe we can get that part finished up. Then, I made a suggestion about how to structure the Dict writer. The new design here is much improved. Still, I think we can improve it a bit more. Everything said for the writer applies to the reader, so I didn't repeat the comments there. Once we get the API correct, then I'll dive into the implementation details. That should be a quick review if the we agree on the API, and if you have complete unit tests. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966328#comment-16966328 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r341884369 ## File path: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/MetadataUtils.java ## @@ -205,4 +205,10 @@ private static ColumnMetadata newDecimal(String name, MinorType type, DataMode m return new PrimitiveColumnMetadata(field); } + private static boolean isScalar(MinorType type) { +return type != MinorType.MAP +&& type != MinorType.UNION +&& type != MinorType.LIST +&& type != MinorType.DICT; + } Review comment: Maybe change this to `isComplex`, and define `isScalar()` as `! isComplex()`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966330#comment-16966330 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r341884303 ## File path: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.record.metadata; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.vector.complex.DictVector; + +public class DictBuilder implements SchemaContainer { + + private final SchemaContainer parent; + private final TupleBuilder tupleBuilder = new TupleBuilder(); + private final String memberName; + private final TypeProtos.DataMode mode; + + public DictBuilder(String memberName, TypeProtos.DataMode mode) { +this(null, memberName, mode); + } + + public DictBuilder(SchemaContainer parent, String memberName, TypeProtos.DataMode mode) { +this.parent = parent; +this.memberName = memberName; +this.mode = mode; + } + + @Override + public void addColumn(ColumnMetadata column) { Review comment: As a side comment, I will take the blame for the mess we are creating in the schema builder. When I first wrote it, I was mostly concerned with the top-level row and with maps. Then came repeated lists, unions and now Dict. Is there a cleaner way to handle the nesting? Something clever with parameterized types to avoid all the 'resumeFoo' methods? Feel free to be creative here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966329#comment-16966329 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r341884201 ## File path: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.record.metadata; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.vector.complex.DictVector; + +public class DictBuilder implements SchemaContainer { Review comment: Follow-up question. For the value, is a Dict defined as, say, `DICT` so that all values are of the same type? If so, that is the easy case. Or, can I define a Dict as `DICT>`, meaning that key values can be any one of the three types? Note that, in Drill, we use the (barely-working) Union vector to handle this situation. Or, is the Dict trying, itself, to be a Union for the value type? Just read the (very nice) comments in `DictVector`, so it looks like, if I want my Dict to hold three types, I make it a `DICT`. Correct? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966324#comment-16966324 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r341882388 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/DictWriter.java ## @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector.accessor; + +/** + * Dict writer is represented as an array of Maps (Array with Tuple entry). + */ +public interface DictWriter extends ArrayWriter { + ValueType keyType(); + ObjectType valueType(); + ScalarWriter keyWriter(); + ObjectWriter valueWriter(); Review comment: The above is getting close, but it does introduce an ambiguity. The array writer already has the following members: ``` ObjectType entryType(); ObjectWriter entry(); ``` Along with some other helpers. In the `DictWriter` case, it is not clear that I could use this object a `ArrayWriter`. Which of above does the entry correspond to? Might it be better to have `DictWriter` inherit directly from `ColumnWriter` so you can define clean semantics? If it is worth having common functionality (only `save()` would be common), we could define a `SequenceWriter` which is the base of `ArrayWriter` and `DictWriter`. Or, you could change `ArrayWriter` to add a new element writer; a `DictEntry` which would provide the methods above. That is: ``` ArrayWriter aw = reader.get('myDict').array(); DictWriter dw = aw.dict(); for (int i = 0; i < kvCount; i++) { dw.keyWriter().setString("key + " i); dw.valueWriter().union().setInt(i); aw.save(); } ``` That is, a Dict is an array. The entries in the array are key/value pairs. The key is a scalar, the value is of some type (above I suggested a Union -- with all the problems that Unions bring.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966326#comment-16966326 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r341884645 ## File path: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/MapBuilder.java ## @@ -144,6 +144,14 @@ public RepeatedListBuilder addRepeatedList(String name) { return tupleBuilder.addRepeatedList(this, name); } + public DictBuilder addDict(String name) { +return tupleBuilder.addDict(this, name); + } Review comment: Maybe `addDict(String name, MinorType keyType)`. Works because the key must be a scalar and is required. Then the `DictBuilder` simply builds the value type: ``` scalarValue(MinorType type); MapBuilder mapValue(); MapBuilder mapArrayValue(); UnionBuilder unionValue(); Foo resumeFoo(); ``` Can the value be nullable? (The comment in `DictVector` doesn't say yes or no.) If nullable is allowed, then only the scalar can be nullable, so maybe add: ``` nullableValue(MinorType type) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966325#comment-16966325 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r341881937 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/DictWriter.java ## @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector.accessor; + +/** + * Dict writer is represented as an array of Maps (Array with Tuple entry). Review comment: Not sure I follow. A Dict is an array of key/value pairs, correct? It is a Map turned sideways: the key/value pairs are rows in an array, not columns in a map (as in the Map) type. So, a Dict is not "an array of Maps" is it? If we want a Dict to allow heterogeneous value members, then we'd have an array of key/Union pairs, not key/Map pairs. Or, am I missing something? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966327#comment-16966327 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r341883957 ## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/NullReader.java ## @@ -0,0 +1,369 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector.accessor.reader; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.metadata.ColumnMetadata; +import org.apache.drill.exec.record.metadata.TupleMetadata; +import org.apache.drill.exec.record.metadata.VariantMetadata; +import org.apache.drill.exec.vector.accessor.ArrayReader; +import org.apache.drill.exec.vector.accessor.DictReader; +import org.apache.drill.exec.vector.accessor.ObjectReader; +import org.apache.drill.exec.vector.accessor.ObjectType; +import org.apache.drill.exec.vector.accessor.ScalarReader; +import org.apache.drill.exec.vector.accessor.TupleReader; +import org.apache.drill.exec.vector.accessor.ValueType; +import org.apache.drill.exec.vector.accessor.VariantReader; +import org.joda.time.Instant; +import org.joda.time.LocalDate; +import org.joda.time.LocalTime; +import org.joda.time.Period; + +import java.math.BigDecimal; + +/** + * Dummy reader which returns {@code null} for scalar types and itself for complex types. + */ +public class NullReader implements ScalarReader, ArrayReader, TupleReader, VariantReader, ObjectReader, DictReader { Review comment: Good idea. But, is this too general? Does it follow the actual metadata structure? The writer has a `NullScalarWriter` and a few other components that try to allow only structurally-legal operations. I suppose, however, that is less important when reading: being lenient won't break anything. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964061#comment-16964061 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on issue #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#issuecomment-548392780 @paul-rogers thank you for review. I've made some changes, please take a look. There is a (interface) `DictWriter` similar to what you proposed with the exception it `extends` `ArrayWriter`, i.e. when writing a `dict` is assumed to be an array of tuples (with `key` and `value` fields) other than a pair of arrays. (`DictWriter`'s implementation, `ObjectDictWriter`, actually `extends ObjectArrayWriter`). For reading, there is a (interface) `DictReader` (`extends ColumnReader`) introduces 5 new methods: ``` ObjectReader getValueReader(Object key); Object get(Object key); int size(); ValueType keyColumnType(); ObjectType valueColumnType(); ``` For key/value lookup `getValueReader(Object key)` is used. When a `key` is found, `value`'s reader is returned with its position set correspondingly to the `key`. In a case there is no such `key` - a special type of `ObjectReader` - `org.apache.drill.exec.vector.accessor.reader.NullReader` will be returned. I am not sure that array access is needed - can't see any use case yet. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964014#comment-16964014 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r341136660 ## File path: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.record.metadata; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.vector.complex.DictVector; + +public class DictBuilder implements SchemaContainer { + + private final SchemaContainer parent; + private final TupleBuilder tupleBuilder = new TupleBuilder(); + private final String memberName; + private final TypeProtos.DataMode mode; + + public DictBuilder(String memberName, TypeProtos.DataMode mode) { +this(null, memberName, mode); + } + + public DictBuilder(SchemaContainer parent, String memberName, TypeProtos.DataMode mode) { +this.parent = parent; +this.memberName = memberName; +this.mode = mode; + } + + @Override + public void addColumn(ColumnMetadata column) { +assert DictVector.fieldNames.contains(column.name()); +tupleBuilder.addColumn(column); + } + + public DictBuilder addKey(TypeProtos.MajorType type) { +return add(MaterializedField.create(DictVector.FIELD_KEY_NAME, type)); + } + + public DictBuilder addValue(TypeProtos.MajorType type) { +return add(MaterializedField.create(DictVector.FIELD_VALUE_NAME, type)); + } + + public DictBuilder add(MaterializedField col) { +assert DictVector.fieldNames.contains(col.getName()); +tupleBuilder.add(col); +return this; + } + + public DictBuilder addKey(TypeProtos.MinorType type, TypeProtos.DataMode mode) { +tupleBuilder.add(DictVector.FIELD_KEY_NAME, type, mode); +return this; + } + + public DictBuilder addValue(TypeProtos.MinorType type, TypeProtos.DataMode mode) { +tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type, mode); +return this; + } + + public DictBuilder addKey(TypeProtos.MinorType type) { +tupleBuilder.add(DictVector.FIELD_KEY_NAME, type); +return this; + } + + public DictBuilder addValue(TypeProtos.MinorType type) { +tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type); +return this; + } + + public DictBuilder addKey(TypeProtos.MinorType type, int width) { +tupleBuilder.add(DictVector.FIELD_KEY_NAME, type, width); +return this; + } + + public DictBuilder addValue(TypeProtos.MinorType type, int width) { +tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type, width); +return this; + } + + public DictBuilder addKey(TypeProtos.MinorType type, int precision, int scale) { +return addDecimal(DictVector.FIELD_KEY_NAME, type, TypeProtos.DataMode.REQUIRED, precision, scale); + } + + public DictBuilder addValue(TypeProtos.MinorType type, int precision, int scale) { +return addDecimal(DictVector.FIELD_VALUE_NAME, type, TypeProtos.DataMode.REQUIRED, precision, scale); + } + + public DictBuilder addNullable(String name, TypeProtos.MinorType type) { +tupleBuilder.addNullable(name, type); +return this; + } + + public DictBuilder addNullable(String name, TypeProtos.MinorType type, int width) { +tupleBuilder.addNullable(name, type, width); +return this; + } + + public DictBuilder addNullable(String name, TypeProtos.MinorType type, int precision, int scale) { +return addDecimal(name, type, TypeProtos.DataMode.OPTIONAL, precision, scale); + } + + public DictBuilder addArrayValue(TypeProtos.MinorType type) { +tupleBuilder.addArray(DictVector.FIELD_VALUE_NAME, type); +return this; + } + + public DictBuilder addArrayValue(TypeProtos.MinorType type, int dims) { +tupleBuilder.addArray(DictVector.FIELD_VALUE_NAME, type, dims); +return this; + } + + public DictBuilder addArrayValue(TypeProtos.MinorType type, int precision, int
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964004#comment-16964004 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r341133570 ## File path: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.record.metadata; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.vector.complex.DictVector; + +public class DictBuilder implements SchemaContainer { + + private final SchemaContainer parent; + private final TupleBuilder tupleBuilder = new TupleBuilder(); + private final String memberName; + private final TypeProtos.DataMode mode; + + public DictBuilder(String memberName, TypeProtos.DataMode mode) { +this(null, memberName, mode); + } + + public DictBuilder(SchemaContainer parent, String memberName, TypeProtos.DataMode mode) { +this.parent = parent; +this.memberName = memberName; +this.mode = mode; + } + + @Override + public void addColumn(ColumnMetadata column) { +assert DictVector.fieldNames.contains(column.name()); +tupleBuilder.addColumn(column); + } + + public DictBuilder addKey(TypeProtos.MajorType type) { +return add(MaterializedField.create(DictVector.FIELD_KEY_NAME, type)); + } + + public DictBuilder addValue(TypeProtos.MajorType type) { +return add(MaterializedField.create(DictVector.FIELD_VALUE_NAME, type)); + } + + public DictBuilder add(MaterializedField col) { +assert DictVector.fieldNames.contains(col.getName()); +tupleBuilder.add(col); +return this; + } + + public DictBuilder addKey(TypeProtos.MinorType type, TypeProtos.DataMode mode) { +tupleBuilder.add(DictVector.FIELD_KEY_NAME, type, mode); +return this; + } + + public DictBuilder addValue(TypeProtos.MinorType type, TypeProtos.DataMode mode) { +tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type, mode); +return this; + } + + public DictBuilder addKey(TypeProtos.MinorType type) { +tupleBuilder.add(DictVector.FIELD_KEY_NAME, type); +return this; + } + + public DictBuilder addValue(TypeProtos.MinorType type) { +tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type); +return this; + } + + public DictBuilder addKey(TypeProtos.MinorType type, int width) { +tupleBuilder.add(DictVector.FIELD_KEY_NAME, type, width); +return this; + } + + public DictBuilder addValue(TypeProtos.MinorType type, int width) { +tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type, width); +return this; + } + + public DictBuilder addKey(TypeProtos.MinorType type, int precision, int scale) { Review comment: Yes, `key` can be a `DECIMAL`. Added `key(MajorType)` method to allow creation for the case (the methods validates allowed `key` types). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963999#comment-16963999 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r341132825 ## File path: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.record.metadata; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.vector.complex.DictVector; + +public class DictBuilder implements SchemaContainer { + + private final SchemaContainer parent; + private final TupleBuilder tupleBuilder = new TupleBuilder(); + private final String memberName; + private final TypeProtos.DataMode mode; + + public DictBuilder(String memberName, TypeProtos.DataMode mode) { +this(null, memberName, mode); + } + + public DictBuilder(SchemaContainer parent, String memberName, TypeProtos.DataMode mode) { +this.parent = parent; +this.memberName = memberName; +this.mode = mode; + } + + @Override + public void addColumn(ColumnMetadata column) { +assert DictVector.fieldNames.contains(column.name()); +tupleBuilder.addColumn(column); + } + + public DictBuilder addKey(TypeProtos.MajorType type) { Review comment: Changed the name to the `key()` and added validations, thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963996#comment-16963996 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r341132301 ## File path: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.record.metadata; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.vector.complex.DictVector; + +public class DictBuilder implements SchemaContainer { Review comment: The `key` can be of any `REQUIRED` primitive type and `value` can be of any primitive or complex types. Dict's key is not limited to be a string. Yes, your assumption is correct. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963997#comment-16963997 ] ASF GitHub Bot commented on DRILL-7359: --- KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r341132415 ## File path: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.record.metadata; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.vector.complex.DictVector; + +public class DictBuilder implements SchemaContainer { + + private final SchemaContainer parent; + private final TupleBuilder tupleBuilder = new TupleBuilder(); + private final String memberName; + private final TypeProtos.DataMode mode; + + public DictBuilder(String memberName, TypeProtos.DataMode mode) { +this(null, memberName, mode); + } + + public DictBuilder(SchemaContainer parent, String memberName, TypeProtos.DataMode mode) { +this.parent = parent; +this.memberName = memberName; +this.mode = mode; + } + + @Override + public void addColumn(ColumnMetadata column) { Review comment: Yes, naming of the method can be confusing in this case (the method comes from `SchemaContainer`). Added validation to make sure only `key` and `value` fields may be added if they were not previously added. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950411#comment-16950411 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on issue #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#issuecomment-541442192 On, read, we can access the `DICT` as a dictionary and as an array. This means we need to be a bit clever about designing the `DictReader()` interface. We can get our `DictReader` similar to the write case: ``` DictReader dictReader = rowReader.dict("myDict"); ``` Now, we have two ways to proceed. Either we want key/value lookups, or array access. For array access, we want an iterator, much like the `ArrayReader`, but that gives us access to both the key and value. Something like: ``` interface DictReader ... ScalarReader key(); ObjectReader value(); boolean next(); ``` For key/value, we have a challenge. We need a way to specify the key that is type-specific. Then, we need to access the key in a type-specific way. This is the first time we've had such an issue. Maybe we can build this on the array iterator. In addition to the `next()` method, we can provide a `findFoo()` method, with one for each supported key type. ``` int findString(String key); int findInt(int key); ... ``` The semantics might be that the array index is set to the (first) index of the requested value, and that index is returned. If not found, the return value is -1 and the array index is one past the end of the array. ``` DictReader dictReader = ... if (dictReader.findString("barney")) { value = dictReader.value().scalar().getDouble(); } ``` There are probably other variations. Think how you expect to use the reader (for testing, yes, but what other operations)? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950407#comment-16950407 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on issue #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#issuecomment-541440586 Let's continue to assume that `DICT` is, essentially, `DICT` and that we can think of the `DICT`, when writing, as a pair of arrays: one for keys, one for values. (Because vectors are write-once, we have to add (key, value) pairs one by one. If so, then we need a new form of writer, a `DictWriter` that has semantics such as: ``` ObjectWriter key(); ObjectWriter value(); void save(); ``` The `key()` gives us an object writer that lets us access the key writer. If we restrict keys to be scalars, then we can just do: ``` ScalarWriter key(); ``` Values can, I imagine, be of any type. So, the `ObjectWriter` lets us work with them generically. Suppose we had a `DICT`, we could do: ``` DictWriter dictWriter = rowWriter.dict("myDict"); dictWriter.key().setString("fred"); dictWriter.value().scalar().setDouble(123.45) dictWriter.save(); dictWriter.key().setString("barney"); dictWriter.value().scalar().setDouble(98.76) dictWriter.save(); ``` This means that, in the `ObjectWriter()`, we need to add a new method, `dict()`, which will return a `DictWriter`, and we need to define the `DictWriter` interface. There is quite a bit of commentary in the column accessor package that (tries) to explain the structure behind these writers. Perhaps that might help explain the ideas here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950397#comment-16950397 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r334287580 ## File path: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.record.metadata; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.vector.complex.DictVector; + +public class DictBuilder implements SchemaContainer { + + private final SchemaContainer parent; + private final TupleBuilder tupleBuilder = new TupleBuilder(); + private final String memberName; + private final TypeProtos.DataMode mode; + + public DictBuilder(String memberName, TypeProtos.DataMode mode) { +this(null, memberName, mode); + } + + public DictBuilder(SchemaContainer parent, String memberName, TypeProtos.DataMode mode) { +this.parent = parent; +this.memberName = memberName; +this.mode = mode; + } + + @Override + public void addColumn(ColumnMetadata column) { +assert DictVector.fieldNames.contains(column.name()); +tupleBuilder.addColumn(column); + } + + public DictBuilder addKey(TypeProtos.MajorType type) { +return add(MaterializedField.create(DictVector.FIELD_KEY_NAME, type)); + } + + public DictBuilder addValue(TypeProtos.MajorType type) { +return add(MaterializedField.create(DictVector.FIELD_VALUE_NAME, type)); + } + + public DictBuilder add(MaterializedField col) { Review comment: Probably no longer need this form. It is in the code to ease conversion from old code, but does not actually seem to have been used. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950405#comment-16950405 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r334287763 ## File path: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.record.metadata; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.vector.complex.DictVector; + +public class DictBuilder implements SchemaContainer { + + private final SchemaContainer parent; + private final TupleBuilder tupleBuilder = new TupleBuilder(); + private final String memberName; + private final TypeProtos.DataMode mode; + + public DictBuilder(String memberName, TypeProtos.DataMode mode) { +this(null, memberName, mode); + } + + public DictBuilder(SchemaContainer parent, String memberName, TypeProtos.DataMode mode) { +this.parent = parent; +this.memberName = memberName; +this.mode = mode; + } + + @Override + public void addColumn(ColumnMetadata column) { +assert DictVector.fieldNames.contains(column.name()); +tupleBuilder.addColumn(column); + } + + public DictBuilder addKey(TypeProtos.MajorType type) { +return add(MaterializedField.create(DictVector.FIELD_KEY_NAME, type)); + } + + public DictBuilder addValue(TypeProtos.MajorType type) { +return add(MaterializedField.create(DictVector.FIELD_VALUE_NAME, type)); + } + + public DictBuilder add(MaterializedField col) { +assert DictVector.fieldNames.contains(col.getName()); +tupleBuilder.add(col); +return this; + } + + public DictBuilder addKey(TypeProtos.MinorType type, TypeProtos.DataMode mode) { +tupleBuilder.add(DictVector.FIELD_KEY_NAME, type, mode); +return this; + } + + public DictBuilder addValue(TypeProtos.MinorType type, TypeProtos.DataMode mode) { +tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type, mode); +return this; + } + + public DictBuilder addKey(TypeProtos.MinorType type) { +tupleBuilder.add(DictVector.FIELD_KEY_NAME, type); +return this; + } + + public DictBuilder addValue(TypeProtos.MinorType type) { +tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type); +return this; + } + + public DictBuilder addKey(TypeProtos.MinorType type, int width) { +tupleBuilder.add(DictVector.FIELD_KEY_NAME, type, width); +return this; + } + + public DictBuilder addValue(TypeProtos.MinorType type, int width) { +tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type, width); +return this; + } + + public DictBuilder addKey(TypeProtos.MinorType type, int precision, int scale) { +return addDecimal(DictVector.FIELD_KEY_NAME, type, TypeProtos.DataMode.REQUIRED, precision, scale); + } + + public DictBuilder addValue(TypeProtos.MinorType type, int precision, int scale) { +return addDecimal(DictVector.FIELD_VALUE_NAME, type, TypeProtos.DataMode.REQUIRED, precision, scale); + } + + public DictBuilder addNullable(String name, TypeProtos.MinorType type) { Review comment: Yeah, all these forms make sense when only working with the top-level row, but become tedious when repeated or all these specialized builders... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee:
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950400#comment-16950400 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r334287912 ## File path: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.record.metadata; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.vector.complex.DictVector; + +public class DictBuilder implements SchemaContainer { + + private final SchemaContainer parent; + private final TupleBuilder tupleBuilder = new TupleBuilder(); + private final String memberName; + private final TypeProtos.DataMode mode; + + public DictBuilder(String memberName, TypeProtos.DataMode mode) { +this(null, memberName, mode); + } + + public DictBuilder(SchemaContainer parent, String memberName, TypeProtos.DataMode mode) { +this.parent = parent; +this.memberName = memberName; +this.mode = mode; + } + + @Override + public void addColumn(ColumnMetadata column) { +assert DictVector.fieldNames.contains(column.name()); +tupleBuilder.addColumn(column); + } + + public DictBuilder addKey(TypeProtos.MajorType type) { +return add(MaterializedField.create(DictVector.FIELD_KEY_NAME, type)); + } + + public DictBuilder addValue(TypeProtos.MajorType type) { +return add(MaterializedField.create(DictVector.FIELD_VALUE_NAME, type)); + } + + public DictBuilder add(MaterializedField col) { +assert DictVector.fieldNames.contains(col.getName()); +tupleBuilder.add(col); +return this; + } + + public DictBuilder addKey(TypeProtos.MinorType type, TypeProtos.DataMode mode) { +tupleBuilder.add(DictVector.FIELD_KEY_NAME, type, mode); +return this; + } + + public DictBuilder addValue(TypeProtos.MinorType type, TypeProtos.DataMode mode) { +tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type, mode); +return this; + } + + public DictBuilder addKey(TypeProtos.MinorType type) { +tupleBuilder.add(DictVector.FIELD_KEY_NAME, type); +return this; + } + + public DictBuilder addValue(TypeProtos.MinorType type) { +tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type); +return this; + } + + public DictBuilder addKey(TypeProtos.MinorType type, int width) { +tupleBuilder.add(DictVector.FIELD_KEY_NAME, type, width); +return this; + } + + public DictBuilder addValue(TypeProtos.MinorType type, int width) { +tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type, width); +return this; + } + + public DictBuilder addKey(TypeProtos.MinorType type, int precision, int scale) { +return addDecimal(DictVector.FIELD_KEY_NAME, type, TypeProtos.DataMode.REQUIRED, precision, scale); + } + + public DictBuilder addValue(TypeProtos.MinorType type, int precision, int scale) { +return addDecimal(DictVector.FIELD_VALUE_NAME, type, TypeProtos.DataMode.REQUIRED, precision, scale); + } + + public DictBuilder addNullable(String name, TypeProtos.MinorType type) { +tupleBuilder.addNullable(name, type); +return this; + } + + public DictBuilder addNullable(String name, TypeProtos.MinorType type, int width) { +tupleBuilder.addNullable(name, type, width); +return this; + } + + public DictBuilder addNullable(String name, TypeProtos.MinorType type, int precision, int scale) { +return addDecimal(name, type, TypeProtos.DataMode.OPTIONAL, precision, scale); + } + + public DictBuilder addArrayValue(TypeProtos.MinorType type) { +tupleBuilder.addArray(DictVector.FIELD_VALUE_NAME, type); +return this; + } + + public DictBuilder addArrayValue(TypeProtos.MinorType type, int dims) { +tupleBuilder.addArray(DictVector.FIELD_VALUE_NAME, type, dims); +return this; + } + + public DictBuilder addArrayValue(TypeProtos.MinorType type, int precision,
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950404#comment-16950404 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r334287471 ## File path: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.record.metadata; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.vector.complex.DictVector; + +public class DictBuilder implements SchemaContainer { + + private final SchemaContainer parent; + private final TupleBuilder tupleBuilder = new TupleBuilder(); + private final String memberName; + private final TypeProtos.DataMode mode; + + public DictBuilder(String memberName, TypeProtos.DataMode mode) { +this(null, memberName, mode); + } + + public DictBuilder(SchemaContainer parent, String memberName, TypeProtos.DataMode mode) { +this.parent = parent; +this.memberName = memberName; +this.mode = mode; + } + + @Override + public void addColumn(ColumnMetadata column) { Review comment: If we agree DICT is DICT then we might want the building of a DICT to be something like: ``` TupleMetadata = new SchemaBuilder() .addDict("myDict") .key(MinorType.VARCHAR) .valueArray(MinorType.DOUBLE) .resumeSchema() ... ``` That is, we need to add one key. Can keys only be scalars? Only non-nullable? If so, then the single `key()` method is all that is needed. This returns the same builder. Can the value be of any type? Then we need the full set of type builders: scalar, array, map and so on. The current form does not make this clear: what does it mean to add a column? This isn't a map... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950403#comment-16950403 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r334287737 ## File path: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.record.metadata; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.vector.complex.DictVector; + +public class DictBuilder implements SchemaContainer { + + private final SchemaContainer parent; + private final TupleBuilder tupleBuilder = new TupleBuilder(); + private final String memberName; + private final TypeProtos.DataMode mode; + + public DictBuilder(String memberName, TypeProtos.DataMode mode) { +this(null, memberName, mode); + } + + public DictBuilder(SchemaContainer parent, String memberName, TypeProtos.DataMode mode) { +this.parent = parent; +this.memberName = memberName; +this.mode = mode; + } + + @Override + public void addColumn(ColumnMetadata column) { +assert DictVector.fieldNames.contains(column.name()); +tupleBuilder.addColumn(column); + } + + public DictBuilder addKey(TypeProtos.MajorType type) { +return add(MaterializedField.create(DictVector.FIELD_KEY_NAME, type)); + } + + public DictBuilder addValue(TypeProtos.MajorType type) { +return add(MaterializedField.create(DictVector.FIELD_VALUE_NAME, type)); + } + + public DictBuilder add(MaterializedField col) { +assert DictVector.fieldNames.contains(col.getName()); +tupleBuilder.add(col); +return this; + } + + public DictBuilder addKey(TypeProtos.MinorType type, TypeProtos.DataMode mode) { +tupleBuilder.add(DictVector.FIELD_KEY_NAME, type, mode); +return this; + } + + public DictBuilder addValue(TypeProtos.MinorType type, TypeProtos.DataMode mode) { +tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type, mode); +return this; + } + + public DictBuilder addKey(TypeProtos.MinorType type) { +tupleBuilder.add(DictVector.FIELD_KEY_NAME, type); +return this; + } + + public DictBuilder addValue(TypeProtos.MinorType type) { +tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type); +return this; + } + + public DictBuilder addKey(TypeProtos.MinorType type, int width) { +tupleBuilder.add(DictVector.FIELD_KEY_NAME, type, width); +return this; + } + + public DictBuilder addValue(TypeProtos.MinorType type, int width) { +tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type, width); +return this; + } + + public DictBuilder addKey(TypeProtos.MinorType type, int precision, int scale) { Review comment: This form is used for `DECIMAL`. Does it make sense to allow a `DECIMAL` for a key? Maybe here, and in the width case above, just allow a `MajorType`. Maybe, in general, the `SchemaBuilder` should have relied on the `Types` methods to build up such types rather than having methods for each... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950396#comment-16950396 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on issue #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#issuecomment-541439848 Going to review this in multiple stages. First, let's pin down the desired semantics of the DICT type. See code comments. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950402#comment-16950402 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r334287531 ## File path: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.record.metadata; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.vector.complex.DictVector; + +public class DictBuilder implements SchemaContainer { + + private final SchemaContainer parent; + private final TupleBuilder tupleBuilder = new TupleBuilder(); + private final String memberName; + private final TypeProtos.DataMode mode; + + public DictBuilder(String memberName, TypeProtos.DataMode mode) { +this(null, memberName, mode); + } + + public DictBuilder(SchemaContainer parent, String memberName, TypeProtos.DataMode mode) { +this.parent = parent; +this.memberName = memberName; +this.mode = mode; + } + + @Override + public void addColumn(ColumnMetadata column) { +assert DictVector.fieldNames.contains(column.name()); +tupleBuilder.addColumn(column); + } + + public DictBuilder addKey(TypeProtos.MajorType type) { Review comment: Good, but we allow only one key, do we not? If so, then rather than `addKey()` (which suggests I can add multiple), a simple `key()` would be simpler. Also, we should check if the key type was already set and not allow setting it a second time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950399#comment-16950399 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r334287620 ## File path: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.record.metadata; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.vector.complex.DictVector; + +public class DictBuilder implements SchemaContainer { + + private final SchemaContainer parent; + private final TupleBuilder tupleBuilder = new TupleBuilder(); + private final String memberName; + private final TypeProtos.DataMode mode; + + public DictBuilder(String memberName, TypeProtos.DataMode mode) { +this(null, memberName, mode); + } + + public DictBuilder(SchemaContainer parent, String memberName, TypeProtos.DataMode mode) { +this.parent = parent; +this.memberName = memberName; +this.mode = mode; + } + + @Override + public void addColumn(ColumnMetadata column) { +assert DictVector.fieldNames.contains(column.name()); +tupleBuilder.addColumn(column); + } + + public DictBuilder addKey(TypeProtos.MajorType type) { +return add(MaterializedField.create(DictVector.FIELD_KEY_NAME, type)); + } + + public DictBuilder addValue(TypeProtos.MajorType type) { +return add(MaterializedField.create(DictVector.FIELD_VALUE_NAME, type)); + } + + public DictBuilder add(MaterializedField col) { +assert DictVector.fieldNames.contains(col.getName()); +tupleBuilder.add(col); +return this; + } + + public DictBuilder addKey(TypeProtos.MinorType type, TypeProtos.DataMode mode) { Review comment: Does it make sense for a key to be repeated or nullable? Isn't non-nullable the only valid key mode? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950401#comment-16950401 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r334287799 ## File path: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.record.metadata; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.vector.complex.DictVector; + +public class DictBuilder implements SchemaContainer { + + private final SchemaContainer parent; + private final TupleBuilder tupleBuilder = new TupleBuilder(); + private final String memberName; + private final TypeProtos.DataMode mode; + + public DictBuilder(String memberName, TypeProtos.DataMode mode) { +this(null, memberName, mode); + } + + public DictBuilder(SchemaContainer parent, String memberName, TypeProtos.DataMode mode) { +this.parent = parent; +this.memberName = memberName; +this.mode = mode; + } + + @Override + public void addColumn(ColumnMetadata column) { +assert DictVector.fieldNames.contains(column.name()); +tupleBuilder.addColumn(column); + } + + public DictBuilder addKey(TypeProtos.MajorType type) { +return add(MaterializedField.create(DictVector.FIELD_KEY_NAME, type)); + } + + public DictBuilder addValue(TypeProtos.MajorType type) { +return add(MaterializedField.create(DictVector.FIELD_VALUE_NAME, type)); + } + + public DictBuilder add(MaterializedField col) { +assert DictVector.fieldNames.contains(col.getName()); +tupleBuilder.add(col); +return this; + } + + public DictBuilder addKey(TypeProtos.MinorType type, TypeProtos.DataMode mode) { +tupleBuilder.add(DictVector.FIELD_KEY_NAME, type, mode); +return this; + } + + public DictBuilder addValue(TypeProtos.MinorType type, TypeProtos.DataMode mode) { +tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type, mode); +return this; + } + + public DictBuilder addKey(TypeProtos.MinorType type) { +tupleBuilder.add(DictVector.FIELD_KEY_NAME, type); +return this; + } + + public DictBuilder addValue(TypeProtos.MinorType type) { +tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type); +return this; + } + + public DictBuilder addKey(TypeProtos.MinorType type, int width) { +tupleBuilder.add(DictVector.FIELD_KEY_NAME, type, width); +return this; + } + + public DictBuilder addValue(TypeProtos.MinorType type, int width) { +tupleBuilder.add(DictVector.FIELD_VALUE_NAME, type, width); +return this; + } + + public DictBuilder addKey(TypeProtos.MinorType type, int precision, int scale) { +return addDecimal(DictVector.FIELD_KEY_NAME, type, TypeProtos.DataMode.REQUIRED, precision, scale); + } + + public DictBuilder addValue(TypeProtos.MinorType type, int precision, int scale) { +return addDecimal(DictVector.FIELD_VALUE_NAME, type, TypeProtos.DataMode.REQUIRED, precision, scale); + } + + public DictBuilder addNullable(String name, TypeProtos.MinorType type) { +tupleBuilder.addNullable(name, type); +return this; + } + + public DictBuilder addNullable(String name, TypeProtos.MinorType type, int width) { +tupleBuilder.addNullable(name, type, width); +return this; + } + + public DictBuilder addNullable(String name, TypeProtos.MinorType type, int precision, int scale) { +return addDecimal(name, type, TypeProtos.DataMode.OPTIONAL, precision, scale); + } + + public DictBuilder addArrayValue(TypeProtos.MinorType type) { +tupleBuilder.addArray(DictVector.FIELD_VALUE_NAME, type); +return this; + } + + public DictBuilder addArrayValue(TypeProtos.MinorType type, int dims) { +tupleBuilder.addArray(DictVector.FIELD_VALUE_NAME, type, dims); +return this; + } + + public DictBuilder addArrayValue(TypeProtos.MinorType type, int precision,
[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework
[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950398#comment-16950398 ] ASF GitHub Bot commented on DRILL-7359: --- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r334287167 ## File path: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/DictBuilder.java ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.record.metadata; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.vector.complex.DictVector; + +public class DictBuilder implements SchemaContainer { Review comment: Would be great to add a few comments to explain the DICT structure. Is it of the form DICT, where the key is of one type, the value of another? Or, can keys and values be of different types? Since a MAP is a collection of (key, value) pairs in which the keys are declared, the keys are strings, and the values can each be of a distinct type, it is not likely that DICT does the same. I will assume the DICT format in my comments here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > - > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Fix For: 1.17.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)