[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior
[ https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421452#comment-16421452 ] ASF GitHub Bot commented on DRILL-6234: --- Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/1164 > Improve Documentation of VariableWidthVector Behavior > - > > Key: DRILL-6234 > URL: https://issues.apache.org/jira/browse/DRILL-6234 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > Labels: ready-to-commit > Fix For: 1.14.0 > > > Doing the following will throw an Index out of bounds exception. > {code} > final VarCharVector vector = new VarCharVector(field, allocator); > vector.allocateNew(); > vector.getMutator().setValueCount(100); > {code} > The expected behavior is to resize the array appropriately. If an index is > uninitialized you should not call get for that index. > {code} > at > org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80) > at > org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86) > at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114) > at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484) > at > org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432) > at > org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729) > at > org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior
[ https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398089#comment-16398089 ] ASF GitHub Bot commented on DRILL-6234: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1164 @paul-rogers Applied review comments. > Improve Documentation of VariableWidthVector Behavior > - > > Key: DRILL-6234 > URL: https://issues.apache.org/jira/browse/DRILL-6234 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > Fix For: 1.14.0 > > > Doing the following will throw an Index out of bounds exception. > {code} > final VarCharVector vector = new VarCharVector(field, allocator); > vector.allocateNew(); > vector.getMutator().setValueCount(100); > {code} > The expected behavior is to resize the array appropriately. If an index is > uninitialized you should not call get for that index. > {code} > at > org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80) > at > org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86) > at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114) > at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484) > at > org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432) > at > org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729) > at > org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior
[ https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398085#comment-16398085 ] ASF GitHub Bot commented on DRILL-6234: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174350127 --- Diff: exec/vector/src/test/java/org/apache/drill/exec/vector/VariableLengthVectorTest.java --- @@ -0,0 +1,152 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.types.Types; +import org.apache.drill.exec.memory.RootAllocator; +import org.apache.drill.exec.record.MaterializedField; +import org.junit.Assert; +import org.junit.Test; + +/** + * This test uses {@link VarCharVector} to test the template code in VariableLengthVector. + */ +public class VariableLengthVectorTest +{ + /** + * If the vector contains 1000 records, setting a value count of 1000 should work. + */ + @Test + public void testSettingSameValueCount() + { +try (RootAllocator allocator = new RootAllocator(10_000_000)) { + final MaterializedField field = MaterializedField.create("stringCol", Types.required(TypeProtos.MinorType.VARCHAR)); + final VarCharVector vector = new VarCharVector(field, allocator); + + vector.allocateNew(); + + try { +final int size = 1000; +final VarCharVector.Mutator mutator = vector.getMutator(); +final VarCharVector.Accessor accessor = vector.getAccessor(); + +setSafeIndexStrings("", 0, size, mutator); + +mutator.setValueCount(size); +Assert.assertEquals(size, accessor.getValueCount()); +checkIndexStrings("", 0, size, accessor); + } finally { +vector.clear(); + } +} + } + + /** + * Test trunicating data. If you have 1 records, reduce the vector to 1000 records. + */ + @Test + public void testTrunicateVectorSetValueCount() + { +try (RootAllocator allocator = new RootAllocator(10_000_000)) { + final MaterializedField field = MaterializedField.create("stringCol", Types.required(TypeProtos.MinorType.VARCHAR)); + final VarCharVector vector = new VarCharVector(field, allocator); + + vector.allocateNew(); + + try { +final int size = 1000; +final int fluffSize = 1; +final VarCharVector.Mutator mutator = vector.getMutator(); +final VarCharVector.Accessor accessor = vector.getAccessor(); + +setSafeIndexStrings("", 0, size, mutator); +setSafeIndexStrings("first cut ", size, fluffSize, mutator); + +mutator.setValueCount(fluffSize); +Assert.assertEquals(fluffSize, accessor.getValueCount()); + +mutator.setValueCount(size); +Assert.assertEquals(size, accessor.getValueCount()); +setSafeIndexStrings("redone cut ", size, fluffSize, mutator); --- End diff -- Yikes! I didn't know this. Thanks for catching. > Improve Documentation of VariableWidthVector Behavior > - > > Key: DRILL-6234 > URL: https://issues.apache.org/jira/browse/DRILL-6234 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > Fix For: 1.14.0 > > > Doing the following will throw an Index out of bounds exception. > {code} > final VarCharVector vector = new VarCharVector(field, allocator); > vector.allocateNew(); > vector.getMutator().setValueCount(100); > {code} > The expected behavior is to resize the array appropriately. If an index is > uninitialized you should not call get for that
[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior
[ https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398082#comment-16398082 ] ASF GitHub Bot commented on DRILL-6234: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174349986 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -514,6 +516,22 @@ public boolean isNull(int index){ * The equivalent Java primitive is '${minor.javaType!type.javaType}' * * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. + * + * Contract --- End diff -- This is a good way to layout the information. I switched the javadoc to follow this outline. > Improve Documentation of VariableWidthVector Behavior > - > > Key: DRILL-6234 > URL: https://issues.apache.org/jira/browse/DRILL-6234 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > Fix For: 1.14.0 > > > Doing the following will throw an Index out of bounds exception. > {code} > final VarCharVector vector = new VarCharVector(field, allocator); > vector.allocateNew(); > vector.getMutator().setValueCount(100); > {code} > The expected behavior is to resize the array appropriately. If an index is > uninitialized you should not call get for that index. > {code} > at > org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80) > at > org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86) > at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114) > at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484) > at > org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432) > at > org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729) > at > org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior
[ https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398080#comment-16398080 ] ASF GitHub Bot commented on DRILL-6234: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174349888 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -514,6 +516,22 @@ public boolean isNull(int index){ * The equivalent Java primitive is '${minor.javaType!type.javaType}' * * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. + * + * Contract + * + * Variable length vectors do not support random writes. All set methods must be called for with a monotonically increasing consecutive sequence of indexes. + * It is possible to trim the vector by setting the value count to be less than the number of values currently container in the vector with {@link #setValueCount(int)}, then + * the process of setting values starts with the index after the last index. + * + * + * It is also possible to back track and set the value at an index earlier than the current index, however, the caller must assume that all data container after the last + * set index is corrupted. + * + * Notes + * + * There is no gaurantee the data buffer for the {@link VariableWidthVector} will have enough space to contain the data you set unless you use setSafe. If you + * use set you may get array index out of bounds exceptions. --- End diff -- Liked this refactored phrasing > Improve Documentation of VariableWidthVector Behavior > - > > Key: DRILL-6234 > URL: https://issues.apache.org/jira/browse/DRILL-6234 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > Fix For: 1.14.0 > > > Doing the following will throw an Index out of bounds exception. > {code} > final VarCharVector vector = new VarCharVector(field, allocator); > vector.allocateNew(); > vector.getMutator().setValueCount(100); > {code} > The expected behavior is to resize the array appropriately. If an index is > uninitialized you should not call get for that index. > {code} > at > org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80) > at > org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86) > at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114) > at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484) > at > org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432) > at > org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729) > at > org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior
[ https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398077#comment-16398077 ] ASF GitHub Bot commented on DRILL-6234: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174349801 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -514,6 +516,22 @@ public boolean isNull(int index){ * The equivalent Java primitive is '${minor.javaType!type.javaType}' * * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. + * + * Contract + * + * Variable length vectors do not support random writes. All set methods must be called for with a monotonically increasing consecutive sequence of indexes. + * It is possible to trim the vector by setting the value count to be less than the number of values currently container in the vector with {@link #setValueCount(int)}, then + * the process of setting values starts with the index after the last index. + * + * + * It is also possible to back track and set the value at an index earlier than the current index, however, the caller must assume that all data container after the last --- End diff -- Updated > Improve Documentation of VariableWidthVector Behavior > - > > Key: DRILL-6234 > URL: https://issues.apache.org/jira/browse/DRILL-6234 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > Fix For: 1.14.0 > > > Doing the following will throw an Index out of bounds exception. > {code} > final VarCharVector vector = new VarCharVector(field, allocator); > vector.allocateNew(); > vector.getMutator().setValueCount(100); > {code} > The expected behavior is to resize the array appropriately. If an index is > uninitialized you should not call get for that index. > {code} > at > org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80) > at > org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86) > at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114) > at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484) > at > org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432) > at > org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729) > at > org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior
[ https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398079#comment-16398079 ] ASF GitHub Bot commented on DRILL-6234: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174349826 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -514,6 +516,22 @@ public boolean isNull(int index){ * The equivalent Java primitive is '${minor.javaType!type.javaType}' * * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. + * + * Contract + * + * Variable length vectors do not support random writes. All set methods must be called for with a monotonically increasing consecutive sequence of indexes. + * It is possible to trim the vector by setting the value count to be less than the number of values currently container in the vector with {@link #setValueCount(int)}, then + * the process of setting values starts with the index after the last index. + * + * + * It is also possible to back track and set the value at an index earlier than the current index, however, the caller must assume that all data container after the last + * set index is corrupted. --- End diff -- Added > Improve Documentation of VariableWidthVector Behavior > - > > Key: DRILL-6234 > URL: https://issues.apache.org/jira/browse/DRILL-6234 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > Fix For: 1.14.0 > > > Doing the following will throw an Index out of bounds exception. > {code} > final VarCharVector vector = new VarCharVector(field, allocator); > vector.allocateNew(); > vector.getMutator().setValueCount(100); > {code} > The expected behavior is to resize the array appropriately. If an index is > uninitialized you should not call get for that index. > {code} > at > org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80) > at > org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86) > at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114) > at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484) > at > org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432) > at > org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729) > at > org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior
[ https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398075#comment-16398075 ] ASF GitHub Bot commented on DRILL-6234: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174349707 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -506,6 +506,8 @@ public boolean isNull(int index){ } /** + * Overview --- End diff -- Fixed > Improve Documentation of VariableWidthVector Behavior > - > > Key: DRILL-6234 > URL: https://issues.apache.org/jira/browse/DRILL-6234 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > Fix For: 1.14.0 > > > Doing the following will throw an Index out of bounds exception. > {code} > final VarCharVector vector = new VarCharVector(field, allocator); > vector.allocateNew(); > vector.getMutator().setValueCount(100); > {code} > The expected behavior is to resize the array appropriately. If an index is > uninitialized you should not call get for that index. > {code} > at > org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80) > at > org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86) > at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114) > at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484) > at > org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432) > at > org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729) > at > org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior
[ https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398069#comment-16398069 ] ASF GitHub Bot commented on DRILL-6234: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174348633 --- Diff: exec/vector/src/test/java/org/apache/drill/exec/vector/VariableLengthVectorTest.java --- @@ -0,0 +1,152 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.types.Types; +import org.apache.drill.exec.memory.RootAllocator; +import org.apache.drill.exec.record.MaterializedField; +import org.junit.Assert; +import org.junit.Test; + +/** + * This test uses {@link VarCharVector} to test the template code in VariableLengthVector. + */ +public class VariableLengthVectorTest +{ + /** + * If the vector contains 1000 records, setting a value count of 1000 should work. + */ + @Test + public void testSettingSameValueCount() + { +try (RootAllocator allocator = new RootAllocator(10_000_000)) { + final MaterializedField field = MaterializedField.create("stringCol", Types.required(TypeProtos.MinorType.VARCHAR)); + final VarCharVector vector = new VarCharVector(field, allocator); + + vector.allocateNew(); + + try { +final int size = 1000; +final VarCharVector.Mutator mutator = vector.getMutator(); +final VarCharVector.Accessor accessor = vector.getAccessor(); + +setSafeIndexStrings("", 0, size, mutator); + +mutator.setValueCount(size); +Assert.assertEquals(size, accessor.getValueCount()); +checkIndexStrings("", 0, size, accessor); + } finally { +vector.clear(); + } +} + } + + /** + * Test trunicating data. If you have 1 records, reduce the vector to 1000 records. + */ + @Test + public void testTrunicateVectorSetValueCount() + { +try (RootAllocator allocator = new RootAllocator(10_000_000)) { + final MaterializedField field = MaterializedField.create("stringCol", Types.required(TypeProtos.MinorType.VARCHAR)); + final VarCharVector vector = new VarCharVector(field, allocator); + + vector.allocateNew(); + + try { +final int size = 1000; +final int fluffSize = 1; +final VarCharVector.Mutator mutator = vector.getMutator(); +final VarCharVector.Accessor accessor = vector.getAccessor(); + +setSafeIndexStrings("", 0, size, mutator); +setSafeIndexStrings("first cut ", size, fluffSize, mutator); + +mutator.setValueCount(fluffSize); +Assert.assertEquals(fluffSize, accessor.getValueCount()); + +mutator.setValueCount(size); +Assert.assertEquals(size, accessor.getValueCount()); +setSafeIndexStrings("redone cut ", size, fluffSize, mutator); +mutator.setValueCount(fluffSize); +Assert.assertEquals(fluffSize, accessor.getValueCount()); + +mutator.setValueCount(size); +Assert.assertEquals(size, accessor.getValueCount()); + +checkIndexStrings("", 0, size, accessor); + + } finally { +vector.clear(); + } +} + } + + /** + * Set 1 values. Then go back and set new values starting at the 1001 the record. --- End diff -- I agree the vector writers should be used. The reason why I was looking into this is that I saw strange behavior in the legacy HashTable where setValueCount was being called with a larger valueCount than there was data in a VarCharVector. I did an ugly (and now I think incorrect work around) for the issue and set about to make setValueCount
[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior
[ https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398031#comment-16398031 ] ASF GitHub Bot commented on DRILL-6234: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174343151 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -514,6 +516,22 @@ public boolean isNull(int index){ * The equivalent Java primitive is '${minor.javaType!type.javaType}' * * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. + * + * Contract + * + * Variable length vectors do not support random writes. All set methods must be called for with a monotonically increasing consecutive sequence of indexes. --- End diff -- Thanks for bringing this up. I'm sharing a design doc on the dev list tomorrow or the day after about how I plan to refactor HashAgg. It will cover how to facilitate unit tests and how to change the memory handling to use a deterministic calculator like the SortMemoryManager and soon to be introduced HashJoinMemoryCalculator (instead of catch OOMs). Perhaps you could comment on the doc about how to set ourselves up to fix this case. > Improve Documentation of VariableWidthVector Behavior > - > > Key: DRILL-6234 > URL: https://issues.apache.org/jira/browse/DRILL-6234 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > > Doing the following will throw an Index out of bounds exception. > {code} > final VarCharVector vector = new VarCharVector(field, allocator); > vector.allocateNew(); > vector.getMutator().setValueCount(100); > {code} > The expected behavior is to resize the array appropriately. If an index is > uninitialized you should not call get for that index. > {code} > at > org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80) > at > org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86) > at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114) > at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484) > at > org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432) > at > org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729) > at > org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior
[ https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397932#comment-16397932 ] ASF GitHub Bot commented on DRILL-6234: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174328025 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -514,6 +516,22 @@ public boolean isNull(int index){ * The equivalent Java primitive is '${minor.javaType!type.javaType}' * * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. + * + * Contract --- End diff -- Might be worth summarizing how to use this: 1) Write to values sequentially. Fixed-width vectors allow random access, but special care is needed. 2) Keep track in client code of the total time count. Call `setValueCount()` once the vector is full to set the final value. (The vector does not know its count while a write is in progress.) 3) Either take responsibility for allocating enough memory, or call the `setSafe()` methods to automatically extend the vector. 4) Once vectors are written, they are immutable; no additional writes of any kind are allowed to that vector. > Improve Documentation of VariableWidthVector Behavior > - > > Key: DRILL-6234 > URL: https://issues.apache.org/jira/browse/DRILL-6234 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > > Doing the following will throw an Index out of bounds exception. > {code} > final VarCharVector vector = new VarCharVector(field, allocator); > vector.allocateNew(); > vector.getMutator().setValueCount(100); > {code} > The expected behavior is to resize the array appropriately. If an index is > uninitialized you should not call get for that index. > {code} > at > org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80) > at > org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86) > at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114) > at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484) > at > org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432) > at > org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729) > at > org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior
[ https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397935#comment-16397935 ] ASF GitHub Bot commented on DRILL-6234: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174327182 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -514,6 +516,22 @@ public boolean isNull(int index){ * The equivalent Java primitive is '${minor.javaType!type.javaType}' * * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. + * + * Contract + * + * Variable length vectors do not support random writes. All set methods must be called for with a monotonically increasing consecutive sequence of indexes. + * It is possible to trim the vector by setting the value count to be less than the number of values currently container in the vector with {@link #setValueCount(int)}, then + * the process of setting values starts with the index after the last index. + * + * + * It is also possible to back track and set the value at an index earlier than the current index, however, the caller must assume that all data container after the last + * set index is corrupted. --- End diff -- Maybe add "changing the index does not release memory after the index." > Improve Documentation of VariableWidthVector Behavior > - > > Key: DRILL-6234 > URL: https://issues.apache.org/jira/browse/DRILL-6234 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > > Doing the following will throw an Index out of bounds exception. > {code} > final VarCharVector vector = new VarCharVector(field, allocator); > vector.allocateNew(); > vector.getMutator().setValueCount(100); > {code} > The expected behavior is to resize the array appropriately. If an index is > uninitialized you should not call get for that index. > {code} > at > org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80) > at > org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86) > at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114) > at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484) > at > org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432) > at > org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729) > at > org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior
[ https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397931#comment-16397931 ] ASF GitHub Bot commented on DRILL-6234: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174328361 --- Diff: exec/vector/src/test/java/org/apache/drill/exec/vector/VariableLengthVectorTest.java --- @@ -0,0 +1,152 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.types.Types; +import org.apache.drill.exec.memory.RootAllocator; +import org.apache.drill.exec.record.MaterializedField; +import org.junit.Assert; +import org.junit.Test; + +/** + * This test uses {@link VarCharVector} to test the template code in VariableLengthVector. + */ +public class VariableLengthVectorTest +{ + /** + * If the vector contains 1000 records, setting a value count of 1000 should work. + */ + @Test + public void testSettingSameValueCount() + { +try (RootAllocator allocator = new RootAllocator(10_000_000)) { + final MaterializedField field = MaterializedField.create("stringCol", Types.required(TypeProtos.MinorType.VARCHAR)); + final VarCharVector vector = new VarCharVector(field, allocator); + + vector.allocateNew(); + + try { +final int size = 1000; +final VarCharVector.Mutator mutator = vector.getMutator(); +final VarCharVector.Accessor accessor = vector.getAccessor(); + +setSafeIndexStrings("", 0, size, mutator); + +mutator.setValueCount(size); +Assert.assertEquals(size, accessor.getValueCount()); +checkIndexStrings("", 0, size, accessor); + } finally { +vector.clear(); + } +} + } + + /** + * Test trunicating data. If you have 1 records, reduce the vector to 1000 records. + */ + @Test + public void testTrunicateVectorSetValueCount() + { +try (RootAllocator allocator = new RootAllocator(10_000_000)) { + final MaterializedField field = MaterializedField.create("stringCol", Types.required(TypeProtos.MinorType.VARCHAR)); + final VarCharVector vector = new VarCharVector(field, allocator); + + vector.allocateNew(); + + try { +final int size = 1000; +final int fluffSize = 1; +final VarCharVector.Mutator mutator = vector.getMutator(); +final VarCharVector.Accessor accessor = vector.getAccessor(); + +setSafeIndexStrings("", 0, size, mutator); +setSafeIndexStrings("first cut ", size, fluffSize, mutator); + +mutator.setValueCount(fluffSize); +Assert.assertEquals(fluffSize, accessor.getValueCount()); + +mutator.setValueCount(size); +Assert.assertEquals(size, accessor.getValueCount()); +setSafeIndexStrings("redone cut ", size, fluffSize, mutator); --- End diff -- While this works, we are actually violating the vector contract which is "once the value count is set, the vector becomes immutable." If the client is not done writing to the vector, the client should maintain the value count until it is finally done. > Improve Documentation of VariableWidthVector Behavior > - > > Key: DRILL-6234 > URL: https://issues.apache.org/jira/browse/DRILL-6234 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > > Doing the following will throw an Index out of bounds exception. > {code} > final VarCharVector vector = new VarCharVector(field, allocator); > vector.allocateNew(); >
[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior
[ https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397933#comment-16397933 ] ASF GitHub Bot commented on DRILL-6234: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174327511 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -514,6 +516,22 @@ public boolean isNull(int index){ * The equivalent Java primitive is '${minor.javaType!type.javaType}' * * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. + * + * Contract + * + * Variable length vectors do not support random writes. All set methods must be called for with a monotonically increasing consecutive sequence of indexes. --- End diff -- This is very important to know. This is why spill-to-disk for hash agg will eventually cause a serious customer failure. Aggregate UDFs write to vectors to store intermediate group values. A "max" string can't. Instead, it writes to a Java object. That object will be lost on spill and reread. Will result in loosing prior max values and the aggregate starting over. So, this little note is not just a nuisance, it is the fatal flaw in how we handle the (albeit obscure) string aggregate values. > Improve Documentation of VariableWidthVector Behavior > - > > Key: DRILL-6234 > URL: https://issues.apache.org/jira/browse/DRILL-6234 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > > Doing the following will throw an Index out of bounds exception. > {code} > final VarCharVector vector = new VarCharVector(field, allocator); > vector.allocateNew(); > vector.getMutator().setValueCount(100); > {code} > The expected behavior is to resize the array appropriately. If an index is > uninitialized you should not call get for that index. > {code} > at > org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80) > at > org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86) > at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114) > at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484) > at > org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432) > at > org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729) > at > org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior
[ https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397936#comment-16397936 ] ASF GitHub Bot commented on DRILL-6234: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174328662 --- Diff: exec/vector/src/test/java/org/apache/drill/exec/vector/VariableLengthVectorTest.java --- @@ -0,0 +1,152 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.types.Types; +import org.apache.drill.exec.memory.RootAllocator; +import org.apache.drill.exec.record.MaterializedField; +import org.junit.Assert; +import org.junit.Test; + +/** + * This test uses {@link VarCharVector} to test the template code in VariableLengthVector. + */ +public class VariableLengthVectorTest +{ + /** + * If the vector contains 1000 records, setting a value count of 1000 should work. + */ + @Test + public void testSettingSameValueCount() + { +try (RootAllocator allocator = new RootAllocator(10_000_000)) { + final MaterializedField field = MaterializedField.create("stringCol", Types.required(TypeProtos.MinorType.VARCHAR)); + final VarCharVector vector = new VarCharVector(field, allocator); + + vector.allocateNew(); + + try { +final int size = 1000; +final VarCharVector.Mutator mutator = vector.getMutator(); +final VarCharVector.Accessor accessor = vector.getAccessor(); + +setSafeIndexStrings("", 0, size, mutator); + +mutator.setValueCount(size); +Assert.assertEquals(size, accessor.getValueCount()); +checkIndexStrings("", 0, size, accessor); + } finally { +vector.clear(); + } +} + } + + /** + * Test trunicating data. If you have 1 records, reduce the vector to 1000 records. + */ + @Test + public void testTrunicateVectorSetValueCount() + { +try (RootAllocator allocator = new RootAllocator(10_000_000)) { + final MaterializedField field = MaterializedField.create("stringCol", Types.required(TypeProtos.MinorType.VARCHAR)); + final VarCharVector vector = new VarCharVector(field, allocator); + + vector.allocateNew(); + + try { +final int size = 1000; +final int fluffSize = 1; +final VarCharVector.Mutator mutator = vector.getMutator(); +final VarCharVector.Accessor accessor = vector.getAccessor(); + +setSafeIndexStrings("", 0, size, mutator); +setSafeIndexStrings("first cut ", size, fluffSize, mutator); + +mutator.setValueCount(fluffSize); +Assert.assertEquals(fluffSize, accessor.getValueCount()); + +mutator.setValueCount(size); +Assert.assertEquals(size, accessor.getValueCount()); +setSafeIndexStrings("redone cut ", size, fluffSize, mutator); +mutator.setValueCount(fluffSize); +Assert.assertEquals(fluffSize, accessor.getValueCount()); + +mutator.setValueCount(size); +Assert.assertEquals(size, accessor.getValueCount()); + +checkIndexStrings("", 0, size, accessor); + + } finally { +vector.clear(); + } +} + } + + /** + * Set 1 values. Then go back and set new values starting at the 1001 the record. --- End diff -- Just FYI: the vector writers handle all this stuff for you. They allow overwriting the most recent value. They keep track of vector counts and data offsets. And so on. This is why I can offer such detailed comments: I learned how all this works when creating those classes. Would be very wonderful to start reusing that work rather than
[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior
[ https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397929#comment-16397929 ] ASF GitHub Bot commented on DRILL-6234: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174327132 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -514,6 +516,22 @@ public boolean isNull(int index){ * The equivalent Java primitive is '${minor.javaType!type.javaType}' * * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. + * + * Contract + * + * Variable length vectors do not support random writes. All set methods must be called for with a monotonically increasing consecutive sequence of indexes. + * It is possible to trim the vector by setting the value count to be less than the number of values currently container in the vector with {@link #setValueCount(int)}, then + * the process of setting values starts with the index after the last index. + * + * + * It is also possible to back track and set the value at an index earlier than the current index, however, the caller must assume that all data container after the last --- End diff -- "all data container after" --> "all data after the updated index" ? > Improve Documentation of VariableWidthVector Behavior > - > > Key: DRILL-6234 > URL: https://issues.apache.org/jira/browse/DRILL-6234 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > > Doing the following will throw an Index out of bounds exception. > {code} > final VarCharVector vector = new VarCharVector(field, allocator); > vector.allocateNew(); > vector.getMutator().setValueCount(100); > {code} > The expected behavior is to resize the array appropriately. If an index is > uninitialized you should not call get for that index. > {code} > at > org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80) > at > org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86) > at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114) > at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484) > at > org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432) > at > org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729) > at > org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior
[ https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397934#comment-16397934 ] ASF GitHub Bot commented on DRILL-6234: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174327601 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -514,6 +516,22 @@ public boolean isNull(int index){ * The equivalent Java primitive is '${minor.javaType!type.javaType}' * * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. + * + * Contract + * + * Variable length vectors do not support random writes. All set methods must be called for with a monotonically increasing consecutive sequence of indexes. + * It is possible to trim the vector by setting the value count to be less than the number of values currently container in the vector with {@link #setValueCount(int)}, then + * the process of setting values starts with the index after the last index. + * + * + * It is also possible to back track and set the value at an index earlier than the current index, however, the caller must assume that all data container after the last + * set index is corrupted. + * + * Notes + * + * There is no gaurantee the data buffer for the {@link VariableWidthVector} will have enough space to contain the data you set unless you use setSafe. If you + * use set you may get array index out of bounds exceptions. --- End diff -- Said another way, either 1) be careful to manage your own memory, or 2) call `setSafe()`. That is, in fact, why `setSafe()` exists. > Improve Documentation of VariableWidthVector Behavior > - > > Key: DRILL-6234 > URL: https://issues.apache.org/jira/browse/DRILL-6234 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > > Doing the following will throw an Index out of bounds exception. > {code} > final VarCharVector vector = new VarCharVector(field, allocator); > vector.allocateNew(); > vector.getMutator().setValueCount(100); > {code} > The expected behavior is to resize the array appropriately. If an index is > uninitialized you should not call get for that index. > {code} > at > org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80) > at > org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86) > at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114) > at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484) > at > org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432) > at > org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729) > at > org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior
[ https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397930#comment-16397930 ] ASF GitHub Bot commented on DRILL-6234: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174327027 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -506,6 +506,8 @@ public boolean isNull(int index){ } /** + * Overview --- End diff -- Nit, but I think that h4 is the usual heading level used in Java doc comments. The higher levels are used in the surrounding generated HTML. > Improve Documentation of VariableWidthVector Behavior > - > > Key: DRILL-6234 > URL: https://issues.apache.org/jira/browse/DRILL-6234 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > > Doing the following will throw an Index out of bounds exception. > {code} > final VarCharVector vector = new VarCharVector(field, allocator); > vector.allocateNew(); > vector.getMutator().setValueCount(100); > {code} > The expected behavior is to resize the array appropriately. If an index is > uninitialized you should not call get for that index. > {code} > at > org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80) > at > org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86) > at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114) > at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484) > at > org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432) > at > org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729) > at > org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior
[ https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397921#comment-16397921 ] Paul Rogers commented on DRILL-6234: [~timothyfarkas], not pointless at all. You are going though the Drill University of Hard Knocks. If we could do this over, vector value count should be maintained in the vector for ever write. No reason to keep it in application code (most operators) or the vector writers (new code). The only reason we don't do it the "right way" is that historical decision to base vectors on network buffers. If we everĀ get a chance to replace {{DrillBuf}} with fixed-width buffers chained together (or some other implementation), we should certainly revisit this issue. > Improve Documentation of VariableWidthVector Behavior > - > > Key: DRILL-6234 > URL: https://issues.apache.org/jira/browse/DRILL-6234 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > > Doing the following will throw an Index out of bounds exception. > {code} > final VarCharVector vector = new VarCharVector(field, allocator); > vector.allocateNew(); > vector.getMutator().setValueCount(100); > {code} > The expected behavior is to resize the array appropriately. If an index is > uninitialized you should not call get for that index. > {code} > at > org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80) > at > org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86) > at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114) > at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484) > at > org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432) > at > org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729) > at > org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior
[ https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397900#comment-16397900 ] ASF GitHub Bot commented on DRILL-6234: --- GitHub user ilooner opened a pull request: https://github.com/apache/drill/pull/1164 DRILL-6234: Improved documentation for VariableWidthVector mutators I had some confusion about how setValueCount should behave for variable width vectors. I added some documentation and unit tests which explain its behavior so that others don't waste time in the future. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ilooner/drill DRILL-6234 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1164.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1164 commit 4f13fd2873a9b510a9d0105ad87f72792aa46314 Author: Timothy FarkasDate: 2018-03-14T00:24:28Z DRILL-6234: Improved documentation for VariableWidthVector mutators, and added simple unit tests demonstrating mutator behavior. > Improve Documentation of VariableWidthVector Behavior > - > > Key: DRILL-6234 > URL: https://issues.apache.org/jira/browse/DRILL-6234 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > > Doing the following will throw an Index out of bounds exception. > {code} > final VarCharVector vector = new VarCharVector(field, allocator); > vector.allocateNew(); > vector.getMutator().setValueCount(100); > {code} > The expected behavior is to resize the array appropriately. If an index is > uninitialized you should not call get for that index. > {code} > at > org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80) > at > org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86) > at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114) > at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484) > at > org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432) > at > org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729) > at > org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior
[ https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397901#comment-16397901 ] ASF GitHub Bot commented on DRILL-6234: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1164 @paul-rogers > Improve Documentation of VariableWidthVector Behavior > - > > Key: DRILL-6234 > URL: https://issues.apache.org/jira/browse/DRILL-6234 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > > Doing the following will throw an Index out of bounds exception. > {code} > final VarCharVector vector = new VarCharVector(field, allocator); > vector.allocateNew(); > vector.getMutator().setValueCount(100); > {code} > The expected behavior is to resize the array appropriately. If an index is > uninitialized you should not call get for that index. > {code} > at > org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80) > at > org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86) > at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114) > at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484) > at > org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432) > at > org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729) > at > org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.lang.reflect.Method.invoke(Method.java:606) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)