[jira] [Commented] (DRILL-6234) VarCharVector setValueCount can throw IndexOutOfBoundsException

2018-03-13 Thread Timothy Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397856#comment-16397856
 ] 

Timothy Farkas commented on DRILL-6234:
---

Hi [~paul-rogers], I agree with your analysis. What I tried doing was having 
the vector remember it's value count without being explicitly told it. In the 
case of variable length vectors the value count is effectively last index 
written to + 1. If someone set a larger value count we would simply increase 
the size of the offset vector but internally use the last written index + 1 as 
the effective value count. If someone sets a smaller value count than the 
current, then we trim the vector as the method does now. I got things to work 
as I wanted in isolation, but things broke down in the integration tests when 
transferTo, load, and getMetadata were called. Then I realized how pointless 
this whole thing was and you were right all along :) and gave up. I'm going to 
salvage some of the Javadoc and unit tests I wrote and supplement it with some 
of your explanation in an effort to prevent other people from being confused by 
this. I'll open a small PR with the documentation improvements and tag you on 
it.

> VarCharVector setValueCount can throw IndexOutOfBoundsException
> ---
>
> Key: DRILL-6234
> URL: https://issues.apache.org/jira/browse/DRILL-6234
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> Doing the following will throw an Index out of bounds exception.
> {code}
>   final VarCharVector vector = new VarCharVector(field, allocator);
>   vector.allocateNew();
>   vector.getMutator().setValueCount(100);
> {code}
> The expected behavior is to resize the array appropriately. If an index is 
> uninitialized you should not call get for that index.
> {code}
>   at 
> org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80)
>   at 
> org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86)
>   at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114)
>   at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484)
>   at 
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432)
>   at 
> org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729)
>   at 
> org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6234) VarCharVector setValueCount can throw IndexOutOfBoundsException

2018-03-12 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396335#comment-16396335
 ] 

Paul Rogers commented on DRILL-6234:


[~timothyfarkas], the tricky bit here is that your proposal is not simply a bug 
fix; it is a change of design. Now, the way that the value count works is 
decidedly odd: one would naively think that each vector would know how much 
data it holds. The value count name itself is misleading; all that call really 
does is set the end-of-buffer location for the underlying {{ByteBuf}}. (This 
is, in turn, because a value vector is not like a database buffer, it is 
instead modeled on a network serialization/deserialization buffer.)

The call in {{VarCharVector}} fails because we need to set two positions: the 
end of the offsets and the end of the values. To get the end of the values, we 
index into the offsets to find the position of the (non-existent) n+1st item, 
and use that as the length of the values.

If we've not written any values, than the offsets are all zero (if the vector 
has been resized, or if this is the first allocation from direct memory) or 
filled with garbage (if this is the initial vector allocation and the buffer 
has been reused from the free list in which case we don't zero the vector.)

So, it turns out, *there is no way* to set the end position of the values 
vector until we write data. This is obviously not true of a fixed vector (such 
as the offsets) since we know the width of each value. But, it is very true of 
variable-width vectors (VarChar) or arrays.

Bottom line: find some other way to accomplish your goal. The Drill protocol 
expects the client to keep track of the item count until the vector is 
finished. For example, that is one of the tasks that the result set loader 
handles: keeping track of the top-level values, and the counts of values in 
nested arrays of maps of arrays of...

> VarCharVector setValueCount can throw IndexOutOfBoundsException
> ---
>
> Key: DRILL-6234
> URL: https://issues.apache.org/jira/browse/DRILL-6234
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> Doing the following will throw an Index out of bounds exception.
> {code}
>   final VarCharVector vector = new VarCharVector(field, allocator);
>   vector.allocateNew();
>   vector.getMutator().setValueCount(100);
> {code}
> The expected behavior is to resize the array appropriately. If an index is 
> uninitialized you should not call get for that index.
> {code}
>   at 
> org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80)
>   at 
> org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86)
>   at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114)
>   at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484)
>   at 
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432)
>   at 
> org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729)
>   at 
> org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6234) VarCharVector setValueCount can throw IndexOutOfBoundsException

2018-03-12 Thread Timothy Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396243#comment-16396243
 ] 

Timothy Farkas commented on DRILL-6234:
---

[~paul-rogers] The motivation for this was to make handling cases where you 
don't know the amount of data that will go into the vector upfront a little 
simpler. Such a case is a BatchHolder in a hash table where it is desirable to 
grow the vectors as you get more data instead preallocating a large chunk of 
memory that could go unused. You are right, an index out of bound exception 
does occur when setting more elements than there are in the vector, however, I 
assumed things would behave as they do with the FixedWidth vectors. For example 
Uint4Vector resizes the array to fit valueCount elements when setValueCount is 
called even if setValueCount. So if we can make the behavior consistent without 
any negative side effects, then why not do it?

> VarCharVector setValueCount can throw IndexOutOfBoundsException
> ---
>
> Key: DRILL-6234
> URL: https://issues.apache.org/jira/browse/DRILL-6234
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> Doing the following will throw an Index out of bounds exception.
> {code}
>   final VarCharVector vector = new VarCharVector(field, allocator);
>   vector.allocateNew();
>   vector.getMutator().setValueCount(100);
> {code}
> The expected behavior is to resize the array appropriately. If an index is 
> uninitialized you should not call get for that index.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)