[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458166#comment-16458166 ] ASF GitHub Bot commented on DRILL-6202: --- Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/1144 > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Labels: ready-to-commit > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16455505#comment-16455505 ] ASF GitHub Bot commented on DRILL-6202: --- Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1144 Looks good. +1 > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16454483#comment-16454483 ] ASF GitHub Bot commented on DRILL-6202: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1144#discussion_r184452335 --- Diff: src/main/resources/checkstyle-config.xml --- @@ -30,10 +30,15 @@ + --- End diff -- My point is that there should be no special handling for IOBE compared to NPE or CCE. If it is in process call to a Java library all run-time exceptions including NPE, IOBE and CCE should be propagated to the caller. If it is a remote call, runtime and checked exceptions need to be handled by propagating them to a remote caller as well. What an end user tool displays to an end user depends on a tool. IMO, the best option is to have a configuration parameter that determines verbosity level. With default verbosity level, the tool may show a generic error message (possibly explaining how to increase verbosity level to get more info, see `maven -X`), and there should be an option to change the verbosity level and print a stack trace that initially triggered the error. Such stack trace is required when an end user logs a bug or asks for help on the dev list. > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453280#comment-16453280 ] ASF GitHub Bot commented on DRILL-6202: --- Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1144#discussion_r184242257 --- Diff: src/main/resources/checkstyle-config.xml --- @@ -30,10 +30,15 @@ + --- End diff -- So what do you tell the user when you get a runtime exception (any exception) that is the result of a bug? It is silly to show them a stack trace that does not help them. It is better to let the user know that there was an internal error and they should log a bug with support or look for help on the dev list. > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449061#comment-16449061 ] ASF GitHub Bot commented on DRILL-6202: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1144#discussion_r183571786 --- Diff: src/main/resources/checkstyle-config.xml --- @@ -30,10 +30,15 @@ + --- End diff -- IMO the same applies to all run-time exceptions (NPE, CCE, IOBE) and likely conversion for NPE is the same as for IOBE. Note that most likely the root cause of an IOBE or NPE is a bug and not an end user error, misconfiguration, invalid input or a lack of resources, so end user won't be able to "fix" NPE or IOBE. > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448836#comment-16448836 ] ASF GitHub Bot commented on DRILL-6202: --- Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1144#discussion_r183530901 --- Diff: src/main/resources/checkstyle-config.xml --- @@ -30,10 +30,15 @@ + --- End diff -- I think IOBE should be caught and converted to a UserException so that the end user does not get an incomprehensible error message. > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433186#comment-16433186 ] ASF GitHub Bot commented on DRILL-6202: --- Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1144#discussion_r180598658 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -534,15 +534,11 @@ public void setSafe(int index, byte[] bytes) { assert index >= 0; final int currentOffset = offsetVector.getAccessor().get(index); - offsetVector.getMutator().setSafe(index + 1, currentOffset + bytes.length); - try { -data.setBytes(currentOffset, bytes, 0, bytes.length); - } catch (IndexOutOfBoundsException e) { -while (data.capacity() < currentOffset + bytes.length) { - reAlloc(); -} -data.setBytes(currentOffset, bytes, 0, bytes.length); + while (data.capacity() < currentOffset + bytes.length) { --- End diff -- I'm on the fence on this one. The change from 18 months ago resulted in a performance improvement of a few percent for variable length columns. This change essentially undoes that and takes us back to where we were. It seems that with the recent changes to batch sizing, we are on the path to eliminating the realloc behaviour that is part of the performance bottleneck for the setSafe methods. So, perhaps, it might be ok to let this one stay. > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425944#comment-16425944 ] ASF GitHub Bot commented on DRILL-6202: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1144#discussion_r179231151 --- Diff: src/main/resources/checkstyle-config.xml --- @@ -30,10 +30,15 @@ + --- End diff -- I commented out the default rule value with the addition of IndexOutOfBoundsException. IndexOutOfBoundsException can be used, but it is not common to handle it. > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425933#comment-16425933 ] ASF GitHub Bot commented on DRILL-6202: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1144#discussion_r179229891 --- Diff: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/writer/BaseScalarWriter.java --- @@ -211,7 +211,7 @@ protected boolean canExpand(int delta) { protected void overflowed() { if (listener == null) { - throw new IndexOutOfBoundsException("Overflow not supported"); + throw new UnsupportedOperationException("Overflow not supported"); --- End diff -- I also considered `OversizedAllocationException` but IMO `UnsupportedOperationException` is a better fit. > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425912#comment-16425912 ] ASF GitHub Bot commented on DRILL-6202: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1144#discussion_r179226227 --- Diff: exec/memory/base/src/main/java/io/netty/buffer/DrillBuf.java --- @@ -777,23 +778,20 @@ public int getActualMemoryConsumed() { * @return A hex dump in a String. */ public String toHexString(final int start, final int length) { -final int roundedStart = (start / LOG_BYTES_PER_ROW) * LOG_BYTES_PER_ROW; - -final StringBuilder sb = new StringBuilder("buffer byte dump\n"); -int index = roundedStart; -for (int nLogged = 0; nLogged < length; nLogged += LOG_BYTES_PER_ROW) { - sb.append(String.format(" [%05d-%05d]", index, index + LOG_BYTES_PER_ROW - 1)); - for (int i = 0; i < LOG_BYTES_PER_ROW; ++i) { -try { - final byte b = getByte(index++); - sb.append(String.format(" 0x%02x", b)); -} catch (IndexOutOfBoundsException ioob) { - sb.append(" "); -} +Preconditions.checkArgument(start >= 0); +final StringBuilder sb = new StringBuilder("buffer byte dump"); +final int end = Math.min(length, this.length - start); +for (int i = 0; i < end; i++) { + if (i % LOG_BYTES_PER_ROW == 0) { +sb.append(String.format("%n [%05d-%05d]", i + start, Math.min(i + LOG_BYTES_PER_ROW - 1, end - 1) + start)); } - sb.append('\n'); + byte b = _getByte(i + start); + sb.append(" 0x").append(HEX_CHAR[b >> 4]).append(HEX_CHAR[b & 0x0F]); --- End diff -- The goal is to avoid new String allocation for every output byte inside DrillBuf. > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424987#comment-16424987 ] ASF GitHub Bot commented on DRILL-6202: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1144#discussion_r179023675 --- Diff: src/main/resources/checkstyle-suppressions.xml --- @@ -16,4 +16,13 @@ + Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424988#comment-16424988 ] ASF GitHub Bot commented on DRILL-6202: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1144#discussion_r179023466 --- Diff: src/main/resources/checkstyle-config.xml --- @@ -30,10 +30,15 @@ + --- End diff -- Do we want to do this? Why are all these forbidden? `Throwable` was needed by @Ben-Zvi in his very first Drill bug fix. `RuntimeException` is the root of the unchecked exceptions beloved by Drill code and is used as a general "something went wrong" placeholder. These are all commented out, but still... Finally, `IndexOutOfBoundsException` is a perfectly fine exception. Adding this line is probably handy to catch all reference as part of this fix, but probably not helpful as a general policy. > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424984#comment-16424984 ] ASF GitHub Bot commented on DRILL-6202: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1144#discussion_r179022533 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -534,15 +534,11 @@ public void setSafe(int index, byte[] bytes) { assert index >= 0; final int currentOffset = offsetVector.getAccessor().get(index); - offsetVector.getMutator().setSafe(index + 1, currentOffset + bytes.length); - try { -data.setBytes(currentOffset, bytes, 0, bytes.length); - } catch (IndexOutOfBoundsException e) { -while (data.capacity() < currentOffset + bytes.length) { - reAlloc(); -} -data.setBytes(currentOffset, bytes, 0, bytes.length); + while (data.capacity() < currentOffset + bytes.length) { --- End diff -- One trick used in the row set reader code is to avoid redundant re-allocs by computing the new size and rounding to a power of two. This is handy at the start with the vector doubles from 256 to 512 to 1K and the data size is, say, 600 bytes. However, it is not clear that such an optimization is as necessary as it once was: the team's been doing a good job at adding up-front vector allocation where it was missing, reducing the gratuitous reallocs. > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424983#comment-16424983 ] ASF GitHub Bot commented on DRILL-6202: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1144#discussion_r179022338 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -534,15 +534,11 @@ public void setSafe(int index, byte[] bytes) { assert index >= 0; final int currentOffset = offsetVector.getAccessor().get(index); - offsetVector.getMutator().setSafe(index + 1, currentOffset + bytes.length); - try { -data.setBytes(currentOffset, bytes, 0, bytes.length); - } catch (IndexOutOfBoundsException e) { -while (data.capacity() < currentOffset + bytes.length) { - reAlloc(); -} -data.setBytes(currentOffset, bytes, 0, bytes.length); + while (data.capacity() < currentOffset + bytes.length) { --- End diff -- If we compare this implementation to the one from, say, a year or 18 months ago, I think we're back where we started. This would actually be a good use case for a "checkedSetBytes" method that does the bounds checks. Still, whether done here or in the underlying method, the `if` statement is needed, so no savings in using a "checked" method, > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424985#comment-16424985 ] ASF GitHub Bot commented on DRILL-6202: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1144#discussion_r179022090 --- Diff: exec/memory/base/src/main/java/io/netty/buffer/DrillBuf.java --- @@ -777,23 +778,20 @@ public int getActualMemoryConsumed() { * @return A hex dump in a String. */ public String toHexString(final int start, final int length) { -final int roundedStart = (start / LOG_BYTES_PER_ROW) * LOG_BYTES_PER_ROW; - -final StringBuilder sb = new StringBuilder("buffer byte dump\n"); -int index = roundedStart; -for (int nLogged = 0; nLogged < length; nLogged += LOG_BYTES_PER_ROW) { - sb.append(String.format(" [%05d-%05d]", index, index + LOG_BYTES_PER_ROW - 1)); - for (int i = 0; i < LOG_BYTES_PER_ROW; ++i) { -try { - final byte b = getByte(index++); - sb.append(String.format(" 0x%02x", b)); -} catch (IndexOutOfBoundsException ioob) { - sb.append(" "); -} +Preconditions.checkArgument(start >= 0); +final StringBuilder sb = new StringBuilder("buffer byte dump"); +final int end = Math.min(length, this.length - start); +for (int i = 0; i < end; i++) { + if (i % LOG_BYTES_PER_ROW == 0) { +sb.append(String.format("%n [%05d-%05d]", i + start, Math.min(i + LOG_BYTES_PER_ROW - 1, end - 1) + start)); } - sb.append('\n'); + byte b = _getByte(i + start); + sb.append(" 0x").append(HEX_CHAR[b >> 4]).append(HEX_CHAR[b & 0x0F]); --- End diff -- Is this complexity needed relative to the original `format(" 0x02x", b)` implementation? Is this code performance sensitive enough to justify the extra complexity? > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424986#comment-16424986 ] ASF GitHub Bot commented on DRILL-6202: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1144#discussion_r179023079 --- Diff: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/writer/BaseScalarWriter.java --- @@ -211,7 +211,7 @@ protected boolean canExpand(int delta) { protected void overflowed() { if (listener == null) { - throw new IndexOutOfBoundsException("Overflow not supported"); + throw new UnsupportedOperationException("Overflow not supported"); --- End diff -- Maybe change to `OversizedAllocationException`. See `FixedValueVectors.setInitialCapacity()`. > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424957#comment-16424957 ] ASF GitHub Bot commented on DRILL-6202: --- Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1144 My two cents... DrillBuf is the only memory-level abstraction that (low level) Drill code should reference. The UDLE and other bits should be fully encapsulated. This guideline lets us evolve the representation if we ever need to do so. The original design appeared to be that value vectors would be the primary interface to memory. But, a great many issues made that difficult, not least of which is that vector access methods are heavily typed, resulting in far too much casting. Also, the mutator methods try to do the full operation, leading to inefficiency (especially around VarChars). A more general rule is that application code should work with vectors until they can migrate to working with the result set loader or reader. (We should probably call these the row set emitter and collector to be more Hadoop-like...) The higher-level abstractions handle the grunt work currently spread throughout operators. (And, to answer a prior question: we want to use the row set abstractions so we have a uniform way to write to vectors, to control batch size, to handle schema issues and so on on write. And, to have a standard way to handle indirection vectors and vector navigation on read.) Ideally only, the vector mutator or row set loader implementation works with DrillBuf to do actual data reads and writes. In an early version, the row set loader code used `PlatformDependent` to avoid bounds checks. But, with @vrozov's improvements, doing so became unnecessary -- a nice improvement. Still, bounds checks should be done during tests: it is handy to work with a safety net. Since bounds checks are optional (turned off in production), then the changes here make good sense: no code should count on bounds checks from the "unchecked" methods for the simple reason that the checks are normally off. That said, if there is a reason to have "checked" access, we could provide such methods. Those methods would throw the `IndexOutOfBoundsException`. That is, the checked methods would recreate the original "get/set" methods prior to @vrozov's improvements. I can't think of a reason to do that off the top of my head, but someone might present a valid use case. > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424595#comment-16424595 ] Vlad Rozov commented on DRILL-6202: --- [~sachouche] ??Your Jira doesn't quite explain the "why" you intend to deprecate the IndexOutOfBoundException (since it is an unchecked exception)?? Please see JIRA description and DRILL-6004 for details why {{IndexOutOfBoundException}} should not be used to re-alloc vectors. > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424580#comment-16424580 ] ASF GitHub Bot commented on DRILL-6202: --- Github user vrozov commented on the issue: https://github.com/apache/drill/pull/1144 IMO, this PR or DRILL-6202 is not the best place to discuss boundary checking as the PR and JIRA deals with `IndexOutOfBoundsException` but does not change how DrillBuf, Vector or Operators ensure that reads and writes are done within proper boundaries. I'll bring that to dev@drill. Please repeat your arguments on the mailing list. > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424563#comment-16424563 ] ASF GitHub Bot commented on DRILL-6202: --- Github user sachouche commented on the issue: https://github.com/apache/drill/pull/1144 I have added a comment within this PR associated JIRA: [DRILL-6202](https://issues.apache.org/jira/browse/DRILL-6202) > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424555#comment-16424555 ] salim achouche commented on DRILL-6202: --- This is my take on the Drill boundary checks: _*Short Term -*_ * Ideally, the Drill boundary checks should be always on as long as ** The impact of a Drillbit process crash (or data corruption) is big since there is no built-in fault-tolerance ** The code is extensible and extensions are allowed to access direct memory * Having said that, my priority would have been to minimize the cost associated with these checks instead of completely turning them off ** This is no different from Java's behavior with regard to array boundary checks * How do we do that? Actually there are multiple strategies (which could be combined) ** Fine-grained checks *** Add boundary checks within +all+ DrillBuf data accessors *** Invoke the accessor API within a loop and ensure the JVM is able to optimize the checks This will help you answer the question(s) around whether we access DM directly or through Netty ** Caller overwrite *** Allow caller to disable checks that are deemed too expensive or not easily optimizable by the HotSport (e.g., Reference Checks) *** This pattern works well for a centralized layer (e.g., Paul's accessor framework) but not for extensions as they cannot be always trusted to do the right thing *** To mitigate this, we could always have an auxiliary flag that will force execution of such checks if set; that is overwrite untrusted callers This should be done if a crash or corruption is observed ** Bulk Processing *** Bulk accessor APIs will allow +all the checks+ to be performed but with a minimal cost (amortized) _*Long Term -*_ * With the new Accessor Framework in place all DM checks should be primarily within this layer ** The promise of this layer is that other memory formats can be transparently substituted (e.g., Apache Arrow) * The question on whether the runtime checks are enabled by default becomes less important ** The chance of crash / corruption is highly minimized ** It should be rather easy for this layer to optimize the runtime checks; then the question becomes "why not?" _*Question -*_ * Your Jira doesn't quite explain the ** "why" you intend to deprecate the IndexOutOfBoundException (since it is an unchecked exception) ** And replace it with what other mechanism? *NOTE -* * To minimize bookkeeping complexity, Drill operators will upfront allocate memory for the variable length value vectors to minimize the cost of re-allocs * The setSafe() APIs are called (at least for Parquet) when the associated column ** Has enough VV space to insert the new value(s) ** Can extend the current VV to the next-power-of-two; the setSafe() api is responsible for extending the vector(s) > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16423828#comment-16423828 ] ASF GitHub Bot commented on DRILL-6202: --- Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1144 I think we need to include a few other folks into this. @paul-rogers, @sachouche, have also looked into the issue of excessive bounds checking and ways to write to direct memory with minimum overhead. Both Salim and Paul have done work which has tried to eliminate the excessive checking and use `PlatformDependent` directly, so it might be the right time to agree on the right approach here. At a high level, I believe there is agreement that we need to 1) reduce bounds checking to (preferably) once per write, and 2) to minimise the number of function calls before memory is actually written to. We have three layers where we could potentially check bounds - i) the operators, ii) the vectors, iii) DrillBuf. Right now, we do so at each level, at multiple times at that. Paul's work on batch sizing provides us with a layer that gives us the bounds check guarantees at the operator level. This means we could potentially use value-vectors' set methods (as opposed to the setSafe methods) and DrillBuf can use PlatformDependent directly. Some caveats - UDLE's check for and enforce little-endianness. Checking for endianness is important for value vectors because they assume little endian, but the enforcement is sometimes not so desirable. Drill's Java client uses the same DrillBuf backed by a UDLE and that means that client applications can no longer run on big endian machines (and yes, I have heard this request from end users). However, the fact is that UDLE's are an intrinsic part of the drill-memory design [1] [2]. Eliminating UDLE's can lead to re-doing large parts of very well tested code. The caveat to using the vector set methods is that the setSafe methods provide resizing capability that operators have come to rely upon. Switching from setSafe to set breaks practically every operator. [1] https://github.com/jacques-n/drill/blob/DRILL-4144/exec/memory/base/src/main/java/org/apache/drill/exec/memory/README.md [2] https://docs.google.com/nonceSigner?nonce=nj279efks0ro0&continue=https://doc-0o-as-docs.googleusercontent.com/docs/securesc/gipu3hlcf22l6svruqr71h7qe2k3djum/5v7eb2cm4bghq76nj658ai5hkk9h52ur/152274960/11021365158327727934/11021365158327727934/0B6CmYjIAywyCalVwcURkaFlkc1U?e%3Ddownload&hash=41l8jspccbj1pp63750c5von8ol4ijtl > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16423360#comment-16423360 ] ASF GitHub Bot commented on DRILL-6202: --- Github user vrozov commented on the issue: https://github.com/apache/drill/pull/1144 It is not clear why get/set Byte/Char/Short/Int/Long/Float/Double do not delegate to UDLE, while get/set Bytes delegates to UDLE and relies on netty 'AbstractByteBuf` for bounds checking. IMO, it will be good to have the behavior consistent for all methods. In many cases including `VariableLengthVectors`, there is no need to rely on UDLE boundary checking as a caller already provides or can provide a guarantee that an index is within a buffer boundaries. In those cases, boundary check becomes an extra cost. IMO, it will be good to have a consistent behavior with ability to enable bounds checking for debugging. > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419420#comment-16419420 ] ASF GitHub Bot commented on DRILL-6202: --- Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1144 To be honest, I don't know. There are too many paths available to write to direct memory, some which were developed over time and so not all vector code may be using them. Ideally, every bit of code should go thru DrillBuf which wraps a UDLE (UnsafeDirectLittleEndian). UDLE has one level of bounds check (which I feel is necessary) and uses PlatformDependent directly. Vector set safe methods provides vector level bounds checking and the vector set methods bypass bounds checking. ResultSetLoader methods provide guarantees that may make the set safe methods redundant. Our goal should be to not depend on the set safe methods except for debugging. What exactly are you thinking of when proposing using PlatformDependent directly. > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16417682#comment-16417682 ] ASF GitHub Bot commented on DRILL-6202: --- Github user vrozov commented on the issue: https://github.com/apache/drill/pull/1144 Is there a reason to delegate `get/setBytes` to `AbstractByteBuf`? If not, this PR will be a preparation step to use `PlatformDependent` directly bypassing Netty bounds checking. > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16389093#comment-16389093 ] ASF GitHub Bot commented on DRILL-6202: --- Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1144 AFAIK, IndexOutOfBoundsException is thrown by Netty when a write to a DrillBuf goes out of bounds (see https://github.com/netty/netty/blob/netty-4.0.48.Final/buffer/src/main/java/io/netty/buffer/AbstractByteBuf.java#L1123 and https://github.com/netty/netty/blob/netty-4.0.48.Final/buffer/src/main/java/io/netty/buffer/UnsafeByteBufUtil.java#L283 ). This is independent of the bounds check in Drill which may be disabled. > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.13.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16388974#comment-16388974 ] Pritesh Maker commented on DRILL-6202: -- [~parthc] did you get a chance to review this issue? > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.13.0 > > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382752#comment-16382752 ] ASF GitHub Bot commented on DRILL-6202: --- Github user vrozov commented on the issue: https://github.com/apache/drill/pull/1144 @parthchandra Please take a look. > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
[ https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382741#comment-16382741 ] ASF GitHub Bot commented on DRILL-6202: --- GitHub user vrozov opened a pull request: https://github.com/apache/drill/pull/1144 DRILL-6202: Deprecate usage of IndexOutOfBoundsException to re-alloc vectors You can merge this pull request into a Git repository by running: $ git pull https://github.com/vrozov/drill DRILL-6202 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1144.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1144 commit 2af94a07340f9f13aa152822c2c8d37568ab44ab Author: Vlad Rozov Date: 2018-03-01T17:36:05Z DRILL-6202: Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > Deprecate usage of IndexOutOfBoundsException to re-alloc vectors > > > Key: DRILL-6202 > URL: https://issues.apache.org/jira/browse/DRILL-6202 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Major > > As bounds checking may be enabled or disabled, using > IndexOutOfBoundsException to resize vectors is unreliable. It works only when > bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)