[jira] [Commented] (DRILL-6238) Batch sizing for operators

2018-03-13 Thread Pritesh Maker (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398103#comment-16398103
 ] 

Pritesh Maker commented on DRILL-6238:
--

[~ppenumarthy] I added links to a bunch of issues that are related to batch 
sizing. Please review to see if I missed any.

> Batch sizing for operators
> --
>
> Key: DRILL-6238
> URL: https://issues.apache.org/jira/browse/DRILL-6238
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
>
> *Batch Sizing For Operators*
> This document describes the approach we are taking for limiting batch sizes 
> for operators other than scan.
> *Motivation*
> Main goals are
>  # Improve concurrency
>  # Reduce query failures because of out of memory errors
> To accomplish these goals, we need to make queries execute within a specified 
> memory budget. To enforce a per-query memory limit, we need to be able to 
> enforce per-fragment and per-operator memory limits. Controlling individual 
> operators' batch sizes is the first step towards all this.
> *Background*
> In Drill, different operators have different limits w.r.t. outgoing batches. 
> Some use hard-coded row counts, some use hard-coded memory limits, and some 
> have none at all. Based on input data size and what the operator is doing, 
> memory used by the outgoing batch can vary widely because no limits are 
> imposed. Queries fail because we are not able to allocate the memory needed. 
> Some operators produce very large batches, causing blocking operators like 
> sort and hash agg, which have to work under tight memory constraints, to 
> fail. Batch size should be a function of available memory rather than of 
> input data size and/or what the operator does. Please refer to the table at 
> the end of this document for details on what each operator does today.
> *Design*
> The goal is to have all operators behave the same way, i.e. produce batches 
> with size less than or equal to the configured outgoing batch size, with a 
> minimum of 1 row and a maximum of 64K rows per batch. A new system option 
> ‘drill.exec.memory.operator.output_batch_size’ is added, with a default 
> value of 16MB.
> The basic idea is to limit the size of the outgoing batch by deciding how 
> many rows it can hold, based on the average entry size of each outgoing 
> column, taking into account actual data size and the metadata vector 
> overhead we add on top for tracking variable length, mode (repeated, 
> optional, required), etc. This calculation is different for each operator 
> and is based on
>  # What the operator is doing
>  # Incoming batch size that includes information on type and average size of 
> each column
>  # What is being projected out
> By taking this adaptive approach based on actual average data sizes, 
> operators that previously limited batches to fewer than 64K rows can 
> possibly fit many more rows (up to 64K) in a batch if the memory stays 
> within the budget. For example, flatten and joins have a batch size of 4K 
> rows, which was probably chosen to be conservative w.r.t. memory usage. 
> Letting these operators go up to 64K rows, as long as they stay within the 
> memory budget, should help improve performance.
> Also, to improve performance and utilize memory more efficiently, we will
>  # Allocate memory for value vectors upfront. Since we know the number of 
> rows and the sizing information for each column in the outgoing batch, we 
> will use that information to allocate memory for value vectors upfront. 
> Currently, we either do an initial allocation for 4K values and keep 
> doubling every time we need more, or allocate the maximum needed upfront. 
> Pre-allocating memory based on the sizing calculation improves performance 
> by avoiding the memory copies and the zeroing of the new half that we do 
> every time we double, and saves memory in cases where we were 
> over-allocating before.
>  # Round down the number of rows in the outgoing batch to a power of two. 
> Since memory is allocated in powers of two, this helps us pack the value 
> vectors densely, reducing the amount of memory wasted by the doubling 
> effect (see the sketch after this list).
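> As a rough illustration of the row-count math above (hypothetical names; 
> not the actual operator code):
> {code}
> // Sketch: pick the outgoing row count from a memory budget and the
> // average outgoing row width, then round down to a power of two.
> public static int rowsPerBatch(long outputBatchSizeBytes, long avgRowWidthBytes) {
>   long rows = outputBatchSizeBytes / Math.max(1L, avgRowWidthBytes);
>   rows = Math.min(64 * 1024, Math.max(1, rows)); // clamp to [1 row, 64K rows]
>   return Integer.highestOneBit((int) rows);      // round down to a power of two
> }
> {code}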
> So, to summarize, the benefits we will get are improved memory utilization, 
> better performance, higher concurrency, and fewer queries dying because of 
> out-of-memory errors.
> Note: since these sizing calculations are based on averages, strict memory 
> usage enforcement is not possible. In pathological cases of uneven data 
> distribution, we might exceed the configured output batch size, potentially 
> causing OOM errors and problems in downstream operators.
> Other issues that will be addressed:
>  * We are adding extra processing for each batch in each 

[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398089#comment-16398089
 ] 

ASF GitHub Bot commented on DRILL-6234:
---

Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1164
  
@paul-rogers Applied review comments.


> Improve Documentation of VariableWidthVector Behavior
> -
>
> Key: DRILL-6234
> URL: https://issues.apache.org/jira/browse/DRILL-6234
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.14.0
>
>
> Doing the following will throw an Index out of bounds exception.
> {code}
>   final VarCharVector vector = new VarCharVector(field, allocator);
>   vector.allocateNew();
>   vector.getMutator().setValueCount(100);
> {code}
> The expected behavior is to resize the array appropriately. If an index is 
> uninitialized you should not call get for that index.
> {code}
>   at 
> org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80)
>   at 
> org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86)
>   at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114)
>   at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484)
>   at 
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432)
>   at 
> org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729)
>   at 
> org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}
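> For contrast, a minimal sketch of the pattern that does work (values 
> written via setSafe before the count is set):
> {code}
>   final VarCharVector vector = new VarCharVector(field, allocator);
>   vector.allocateNew();
>   final byte[] bytes = "abc".getBytes();
>   for (int i = 0; i < 100; i++) {
>     vector.getMutator().setSafe(i, bytes, 0, bytes.length);
>   }
>   vector.getMutator().setValueCount(100); // values exist, so this succeeds
> {code}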



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6223) Drill fails on Schema changes

2018-03-13 Thread Pritesh Maker (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398087#comment-16398087
 ] 

Pritesh Maker commented on DRILL-6223:
--

[~sachouche] can you attach the PR as well? 

> Drill fails on Schema changes 
> --
>
> Key: DRILL-6223
> URL: https://issues.apache.org/jira/browse/DRILL-6223
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.10.0, 1.12.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.14.0
>
>
> Drill queries fail when selecting all columns from a complex nested 
> Parquet data set. There are differences in schema among the files:
>  * The Parquet files exhibit differences both at the first level and within 
> nested data types
>  * A select * will not cause an exception, but using a limit clause will
>  * Note also this issue seems to happen only when multiple Drillbit minor 
> fragments are involved (concurrency higher than one)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398085#comment-16398085
 ] 

ASF GitHub Bot commented on DRILL-6234:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1164#discussion_r174350127
  
--- Diff: 
exec/vector/src/test/java/org/apache/drill/exec/vector/VariableLengthVectorTest.java
 ---
@@ -0,0 +1,152 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.memory.RootAllocator;
+import org.apache.drill.exec.record.MaterializedField;
+import org.junit.Assert;
+import org.junit.Test;
+
+/**
+ * This test uses {@link VarCharVector} to test the template code in 
VariableLengthVector.
+ */
+public class VariableLengthVectorTest
+{
+  /**
+   * If the vector contains 1000 records, setting a value count of 1000 
should work.
+   */
+  @Test
+  public void testSettingSameValueCount()
+  {
+try (RootAllocator allocator = new RootAllocator(10_000_000)) {
+  final MaterializedField field = 
MaterializedField.create("stringCol", 
Types.required(TypeProtos.MinorType.VARCHAR));
+  final VarCharVector vector = new VarCharVector(field, allocator);
+
+  vector.allocateNew();
+
+  try {
+final int size = 1000;
+final VarCharVector.Mutator mutator = vector.getMutator();
+final VarCharVector.Accessor accessor = vector.getAccessor();
+
+setSafeIndexStrings("", 0, size, mutator);
+
+mutator.setValueCount(size);
+Assert.assertEquals(size, accessor.getValueCount());
+checkIndexStrings("", 0, size, accessor);
+  } finally {
+vector.clear();
+  }
+}
+  }
+
+  /**
+   * Test truncating data. If you have 10,000 records, reduce the vector 
to 1000 records.
+   */
+  @Test
+  public void testTrunicateVectorSetValueCount()
+  {
+try (RootAllocator allocator = new RootAllocator(10_000_000)) {
+  final MaterializedField field = 
MaterializedField.create("stringCol", 
Types.required(TypeProtos.MinorType.VARCHAR));
+  final VarCharVector vector = new VarCharVector(field, allocator);
+
+  vector.allocateNew();
+
+  try {
+final int size = 1000;
+final int fluffSize = 10_000;
+final VarCharVector.Mutator mutator = vector.getMutator();
+final VarCharVector.Accessor accessor = vector.getAccessor();
+
+setSafeIndexStrings("", 0, size, mutator);
+setSafeIndexStrings("first cut ", size, fluffSize, mutator);
+
+mutator.setValueCount(fluffSize);
+Assert.assertEquals(fluffSize, accessor.getValueCount());
+
+mutator.setValueCount(size);
+Assert.assertEquals(size, accessor.getValueCount());
+setSafeIndexStrings("redone cut ", size, fluffSize, mutator);
--- End diff --

Yikes! I didn't know this. Thanks for catching.


> Improve Documentation of VariableWidthVector Behavior
> -
>
> Key: DRILL-6234
> URL: https://issues.apache.org/jira/browse/DRILL-6234
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.14.0
>
>
> Doing the following will throw an Index out of bounds exception.
> {code}
>   final VarCharVector vector = new VarCharVector(field, allocator);
>   vector.allocateNew();
>   vector.getMutator().setValueCount(100);
> {code}
> The expected behavior is to resize the array appropriately. If an index is 
> uninitialized you should not call get for that 

[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398082#comment-16398082
 ] 

ASF GitHub Bot commented on DRILL-6234:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1164#discussion_r174349986
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java 
---
@@ -514,6 +516,22 @@ public boolean isNull(int index){
*   The equivalent Java primitive is '${minor.javaType!type.javaType}'
*
* NB: this class is automatically generated from ValueVectorTypes.tdd 
using FreeMarker.
+   * 
+   * Contract
--- End diff --

This is a good way to layout the information. I switched the javadoc to 
follow this outline.


> Improve Documentation of VariableWidthVector Behavior
> -
>
> Key: DRILL-6234
> URL: https://issues.apache.org/jira/browse/DRILL-6234
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.14.0
>
>
> Doing the following will throw an Index out of bounds exception.
> {code}
>   final VarCharVector vector = new VarCharVector(field, allocator);
>   vector.allocateNew();
>   vector.getMutator().setValueCount(100);
> {code}
> The expected behavior is to resize the array appropriately. If an index is 
> uninitialized you should not call get for that index.
> {code}
>   at 
> org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80)
>   at 
> org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86)
>   at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114)
>   at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484)
>   at 
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432)
>   at 
> org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729)
>   at 
> org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398080#comment-16398080
 ] 

ASF GitHub Bot commented on DRILL-6234:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1164#discussion_r174349888
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java 
---
@@ -514,6 +516,22 @@ public boolean isNull(int index){
*   The equivalent Java primitive is '${minor.javaType!type.javaType}'
*
* NB: this class is automatically generated from ValueVectorTypes.tdd 
using FreeMarker.
+   * 
+   * Contract
+   * 
+   *   Variable length vectors do not support random writes. All set 
methods must be called with a monotonically increasing, consecutive sequence 
of indexes.
+   *   It is possible to trim the vector by setting the value count to be 
less than the number of values currently contained in the vector with {@link 
#setValueCount(int)}; then
+   *   the process of setting values starts with the index after the last 
index.
+   * 
+   * 
+   *   It is also possible to back track and set the value at an index 
earlier than the current index, however, the caller must assume that all data 
contained after the last
+   *   set index is corrupted.
+   * 
+   * Notes
+   * 
+   *   There is no guarantee the data buffer for the {@link 
VariableWidthVector} will have enough space to contain the data you set unless 
you use setSafe. If you
+   *   use set, you may get array index out of bounds exceptions.
--- End diff --

Liked this refactored phrasing


> Improve Documentation of VariableWidthVector Behavior
> -
>
> Key: DRILL-6234
> URL: https://issues.apache.org/jira/browse/DRILL-6234
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.14.0
>
>
> Doing the following will throw an Index out of bounds exception.
> {code}
>   final VarCharVector vector = new VarCharVector(field, allocator);
>   vector.allocateNew();
>   vector.getMutator().setValueCount(100);
> {code}
> The expected behavior is to resize the array appropriately. If an index is 
> uninitialized you should not call get for that index.
> {code}
>   at 
> org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80)
>   at 
> org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86)
>   at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114)
>   at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484)
>   at 
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432)
>   at 
> org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729)
>   at 
> org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398077#comment-16398077
 ] 

ASF GitHub Bot commented on DRILL-6234:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1164#discussion_r174349801
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java 
---
@@ -514,6 +516,22 @@ public boolean isNull(int index){
*   The equivalent Java primitive is '${minor.javaType!type.javaType}'
*
* NB: this class is automatically generated from ValueVectorTypes.tdd 
using FreeMarker.
+   * 
+   * Contract
+   * 
+   *   Variable length vectors do not support random writes. All set 
methods must be called with a monotonically increasing, consecutive sequence 
of indexes.
+   *   It is possible to trim the vector by setting the value count to be 
less than the number of values currently contained in the vector with {@link 
#setValueCount(int)}; then
+   *   the process of setting values starts with the index after the last 
index.
+   * 
+   * 
+   *   It is also possible to back track and set the value at an index 
earlier than the current index, however, the caller must assume that all data 
contained after the last
--- End diff --

Updated


> Improve Documentation of VariableWidthVector Behavior
> -
>
> Key: DRILL-6234
> URL: https://issues.apache.org/jira/browse/DRILL-6234
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.14.0
>
>
> Doing the following will throw an Index out of bounds exception.
> {code}
>   final VarCharVector vector = new VarCharVector(field, allocator);
>   vector.allocateNew();
>   vector.getMutator().setValueCount(100);
> {code}
> The expected behavior is to resize the array appropriately. If an index is 
> uninitialized you should not call get for that index.
> {code}
>   at 
> org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80)
>   at 
> org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86)
>   at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114)
>   at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484)
>   at 
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432)
>   at 
> org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729)
>   at 
> org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398079#comment-16398079
 ] 

ASF GitHub Bot commented on DRILL-6234:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1164#discussion_r174349826
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java 
---
@@ -514,6 +516,22 @@ public boolean isNull(int index){
*   The equivalent Java primitive is '${minor.javaType!type.javaType}'
*
* NB: this class is automatically generated from ValueVectorTypes.tdd 
using FreeMarker.
+   * 
+   * Contract
+   * 
+   *   Variable length vectors do not support random writes. All set 
methods must be called with a monotonically increasing, consecutive sequence 
of indexes.
+   *   It is possible to trim the vector by setting the value count to be 
less than the number of values currently contained in the vector with {@link 
#setValueCount(int)}; then
+   *   the process of setting values starts with the index after the last 
index.
+   * 
+   * 
+   *   It is also possible to back track and set the value at an index 
earlier than the current index, however, the caller must assume that all data 
contained after the last
+   *   set index is corrupted.
--- End diff --

Added


> Improve Documentation of VariableWidthVector Behavior
> -
>
> Key: DRILL-6234
> URL: https://issues.apache.org/jira/browse/DRILL-6234
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.14.0
>
>
> Doing the following will throw an Index out of bounds exception.
> {code}
>   final VarCharVector vector = new VarCharVector(field, allocator);
>   vector.allocateNew();
>   vector.getMutator().setValueCount(100);
> {code}
> The expected behavior is to resize the array appropriately. If an index is 
> uninitialized you should not call get for that index.
> {code}
>   at 
> org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80)
>   at 
> org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86)
>   at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114)
>   at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484)
>   at 
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432)
>   at 
> org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729)
>   at 
> org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398075#comment-16398075
 ] 

ASF GitHub Bot commented on DRILL-6234:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1164#discussion_r174349707
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java 
---
@@ -506,6 +506,8 @@ public boolean isNull(int index){
   }
 
   /**
+   * Overview
--- End diff --

Fixed


> Improve Documentation of VariableWidthVector Behavior
> -
>
> Key: DRILL-6234
> URL: https://issues.apache.org/jira/browse/DRILL-6234
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.14.0
>
>
> Doing the following will throw an Index out of bounds exception.
> {code}
>   final VarCharVector vector = new VarCharVector(field, allocator);
>   vector.allocateNew();
>   vector.getMutator().setValueCount(100);
> {code}
> The expected behavior is to resize the array appropriately. If an index is 
> uninitialized you should not call get for that index.
> {code}
>   at 
> org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80)
>   at 
> org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86)
>   at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114)
>   at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484)
>   at 
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432)
>   at 
> org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729)
>   at 
> org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior

2018-03-13 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6234:
-
Fix Version/s: 1.14.0

> Improve Documentation of VariableWidthVector Behavior
> -
>
> Key: DRILL-6234
> URL: https://issues.apache.org/jira/browse/DRILL-6234
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.14.0
>
>
> Doing the following will throw an Index out of bounds exception.
> {code}
>   final VarCharVector vector = new VarCharVector(field, allocator);
>   vector.allocateNew();
>   vector.getMutator().setValueCount(100);
> {code}
> The expected behavior is to resize the array appropriately. If an index is 
> uninitialized you should not call get for that index.
> {code}
>   at 
> org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80)
>   at 
> org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86)
>   at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114)
>   at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484)
>   at 
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432)
>   at 
> org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729)
>   at 
> org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398069#comment-16398069
 ] 

ASF GitHub Bot commented on DRILL-6234:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1164#discussion_r174348633
  
--- Diff: 
exec/vector/src/test/java/org/apache/drill/exec/vector/VariableLengthVectorTest.java
 ---
@@ -0,0 +1,152 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.memory.RootAllocator;
+import org.apache.drill.exec.record.MaterializedField;
+import org.junit.Assert;
+import org.junit.Test;
+
+/**
+ * This test uses {@link VarCharVector} to test the template code in 
VariableLengthVector.
+ */
+public class VariableLengthVectorTest
+{
+  /**
+   * If the vector contains 1000 records, setting a value count of 1000 
should work.
+   */
+  @Test
+  public void testSettingSameValueCount()
+  {
+try (RootAllocator allocator = new RootAllocator(10_000_000)) {
+  final MaterializedField field = 
MaterializedField.create("stringCol", 
Types.required(TypeProtos.MinorType.VARCHAR));
+  final VarCharVector vector = new VarCharVector(field, allocator);
+
+  vector.allocateNew();
+
+  try {
+final int size = 1000;
+final VarCharVector.Mutator mutator = vector.getMutator();
+final VarCharVector.Accessor accessor = vector.getAccessor();
+
+setSafeIndexStrings("", 0, size, mutator);
+
+mutator.setValueCount(size);
+Assert.assertEquals(size, accessor.getValueCount());
+checkIndexStrings("", 0, size, accessor);
+  } finally {
+vector.clear();
+  }
+}
+  }
+
+  /**
+   * Test truncating data. If you have 10,000 records, reduce the vector 
to 1000 records.
+   */
+  @Test
+  public void testTrunicateVectorSetValueCount()
+  {
+try (RootAllocator allocator = new RootAllocator(10_000_000)) {
+  final MaterializedField field = 
MaterializedField.create("stringCol", 
Types.required(TypeProtos.MinorType.VARCHAR));
+  final VarCharVector vector = new VarCharVector(field, allocator);
+
+  vector.allocateNew();
+
+  try {
+final int size = 1000;
+final int fluffSize = 10_000;
+final VarCharVector.Mutator mutator = vector.getMutator();
+final VarCharVector.Accessor accessor = vector.getAccessor();
+
+setSafeIndexStrings("", 0, size, mutator);
+setSafeIndexStrings("first cut ", size, fluffSize, mutator);
+
+mutator.setValueCount(fluffSize);
+Assert.assertEquals(fluffSize, accessor.getValueCount());
+
+mutator.setValueCount(size);
+Assert.assertEquals(size, accessor.getValueCount());
+setSafeIndexStrings("redone cut ", size, fluffSize, mutator);
+mutator.setValueCount(fluffSize);
+Assert.assertEquals(fluffSize, accessor.getValueCount());
+
+mutator.setValueCount(size);
+Assert.assertEquals(size, accessor.getValueCount());
+
+checkIndexStrings("", 0, size, accessor);
+
+  } finally {
+vector.clear();
+  }
+}
+  }
+
+  /**
+   * Set 10,000 values. Then go back and set new values starting at the 
1001st record.
--- End diff --

I agree the vector writers should be used. The reason I was looking 
into this is that I saw strange behavior in the legacy HashTable, where 
setValueCount was being called with a larger valueCount than there was data in 
a VarCharVector. I did an ugly (and, I now think, incorrect) workaround for the 
issue and set about to make setValueCount 

[jira] [Updated] (DRILL-6239) Add Build and License Badges to README.md

2018-03-13 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6239:
-
Fix Version/s: 1.14.0

> Add Build and License Badges to README.md
> -
>
> Key: DRILL-6239
> URL: https://issues.apache.org/jira/browse/DRILL-6239
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.14.0
>
>
> Other projects have pretty badges showing the build status and license on 
> their README.md pages. We should have them too!
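> A sketch of what such badges could look like in README.md (a Travis CI 
> build badge and an Apache 2.0 license badge; URLs illustrative):
> {code}
> [![Build Status](https://travis-ci.org/apache/drill.svg?branch=master)](https://travis-ci.org/apache/drill)
> [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](http://www.apache.org/licenses/LICENSE-2.0)
> {code}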



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398031#comment-16398031
 ] 

ASF GitHub Bot commented on DRILL-6234:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1164#discussion_r174343151
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java 
---
@@ -514,6 +516,22 @@ public boolean isNull(int index){
*   The equivalent Java primitive is '${minor.javaType!type.javaType}'
*
* NB: this class is automatically generated from ValueVectorTypes.tdd 
using FreeMarker.
+   * 
+   * Contract
+   * 
+   *   Variable length vectors do not support random writes. All set 
methods must be called with a monotonically increasing, consecutive sequence 
of indexes.
--- End diff --

Thanks for bringing this up. I'm sharing a design doc on the dev list 
tomorrow or the day after about how I plan to refactor HashAgg. It will cover 
how to facilitate unit tests and how to change the memory handling to use a 
deterministic calculator like the SortMemoryManager and the 
soon-to-be-introduced HashJoinMemoryCalculator (instead of catching OOMs). 
Perhaps you could comment on the doc about how to set ourselves up to fix 
this case.


> Improve Documentation of VariableWidthVector Behavior
> -
>
> Key: DRILL-6234
> URL: https://issues.apache.org/jira/browse/DRILL-6234
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> Doing the following will throw an Index out of bounds exception.
> {code}
>   final VarCharVector vector = new VarCharVector(field, allocator);
>   vector.allocateNew();
>   vector.getMutator().setValueCount(100);
> {code}
> The expected behavior is to resize the array appropriately. If an index is 
> uninitialized you should not call get for that index.
> {code}
>   at 
> org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80)
>   at 
> org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86)
>   at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114)
>   at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484)
>   at 
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432)
>   at 
> org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729)
>   at 
> org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-4706) Fragment planning causes Drillbits to read remote chunks when local copies are available

2018-03-13 Thread Kunal Khatua (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397967#comment-16397967
 ] 

Kunal Khatua commented on DRILL-4706:
-

[~ppenumarthy] was this committed?

> Fragment planning causes Drillbits to read remote chunks when local copies 
> are available
> 
>
> Key: DRILL-4706
> URL: https://issues.apache.org/jira/browse/DRILL-4706
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.6.0
> Environment: CentOS, RHEL
>Reporter: Kunal Khatua
>Assignee: Padma Penumarthy
>Priority: Major
>  Labels: performance, planning
>
> When a table (data size = 70GB) of 160 Parquet files (each having a single 
> rowgroup and fitting within one chunk) is available on a 10-node setup with 
> replication=3, a pure data-scan query causes about 2% of the data to be read 
> remotely. 
> Even with the creation of a metadata cache, the planner selects a 
> sub-optimal plan for executing the SCAN fragments such that some of the data 
> is served from a remote server. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6240) Operator Overview should show estimated row counts

2018-03-13 Thread Kunal Khatua (JIRA)
Kunal Khatua created DRILL-6240:
---

 Summary: Operator Overview should show estimated row counts
 Key: DRILL-6240
 URL: https://issues.apache.org/jira/browse/DRILL-6240
 Project: Apache Drill
  Issue Type: Improvement
  Components: Web Server
Affects Versions: 1.12.0
Reporter: Kunal Khatua
 Fix For: 1.14.0


Operator Profile Overview should show a comparison between estimated and 
actual row counts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6071) Limit batch size for flatten operator

2018-03-13 Thread Bridget Bevens (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens updated DRILL-6071:
--
Labels: ready-to-commit  (was: doc-impacting ready-to-commit)

> Limit batch size for flatten operator
> -
>
> Key: DRILL-6071
> URL: https://issues.apache.org/jira/browse/DRILL-6071
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.12.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> Flatten currently uses an adaptive algorithm to control the outgoing batch 
> size. 
>  While processing the input batch, it adjusts the number of records in the 
> outgoing batch based on memory usage so far. Once memory usage exceeds the 
> configured limit for a batch, the algorithm becomes more proactive and 
> adjusts the limit halfway through and at the end of every batch. All this 
> periodic checking of memory usage is unnecessary overhead that impacts 
> performance; also, we only know after the fact.
> Instead, figure out from the incoming batch how many rows should be in the 
> outgoing batch.
>  The way to do that is to figure out the average row size of the outgoing 
> batch and, based on that, how many rows fit in a given amount of memory 
> (see the sketch below). Value vectors provide the information necessary to 
> figure this out.
> Row count in the output batch should be decided based on memory (with a 
> minimum of 1 and a maximum of 64K rows) and not hard-coded (to 4K). Memory 
> for the output batch should be a configurable system option.
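> A minimal sketch of that calculation (hypothetical names; not the actual 
> flatten code):
> {code}
> // Sketch: size the outgoing batch from the incoming batch's value vectors.
> static int outgoingRowCount(RecordBatch incomingBatch, long outputBatchSizeBytes) {
>   long dataSize = 0;
>   for (VectorWrapper<?> w : incomingBatch) {
>     dataSize += w.getValueVector().getBufferSize(); // bytes held by this vector
>   }
>   long avgRowSize = Math.max(1, dataSize / Math.max(1, incomingBatch.getRecordCount()));
>   return (int) Math.min(64 * 1024, Math.max(1, outputBatchSizeBytes / avgRowSize));
> }
> {code}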



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6071) Limit batch size for flatten operator

2018-03-13 Thread Bridget Bevens (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397958#comment-16397958
 ] 

Bridget Bevens commented on DRILL-6071:
---

Added the option to this page: 
[https://drill.apache.org/docs/configuring-drill-memory/#modifying-memory-allocated-to-queries]
 and this page

[https://drill.apache.org/docs/configuration-options-introduction/#system-options]

Removing the doc-impacting flag. Please reset it if there's any issue.


Thanks,
Bridget

> Limit batch size for flatten operator
> -
>
> Key: DRILL-6071
> URL: https://issues.apache.org/jira/browse/DRILL-6071
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.12.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.13.0
>
>
> Flatten currently uses an adaptive algorithm to control the outgoing batch 
> size. 
>  While processing the input batch, it adjusts the number of records in the 
> outgoing batch based on memory usage so far. Once memory usage exceeds the 
> configured limit for a batch, the algorithm becomes more proactive and 
> adjusts the limit halfway through and at the end of every batch. All this 
> periodic checking of memory usage is unnecessary overhead that impacts 
> performance; also, we only know after the fact.
> Instead, figure out from the incoming batch how many rows should be in the 
> outgoing batch.
>  The way to do that is to figure out the average row size of the outgoing 
> batch and, based on that, how many rows fit in a given amount of memory. 
> Value vectors provide the information necessary to figure this out.
> Row count in the output batch should be decided based on memory (with a 
> minimum of 1 and a maximum of 64K rows) and not hard-coded (to 4K). Memory 
> for the output batch should be a configurable system option.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397932#comment-16397932
 ] 

ASF GitHub Bot commented on DRILL-6234:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1164#discussion_r174328025
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java 
---
@@ -514,6 +516,22 @@ public boolean isNull(int index){
*   The equivalent Java primitive is '${minor.javaType!type.javaType}'
*
* NB: this class is automatically generated from ValueVectorTypes.tdd 
using FreeMarker.
+   * 
+   * Contract
--- End diff --

Might be worth summarizing how to use this:

1) Write values sequentially. Fixed-width vectors allow random access, 
but special care is needed.
2) Keep track in client code of the total value count. Call 
`setValueCount()` once the vector is full to set the final count. (The vector 
does not know its count while a write is in progress.)
3) Either take responsibility for allocating enough memory, or call the 
`setSafe()` methods to automatically extend the vector.
4) Once vectors are written, they are immutable; no additional writes of 
any kind are allowed to that vector.
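
A minimal sketch of that lifecycle (VarCharVector assumed; names 
illustrative):

{code}
final VarCharVector.Mutator mutator = vector.getMutator();
final int rows = 1000;
for (int i = 0; i < rows; i++) {      // 1) write values sequentially
  final byte[] v = ("val" + i).getBytes();
  mutator.setSafe(i, v, 0, v.length); // 3) setSafe() extends memory as needed
}
mutator.setValueCount(rows);          // 2) set the final count once, when full
// 4) the vector is now immutable; no further writes of any kind
{code}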


> Improve Documentation of VariableWidthVector Behavior
> -
>
> Key: DRILL-6234
> URL: https://issues.apache.org/jira/browse/DRILL-6234
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> Doing the following will throw an Index out of bounds exception.
> {code}
>   final VarCharVector vector = new VarCharVector(field, allocator);
>   vector.allocateNew();
>   vector.getMutator().setValueCount(100);
> {code}
> The expected behavior is to resize the array appropriately. If an index is 
> uninitialized you should not call get for that index.
> {code}
>   at 
> org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80)
>   at 
> org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86)
>   at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114)
>   at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484)
>   at 
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432)
>   at 
> org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729)
>   at 
> org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397935#comment-16397935
 ] 

ASF GitHub Bot commented on DRILL-6234:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1164#discussion_r174327182
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java 
---
@@ -514,6 +516,22 @@ public boolean isNull(int index){
*   The equivalent Java primitive is '${minor.javaType!type.javaType}'
*
* NB: this class is automatically generated from ValueVectorTypes.tdd 
using FreeMarker.
+   * 
+   * Contract
+   * 
+   *   Variable length vectors do not support random writes. All set 
methods must be called with a monotonically increasing, consecutive sequence 
of indexes.
+   *   It is possible to trim the vector by setting the value count to be 
less than the number of values currently contained in the vector with {@link 
#setValueCount(int)}; then
+   *   the process of setting values starts with the index after the last 
index.
+   * 
+   * 
+   *   It is also possible to back track and set the value at an index 
earlier than the current index, however, the caller must assume that all data 
contained after the last
+   *   set index is corrupted.
--- End diff --

Maybe add "changing the index does not release memory after the index."


> Improve Documentation of VariableWidthVector Behavior
> -
>
> Key: DRILL-6234
> URL: https://issues.apache.org/jira/browse/DRILL-6234
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> Doing the following will throw an Index out of bounds exception.
> {code}
>   final VarCharVector vector = new VarCharVector(field, allocator);
>   vector.allocateNew();
>   vector.getMutator().setValueCount(100);
> {code}
> The expected behavior is to resize the array appropriately. If an index is 
> uninitialized you should not call get for that index.
> {code}
>   at 
> org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80)
>   at 
> org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86)
>   at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114)
>   at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484)
>   at 
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432)
>   at 
> org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729)
>   at 
> org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397931#comment-16397931
 ] 

ASF GitHub Bot commented on DRILL-6234:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1164#discussion_r174328361
  
--- Diff: 
exec/vector/src/test/java/org/apache/drill/exec/vector/VariableLengthVectorTest.java
 ---
@@ -0,0 +1,152 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.memory.RootAllocator;
+import org.apache.drill.exec.record.MaterializedField;
+import org.junit.Assert;
+import org.junit.Test;
+
+/**
+ * This test uses {@link VarCharVector} to test the template code in 
VariableLengthVector.
+ */
+public class VariableLengthVectorTest
+{
+  /**
+   * If the vector contains 1000 records, setting a value count of 1000 
should work.
+   */
+  @Test
+  public void testSettingSameValueCount()
+  {
+try (RootAllocator allocator = new RootAllocator(10_000_000)) {
+  final MaterializedField field = 
MaterializedField.create("stringCol", 
Types.required(TypeProtos.MinorType.VARCHAR));
+  final VarCharVector vector = new VarCharVector(field, allocator);
+
+  vector.allocateNew();
+
+  try {
+final int size = 1000;
+final VarCharVector.Mutator mutator = vector.getMutator();
+final VarCharVector.Accessor accessor = vector.getAccessor();
+
+setSafeIndexStrings("", 0, size, mutator);
+
+mutator.setValueCount(size);
+Assert.assertEquals(size, accessor.getValueCount());
+checkIndexStrings("", 0, size, accessor);
+  } finally {
+vector.clear();
+  }
+}
+  }
+
+  /**
+   * Test truncating data. If you have 10,000 records, reduce the vector 
to 1000 records.
+   */
+  @Test
+  public void testTrunicateVectorSetValueCount()
+  {
+try (RootAllocator allocator = new RootAllocator(10_000_000)) {
+  final MaterializedField field = 
MaterializedField.create("stringCol", 
Types.required(TypeProtos.MinorType.VARCHAR));
+  final VarCharVector vector = new VarCharVector(field, allocator);
+
+  vector.allocateNew();
+
+  try {
+final int size = 1000;
+final int fluffSize = 10_000;
+final VarCharVector.Mutator mutator = vector.getMutator();
+final VarCharVector.Accessor accessor = vector.getAccessor();
+
+setSafeIndexStrings("", 0, size, mutator);
+setSafeIndexStrings("first cut ", size, fluffSize, mutator);
+
+mutator.setValueCount(fluffSize);
+Assert.assertEquals(fluffSize, accessor.getValueCount());
+
+mutator.setValueCount(size);
+Assert.assertEquals(size, accessor.getValueCount());
+setSafeIndexStrings("redone cut ", size, fluffSize, mutator);
--- End diff --

While this works, we are actually violating the vector contract, which is 
"once the value count is set, the vector becomes immutable." If the client is 
not done writing to the vector, the client should maintain the value count 
until it is finally done.


> Improve Documentation of VariableWidthVector Behavior
> -
>
> Key: DRILL-6234
> URL: https://issues.apache.org/jira/browse/DRILL-6234
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> Doing the following will throw an Index out of bounds exception.
> {code}
>   final VarCharVector vector = new VarCharVector(field, allocator);
>   vector.allocateNew();
>   

[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397933#comment-16397933
 ] 

ASF GitHub Bot commented on DRILL-6234:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1164#discussion_r174327511
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java 
---
@@ -514,6 +516,22 @@ public boolean isNull(int index){
*   The equivalent Java primitive is '${minor.javaType!type.javaType}'
*
* NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker.
+   * 
+   * Contract
+   * 
+   *   Variable length vectors do not support random writes. All set methods must be called with a monotonically increasing consecutive sequence of indexes.
--- End diff --

This is very important to know. This is why spill-to-disk for hash agg will 
eventually cause a serious customer failure. Aggregate UDFs write to vectors to 
store intermediate group values. A "max" over strings can't do that; instead, 
it writes to a Java object. That object will be lost on spill and re-read, 
resulting in losing the prior max values and the aggregate starting over.

So, this little note is not just a nuisance, it is the fatal flaw in how we 
handle the (albeit obscure) string aggregate values.
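
A minimal sketch of the failure mode (hypothetical aggregate state, for 
illustration only):

{code}
// State kept in a plain Java field lives only on the heap. When a hash agg
// partition is spilled and later re-read, the vectors are serialized and
// restored, but this field is not, so the running max is silently lost.
class MaxVarCharState {
  private byte[] max; // never written to a vector; lost across a spill

  void add(byte[] value) {
    if (max == null || compare(value, max) > 0) {
      max = value;
    }
  }

  private static int compare(byte[] a, byte[] b) {
    final int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      final int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return a.length - b.length;
  }
}
{code}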


> Improve Documentation of VariableWidthVector Behavior
> -
>
> Key: DRILL-6234
> URL: https://issues.apache.org/jira/browse/DRILL-6234
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> Doing the following will throw an Index out of bounds exception.
> {code}
>   final VarCharVector vector = new VarCharVector(field, allocator);
>   vector.allocateNew();
>   vector.getMutator().setValueCount(100);
> {code}
> The expected behavior is to resize the array appropriately. If an index is 
> uninitialized you should not call get for that index.
> {code}
>   at 
> org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80)
>   at 
> org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86)
>   at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114)
>   at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484)
>   at 
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432)
>   at 
> org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729)
>   at 
> org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397936#comment-16397936
 ] 

ASF GitHub Bot commented on DRILL-6234:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1164#discussion_r174328662
  
--- Diff: 
exec/vector/src/test/java/org/apache/drill/exec/vector/VariableLengthVectorTest.java
 ---
@@ -0,0 +1,152 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.memory.RootAllocator;
+import org.apache.drill.exec.record.MaterializedField;
+import org.junit.Assert;
+import org.junit.Test;
+
+/**
+ * This test uses {@link VarCharVector} to test the template code in VariableLengthVector.
+ */
+public class VariableLengthVectorTest
+{
+  /**
+   * If the vector contains 1000 records, setting a value count of 1000 should work.
+   */
+  @Test
+  public void testSettingSameValueCount()
+  {
+    try (RootAllocator allocator = new RootAllocator(10_000_000)) {
+      final MaterializedField field = MaterializedField.create("stringCol", Types.required(TypeProtos.MinorType.VARCHAR));
+      final VarCharVector vector = new VarCharVector(field, allocator);
+
+      vector.allocateNew();
+
+      try {
+        final int size = 1000;
+        final VarCharVector.Mutator mutator = vector.getMutator();
+        final VarCharVector.Accessor accessor = vector.getAccessor();
+
+        setSafeIndexStrings("", 0, size, mutator);
+
+        mutator.setValueCount(size);
+        Assert.assertEquals(size, accessor.getValueCount());
+        checkIndexStrings("", 0, size, accessor);
+      } finally {
+        vector.clear();
+      }
+    }
+  }
+
+  /**
+   * Test truncating data. If you have 10000 records, reduce the vector to 1000 records.
+   */
+  @Test
+  public void testTrunicateVectorSetValueCount()
+  {
+    try (RootAllocator allocator = new RootAllocator(10_000_000)) {
+      final MaterializedField field = MaterializedField.create("stringCol", Types.required(TypeProtos.MinorType.VARCHAR));
+      final VarCharVector vector = new VarCharVector(field, allocator);
+
+      vector.allocateNew();
+
+      try {
+        final int size = 1000;
+        final int fluffSize = 10000;
+        final VarCharVector.Mutator mutator = vector.getMutator();
+        final VarCharVector.Accessor accessor = vector.getAccessor();
+
+        setSafeIndexStrings("", 0, size, mutator);
+        setSafeIndexStrings("first cut ", size, fluffSize, mutator);
+
+        mutator.setValueCount(fluffSize);
+        Assert.assertEquals(fluffSize, accessor.getValueCount());
+
+        mutator.setValueCount(size);
+        Assert.assertEquals(size, accessor.getValueCount());
+        setSafeIndexStrings("redone cut ", size, fluffSize, mutator);
+        mutator.setValueCount(fluffSize);
+        Assert.assertEquals(fluffSize, accessor.getValueCount());
+
+        mutator.setValueCount(size);
+        Assert.assertEquals(size, accessor.getValueCount());
+
+        checkIndexStrings("", 0, size, accessor);
+
+      } finally {
+        vector.clear();
+      }
+    }
+  }
+
+  /**
+   * Set 10000 values. Then go back and set new values starting at the 1001st record.
--- End diff --

Just FYI: the vector writers handle all this stuff for you. They allow 
overwriting the most recent value, they keep track of value counts and data 
offsets, and so on. This is why I can offer such detailed comments: I learned 
how all this works when creating those classes. It would be wonderful to start 
reusing that work rather than reinventing it.

[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397929#comment-16397929
 ] 

ASF GitHub Bot commented on DRILL-6234:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1164#discussion_r174327132
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java 
---
@@ -514,6 +516,22 @@ public boolean isNull(int index){
*   The equivalent Java primitive is '${minor.javaType!type.javaType}'
*
* NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker.
+   * 
+   * Contract
+   * 
+   *   Variable length vectors do not support random writes. All set methods must be called with a monotonically increasing consecutive sequence of indexes.
+   *   It is possible to trim the vector by setting the value count to be less than the number of values currently contained in the vector with {@link #setValueCount(int)}; then
+   *   the process of setting values starts with the index after the last index.
+   * 
+   * 
+   *   It is also possible to back track and set the value at an index earlier than the current index, however, the caller must assume that all data container after the last
--- End diff --

"all data container after" --> "all data after the updated index" ?


> Improve Documentation of VariableWidthVector Behavior
> -
>
> Key: DRILL-6234
> URL: https://issues.apache.org/jira/browse/DRILL-6234
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> Doing the following will throw an Index out of bounds exception.
> {code}
>   final VarCharVector vector = new VarCharVector(field, allocator);
>   vector.allocateNew();
>   vector.getMutator().setValueCount(100);
> {code}
> The expected behavior is to resize the array appropriately. If an index is 
> uninitialized you should not call get for that index.
> {code}
>   at 
> org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80)
>   at 
> org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86)
>   at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114)
>   at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484)
>   at 
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432)
>   at 
> org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729)
>   at 
> org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397934#comment-16397934
 ] 

ASF GitHub Bot commented on DRILL-6234:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1164#discussion_r174327601
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java 
---
@@ -514,6 +516,22 @@ public boolean isNull(int index){
*   The equivalent Java primitive is '${minor.javaType!type.javaType}'
*
* NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker.
+   * 
+   * Contract
+   * 
+   *   Variable length vectors do not support random writes. All set methods must be called with a monotonically increasing consecutive sequence of indexes.
+   *   It is possible to trim the vector by setting the value count to be less than the number of values currently contained in the vector with {@link #setValueCount(int)}; then
+   *   the process of setting values starts with the index after the last index.
+   * 
+   * 
+   *   It is also possible to back track and set the value at an index earlier than the current index, however, the caller must assume that all data contained after the last
+   *   set index is corrupted.
+   * 
+   * Notes
+   * 
+   *   There is no guarantee the data buffer for the {@link VariableWidthVector} will have enough space to contain the data you set unless you use setSafe. If you
+   *   use set you may get array index out of bounds exceptions.
--- End diff --

Said another way, either 1) be careful to manage your own memory, or 2) 
call `setSafe()`. That is, in fact, why `setSafe()` exists.
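
A minimal sketch of the difference (assuming the VarCharVector mutator API used 
in the tests above; allocation sizes are illustrative):

{code}
final VarCharVector vector = new VarCharVector(field, allocator);
vector.allocateNew(64, 4); // room for 4 values and 64 bytes of data

final byte[] big = new byte[1024];

// set() trusts the caller to have allocated enough space; writing past
// the data buffer can throw an index out of bounds error:
// vector.getMutator().set(0, big);

// setSafe() checks capacity and reallocates the buffers as needed:
vector.getMutator().setSafe(0, big);
vector.getMutator().setValueCount(1);
{code}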


> Improve Documentation of VariableWidthVector Behavior
> -
>
> Key: DRILL-6234
> URL: https://issues.apache.org/jira/browse/DRILL-6234
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> Doing the following will throw an Index out of bounds exception.
> {code}
>   final VarCharVector vector = new VarCharVector(field, allocator);
>   vector.allocateNew();
>   vector.getMutator().setValueCount(100);
> {code}
> The expected behavior is to resize the array appropriately. If an index is 
> uninitialized you should not call get for that index.
> {code}
>   at 
> org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80)
>   at 
> org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86)
>   at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114)
>   at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484)
>   at 
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432)
>   at 
> org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729)
>   at 
> org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397930#comment-16397930
 ] 

ASF GitHub Bot commented on DRILL-6234:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1164#discussion_r174327027
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java 
---
@@ -506,6 +506,8 @@ public boolean isNull(int index){
   }
 
   /**
+   * Overview
--- End diff --

Nit, but I think that h4 is the usual heading level used in Java doc 
comments. The higher levels are used in the surrounding generated HTML.
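
For example (illustrative javadoc, not the PR's actual text):

{code}
/**
 * <h4>Contract</h4>
 * <p>
 *   Variable length vectors do not support random writes. ...
 * </p>
 */
{code}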


> Improve Documentation of VariableWidthVector Behavior
> -
>
> Key: DRILL-6234
> URL: https://issues.apache.org/jira/browse/DRILL-6234
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> Doing the following will throw an Index out of bounds exception.
> {code}
>   final VarCharVector vector = new VarCharVector(field, allocator);
>   vector.allocateNew();
>   vector.getMutator().setValueCount(100);
> {code}
> The expected behavior is to resize the array appropriately. If an index is 
> uninitialized you should not call get for that index.
> {code}
>   at 
> org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80)
>   at 
> org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86)
>   at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114)
>   at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484)
>   at 
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432)
>   at 
> org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729)
>   at 
> org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6238) Batch sizing for operators

2018-03-13 Thread Padma Penumarthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Padma Penumarthy updated DRILL-6238:

Description: 
*Batch Sizing For Operators*

This document describes the approach we are taking for limiting batch sizes for 
operators other than scan.

*Motivation*

Main goals are
 # Improve concurrency
 # Reduce query failures because of out of memory errors

To accomplish these goals, we need to make queries execute within a specified 
memory budget. To enforce per query memory limit, we need to be able to enforce 
per fragment and per operator memory limits. Controlling individual operators 
batch sizes is the first step towards all this.

*Background*

In Drill, different operators have different limits w.r.t. outgoing batches. 
Some use hard-coded row counts, some use hard-coded memory limits, and some 
have no limits at all. Based on the input data size and what the operator is 
doing, the memory used by the outgoing batch can vary widely because no limits 
are imposed. Queries fail because we are not able to allocate the memory 
needed. Some operators produce very large batches, causing blocking operators 
like sort and hash agg, which have to work under tight memory constraints, to 
fail. The size of batches should be a function of available memory rather than 
input data size and/or what the operator does. Please refer to the table at 
the end of this document for details on what each operator does today.

*Design*

The goal is to have all operators behave the same way, i.e. produce batches 
with size less than or equal to the configured outgoing batch size, with a 
minimum of 1 row per batch and a maximum of 64K rows per batch. A new system 
option ‘drill.exec.memory.operator.output_batch_size’ is added, with a default 
value of 16 MB.

The basic idea is to limit the size of the outgoing batch by deciding how many 
rows we can have in the batch based on the average entry size of each outgoing 
column, taking into account the actual data size and the metadata vector 
overhead we add on top for tracking variable length, mode (repeated, optional, 
required), etc. This calculation will be different for each operator and is 
based on
 # What the operator is doing
 # Incoming batch size, which includes information on the type and average size 
of each column
 # What is being projected out

By taking this adaptive approach based on actual average data sizes, operators 
that previously limited batch size to fewer than 64K rows can potentially fit 
many more rows (up to 64K rows) in a batch if the memory stays within the 
budget. For example, flatten and joins have a batch size of 4K rows, which was 
probably chosen to be conservative w.r.t. memory usage. Letting these operators 
go up to 64K rows, as long as they stay within the memory budget, should help 
improve performance.

Also, to improve performance and utilize memory more efficiently, we will
 # Allocate memory for value vectors upfront. Since we know the number of rows 
and the sizing information for each column in the outgoing batch, we will use 
that information to allocate memory for value vectors upfront. Currently, we 
either do an initial allocation for 4K values and keep doubling every time we 
need more, or allocate for the maximum needed upfront. With this change to 
pre-allocate memory based on the sizing calculation, we can improve performance 
by reducing the memory copies and the zeroing of the new half that we do on 
every doubling, and save memory in cases where we were over-allocating before.
 # Round down the number of rows in the outgoing batch to a power of two. Since 
memory is allocated in powers of two, this will help us pack the value vectors 
densely, thereby reducing the amount of memory wasted because of the doubling 
effect. A rough sketch of the resulting row-count calculation follows this 
list.
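
As an illustration of the calculation above (a minimal sketch; the method name 
and exact shape are hypothetical, not Drill's actual implementation):

{code}
// Decide how many rows the outgoing batch can hold, given the configured
// output batch size and the average width of one outgoing row (data plus
// offset/bits vector overhead, summed over all projected columns).
static int computeOutputRowCount(long outputBatchSizeBytes, long avgRowWidthBytes) {
  long rows = outputBatchSizeBytes / Math.max(1L, avgRowWidthBytes);
  rows = Math.min(rows, 64 * 1024L); // never more than 64K rows per batch
  rows = Math.max(rows, 1L);         // always allow at least 1 row
  // Round down to a power of two so pre-allocated vectors pack densely.
  return Integer.highestOneBit((int) rows);
}
{code}

For example, with the default 16 MB batch size and an average row width of 600 
bytes, 16777216 / 600 = 27962 rows, which rounds down to 16384.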

So, to summarize, the benefits we will get are improved memory utilization, 
better performance, higher concurrency, and fewer queries dying because of out 
of memory errors.

Note: Since these sizing calculations are based on averages, strict memory 
usage enforcement is not possible. There could be pathological cases where, 
because of uneven data distribution, we might exceed the configured output 
batch size, potentially causing OOM errors and problems in downstream operators.

Other issues that will be addressed:
 * We are adding extra processing for each batch in each operator to figure out 
the sizing information. This overhead can be reduced by passing this 
information along with the batch between operators.
 * For some operators, it will be complex to figure out the average size of 
outgoing columns, especially if we have to evaluate complex expression trees 
and UDFs to figure out the transformation on incoming batches. We will use 
approximations as appropriate.

The following table summarizes the limits we have today for each operator.

flatten, merge join and external sort have already been changed to adhere to 
batch size limits as described in this document as of drill release 1.13.

[jira] [Updated] (DRILL-6238) Batch sizing for operators

2018-03-13 Thread Padma Penumarthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Padma Penumarthy updated DRILL-6238:

Description: 
*Batch Sizing For Operators*

This document describes the approach we are taking for limiting batch sizes for 
operators other than scan.

*Motivation*

Main goals are
 # Improve concurrency
 # Reduce query failures because of out of memory errors

To accomplish these goals, we need to make queries execute within a specified 
memory budget. To enforce per query memory limit, we need to be able to enforce 
per fragment and per operator memory limits. Controlling individual operators 
batch sizes is the first step towards all this.

*Background*

In Drill, different operators have different limits w.r.t. outgoing batches. 
Some use hard-coded row counts, some use hard-coded memory limits, and some 
have no limits at all. Based on the input data size and what the operator is 
doing, the memory used by the outgoing batch can vary widely because no limits 
are imposed. Queries fail because we are not able to allocate the memory 
needed. Some operators produce very large batches, causing blocking operators 
like sort and hash agg, which have to work under tight memory constraints, to 
fail. The size of batches should be a function of available memory rather than 
input data size and/or what the operator does. Please refer to the table at 
the end of this document for details on what each operator does today.

*Design*

The goal is to have all operators behave the same way, i.e. produce batches 
with size less than or equal to the configured outgoing batch size, with a 
minimum of 1 row per batch and a maximum of 64K rows per batch. A new system 
option ‘drill.exec.memory.operator.output_batch_size’ is added, with a default 
value of 16 MB.

The basic idea is to limit the size of the outgoing batch by deciding how many 
rows we can have in the batch based on the average entry size of each outgoing 
column, taking into account the actual data size and the metadata vector 
overhead we add on top for tracking variable length, mode (repeated, optional, 
required), etc. This calculation will be different for each operator and is 
based on
 # What the operator is doing
 # Incoming batch size, which includes information on the type and average size 
of each column
 # What is being projected out

By taking this adaptive approach based on actual average data sizes, operators 
that previously limited batch size to fewer than 64K rows can potentially fit 
many more rows (up to 64K) in a batch if the memory stays within the budget. 
For example, flatten and joins have a batch size of 4K rows, which was probably 
chosen to be conservative w.r.t. memory usage. Letting these operators go up to 
64K, as long as they stay within the memory budget, should help improve 
performance.

Also, to improve performance and utilize memory more efficiently, we will
 # Allocate memory for value vectors upfront. Since we know the number of rows 
and the sizing information for each column in the outgoing batch, we will use 
that information to allocate memory for value vectors upfront. Currently, we 
either do an initial allocation for 4K values and keep doubling every time we 
need more, or allocate for the maximum needed upfront. With this change to 
pre-allocate memory based on the sizing calculation, we can improve performance 
by reducing the memory copies and the zeroing of the new half that we do on 
every doubling, and save memory in cases where we were over-allocating before.
 # Round down the number of rows in the outgoing batch to a power of two. Since 
memory is allocated in powers of two, this will help us pack the value vectors 
densely, thereby reducing the amount of memory wasted because of the doubling 
effect.

So, to summarize, the benefits we will get are improved memory utilization, 
better performance, higher concurrency, and fewer queries dying because of out 
of memory errors.

Note: Since these sizing calculations are based on averages, strict memory 
usage enforcement is not possible. There could be pathological cases where, 
because of uneven data distribution, we might exceed the configured output 
batch size, potentially causing OOM errors and problems in downstream operators.

Other issues that will be addressed:
 * We are adding extra processing for each batch in each operator to figure out 
the sizing information. This overhead can be reduced by passing this 
information along with the batch between operators.
 * For some operators, it will be complex to figure out the average size of 
outgoing columns, especially if we have to evaluate complex expression trees 
and UDFs to figure out the transformation on incoming batches. We will use 
approximations as appropriate.

The following table summarizes the limits we have today for each operator.

flatten, merge join and external sort have already been changed to adhere to 
batch size limits as described in this document as of drill release 1.13.

[jira] [Updated] (DRILL-6238) Batch sizing for operators

2018-03-13 Thread Padma Penumarthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Padma Penumarthy updated DRILL-6238:

Description: 
*Batch Sizing For Operators*

This document describes the approach we are taking for limiting batch sizes for 
operators other than scan.

*Motivation*

Main goals are
 # Improve concurrency
 # Reduce query failures because of out of memory errors

To accomplish these goals, we need to make queries execute within a specified 
memory budget. To enforce per query memory limit, we need to be able to enforce 
per fragment and per operator memory limits. Controlling individual operators 
batch sizes is the first step towards all this.

*Background*

In Drill, different operators have different limits w.r.t. outgoing batches. 
Some use hard-coded row counts, some use hard-coded memory limits, and some 
have no limits at all. Based on the input data size and what the operator is 
doing, the memory used by the outgoing batch can vary widely because no limits 
are imposed. Queries fail because we are not able to allocate the memory 
needed. Some operators produce very large batches, causing blocking operators 
like sort and hash agg, which have to work under tight memory constraints, to 
fail. The size of batches should be a function of available memory rather than 
input data size and/or what the operator does. Please refer to the table at 
the end of this document for details on what each operator does today.

*Design*

The goal is to have all operators behave the same way, i.e. produce batches 
with size less than or equal to the configured outgoing batch size, with a 
minimum of 1 row per batch and a maximum of 64K rows per batch. A new system 
option ‘drill.exec.memory.operator.output_batch_size’ is added, with a default 
value of 16 MB.

The basic idea is to limit the size of the outgoing batch by deciding how many 
rows we can have in the batch based on the average entry size of each outgoing 
column, taking into account the actual data size and the metadata vector 
overhead we add on top for tracking variable length, mode (repeated, optional, 
required), etc. This calculation will be different for each operator and is 
based on
 # What the operator is doing
 # Incoming batch size, which includes information on the type and average size 
of each column
 # What is being projected out

By taking this adaptive approach based on actual average data sizes, operators 
that previously limited batch size to fewer than 64K rows can potentially fit 
many more rows (up to 64K) in a batch if the memory stays within the budget. 
For example, flatten and joins have a batch size of 4K rows, which was probably 
chosen to be conservative w.r.t. memory usage. Letting these operators go up to 
64K, as long as they stay within the memory budget, should help improve 
performance.

Also, to improve performance and utilize memory more efficiently, we will
 # Allocate memory for value vectors upfront. Since we know the number of rows 
and the sizing information for each column in the outgoing batch, we will use 
that information to allocate memory for value vectors upfront. Currently, we 
either do an initial allocation for 4K values and keep doubling every time we 
need more, or allocate for the maximum needed upfront. With this change to 
pre-allocate memory based on the sizing calculation, we can improve performance 
by reducing the memory copies and the zeroing of the new half that we do on 
every doubling, and save memory in cases where we were over-allocating before.
 # Round down the number of rows in the outgoing batch to a power of two. Since 
memory is allocated in powers of two, this will help us pack the value vectors 
densely, thereby reducing the amount of memory wasted because of the doubling 
effect.

So, to summarize, the benefits we will get are improved memory utilization, 
better performance, higher concurrency, and fewer queries dying because of out 
of memory errors.

One thing to note:

Since these sizing calculations are based on averages, strict memory usage 
enforcement is not possible. There could be pathological cases where, because 
of uneven data distribution, we might exceed the configured output batch size, 
potentially causing OOM errors and problems in downstream operators.

Other issues that will be addressed:
 * We are adding extra processing for each batch in each operator to figure out 
the sizing information. This overhead can be reduced by passing this 
information along with the batch between operators.
 * For some operators, it will be complex to figure out the average size of 
outgoing columns, especially if we have to evaluate complex expression trees 
and UDFs to figure out the transformation on incoming batches. We will use 
approximations as appropriate.

The following table summarizes the limits we have today for each operator.

flatten, merge join and external sort have already been changed to adhere to 
batch size limits as described in this document as of drill release 1.13.

[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior

2018-03-13 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397921#comment-16397921
 ] 

Paul Rogers commented on DRILL-6234:


[~timothyfarkas], not pointless at all. You are going through the Drill 
University of Hard Knocks. If we could do this over, the vector value count 
would be maintained in the vector for every write. There is no reason to keep 
it in application code (most operators) or the vector writers (new code).

The only reason we don't do it the "right way" is the historical decision to 
base vectors on network buffers.

If we ever get a chance to replace {{DrillBuf}} with fixed-width buffers 
chained together (or some other implementation), we should certainly revisit 
this issue.

> Improve Documentation of VariableWidthVector Behavior
> -
>
> Key: DRILL-6234
> URL: https://issues.apache.org/jira/browse/DRILL-6234
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> Doing the following will throw an Index out of bounds exception.
> {code}
>   final VarCharVector vector = new VarCharVector(field, allocator);
>   vector.allocateNew();
>   vector.getMutator().setValueCount(100);
> {code}
> The expected behavior is to resize the array appropriately. If an index is 
> uninitialized you should not call get for that index.
> {code}
>   at 
> org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80)
>   at 
> org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86)
>   at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114)
>   at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484)
>   at 
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432)
>   at 
> org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729)
>   at 
> org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6239) Add Build and License Badges to README.md

2018-03-13 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas updated DRILL-6239:
--
Reviewer: Arina Ielchiieva

> Add Build and License Badges to README.md
> -
>
> Key: DRILL-6239
> URL: https://issues.apache.org/jira/browse/DRILL-6239
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> Other projects have pretty badges showing the build status and license on the 
> README.md page. We should have it too!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6239) Add Build and License Badges to README.md

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397916#comment-16397916
 ] 

ASF GitHub Bot commented on DRILL-6239:
---

Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1165
  
@arina-ielchiieva 


> Add Build and License Badges to README.md
> -
>
> Key: DRILL-6239
> URL: https://issues.apache.org/jira/browse/DRILL-6239
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> Other projects have pretty badges showing the build status and license on the 
> README.md page. We should have it too!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6238) Batch sizing for operators

2018-03-13 Thread Padma Penumarthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Padma Penumarthy updated DRILL-6238:

Description: 
*Batch Sizing For Operators*

This document describes the approach we are taking for limiting batch sizes for 
operators other than scan.

*Motivation*

Main goals are
 # Improve concurrency
 # Reduce query failures because of out of memory errors

To accomplish these goals, we need to make queries execute within a specified 
memory budget. To enforce per query memory limit, we need to be able to enforce 
per fragment and per operator memory limits. Controlling individual operators 
batch sizes is the first step towards all this.

*Background*

In Drill, different operators have different limits w.r.t. outgoing batches. 
Some use hard-coded row counts, some use hard-coded memory limits, and some 
have no limits at all. Based on the input data size and what the operator is 
doing, the memory used by the outgoing batch can vary widely because no limits 
are imposed. Queries fail because we are not able to allocate the memory 
needed. Some operators produce very large batches, causing blocking operators 
like sort and hash agg, which have to work under tight memory constraints, to 
fail. The size of batches should be a function of available memory rather than 
input data size and/or what the operator does. Please refer to the table at 
the end of this document for details on what each operator does today.

*Design*

The goal is to have all operators behave the same way, i.e. produce batches 
with size less than or equal to the configured outgoing batch size, with a 
minimum of 1 row per batch and a maximum of 64K rows per batch. A new system 
option ‘drill.exec.memory.operator.output_batch_size’ is added, with a default 
value of 16 MB.

The basic idea is to limit the size of the outgoing batch by deciding how many 
rows we can have in the batch based on the average entry size of each outgoing 
column, taking into account the actual data size and the metadata vector 
overhead we add on top for tracking variable length, mode (repeated, optional, 
required), etc. This calculation will be different for each operator and is 
based on
 # What the operator is doing
 # Incoming batch size, which includes information on the type and average size 
of each column
 # What is being projected out

By taking this adaptive approach based on actual average data sizes, operators 
that previously limited batch size to fewer than 64K rows can potentially fit 
many more rows (up to 64K) in a batch if the memory stays within the budget. 
For example, flatten and joins have a batch size of 4K rows, which was probably 
chosen to be conservative w.r.t. memory usage. Letting these operators go up to 
64K, as long as they stay within the memory budget, should help improve 
performance.

Also, to improve performance and utilize memory more efficiently, we will
 # Allocate memory for value vectors upfront. Since we know the number of rows 
and the sizing information for each column in the outgoing batch, we will use 
that information to allocate memory for value vectors upfront. Currently, we 
either do an initial allocation for 4K values and keep doubling every time we 
need more, or allocate for the maximum needed upfront. With this change to 
pre-allocate memory based on the sizing calculation, we can improve performance 
by reducing the memory copies and the zeroing of the new half that we do on 
every doubling, and save memory in cases where we were over-allocating before.
 # Make the number of rows in the outgoing batch a power of two. Since memory 
is allocated in powers of two, this will help us pack the value vectors 
densely, thereby reducing the amount of memory wasted because of the doubling 
effect.

So, to summarize, the benefits we will get are improved memory utilization, 
better performance, higher concurrency, and fewer queries dying because of out 
of memory errors.

One thing to note:

Since these sizing calculations are based on averages, strict memory usage 
enforcement is not possible. There could be pathological cases where, because 
of uneven data distribution, we might exceed the configured output batch size, 
potentially causing OOM errors and problems in downstream operators.

Other issues that will be addressed:
 * We are adding extra processing for each batch in each operator to figure out 
the sizing information. This overhead can be reduced by passing this 
information along with the batch between operators.
 * For some operators, it will be complex to figure out the average size of 
outgoing columns, especially if we have to evaluate complex expression trees 
and UDFs to figure out the transformation on incoming batches. We will use 
approximations as appropriate.

The following table summarizes the limits we have today for each operator.

flatten, merge join and external sort have already been changed to adhere to 
batch size limits as described in this document as of drill release 1.13.

[jira] [Commented] (DRILL-6239) Add Build and License Badges to README.md

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397913#comment-16397913
 ] 

ASF GitHub Bot commented on DRILL-6239:
---

GitHub user ilooner opened a pull request:

https://github.com/apache/drill/pull/1165

DRILL-6239: Add build and license badges to README.md

Add nice build and license badges that everyone else has. See a preview of 
what they look like here:

https://github.com/ilooner/drill/tree/DRILL-6239
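
For reference, a badge pair of this kind looks something like the following 
markdown (illustrative URLs, not necessarily the exact ones in the PR):

{code}
[![Build Status](https://travis-ci.org/apache/drill.svg?branch=master)](https://travis-ci.org/apache/drill)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](http://www.apache.org/licenses/LICENSE-2.0)
{code}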

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ilooner/drill DRILL-6239

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1165.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1165


commit c95030076a884b074821e169c4e58c84d89275c8
Author: Timothy Farkas 
Date:   2018-03-14T00:39:50Z

DRILL-6239: Add build and license badges to README.md




> Add Build and License Badges to README.md
> -
>
> Key: DRILL-6239
> URL: https://issues.apache.org/jira/browse/DRILL-6239
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> Other projects have pretty badges showing the build status and license on the 
> README.md page. We should have it too!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6239) Add Build and License Badges to README.md

2018-03-13 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6239:
-

 Summary: Add Build and License Badges to README.md
 Key: DRILL-6239
 URL: https://issues.apache.org/jira/browse/DRILL-6239
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Timothy Farkas


Other projects have pretty badges showing the build status and license on the 
README.md page. We should have it too!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6238) Batch sizing for operators

2018-03-13 Thread Padma Penumarthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Padma Penumarthy updated DRILL-6238:

Description: 
*Batch Sizing For Operators*

This document describes the approach we are taking for limiting batch sizes for 
operators other than scan.

*Motivation*

Main goals are
 # Improve concurrency
 # Reduce query failures because of out of memory errors

To accomplish these goals, we need to make queries execute within a specified 
memory budget. To enforce per query memory limit, we need to be able to enforce 
per fragment and per operator memory limits. Controlling individual operators 
batch sizes is the first step towards all this.

*Background*

In Drill, different operators have different limits w.r.t. outgoing batches. 
Some use hard-coded row counts, some use hard-coded memory limits, and some 
have no limits at all. Based on the input data size and what the operator is 
doing, the memory used by the outgoing batch can vary widely because no limits 
are imposed. Queries fail because we are not able to allocate the memory 
needed. Some operators produce very large batches, causing blocking operators 
like sort and hash agg, which have to work under tight memory constraints, to 
fail. The size of batches should be a function of available memory rather than 
input data size and/or what the operator does. Please refer to the table at 
the end of this document for details on what each operator does today.

*Design*

The goal is to have all operators behave the same way, i.e. produce batches 
with size less than or equal to the configured outgoing batch size, with a 
minimum of 1 row per batch and a maximum of 64K rows per batch. A new system 
option ‘drill.exec.memory.operator.output_batch_size’ is added, with a default 
value of 16 MB.

The basic idea is to limit the size of the outgoing batch by deciding how many 
rows we can have in the batch based on the average entry size of each outgoing 
column, taking into account the actual data size and the metadata vector 
overhead we add on top for tracking variable length, mode (repeated, optional, 
required), etc. This calculation will be different for each operator, based on 
what the operator is doing, the incoming batch size (which includes information 
on the type and average size of each column), and what is being projected out.

By taking this adaptive approach based on actual average data sizes, operators 
that previously limited batch size to something less than 64K rows can 
potentially fit many more rows (up to 64K) in a batch if the memory stays 
within the budget. This should help improve performance.

Also, to improve performance and utilize memory more efficiently, we will
 # Allocate memory for value vectors upfront. Since we know the number of rows 
and the sizing information for each column in the outgoing batch, we will use 
that information to allocate memory for value vectors upfront. Currently, we 
either do an initial allocation for 4K values and keep doubling every time we 
need more, or allocate for the maximum needed upfront. With this change to 
pre-allocate memory based on the sizing calculation, we can improve performance 
by reducing the memory copies and the zeroing of the new half that we do on 
every doubling, and save memory in cases where we were over-allocating before.
 # Make the number of rows in the outgoing batch a power of two. Since memory 
is allocated in powers of two, this will help us pack the value vectors 
densely, thereby reducing the amount of memory wasted because of the doubling 
effect.

So, to summarize, the benefits we will get are improved memory utilization, 
better performance, higher concurrency, and fewer queries dying because of out 
of memory errors.

One thing to note:

Since these sizing calculations are based on averages, strict memory usage 
enforcement is not possible. There could be pathological cases where, because 
of uneven data distribution, we might exceed the configured output batch size, 
potentially causing OOM errors and problems in downstream operators.

Other issues that will be addressed:
 * We are adding extra processing for each batch in each operator to figure out 
the sizing information. This overhead can be reduced by passing this 
information along with the batch between operators.
 * For some operators, it will be complex to figure out the average size of 
outgoing columns, especially if we have to evaluate complex expression trees 
and UDFs to figure out the transformation on incoming batches. We will use 
approximations as appropriate.

The following table summarizes the limits we have today for each operator.

flatten, merge join and external sort have already been changed to adhere to 
batch size limits as described in this document as of drill release 1.13.

 
|*Operator*|*Limit (Rows, Memory)*|*Notes*|
|Flatten|4K, 512MB|Flatten can produce very large batches based on the average cardinality of the flatten column.|
|Merge Receiver|32K|No memory limit.|
|Hash 

[jira] [Updated] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior

2018-03-13 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas updated DRILL-6234:
--
Reviewer: Paul Rogers

> Improve Documentation of VariableWidthVector Behavior
> -
>
> Key: DRILL-6234
> URL: https://issues.apache.org/jira/browse/DRILL-6234
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> Doing the following will throw an Index out of bounds exception.
> {code}
>   final VarCharVector vector = new VarCharVector(field, allocator);
>   vector.allocateNew();
>   vector.getMutator().setValueCount(100);
> {code}
> The expected behavior is to resize the array appropriately. If an index is 
> uninitialized you should not call get for that index.
> {code}
>   at 
> org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80)
>   at 
> org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86)
>   at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114)
>   at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484)
>   at 
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432)
>   at 
> org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729)
>   at 
> org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397900#comment-16397900
 ] 

ASF GitHub Bot commented on DRILL-6234:
---

GitHub user ilooner opened a pull request:

https://github.com/apache/drill/pull/1164

DRILL-6234: Improved documentation for VariableWidthVector mutators

I had some confusion about how setValueCount should behave for variable 
width vectors. I added some documentation and unit tests which explain its 
behavior so that others don't waste time in the future.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ilooner/drill DRILL-6234

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1164.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1164


commit 4f13fd2873a9b510a9d0105ad87f72792aa46314
Author: Timothy Farkas 
Date:   2018-03-14T00:24:28Z

DRILL-6234: Improved documentation for VariableWidthVector mutators, and 
added simple unit tests demonstrating mutator behavior.




> Improve Documentation of VariableWidthVector Behavior
> -
>
> Key: DRILL-6234
> URL: https://issues.apache.org/jira/browse/DRILL-6234
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> Doing the following will throw an Index out of bounds exception.
> {code}
>   final VarCharVector vector = new VarCharVector(field, allocator);
>   vector.allocateNew();
>   vector.getMutator().setValueCount(100);
> {code}
> The expected behavior is to resize the array appropriately. If an index is 
> uninitialized you should not call get for that index.
> {code}
>   at 
> org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80)
>   at 
> org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86)
>   at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114)
>   at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484)
>   at 
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432)
>   at 
> org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729)
>   at 
> org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397901#comment-16397901
 ] 

ASF GitHub Bot commented on DRILL-6234:
---

Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1164
  
@paul-rogers 


> Improve Documentation of VariableWidthVector Behavior
> -
>
> Key: DRILL-6234
> URL: https://issues.apache.org/jira/browse/DRILL-6234
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> Doing the following will throw an Index out of bounds exception.
> {code}
>   final VarCharVector vector = new VarCharVector(field, allocator);
>   vector.allocateNew();
>   vector.getMutator().setValueCount(100);
> {code}
> The expected behavior is to resize the array appropriately. If an index is 
> uninitialized you should not call get for that index.
> {code}
>   at 
> org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80)
>   at 
> org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86)
>   at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114)
>   at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484)
>   at 
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432)
>   at 
> org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:729)
>   at 
> org.apache.drill.exec.vector.VarCharVectorTest.testExpandingNonEmptyVectorSetValueCount(VarCharVectorTest.java:97)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6234) Improve Documentation of VariableWidthVector Behavior

2018-03-13 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas updated DRILL-6234:
--
Summary: Improve Documentation of VariableWidthVector Behavior  (was: 
VarCharVector setValueCount can throw IndexOutOfBoundsException)

> Improve Documentation of VariableWidthVector Behavior
> -
>
> Key: DRILL-6234
> URL: https://issues.apache.org/jira/browse/DRILL-6234
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6234) VarCharVector setValueCount can throw IndexOutOfBoundsException

2018-03-13 Thread Timothy Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397856#comment-16397856
 ] 

Timothy Farkas commented on DRILL-6234:
---

Hi [~paul-rogers], I agree with your analysis. What I tried doing was having 
the vector remember its value count without being explicitly told it. In the 
case of variable length vectors, the value count is effectively the last index 
written to + 1. If someone sets a larger value count, we would simply increase 
the size of the offset vector but internally use the last written index + 1 as 
the effective value count. If someone sets a smaller value count than the 
current one, then we trim the vector as the method does now. I got things to 
work as I wanted in isolation, but things broke down in the integration tests 
when transferTo, load, and getMetadata were called. Then I realized how 
pointless this whole thing was and that you were right all along :) and gave 
up. I'm going to salvage some of the Javadoc and unit tests I wrote and 
supplement them with some of your explanation, in an effort to prevent other 
people from being confused by this. I'll open a small PR with the 
documentation improvements and tag you on it.
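
As a rough sketch of the approach described above (hypothetical names, not 
Drill's actual Mutator code), the effective value count would come from the 
last written index:
{code:java}
// Hypothetical sketch: the vector remembers the highest index written and
// treats lastWrittenIndex + 1 as its effective value count.
class EffectiveValueCountSketch {
  private int lastWrittenIndex = -1;

  void recordWrite(int index) {        // would be called from set/setSafe
    lastWrittenIndex = Math.max(lastWrittenIndex, index);
  }

  int effectiveValueCount(int requestedCount) {
    // A larger requested count only grows the offset vector; a smaller one
    // trims, as setValueCount does today.
    return Math.min(requestedCount, lastWrittenIndex + 1);
  }
}
{code}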

> VarCharVector setValueCount can throw IndexOutOfBoundsException
> ---
>
> Key: DRILL-6234
> URL: https://issues.apache.org/jira/browse/DRILL-6234
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6238) Batch sizing for operators

2018-03-13 Thread Padma Penumarthy (JIRA)
Padma Penumarthy created DRILL-6238:
---

 Summary: Batch sizing for operators
 Key: DRILL-6238
 URL: https://issues.apache.org/jira/browse/DRILL-6238
 Project: Apache Drill
  Issue Type: New Feature
Reporter: Padma Penumarthy
Assignee: Padma Penumarthy


*Batch Sizing For Operators*

This document describes the approach we are taking for limiting batch sizes for 
operators other than scan. 

*Motivation*

Main goals are
 # Improve concurrency 
 # Reduce query failures because of out of memory errors

To accomplish these goals, we need to make queries execute within a specified 
memory budget. To enforce a per-query memory limit, we need to be able to 
enforce per-fragment and per-operator memory limits. Controlling individual 
operators' batch sizes is the first step towards all this. 

*Background*

In Drill, different operators have different limits w.r.t. outgoing batches. 
Some use hard-coded row counts, some use hard-coded memory and some have no 
limits at all. Based on input data size and what the operator is doing, the 
memory used by the outgoing batch can vary widely, as there are no limits 
imposed. Queries fail because we are not able to allocate the memory needed. 
Some operators produce very large batches, causing blocking operators like 
sort and hash aggregate, which have to work under tight memory constraints, 
to fail. The size of batches should be a function of available memory rather 
than input data size and/or what the operator does. Please refer to the table 
at the end of this document for details on what each operator does today.

*Design*

The goal is to have all operators behave the same way, i.e. produce batches 
with size less than or equal to the configured outgoing batch size, with a 
minimum of 1 row per batch and a maximum of 64K rows per batch. A new system 
option ‘drill.exec.memory.operator.output_batch_size’ is added, with a default 
value of 16 MB.

The basic idea is to limit the size of the outgoing batch by deciding how many 
rows we can have in the batch based on the average entry size of each outgoing 
column, taking into account actual data size and the metadata vector overhead 
we add on top for tracking variable length, mode (repeated, optional, 
required) etc. This calculation will be different for each operator, based on 
what the operator is doing, the incoming data and what is being projected out. 
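
As a rough illustration of this calculation (a sketch only, not Drill's actual 
operator code; the 16 MB default and the 1 to 64K row bounds come from this 
document, while the class, method and parameter names are invented):
{code:java}
class BatchSizerSketch {
  // Sketch: how many rows fit in the configured output batch, given the
  // average outgoing row width (actual data plus metadata vector overhead).
  static int outgoingRowCount(long outputBatchSizeBytes, long avgRowWidthBytes) {
    long rows = outputBatchSizeBytes / Math.max(1L, avgRowWidthBytes);
    return (int) Math.max(1L, Math.min(rows, 64 * 1024L)); // clamp to [1, 64K]
  }
}
{code}
For example, with the default 16 MB batch size and an average outgoing row 
width of 600 bytes, this yields roughly 28K rows, well under the 64K cap.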

By taking this adaptive approach based on actual data sizes, operators which 
were limiting batch size to something less than 64K rows before can possibly 
fit a lot more rows (up to 64K) in a batch, as long as the memory stays within 
the budget. This should help improve performance.

Also, to improve performance and utilize memory more efficiently, we will
 # Allocate memory for value vectors upfront. Since we know the number of rows 
and the sizing information for each column in the outgoing batch, we will use 
that information to allocate memory for value vectors upfront. This will help 
improve performance by avoiding the memory copies, and the zeroing of the new 
half, that we do every time we double a vector.
 # Make the number of rows in the outgoing batch a power of two (see the 
sketch below). Since memory is allocated in powers of two, this will help us 
pack the value vectors densely, thereby reducing the amount of memory wasted 
because of the doubling effect.
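
A minimal sketch of that power-of-two rounding (illustrative only; Drill's 
actual code may differ):
{code:java}
class PowerOfTwoSketch {
  // Round the computed row count down to a power of two so value vectors,
  // which are allocated in powers of two, end up densely packed.
  static int roundRowsToPowerOfTwo(int rows) {
    return Integer.highestOneBit(Math.max(1, rows)); // e.g. 28000 -> 16384
  }
}
{code}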

So, to summarize, the benefits we will get are improved memory utilization, 
better performance, higher concurrency and fewer queries dying because of out 
of memory errors. 

So, what are the cons? 
 * Since this is based on averages, strict enforcement is not possible. There 
could be pathological cases where, because of uneven data distribution, we 
might exceed the configured output batch size, potentially causing OOM errors 
and problems in downstream operators.

Other issues that will be addressed:
 * We are adding extra processing for each batch in each operator to figure 
out the sizing information. This overhead can be reduced by passing the sizing 
information along with the batch between operators (see the sketch after this 
list). 
 * For some operators, it will be complex to figure out the average size of 
outgoing columns, especially if we have to evaluate complex expression trees 
and UDFs to figure out the transformation on incoming batches. We will use 
approximations as appropriate.
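
For the first item, a purely hypothetical sketch of what such piggy-backed 
sizing information might look like (invented names; no such class exists in 
Drill today):
{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical carrier for per-column sizing info that an upstream operator
// could pass downstream along with the batch, sparing each operator from
// re-scanning the batch to recompute it.
class BatchSizingInfo {
  final Map<String, Integer> avgColumnWidthBytes = new HashMap<>();
  int rowCount;
  long totalDataBytes;

  long avgRowWidthBytes() {
    return rowCount == 0 ? 0 : totalDataBytes / rowCount;
  }
}
{code}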

The following table summarizes the limits we have today for each operator. 

Flatten, merge join and external sort have already been changed to adhere to 
the batch size limits described in this document, as of Drill release 1.13.

 
|*Operator*|*Limit (Rows, Memory)*|*Notes*|
|Flatten|4K, 512MB|Flatten can produce very large batches based on the average cardinality of the flatten column.|
|Merge Receiver|32K|No memory limit.|
|Hash Aggregate|64K|No memory limit.|
|Streaming Aggregate|32K|No memory limit.|
|Broadcast Sender|None|No limits.|
|Filter, Limit|None|No limits.|
|Hash Join|4K|No memory limit.|
|Merge Join|4K|No memory limit.|
|Nested 

[jira] [Updated] (DRILL-6235) Flatten query leads to out of memory in RPC layer.

2018-03-13 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz updated DRILL-6235:
--
Attachment: drillbit_snippet.log

> Flatten query leads to out of memory in RPC layer.
> --
>
> Key: DRILL-6235
> URL: https://issues.apache.org/jira/browse/DRILL-6235
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.12.0
>Reporter: Khurram Faraaz
>Assignee: Padma Penumarthy
>Priority: Critical
> Attachments: 25593391-512d-23ab-7c84-3651006931e2.sys.drill, 
> drillbit_snippet.log
>
>
> Flatten query leads to out of memory in the RPC layer. The query profile is 
> attached here.
> Total number of JSON files = 4095
> Each JSON file has nine rows
> Each row in the JSON has an array with 1024 integer values, plus other 
> string values outside of the array.
> Two major fragments and eighty-eight minor fragments were created
> On a 4 node CentOS cluster
> number of CPU cores
> [root@qa102-45 ~]# grep -c ^processor /proc/cpuinfo
> 32
> Details of memory
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select * from sys.memory;
> +------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
> | hostname         | user_port  | heap_current  | heap_max    | direct_current  | jvm_direct_current  | direct_max  |
> +------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
> | qa102-45.qa.lab  | 31010      | 1130364912    | 4294967296  | 0               | 170528              | 8589934592  |
> | qa102-47.qa.lab  | 31010      | 171823104     | 4294967296  | 0               | 21912               | 8589934592  |
> | qa102-48.qa.lab  | 31010      | 201326576     | 4294967296  | 0               | 21912               | 8589934592  |
> | qa102-46.qa.lab  | 31010      | 214780896     | 4294967296  | 0               | 21912               | 8589934592  |
> +------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
> 4 rows selected (0.166 seconds)
> {noformat}
> Reset all options and set slice_target=1
> alter system reset all;
> alter system set `planner.slice_target`=1;
> {noformat}
> SELECT * , FLATTEN(arr) FROM many_json_files
> ...
> Error: RESOURCE ERROR: One or more nodes ran out of memory while executing 
> the query.
> Failure allocating buffer.
> Fragment 1:38
> [Error Id: cf4fd273-d8a2-45e8-8d72-15c738e53b0f on qa102-45.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log for the above failing query.
> {noformat}
> 2018-03-12 11:52:33,849 [25593391-512d-23ab-7c84-3651006931e2:frag:0:0] INFO 
> o.a.d.e.w.fragment.FragmentExecutor - 
> 25593391-512d-23ab-7c84-3651006931e2:0:0: State change requested 
> AWAITING_ALLOCATION --> RUNNING
> 2018-03-12 11:52:33,849 [25593391-512d-23ab-7c84-3651006931e2:frag:0:0] INFO 
> o.a.d.e.w.f.FragmentStatusReporter - 
> 25593391-512d-23ab-7c84-3651006931e2:0:0: State to report: RUNNING
> 2018-03-12 11:52:33,854 [25593391-512d-23ab-7c84-3651006931e2:frag:0:0] INFO 
> o.a.d.e.c.ClassCompilerSelector - Java compiler policy: DEFAULT, Debug 
> option: true
> 2018-03-12 11:52:35,929 [BitServer-4] WARN 
> o.a.d.exec.rpc.ProtobufLengthDecoder - Failure allocating buffer on incoming 
> stream due to memory limits. Current Allocation: 92340224.
> 2018-03-12 11:52:35,929 [BitServer-3] WARN 
> o.a.d.exec.rpc.ProtobufLengthDecoder - Failure allocating buffer on incoming 
> stream due to memory limits. Current Allocation: 92340224.
> 2018-03-12 11:52:35,930 [BitServer-3] ERROR 
> o.a.drill.exec.rpc.data.DataServer - Out of memory in RPC layer.
> 2018-03-12 11:52:35,930 [BitServer-4] ERROR 
> o.a.drill.exec.rpc.data.DataServer - Out of memory in RPC layer.
> 2018-03-12 11:52:35,930 [BitServer-4] WARN 
> o.a.d.exec.rpc.ProtobufLengthDecoder - Failure allocating buffer on incoming 
> stream due to memory limits. Current Allocation: 83886080.
> 2018-03-12 11:52:35,930 [BitServer-3] WARN 
> o.a.d.exec.rpc.ProtobufLengthDecoder - Failure allocating buffer on incoming 
> stream due to memory limits. Current Allocation: 83886080.
> 2018-03-12 11:52:35,930 [BitServer-4] ERROR 
> o.a.drill.exec.rpc.data.DataServer - Out of memory in RPC layer.
> 2018-03-12 11:52:35,930 [BitServer-3] ERROR 
> o.a.drill.exec.rpc.data.DataServer - Out of memory in RPC layer.
> 2018-03-12 11:52:35,931 [BitServer-3] WARN 
> o.a.d.exec.rpc.ProtobufLengthDecoder - Failure allocating buffer on incoming 
> stream due to memory limits. Current Allocation: 83886080.
> 2018-03-12 11:52:35,931 [BitServer-4] WARN 
> o.a.d.exec.rpc.ProtobufLengthDecoder - Failure allocating buffer on incoming 
> stream due to memory limits. Current Allocation: 83886080.
> 2018-03-12 11:52:35,931 [BitServer-3] ERROR 
> o.a.drill.exec.rpc.data.DataServer - Out of memory in RPC layer.
> 2018-03-12 11:52:35,931 [BitServer-4] ERROR 
> 

[jira] [Updated] (DRILL-6053) Avoid excessive locking in LocalPersistentStore

2018-03-13 Thread Vlad Rozov (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Rozov updated DRILL-6053:
--
Reviewer: Arina Ielchiieva

> Avoid excessive locking in LocalPersistentStore
> ---
>
> Key: DRILL-6053
> URL: https://issues.apache.org/jira/browse/DRILL-6053
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Major
> Fix For: 1.14.0
>
>
> When query profiles are written to the LocalPersistentStore, the write is 
> unnecessarily serialized due to the read/write lock that was introduced for 
> the versioned PersistentStore. Only versioned access needs to be protected 
> by the read/write lock.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6237) Upgrade checkstyle version to 5.9 or above

2018-03-13 Thread Vlad Rozov (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Rozov updated DRILL-6237:
--
Reviewer: Arina Ielchiieva

> Upgrade checkstyle version to 5.9 or above
> --
>
> Key: DRILL-6237
> URL: https://issues.apache.org/jira/browse/DRILL-6237
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Minor
>
> Checkstyle versions prior to 5.9 do not support Java 8 syntax.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6053) Avoid excessive locking in LocalPersistentStore

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397696#comment-16397696
 ] 

ASF GitHub Bot commented on DRILL-6053:
---

GitHub user vrozov opened a pull request:

https://github.com/apache/drill/pull/1163

 DRILL-6053 & DRILL-6237

- Avoid excessive locking in LocalPersistentStore
- Upgrade checkstyle version to 5.9 or above

@arina-ielchiieva Please review

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vrozov/drill DRILL-6237

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1163.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1163


commit adc5aea6ef217c3a3548e630b466f9b56adfb282
Author: Vlad Rozov 
Date:   2018-03-13T17:56:52Z

DRILL-6053: Avoid excessive locking in LocalPersistentStore

commit c1def24c62ccb729e720b5416abdb41c47a4869f
Author: Vlad Rozov 
Date:   2018-03-13T19:24:48Z

DRILL-6237: Upgrade checkstyle version to 5.9 or above

commit 6bde4215e8eb2bc19a1e981c3e444b43b08237ee
Author: Vlad Rozov 
Date:   2018-03-13T20:48:35Z

DRILL-6053: Avoid excessive locking in LocalPersistentStore




> Avoid excessive locking in LocalPersistentStore
> ---
>
> Key: DRILL-6053
> URL: https://issues.apache.org/jira/browse/DRILL-6053
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Major
> Fix For: 1.14.0
>
>
> When query profiles are written to the LocalPersistentStore, the write is 
> unnecessarily serialized due to the read/write lock that was introduced for 
> the versioned PersistentStore. Only versioned access needs to be protected 
> by the read/write lock.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (DRILL-6010) Working drillbit showing as in QUIESCENT state

2018-03-13 Thread Venkata Jyothsna Donapati (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Jyothsna Donapati closed DRILL-6010.

Resolution: Fixed

> Working drillbit showing as in QUIESCENT state
> --
>
> Key: DRILL-6010
> URL: https://issues.apache.org/jira/browse/DRILL-6010
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Arina Ielchiieva
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Fix For: 1.14.0
>
> Attachments: online_vs_quiescent.JPG
>
>
> After the DRILL-4286 changes, I once got a situation where, after running 
> all functional tests, three drillbits were in ONLINE state and another one 
> was in QUIESCENT. Yet I could run queries from the one in the quiescent 
> state, so it was effectively online. drillbit.sh stop could not shut it down 
> and I had to kill -9 the process (online_vs_quiescent.JPG).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6010) Working drillbit showing as in QUIESCENT state

2018-03-13 Thread Venkata Jyothsna Donapati (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397687#comment-16397687
 ] 

Venkata Jyothsna Donapati commented on DRILL-6010:
--

Yes, looks like that's the issue. Closing the issue.

 

> Working drillbit showing as in QUIESCENT state
> --
>
> Key: DRILL-6010
> URL: https://issues.apache.org/jira/browse/DRILL-6010
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Arina Ielchiieva
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Fix For: 1.14.0
>
> Attachments: online_vs_quiescent.JPG
>
>
> After the DRILL-4286 changes, I once got a situation where, after running 
> all functional tests, three drillbits were in ONLINE state and another one 
> was in QUIESCENT. Yet I could run queries from the one in the quiescent 
> state, so it was effectively online. drillbit.sh stop could not shut it down 
> and I had to kill -9 the process (online_vs_quiescent.JPG).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (DRILL-6010) Working drillbit showing as in QUIESCENT state

2018-03-13 Thread Venkata Jyothsna Donapati (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397687#comment-16397687
 ] 

Venkata Jyothsna Donapati edited comment on DRILL-6010 at 3/13/18 9:29 PM:
---

Yes, looks like that's the issue. Closing the issue.


was (Author: vdonapati):
Yes, Looks like thats the issue. Closing the issue.

 

> Working drillbit showing as in QUIESCENT state
> --
>
> Key: DRILL-6010
> URL: https://issues.apache.org/jira/browse/DRILL-6010
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Arina Ielchiieva
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Fix For: 1.14.0
>
> Attachments: online_vs_quiescent.JPG
>
>
> After the DRILL-4286 changes, I once got a situation where, after running 
> all functional tests, three drillbits were in ONLINE state and another one 
> was in QUIESCENT. Yet I could run queries from the one in the quiescent 
> state, so it was effectively online. drillbit.sh stop could not shut it down 
> and I had to kill -9 the process (online_vs_quiescent.JPG).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6145) Implement Hive MapR-DB JSON handler.

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397654#comment-16397654
 ] 

ASF GitHub Bot commented on DRILL-6145:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1158#discussion_r174285398
  
--- Diff: distribution/pom.xml ---
@@ -324,6 +324,14 @@
   org.apache.hbase
   hbase-protocol
 
+
--- End diff --

Maven dependency handling is indeed somewhat inconsistent with regard to 
profiles. Check Maven 3.5 or above; AFAIK it (the Maven reactor) handles 
profile dependencies better compared to prior versions.


> Implement Hive MapR-DB JSON handler. 
> -
>
> Key: DRILL-6145
> URL: https://issues.apache.org/jira/browse/DRILL-6145
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive, Storage - MapRDB
>Affects Versions: 1.12.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.14.0
>
>
> Similar to the "hive-hbase-storage-handler", to support querying MapR-DB 
> Hive's external tables it is necessary to add a "hive-maprdb-json-handler".
> Use case:
>  # Create a MapR-DB JSON table:
> {code}
> _> mapr dbshell_
> _maprdb root:> create /tmp/table/json_  (make sure /tmp/table exists)
> {code}
> -- insert data
> {code}
> insert /tmp/table/json --value '\{"_id":"movie002" , "title":"Developers 
> on the Edge", "studio":"Command Line Studios"}'
> insert /tmp/table/json --id movie003 --value '\{"title":"The Golden 
> Master", "studio":"All-Nighter"}'
> {code} 
>  #  Create a Hive external table:
> {code}
> hive> CREATE EXTERNAL TABLE mapr_db_json_hive_tbl ( 
> > movie_id string, title string, studio string) 
> > STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' 
> > TBLPROPERTIES("maprdb.table.name" = 
> "/tmp/table/json","maprdb.column.id" = "movie_id");
> {code}
>  
>  #  Use hive schema to query this table via Drill:
> {code}
> 0: jdbc:drill:> select * from hive.mapr_db_json_hive_tbl;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6237) Upgrade checkstyle version to 5.9 or above

2018-03-13 Thread Vlad Rozov (JIRA)
Vlad Rozov created DRILL-6237:
-

 Summary: Upgrade checkstyle version to 5.9 or above
 Key: DRILL-6237
 URL: https://issues.apache.org/jira/browse/DRILL-6237
 Project: Apache Drill
  Issue Type: Task
Reporter: Vlad Rozov
Assignee: Vlad Rozov


Checkstyle versions prior to 5.9 do not support Java 8 syntax.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6040) Need to add usage for graceful_stop to drillbit.sh

2018-03-13 Thread Krystal (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397308#comment-16397308
 ] 

Krystal commented on DRILL-6040:


git.commit.id.abbrev=cac2882

Verified that the bug is fixed.

/opt/drill/bin/drillbit.sh 
Usage: drillbit.sh [--config|--site <site-dir>] 
(start|stop|status|restart|run|graceful_stop) [args]

> Need to add usage for graceful_stop to drillbit.sh
> --
>
> Key: DRILL-6040
> URL: https://issues.apache.org/jira/browse/DRILL-6040
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.13.0
>Reporter: Krystal
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> git.commit.id.abbrev=eb0c403
> Usage for graceful_stop is missing from drillbit.sh.
> ./drillbit.sh
> Usage: drillbit.sh [--config|--site <site-dir>] 
> (start|stop|status|restart|run) [args]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (DRILL-6040) Need to add usage for graceful_stop to drillbit.sh

2018-03-13 Thread Krystal (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krystal closed DRILL-6040.
--

Verified that the bug is fixed.

> Need to add usage for graceful_stop to drillbit.sh
> --
>
> Key: DRILL-6040
> URL: https://issues.apache.org/jira/browse/DRILL-6040
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.13.0
>Reporter: Krystal
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> git.commit.id.abbrev=eb0c403
> Usage for graceful_stop is missing from drillbit.sh.
> ./drillbit.sh
> Usage: drillbit.sh [--config|--site <site-dir>] 
> (start|stop|status|restart|run) [args]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6145) Implement Hive MapR-DB JSON handler.

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396928#comment-16396928
 ] 

ASF GitHub Bot commented on DRILL-6145:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/1158#discussion_r174125821
  
--- Diff: distribution/pom.xml ---
@@ -324,6 +324,14 @@
   org.apache.hbase
   hbase-protocol
 
+
--- End diff --

I did it that way at first, but jars are usually downloaded only for 
dependencies from the common part of the pom file.
It looks like, to add these dependencies into the mapr profile of the 
drill-hive module, it is necessary to add the maven-assembly-plugin to the 
mapr profile of the storage-hive module, or to edit the distribution pom file 
somehow to leverage dependencies from the profiles section. I will investigate it. 


> Implement Hive MapR-DB JSON handler. 
> -
>
> Key: DRILL-6145
> URL: https://issues.apache.org/jira/browse/DRILL-6145
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive, Storage - MapRDB
>Affects Versions: 1.12.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.14.0
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6061) Doc Request: Global Query List showing queries from all Drill foreman nodes

2018-03-13 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6061:

Fix Version/s: (was: 1.13.0)
   1.14.0

> Doc Request: Global Query List showing queries from all Drill foreman nodes
> ---
>
> Key: DRILL-6061
> URL: https://issues.apache.org/jira/browse/DRILL-6061
> Project: Apache Drill
>  Issue Type: Task
>  Components: Documentation, Metadata, Web Server
>Affects Versions: 1.11.0
> Environment: MapR 5.2
>Reporter: Hari Sekhon
>Assignee: Bridget Bevens
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.14.0
>
>
> Documentation request to improve the doc around the Global Query List, to 
> show all queries executed across all Drill nodes in a cluster for better 
> management and auditing.
> It wasn't obvious how to see all queries across all nodes in a Drill 
> cluster. The Web UI on any given Drill node only shows the queries 
> coordinated by that local node when acting as the foreman for the query, so if 
> using ZooKeeper or a Load Balancer to distribute queries via different Drill 
> nodes (eg. 
> [https://github.com/HariSekhon/nagios-plugins/tree/master/haproxy|https://github.com/HariSekhon/nagios-plugins/tree/master/haproxy])
>  then the query list will be spread across lots of different nodes with no 
> global timeline of queries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries

2018-03-13 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6199:

Fix Version/s: (was: 1.13.0)
   1.14.0

> Filter push down doesn't work with more than one nested subqueries
> --
>
> Key: DRILL-6199
> URL: https://issues.apache.org/jira/browse/DRILL-6199
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Anton Gozhiy
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.14.0
>
> Attachments: DRILL_6118_data_source.csv
>
>
> *Data set:*
> The data is generated using the attached file: *DRILL_6118_data_source.csv*
> Data gen commands:
> {code:sql}
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d1` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0] in (1, 3);
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d2` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]=2;
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d3` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]>3;
> {code}
> *Steps:*
> # Execute the following query:
> {code:sql}
> explain plan for select * from (select * from (select * from 
> dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where c1<3
> {code}
> *Expected result:*
> numFiles=2, numRowGroups=2, only files from the folders d1 and d2 should be 
> scanned.
> *Actual result:*
> Filter push down doesn't work:
> numFiles=3, numRowGroups=3, scanning from all files



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5680) BasicPhysicalOpUnitTest can't run in Eclipse with Java 8

2018-03-13 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396850#comment-16396850
 ] 

Arina Ielchiieva commented on DRILL-5680:
-

[~paul-rogers] Drill has officially moved to JDK 8. Do you still see this 
failure?

> BasicPhysicalOpUnitTest can't run in Eclipse with Java 8
> 
>
> Key: DRILL-5680
> URL: https://issues.apache.org/jira/browse/DRILL-5680
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Priority: Minor
>
> A unit test failure was detected in the test {{BasicPhysicalOpUnitTest}}. 
> I wanted to run this test in Eclipse to track down the error, but this test 
> uses Mockito, which cannot run under Java 8 in Eclipse:
> {code}
> java.lang.UnsupportedClassVersionError: org/apache/drill/test/DrillTest : 
> Unsupported major.minor version 52.0
>   at java.lang.ClassLoader.defineClass1(Native Method)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-4333) tests in Drill2489CallsAfterCloseThrowExceptionsTest fail in Java 8

2018-03-13 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-4333.
-
   Resolution: Fixed
Fix Version/s: (was: Future)
   1.13.0

> tests in Drill2489CallsAfterCloseThrowExceptionsTest fail in Java 8
> ---
>
> Key: DRILL-4333
> URL: https://issues.apache.org/jira/browse/DRILL-4333
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Tools, Build & Test
>Affects Versions: 1.5.0
>Reporter: Deneche A. Hakim
>Priority: Major
> Fix For: 1.13.0
>
>
> The following tests fail in Java 8:
> {noformat}
> Drill2489CallsAfterCloseThrowExceptionsTest.testClosedPlainStatementMethodsThrowRight
> Drill2489CallsAfterCloseThrowExceptionsTest.testclosedPreparedStmtOfOpenConnMethodsThrowRight
> Drill2489CallsAfterCloseThrowExceptionsTest.testClosedResultSetMethodsThrowRight1
> Drill2489CallsAfterCloseThrowExceptionsTest.testClosedResultSetMethodsThrowRight2
> Drill2489CallsAfterCloseThrowExceptionsTest.testClosedDatabaseMetaDataMethodsThrowRight
> Drill2769UnsupportedReportsUseSqlExceptionTest.testPreparedStatementMethodsThrowRight
> Drill2769UnsupportedReportsUseSqlExceptionTest.testPlainStatementMethodsThrowRight
> {noformat}
> Drill has special implementations of Statement, PreparedStatement, ResultSet 
> and DatabaseMetadata that override all parent methods to make sure they 
> throw a proper exception if the statement has already been closed. 
> These tests use reflection to make sure all methods behave correctly, but 
> Java 8 added more methods that need to be properly overridden.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-4547) Javadoc fails with Java8

2018-03-13 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4547:

Fix Version/s: 1.14.0

> Javadoc fails with Java8
> 
>
> Key: DRILL-4547
> URL: https://issues.apache.org/jira/browse/DRILL-4547
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.6.0
>Reporter: Laurent Goujon
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Fix For: 1.14.0
>
>
> Javadoc cannot be generated when using Java8 (likely because the parser is 
> now more strict).
> Here's an example of issues when trying to generate javadocs in module 
> {{drill-fmpp-maven-plugin}}
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-javadoc-plugin:2.9.1:jar (attach-javadocs) on 
> project drill-fmpp-maven-plugin: MavenReportException: Error while creating 
> archive:
> [ERROR] Exit code: 1 - 
> /Users/laurent/devel/drill/tools/fmpp/src/main/java/org/apache/drill/fmpp/mojo/FMPPMojo.java:44:
>  error: unknown tag: goal
> [ERROR] * @goal generate
> [ERROR] ^
> [ERROR] 
> /Users/laurent/devel/drill/tools/fmpp/src/main/java/org/apache/drill/fmpp/mojo/FMPPMojo.java:45:
>  error: unknown tag: phase
> [ERROR] * @phase generate-sources
> [ERROR] ^
> [ERROR] 
> /Users/laurent/devel/drill/tools/fmpp/target/generated-sources/plugin/org/apache/drill/fmpp/mojo/HelpMojo.java:25:
>  error: unknown tag: goal
> [ERROR] * @goal help
> [ERROR] ^
> [ERROR] 
> /Users/laurent/devel/drill/tools/fmpp/target/generated-sources/plugin/org/apache/drill/fmpp/mojo/HelpMojo.java:26:
>  error: unknown tag: requiresProject
> [ERROR] * @requiresProject false
> [ERROR] ^
> [ERROR] 
> /Users/laurent/devel/drill/tools/fmpp/target/generated-sources/plugin/org/apache/drill/fmpp/mojo/HelpMojo.java:27:
>  error: unknown tag: threadSafe
> [ERROR] * @threadSafe
> [ERROR] ^
> [ERROR] 
> [ERROR] Command line was: 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_72.jdk/Contents/Home/bin/javadoc 
> @options @packages
> [ERROR] 
> [ERROR] Refer to the generated Javadoc files in 
> '/Users/laurent/devel/drill/tools/fmpp/target/apidocs' dir.
> [ERROR] -> [Help 1]
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] 
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> [ERROR] 
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :drill-fmpp-maven-plugin
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-4329) 13 Unit tests are failing with JDK 8

2018-03-13 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-4329.
-
   Resolution: Fixed
Fix Version/s: (was: Future)
   1.13.0

> 13 Unit tests are failing with JDK 8
> 
>
> Key: DRILL-4329
> URL: https://issues.apache.org/jira/browse/DRILL-4329
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
> Environment: Mac OS
> JDK 1.8.0_65
>Reporter: Deneche A. Hakim
>Priority: Major
> Fix For: 1.13.0
>
>
> The following unit tests are failing when building Drill with JDK 1.8.0_65:
> {noformat}
>   TestFlattenPlanning.testFlattenPlanningAvoidUnnecessaryProject
>   TestFrameworkTest {
>   testRepeatedColumnMatching
>   testCSVVerificationOfOrder_checkFailure
>   }
>   Drill2489CallsAfterCloseThrowExceptionsTest {
> testClosedDatabaseMetaDataMethodsThrowRight
> testClosedPlainStatementMethodsThrowRight
> testclosedPreparedStmtOfOpenConnMethodsThrowRight
> testClosedResultSetMethodsThrowRight1
> testClosedResultSetMethodsThrowRight2
>   }
>   Drill2769UnsupportedReportsUseSqlExceptionTest {
> testPreparedStatementMethodsThrowRight
> testPlainStatementMethodsThrowRight
>   }
>   TestMongoFilterPushDown {
> testFilterPushDownIsEqual
> testFilterPushDownGreaterThanWithSingleField
> testFilterPushDownLessThanWithSingleField
>   }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-6163) Switch Travis To Java 8

2018-03-13 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-6163.
-
Resolution: Fixed

Fixed in dbc95a659c3325e263340f3ec2b913a048163671.

> Switch Travis To Java 8
> ---
>
> Key: DRILL-6163
> URL: https://issues.apache.org/jira/browse/DRILL-6163
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.12.0
>Reporter: Timothy Farkas
>Assignee: Volodymyr Tkach
>Priority: Major
> Fix For: 1.13.0
>
>
> Drill is preparing to move to Java 8 for the next release. So we should make 
> Travis test with Java 8 as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6163) Switch Travis To Java 8

2018-03-13 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6163:

Fix Version/s: 1.13.0

> Switch Travis To Java 8
> ---
>
> Key: DRILL-6163
> URL: https://issues.apache.org/jira/browse/DRILL-6163
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.12.0
>Reporter: Timothy Farkas
>Assignee: Volodymyr Tkach
>Priority: Major
> Fix For: 1.13.0
>
>
> Drill is preparing to move to Java 8 for the next release. So we should make 
> Travis test with Java 8 as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6163) Switch Travis To Java 8

2018-03-13 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6163:

Affects Version/s: 1.12.0

> Switch Travis To Java 8
> ---
>
> Key: DRILL-6163
> URL: https://issues.apache.org/jira/browse/DRILL-6163
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.12.0
>Reporter: Timothy Farkas
>Assignee: Volodymyr Tkach
>Priority: Major
> Fix For: 1.13.0
>
>
> Drill is preparing to move to Java 8 for the next release. So we should make 
> Travis test with Java 8 as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6005) Fix TestGracefulShutdown tests to skip check for loopback address usage in distributed mode

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396641#comment-16396641
 ] 

ASF GitHub Bot commented on DRILL-6005:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1162#discussion_r174035415
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ---
@@ -640,4 +640,6 @@ public static String bootDefaultFor(String name) {
 
   public static final String DRILL_PORT_HUNT = "drill.exec.port_hunt";
 
+  public static final String ALLOW_LOOPBACK_ADDRESS_BINDING = 
"drill.exec.enable_loopback_address_binding";
--- End diff --

Please rename to `drill.exec.allow_loopback_address_binding`.
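
Assuming the Java constant name stays the same and only the config key 
changes, the renamed entry would read:
{code:java}
public static final String ALLOW_LOOPBACK_ADDRESS_BINDING = "drill.exec.allow_loopback_address_binding";
{code}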


> Fix TestGracefulShutdown tests to skip check for loopback address usage in 
> distributed mode
> ---
>
> Key: DRILL-6005
> URL: https://issues.apache.org/jira/browse/DRILL-6005
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.12.0
>Reporter: Arina Ielchiieva
>Assignee: Venkata Jyothsna Donapati
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.14.0
>
>
> After DRILL-4286 changes some of the newly added unit tests fail with 
> {noformat}
> Drillbit is disallowed to bind to loopback address in distributed mode.
> {noformat}
> List of failed tests:
> {noformat}
> Tests in error: 
>   TestGracefulShutdown.testOnlineEndPoints:96 » IllegalState Cluster fixture 
> set...
>   TestGracefulShutdown.testStateChange:130 » IllegalState Cluster fixture 
> setup ...
>   TestGracefulShutdown.testRestApi:167 » IllegalState Cluster fixture setup 
> fail...
>   TestGracefulShutdown.testRestApiShutdown:207 » IllegalState Cluster fixture 
> se...
> {noformat}
> This can be fixed if {{/etc/hosts}} file is edited.
> Source - 
> https://stackoverflow.com/questions/40506221/how-to-start-drillbit-locally-in-distributed-mode
> Though these changes are required in production, I don't think this check 
> should be enforced while running unit tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-4829) Configure the address to bind to

2018-03-13 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-4829:
---

Assignee: Venkata Jyothsna Donapati

> Configure the address to bind to
> 
>
> Key: DRILL-4829
> URL: https://issues.apache.org/jira/browse/DRILL-4829
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Daniel Stockton
>Assignee: Venkata Jyothsna Donapati
>Priority: Minor
> Fix For: 1.14.0
>
>
> 1.7 included the following patch to prevent Drillbits binding to the loopback 
> address: https://issues.apache.org/jira/browse/DRILL-4523
> "Drillbit is disallowed to bind to loopback address in distributed mode."
> It would be better if this were configurable rather than relying on 
> /etc/hosts, since it's common for the hostname to resolve to loopback.
> Would you accept a patch that adds this option to drill.override.conf?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-4829) Configure the address to bind to

2018-03-13 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4829:

Fix Version/s: 1.14.0

> Configure the address to bind to
> 
>
> Key: DRILL-4829
> URL: https://issues.apache.org/jira/browse/DRILL-4829
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Daniel Stockton
>Priority: Minor
> Fix For: 1.14.0
>
>
> 1.7 included the following patch to prevent Drillbits binding to the loopback 
> address: https://issues.apache.org/jira/browse/DRILL-4523
> "Drillbit is disallowed to bind to loopback address in distributed mode."
> It would be better if this were configurable rather than relying on 
> /etc/hosts, since it's common for the hostname to resolve to loopback.
> Would you accept a patch that adds this option to drill.override.conf?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6005) Fix TestGracefulShutdown tests to skip check for loopback address usage in distributed mode

2018-03-13 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6005:

Labels: doc-impacting  (was: )

> Fix TestGracefulShutdown tests to skip check for loopback address usage in 
> distributed mode
> ---
>
> Key: DRILL-6005
> URL: https://issues.apache.org/jira/browse/DRILL-6005
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.12.0
>Reporter: Arina Ielchiieva
>Assignee: Venkata Jyothsna Donapati
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.14.0
>
>
> After DRILL-4286 changes some of the newly added unit tests fail with 
> {noformat}
> Drillbit is disallowed to bind to loopback address in distributed mode.
> {noformat}
> List of failed tests:
> {noformat}
> Tests in error: 
>   TestGracefulShutdown.testOnlineEndPoints:96 » IllegalState Cluster fixture 
> set...
>   TestGracefulShutdown.testStateChange:130 » IllegalState Cluster fixture 
> setup ...
>   TestGracefulShutdown.testRestApi:167 » IllegalState Cluster fixture setup 
> fail...
>   TestGracefulShutdown.testRestApiShutdown:207 » IllegalState Cluster fixture 
> se...
> {noformat}
> This can be fixed if {{/etc/hosts}} file is edited.
> Source - 
> https://stackoverflow.com/questions/40506221/how-to-start-drillbit-locally-in-distributed-mode
> Though these changes are required in production, I don't think this check 
> should be enforced while running unit tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (DRILL-4957) Drillbit is disallowed to bind to loopback address in distributed mode.

2018-03-13 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva closed DRILL-4957.
---
Resolution: Invalid

> Drillbit is disallowed to bind to loopback address in distributed mode.
> ---
>
> Key: DRILL-4957
> URL: https://issues.apache.org/jira/browse/DRILL-4957
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.8.0
>Reporter: evan912
>Priority: Major
>
> Drill works well in embedded mode.
> But it does not work in distributed mode; I set the zookeeper to 
> localhost:2181 in drill's config file. 
> in drillbit. out: 
> Exception in thread "main" 
> org.apache.drill.exec.exception.DrillbitStartupException: Failure during 
> initial startup of Drillbit.
>   at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:295)
>   at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:271)
>   at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:267)
> Caused by: org.apache.drill.exec.exception.DrillbitStartupException: Drillbit 
> is disallowed to bind to loopback address in distributed mode.
>   at 
> org.apache.drill.exec.service.ServiceEngine.checkLoopbackAddress(ServiceEngine.java:186)
>   at 
> org.apache.drill.exec.service.ServiceEngine.start(ServiceEngine.java:146)
>   at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:119)
>   at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:291)
>   ... 2 more
> Try to start drill shell with drill-localhost:
> kelvin@kelvin:~/Downloads/apache-drill-1.8.0/bin$ ./drill-localhost
> Error: Failure in connecting to Drill: 
> org.apache.drill.exec.rpc.RpcException: CONNECTION : 
> java.net.ConnectException: Connection refused: localhost/127.0.0.1:31010 (state=,code=0)
> java.sql.SQLException: Failure in connecting to Drill: 
> org.apache.drill.exec.rpc.RpcException: CONNECTION : 
> java.net.ConnectException: Connection refused: localhost/127.0.0.1:31010
> at 
> org.apache.drill.jdbc.impl.DrillConnectionImpl.<init>(DrillConnectionImpl.java:162)
> at 
> org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(DrillJdbc41Factory.java:64)
> at 
> org.apache.drill.jdbc.impl.DrillFactory.newConnection(DrillFactory.java:69)
> at 
> net.hydromatic.avatica.UnregisteredDriver.connect(UnregisteredDriver.java:126)
> at org.apache.drill.jdbc.Driver.connect(Driver.java:72)
> at sqlline.DatabaseConnection.connect(DatabaseConnection.java:167)
> at org.apache.drill.jdbc.Driver.connect(Driver.java:72)
> at sqlline.DatabaseConnection.connect(DatabaseConnection.java:167)
> at 
> sqlline.DatabaseConnection.getConnection(DatabaseConnection.java:213)
> at sqlline.Commands.connect(Commands.java:1083)
> at sqlline.Commands.connect(Commands.java:1015)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36)
> at sqlline.SqlLine.dispatch(SqlLine.java:742)
> at sqlline.SqlLine.initArgs(SqlLine.java:528)
> at sqlline.SqlLine.begin(SqlLine.java:596)
> at sqlline.SqlLine.start(SqlLine.java:375)
> at sqlline.SqlLine.main(SqlLine.java:268)
> Caused by: org.apache.drill.exec.rpc.RpcException: CONNECTION : 
> java.net.ConnectException: Connection refused: localhost/127.0.0.1:31010
> at 
> org.apache.drill.exec.client.DrillClient$FutureHandler.connectionFailed(DrillClient.java:675)
> at 
> org.apache.drill.exec.rpc.user.QueryResultHandler$ChannelClosedHandler.connectionFailed(QueryResultHandler.java:389)
> at 
> org.apache.drill.exec.rpc.BasicClient$ConnectionMultiListener$ConnectionHandler.operationComplete(BasicClient.java:233)
> at 
> org.apache.drill.exec.rpc.BasicClient$ConnectionMultiListener$ConnectionHandler.operationComplete(BasicClient.java:196)
> at 
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
> at 
> io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
> at 
> io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
> at 
> io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
> at 
> io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:268)
>