[GitHub] ilooner commented on a change in pull request #1606: DRILL-6845: Semi-Hash-Join to skip incoming build duplicates, automatically stop skipping if too few

2019-01-15 Thread GitBox
ilooner commented on a change in pull request #1606: DRILL-6845: Semi-Hash-Join 
to skip incoming build duplicates, automatically stop skipping if too few
URL: https://github.com/apache/drill/pull/1606#discussion_r248149480
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinSpillControlImpl.java
 ##
 @@ -18,39 +18,211 @@
 
 package org.apache.drill.exec.physical.impl.join;
 
+  import org.apache.drill.exec.ExecConstants;
   import org.apache.drill.exec.memory.BufferAllocator;
+  import org.apache.drill.exec.ops.FragmentContext;
+  import org.apache.drill.exec.record.RecordBatch;
   import org.apache.drill.exec.record.RecordBatchMemoryManager;
-  import org.apache.drill.exec.record.RecordBatchSizer;
-  import org.apache.drill.exec.record.VectorContainer;
+  import org.slf4j.Logger;
+  import org.slf4j.LoggerFactory;
+
+  import javax.annotation.Nullable;
+  import java.util.Set;
+
+  import static org.apache.drill.exec.record.JoinBatchMemoryManager.LEFT_INDEX;
+  import static 
org.apache.drill.exec.record.JoinBatchMemoryManager.RIGHT_INDEX;
 
 /**
  * This class is currently used only for Semi-Hash-Join that avoids duplicates 
by the use of a hash table
- * The method {@link 
HashJoinMemoryCalculator.HashJoinSpillControl#shouldSpill(VectorContainer)} 
returns true if the memory available now to the allocator if not enough
+ * The method {@link 
HashJoinMemoryCalculator.HashJoinSpillControl#shouldSpill()} returns true if 
the memory available now to the allocator if not enough
  * to hold (a multiple of, for safety) a new allocated batch
  */
-public class HashJoinSpillControlImpl implements 
HashJoinMemoryCalculator.HashJoinSpillControl {
+public class HashJoinSpillControlImpl implements 
HashJoinMemoryCalculator.BuildSidePartitioning {
+  private static final Logger logger = 
LoggerFactory.getLogger(HashJoinSpillControlImpl.class);
+
   private BufferAllocator allocator;
   private int recordsPerBatch;
   private int minBatchesInAvailableMemory;
   private RecordBatchMemoryManager batchMemoryManager;
+  private int initialPartitions;
+  private int numPartitions;
+  private int recordsPerPartitionBatchProbe;
+  private FragmentContext context;
 
-  HashJoinSpillControlImpl(BufferAllocator allocator, int recordsPerBatch, int 
minBatchesInAvailableMemory, RecordBatchMemoryManager batchMemoryManager) {
+  HashJoinSpillControlImpl(BufferAllocator allocator, int recordsPerBatch, int 
minBatchesInAvailableMemory, RecordBatchMemoryManager batchMemoryManager, 
FragmentContext context) {
 this.allocator = allocator;
 this.recordsPerBatch = recordsPerBatch;
 this.minBatchesInAvailableMemory = minBatchesInAvailableMemory;
 this.batchMemoryManager = batchMemoryManager;
+this.context = context;
   }
 
   @Override
-  public boolean shouldSpill(VectorContainer currentVectorContainer) {
-assert currentVectorContainer.hasRecordCount();
-assert currentVectorContainer.getRecordCount() == recordsPerBatch;
-// Expected new batch size like the current, plus the Hash Value vector (4 
bytes per HV)
-long batchSize = new 
RecordBatchSizer(currentVectorContainer).getActualSize() + 4 * recordsPerBatch;
+  public boolean shouldSpill() {
+// Expected new batch size like the current, plus the Hash Values vector 
(4 bytes per HV)
+long batchSize = ( 
batchMemoryManager.getRecordBatchSizer(RIGHT_INDEX).getRowAllocWidth() + 4 ) * 
recordsPerBatch;
 long reserveForOutgoing = batchMemoryManager.getOutputBatchSize();
 long memoryAvailableNow = allocator.getLimit() - 
allocator.getAllocatedMemory() - reserveForOutgoing;
 boolean needsSpill = minBatchesInAvailableMemory * batchSize > 
memoryAvailableNow;
+if ( needsSpill ) {
+  logger.debug("should spill now - batch size {}, mem avail {}, reserved 
for outgoing {}", batchSize, memoryAvailableNow, reserveForOutgoing);
+}
 return needsSpill;   // go spill if too little memory is available
   }
 
+  @Override
+  public void initialize(boolean firstCycle,
+ boolean reserveHash,
+ RecordBatch buildSideBatch,
+ RecordBatch probeSideBatch,
+ Set joinColumns,
+ boolean probeEmpty,
+ long memoryAvailable,
+ int initialPartitions,
+ int recordsPerPartitionBatchBuild,
+ int recordsPerPartitionBatchProbe,
+ int maxBatchNumRecordsBuild,
+ int maxBatchNumRecordsProbe,
+ int outputBatchSize,
+ double loadFactor) {
+this.initialPartitions = initialPartitions;
+this.recordsPerPartitionBatchProbe = recordsPerPartitionBatchProbe;
+
+calculateMemoryUsage();
+  }
+
+  @Override
+  public void setPartitionStatSet(HashJoinMemoryCalculator.PartitionStatSet 

[GitHub] ilooner commented on issue #1606: DRILL-6845: Semi-Hash-Join to skip incoming build duplicates, automatically stop skipping if too few

2019-01-15 Thread GitBox
ilooner commented on issue #1606: DRILL-6845: Semi-Hash-Join to skip incoming 
build duplicates, automatically stop skipping if too few
URL: https://github.com/apache/drill/pull/1606#issuecomment-454667185
 
 
   @Ben-Zvi let's not rush this PR. I agree with you we should not be trying to 
do things perfectly, that's why I mainly only focus on functional correctness 
in my reviews and avoid superficial comments about variable names and 
shortening lines of code. As you've experienced first hand with HashAgg, 
getting memory calculations right is extremely tricky, and they're even 
trickier to debug when users hit bugs. Let's make sure the logic is rock solid 
and unit tested while everything is still fresh in our minds. Doing this now 
will save us a lot more time in the coming months. Plus I think we are getting 
pretty close, I don't think there is that much code left to write.
   
   If there is a time crunch and this needs to go into our private branch soon, 
I have no issues with putting this into the private branch, and continuing the 
review process in open source. Since the changes are mainly in the memory 
calculators, the chances of any significant merge conflict are almost zero.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ilooner commented on a change in pull request #1606: DRILL-6845: Semi-Hash-Join to skip incoming build duplicates, automatically stop skipping if too few

2019-01-15 Thread GitBox
ilooner commented on a change in pull request #1606: DRILL-6845: Semi-Hash-Join 
to skip incoming build duplicates, automatically stop skipping if too few
URL: https://github.com/apache/drill/pull/1606#discussion_r248160231
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinSpillControlImpl.java
 ##
 @@ -0,0 +1,238 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.physical.impl.join;
+
+  import org.apache.drill.exec.ExecConstants;
+  import org.apache.drill.exec.memory.BufferAllocator;
+  import org.apache.drill.exec.ops.FragmentContext;
+  import org.apache.drill.exec.record.RecordBatch;
+  import org.apache.drill.exec.record.RecordBatchMemoryManager;
+  import org.slf4j.Logger;
+  import org.slf4j.LoggerFactory;
+
+  import javax.annotation.Nullable;
+  import java.util.Set;
+
+  import static org.apache.drill.exec.record.JoinBatchMemoryManager.LEFT_INDEX;
+  import static 
org.apache.drill.exec.record.JoinBatchMemoryManager.RIGHT_INDEX;
+
+/**
+ * This class is currently used only for Semi-Hash-Join that avoids duplicates 
by the use of a hash table
+ * The method {@link 
HashJoinMemoryCalculator.HashJoinSpillControl#shouldSpill()} returns true if 
the memory available now to the allocator if not enough
+ * to hold (a multiple of, for safety) a new allocated batch
+ */
+public class HashJoinSpillControlImpl implements 
HashJoinMemoryCalculator.BuildSidePartitioning {
+  private static final Logger logger = 
LoggerFactory.getLogger(HashJoinSpillControlImpl.class);
+
+  private BufferAllocator allocator;
+  private int recordsPerBatch;
+  private int minBatchesInAvailableMemory;
+  private RecordBatchMemoryManager batchMemoryManager;
+  private int initialPartitions;
+  private int numPartitions;
+  private int recordsPerPartitionBatchProbe;
+  private FragmentContext context;
+  HashJoinMemoryCalculator.PartitionStatSet partitionStatSet;
+
+  HashJoinSpillControlImpl(BufferAllocator allocator, int recordsPerBatch, int 
minBatchesInAvailableMemory, RecordBatchMemoryManager batchMemoryManager, 
FragmentContext context) {
+this.allocator = allocator;
+this.recordsPerBatch = recordsPerBatch;
+this.minBatchesInAvailableMemory = minBatchesInAvailableMemory;
+this.batchMemoryManager = batchMemoryManager;
+this.context = context;
+  }
+
+  @Override
+  public boolean shouldSpill() {
+// Expected new batch size like the current, plus the Hash Values vector 
(4 bytes per HV)
+long batchSize = ( 
batchMemoryManager.getRecordBatchSizer(RIGHT_INDEX).getRowAllocWidth() + 4 ) * 
recordsPerBatch;
+long reserveForOutgoing = batchMemoryManager.getOutputBatchSize();
+long memoryAvailableNow = allocator.getLimit() - 
allocator.getAllocatedMemory() - reserveForOutgoing;
+boolean needsSpill = minBatchesInAvailableMemory * batchSize > 
memoryAvailableNow;
+if ( needsSpill ) {
+  logger.debug("should spill now - batch size {}, mem avail {}, reserved 
for outgoing {}", batchSize, memoryAvailableNow, reserveForOutgoing);
+}
+return needsSpill;   // go spill if too little memory is available
+  }
+
+  @Override
+  public void initialize(boolean firstCycle,
+ boolean reserveHash,
+ RecordBatch buildSideBatch,
+ RecordBatch probeSideBatch,
+ Set joinColumns,
+ boolean probeEmpty,
+ long memoryAvailable,
+ int initialPartitions,
+ int recordsPerPartitionBatchBuild,
+ int recordsPerPartitionBatchProbe,
+ int maxBatchNumRecordsBuild,
+ int maxBatchNumRecordsProbe,
+ int outputBatchSize,
+ double loadFactor) {
+this.initialPartitions = initialPartitions;
+this.recordsPerPartitionBatchProbe = recordsPerPartitionBatchProbe;
+
+calculateMemoryUsage();
+  }
+
+  @Override
+  public void setPartitionStatSet(HashJoinMemoryCalculator.PartitionStatSet 
partitionStatSet) {
+this.partitionStatSet = 

[GitHub] ilooner commented on a change in pull request #1606: DRILL-6845: Semi-Hash-Join to skip incoming build duplicates, automatically stop skipping if too few

2019-01-15 Thread GitBox
ilooner commented on a change in pull request #1606: DRILL-6845: Semi-Hash-Join 
to skip incoming build duplicates, automatically stop skipping if too few
URL: https://github.com/apache/drill/pull/1606#discussion_r248152865
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinSpillControlImpl.java
 ##
 @@ -0,0 +1,238 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.physical.impl.join;
+
+  import org.apache.drill.exec.ExecConstants;
+  import org.apache.drill.exec.memory.BufferAllocator;
+  import org.apache.drill.exec.ops.FragmentContext;
+  import org.apache.drill.exec.record.RecordBatch;
+  import org.apache.drill.exec.record.RecordBatchMemoryManager;
+  import org.slf4j.Logger;
+  import org.slf4j.LoggerFactory;
+
+  import javax.annotation.Nullable;
+  import java.util.Set;
+
+  import static org.apache.drill.exec.record.JoinBatchMemoryManager.LEFT_INDEX;
+  import static 
org.apache.drill.exec.record.JoinBatchMemoryManager.RIGHT_INDEX;
+
+/**
+ * This class is currently used only for Semi-Hash-Join that avoids duplicates 
by the use of a hash table
+ * The method {@link 
HashJoinMemoryCalculator.HashJoinSpillControl#shouldSpill()} returns true if 
the memory available now to the allocator if not enough
+ * to hold (a multiple of, for safety) a new allocated batch
+ */
+public class HashJoinSpillControlImpl implements 
HashJoinMemoryCalculator.BuildSidePartitioning {
+  private static final Logger logger = 
LoggerFactory.getLogger(HashJoinSpillControlImpl.class);
+
+  private BufferAllocator allocator;
+  private int recordsPerBatch;
+  private int minBatchesInAvailableMemory;
+  private RecordBatchMemoryManager batchMemoryManager;
+  private int initialPartitions;
+  private int numPartitions;
+  private int recordsPerPartitionBatchProbe;
+  private FragmentContext context;
+  HashJoinMemoryCalculator.PartitionStatSet partitionStatSet;
+
+  HashJoinSpillControlImpl(BufferAllocator allocator, int recordsPerBatch, int 
minBatchesInAvailableMemory, RecordBatchMemoryManager batchMemoryManager, 
FragmentContext context) {
+this.allocator = allocator;
+this.recordsPerBatch = recordsPerBatch;
+this.minBatchesInAvailableMemory = minBatchesInAvailableMemory;
+this.batchMemoryManager = batchMemoryManager;
+this.context = context;
+  }
+
+  @Override
+  public boolean shouldSpill() {
+// Expected new batch size like the current, plus the Hash Values vector 
(4 bytes per HV)
+long batchSize = ( 
batchMemoryManager.getRecordBatchSizer(RIGHT_INDEX).getRowAllocWidth() + 4 ) * 
recordsPerBatch;
+long reserveForOutgoing = batchMemoryManager.getOutputBatchSize();
+long memoryAvailableNow = allocator.getLimit() - 
allocator.getAllocatedMemory() - reserveForOutgoing;
+boolean needsSpill = minBatchesInAvailableMemory * batchSize > 
memoryAvailableNow;
+if ( needsSpill ) {
+  logger.debug("should spill now - batch size {}, mem avail {}, reserved 
for outgoing {}", batchSize, memoryAvailableNow, reserveForOutgoing);
+}
+return needsSpill;   // go spill if too little memory is available
+  }
+
+  @Override
+  public void initialize(boolean firstCycle,
+ boolean reserveHash,
+ RecordBatch buildSideBatch,
+ RecordBatch probeSideBatch,
+ Set joinColumns,
+ boolean probeEmpty,
+ long memoryAvailable,
+ int initialPartitions,
+ int recordsPerPartitionBatchBuild,
+ int recordsPerPartitionBatchProbe,
+ int maxBatchNumRecordsBuild,
+ int maxBatchNumRecordsProbe,
+ int outputBatchSize,
+ double loadFactor) {
+this.initialPartitions = initialPartitions;
+this.recordsPerPartitionBatchProbe = recordsPerPartitionBatchProbe;
+
+calculateMemoryUsage();
+  }
+
+  @Override
+  public void setPartitionStatSet(HashJoinMemoryCalculator.PartitionStatSet 
partitionStatSet) {
+this.partitionStatSet = 

[GitHub] ilooner commented on a change in pull request #1606: DRILL-6845: Semi-Hash-Join to skip incoming build duplicates, automatically stop skipping if too few

2019-01-15 Thread GitBox
ilooner commented on a change in pull request #1606: DRILL-6845: Semi-Hash-Join 
to skip incoming build duplicates, automatically stop skipping if too few
URL: https://github.com/apache/drill/pull/1606#discussion_r248160061
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinSpillControlImpl.java
 ##
 @@ -0,0 +1,238 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.physical.impl.join;
+
+  import org.apache.drill.exec.ExecConstants;
+  import org.apache.drill.exec.memory.BufferAllocator;
+  import org.apache.drill.exec.ops.FragmentContext;
+  import org.apache.drill.exec.record.RecordBatch;
+  import org.apache.drill.exec.record.RecordBatchMemoryManager;
+  import org.slf4j.Logger;
+  import org.slf4j.LoggerFactory;
+
+  import javax.annotation.Nullable;
+  import java.util.Set;
+
+  import static org.apache.drill.exec.record.JoinBatchMemoryManager.LEFT_INDEX;
+  import static 
org.apache.drill.exec.record.JoinBatchMemoryManager.RIGHT_INDEX;
+
+/**
+ * This class is currently used only for Semi-Hash-Join that avoids duplicates 
by the use of a hash table
+ * The method {@link 
HashJoinMemoryCalculator.HashJoinSpillControl#shouldSpill()} returns true if 
the memory available now to the allocator if not enough
+ * to hold (a multiple of, for safety) a new allocated batch
+ */
+public class HashJoinSpillControlImpl implements 
HashJoinMemoryCalculator.BuildSidePartitioning {
+  private static final Logger logger = 
LoggerFactory.getLogger(HashJoinSpillControlImpl.class);
+
+  private BufferAllocator allocator;
+  private int recordsPerBatch;
+  private int minBatchesInAvailableMemory;
+  private RecordBatchMemoryManager batchMemoryManager;
+  private int initialPartitions;
+  private int numPartitions;
+  private int recordsPerPartitionBatchProbe;
+  private FragmentContext context;
+  HashJoinMemoryCalculator.PartitionStatSet partitionStatSet;
+
+  HashJoinSpillControlImpl(BufferAllocator allocator, int recordsPerBatch, int 
minBatchesInAvailableMemory, RecordBatchMemoryManager batchMemoryManager, 
FragmentContext context) {
+this.allocator = allocator;
+this.recordsPerBatch = recordsPerBatch;
+this.minBatchesInAvailableMemory = minBatchesInAvailableMemory;
+this.batchMemoryManager = batchMemoryManager;
+this.context = context;
+  }
+
+  @Override
+  public boolean shouldSpill() {
+// Expected new batch size like the current, plus the Hash Values vector 
(4 bytes per HV)
+long batchSize = ( 
batchMemoryManager.getRecordBatchSizer(RIGHT_INDEX).getRowAllocWidth() + 4 ) * 
recordsPerBatch;
+long reserveForOutgoing = batchMemoryManager.getOutputBatchSize();
+long memoryAvailableNow = allocator.getLimit() - 
allocator.getAllocatedMemory() - reserveForOutgoing;
+boolean needsSpill = minBatchesInAvailableMemory * batchSize > 
memoryAvailableNow;
+if ( needsSpill ) {
+  logger.debug("should spill now - batch size {}, mem avail {}, reserved 
for outgoing {}", batchSize, memoryAvailableNow, reserveForOutgoing);
+}
+return needsSpill;   // go spill if too little memory is available
+  }
+
+  @Override
+  public void initialize(boolean firstCycle,
+ boolean reserveHash,
+ RecordBatch buildSideBatch,
+ RecordBatch probeSideBatch,
+ Set joinColumns,
+ boolean probeEmpty,
+ long memoryAvailable,
+ int initialPartitions,
+ int recordsPerPartitionBatchBuild,
+ int recordsPerPartitionBatchProbe,
+ int maxBatchNumRecordsBuild,
+ int maxBatchNumRecordsProbe,
+ int outputBatchSize,
+ double loadFactor) {
+this.initialPartitions = initialPartitions;
+this.recordsPerPartitionBatchProbe = recordsPerPartitionBatchProbe;
+
+calculateMemoryUsage();
+  }
+
+  @Override
+  public void setPartitionStatSet(HashJoinMemoryCalculator.PartitionStatSet 
partitionStatSet) {
+this.partitionStatSet = 

[GitHub] ilooner commented on a change in pull request #1606: DRILL-6845: Semi-Hash-Join to skip incoming build duplicates, automatically stop skipping if too few

2019-01-15 Thread GitBox
ilooner commented on a change in pull request #1606: DRILL-6845: Semi-Hash-Join 
to skip incoming build duplicates, automatically stop skipping if too few
URL: https://github.com/apache/drill/pull/1606#discussion_r248158415
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinSpillControlImpl.java
 ##
 @@ -18,39 +18,211 @@
 
 package org.apache.drill.exec.physical.impl.join;
 
+  import org.apache.drill.exec.ExecConstants;
   import org.apache.drill.exec.memory.BufferAllocator;
+  import org.apache.drill.exec.ops.FragmentContext;
+  import org.apache.drill.exec.record.RecordBatch;
   import org.apache.drill.exec.record.RecordBatchMemoryManager;
-  import org.apache.drill.exec.record.RecordBatchSizer;
-  import org.apache.drill.exec.record.VectorContainer;
+  import org.slf4j.Logger;
+  import org.slf4j.LoggerFactory;
+
+  import javax.annotation.Nullable;
+  import java.util.Set;
+
+  import static org.apache.drill.exec.record.JoinBatchMemoryManager.LEFT_INDEX;
+  import static 
org.apache.drill.exec.record.JoinBatchMemoryManager.RIGHT_INDEX;
 
 /**
  * This class is currently used only for Semi-Hash-Join that avoids duplicates 
by the use of a hash table
- * The method {@link 
HashJoinMemoryCalculator.HashJoinSpillControl#shouldSpill(VectorContainer)} 
returns true if the memory available now to the allocator if not enough
+ * The method {@link 
HashJoinMemoryCalculator.HashJoinSpillControl#shouldSpill()} returns true if 
the memory available now to the allocator if not enough
  * to hold (a multiple of, for safety) a new allocated batch
  */
-public class HashJoinSpillControlImpl implements 
HashJoinMemoryCalculator.HashJoinSpillControl {
+public class HashJoinSpillControlImpl implements 
HashJoinMemoryCalculator.BuildSidePartitioning {
+  private static final Logger logger = 
LoggerFactory.getLogger(HashJoinSpillControlImpl.class);
+
   private BufferAllocator allocator;
   private int recordsPerBatch;
   private int minBatchesInAvailableMemory;
   private RecordBatchMemoryManager batchMemoryManager;
+  private int initialPartitions;
+  private int numPartitions;
+  private int recordsPerPartitionBatchProbe;
+  private FragmentContext context;
 
-  HashJoinSpillControlImpl(BufferAllocator allocator, int recordsPerBatch, int 
minBatchesInAvailableMemory, RecordBatchMemoryManager batchMemoryManager) {
+  HashJoinSpillControlImpl(BufferAllocator allocator, int recordsPerBatch, int 
minBatchesInAvailableMemory, RecordBatchMemoryManager batchMemoryManager, 
FragmentContext context) {
 this.allocator = allocator;
 this.recordsPerBatch = recordsPerBatch;
 this.minBatchesInAvailableMemory = minBatchesInAvailableMemory;
 this.batchMemoryManager = batchMemoryManager;
+this.context = context;
   }
 
   @Override
-  public boolean shouldSpill(VectorContainer currentVectorContainer) {
-assert currentVectorContainer.hasRecordCount();
-assert currentVectorContainer.getRecordCount() == recordsPerBatch;
-// Expected new batch size like the current, plus the Hash Value vector (4 
bytes per HV)
-long batchSize = new 
RecordBatchSizer(currentVectorContainer).getActualSize() + 4 * recordsPerBatch;
+  public boolean shouldSpill() {
+// Expected new batch size like the current, plus the Hash Values vector 
(4 bytes per HV)
+long batchSize = ( 
batchMemoryManager.getRecordBatchSizer(RIGHT_INDEX).getRowAllocWidth() + 4 ) * 
recordsPerBatch;
 long reserveForOutgoing = batchMemoryManager.getOutputBatchSize();
 long memoryAvailableNow = allocator.getLimit() - 
allocator.getAllocatedMemory() - reserveForOutgoing;
 
 Review comment:
   - By *partition batch* I was referring to the small batches that are used to 
store the records that are appended to each partition. They are stored in the 
HashPartition.tmpBatchesList list. We need to reserve space for one partition 
batch for each partition. Otherwise we can run into a scenario where we OOM 
when appending a row to a partition and allocate a new partition batch to store 
it. I already have code to account for these in my memory calculator. See 
**HashJoinMemoryCalculatorImpl lines 361 and 373**.
   
   - Also we should not be using MIN_BATCHES_IN_MEMORY. We can account for the 
probe batches exactly when handling a spilled partition. I have already 
implemented the logic for this in **HashJoinMemoryCalculatorImpl lines 363 - 
365, 378 - 380, and 383 - 389**. It should just be a matter of copying the code 
and plugging it in. 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ilooner commented on a change in pull request #1606: DRILL-6845: Semi-Hash-Join to skip incoming build duplicates, automatically stop skipping if too few

2019-01-15 Thread GitBox
ilooner commented on a change in pull request #1606: DRILL-6845: Semi-Hash-Join 
to skip incoming build duplicates, automatically stop skipping if too few
URL: https://github.com/apache/drill/pull/1606#discussion_r248161996
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinSpillControlImpl.java
 ##
 @@ -18,39 +18,211 @@
 
 package org.apache.drill.exec.physical.impl.join;
 
+  import org.apache.drill.exec.ExecConstants;
   import org.apache.drill.exec.memory.BufferAllocator;
+  import org.apache.drill.exec.ops.FragmentContext;
+  import org.apache.drill.exec.record.RecordBatch;
   import org.apache.drill.exec.record.RecordBatchMemoryManager;
-  import org.apache.drill.exec.record.RecordBatchSizer;
-  import org.apache.drill.exec.record.VectorContainer;
+  import org.slf4j.Logger;
+  import org.slf4j.LoggerFactory;
+
+  import javax.annotation.Nullable;
+  import java.util.Set;
+
+  import static org.apache.drill.exec.record.JoinBatchMemoryManager.LEFT_INDEX;
+  import static 
org.apache.drill.exec.record.JoinBatchMemoryManager.RIGHT_INDEX;
 
 /**
  * This class is currently used only for Semi-Hash-Join that avoids duplicates 
by the use of a hash table
- * The method {@link 
HashJoinMemoryCalculator.HashJoinSpillControl#shouldSpill(VectorContainer)} 
returns true if the memory available now to the allocator if not enough
+ * The method {@link 
HashJoinMemoryCalculator.HashJoinSpillControl#shouldSpill()} returns true if 
the memory available now to the allocator if not enough
  * to hold (a multiple of, for safety) a new allocated batch
  */
-public class HashJoinSpillControlImpl implements 
HashJoinMemoryCalculator.HashJoinSpillControl {
+public class HashJoinSpillControlImpl implements 
HashJoinMemoryCalculator.BuildSidePartitioning {
+  private static final Logger logger = 
LoggerFactory.getLogger(HashJoinSpillControlImpl.class);
+
   private BufferAllocator allocator;
   private int recordsPerBatch;
   private int minBatchesInAvailableMemory;
   private RecordBatchMemoryManager batchMemoryManager;
+  private int initialPartitions;
+  private int numPartitions;
+  private int recordsPerPartitionBatchProbe;
+  private FragmentContext context;
 
-  HashJoinSpillControlImpl(BufferAllocator allocator, int recordsPerBatch, int 
minBatchesInAvailableMemory, RecordBatchMemoryManager batchMemoryManager) {
+  HashJoinSpillControlImpl(BufferAllocator allocator, int recordsPerBatch, int 
minBatchesInAvailableMemory, RecordBatchMemoryManager batchMemoryManager, 
FragmentContext context) {
 this.allocator = allocator;
 this.recordsPerBatch = recordsPerBatch;
 this.minBatchesInAvailableMemory = minBatchesInAvailableMemory;
 this.batchMemoryManager = batchMemoryManager;
+this.context = context;
   }
 
   @Override
-  public boolean shouldSpill(VectorContainer currentVectorContainer) {
-assert currentVectorContainer.hasRecordCount();
-assert currentVectorContainer.getRecordCount() == recordsPerBatch;
-// Expected new batch size like the current, plus the Hash Value vector (4 
bytes per HV)
-long batchSize = new 
RecordBatchSizer(currentVectorContainer).getActualSize() + 4 * recordsPerBatch;
+  public boolean shouldSpill() {
+// Expected new batch size like the current, plus the Hash Values vector 
(4 bytes per HV)
+long batchSize = ( 
batchMemoryManager.getRecordBatchSizer(RIGHT_INDEX).getRowAllocWidth() + 4 ) * 
recordsPerBatch;
 long reserveForOutgoing = batchMemoryManager.getOutputBatchSize();
 long memoryAvailableNow = allocator.getLimit() - 
allocator.getAllocatedMemory() - reserveForOutgoing;
 boolean needsSpill = minBatchesInAvailableMemory * batchSize > 
memoryAvailableNow;
+if ( needsSpill ) {
+  logger.debug("should spill now - batch size {}, mem avail {}, reserved 
for outgoing {}", batchSize, memoryAvailableNow, reserveForOutgoing);
+}
 return needsSpill;   // go spill if too little memory is available
   }
 
+  @Override
+  public void initialize(boolean firstCycle,
+ boolean reserveHash,
+ RecordBatch buildSideBatch,
+ RecordBatch probeSideBatch,
+ Set joinColumns,
+ boolean probeEmpty,
+ long memoryAvailable,
+ int initialPartitions,
+ int recordsPerPartitionBatchBuild,
+ int recordsPerPartitionBatchProbe,
+ int maxBatchNumRecordsBuild,
+ int maxBatchNumRecordsProbe,
+ int outputBatchSize,
+ double loadFactor) {
+this.initialPartitions = initialPartitions;
+this.recordsPerPartitionBatchProbe = recordsPerPartitionBatchProbe;
+
+calculateMemoryUsage();
+  }
+
+  @Override
+  public void setPartitionStatSet(HashJoinMemoryCalculator.PartitionStatSet 

[GitHub] ilooner commented on a change in pull request #1606: DRILL-6845: Semi-Hash-Join to skip incoming build duplicates, automatically stop skipping if too few

2019-01-15 Thread GitBox
ilooner commented on a change in pull request #1606: DRILL-6845: Semi-Hash-Join 
to skip incoming build duplicates, automatically stop skipping if too few
URL: https://github.com/apache/drill/pull/1606#discussion_r248153681
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinSpillControlImpl.java
 ##
 @@ -0,0 +1,238 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.physical.impl.join;
+
+  import org.apache.drill.exec.ExecConstants;
+  import org.apache.drill.exec.memory.BufferAllocator;
+  import org.apache.drill.exec.ops.FragmentContext;
+  import org.apache.drill.exec.record.RecordBatch;
+  import org.apache.drill.exec.record.RecordBatchMemoryManager;
+  import org.slf4j.Logger;
+  import org.slf4j.LoggerFactory;
+
+  import javax.annotation.Nullable;
+  import java.util.Set;
+
+  import static org.apache.drill.exec.record.JoinBatchMemoryManager.LEFT_INDEX;
+  import static 
org.apache.drill.exec.record.JoinBatchMemoryManager.RIGHT_INDEX;
+
+/**
+ * This class is currently used only for Semi-Hash-Join that avoids duplicates 
by the use of a hash table
+ * The method {@link 
HashJoinMemoryCalculator.HashJoinSpillControl#shouldSpill()} returns true if 
the memory available now to the allocator if not enough
+ * to hold (a multiple of, for safety) a new allocated batch
+ */
+public class HashJoinSpillControlImpl implements 
HashJoinMemoryCalculator.BuildSidePartitioning {
+  private static final Logger logger = 
LoggerFactory.getLogger(HashJoinSpillControlImpl.class);
+
+  private BufferAllocator allocator;
+  private int recordsPerBatch;
+  private int minBatchesInAvailableMemory;
+  private RecordBatchMemoryManager batchMemoryManager;
+  private int initialPartitions;
+  private int numPartitions;
+  private int recordsPerPartitionBatchProbe;
+  private FragmentContext context;
+  HashJoinMemoryCalculator.PartitionStatSet partitionStatSet;
+
+  HashJoinSpillControlImpl(BufferAllocator allocator, int recordsPerBatch, int 
minBatchesInAvailableMemory, RecordBatchMemoryManager batchMemoryManager, 
FragmentContext context) {
+this.allocator = allocator;
+this.recordsPerBatch = recordsPerBatch;
+this.minBatchesInAvailableMemory = minBatchesInAvailableMemory;
+this.batchMemoryManager = batchMemoryManager;
+this.context = context;
+  }
+
+  @Override
+  public boolean shouldSpill() {
+// Expected new batch size like the current, plus the Hash Values vector 
(4 bytes per HV)
+long batchSize = ( 
batchMemoryManager.getRecordBatchSizer(RIGHT_INDEX).getRowAllocWidth() + 4 ) * 
recordsPerBatch;
+long reserveForOutgoing = batchMemoryManager.getOutputBatchSize();
+long memoryAvailableNow = allocator.getLimit() - 
allocator.getAllocatedMemory() - reserveForOutgoing;
+boolean needsSpill = minBatchesInAvailableMemory * batchSize > 
memoryAvailableNow;
+if ( needsSpill ) {
+  logger.debug("should spill now - batch size {}, mem avail {}, reserved 
for outgoing {}", batchSize, memoryAvailableNow, reserveForOutgoing);
+}
+return needsSpill;   // go spill if too little memory is available
+  }
+
+  @Override
+  public void initialize(boolean firstCycle,
+ boolean reserveHash,
+ RecordBatch buildSideBatch,
+ RecordBatch probeSideBatch,
+ Set joinColumns,
+ boolean probeEmpty,
+ long memoryAvailable,
+ int initialPartitions,
+ int recordsPerPartitionBatchBuild,
+ int recordsPerPartitionBatchProbe,
+ int maxBatchNumRecordsBuild,
+ int maxBatchNumRecordsProbe,
+ int outputBatchSize,
+ double loadFactor) {
+this.initialPartitions = initialPartitions;
+this.recordsPerPartitionBatchProbe = recordsPerPartitionBatchProbe;
+
+calculateMemoryUsage();
+  }
+
+  @Override
+  public void setPartitionStatSet(HashJoinMemoryCalculator.PartitionStatSet 
partitionStatSet) {
+this.partitionStatSet = 

[GitHub] ilooner commented on a change in pull request #1606: DRILL-6845: Semi-Hash-Join to skip incoming build duplicates, automatically stop skipping if too few

2019-01-15 Thread GitBox
ilooner commented on a change in pull request #1606: DRILL-6845: Semi-Hash-Join 
to skip incoming build duplicates, automatically stop skipping if too few
URL: https://github.com/apache/drill/pull/1606#discussion_r248151688
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java
 ##
 @@ -959,14 +964,40 @@ public IterOutcome executeBuildPhase() throws 
SchemaChangeException {
 numPartitions,
 RECORDS_PER_BATCH,
 RECORDS_PER_BATCH,
-maxBatchSize,
-maxBatchSize,
+maxBatchRowCount,
+maxBatchRowCount,
 batchMemoryManager.getOutputBatchSize(),
 HashTable.DEFAULT_LOAD_FACTOR);
 
   if (spilledState.isFirstCycle() && doMemoryCalculation) {
 // Do auto tuning
-buildCalc = partitionNumTuning(maxBatchSize, buildCalc);
+buildCalc = partitionNumTuning(maxBatchRowCount, buildCalc);
+  }
+  if ( semiSkipDuplicates ) {
+// in case of a Semi Join skippinging duplicates, use a "spill 
control" calc
+// (may revert back to the buildCalc if the code decides to stop 
skipping)
+currentCalc = new HashJoinSpillControlImpl(allocator, 
RECORDS_PER_BATCH,
+  (int) 
context.getOptions().getOption(ExecConstants.HASHJOIN_MIN_BATCHES_IN_AVAILABLE_MEMORY_VALIDATOR),
+  batchMemoryManager, context);
+
+// calculates the max number of partitions possible
+if ( spilledState.isFirstCycle() && doMemoryCalculation ) {
+  currentCalc.initialize(spilledState.isFirstCycle(), true, // TODO 
Fix after growing hash values bug fixed
+  buildBatch,
+  probeBatch,
+  buildJoinColumns,
+  probeSideIsEmpty.booleanValue(),
+  allocator.getLimit(),
+  numPartitions,
+  RECORDS_PER_BATCH,
+  RECORDS_PER_BATCH,
+  maxBatchRowCount,
+  maxBatchRowCount,
+  batchMemoryManager.getOutputBatchSize(),
+  HashTable.DEFAULT_LOAD_FACTOR);
+
+  numPartitions = currentCalc.getNumPartitions();
 
 Review comment:
   Make this the `numPartitions = min(numPartitions, 
currentCalc.getNumPartitions())`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (DRILL-6979) Add autofocus attribute to username on login page, and to query textbox on Query tab.

2019-01-15 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-6979:
-

 Summary: Add autofocus attribute to username on login page, and to 
query textbox on Query tab.
 Key: DRILL-6979
 URL: https://issues.apache.org/jira/browse/DRILL-6979
 Project: Apache Drill
  Issue Type: Improvement
  Components: Web Server
Affects Versions: 1.16.0
Reporter: Khurram Faraaz
Assignee: Khurram Faraaz


Add autofocus attribute to username on login page, and to query textbox on 
Query tab.
The two text boxes that need the change are in these files

./exec/java-exec/src/main/resources/rest/query/query.ftl
./exec/java-exec/src/main/resources/rest/login.ftl



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] ilooner commented on a change in pull request #1606: DRILL-6845: Semi-Hash-Join to skip incoming build duplicates, automatically stop skipping if too few

2019-01-15 Thread GitBox
ilooner commented on a change in pull request #1606: DRILL-6845: Semi-Hash-Join 
to skip incoming build duplicates, automatically stop skipping if too few
URL: https://github.com/apache/drill/pull/1606#discussion_r248150656
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinSpillControlImpl.java
 ##
 @@ -18,39 +18,211 @@
 
 package org.apache.drill.exec.physical.impl.join;
 
+  import org.apache.drill.exec.ExecConstants;
   import org.apache.drill.exec.memory.BufferAllocator;
+  import org.apache.drill.exec.ops.FragmentContext;
+  import org.apache.drill.exec.record.RecordBatch;
   import org.apache.drill.exec.record.RecordBatchMemoryManager;
-  import org.apache.drill.exec.record.RecordBatchSizer;
-  import org.apache.drill.exec.record.VectorContainer;
+  import org.slf4j.Logger;
+  import org.slf4j.LoggerFactory;
+
+  import javax.annotation.Nullable;
+  import java.util.Set;
+
+  import static org.apache.drill.exec.record.JoinBatchMemoryManager.LEFT_INDEX;
+  import static 
org.apache.drill.exec.record.JoinBatchMemoryManager.RIGHT_INDEX;
 
 /**
  * This class is currently used only for Semi-Hash-Join that avoids duplicates 
by the use of a hash table
- * The method {@link 
HashJoinMemoryCalculator.HashJoinSpillControl#shouldSpill(VectorContainer)} 
returns true if the memory available now to the allocator if not enough
+ * The method {@link 
HashJoinMemoryCalculator.HashJoinSpillControl#shouldSpill()} returns true if 
the memory available now to the allocator if not enough
  * to hold (a multiple of, for safety) a new allocated batch
  */
-public class HashJoinSpillControlImpl implements 
HashJoinMemoryCalculator.HashJoinSpillControl {
+public class HashJoinSpillControlImpl implements 
HashJoinMemoryCalculator.BuildSidePartitioning {
+  private static final Logger logger = 
LoggerFactory.getLogger(HashJoinSpillControlImpl.class);
+
   private BufferAllocator allocator;
   private int recordsPerBatch;
   private int minBatchesInAvailableMemory;
   private RecordBatchMemoryManager batchMemoryManager;
+  private int initialPartitions;
+  private int numPartitions;
+  private int recordsPerPartitionBatchProbe;
+  private FragmentContext context;
 
-  HashJoinSpillControlImpl(BufferAllocator allocator, int recordsPerBatch, int 
minBatchesInAvailableMemory, RecordBatchMemoryManager batchMemoryManager) {
+  HashJoinSpillControlImpl(BufferAllocator allocator, int recordsPerBatch, int 
minBatchesInAvailableMemory, RecordBatchMemoryManager batchMemoryManager, 
FragmentContext context) {
 this.allocator = allocator;
 this.recordsPerBatch = recordsPerBatch;
 this.minBatchesInAvailableMemory = minBatchesInAvailableMemory;
 this.batchMemoryManager = batchMemoryManager;
+this.context = context;
   }
 
   @Override
-  public boolean shouldSpill(VectorContainer currentVectorContainer) {
-assert currentVectorContainer.hasRecordCount();
-assert currentVectorContainer.getRecordCount() == recordsPerBatch;
-// Expected new batch size like the current, plus the Hash Value vector (4 
bytes per HV)
-long batchSize = new 
RecordBatchSizer(currentVectorContainer).getActualSize() + 4 * recordsPerBatch;
+  public boolean shouldSpill() {
+// Expected new batch size like the current, plus the Hash Values vector 
(4 bytes per HV)
+long batchSize = ( 
batchMemoryManager.getRecordBatchSizer(RIGHT_INDEX).getRowAllocWidth() + 4 ) * 
recordsPerBatch;
 long reserveForOutgoing = batchMemoryManager.getOutputBatchSize();
 long memoryAvailableNow = allocator.getLimit() - 
allocator.getAllocatedMemory() - reserveForOutgoing;
 boolean needsSpill = minBatchesInAvailableMemory * batchSize > 
memoryAvailableNow;
+if ( needsSpill ) {
+  logger.debug("should spill now - batch size {}, mem avail {}, reserved 
for outgoing {}", batchSize, memoryAvailableNow, reserveForOutgoing);
+}
 return needsSpill;   // go spill if too little memory is available
   }
 
+  @Override
+  public void initialize(boolean firstCycle,
+ boolean reserveHash,
+ RecordBatch buildSideBatch,
+ RecordBatch probeSideBatch,
+ Set joinColumns,
+ boolean probeEmpty,
+ long memoryAvailable,
+ int initialPartitions,
+ int recordsPerPartitionBatchBuild,
+ int recordsPerPartitionBatchProbe,
+ int maxBatchNumRecordsBuild,
+ int maxBatchNumRecordsProbe,
+ int outputBatchSize,
+ double loadFactor) {
+this.initialPartitions = initialPartitions;
+this.recordsPerPartitionBatchProbe = recordsPerPartitionBatchProbe;
+
+calculateMemoryUsage();
+  }
+
+  @Override
+  public void setPartitionStatSet(HashJoinMemoryCalculator.PartitionStatSet 

[GitHub] ilooner commented on a change in pull request #1606: DRILL-6845: Semi-Hash-Join to skip incoming build duplicates, automatically stop skipping if too few

2019-01-15 Thread GitBox
ilooner commented on a change in pull request #1606: DRILL-6845: Semi-Hash-Join 
to skip incoming build duplicates, automatically stop skipping if too few
URL: https://github.com/apache/drill/pull/1606#discussion_r248150656
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinSpillControlImpl.java
 ##
 @@ -18,39 +18,211 @@
 
 package org.apache.drill.exec.physical.impl.join;
 
+  import org.apache.drill.exec.ExecConstants;
   import org.apache.drill.exec.memory.BufferAllocator;
+  import org.apache.drill.exec.ops.FragmentContext;
+  import org.apache.drill.exec.record.RecordBatch;
   import org.apache.drill.exec.record.RecordBatchMemoryManager;
-  import org.apache.drill.exec.record.RecordBatchSizer;
-  import org.apache.drill.exec.record.VectorContainer;
+  import org.slf4j.Logger;
+  import org.slf4j.LoggerFactory;
+
+  import javax.annotation.Nullable;
+  import java.util.Set;
+
+  import static org.apache.drill.exec.record.JoinBatchMemoryManager.LEFT_INDEX;
+  import static 
org.apache.drill.exec.record.JoinBatchMemoryManager.RIGHT_INDEX;
 
 /**
  * This class is currently used only for Semi-Hash-Join that avoids duplicates 
by the use of a hash table
- * The method {@link 
HashJoinMemoryCalculator.HashJoinSpillControl#shouldSpill(VectorContainer)} 
returns true if the memory available now to the allocator if not enough
+ * The method {@link 
HashJoinMemoryCalculator.HashJoinSpillControl#shouldSpill()} returns true if 
the memory available now to the allocator if not enough
  * to hold (a multiple of, for safety) a new allocated batch
  */
-public class HashJoinSpillControlImpl implements 
HashJoinMemoryCalculator.HashJoinSpillControl {
+public class HashJoinSpillControlImpl implements 
HashJoinMemoryCalculator.BuildSidePartitioning {
+  private static final Logger logger = 
LoggerFactory.getLogger(HashJoinSpillControlImpl.class);
+
   private BufferAllocator allocator;
   private int recordsPerBatch;
   private int minBatchesInAvailableMemory;
   private RecordBatchMemoryManager batchMemoryManager;
+  private int initialPartitions;
+  private int numPartitions;
+  private int recordsPerPartitionBatchProbe;
+  private FragmentContext context;
 
-  HashJoinSpillControlImpl(BufferAllocator allocator, int recordsPerBatch, int 
minBatchesInAvailableMemory, RecordBatchMemoryManager batchMemoryManager) {
+  HashJoinSpillControlImpl(BufferAllocator allocator, int recordsPerBatch, int 
minBatchesInAvailableMemory, RecordBatchMemoryManager batchMemoryManager, 
FragmentContext context) {
 this.allocator = allocator;
 this.recordsPerBatch = recordsPerBatch;
 this.minBatchesInAvailableMemory = minBatchesInAvailableMemory;
 this.batchMemoryManager = batchMemoryManager;
+this.context = context;
   }
 
   @Override
-  public boolean shouldSpill(VectorContainer currentVectorContainer) {
-assert currentVectorContainer.hasRecordCount();
-assert currentVectorContainer.getRecordCount() == recordsPerBatch;
-// Expected new batch size like the current, plus the Hash Value vector (4 
bytes per HV)
-long batchSize = new 
RecordBatchSizer(currentVectorContainer).getActualSize() + 4 * recordsPerBatch;
+  public boolean shouldSpill() {
+// Expected new batch size like the current, plus the Hash Values vector 
(4 bytes per HV)
+long batchSize = ( 
batchMemoryManager.getRecordBatchSizer(RIGHT_INDEX).getRowAllocWidth() + 4 ) * 
recordsPerBatch;
 long reserveForOutgoing = batchMemoryManager.getOutputBatchSize();
 long memoryAvailableNow = allocator.getLimit() - 
allocator.getAllocatedMemory() - reserveForOutgoing;
 boolean needsSpill = minBatchesInAvailableMemory * batchSize > 
memoryAvailableNow;
+if ( needsSpill ) {
+  logger.debug("should spill now - batch size {}, mem avail {}, reserved 
for outgoing {}", batchSize, memoryAvailableNow, reserveForOutgoing);
+}
 return needsSpill;   // go spill if too little memory is available
   }
 
+  @Override
+  public void initialize(boolean firstCycle,
+ boolean reserveHash,
+ RecordBatch buildSideBatch,
+ RecordBatch probeSideBatch,
+ Set joinColumns,
+ boolean probeEmpty,
+ long memoryAvailable,
+ int initialPartitions,
+ int recordsPerPartitionBatchBuild,
+ int recordsPerPartitionBatchProbe,
+ int maxBatchNumRecordsBuild,
+ int maxBatchNumRecordsProbe,
+ int outputBatchSize,
+ double loadFactor) {
+this.initialPartitions = initialPartitions;
+this.recordsPerPartitionBatchProbe = recordsPerPartitionBatchProbe;
+
+calculateMemoryUsage();
+  }
+
+  @Override
+  public void setPartitionStatSet(HashJoinMemoryCalculator.PartitionStatSet 

[GitHub] ilooner commented on a change in pull request #1606: DRILL-6845: Semi-Hash-Join to skip incoming build duplicates, automatically stop skipping if too few

2019-01-15 Thread GitBox
ilooner commented on a change in pull request #1606: DRILL-6845: Semi-Hash-Join 
to skip incoming build duplicates, automatically stop skipping if too few
URL: https://github.com/apache/drill/pull/1606#discussion_r248149480
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinSpillControlImpl.java
 ##
 @@ -18,39 +18,211 @@
 
 package org.apache.drill.exec.physical.impl.join;
 
+  import org.apache.drill.exec.ExecConstants;
   import org.apache.drill.exec.memory.BufferAllocator;
+  import org.apache.drill.exec.ops.FragmentContext;
+  import org.apache.drill.exec.record.RecordBatch;
   import org.apache.drill.exec.record.RecordBatchMemoryManager;
-  import org.apache.drill.exec.record.RecordBatchSizer;
-  import org.apache.drill.exec.record.VectorContainer;
+  import org.slf4j.Logger;
+  import org.slf4j.LoggerFactory;
+
+  import javax.annotation.Nullable;
+  import java.util.Set;
+
+  import static org.apache.drill.exec.record.JoinBatchMemoryManager.LEFT_INDEX;
+  import static 
org.apache.drill.exec.record.JoinBatchMemoryManager.RIGHT_INDEX;
 
 /**
  * This class is currently used only for Semi-Hash-Join that avoids duplicates 
by the use of a hash table
- * The method {@link 
HashJoinMemoryCalculator.HashJoinSpillControl#shouldSpill(VectorContainer)} 
returns true if the memory available now to the allocator if not enough
+ * The method {@link 
HashJoinMemoryCalculator.HashJoinSpillControl#shouldSpill()} returns true if 
the memory available now to the allocator if not enough
  * to hold (a multiple of, for safety) a new allocated batch
  */
-public class HashJoinSpillControlImpl implements 
HashJoinMemoryCalculator.HashJoinSpillControl {
+public class HashJoinSpillControlImpl implements 
HashJoinMemoryCalculator.BuildSidePartitioning {
+  private static final Logger logger = 
LoggerFactory.getLogger(HashJoinSpillControlImpl.class);
+
   private BufferAllocator allocator;
   private int recordsPerBatch;
   private int minBatchesInAvailableMemory;
   private RecordBatchMemoryManager batchMemoryManager;
+  private int initialPartitions;
+  private int numPartitions;
+  private int recordsPerPartitionBatchProbe;
+  private FragmentContext context;
 
-  HashJoinSpillControlImpl(BufferAllocator allocator, int recordsPerBatch, int 
minBatchesInAvailableMemory, RecordBatchMemoryManager batchMemoryManager) {
+  HashJoinSpillControlImpl(BufferAllocator allocator, int recordsPerBatch, int 
minBatchesInAvailableMemory, RecordBatchMemoryManager batchMemoryManager, 
FragmentContext context) {
 this.allocator = allocator;
 this.recordsPerBatch = recordsPerBatch;
 this.minBatchesInAvailableMemory = minBatchesInAvailableMemory;
 this.batchMemoryManager = batchMemoryManager;
+this.context = context;
   }
 
   @Override
-  public boolean shouldSpill(VectorContainer currentVectorContainer) {
-assert currentVectorContainer.hasRecordCount();
-assert currentVectorContainer.getRecordCount() == recordsPerBatch;
-// Expected new batch size like the current, plus the Hash Value vector (4 
bytes per HV)
-long batchSize = new 
RecordBatchSizer(currentVectorContainer).getActualSize() + 4 * recordsPerBatch;
+  public boolean shouldSpill() {
+// Expected new batch size like the current, plus the Hash Values vector 
(4 bytes per HV)
+long batchSize = ( 
batchMemoryManager.getRecordBatchSizer(RIGHT_INDEX).getRowAllocWidth() + 4 ) * 
recordsPerBatch;
 long reserveForOutgoing = batchMemoryManager.getOutputBatchSize();
 long memoryAvailableNow = allocator.getLimit() - 
allocator.getAllocatedMemory() - reserveForOutgoing;
 boolean needsSpill = minBatchesInAvailableMemory * batchSize > 
memoryAvailableNow;
+if ( needsSpill ) {
+  logger.debug("should spill now - batch size {}, mem avail {}, reserved 
for outgoing {}", batchSize, memoryAvailableNow, reserveForOutgoing);
+}
 return needsSpill;   // go spill if too little memory is available
   }
 
+  @Override
+  public void initialize(boolean firstCycle,
+ boolean reserveHash,
+ RecordBatch buildSideBatch,
+ RecordBatch probeSideBatch,
+ Set joinColumns,
+ boolean probeEmpty,
+ long memoryAvailable,
+ int initialPartitions,
+ int recordsPerPartitionBatchBuild,
+ int recordsPerPartitionBatchProbe,
+ int maxBatchNumRecordsBuild,
+ int maxBatchNumRecordsProbe,
+ int outputBatchSize,
+ double loadFactor) {
+this.initialPartitions = initialPartitions;
+this.recordsPerPartitionBatchProbe = recordsPerPartitionBatchProbe;
+
+calculateMemoryUsage();
+  }
+
+  @Override
+  public void setPartitionStatSet(HashJoinMemoryCalculator.PartitionStatSet 

[GitHub] ilooner commented on a change in pull request #1606: DRILL-6845: Semi-Hash-Join to skip incoming build duplicates, automatically stop skipping if too few

2019-01-15 Thread GitBox
ilooner commented on a change in pull request #1606: DRILL-6845: Semi-Hash-Join 
to skip incoming build duplicates, automatically stop skipping if too few
URL: https://github.com/apache/drill/pull/1606#discussion_r248149275
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java
 ##
 @@ -973,9 +973,32 @@ public IterOutcome executeBuildPhase() throws 
SchemaChangeException {
 // Do auto tuning
 buildCalc = partitionNumTuning(maxBatchRowCount, buildCalc);
   }
-  // to be used in case of a Semi Join skippinging duplicates
-  spillControlCalc = new HashJoinSpillControlImpl(allocator, 
RECORDS_PER_BATCH,
-(int) 
context.getOptions().getOption(ExecConstants.HASHJOIN_MIN_BATCHES_IN_AVAILABLE_MEMORY_VALIDATOR),
 batchMemoryManager);
+  if ( semiSkipDuplicates ) {
+// in case of a Semi Join skippinging duplicates, use a "spill 
control" calc
+// (may revert back to the buildCalc if the code decides to stop 
skipping)
+currentCalc = new HashJoinSpillControlImpl(allocator, 
RECORDS_PER_BATCH,
+  (int) 
context.getOptions().getOption(ExecConstants.HASHJOIN_MIN_BATCHES_IN_AVAILABLE_MEMORY_VALIDATOR),
 
 Review comment:
   I understand we want to have a knob to allow for some slack. But there is 
already the safety factor parameter that does this. Why give the user yet 
another tuning parameter when one already exists? It makes things more 
difficult for the user.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Ben-Zvi opened a new pull request #1606: Drill 6845: Semi-Hash-Join to skip incoming build duplicates, automatically stop skipping if too few

2019-01-15 Thread GitBox
Ben-Zvi opened a new pull request #1606: Drill 6845: Semi-Hash-Join to skip 
incoming build duplicates, automatically stop skipping if too few
URL: https://github.com/apache/drill/pull/1606
 
 
   The first two commits here were extracted from the original PR #1522 
(DRILL-6735), where the Semi-Hash-Join was implemented in a straightforward 
way: Read data like a regular hash join (e.g. into partitions, then later build 
hash-tables), and only during probe time perform at most a single probe match. 
  The issue with the above implementation is the case of excessive incoming 
build-side duplicates (more common with synthetic data in benchmarks). In such 
a case, reading *all* the data first can blow up the hash join memory (e.g., 
cause spills) and regress performance.
   
  This PR addresses the problem by creating the hash-tables first, and 
using them to detect build duplicates early (before copying from incoming into 
partitions), so those duplicates can be simply ignored/skipped (see the new 
method `insertKeyIntoHashTable()`).  After all the build side is read (if no 
spill), there is no need to build the hash tables as they already exist - see 
the new method `buildContainers()` .  
   All this logic is in the **first** commit. The issue with this logic is 
that it adds overhead (e.g., hash table doubling), which is a waste when there 
are very little duplicates. So this issue is addressed by the second commit. 
(Also note the new option `semi_skip_duplicates` that can be used to disable 
this whole feature).
   
  The **second** commit performs some "runtime statistics" to decide if 
there are too few duplicates. In such a case, it drops those hash tables and 
falls back to the simple semi-join work (a la PR #1522). This decision uses a 
"threshold", which is half the size of all the hash tables (so they won't 
double), and incoming duplicates are counted. After so many incoming rows are 
processed, the percentage of duplicates is checked - if under %20 (hard coded), 
then stop skipping, else continue using the hash tables to eliminate the 
duplicates.
   
 The **third** commit extends the memory manager to handle this special 
"duplicate skipping" mode. With a new class `HashJoinSpillControlImpl` and 
interface `HashJoinMemoryCalculator.HashJoinSpillControl`. The technique used 
for `shouldSpill()` is simply ensuring that the available memory is large 
enough for at least 3 (see below) more batches.  That required a change to all 
the `shouldSpill()` methods - add the `currentVectorContainer` parameter.
   Most of the code changes in HashPartition were a rename (batch -> 
vectorContainer) and in HashJoinBatch (added "semi" to some variable names).
   As for "running out of memory" while inserting into the hash table 
(either allocating a new keys batch, or resizing the hash table) -- this is 
handled by the hash table throwing `RetryAfterSpillException`, which is caught 
in the new `insertKeyIntoHashTable()` which leads to a spill, and a reset of 
the hash table anyway, and return false (it's a new key - it would be inserted 
into the new empty hash-table).  So this case is much simpler than Hash-Aggr.
   
  The **fourth** commit adds an option `min_batches_in_available_memory` 
instead of the above hard coded "3". Also added a method `IntegerValidator` 
that can specify the min/max values. 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Ben-Zvi commented on issue #1606: Drill 6845: Semi-Hash-Join to skip incoming build duplicates, automatically stop skipping if too few

2019-01-15 Thread GitBox
Ben-Zvi commented on issue #1606: Drill 6845: Semi-Hash-Join to skip incoming 
build duplicates, automatically stop skipping if too few
URL: https://github.com/apache/drill/pull/1606#issuecomment-454634848
 
 
   Seems that this PR was closed by mistake; re-opening now.
   @ilooner - do you have more comments or suggestions ?  we are trying to 
finish and commit this work soon. The spill control may not be perfect, but in 
most use cases this "duplicate skipping" work will not depend on such fine 
control.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] sohami commented on a change in pull request #1611: DRILL-6971: Display query state in query result page

2019-01-15 Thread GitBox
sohami commented on a change in pull request #1611: DRILL-6971: Display query 
state in query result page
URL: https://github.com/apache/drill/pull/1611#discussion_r248102321
 
 

 ##
 File path: exec/java-exec/src/main/resources/rest/query/result.ftl
 ##
 @@ -33,11 +33,21 @@
   
 
   
-  Query Profile: ${model.getQueryId()} 
+  Query Profile: ${model.getQueryId()} <#switch 
model.getQueryState()>
+<#case "COMPLETED">
+  
+  <#break>
+<#case "CANCELED">
+  
+  <#break>
 
 Review comment:
   LGTM


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] sohami commented on a change in pull request #1611: DRILL-6971: Display query state in query result page

2019-01-15 Thread GitBox
sohami commented on a change in pull request #1611: DRILL-6971: Display query 
state in query result page
URL: https://github.com/apache/drill/pull/1611#discussion_r248102188
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/rpc/AbstractDisposableUserClientConnection.java
 ##
 @@ -73,6 +75,7 @@ public void sendResult(RpcOutcomeListener listener, 
QueryResult result) {
 
 // Release the wait latch if the query is terminated.
 final QueryState state = result.getQueryState();
+queryState = state.toString();
 
 Review comment:
   Not required as type of variable is reflecting that it's a string.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] sohami commented on a change in pull request #1611: DRILL-6971: Display query state in query result page

2019-01-15 Thread GitBox
sohami commented on a change in pull request #1611: DRILL-6971: Display query 
state in query result page
URL: https://github.com/apache/drill/pull/1611#discussion_r248102275
 
 

 ##
 File path: exec/java-exec/src/main/resources/rest/query/result.ftl
 ##
 @@ -34,6 +34,9 @@
 
   
   Query Profile: ${model.getQueryId()} 
+ 
+ 
+ Query 
State: ${model.getQueryState()}
 
 Review comment:
   Thanks for sharing the change for this. Added commit in this PR


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Resolved] (DRILL-4721) Doc "dateDiff" in Drill

2019-01-15 Thread Bridget Bevens (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens resolved DRILL-4721.
---
   Resolution: Fixed
 Reviewer: Vitalii Diravka
Fix Version/s: 1.15.0

Documented the date_diff function.

> Doc "dateDiff" in Drill
> ---
>
> Key: DRILL-4721
> URL: https://issues.apache.org/jira/browse/DRILL-4721
> Project: Apache Drill
>  Issue Type: Task
>  Components: Documentation
>Reporter: Bridget Bevens
>Assignee: Bridget Bevens
>Priority: Minor
> Fix For: 1.15.0
>
>
> Fwd: [mapr-tech-qa:12242]Doc "dateDiff" in Drill
> Inbox
> x 
> Neeraja Rentachintala
> 3:32 PM (19 minutes ago)
> to me 
> we should document the datediff function. Cisco was asking about it today.
> -- Forwarded message --
> From: Bob Rumsby 
> Date: Thu, Aug 20, 2015 at 11:56 AM
> Subject: Re: [mapr-tech-qa:12242] "dateDiff" in Drill
> To: "mapr-tech...@maprtech.com" 
> Yes, it should be. We can fix that.
> Bob
> On Thu, Aug 20, 2015 at 11:46 AM, Joseph Blue  wrote:
> Seems as though functions like datediff should be here:
> https://drill.apache.org/docs/date-time-functions-and-arithmetic/
> On Thu, Aug 20, 2015 at 11:32 AM, Ted Dunning  wrote:
> Joe
> Do you have a suggestion for the docs?  Perhaps a few cross links would make 
> it better?
> Sent from my iPhone
> On Aug 20, 2015, at 5:36, Joseph Blue  wrote:
> OK, duh. Thanks for that. I just went down the wrong path in the 
> documentation. 
> This obviously does exactly what I want.
> On Wed, Aug 19, 2015 at 10:26 PM, Mehant Baid  wrote:
> You can use the datediff function as follows:
>  select datediff(date '2008-2-23', date '2008-1-20') from cp.`employee.json` 
> limit 1;
> +-+
> | EXPR$0  |
> +-+
> | 34  |
> +-+
> Thanks
> Mehant
> On Wed, Aug 19, 2015 at 7:00 PM, Joseph Blue  wrote:
> Want to get the difference between two dates in Drill. I used the AGE 
> function which produces an interval. But I want the answer in days, not 
> months & days. Am I missing something from the documentation?
> select age(cast('2015-01-01' as timestamp),cast('2014-11-30' as timestamp)) 
> from sys.version;
> +-+
> | EXPR$0  |
> +-+
> | P1M2D   |
> +-+
> -- 
> Joseph Blue
> Data Scientist



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6978) typeOf drillTypeOf sqlTypeOf not work with generated tables

2019-01-15 Thread benj (JIRA)
benj created DRILL-6978:
---

 Summary: typeOf drillTypeOf sqlTypeOf not work with generated 
tables
 Key: DRILL-6978
 URL: https://issues.apache.org/jira/browse/DRILL-6978
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.15.0
Reporter: benj


 

*TypeOf functions works when request on files but doesn't work on "generated" 
data

This works :
{code:java}
SELECT typeof(md5), drillTypeOf(md5), sqlTypeOf(md5) FROM dfs.tmp.`mytable.csv` 
LIMIT 2;
=> (OK)
+--+--++
|  EXPR$0  |  EXPR$1  |   EXPR$2   |
+--+--++
| VARCHAR  | VARCHAR  | CHARACTER VARYING  |
| VARCHAR  | VARCHAR  | CHARACTER VARYING  |
+--+--++{code}
But not :

 

 
{code:java}
SELECT typeOf(a) FROM (SELECT CAST (5 as int) AS a) x;
=> (NOK)
Error: SYSTEM ERROR: IllegalArgumentException: Can not set 
org.apache.drill.exec.vector.complex.reader.FieldReader field 
org.apache.drill.exec.expr.fn.impl.UnionFunctions$GetType.input to 
org.apache.drill.exec.expr.holders.IntHolder
{code}
And in a surprising way the next query works :
{code:java}
SELECT md5, typeof(t), drillTypeOf(t), sqlTypeOf(t) FROM ((SELECT 'foo' AS t ) 
union (SELECT 'far' AS t)) x;
=> (OK)
+---+--+--++
|  md5  |  EXPR$1  |  EXPR$2  |   EXPR$3   |
+---+--+--++
| foo   | VARCHAR  | VARCHAR  | CHARACTER VARYING  |
| bar   | VARCHAR  | VARCHAR  | CHARACTER VARYING  |
+---+--+--++{code}
 

 

 

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[ANNOUNCE] Apache Roadshow Chicago, Call for Presentations

2019-01-15 Thread Trevor Grant
Hello Devs!


You're receiving this email because you are subscribed to one or more
Apache developer email lists.

I’m writing to let you know about an exciting event coming to the Chicago
area: The Apache Roadshow Chicago.  It will be held May 13th and 14th at
three bars in the Logan Square neighborhood (Revolution Brewing, The
Native, and the Radler).

There will be six tracks:

   -

   Apache in Adtech:  Tell us how Apache works in your advertising stack
   -

   Apache in Fintech: Tell us how Apache works in your finance/insurance
   business
   -

   Apache in Startups: Tell us how you’re using Apache in your startup
   -

   Diversity in Apache: How do we increase and encourage diversity in
   Apache and tech fields overall?
   -

   Made in Chicago: Apache related things made by people in Chicago that
   don’t fall into other buckets
   -

   Project Shark Tank: Do you want more developers or users for your Apache
   project? Come here and pitch it!


This is an exciting chance to learn about how Apache Projects are in use in
production around Chicago, how business users make the decision to use
Apache projects, to learn about exciting new projects that want help from
developers like you, and how/why to increase diversity in tech and IT.

If you have any use cases of Apache products in Adtech, Fintech, or
Startups; if you represent a minority working in tech and have perspectives
to share, if you live in the Chicagoland area and want to highlight some
work you’ve done on an Apache project, or if you want to get other people
excited to come work on your project, then please submit a CFP before the
deadline on February 15th!

Tickets to the Apache Roadshow Chicago are $100; speakers will get a
complimentary ticket.

We’re looking forward to reading your submissions and seeing you there on
May 13-14!

Sincerely,

Trevor Grant

https://www.apachecon.com/chiroadshow19/cfp.html

https://www.apachecon.com/chiroadshow19/register.html


[GitHub] ihuzenko opened a new pull request #1613: DRILL-6977: Improve Hive tests configuration

2019-01-15 Thread GitBox
ihuzenko opened a new pull request #1613: DRILL-6977: Improve Hive tests 
configuration
URL: https://github.com/apache/drill/pull/1613
 
 
   1. Moved initialization of HiveTestBase data to static block
   2. Extracted hive related setup code to HiveTestConfig, which allows build 
configuration
   and accept custom data generators as Driver consumers
   3. Refactored HiveTestDataGenerator to only generate test data and not 
manage hive test
   config


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (DRILL-6977) Improve Hive tests configuration

2019-01-15 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-6977:
---

 Summary: Improve Hive tests configuration
 Key: DRILL-6977
 URL: https://issues.apache.org/jira/browse/DRILL-6977
 Project: Apache Drill
  Issue Type: Bug
Reporter: Igor Guzenko


Class HiveTestDataGenerator is responsible for initialization of hive metadata 
service and configuration of hive storage plugin for tested drillbit. 
Originally it was supposed to be initialized once before all tests in hive 
module, but actually it's initialized for every test class. And such 
initialization takes a lot of time, so it's worth to spend some time to 
accelerate hive tests.

This task has two main aims: 
 # Use HiveTestDataGenerator once for all test classes 
 # Provide flexible configuration of Hive tests that can be used with 
ClusterFicture for autonomic(not bounded to HiveTestBase) test classes 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)