[GitHub] drill issue #934: DRILL-3449 When Foreman node dies, the FragmentExecutor st...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/934
  
Thanks for the explanation.
+1


---


[GitHub] drill pull request #934: DRILL-3449 When Foreman node dies, the FragmentExec...

2017-09-09 Thread vrozov
Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/934#discussion_r137941417
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentStatusReporter.java ---
@@ -113,4 +120,14 @@ void fail(final UserException ex) {
     sendStatus(status);
   }
 
+  @Override
+  public void close()
+  {
+    final ControlTunnel tunnel = this.tunnel.getAndSet(null);
+    if (tunnel != null) {
+      logger.debug("Closing {}", this);
--- End diff --

No, close() is not a placeholder. It closes the FragmentStatusReporter; after the close, a request to send status becomes a no-op.


---


[GitHub] drill issue #926: DRILL-5269 Make DirectSubScan Jackson JSON deserializable

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/926
  
On the first comment, it sounds like the close is a placeholder; the actual close is to be done after some refactoring? That is fine. It is just a bit surprising to see a log message but no actual close.

On the second, yes, Drill uses "unit test" to mean a system test run via JUnit. Can we create a JUnit-based system test that demonstrates the problem and the fix? I'm guessing we can do the test at the system level to avoid the difficulty, at present, of writing true unit tests against Drill code.


---


[GitHub] drill issue #926: DRILL-5269 Make DirectSubScan Jackson JSON deserializable

2017-09-09 Thread vrozov
Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/926
  
@paul-rogers I guess you refer to a system/integration test (execute the query provided in the JIRA), not a unit test.


---


[GitHub] drill pull request #934: DRILL-3449 When Foreman node dies, the FragmentExec...

2017-09-09 Thread vrozov
Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/934#discussion_r137941124
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentStatusReporter.java ---
@@ -113,4 +120,14 @@ void fail(final UserException ex) {
     sendStatus(status);
   }
 
+  @Override
+  public void close()
+  {
+    final ControlTunnel tunnel = this.tunnel.getAndSet(null);
+    if (tunnel != null) {
+      logger.debug("Closing {}", this);
--- End diff --

We are closing the FragmentStatusReporter, not the `tunnel` that it references. ControlTunnel is not Closeable, even though it holds a reference to a resource that is Closeable, and it should provide a way to release that resource. Please let me know if a comment is required here. I do plan to make ControlTunnel Closeable, but as that requires refactoring not directly related to this JIRA/PR, I plan to do it in a separate PR.
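
For illustration, the pattern is roughly the following (a simplified sketch with placeholder types, not the actual Drill code):

```java
import java.util.concurrent.atomic.AtomicReference;

class ReporterSketch implements AutoCloseable {
  // close() atomically drops the tunnel reference; any later sendStatus()
  // sees null and silently becomes a no-op
  private final AtomicReference<Object> tunnel = new AtomicReference<>(new Object());

  void sendStatus(Object status) {
    Object t = tunnel.get();
    if (t == null) {
      return;               // reporter already closed: drop the update
    }
    // ... send the status over t ...
  }

  @Override
  public void close() {
    tunnel.getAndSet(null); // subsequent sendStatus() calls do nothing
  }
}
```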


---


[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r137939203
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
@@ -109,14 +107,21 @@
 
   private boolean isTwoPhase = false; // 1 phase or 2 phase aggr?
   private boolean is2ndPhase = false;
-  private boolean canSpill = true; // make it false in case can not spill
+  private boolean is1stPhase = false;
+  private boolean canSpill = true; // make it false in case can not spill/return-early
   private ChainedHashTable baseHashTable;
   private boolean earlyOutput = false; // when 1st phase returns a partition due to no memory
   private int earlyPartition = 0; // which partition to return early
-
-  private long memoryLimit; // max memory to be used by this oerator
-  private long estMaxBatchSize = 0; // used for adjusting #partitions
-  private long estRowWidth = 0;
+  private boolean retrySameIndex = false; // in case put failed during 1st phase - need to output early, then retry
+  private boolean useMemoryPrediction = false; // whether to use memory prediction to decide when to spill
+  private long estMaxBatchSize = 0; // used for adjusting #partitions and deciding when to spill
+  private long estRowWidth = 0; // the size of the internal "row" (keys + values + extra columns)
+  private long estValuesRowWidth = 0; // the size of the internal values ( values + extra )
+  private long estOutputRowWidth = 0; // the size of the output "row" (no extra columns)
+  private long estValuesBatchSize = 0; // used for "reserving" memory for the Values batch to overcome an OOM
+  private long estOutgoingAllocSize = 0; // used for "reserving" memory for the Outgoing Output Values to overcome an OOM
+  private long reserveValueBatchMemory; // keep "reserve memory" for Values Batch
+  private long reserveOutgoingMemory; // keep "reserve memory" for the Outgoing (Values only) output
--- End diff --

Long lists of member variables are generally frowned upon: they can't be unit tested, and there are too many states to keep in mind. Can these be grouped into a read-only config class (set up front, then never changed) versus a class holding the running estimates?
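
For illustration, a minimal sketch of the suggested grouping (class and field names here are hypothetical, not from the PR):

```java
// Immutable configuration, computed once during setup and never changed.
class HashAggConfig {
  final boolean isTwoPhase;
  final boolean useMemoryPrediction;
  final int numPartitions;

  HashAggConfig(boolean isTwoPhase, boolean useMemoryPrediction, int numPartitions) {
    this.isTwoPhase = isTwoPhase;
    this.useMemoryPrediction = useMemoryPrediction;
    this.numPartitions = numPartitions;
  }
}

// Mutable running estimates, updated as batches flow through.
class HashAggEstimates {
  long rowWidth;
  long outputRowWidth;
  long maxBatchSize;
  long valuesBatchSize;
}
```

Each group can then be tested (and reasoned about) on its own.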


---


[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r137939168
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
 ---
@@ -293,7 +299,7 @@ private HashAggregator createAggregatorInternal() 
throws SchemaChangeException,
 aggrExprs,
 cgInner.getWorkspaceTypes(),
 groupByOutFieldIds,
-this.container);
+this.container, extraNonNullColumns * 8 /* sizeof(BigInt) */);
--- End diff --

If the `BigInt` column is used to indicate nulls, then each value is of size 9. Also, on average, each vector has about 25% internal fragmentation. To account for this, perhaps assume that the average size is 12 or 13 bytes.
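
A quick back-of-the-envelope check of those numbers (my assumption: an 8-byte value vector entry plus a 1-byte "bits" vector entry per value):

```java
class NullableBigIntSizeSketch {
  public static void main(String[] args) {
    int valueBytes = 8;                           // BigInt value vector entry
    int nullIndicatorBytes = 1;                   // "bits" vector entry
    int perValue = valueBytes + nullIndicatorBytes;   // = 9 bytes
    double withFragmentation = perValue * 1.25;   // ~25% internal fragmentation
    System.out.println(withFragmentation);        // 11.25 -> round up to 12-13
  }
}
```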


---


[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r137939287
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
@@ -500,22 +516,45 @@ private void initializeSetup(RecordBatch newIncoming) throws SchemaChangeExcepti
    */
   private void updateEstMaxBatchSize(RecordBatch incoming) {
     if ( estMaxBatchSize > 0 ) { return; }  // no handling of a schema (or varchar) change
+    // Use the sizer to get the input row width and the length of the longest varchar column
     RecordBatchSizer sizer = new RecordBatchSizer(incoming);
     logger.trace("Incoming sizer: {}",sizer);
     // An empty batch only has the schema, can not tell actual length of varchars
     // else use the actual varchars length, each capped at 50 (to match the space allocation)
-    estRowWidth = sizer.rowCount() == 0 ? sizer.stdRowWidth() : sizer.netRowWidthCap50();
-    estMaxBatchSize = estRowWidth * MAX_BATCH_SIZE;
+    long estInputRowWidth = sizer.rowCount() == 0 ? sizer.stdRowWidth() : sizer.netRowWidthCap50();
 
     // Get approx max (varchar) column width to get better memory allocation
-    maxColumnWidth = Math.max(sizer.maxSize(), VARIABLE_MIN_WIDTH_VALUE_SIZE);
+    maxColumnWidth = Math.max(sizer.maxAvgColumnSize(), VARIABLE_MIN_WIDTH_VALUE_SIZE);
     maxColumnWidth = Math.min(maxColumnWidth, VARIABLE_MAX_WIDTH_VALUE_SIZE);
 
-    logger.trace("{} phase. Estimated row width: {}  batch size: {}  memory limit: {}  max column width: {}",
-      isTwoPhase?(is2ndPhase?"2nd":"1st"):"Single",estRowWidth,estMaxBatchSize,memoryLimit,maxColumnWidth);
+    //
+    // Calculate the estimated max (internal) batch (i.e. Keys batch + Values batch) size
+    // (which is used to decide when to spill)
+    // Also calculate the values batch size (used as a reserve to overcome an OOM)
+    //
+    Iterator<VectorWrapper<?>> outgoingIter = outContainer.iterator();
+    int fieldId = 0;
+    while (outgoingIter.hasNext()) {
+      ValueVector vv = outgoingIter.next().getValueVector();
+      MaterializedField mr = vv.getField();
+      int fieldSize = vv instanceof VariableWidthVector ? maxColumnWidth : TypeHelper.getSize(mr.getType());
+      estRowWidth += fieldSize;
+      estOutputRowWidth += fieldSize;
+      if ( fieldId < numGroupByOutFields ) { fieldId++; }
+      else { estValuesRowWidth += fieldSize; }
+    }
+    // multiply by the max number of rows in a batch to get the final estimated max size
+    estMaxBatchSize = Math.max(estRowWidth, estInputRowWidth) * MAX_BATCH_SIZE;
--- End diff --

Here, the output batch size is fixed based on the number of rows. Suppose we had a sort as the consumer of this operator, and the sort has a memory ceiling of x MB. Can the code here create batches larger than x/2 MB, meaning the sort is forced to consume batches so large that it can't buffer two and spill?

In other words, is there an attempt here to control overall output batch memory use instead of just assuming that we always output `MAX_BATCH_SIZE` rows regardless of memory use?
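
One possible shape for a memory-aware cap (a sketch under assumed names; nothing here is the PR's code):

```java
class OutputBatchSizeSketch {
  // Limit rows per output batch by a byte budget instead of always emitting
  // MAX_BATCH_SIZE rows.
  static int rowsPerOutputBatch(long estOutputRowWidth, long outputBatchBytesLimit, int maxBatchSize) {
    if (estOutputRowWidth <= 0) {
      return maxBatchSize;                        // no estimate yet; fall back
    }
    long byBudget = outputBatchBytesLimit / estOutputRowWidth;
    return (int) Math.max(1, Math.min(maxBatchSize, byBudget));
  }
}
```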


---


[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r137939296
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
@@ -545,16 +584,19 @@ public AggOutcome doWork() {
       if (EXTRA_DEBUG_1) {
         logger.debug("Starting outer loop of doWork()...");
       }
-      for (; underlyingIndex < currentBatchRecordCount; incIndex()) {
+      while (underlyingIndex < currentBatchRecordCount) {
         if (EXTRA_DEBUG_2) {
           logger.debug("Doing loop with values underlying {}, current {}", underlyingIndex, currentIndex);
         }
         checkGroupAndAggrValues(currentIndex);
+
+        if ( retrySameIndex ) { retrySameIndex = false; }  // need to retry this row (e.g. we had an OOM)
--- End diff --

I think Drill's coding style guidelines say no spaces after ( or before ).


---


[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r137939122
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ---
@@ -92,18 +92,20 @@
 
   // Hash Aggregate Options
 
-  String HASHAGG_NUM_PARTITIONS = "drill.exec.hashagg.num_partitions";
   String HASHAGG_NUM_PARTITIONS_KEY = "exec.hashagg.num_partitions";
   LongValidator HASHAGG_NUM_PARTITIONS_VALIDATOR = new RangeLongValidator(HASHAGG_NUM_PARTITIONS_KEY, 1, 128); // 1 means - no spilling
-  String HASHAGG_MAX_MEMORY = "drill.exec.hashagg.mem_limit";
   String HASHAGG_MAX_MEMORY_KEY = "exec.hashagg.mem_limit";
   LongValidator HASHAGG_MAX_MEMORY_VALIDATOR = new RangeLongValidator(HASHAGG_MAX_MEMORY_KEY, 0, Integer.MAX_VALUE);
   // min batches is used for tuning (each partition needs so many batches when planning the number of partitions,
   // or reserve this number when calculating whether the remaining available memory is too small and requires a spill.)
   // Low value may OOM (e.g., when incoming rows become wider), higher values use fewer partitions but are safer
-  String HASHAGG_MIN_BATCHES_PER_PARTITION = "drill.exec.hashagg.min_batches_per_partition";
-  String HASHAGG_MIN_BATCHES_PER_PARTITION_KEY = "drill.exec.hashagg.min_batches_per_partition";
-  LongValidator HASHAGG_MIN_BATCHES_PER_PARTITION_VALIDATOR = new RangeLongValidator(HASHAGG_MIN_BATCHES_PER_PARTITION_KEY, 2, 5);
+  String HASHAGG_MIN_BATCHES_PER_PARTITION_KEY = "exec.hashagg.min_batches_per_partition";
+  LongValidator HASHAGG_MIN_BATCHES_PER_PARTITION_VALIDATOR = new RangeLongValidator(HASHAGG_MIN_BATCHES_PER_PARTITION_KEY, 1, 5);
+  // Can be turns off mainly for testing. Memory prediction is used to decide on when to spill to disk; with this option off,
--- End diff --

turns --> turned


---


[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r137939184
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
@@ -109,14 +107,21 @@
 
   private boolean isTwoPhase = false; // 1 phase or 2 phase aggr?
   private boolean is2ndPhase = false;
-  private boolean canSpill = true; // make it false in case can not spill
+  private boolean is1stPhase = false;
+  private boolean canSpill = true; // make it false in case can not spill/return-early
   private ChainedHashTable baseHashTable;
   private boolean earlyOutput = false; // when 1st phase returns a partition due to no memory
   private int earlyPartition = 0; // which partition to return early
-
-  private long memoryLimit; // max memory to be used by this oerator
-  private long estMaxBatchSize = 0; // used for adjusting #partitions
-  private long estRowWidth = 0;
+  private boolean retrySameIndex = false; // in case put failed during 1st phase - need to output early, then retry
--- End diff --

As it turns out, unlike C++, Java is pretty good at initializing booleans 
to false and longs to 0. We only need to explicitly initialize values when the 
value should be other than 0/false/null.
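
For example:

```java
// Per JLS 4.12.5, fields get default values; no explicit initializer needed.
class Defaults {
  boolean flag;  // false
  long count;    // 0L
  Object ref;    // null
}
```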


---


[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r137939496
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTable.java ---
@@ -58,7 +59,7 @@
 
   public int getHashCode(int incomingRowIdx) throws SchemaChangeException;
 
-  public PutStatus put(int incomingRowIdx, IndexPointer htIdxHolder, int hashCode) throws SchemaChangeException;
+  public PutStatus put(int incomingRowIdx, IndexPointer htIdxHolder, int hashCode) throws SchemaChangeException, RetryAfterSpillException;
--- End diff --

At present, `RetryAfterSpillException` is unchecked, so it is not necessary to declare it. But change `RetryAfterSpillException` to extend `Exception` (checked) and this declaration becomes useful.


---


[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r137939361
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
@@ -1335,7 +1470,7 @@ private void updateStats(HashTable[] htables) {
     }
     if ( rowsReturnedEarly > 0 ) {
       stats.setLongStat(Metric.SPILL_MB, // update stats - est. total MB returned early
-          (int) Math.round( rowsReturnedEarly * estRowWidth / 1024.0D / 1024.0));
+          (int) Math.round( rowsReturnedEarly * estOutputRowWidth / 1024.0D / 1024.0));
--- End diff --

This file is a template. That means we copy *all* of this code each time we generate a new class. How does doing so help stability, customer value, or performance? Should all this code be in a template that is copied on every query? Or should it be refactored into a driver class, with only a very light wrapper appearing in the copied template?

As this code gets ever more complex, it puts a strain on the Java byte-code machinery that must walk through this code and do method fixup, scalar replacement, and so on. That work takes time. What value accrues to the user from doing this fixup on code that never changes from one query to the next?

Filed [DRILL-5779](https://issues.apache.org/jira/browse/DRILL-5779) for this issue.
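
For illustration, the shape I have in mind is roughly this (hypothetical names, a sketch rather than a design):

```java
// Compiled once, shared by all queries; never copied by the byte-code merger.
class HashAggDriver {
  void doWorkInner() {
    // the thousands of lines of spill/retry logic would live here
  }
}

// The template copied per generated class stays a thin wrapper.
abstract class HashAggTemplateSketch {
  private final HashAggDriver driver = new HashAggDriver();

  public void doWork() {
    driver.doWorkInner();              // delegate to the shared driver
  }

  // only the truly query-specific methods remain generated:
  abstract void aggrValuesInternal(int rowIdx);
}
```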


---


[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r137939459
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
@@ -1178,20 +1273,38 @@ private void checkGroupAndAggrValues(int incomingRowIdx) {
     hashCode >>>= bitsInMask;
     HashTable.PutStatus putStatus = null;
     long allocatedBeforeHTput = allocator.getAllocatedMemory();
-
     // ==
     // Insert the key columns into the hash table
     // ==
+    boolean noReserveMem = reserveValueBatchMemory == 0;
     try {
+      if ( noReserveMem && canSpill ) { throw new RetryAfterSpillException();} // proactive spill, skip put()
+
       putStatus = htables[currentPartition].put(incomingRowIdx, htIdxHolder, hashCode);
+
+    } catch (RetryAfterSpillException re) {
+      if ( ! canSpill ) { throw new OutOfMemoryException(getOOMErrorMsg("Can not spill")); }
--- End diff --

This is the message sent to the log and the user. Should we explain why we can't spill, and what to do about it? Something like:

"Incoming batch too large and no in-memory partitions to spill. Increase memory assigned to the Hash Agg."

Replace the above wording with the actual reasons and fixes.


---


[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r137939319
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
@@ -646,6 +687,46 @@ public AggOutcome doWork() {
   }
 
   /**
+   *   Use reserved values memory (if available) to try and preemp an OOM
+   */
+  private void useReservedValuesMemory() {
+    // try to preempt an OOM by using the reserved memory
+    long reservedMemory = reserveValueBatchMemory;
+    if ( reservedMemory > 0 ) { allocator.setLimit(allocator.getLimit() + reservedMemory); }
+
+    reserveValueBatchMemory = 0;
+  }
+  /**
+   *   Use reserved outgoing output memory (if available) to try and preemp an OOM
+   */
+  private void useReservedOutgoingMemory() {
+    // try to preempt an OOM by using the reserved memory
+    long reservedMemory = reserveOutgoingMemory;
+    if ( reservedMemory > 0 ) { allocator.setLimit(allocator.getLimit() + reservedMemory); }
--- End diff --

Why is it necessary to change the allocator limit? The allocator limit 
should be fixed: it is the amount of memory given to this operator. Shouldn't 
the code use its own, internal, limits to make decisions? That is, if allocated 
memory + some expected use > a defined internal size, then spill?
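
Something like the following is what I have in mind (illustrative names, not the PR's code):

```java
class SpillDecisionSketch {
  // The allocator limit stays fixed; the operator compares its own usage
  // against a soft internal budget and spills proactively.
  static boolean shouldSpill(long allocatedMemory, long expectedNextUse, long internalBudget) {
    return allocatedMemory + expectedNextUse > internalBudget;
  }
}
```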


---


[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r137939481
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggregator.java ---
@@ -47,10 +47,7 @@
   // OK - batch returned, NONE - end of data, RESTART - call again
   public enum AggIterOutcome { AGG_OK, AGG_NONE, AGG_RESTART }
 
-  public abstract void setup(HashAggregate hashAggrConfig, HashTableConfig htConfig, FragmentContext context,
-                             OperatorStats stats, OperatorContext oContext, RecordBatch incoming, HashAggBatch outgoing,
-                             LogicalExpression[] valueExprs, List<TypedFieldId> valueFieldIds, TypedFieldId[] keyFieldIds,
-                             VectorContainer outContainer) throws SchemaChangeException, IOException, ClassTransformationException;
+  public abstract void setup(HashAggregate hashAggrConfig, HashTableConfig htConfig, FragmentContext context, OperatorStats stats, OperatorContext oContext, RecordBatch incoming, HashAggBatch outgoing, LogicalExpression[] valueExprs, List<TypedFieldId> valueFieldIds, TypedFieldId[] keyFieldIds, VectorContainer outContainer, int extraRowBytes) throws SchemaChangeException, IOException, ClassTransformationException;
--- End diff --

Not sure that putting all the arguments on one big line is an improvement over the arg-per-line format used previously.

Also, see the note above: a large number of arguments suggests a muddy design with one class trying to do far too much.


---


[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r137939223
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
@@ -297,10 +302,7 @@ public void outputRecordValues(@Named("htRowIdx") int htRowIdx, @Named("outRowId
   }
 
   @Override
-  public void setup(HashAggregate hashAggrConfig, HashTableConfig htConfig, FragmentContext context,
-                    OperatorStats stats, OperatorContext oContext, RecordBatch incoming, HashAggBatch outgoing,
-                    LogicalExpression[] valueExprs, List<TypedFieldId> valueFieldIds, TypedFieldId[] groupByOutFieldIds,
-                    VectorContainer outContainer) throws SchemaChangeException, IOException {
+  public void setup(HashAggregate hashAggrConfig, HashTableConfig htConfig, FragmentContext context, OperatorStats stats, OperatorContext oContext, RecordBatch incoming, HashAggBatch outgoing, LogicalExpression[] valueExprs, List<TypedFieldId> valueFieldIds, TypedFieldId[] groupByOutFieldIds, VectorContainer outContainer, int extraRowBytes) throws SchemaChangeException, IOException {
--- End diff --

Methods and constructors with many arguments are generally frowned upon, as they suggest that a single class is trying to do too much: it has too much internal coupling, performing tasks that should be broken apart. Can this single, huge class be split into smaller, more focused abstractions?
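
One conventional remedy is a parameter object, sketched here with hypothetical names (the field list is illustrative, not complete):

```java
// Gather the setup() collaborators into one object so the signature stays
// readable as the operator grows.
class HashAggSetupParams {
  HashAggregate hashAggrConfig;
  HashTableConfig htConfig;
  RecordBatch incoming;
  HashAggBatch outgoing;
  VectorContainer outContainer;
  int extraRowBytes;
}

// public void setup(HashAggSetupParams params) throws SchemaChangeException, IOException;
```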


---


[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r137939116
  
--- Diff: 
common/src/main/java/org/apache/drill/common/exceptions/RetryAfterSpillException.java
 ---
@@ -0,0 +1,32 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.common.exceptions;
+
+import org.apache.drill.exec.proto.UserBitShared;
+
+/**
+ *  A special exception to be caught by caller, who is supposed to free memory by spilling and try again
+ *
+ */
+public class RetryAfterSpillException extends UserException {
--- End diff --

If this exception is thrown and caught internally, it should not extend 
`UserException`. Instead it should extend the Java `RuntimeException`.

Better, since you know you must catch this, this should be a checked 
exception, extended from `Exception` and declared by the method that throws it.

`UserException` is purely for exceptions reported to the user.
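
A minimal sketch of the checked variant (my wording of the javadoc, not the PR's):

```java
package org.apache.drill.common.exceptions;

/**
 * Thrown when the caller should free memory by spilling, then retry.
 * Being checked, it must be declared by put() and handled by every caller,
 * which documents the spill-and-retry contract in the signatures.
 */
public class RetryAfterSpillException extends Exception {
}
```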


---


[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r137939427
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
@@ -1178,20 +1273,38 @@ private void checkGroupAndAggrValues(int incomingRowIdx) {
     hashCode >>>= bitsInMask;
     HashTable.PutStatus putStatus = null;
     long allocatedBeforeHTput = allocator.getAllocatedMemory();
-
     // ==
     // Insert the key columns into the hash table
     // ==
+    boolean noReserveMem = reserveValueBatchMemory == 0;
     try {
+      if ( noReserveMem && canSpill ) { throw new RetryAfterSpillException();} // proactive spill, skip put()
+
       putStatus = htables[currentPartition].put(incomingRowIdx, htIdxHolder, hashCode);
+
+    } catch (RetryAfterSpillException re) {
--- End diff --

See above. Should be a checked exception declared by `put()`.

Also, why do we need to throw an exception before calling `put` only to 
catch it a couple of lines later?

If the spill code was in a method, rather than just inline, seems we could 
do:

```
if ( noReserveMem && canSpill ) { doSpill(); }
try {
  ...put(...)
} catch (...) {
   doSpill();
}
```


---


[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r137939532
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java ---
@@ -158,19 +158,17 @@ public BatchHolder(int idx) {
       } finally {
         if (!success) {
           htContainer.clear();
-          if (links != null) {
-            links.clear();
-          }
+          if (links != null) { links.clear();}
         }
       }
     }
 
     private void init(IntVector links, IntVector hashValues, int size) {
       for (int i = 0; i < size; i++) {
-        links.getMutator().setSafe(i, EMPTY_SLOT);
+        links.getMutator().set(i, EMPTY_SLOT);
--- End diff --

Is `size` ever less than the vector's `capacity()`? Otherwise you can just ask the vector for its capacity.

The `links.getMutator()` call in an inner loop is inefficient.

Instead of a single function initializing two `IntVector`s with redundant code, can this be refactored into a function that initializes one vector and is called twice? See the sketch below.
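
For example, inside `BatchHolder` one could write (a sketch, not tested):

```java
// Initialize a single vector; the mutator is hoisted out of the loop.
private static void initVector(IntVector vector, int size, int fillValue) {
  final IntVector.Mutator mutator = vector.getMutator();
  for (int i = 0; i < size; i++) {
    mutator.set(i, fillValue);
  }
}

// private void init(IntVector links, IntVector hashValues, int size) {
//   initVector(links, size, EMPTY_SLOT);
//   initVector(hashValues, size, 0);
// }
```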


---


[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r137939253
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
@@ -382,19 +390,25 @@ private void delayedSetup() {
     final boolean fallbackEnabled = context.getOptions().getOption(ExecConstants.HASHAGG_FALLBACK_ENABLED_KEY).bool_val;
 
     // Set the number of partitions from the configuration (raise to a power of two, if needed)
-    numPartitions = context.getConfig().getInt(ExecConstants.HASHAGG_NUM_PARTITIONS);
-    if ( numPartitions == 1 ) {
+    numPartitions = (int)context.getOptions().getOption(ExecConstants.HASHAGG_NUM_PARTITIONS_VALIDATOR);
+    if ( numPartitions == 1 && is2ndPhase  ) { // 1st phase can still do early return with 1 partition
       canSpill = false;
       logger.warn("Spilling is disabled due to configuration setting of num_partitions to 1");
     }
     numPartitions = BaseAllocator.nextPowerOfTwo(numPartitions); // in case not a power of 2
 
-    if ( schema == null ) { estMaxBatchSize = 0; } // incoming was an empty batch
+    if ( schema == null ) { estValuesBatchSize = estOutgoingAllocSize = estMaxBatchSize = 0; } // incoming was an empty batch
     else {
       // Estimate the max batch size; should use actual data (e.g. lengths of varchars)
       updateEstMaxBatchSize(incoming);
     }
-    long memAvail = memoryLimit - allocator.getAllocatedMemory();
+    // create "reserved memory" and adjust the memory limit down
+    reserveValueBatchMemory = reserveOutgoingMemory = estValuesBatchSize ;
+    long newMemoryLimit = allocator.getLimit() - reserveValueBatchMemory - reserveOutgoingMemory ;
+    long memAvail = newMemoryLimit - allocator.getAllocatedMemory();
+    if ( memAvail <= 0 ) { throw new OutOfMemoryException("Too little memory available"); }
+    allocator.setLimit(newMemoryLimit);
+
--- End diff --

This code has grown to be incredibly complex with many, many paths through 
the various functions.

Tests are handy things. Do we have system-level unit tests that exercise 
each path through the code? Otherwise, as a reviewer, how can I be sure that 
each execution path does, in fact, work?


---


[jira] [Created] (DRILL-5779) HashAgg template is far too large, cause performance hit

2017-09-09 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5779:
--

 Summary: HashAgg template is far too large, cause performance hit
 Key: DRILL-5779
 URL: https://issues.apache.org/jira/browse/DRILL-5779
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.11.0
Reporter: Paul Rogers


Drill uses code generation to produce query-specific code to copy values, perform calculations, and so on. Drill does this by generating code based on templates: internally, it copies the template byte codes and merges them with the generated byte codes. (Drill does not use Java subclassing for generated code.)

The Hash Agg batch places thousands of lines of boilerplate code into the template. This forces Drill to:

1. Copy those byte codes *for every query*.
2. Run the "byte code fixup" logic that walks the byte code tree of the template *for every query*.
3. Cache a separate copy of the template in the code cache *for every query*.

There is a clear performance cost to the copying and tree walking, and a memory cost to buffering multiple copies of the same code. It is not clear that we have any data showing that this work gives the Drill user better stability, greater performance, or more features.

We should consider moving the bulk of the code out of the template to avoid the 
overheads cited above. The result may be better performance and reduced memory 
pressure.





[GitHub] drill pull request #937: DRILL-5002: Using hive's date functions on top of d...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/937#discussion_r137938539
  
--- Diff: contrib/storage-hive/core/src/main/codegen/templates/ObjectInspectorHelper.java ---
@@ -204,7 +204,11 @@ public static JBlock getDrillObject(JCodeModel m, ObjectInspector oi,
   <#elseif entry.hiveType == "TIMESTAMP">
     JVar tsVar = jc._else().decl(m.directClass(java.sql.Timestamp.class.getCanonicalName()), "ts",
       castedOI.invoke("getPrimitiveJavaObject").arg(returnValue));
-    jc._else().assign(returnValueHolder.ref("value"), tsVar.invoke("getTime"));
+    // Bringing relative timestamp value without timezone info to timestamp value in UTC, since Drill keeps date-time values in UTC
--- End diff --

Drill does not actually keep values in UTC: Drill keeps values in the 
server time zone. This is why Drill dates, when shipped to a client with a 
different time zone, get corrupted.
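
A standalone illustration of the skew (plain JDK, not Drill code): the same epoch value renders as different wall-clock times depending on the zone used to interpret it.

```java
import java.time.Instant;
import java.time.ZoneId;

class TimeZoneSkew {
  public static void main(String[] args) {
    Instant instant = Instant.ofEpochMilli(0L);
    System.out.println(instant.atZone(ZoneId.of("UTC")));
    // 1970-01-01T00:00Z[UTC]
    System.out.println(instant.atZone(ZoneId.of("America/Los_Angeles")));
    // 1969-12-31T16:00-08:00[America/Los_Angeles]
  }
}
```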


---


[GitHub] drill pull request #934: DRILL-3449 When Foreman node dies, the FragmentExec...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/934#discussion_r137938397
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentStatusReporter.java ---
@@ -113,4 +120,14 @@ void fail(final UserException ex) {
     sendStatus(status);
   }
 
+  @Override
+  public void close()
+  {
+    final ControlTunnel tunnel = this.tunnel.getAndSet(null);
+    if (tunnel != null) {
+      logger.debug("Closing {}", this);
--- End diff --

We say we are closing the tunnel, but then don't do anything. Perhaps a 
comment to explain how this works?


---


[GitHub] drill issue #932: DRILL-5758: Fix for repeated columns; enable managed sort ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/932
  
@Ben-Zvi, added another fix. Please take another look.


---


[GitHub] drill pull request #930: DRILL-5761: Disable Lilith ClassicMultiplexSocketAp...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/930#discussion_r137938315
  
--- Diff: common/src/test/resources/logback-test.xml ---
@@ -0,0 +1,111 @@
+[diff body not recoverable: the mail archive stripped the XML tags from this new logback-test.xml. The surviving fragments show a Lilith socket appender using ${LILITH_HOSTNAME:-localhost}, a console appender with pattern %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n, and the per-logger level settings discussed in the review comments.]
--- End diff --

See comment below about MapR. It is not clear that we always want 
info-level logging from Hadoop. Seems this should be enabled only in tests that 
want that detail to avoid cluttering output with unwanted messages.


---


[GitHub] drill pull request #930: DRILL-5761: Disable Lilith ClassicMultiplexSocketAp...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/930#discussion_r137938300
  
--- Diff: common/src/test/resources/logback-test.xml ---
@@ -0,0 +1,111 @@
+[diff body not recoverable: the mail archive stripped the XML tags from this new logback-test.xml. The surviving fragments show a Lilith socket appender using ${LILITH_HOSTNAME:-localhost}, a console appender with pattern %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n, and the per-logger level settings discussed in the review comments.]
--- End diff --

Should this be in Apache Drill? Only the MapR profile pulls in MapR code... 
It is not clear that tests that happen to use the MapR profile want debug level 
logging from this subsystem. Better to use the `LogFixture` to set more 
detailed logging in those tests that need it.


---


[GitHub] drill pull request #930: DRILL-5761: Disable Lilith ClassicMultiplexSocketAp...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/930#discussion_r137938284
  
--- Diff: common/src/test/resources/logback-test.xml ---
@@ -0,0 +1,111 @@
+[diff body not recoverable: the mail archive stripped the XML tags from this new logback-test.xml. The surviving fragments show a Lilith socket appender using ${LILITH_HOSTNAME:-localhost}, a console appender with pattern %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n, and the per-logger level settings discussed in the review comments.]
--- End diff --

Please set the default test log level to `ERROR`. I keep a copy of `logback-test.xml` that I copy into each working branch, with the level set to `ERROR`; specific tests then enable more detailed logging as needed. Otherwise the console is bombarded with unwanted logging messages, making it very hard to find those of interest.


---


[GitHub] drill issue #928: DRILL-5716: Queue-driven memory allocation

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/928
  
@jinfengni, can you take a quick look at the Foreman changes? Especially 
the bits that muck about with the physical plan: some of the work is moved from 
one place to another. Thanks!


---


[GitHub] drill pull request #925: DRILL-5749: solve foreman and netty threads deadloc...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/925#discussion_r137938116
  
--- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java ---
@@ -54,10 +52,14 @@ void channelClosed(Throwable ex) {
     isOpen.set(false);
     if (ex != null) {
       final RpcException e = RpcException.mapException(ex);
+      IntObjectHashMap<RpcOutcome<?>> clonedMap;
       synchronized (map) {
-        map.forEach(new SetExceptionProcedure(e));
+        clonedMap = map.clone();
         map.clear();
       }
+      if (clonedMap != null) {
--- End diff --

When would `clonedMap` be `null`?


---


[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/923#discussion_r137937858
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestMergeJoinAdvanced.java ---
@@ -102,10 +103,11 @@ public void testFix2967() throws Exception {
       test("select * from dfs_test.`%s/join/j1` j1 left outer join dfs_test.`%s/join/j2` j2 on (j1.c_varchar = j2.c_varchar)",
         TEST_RES_PATH, TEST_RES_PATH);
     } finally {
-      setSessionOption(PlannerSettings.BROADCAST.getOptionName(), String.valueOf(PlannerSettings.BROADCAST.getDefault().bool_val));
-      setSessionOption(PlannerSettings.HASHJOIN.getOptionName(), String.valueOf(PlannerSettings.HASHJOIN.getDefault().bool_val));
+      final OperatorFixture.TestOptionSet testOptionSet = new OperatorFixture.TestOptionSet();
+      setSessionOption(PlannerSettings.BROADCAST.getOptionName(), String.valueOf(testOptionSet.getDefault(PlannerSettings.BROADCAST.getOptionName()).bool_val));
--- End diff --

No need to fix now but I wonder:

* The `ClusterFixture` class has a method of form `setSystemOption(String 
name, Option value)` to avoid this kind of silly to/from String overhead. We 
might want to do the same for these "legacy" `BaseTestQuery` style tests.
* But, here we are setting system options to their default value. Why? 
Isn't that already the value? If not, wouldn't clearing (deleting) the option 
value be better?
* Or, has the test previously set a system option to other than the default 
value, and we are undoing that here? Seems it would be better to allow a test 
to say not to muck with the defaults. (Hard to do with `BaseTestQuery`, but 
there is a built-in provision for this in the `ClusterFixture`.)


---


[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/923#discussion_r137937977
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/work/metadata/TestMetadataProvider.java ---
@@ -196,9 +200,12 @@ public void tablesWithSystemTableFilter() throws Exception {
     verifyTable("sys", "boot", tables);
     verifyTable("sys", "drillbits", tables);
     verifyTable("sys", "memory", tables);
-    verifyTable("sys", "options", tables);
+    verifyTable("sys", SystemTable.OPTION.getTableName(), tables);
+    verifyTable("sys", SystemTable.OPTION_VAL.getTableName(), tables);
     verifyTable("sys", "threads", tables);
     verifyTable("sys", "version", tables);
+    verifyTable("sys", SystemTable.INTERNAL_OPTIONS.getTableName(), tables);
+    verifyTable("sys", SystemTable.INTERNAL_OPTIONS_VAL.getTableName(), tables);
--- End diff --

Thanks for providing the constants. Nice improvement.


---


[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/923#discussion_r13793
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java ---
@@ -346,17 +347,63 @@ public void deleteAllOptions(OptionType type) {
     }
   }
 
-  public void populateDefaultValues() {
-
+  public static CaseInsensitiveMap populateDefualtValues(Map definitions, DrillConfig bootConfig) {
--- End diff --

`populateDefualtValues` --> `populateDefaultValues` (that is "ua" --> "au").


---


[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/923#discussion_r137937350
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/OptionMetaData.java ---
@@ -0,0 +1,68 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.server.options;
+
+/**
+ * Contains information about the scopes in which an option can be set, and an option's visibility.
+ */
+public class OptionMetaData {
+  public static final OptionMetaData DEFAULT = new OptionMetaData(OptionValue.OptionType.ALL, false, false);
+
+  private OptionValue.OptionType type;
--- End diff --

`final` here and others?


---


[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/923#discussion_r137937864
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/limit/TestLimitWithExchanges.java ---
@@ -71,7 +72,8 @@ public void testPushLimitPastUnionExchange() throws Exception {
       final String[] expectedPlan5 = {"(?s)Limit\\(fetch=\\[1\\].*UnionExchange.*Limit\\(fetch=\\[1\\]\\).*Join"};
       testLimitHelper(sql5, expectedPlan5, excludedPlan, 1);
     } finally {
-      test("alter session set `planner.slice_target` = " + ExecConstants.SLICE_TARGET_OPTION.getDefault().getValue());
+      final OperatorFixture.TestOptionSet testOptionSet = new OperatorFixture.TestOptionSet();
+      test("alter session set `planner.slice_target` = " + testOptionSet.getDefault(ExecConstants.SLICE_TARGET).getValue());
--- End diff --

Here, seems it would be better to refer to the constant defining the option 
name rather than hard-coding it. (Yes, this is existing code, so OK to leave as 
is...)


---


[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/923#discussion_r137937100
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/BaseOptionManager.java ---
@@ -17,44 +17,84 @@
  */
 package org.apache.drill.exec.server.options;
 
-import org.apache.drill.exec.server.options.TypeValidators.BooleanValidator;
-import org.apache.drill.exec.server.options.TypeValidators.DoubleValidator;
-import org.apache.drill.exec.server.options.TypeValidators.LongValidator;
-import org.apache.drill.exec.server.options.TypeValidators.StringValidator;
-
-public abstract class BaseOptionManager implements OptionSet {
-//  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(BaseOptionManager.class);
-
-  /**
-   * Gets the current option value given a validator.
-   *
-   * @param validator the validator
-   * @return option value
-   * @throws IllegalArgumentException - if the validator is not found
-   */
-  private OptionValue getOptionSafe(OptionValidator validator)  {
-    OptionValue value = getOption(validator.getOptionName());
-    return value == null ? validator.getDefault() : value;
+import org.apache.drill.common.exceptions.UserException;
+
+import java.util.Iterator;
+
+/**
+ * This {@link OptionManager} implements some the basic methods and should be extended by concrete implementations.
+ */
+public abstract class BaseOptionManager extends BaseOptionSet implements OptionManager {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(BaseOptionManager.class);
+
+  @Override
+  public OptionList getInternalOptionList() {
+    return getAllOptionList(true);
   }
 
   @Override
-  public boolean getOption(BooleanValidator validator) {
-    return getOptionSafe(validator).bool_val;
+  public OptionList getPublicOptionList() {
+    return getAllOptionList(false);
   }
 
   @Override
-  public double getOption(DoubleValidator validator) {
-    return getOptionSafe(validator).float_val;
+  public void setLocalOption(String name, boolean value) {
+    setLocalOption(OptionValue.Kind.BOOLEAN, name, Boolean.toString(value));
--- End diff --

Very nice improvement to the API! I wonder, however, if we should pass the value as an `Object` rather than as a `String`? Seems more natural to do it that way. Or, maybe, here, just create the option value since we know the type?

```
setLocalOption(name, OptionValue.fromBoolean(value));
```

This means that the other fields would be filled in later, so they couldn't be final.

Hence the alternative: use an `Object`.

```
setLocalOption(name, value);
...

private void setLocalOption(String name, Object value) {
```

Then, when creating an option, validate that the type of the `Object` matches the type of the option. (In fact, the `Object` form would be handy for testing...)
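
A sketch of how the `Object` overload might dispatch (illustrative only; `OptionValue.Kind` is from the PR, the rest is mine):

```java
private void setLocalOption(String name, Object value) {
  final OptionValue.Kind kind;
  if (value instanceof Boolean) {
    kind = OptionValue.Kind.BOOLEAN;
  } else if (value instanceof Long) {
    kind = OptionValue.Kind.LONG;
  } else if (value instanceof Double) {
    kind = OptionValue.Kind.DOUBLE;
  } else if (value instanceof String) {
    kind = OptionValue.Kind.STRING;
  } else {
    throw new IllegalArgumentException("Unsupported option value type: " + value.getClass());
  }
  // then verify the kind matches the option's declared kind before setting
  setLocalOption(kind, name, String.valueOf(value));
}
```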



---


[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/923#discussion_r137937991
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/test/RestClientFixture.java ---
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.base.Preconditions;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.server.rest.StatusResources;
+import org.glassfish.jersey.client.ClientConfig;
+import org.glassfish.jersey.client.JerseyClientBuilder;
+
+import javax.annotation.Nullable;
+import javax.ws.rs.client.Client;
+import javax.ws.rs.client.WebTarget;
+import javax.ws.rs.core.GenericType;
+import javax.ws.rs.core.MediaType;
+
+import java.util.List;
+
+/**
+ * Represents a client for the Drill Rest API.
+ */
+public class RestClientFixture implements AutoCloseable {
+  /**
+   * A builder for the rest client.
+   */
+  public static class Builder {
+private ClusterFixture cluster;
+
+public Builder(ClusterFixture cluster) {
+  this.cluster = Preconditions.checkNotNull(cluster);
+}
+
+public RestClientFixture build() {
+  return new RestClientFixture(cluster);
+}
+  }
+
+  private WebTarget baseTarget;
+  private Client client;
--- End diff --

`final` here and above?


---


[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/923#discussion_r137937333
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/OptionManager.java ---
@@ -17,49 +17,97 @@
  */
 package org.apache.drill.exec.server.options;
 
-import org.apache.drill.exec.server.options.OptionValue.OptionType;
+import javax.validation.constraints.NotNull;
 
 /**
  * Manager for Drill {@link OptionValue options}. Implementations must be case-insensitive to the name of an option.
  */
 public interface OptionManager extends OptionSet, Iterable<OptionValue> {
 
   /**
-   * Sets an option value.
-   *
-   * @param value option value
-   * @throws org.apache.drill.common.exceptions.UserException message to describe error with value
+   * Sets a boolean option on the {@link OptionManager}.
+   * @param name The name of the option.
+   * @param value The value of the option.
    */
-  void setOption(OptionValue value);
+  void setLocalOption(String name, boolean value);
 
   /**
-   * Deletes the option. Unfortunately, the type is required given the fallback structure of option managers.
-   * See {@link FallbackOptionManager}.
+   * Sets a long option on the {@link OptionManager}.
+   * @param name The name of the option.
+   * @param value The value of the option.
+   */
+  void setLocalOption(String name, long value);
+
+  /**
+   * Sets a double option on the {@link OptionManager}.
+   * @param name The name of the option.
+   * @param value The value of the option.
+   */
+  void setLocalOption(String name, double value);
+
+  /**
+   * Sets a String option on the {@link OptionManager}.
+   * @param name The name of the option.
+   * @param value The value of the option.
+   */
+  void setLocalOption(String name, String value);
+
+  /**
+   * Sets an option of the specified {@link OptionValue.Kind} on the {@link OptionManager}.
+   * @param kind The kind of the option.
+   * @param name The name of the option.
+   * @param value The value of the option.
+   */
+  void setLocalOption(OptionValue.Kind kind, String name, String value);
+
+  /**
+   * Deletes the option.
    *
-   * If the option name is valid (exists in {@link SystemOptionManager#VALIDATORS}),
+   * If the option name is valid (exists in the set of validators produced by {@link SystemOptionManager#createDefaultOptionDefinitions()}),
    * but the option was not set within this manager, calling this method should be a no-op.
    *
    * @param name option name
-   * @param type option type
    * @throws org.apache.drill.common.exceptions.UserException message to describe error with value
    */
-  void deleteOption(String name, OptionType type);
+  void deleteLocalOption(String name);
 
   /**
-   * Deletes all options. Unfortunately, the type is required given the fallback structure of option managers.
-   * See {@link FallbackOptionManager}.
+   * Deletes all options.
    *
    * If no options are set, calling this method should be no-op.
    *
-   * @param type option type
    * @throws org.apache.drill.common.exceptions.UserException message to describe error with value
    */
-  void deleteAllOptions(OptionType type);
+  void deleteAllLocalOptions();
+
+  /**
+   * Get the option definition corresponding to the given option name.
+   * @param name The name of the option to retrieve a validator for.
+   * @return The option validator corresponding to the given option name.
+   */
+  @NotNull
+  OptionDefinition getOptionDefinition(String name);
 
   /**
    * Gets the list of options managed this manager.
    *
    * @return the list of options
    */
   OptionList getOptionList();
+
+  /**
+   * Returns all the internal options contained in this option manager.
+   *
+   * @return All the internal options contained in this option manager.
+   */
+  @NotNull
+  OptionList getInternalOptionList();
--- End diff --

`Internal` --> `Local` to be consistent with the other methods?


---


[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/923#discussion_r137937210
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/DrillConfigIterator.java ---
@@ -58,17 +58,17 @@ public OptionValue next() {
       OptionValue optionValue = null;
       switch(cv.valueType()) {
       case BOOLEAN:
-        optionValue = OptionValue.createBoolean(OptionType.BOOT, name, (Boolean) cv.unwrapped(), OptionScope.BOOT);
+        optionValue = OptionValue.create(OptionType.BOOT, name, (Boolean) cv.unwrapped(), OptionScope.BOOT);
--- End diff --

I suppose this has worked, but it is a bit of a muddle. Config settings are not runtime options; it is a hack to map them into the same structures. Since this iterator is probably only used to create a table, at some point we should create a new object to handle config options. We might want: 1) the full name, 2) the source (system options, command line, drill-override.conf, etc.), and 3) the Typesafe type (which is a superset of the runtime option types).

OK to leave this for now, but we should consider fixing this in the future.


---


[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/923#discussion_r137937938
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/testing/TestExceptionInjection.java ---
@@ -216,79 +216,77 @@ public void injectionOnSpecificBit() {
     final ZookeeperHelper zkHelper = new ZookeeperHelper();
     zkHelper.startZookeeper(1);
 
-    // Creating two drillbits
-    final Drillbit drillbit1, drillbit2;
-    final DrillConfig drillConfig = zkHelper.getConfig();
     try {
-      drillbit1 = Drillbit.start(drillConfig, remoteServiceSet);
-      drillbit2 = Drillbit.start(drillConfig, remoteServiceSet);
-    } catch (DrillbitStartupException e) {
-      throw new RuntimeException("Failed to start drillbits.", e);
-    }
+      // Creating two drillbits
+      final Drillbit drillbit1, drillbit2;
+      final DrillConfig drillConfig = zkHelper.getConfig();
+      try {
+        drillbit1 = Drillbit.start(drillConfig, remoteServiceSet);
+        drillbit2 = Drillbit.start(drillConfig, remoteServiceSet);
+      } catch (DrillbitStartupException e) {
+        throw new RuntimeException("Failed to start drillbits.", e);
+      }
 
-    final DrillbitContext drillbitContext1 = drillbit1.getContext();
-    final DrillbitContext drillbitContext2 = drillbit2.getContext();
+      final DrillbitContext drillbitContext1 = drillbit1.getContext();
+      final DrillbitContext drillbitContext2 = drillbit2.getContext();
 
-    final UserSession session = UserSession.Builder.newBuilder()
-        .withCredentials(UserBitShared.UserCredentials.newBuilder().setUserName("foo").build())
-        .withUserProperties(UserProperties.getDefaultInstance())
-        .withOptionManager(drillbitContext1.getOptionManager())
-        .build();
+      final UserSession session = UserSession.Builder.newBuilder().withCredentials(UserBitShared.UserCredentials.newBuilder().setUserName("foo").build()).withUserProperties(UserProperties.getDefaultInstance()).withOptionManager(drillbitContext1.getOptionManager()).build();
--- End diff --

Did we want to combine all the method calls onto one line rather than the 
fluent, multi-line style originally in the code? An artifact of a code 
formatter?


---


[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/923#discussion_r137937125
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/BaseOptionManager.java ---
@@ -17,44 +17,84 @@
  */
 package org.apache.drill.exec.server.options;
 
-import org.apache.drill.exec.server.options.TypeValidators.BooleanValidator;
-import org.apache.drill.exec.server.options.TypeValidators.DoubleValidator;
-import org.apache.drill.exec.server.options.TypeValidators.LongValidator;
-import org.apache.drill.exec.server.options.TypeValidators.StringValidator;
-
-public abstract class BaseOptionManager implements OptionSet {
-//  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(BaseOptionManager.class);
-
-  /**
-   * Gets the current option value given a validator.
-   *
-   * @param validator the validator
-   * @return option value
-   * @throws IllegalArgumentException - if the validator is not found
-   */
-  private OptionValue getOptionSafe(OptionValidator validator)  {
-    OptionValue value = getOption(validator.getOptionName());
-    return value == null ? validator.getDefault() : value;
+import org.apache.drill.common.exceptions.UserException;
+
+import java.util.Iterator;
+
+/**
+ * This {@link OptionManager} implements some the basic methods and should be extended by concrete implementations.
+ */
+public abstract class BaseOptionManager extends BaseOptionSet implements OptionManager {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(BaseOptionManager.class);
+
+  @Override
+  public OptionList getInternalOptionList() {
+    return getAllOptionList(true);
   }
 
   @Override
-  public boolean getOption(BooleanValidator validator) {
-    return getOptionSafe(validator).bool_val;
+  public OptionList getPublicOptionList() {
+    return getAllOptionList(false);
   }
 
   @Override
-  public double getOption(DoubleValidator validator) {
-    return getOptionSafe(validator).float_val;
+  public void setLocalOption(String name, boolean value) {
+    setLocalOption(OptionValue.Kind.BOOLEAN, name, Boolean.toString(value));
   }
 
   @Override
-  public long getOption(LongValidator validator) {
-    return getOptionSafe(validator).num_val;
+  public void setLocalOption(String name, long value) {
+    setLocalOption(OptionValue.Kind.LONG, name, Long.toString(value));
   }
 
   @Override
-  public String getOption(StringValidator validator) {
-    return getOptionSafe(validator).string_val;
+  public void setLocalOption(String name, double value) {
+    setLocalOption(OptionValue.Kind.DOUBLE, name, Double.toString(value));
   }
 
+  @Override
+  public void setLocalOption(String name, String value) {
+    setLocalOption(OptionValue.Kind.STRING, name, value);
+  }
+
+  @Override
+  public void setLocalOption(OptionValue.Kind kind, String name, String value) {
+    final OptionDefinition definition = getOptionDefinition(name);
+    final OptionValidator validator = definition.getValidator();
+    final OptionMetaData metaData = definition.getMetaData();
+    final OptionValue.OptionType type = definition.getMetaData().getType();
+    final OptionValue.OptionScope scope = getScope();
+    checkOptionPermissions(name, type, scope);
+    final OptionValue optionValue = OptionValue.create(kind, type, name, value, scope);
+    validator.validate(optionValue, metaData, this);
+    setLocalOptionHelper(optionValue);
--- End diff --

Very nice and clean; it does all the necessary steps in one place. Great 
improvement!
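
To make the consolidated path concrete, a hedged usage sketch (the option names are real Drill options; the wrapper class and method are assumptions for the example, not PR code):

    import org.apache.drill.exec.server.options.OptionManager;

    // Each typed overload funnels into the single
    // setLocalOption(Kind, name, value) path shown in the diff above.
    public class OptionSetExample {
      static void applySessionDefaults(OptionManager options) {
        options.setLocalOption("planner.slice_target", 1000L);  // long overload
        options.setLocalOption("exec.errors.verbose", true);    // boolean overload
        options.setLocalOption("store.format", "parquet");      // string overload
      }
    }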


---


[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/923#discussion_r137937347
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/options/OptionMetaData.java
 ---
@@ -0,0 +1,68 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.server.options;
+
+/**
+ * Contains information about the scopes in which an option can be set, 
and an option's visibility.
+ */
+public class OptionMetaData {
+  public static final OptionMetaData DEFAULT = new 
OptionMetaData(OptionValue.OptionType.ALL, false, false);
+
+  private OptionValue.OptionType type;
+  private boolean adminOption;
--- End diff --

`adminOnly` (we know it is an option...)


---


[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/923#discussion_r137937945
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/testing/TestPauseInjection.java
 ---
@@ -150,66 +150,61 @@ public void pauseOnSpecificBit() {
 final ZookeeperHelper zkHelper = new ZookeeperHelper();
 zkHelper.startZookeeper(1);
 
-// Creating two drillbits
-final Drillbit drillbit1, drillbit2;
-final DrillConfig drillConfig = zkHelper.getConfig();
 try {
-  drillbit1 = Drillbit.start(drillConfig, remoteServiceSet);
-  drillbit2 = Drillbit.start(drillConfig, remoteServiceSet);
-} catch (final DrillbitStartupException e) {
-  throw new RuntimeException("Failed to start two drillbits.", e);
-}
-
-final DrillbitContext drillbitContext1 = drillbit1.getContext();
-final DrillbitContext drillbitContext2 = drillbit2.getContext();
-
-final UserSession session = UserSession.Builder.newBuilder()
-  .withCredentials(UserCredentials.newBuilder()
-.setUserName("foo")
-.build())
-  .withUserProperties(UserProperties.getDefaultInstance())
-  .withOptionManager(drillbitContext1.getOptionManager())
-  .build();
-
-final DrillbitEndpoint drillbitEndpoint1 = 
drillbitContext1.getEndpoint();
-final String controls = Controls.newBuilder()
-  .addPauseOnBit(DummyClass.class, DummyClass.PAUSES, 
drillbitEndpoint1)
-  .build();
-
-ControlsInjectionUtil.setControls(session, controls);
-
-{
-  final long expectedDuration = 1000L;
-  final ExtendedLatch trigger = new ExtendedLatch(1);
-  final Pointer ex = new Pointer<>();
-  final QueryContext queryContext = new QueryContext(session, 
drillbitContext1, QueryId.getDefaultInstance());
-  (new ResumingThread(queryContext, trigger, ex, 
expectedDuration)).start();
-
-  // test that the pause happens
-  final DummyClass dummyClass = new DummyClass(queryContext, trigger);
-  final long actualDuration = dummyClass.pauses();
-  assertTrue(String.format("Test should stop for at least %d 
milliseconds.", expectedDuration),
-expectedDuration <= actualDuration);
-  assertTrue("No exception should be thrown.", ex.value == null);
+  // Creating two drillbits
+  final Drillbit drillbit1, drillbit2;
+  final DrillConfig drillConfig = zkHelper.getConfig();
   try {
-queryContext.close();
-  } catch (final Exception e) {
-fail("Failed to close query context: " + e);
+drillbit1 = Drillbit.start(drillConfig, remoteServiceSet);
+drillbit2 = Drillbit.start(drillConfig, remoteServiceSet);
+  } catch (final DrillbitStartupException e) {
+throw new RuntimeException("Failed to start two drillbits.", e);
   }
-}
 
-{
-  final ExtendedLatch trigger = new ExtendedLatch(1);
-  final QueryContext queryContext = new QueryContext(session, 
drillbitContext2, QueryId.getDefaultInstance());
+  final DrillbitContext drillbitContext1 = drillbit1.getContext();
+  final DrillbitContext drillbitContext2 = drillbit2.getContext();
+
+  final UserSession session = 
UserSession.Builder.newBuilder().withCredentials(UserCredentials.newBuilder().setUserName("foo").build()).withUserProperties(UserProperties.getDefaultInstance()).withOptionManager(drillbitContext1.getOptionManager()).build();
--- End diff --

Same issue of one big line vs. multi-line.


---


[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/923#discussion_r137937112
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/options/BaseOptionManager.java
 ---
@@ -17,44 +17,84 @@
  */
 package org.apache.drill.exec.server.options;
 
-import 
org.apache.drill.exec.server.options.TypeValidators.BooleanValidator;
-import org.apache.drill.exec.server.options.TypeValidators.DoubleValidator;
-import org.apache.drill.exec.server.options.TypeValidators.LongValidator;
-import org.apache.drill.exec.server.options.TypeValidators.StringValidator;
-
-public abstract class BaseOptionManager implements OptionSet {
-//  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BaseOptionManager.class);
-
-  /**
-   * Gets the current option value given a validator.
-   *
-   * @param validator the validator
-   * @return option value
-   * @throws IllegalArgumentException - if the validator is not found
-   */
-  private OptionValue getOptionSafe(OptionValidator validator)  {
-OptionValue value = getOption(validator.getOptionName());
-return value == null ? validator.getDefault() : value;
+import org.apache.drill.common.exceptions.UserException;
+
+import java.util.Iterator;
+
+/**
+ * This {@link OptionManager} implements some of the basic methods and should 
be extended by concrete implementations.
+ */
+public abstract class BaseOptionManager extends BaseOptionSet implements 
OptionManager {
+  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BaseOptionManager.class);
+
+  @Override
+  public OptionList getInternalOptionList() {
+return getAllOptionList(true);
   }
 
   @Override
-  public boolean getOption(BooleanValidator validator) {
-return getOptionSafe(validator).bool_val;
+  public OptionList getPublicOptionList() {
+return getAllOptionList(false);
   }
 
   @Override
-  public double getOption(DoubleValidator validator) {
-return getOptionSafe(validator).float_val;
+  public void setLocalOption(String name, boolean value) {
+setLocalOption(OptionValue.Kind.BOOLEAN, name, 
Boolean.toString(value));
   }
 
   @Override
-  public long getOption(LongValidator validator) {
-return getOptionSafe(validator).num_val;
+  public void setLocalOption(String name, long value) {
+setLocalOption(OptionValue.Kind.LONG, name, Long.toString(value));
   }
 
   @Override
-  public String getOption(StringValidator validator) {
-return getOptionSafe(validator).string_val;
+  public void setLocalOption(String name, double value) {
+setLocalOption(OptionValue.Kind.DOUBLE, name, Double.toString(value));
   }
 
+  @Override
+  public void setLocalOption(String name, String value) {
+setLocalOption(OptionValue.Kind.STRING, name, value);
+  }
+
+  @Override
+  public void setLocalOption(OptionValue.Kind kind, String name, String 
value) {
+final OptionDefinition definition = getOptionDefinition(name);
--- End diff --

Presumably the `getOptionDefinition()` method checks that the option is 
defined and throws an exception if not?
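
Purely as an illustration of the fail-fast behavior the question assumes (the `definitions` map, logger, and wrapper class are hypothetical, not the PR's actual code):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.drill.common.exceptions.UserException;
    import org.apache.drill.exec.server.options.OptionDefinition;

    class OptionLookupSketch {
      private static final org.slf4j.Logger logger =
          org.slf4j.LoggerFactory.getLogger(OptionLookupSketch.class);
      private final Map<String, OptionDefinition> definitions = new HashMap<>();

      // Throws instead of returning null so bad option names fail early.
      public OptionDefinition getOptionDefinition(String name) {
        final OptionDefinition definition = definitions.get(name);
        if (definition == null) {
          throw UserException.validationError()
              .message("The option '%s' does not exist.", name)
              .build(logger);
        }
        return definition;
      }
    }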


---


[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/923#discussion_r137937392
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/options/OptionValue.java
 ---
@@ -63,32 +88,32 @@
   public final Double float_val;
   public final OptionScope scope;
 
-  public static OptionValue createLong(OptionType type, String name, long 
val, OptionScope scope) {
+  public static OptionValue create(OptionType type, String name, long val, 
OptionScope scope) {
 return new OptionValue(Kind.LONG, type, name, val, null, null, null, 
scope);
   }
 
-  public static OptionValue createBoolean(OptionType type, String name, 
boolean bool, OptionScope scope) {
+  public static OptionValue create(OptionType type, String name, boolean 
bool, OptionScope scope) {
 return new OptionValue(Kind.BOOLEAN, type, name, null, null, bool, 
null, scope);
   }
 
-  public static OptionValue createString(OptionType type, String name, 
String val, OptionScope scope) {
+  public static OptionValue create(OptionType type, String name, String 
val, OptionScope scope) {
 return new OptionValue(Kind.STRING, type, name, null, val, null, null, 
scope);
   }
 
-  public static OptionValue createDouble(OptionType type, String name, 
double val, OptionScope scope) {
+  public static OptionValue create(OptionType type, String name, double 
val, OptionScope scope) {
 return new OptionValue(Kind.DOUBLE, type, name, null, null, null, val, 
scope);
   }
 
-  public static OptionValue createOption(Kind kind, OptionType type, 
String name, String val, OptionScope scope) {
+  public static OptionValue create(Kind kind, OptionType type, String 
name, String val, OptionScope scope) {
--- End diff --

Per earlier comment, `String val` --> `Object val`, with casts rather than 
string conversions below?
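
A sketch of the suggested alternative, with the constructor argument order taken from the factory methods above (this is the reviewer's proposal, not the PR's code):

    public static OptionValue create(Kind kind, OptionType type, String name,
                                     Object val, OptionScope scope) {
      switch (kind) {
      case BOOLEAN:
        return new OptionValue(kind, type, name, null, null, (Boolean) val, null, scope);
      case LONG:
        return new OptionValue(kind, type, name, (Long) val, null, null, null, scope);
      case STRING:
        return new OptionValue(kind, type, name, null, (String) val, null, null, scope);
      case DOUBLE:
        return new OptionValue(kind, type, name, null, null, null, (Double) val, scope);
      default:
        throw new IllegalArgumentException("Unsupported kind: " + kind);
      }
    }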


---


[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/923#discussion_r137937901
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/server/TestOptions.java ---
@@ -56,7 +56,7 @@ public void checkChangedColumn() throws Exception {
 test("ALTER session SET `%s` = %d;", SLICE_TARGET,
   ExecConstants.SLICE_TARGET_DEFAULT);
 testBuilder()
-.sqlQuery("SELECT status FROM sys.options WHERE name = '%s' AND 
type = 'SESSION'", SLICE_TARGET)
+.sqlQuery("SELECT status FROM sys.options WHERE name = '%s' AND 
optionScope = 'SESSION'", SLICE_TARGET)
--- End diff --

Hmmm... I wonder about this. Aside from our own tests, might users have 
existing code or scripts that know about the existing schema? It is for this 
reason that Jyothsna added a new table for the enhanced option data, going out 
of her way to preserve the public schema of the existing tables. I wonder, 
should we do that here? Or, are we fortunate and no one depends on the current 
system table schema?


---


[GitHub] drill issue #919: DRILL-5721: Query with only root fragment and no non-root ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/919
  
@sohami, can you clean up the stray commits and ask @parthchandra to review 
the later commits again?


---


[GitHub] drill issue #916: DRILL-5377: Five-digit year dates are displayed incorrectl...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/916
  
Back to my original question. The premise of this bug seems to be that we 
corrupt Parquet dates and convert perfectly valid 4-digit years into invalid 
5-digit years. That is clearly a data corruption bug that should never occur. 
Why don't we fix that?

Given that we've accepted the data corruption, we need to display 
five-digit years, which the Java date and time classes don't support in 
`toString()`. The code uses `toString()` because it does not do proper 
formatting with the classes provided; date display should honor the format 
preferences supplied by the user, not the defaults baked into `toString()`. 
That's the second bug.

Now, given the above two bugs, we introduce a third by creating ad-hoc, 
Drill-specific date/time classes, violating the JDBC standard, to display the 
corrupt five-digit years. So, no longer will Drill return the java.sql.Date 
class as specified by the standard, but rather our own subclass. How will this 
affect client code that relies on standard behavior?

I feel we are compounding error upon error. Can we go back and fix the 
original problem: that users might prefer that we don't corrupt dates in their 
data? That is, the problem is not so much that we don't format corrupt data 
correctly, but rather that we do, in fact, corrupt data.


---


[GitHub] drill issue #905: DRILL-1162: Fix OOM for hash join operator when the right ...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/905
  
@jinfengni, have you been able to take a look at this PR?


---


[GitHub] drill issue #903: DRILL-5712: Update the pom files with dependency exclusion...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/903
  
@sindhurirayavaram, please rebase this onto the latest master; looks like 
some stray commits have found their way into this PR.


---


[GitHub] drill issue #889: DRILL-5691: enhance scalar sub queries checking for the ca...

2017-09-09 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/889
  
@arina-ielchiieva, looks like Weijie added a couple of commits since your 
+1. Can you take a look at them?


---


Re: Checkstyle Unused Imports

2017-09-09 Thread Arina Yelchiyeva
I might also want to add checkstyle rules for the Apache header (which should
be a plain comment, not Javadoc), an agreed code style (indentation, etc.),
and enforced Javadoc for methods. At least the last two are enforced in
Calcite.
I used to point out all of that during code reviews; if it were enforced
automatically, life would be much easier ...

Kind regards
Arina

On Sat, Sep 9, 2017 at 6:46 AM, Paul Rogers  wrote:

> Hi Vlad,
>
> Java has a wide variety of warnings available; each project decides which
> to ignore, which are warnings and which are errors. It may be that Eclipse,
> by default, has resource warnings turned on. The quick & dirty solution is
> simply to turn off warnings for AutoCloseables and missing @Overrides. This
> is, as they say, “crude but effective."
>
> It seems that the Drill community's stance on imports is not to change them.
> Eclipse has an “organize imports” feature. I have to be careful when
> removing unused imports not to invoke this feature, as it changes import
> order and often causes reviewers to complain about unnecessary code changes.
>
> Would be good if we could 1) agree on a standard and 2) make sure that
> both Eclipse and IntelliJ can automatically organize imports to follow the
> standard. But, I personally don’t worry about imports because Eclipse takes
> care of it for me.
>
> For me, the bigger concern is code style. Operators are implemented
> as huge, complex, deeply nested methods with many local variables (such as
> flags) set in one place and used elsewhere, all with no comments. It would
> seem like a good idea to adopt best practices and require human-digestible
> method sizes with good Javadoc comments. To my mind, that will contribute
> more to the project than import order.
>
> Oh, and the other item that needs addressing is a requirement to create
> true unit tests (not just system tests coded with JUnit). Good unit tests
> will increase our code quality immensely and will simplify the task of
> code review. So, I’d want to push that ahead before worrying about imports.
>
> Just my two cents…
>
> Thanks,
>
> - Paul
>
> > On Sep 8, 2017, at 6:58 PM, Vlad Rozov  wrote:
> >
> > Paul, is AutoCloseable warning specific to Eclipse? I don't remember
> seeing the same warning in IntelliJ or during compilation.
> >
> > I know that some communities are significantly stricter regarding
> code style and enforce not only the removal of unused imports, but also the
> order of imports and the placement of static imports. What is the Drill
> community's stance on those items?
> >
> > Thank you,
> >
> > Vlad
> >
> > On 9/8/17 18:04, Paul Rogers wrote:
> >> I clean up the imports as I find them, but it would be nice to do them
> all at once to avoid the constant drip-drip-drop of warnings.
> >>
> >> The key problem is the generated code: the templates can’t really tell
> which imports are used where. So, we’d need to exclude generated code
> directories from the checkstyle rules.
> >>
> >> Drill also has thousands of omitted “@Override” annotations and heavy
> abuse of AutoCloseable (which triggers warnings when used outside of
> try-with-resources).
> >>
> >> At present, Eclipse complains about 17,883 warnings in Drill code.
> >>
> >> - Paul
> >>
> >>> On Sep 8, 2017, at 4:43 PM, Timothy Farkas  wrote:
> >>>
> >>> Hi All,
> >>>
> >>> I've noticed that a lot of files have unused imports, and I frequently
> accidentally leave unused imports behind when I do refactoring. So I'd like
> to enable checkstyle to check for unused imports.
> >>>
> >>> Thanks,
> >>> Tim
> >
>
>