[GitHub] drill issue #934: DRILL-3449 When Foreman node dies, the FragmentExecutor st...
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/934 Thanks for the explanation. +1 ---
[GitHub] drill pull request #934: DRILL-3449 When Foreman node dies, the FragmentExec...
Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/934#discussion_r137941417 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentStatusReporter.java --- @@ -113,4 +120,14 @@ void fail(final UserException ex) { sendStatus(status); } + @Override + public void close() + { +final ControlTunnel tunnel = this.tunnel.getAndSet(null); +if (tunnel != null) { + logger.debug("Closing {}", this); --- End diff -- No, close() is not a placeholder. It closes FragmentStatusReporter and after the close, request to send status becomes no-op. ---
[GitHub] drill issue #926: DRILL-5269 Make DirectSubScan Jackson JSON deserializable
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/926 On the first comment, it sounds like the close is a placeholder, actual close is to be done after some refactoring? This is fine. Just a bit surprising to see a log message, but not actual close. On the second, yes, Drill uses "unit test" to mean a system test run via JUnit. Can we create a JUnit-based system test that demonstrates the problem and fix? I'm guessing that we can do the test at the system level to avoid the difficulty, at present, of writing true unit tests against Drill code. ---
[GitHub] drill issue #926: DRILL-5269 Make DirectSubScan Jackson JSON deserializable
Github user vrozov commented on the issue: https://github.com/apache/drill/pull/926 @paul-rogers I guess you refer to a system/integration test (execute the query provided in JIRA), not a unit test. ---
[GitHub] drill pull request #934: DRILL-3449 When Foreman node dies, the FragmentExec...
Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/934#discussion_r137941124 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentStatusReporter.java --- @@ -113,4 +120,14 @@ void fail(final UserException ex) { sendStatus(status); } + @Override + public void close() + { +final ControlTunnel tunnel = this.tunnel.getAndSet(null); +if (tunnel != null) { + logger.debug("Closing {}", this); --- End diff -- We are closing FragmentStatusReporter, not the `tunnel` that it references. The ControlTunnel is not Closable even though it has a reference to a resource that is Closable and should provide a way to release the resource it holds. Please let me know if a comment is required here, but I do plan to make ControlTunnel Closable. As it requires code refactoring not directly related to the JIRA/PR, I plan to do this in a separate PR. ---
[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/938#discussion_r137939203 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java --- @@ -109,14 +107,21 @@ private boolean isTwoPhase = false; // 1 phase or 2 phase aggr? private boolean is2ndPhase = false; - private boolean canSpill = true; // make it false in case can not spill + private boolean is1stPhase = false; + private boolean canSpill = true; // make it false in case can not spill/return-early private ChainedHashTable baseHashTable; private boolean earlyOutput = false; // when 1st phase returns a partition due to no memory private int earlyPartition = 0; // which partition to return early - - private long memoryLimit; // max memory to be used by this oerator - private long estMaxBatchSize = 0; // used for adjusting #partitions - private long estRowWidth = 0; + private boolean retrySameIndex = false; // in case put failed during 1st phase - need to output early, then retry + private boolean useMemoryPrediction = false; // whether to use memory prediction to decide when to spill + private long estMaxBatchSize = 0; // used for adjusting #partitions and deciding when to spill + private long estRowWidth = 0; // the size of the internal "row" (keys + values + extra columns) + private long estValuesRowWidth = 0; // the size of the internal values ( values + extra ) + private long estOutputRowWidth = 0; // the size of the output "row" (no extra columns) + private long estValuesBatchSize = 0; // used for "reserving" memory for the Values batch to overcome an OOM + private long estOutgoingAllocSize = 0; // used for "reserving" memory for the Outgoing Output Values to overcome an OOM + private long reserveValueBatchMemory; // keep "reserve memory" for Values Batch + private long reserveOutgoingMemory; // keep "reserve memory" for the Outgoing (Values only) output --- End diff -- Long lists of member variables are generally frowned upon. Can't unit test them. Too many states to keep in mind. Can these be grouped into a read-only config class (set up front, then never changed) vs, running estimates? ---
[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/938#discussion_r137939168 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java --- @@ -293,7 +299,7 @@ private HashAggregator createAggregatorInternal() throws SchemaChangeException, aggrExprs, cgInner.getWorkspaceTypes(), groupByOutFieldIds, -this.container); +this.container, extraNonNullColumns * 8 /* sizeof(BigInt) */); --- End diff -- If the `BigInt` column is used to indicate nulls, then each value is of size 9. And, since, on average, each vector has 25% internal fragmentation. To account for this, perhaps assume that the average size is 12 or 13 bytes. ---
[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/938#discussion_r137939287 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java --- @@ -500,22 +516,45 @@ private void initializeSetup(RecordBatch newIncoming) throws SchemaChangeExcepti */ private void updateEstMaxBatchSize(RecordBatch incoming) { if ( estMaxBatchSize > 0 ) { return; } // no handling of a schema (or varchar) change +// Use the sizer to get the input row width and the length of the longest varchar column RecordBatchSizer sizer = new RecordBatchSizer(incoming); logger.trace("Incoming sizer: {}",sizer); // An empty batch only has the schema, can not tell actual length of varchars // else use the actual varchars length, each capped at 50 (to match the space allocation) -estRowWidth = sizer.rowCount() == 0 ? sizer.stdRowWidth() : sizer.netRowWidthCap50(); -estMaxBatchSize = estRowWidth * MAX_BATCH_SIZE; +long estInputRowWidth = sizer.rowCount() == 0 ? sizer.stdRowWidth() : sizer.netRowWidthCap50(); // Get approx max (varchar) column width to get better memory allocation -maxColumnWidth = Math.max(sizer.maxSize(), VARIABLE_MIN_WIDTH_VALUE_SIZE); +maxColumnWidth = Math.max(sizer.maxAvgColumnSize(), VARIABLE_MIN_WIDTH_VALUE_SIZE); maxColumnWidth = Math.min(maxColumnWidth, VARIABLE_MAX_WIDTH_VALUE_SIZE); -logger.trace("{} phase. Estimated row width: {} batch size: {} memory limit: {} max column width: {}", - isTwoPhase?(is2ndPhase?"2nd":"1st"):"Single",estRowWidth,estMaxBatchSize,memoryLimit,maxColumnWidth); +// +// Calculate the estimated max (internal) batch (i.e. Keys batch + Values batch) size +// (which is used to decide when to spill) +// Also calculate the values batch size (used as a reserve to overcome an OOM) +// +IteratoroutgoingIter = outContainer.iterator(); +int fieldId = 0; +while (outgoingIter.hasNext()) { + ValueVector vv = outgoingIter.next().getValueVector(); + MaterializedField mr = vv.getField(); + int fieldSize = vv instanceof VariableWidthVector ? maxColumnWidth : + TypeHelper.getSize(mr.getType()); + estRowWidth += fieldSize; + estOutputRowWidth += fieldSize; + if ( fieldId < numGroupByOutFields ) { fieldId++; } + else { estValuesRowWidth += fieldSize; } +} +// multiply by the max number of rows in a batch to get the final estimated max size +estMaxBatchSize = Math.max(estRowWidth, estInputRowWidth) * MAX_BATCH_SIZE; --- End diff -- Here, the output batch size is fixed based on the number of rows. Suppose we had a sort as the output of this operator, and the sort has a memory ceiling of x MB. Can the code here create batches larger than x/2 MB, meaning that that sort is forced to consume batches so large that it can't buffer two and spill? In other words, is there an attempt here to control overall output batch memory use instead of just assuming that we always output `MAX_BATCH_SIZE` rows regardless of memory use? ---
[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/938#discussion_r137939296 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java --- @@ -545,16 +584,19 @@ public AggOutcome doWork() { if (EXTRA_DEBUG_1) { logger.debug("Starting outer loop of doWork()..."); } - for (; underlyingIndex < currentBatchRecordCount; incIndex()) { + while (underlyingIndex < currentBatchRecordCount) { if (EXTRA_DEBUG_2) { logger.debug("Doing loop with values underlying {}, current {}", underlyingIndex, currentIndex); } checkGroupAndAggrValues(currentIndex); + +if ( retrySameIndex ) { retrySameIndex = false; } // need to retry this row (e.g. we had an OOM) --- End diff -- I think Drill's coding style guidelines says no spaces after ( or before ). ---
[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/938#discussion_r137939122 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java --- @@ -92,18 +92,20 @@ // Hash Aggregate Options - String HASHAGG_NUM_PARTITIONS = "drill.exec.hashagg.num_partitions"; String HASHAGG_NUM_PARTITIONS_KEY = "exec.hashagg.num_partitions"; LongValidator HASHAGG_NUM_PARTITIONS_VALIDATOR = new RangeLongValidator(HASHAGG_NUM_PARTITIONS_KEY, 1, 128); // 1 means - no spilling - String HASHAGG_MAX_MEMORY = "drill.exec.hashagg.mem_limit"; String HASHAGG_MAX_MEMORY_KEY = "exec.hashagg.mem_limit"; LongValidator HASHAGG_MAX_MEMORY_VALIDATOR = new RangeLongValidator(HASHAGG_MAX_MEMORY_KEY, 0, Integer.MAX_VALUE); // min batches is used for tuning (each partition needs so many batches when planning the number of partitions, // or reserve this number when calculating whether the remaining available memory is too small and requires a spill.) // Low value may OOM (e.g., when incoming rows become wider), higher values use fewer partitions but are safer - String HASHAGG_MIN_BATCHES_PER_PARTITION = "drill.exec.hashagg.min_batches_per_partition"; - String HASHAGG_MIN_BATCHES_PER_PARTITION_KEY = "drill.exec.hashagg.min_batches_per_partition"; - LongValidator HASHAGG_MIN_BATCHES_PER_PARTITION_VALIDATOR = new RangeLongValidator(HASHAGG_MIN_BATCHES_PER_PARTITION_KEY, 2, 5); + String HASHAGG_MIN_BATCHES_PER_PARTITION_KEY = "exec.hashagg.min_batches_per_partition"; + LongValidator HASHAGG_MIN_BATCHES_PER_PARTITION_VALIDATOR = new RangeLongValidator(HASHAGG_MIN_BATCHES_PER_PARTITION_KEY, 1, 5); + // Can be turns off mainly for testing. Memory prediction is used to decide on when to spill to disk; with this option off, --- End diff -- turns --> turned ---
[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/938#discussion_r137939184 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java --- @@ -109,14 +107,21 @@ private boolean isTwoPhase = false; // 1 phase or 2 phase aggr? private boolean is2ndPhase = false; - private boolean canSpill = true; // make it false in case can not spill + private boolean is1stPhase = false; + private boolean canSpill = true; // make it false in case can not spill/return-early private ChainedHashTable baseHashTable; private boolean earlyOutput = false; // when 1st phase returns a partition due to no memory private int earlyPartition = 0; // which partition to return early - - private long memoryLimit; // max memory to be used by this oerator - private long estMaxBatchSize = 0; // used for adjusting #partitions - private long estRowWidth = 0; + private boolean retrySameIndex = false; // in case put failed during 1st phase - need to output early, then retry --- End diff -- As it turns out, unlike C++, Java is pretty good at initializing booleans to false and longs to 0. We only need to explicitly initialize values when the value should be other than 0/false/null. ---
[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/938#discussion_r137939496 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTable.java --- @@ -58,7 +59,7 @@ public int getHashCode(int incomingRowIdx) throws SchemaChangeException; - public PutStatus put(int incomingRowIdx, IndexPointer htIdxHolder, int hashCode) throws SchemaChangeException; + public PutStatus put(int incomingRowIdx, IndexPointer htIdxHolder, int hashCode) throws SchemaChangeException, RetryAfterSpillException; --- End diff -- At present, `RetryAfterSpillException` is unchecked, so it is not necessary to declare. But, change `RetryAfterSpillException` to extend `Exception` (checked) and this declaration then becomes useful. ---
[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/938#discussion_r137939361 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java --- @@ -1335,7 +1470,7 @@ private void updateStats(HashTable[] htables) { } if ( rowsReturnedEarly > 0 ) { stats.setLongStat(Metric.SPILL_MB, // update stats - est. total MB returned early - (int) Math.round( rowsReturnedEarly * estRowWidth / 1024.0D / 1024.0)); + (int) Math.round( rowsReturnedEarly * estOutputRowWidth / 1024.0D / 1024.0)); --- End diff -- This file is a template. This means, we copy *all* this code each time we generate a new class. How is doing so helping stability, customer value or performance? Should all this code be in a template that is copied on every query? Or, should it be refactored into a driver class, with only a very light wrapper appearing in the copied template? As this code get ever more complex, it puts a strain on the Java code that must walk though this code and do method fixup, scalar replacements, etc. That work takes time. What value accrues to the user from doing this fixup on code that never changes from one query to the next? Filed [DRILL-5779](https://issues.apache.org/jira/browse/DRILL-5779) for this issue. ---
[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/938#discussion_r137939459 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java --- @@ -1178,20 +1273,38 @@ private void checkGroupAndAggrValues(int incomingRowIdx) { hashCode >>>= bitsInMask; HashTable.PutStatus putStatus = null; long allocatedBeforeHTput = allocator.getAllocatedMemory(); - // == // Insert the key columns into the hash table // == +boolean noReserveMem = reserveValueBatchMemory == 0; try { + if ( noReserveMem && canSpill ) { throw new RetryAfterSpillException();} // proactive spill, skip put() + putStatus = htables[currentPartition].put(incomingRowIdx, htIdxHolder, hashCode); + +} catch (RetryAfterSpillException re) { + if ( ! canSpill ) { throw new OutOfMemoryException(getOOMErrorMsg("Can not spill")); } --- End diff -- This is the message sent to the log and user. Should we explain why we can't spill? And, what to do? Something like: "Incoming batch too large and no in-memory partitions to spill. Increase memory assigned to the Hash Agg." Replace the above wording with the actual reasons and fixes. ---
[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/938#discussion_r137939319 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java --- @@ -646,6 +687,46 @@ public AggOutcome doWork() { } /** + * Use reserved values memory (if available) to try and preemp an OOM + */ + private void useReservedValuesMemory() { +// try to preempt an OOM by using the reserved memory +long reservedMemory = reserveValueBatchMemory; +if ( reservedMemory > 0 ) { allocator.setLimit(allocator.getLimit() + reservedMemory); } + +reserveValueBatchMemory = 0; + } + /** + * Use reserved outgoing output memory (if available) to try and preemp an OOM + */ + private void useReservedOutgoingMemory() { +// try to preempt an OOM by using the reserved memory +long reservedMemory = reserveOutgoingMemory; +if ( reservedMemory > 0 ) { allocator.setLimit(allocator.getLimit() + reservedMemory); } --- End diff -- Why is it necessary to change the allocator limit? The allocator limit should be fixed: it is the amount of memory given to this operator. Shouldn't the code use its own, internal, limits to make decisions? That is, if allocated memory + some expected use > a defined internal size, then spill? ---
[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/938#discussion_r137939481 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggregator.java --- @@ -47,10 +47,7 @@ // OK - batch returned, NONE - end of data, RESTART - call again public enum AggIterOutcome { AGG_OK, AGG_NONE, AGG_RESTART } - public abstract void setup(HashAggregate hashAggrConfig, HashTableConfig htConfig, FragmentContext context, - OperatorStats stats, OperatorContext oContext, RecordBatch incoming, HashAggBatch outgoing, - LogicalExpression[] valueExprs, List valueFieldIds, TypedFieldId[] keyFieldIds, - VectorContainer outContainer) throws SchemaChangeException, IOException, ClassTransformationException; + public abstract void setup(HashAggregate hashAggrConfig, HashTableConfig htConfig, FragmentContext context, OperatorStats stats, OperatorContext oContext, RecordBatch incoming, HashAggBatch outgoing, LogicalExpression[] valueExprs, List valueFieldIds, TypedFieldId[] keyFieldIds, VectorContainer outContainer, int extraRowBytes) throws SchemaChangeException, IOException, ClassTransformationException; --- End diff -- Not sure that putting all items on one big line is an improvement over the arg-per-line format previously. Also, see note above: a large number of arguments suggest a muddy design with one class trying to do far too much. ---
[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/938#discussion_r137939223 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java --- @@ -297,10 +302,7 @@ public void outputRecordValues(@Named("htRowIdx") int htRowIdx, @Named("outRowId } @Override - public void setup(HashAggregate hashAggrConfig, HashTableConfig htConfig, FragmentContext context, -OperatorStats stats, OperatorContext oContext, RecordBatch incoming, HashAggBatch outgoing, -LogicalExpression[] valueExprs, List valueFieldIds, TypedFieldId[] groupByOutFieldIds, -VectorContainer outContainer) throws SchemaChangeException, IOException { + public void setup(HashAggregate hashAggrConfig, HashTableConfig htConfig, FragmentContext context, OperatorStats stats, OperatorContext oContext, RecordBatch incoming, HashAggBatch outgoing, LogicalExpression[] valueExprs, List valueFieldIds, TypedFieldId[] groupByOutFieldIds, VectorContainer outContainer, int extraRowBytes) throws SchemaChangeException, IOException { --- End diff -- Methods and constructors with many arguments are generally frowned upon as it suggests that a single class is trying to do too much: it is has too much internal coupling, performing tasks that should be broken apart. Can this single, huge, class be split into smaller, more focused, abstractions? ---
[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/938#discussion_r137939116 --- Diff: common/src/main/java/org/apache/drill/common/exceptions/RetryAfterSpillException.java --- @@ -0,0 +1,32 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.common.exceptions; + +import org.apache.drill.exec.proto.UserBitShared; + +/** + * A special exception to be caught by caller, who is supposed to free memory by spilling and try again + * + */ +public class RetryAfterSpillException extends UserException { --- End diff -- If this exception is thrown and caught internally, it should not extend `UserException`. Instead it should extend the Java `RuntimeException`. Better, since you know you must catch this, this should be a checked exception, extended from `Exception` and declared by the method that throws it. `UserException` is purely for exceptions reported to the user. ---
[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/938#discussion_r137939427 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java --- @@ -1178,20 +1273,38 @@ private void checkGroupAndAggrValues(int incomingRowIdx) { hashCode >>>= bitsInMask; HashTable.PutStatus putStatus = null; long allocatedBeforeHTput = allocator.getAllocatedMemory(); - // == // Insert the key columns into the hash table // == +boolean noReserveMem = reserveValueBatchMemory == 0; try { + if ( noReserveMem && canSpill ) { throw new RetryAfterSpillException();} // proactive spill, skip put() + putStatus = htables[currentPartition].put(incomingRowIdx, htIdxHolder, hashCode); + +} catch (RetryAfterSpillException re) { --- End diff -- See above. Should be a checked exception declared by `put()`. Also, why do we need to throw an exception before calling `put` only to catch it a couple of lines later? If the spill code was in a method, rather than just inline, seems we could do: ``` if ( noReserveMem && canSpill ) { doSpill(); } try { ...put(...) } catch (...) { doSpill(); } ``` ---
[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/938#discussion_r137939532 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java --- @@ -158,19 +158,17 @@ public BatchHolder(int idx) { } finally { if (!success) { htContainer.clear(); - if (links != null) { -links.clear(); - } + if (links != null) { links.clear();} } } } private void init(IntVector links, IntVector hashValues, int size) { for (int i = 0; i < size; i++) { -links.getMutator().setSafe(i, EMPTY_SLOT); +links.getMutator().set(i, EMPTY_SLOT); --- End diff -- Is size ever less than the vector capacity()? Else, you can just ask the vector for its capacity. The `links.getMutator()` call in an inner loop is inefficient. Instead of a single function initializing two `IntVector`s with redundant code, can this be refactored to have a function that initializes one vector, that is called twice? ---
[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/938#discussion_r137939253 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java --- @@ -382,19 +390,25 @@ private void delayedSetup() { final boolean fallbackEnabled = context.getOptions().getOption(ExecConstants.HASHAGG_FALLBACK_ENABLED_KEY).bool_val; // Set the number of partitions from the configuration (raise to a power of two, if needed) -numPartitions = context.getConfig().getInt(ExecConstants.HASHAGG_NUM_PARTITIONS); -if ( numPartitions == 1 ) { +numPartitions = (int)context.getOptions().getOption(ExecConstants.HASHAGG_NUM_PARTITIONS_VALIDATOR); +if ( numPartitions == 1 && is2ndPhase ) { // 1st phase can still do early return with 1 partition canSpill = false; logger.warn("Spilling is disabled due to configuration setting of num_partitions to 1"); } numPartitions = BaseAllocator.nextPowerOfTwo(numPartitions); // in case not a power of 2 -if ( schema == null ) { estMaxBatchSize = 0; } // incoming was an empty batch +if ( schema == null ) { estValuesBatchSize = estOutgoingAllocSize = estMaxBatchSize = 0; } // incoming was an empty batch else { // Estimate the max batch size; should use actual data (e.g. lengths of varchars) updateEstMaxBatchSize(incoming); } -long memAvail = memoryLimit - allocator.getAllocatedMemory(); +// create "reserved memory" and adjust the memory limit down +reserveValueBatchMemory = reserveOutgoingMemory = estValuesBatchSize ; +long newMemoryLimit = allocator.getLimit() - reserveValueBatchMemory - reserveOutgoingMemory ; +long memAvail = newMemoryLimit - allocator.getAllocatedMemory(); +if ( memAvail <= 0 ) { throw new OutOfMemoryException("Too little memory available"); } +allocator.setLimit(newMemoryLimit); + --- End diff -- This code has grown to be incredibly complex with many, many paths through the various functions. Tests are handy things. Do we have system-level unit tests that exercise each path through the code? Otherwise, as a reviewer, how can I be sure that each execution path does, in fact, work? ---
[jira] [Created] (DRILL-5779) HashAgg template is far too large, cause performance hit
Paul Rogers created DRILL-5779: -- Summary: HashAgg template is far too large, cause performance hit Key: DRILL-5779 URL: https://issues.apache.org/jira/browse/DRILL-5779 Project: Apache Drill Issue Type: Bug Affects Versions: 1.11.0 Reporter: Paul Rogers Drill uses code generation to produce query-specific code to copy values, perform calculations, and so on. Drill does this by generating code based on templates. Drill, internally, copies the template byte codes and merges them with generated by byte codes. (Drill does not use Java subclassing for generated code.) The Hash Agg batch places thousands of lines of boilerplate code into the template. This forces Drill to: 1. Copy those byte codes *for every query*. 2. The "byte code fixup" logic to walk the byte code tree for the template *for every query.* 3. The code cache to cache a separate copy of the template *for every query*. There is a clear performance cost from doing the copying and tree walking. There is a memory cost to buffering multiple copies of the same code. It is not clear that we have any data that says that doing this work provides benefits to the Drill user in terms of better stability, greater performance or more features. We should consider moving the bulk of the code out of the template to avoid the overheads cited above. The result may be better performance and reduced memory pressure. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] drill pull request #937: DRILL-5002: Using hive's date functions on top of d...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/937#discussion_r137938539 --- Diff: contrib/storage-hive/core/src/main/codegen/templates/ObjectInspectorHelper.java --- @@ -204,7 +204,11 @@ public static JBlock getDrillObject(JCodeModel m, ObjectInspector oi, <#elseif entry.hiveType == "TIMESTAMP"> JVar tsVar = jc._else().decl(m.directClass(java.sql.Timestamp.class.getCanonicalName()), "ts", castedOI.invoke("getPrimitiveJavaObject").arg(returnValue)); -jc._else().assign(returnValueHolder.ref("value"), tsVar.invoke("getTime")); +// Bringing relative timestamp value without timezone info to timestamp value in UTC, since Drill keeps date-time values in UTC --- End diff -- Drill does not actually keep values in UTC: Drill keeps values in the server time zone. This is why Drill dates, when shipped to a client with a different time zone, get corrupted. ---
[GitHub] drill pull request #934: DRILL-3449 When Foreman node dies, the FragmentExec...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/934#discussion_r137938397 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentStatusReporter.java --- @@ -113,4 +120,14 @@ void fail(final UserException ex) { sendStatus(status); } + @Override + public void close() + { +final ControlTunnel tunnel = this.tunnel.getAndSet(null); +if (tunnel != null) { + logger.debug("Closing {}", this); --- End diff -- We say we are closing the tunnel, but then don't do anything. Perhaps a comment to explain how this works? ---
[GitHub] drill issue #932: DRILL-5758: Fix for repeated columns; enable managed sort ...
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/932 @Ben-Zvi, added another fix. Please take another look. ---
[GitHub] drill pull request #930: DRILL-5761: Disable Lilith ClassicMultiplexSocketAp...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/930#discussion_r137938315 --- Diff: common/src/test/resources/logback-test.xml --- @@ -0,0 +1,111 @@ + + + + + + + +true +1 +true +${LILITH_HOSTNAME:-localhost} + + + + + + + + %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n + + + + + + + + + + + + + + + + + + + + + + + + + + + + --- End diff -- See comment below about MapR. It is not clear that we always want info-level logging from Hadoop. Seems this should be enabled only in tests that want that detail to avoid cluttering output with unwanted messages. ---
[GitHub] drill pull request #930: DRILL-5761: Disable Lilith ClassicMultiplexSocketAp...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/930#discussion_r137938300 --- Diff: common/src/test/resources/logback-test.xml --- @@ -0,0 +1,111 @@ + + + + + + + +true +1 +true +${LILITH_HOSTNAME:-localhost} + + + + + + + + %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + --- End diff -- Should this be in Apache Drill? Only the MapR profile pulls in MapR code... It is not clear that tests that happen to use the MapR profile want debug level logging from this subsystem. Better to use the `LogFixture` to set more detailed logging in those tests that need it. ---
[GitHub] drill pull request #930: DRILL-5761: Disable Lilith ClassicMultiplexSocketAp...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/930#discussion_r137938284 --- Diff: common/src/test/resources/logback-test.xml --- @@ -0,0 +1,111 @@ + + + + + + + +true +1 +true +${LILITH_HOSTNAME:-localhost} + + + + + + + + %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n + + + + + + + --- End diff -- Please put the test debug level at `ERROR`. I have a copy of `logback-test.xml` that I copy into each working branch that sets the level to `ERROR`. Then, specific tests set more detailed logging as needed. Otherwise, the console is bombarded with unwanted logging messages making it very hard to find those of interest. ---
[GitHub] drill issue #928: DRILL-5716: Queue-driven memory allocation
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/928 @jinfengni, can you take a quick look at the Foreman changes? Especially the bits that muck about with the physical plan: some of the work is moved from one place to another. Thanks! ---
[GitHub] drill pull request #925: DRILL-5749: solve foreman and netty threads deadloc...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/925#discussion_r137938116 --- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java --- @@ -54,10 +52,14 @@ void channelClosed(Throwable ex) { isOpen.set(false); if (ex != null) { final RpcException e = RpcException.mapException(ex); + IntObjectHashMapclonedMap; synchronized (map) { -map.forEach(new SetExceptionProcedure(e)); +clonedMap = map.clone(); map.clear(); } + if (clonedMap != null) { --- End diff -- When would `clonedMap` be `null`? ---
[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/923#discussion_r137937858 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestMergeJoinAdvanced.java --- @@ -102,10 +103,11 @@ public void testFix2967() throws Exception { test("select * from dfs_test.`%s/join/j1` j1 left outer join dfs_test.`%s/join/j2` j2 on (j1.c_varchar = j2.c_varchar)", TEST_RES_PATH, TEST_RES_PATH); } finally { - setSessionOption(PlannerSettings.BROADCAST.getOptionName(), String.valueOf(PlannerSettings.BROADCAST.getDefault().bool_val)); - setSessionOption(PlannerSettings.HASHJOIN.getOptionName(), String.valueOf(PlannerSettings.HASHJOIN.getDefault().bool_val)); + final OperatorFixture.TestOptionSet testOptionSet = new OperatorFixture.TestOptionSet(); + setSessionOption(PlannerSettings.BROADCAST.getOptionName(), String.valueOf(testOptionSet.getDefault(PlannerSettings.BROADCAST.getOptionName()).bool_val)); --- End diff -- No need to fix now but I wonder: * The `ClusterFixture` class has a method of form `setSystemOption(String name, Option value)` to avoid this kind of silly to/from String overhead. We might want to do the same for these "legacy" `BaseTestQuery` style tests. * But, here we are setting system options to their default value. Why? Isn't that already the value? If not, wouldn't clearing (deleting) the option value be better? * Or, has the test previously set a system option to other than the default value, and we are undoing that here? Seems it would be better to allow a test to say not to muck with the defaults. (Hard to do with `BaseTestQuery`, but there is a built-in provision for this in the `ClusterFixture`.) ---
[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/923#discussion_r137937977 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/work/metadata/TestMetadataProvider.java --- @@ -196,9 +200,12 @@ public void tablesWithSystemTableFilter() throws Exception { verifyTable("sys", "boot", tables); verifyTable("sys", "drillbits", tables); verifyTable("sys", "memory", tables); -verifyTable("sys", "options", tables); +verifyTable("sys", SystemTable.OPTION.getTableName(), tables); +verifyTable("sys", SystemTable.OPTION_VAL.getTableName(), tables); verifyTable("sys", "threads", tables); verifyTable("sys", "version", tables); +verifyTable("sys", SystemTable.INTERNAL_OPTIONS.getTableName(), tables); +verifyTable("sys", SystemTable.INTERNAL_OPTIONS_VAL.getTableName(), tables); --- End diff -- Thanks for providing the constants. Nice improvement. ---
[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/923#discussion_r13793 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java --- @@ -346,17 +347,63 @@ public void deleteAllOptions(OptionType type) { } } - public void populateDefaultValues() { - + public static CaseInsensitiveMap populateDefualtValues(Mapdefinitions, DrillConfig bootConfig) { --- End diff -- `populateDefualtValues` --> `populateDefaultValues` (that is "ua" --> "au"). ---
[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/923#discussion_r137937350 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/OptionMetaData.java --- @@ -0,0 +1,68 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.drill.exec.server.options; + +/** + * Contains information about the scopes in which an option can be set, and an option's visibility. + */ +public class OptionMetaData { + public static final OptionMetaData DEFAULT = new OptionMetaData(OptionValue.OptionType.ALL, false, false); + + private OptionValue.OptionType type; --- End diff -- `final` here and others? ---
[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/923#discussion_r137937864 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/limit/TestLimitWithExchanges.java --- @@ -71,7 +72,8 @@ public void testPushLimitPastUnionExchange() throws Exception { final String[] expectedPlan5 = {"(?s)Limit\\(fetch=\\[1\\].*UnionExchange.*Limit\\(fetch=\\[1\\]\\).*Join"}; testLimitHelper(sql5, expectedPlan5, excludedPlan, 1); } finally { - test("alter session set `planner.slice_target` = " + ExecConstants.SLICE_TARGET_OPTION.getDefault().getValue()); + final OperatorFixture.TestOptionSet testOptionSet = new OperatorFixture.TestOptionSet(); + test("alter session set `planner.slice_target` = " + testOptionSet.getDefault(ExecConstants.SLICE_TARGET).getValue()); --- End diff -- Here, seems it would be better to refer to the constant defining the option name rather than hard-coding it. (Yes, this is existing code, so OK to leave as is...) ---
[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/923#discussion_r137937100 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/BaseOptionManager.java --- @@ -17,44 +17,84 @@ */ package org.apache.drill.exec.server.options; -import org.apache.drill.exec.server.options.TypeValidators.BooleanValidator; -import org.apache.drill.exec.server.options.TypeValidators.DoubleValidator; -import org.apache.drill.exec.server.options.TypeValidators.LongValidator; -import org.apache.drill.exec.server.options.TypeValidators.StringValidator; - -public abstract class BaseOptionManager implements OptionSet { -// private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(BaseOptionManager.class); - - /** - * Gets the current option value given a validator. - * - * @param validator the validator - * @return option value - * @throws IllegalArgumentException - if the validator is not found - */ - private OptionValue getOptionSafe(OptionValidator validator) { -OptionValue value = getOption(validator.getOptionName()); -return value == null ? validator.getDefault() : value; +import org.apache.drill.common.exceptions.UserException; + +import java.util.Iterator; + +/** + * This {@link OptionManager} implements some the basic methods and should be extended by concrete implementations. + */ +public abstract class BaseOptionManager extends BaseOptionSet implements OptionManager { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(BaseOptionManager.class); + + @Override + public OptionList getInternalOptionList() { +return getAllOptionList(true); } @Override - public boolean getOption(BooleanValidator validator) { -return getOptionSafe(validator).bool_val; + public OptionList getPublicOptionList() { +return getAllOptionList(false); } @Override - public double getOption(DoubleValidator validator) { -return getOptionSafe(validator).float_val; + public void setLocalOption(String name, boolean value) { +setLocalOption(OptionValue.Kind.BOOLEAN, name, Boolean.toString(value)); --- End diff -- Very nice improvement to the API! I wonder, however, if we should pass the value as an `Object` rather than as a `String`? Seems more natural to do it that way. Or, maybe here just create the option value since we know the type? ``` setLocalOption(String name, OptionValue.fromBoolean(value)); ``` This means that the other fields would be filled in later, so they couldn't be final. Hence, the alternative, use an `Object` ``` setLocalOption(String name, value); ... private void setLocalOption(String name, Object value) { ``` Then, when creating an option, validate that the type of the `Object` matches the type of the option. (In fact, the `Object` for would be handy for testing...) ---
[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/923#discussion_r137937991 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/RestClientFixture.java --- @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.drill.test; + +import com.google.common.base.Preconditions; +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.exec.server.rest.StatusResources; +import org.glassfish.jersey.client.ClientConfig; +import org.glassfish.jersey.client.JerseyClientBuilder; + +import javax.annotation.Nullable; +import javax.ws.rs.client.Client; +import javax.ws.rs.client.WebTarget; +import javax.ws.rs.core.GenericType; +import javax.ws.rs.core.MediaType; + +import java.util.List; + +/** + * Represents a client for the Drill Rest API. + */ +public class RestClientFixture implements AutoCloseable { + /** + * A builder for the rest client. + */ + public static class Builder { +private ClusterFixture cluster; + +public Builder(ClusterFixture cluster) { + this.cluster = Preconditions.checkNotNull(cluster); +} + +public RestClientFixture build() { + return new RestClientFixture(cluster); +} + } + + private WebTarget baseTarget; + private Client client; --- End diff -- `final` here and above? ---
[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/923#discussion_r137937333 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/OptionManager.java --- @@ -17,49 +17,97 @@ */ package org.apache.drill.exec.server.options; -import org.apache.drill.exec.server.options.OptionValue.OptionType; +import javax.validation.constraints.NotNull; /** * Manager for Drill {@link OptionValue options}. Implementations must be case-insensitive to the name of an option. */ public interface OptionManager extends OptionSet, Iterable { /** - * Sets an option value. - * - * @param value option value - * @throws org.apache.drill.common.exceptions.UserException message to describe error with value + * Sets a boolean option on the {@link OptionManager}. + * @param name The name of the option. + * @param value The value of the option. */ - void setOption(OptionValue value); + void setLocalOption(String name, boolean value); /** - * Deletes the option. Unfortunately, the type is required given the fallback structure of option managers. - * See {@link FallbackOptionManager}. + * Sets a long option on the {@link OptionManager}. + * @param name The name of the option. + * @param value The value of the option. + */ + void setLocalOption(String name, long value); + + /** + * Sets a double option on the {@link OptionManager}. + * @param name The name of the option. + * @param value The value of the option. + */ + void setLocalOption(String name, double value); + + /** + * Sets a String option on the {@link OptionManager}. + * @param name The name of the option. + * @param value The value of the option. + */ + void setLocalOption(String name, String value); + + /** + * Sets an option of the specified {@link OptionValue.Kind} on the {@link OptionManager}. + * @param kind The kind of the option. + * @param name The name of the option. + * @param value The value of the option. + */ + void setLocalOption(OptionValue.Kind kind, String name, String value); + + /** + * Deletes the option. * - * If the option name is valid (exists in {@link SystemOptionManager#VALIDATORS}), + * If the option name is valid (exists in the set of validators produced by {@link SystemOptionManager#createDefaultOptionDefinitions()}), * but the option was not set within this manager, calling this method should be a no-op. * * @param name option name - * @param type option type * @throws org.apache.drill.common.exceptions.UserException message to describe error with value */ - void deleteOption(String name, OptionType type); + void deleteLocalOption(String name); /** - * Deletes all options. Unfortunately, the type is required given the fallback structure of option managers. - * See {@link FallbackOptionManager}. + * Deletes all options. * * If no options are set, calling this method should be no-op. * - * @param type option type * @throws org.apache.drill.common.exceptions.UserException message to describe error with value */ - void deleteAllOptions(OptionType type); + void deleteAllLocalOptions(); + + /** + * Get the option definition corresponding to the given option name. + * @param name The name of the option to retrieve a validator for. + * @return The option validator corresponding to the given option name. + */ + @NotNull + OptionDefinition getOptionDefinition(String name); /** * Gets the list of options managed this manager. * * @return the list of options */ OptionList getOptionList(); + + /** + * Returns all the internal options contained in this option manager. + * + * @return All the internal options contained in this option manager. + */ + @NotNull + OptionList getInternalOptionList(); --- End diff -- `Internal` --> `Local` to be consistent with the other methods? ---
[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/923#discussion_r137937210 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/DrillConfigIterator.java --- @@ -58,17 +58,17 @@ public OptionValue next() { OptionValue optionValue = null; switch(cv.valueType()) { case BOOLEAN: -optionValue = OptionValue.createBoolean(OptionType.BOOT, name, (Boolean) cv.unwrapped(), OptionScope.BOOT); +optionValue = OptionValue.create(OptionType.BOOT, name, (Boolean) cv.unwrapped(), OptionScope.BOOT); --- End diff -- I suppose this has worked, but it is a bit of a muddle. Config settings are not runtime options. It is a hack to map them into the same structures. Since this iterator is probably only used to create a table, at some point we probably should create a new object to handle config options. We might want: 1) the full name, 2) the source (system options, command line, drill-override.conf, etc.), 3) the TypeSafe type (which is a superset of the runtime option types. OK to leave this for now, but we should consider fixing this in the future. ---
[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/923#discussion_r137937938 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/testing/TestExceptionInjection.java --- @@ -216,79 +216,77 @@ public void injectionOnSpecificBit() { final ZookeeperHelper zkHelper = new ZookeeperHelper(); zkHelper.startZookeeper(1); -// Creating two drillbits -final Drillbit drillbit1, drillbit2; -final DrillConfig drillConfig = zkHelper.getConfig(); try { - drillbit1 = Drillbit.start(drillConfig, remoteServiceSet); - drillbit2 = Drillbit.start(drillConfig, remoteServiceSet); -} catch (DrillbitStartupException e) { - throw new RuntimeException("Failed to start drillbits.", e); -} + // Creating two drillbits + final Drillbit drillbit1, drillbit2; + final DrillConfig drillConfig = zkHelper.getConfig(); + try { +drillbit1 = Drillbit.start(drillConfig, remoteServiceSet); +drillbit2 = Drillbit.start(drillConfig, remoteServiceSet); + } catch (DrillbitStartupException e) { +throw new RuntimeException("Failed to start drillbits.", e); + } -final DrillbitContext drillbitContext1 = drillbit1.getContext(); -final DrillbitContext drillbitContext2 = drillbit2.getContext(); + final DrillbitContext drillbitContext1 = drillbit1.getContext(); + final DrillbitContext drillbitContext2 = drillbit2.getContext(); -final UserSession session = UserSession.Builder.newBuilder() - .withCredentials(UserBitShared.UserCredentials.newBuilder().setUserName("foo").build()) -.withUserProperties(UserProperties.getDefaultInstance()) -.withOptionManager(drillbitContext1.getOptionManager()) -.build(); + final UserSession session = UserSession.Builder.newBuilder().withCredentials(UserBitShared.UserCredentials.newBuilder().setUserName("foo").build()).withUserProperties(UserProperties.getDefaultInstance()).withOptionManager(drillbitContext1.getOptionManager()).build(); --- End diff -- Did we want to combine all the method calls onto one line rather than the fluent, multi-line style originally in the code? An artifact of a code formatter? ---
[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/923#discussion_r137937125 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/BaseOptionManager.java --- @@ -17,44 +17,84 @@ */ package org.apache.drill.exec.server.options; -import org.apache.drill.exec.server.options.TypeValidators.BooleanValidator; -import org.apache.drill.exec.server.options.TypeValidators.DoubleValidator; -import org.apache.drill.exec.server.options.TypeValidators.LongValidator; -import org.apache.drill.exec.server.options.TypeValidators.StringValidator; - -public abstract class BaseOptionManager implements OptionSet { -// private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(BaseOptionManager.class); - - /** - * Gets the current option value given a validator. - * - * @param validator the validator - * @return option value - * @throws IllegalArgumentException - if the validator is not found - */ - private OptionValue getOptionSafe(OptionValidator validator) { -OptionValue value = getOption(validator.getOptionName()); -return value == null ? validator.getDefault() : value; +import org.apache.drill.common.exceptions.UserException; + +import java.util.Iterator; + +/** + * This {@link OptionManager} implements some the basic methods and should be extended by concrete implementations. + */ +public abstract class BaseOptionManager extends BaseOptionSet implements OptionManager { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(BaseOptionManager.class); + + @Override + public OptionList getInternalOptionList() { +return getAllOptionList(true); } @Override - public boolean getOption(BooleanValidator validator) { -return getOptionSafe(validator).bool_val; + public OptionList getPublicOptionList() { +return getAllOptionList(false); } @Override - public double getOption(DoubleValidator validator) { -return getOptionSafe(validator).float_val; + public void setLocalOption(String name, boolean value) { +setLocalOption(OptionValue.Kind.BOOLEAN, name, Boolean.toString(value)); } @Override - public long getOption(LongValidator validator) { -return getOptionSafe(validator).num_val; + public void setLocalOption(String name, long value) { +setLocalOption(OptionValue.Kind.LONG, name, Long.toString(value)); } @Override - public String getOption(StringValidator validator) { -return getOptionSafe(validator).string_val; + public void setLocalOption(String name, double value) { +setLocalOption(OptionValue.Kind.DOUBLE, name, Double.toString(value)); } + @Override + public void setLocalOption(String name, String value) { +setLocalOption(OptionValue.Kind.STRING, name, value); + } + + @Override + public void setLocalOption(OptionValue.Kind kind, String name, String value) { +final OptionDefinition definition = getOptionDefinition(name); +final OptionValidator validator = definition.getValidator(); +final OptionMetaData metaData = definition.getMetaData(); +final OptionValue.OptionType type = definition.getMetaData().getType(); +final OptionValue.OptionScope scope = getScope(); +checkOptionPermissions(name, type, scope); +final OptionValue optionValue = OptionValue.create(kind, type, name, value, scope); +validator.validate(optionValue, metaData, this); +setLocalOptionHelper(optionValue); --- End diff -- Very nice and clean, does all the necessary steps in one place. Great improvement! ---
[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/923#discussion_r137937347 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/OptionMetaData.java --- @@ -0,0 +1,68 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.drill.exec.server.options; + +/** + * Contains information about the scopes in which an option can be set, and an option's visibility. + */ +public class OptionMetaData { + public static final OptionMetaData DEFAULT = new OptionMetaData(OptionValue.OptionType.ALL, false, false); + + private OptionValue.OptionType type; + private boolean adminOption; --- End diff -- `adminOnly` (we know it is an option...) ---
[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/923#discussion_r137937945 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/testing/TestPauseInjection.java --- @@ -150,66 +150,61 @@ public void pauseOnSpecificBit() { final ZookeeperHelper zkHelper = new ZookeeperHelper(); zkHelper.startZookeeper(1); -// Creating two drillbits -final Drillbit drillbit1, drillbit2; -final DrillConfig drillConfig = zkHelper.getConfig(); try { - drillbit1 = Drillbit.start(drillConfig, remoteServiceSet); - drillbit2 = Drillbit.start(drillConfig, remoteServiceSet); -} catch (final DrillbitStartupException e) { - throw new RuntimeException("Failed to start two drillbits.", e); -} - -final DrillbitContext drillbitContext1 = drillbit1.getContext(); -final DrillbitContext drillbitContext2 = drillbit2.getContext(); - -final UserSession session = UserSession.Builder.newBuilder() - .withCredentials(UserCredentials.newBuilder() -.setUserName("foo") -.build()) - .withUserProperties(UserProperties.getDefaultInstance()) - .withOptionManager(drillbitContext1.getOptionManager()) - .build(); - -final DrillbitEndpoint drillbitEndpoint1 = drillbitContext1.getEndpoint(); -final String controls = Controls.newBuilder() - .addPauseOnBit(DummyClass.class, DummyClass.PAUSES, drillbitEndpoint1) - .build(); - -ControlsInjectionUtil.setControls(session, controls); - -{ - final long expectedDuration = 1000L; - final ExtendedLatch trigger = new ExtendedLatch(1); - final Pointer ex = new Pointer<>(); - final QueryContext queryContext = new QueryContext(session, drillbitContext1, QueryId.getDefaultInstance()); - (new ResumingThread(queryContext, trigger, ex, expectedDuration)).start(); - - // test that the pause happens - final DummyClass dummyClass = new DummyClass(queryContext, trigger); - final long actualDuration = dummyClass.pauses(); - assertTrue(String.format("Test should stop for at least %d milliseconds.", expectedDuration), -expectedDuration <= actualDuration); - assertTrue("No exception should be thrown.", ex.value == null); + // Creating two drillbits + final Drillbit drillbit1, drillbit2; + final DrillConfig drillConfig = zkHelper.getConfig(); try { -queryContext.close(); - } catch (final Exception e) { -fail("Failed to close query context: " + e); +drillbit1 = Drillbit.start(drillConfig, remoteServiceSet); +drillbit2 = Drillbit.start(drillConfig, remoteServiceSet); + } catch (final DrillbitStartupException e) { +throw new RuntimeException("Failed to start two drillbits.", e); } -} -{ - final ExtendedLatch trigger = new ExtendedLatch(1); - final QueryContext queryContext = new QueryContext(session, drillbitContext2, QueryId.getDefaultInstance()); + final DrillbitContext drillbitContext1 = drillbit1.getContext(); + final DrillbitContext drillbitContext2 = drillbit2.getContext(); + + final UserSession session = UserSession.Builder.newBuilder().withCredentials(UserCredentials.newBuilder().setUserName("foo").build()).withUserProperties(UserProperties.getDefaultInstance()).withOptionManager(drillbitContext1.getOptionManager()).build(); --- End diff -- Same issue of one big line vs. multi-line. ---
[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/923#discussion_r137937112 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/BaseOptionManager.java --- @@ -17,44 +17,84 @@ */ package org.apache.drill.exec.server.options; -import org.apache.drill.exec.server.options.TypeValidators.BooleanValidator; -import org.apache.drill.exec.server.options.TypeValidators.DoubleValidator; -import org.apache.drill.exec.server.options.TypeValidators.LongValidator; -import org.apache.drill.exec.server.options.TypeValidators.StringValidator; - -public abstract class BaseOptionManager implements OptionSet { -// private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(BaseOptionManager.class); - - /** - * Gets the current option value given a validator. - * - * @param validator the validator - * @return option value - * @throws IllegalArgumentException - if the validator is not found - */ - private OptionValue getOptionSafe(OptionValidator validator) { -OptionValue value = getOption(validator.getOptionName()); -return value == null ? validator.getDefault() : value; +import org.apache.drill.common.exceptions.UserException; + +import java.util.Iterator; + +/** + * This {@link OptionManager} implements some the basic methods and should be extended by concrete implementations. + */ +public abstract class BaseOptionManager extends BaseOptionSet implements OptionManager { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(BaseOptionManager.class); + + @Override + public OptionList getInternalOptionList() { +return getAllOptionList(true); } @Override - public boolean getOption(BooleanValidator validator) { -return getOptionSafe(validator).bool_val; + public OptionList getPublicOptionList() { +return getAllOptionList(false); } @Override - public double getOption(DoubleValidator validator) { -return getOptionSafe(validator).float_val; + public void setLocalOption(String name, boolean value) { +setLocalOption(OptionValue.Kind.BOOLEAN, name, Boolean.toString(value)); } @Override - public long getOption(LongValidator validator) { -return getOptionSafe(validator).num_val; + public void setLocalOption(String name, long value) { +setLocalOption(OptionValue.Kind.LONG, name, Long.toString(value)); } @Override - public String getOption(StringValidator validator) { -return getOptionSafe(validator).string_val; + public void setLocalOption(String name, double value) { +setLocalOption(OptionValue.Kind.DOUBLE, name, Double.toString(value)); } + @Override + public void setLocalOption(String name, String value) { +setLocalOption(OptionValue.Kind.STRING, name, value); + } + + @Override + public void setLocalOption(OptionValue.Kind kind, String name, String value) { +final OptionDefinition definition = getOptionDefinition(name); --- End diff -- Presumably the `getOptionDefinition()` method checks that the option is defined and throws an exception if not? ---
[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/923#discussion_r137937392 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/OptionValue.java --- @@ -63,32 +88,32 @@ public final Double float_val; public final OptionScope scope; - public static OptionValue createLong(OptionType type, String name, long val, OptionScope scope) { + public static OptionValue create(OptionType type, String name, long val, OptionScope scope) { return new OptionValue(Kind.LONG, type, name, val, null, null, null, scope); } - public static OptionValue createBoolean(OptionType type, String name, boolean bool, OptionScope scope) { + public static OptionValue create(OptionType type, String name, boolean bool, OptionScope scope) { return new OptionValue(Kind.BOOLEAN, type, name, null, null, bool, null, scope); } - public static OptionValue createString(OptionType type, String name, String val, OptionScope scope) { + public static OptionValue create(OptionType type, String name, String val, OptionScope scope) { return new OptionValue(Kind.STRING, type, name, null, val, null, null, scope); } - public static OptionValue createDouble(OptionType type, String name, double val, OptionScope scope) { + public static OptionValue create(OptionType type, String name, double val, OptionScope scope) { return new OptionValue(Kind.DOUBLE, type, name, null, null, null, val, scope); } - public static OptionValue createOption(Kind kind, OptionType type, String name, String val, OptionScope scope) { + public static OptionValue create(Kind kind, OptionType type, String name, String val, OptionScope scope) { --- End diff -- Per earlier comment, `String val` --> `Object val`, with casts rather than string conversions below? ---
[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/923#discussion_r137937901 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/server/TestOptions.java --- @@ -56,7 +56,7 @@ public void checkChangedColumn() throws Exception { test("ALTER session SET `%s` = %d;", SLICE_TARGET, ExecConstants.SLICE_TARGET_DEFAULT); testBuilder() -.sqlQuery("SELECT status FROM sys.options WHERE name = '%s' AND type = 'SESSION'", SLICE_TARGET) +.sqlQuery("SELECT status FROM sys.options WHERE name = '%s' AND optionScope = 'SESSION'", SLICE_TARGET) --- End diff -- Hmmm... I wonder about this. Aside from our own tests, might users have existing code or scripts that know about the existing schema? It is for this reason that Jyothsna added a new table for the enhanced option data, going out of her way to preserve the public schema of the existing tables. I wonder, should we do that here? Or, are we fortunate and no one depends on the current system table schema? ---
[GitHub] drill issue #919: DRILL-5721: Query with only root fragment and no non-root ...
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/919 @sohami, can you clean up the stray commits and ask @parthchandra to again review the later commits? ---
[GitHub] drill issue #916: DRILL-5377: Five-digit year dates are displayed incorrectl...
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/916 Back to my original question. The premise of this bug seems to be that we corrupt Parquet dates and convert perfectly valid 4-digit years into invalid 5-digit years. That is clearly a data corruption bug that should never occur. Why don't we fix that? Given that we've accepted the data corruption, we need to display five-digit years which the Java classes for date and time don't support in `toString()`. The code uses `toString()` because it does not do correct formatting using the classes provided. That's the second bug. Date display should make use of format preferences provided by the user, not the default ones provided by `toString()`. So, that's bug number 2. Now given the above two bugs, we introduce a third by creating ad-hoc, Drill-specific date/time classes, violating the JDBC standard, to display the corrupt five-digit years. So, no longer will Drill return the java.sql.Date class as specified by the standard, but rather our own subclass. How will this affect client code that relies on standard behavior? I feel we are compounding error upon error. Can we go back and fix the original problem: that users might prefer that we don't corrupt dates in their data? That is, the problem is not so much that we don't format corrupt data correctly, but rather that we do, in fact, corrupt data. ---
[GitHub] drill issue #905: DRILL-1162: Fix OOM for hash join operator when the right ...
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/905 @jinfengni, have you been able to take a look at this PR? ---
[GitHub] drill issue #903: DRILL-5712: Update the pom files with dependency exclusion...
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/903 @sindhurirayavaram, please rebase this onto the latest master; looks like some stray commits have found their way into this PR. ---
[GitHub] drill issue #889: DRILL-5691: enhance scalar sub queries checking for the ca...
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/889 @arina-ielchiieva, looks like Weijie added a couple of commits since your +1. Can you take a look at them? ---
Re: Checkstyle Unused Imports
Also I might want to add check style for Apache header (which should be in a form of comment, not Javadoc), agreed code style (like indents etc) and enforced java doc for methods. At least the last two are enforced in Calcite. I used to point to all that stuff during code reviews but if all that would be enforced, it would be much easier ... Kind regards Arina On Sat, Sep 9, 2017 at 6:46 AM, Paul Rogerswrote: > Hi Vlad, > > Java has a wide variety of warnings available; each project decides which > to ignore, which are warnings and which are errors. It may be that Eclipse, > by default, has resource warnings turned on. The quick & dirty solution is > simply to turn off warnings for AutoCloseables and missing @Overrides. This > is, as they say, “crude but effective." > > It seems that the Drill community stand on imports is not to change them. > Eclipse has an “organize imports” feature. I have to be careful when > removing unused imports not to invoke this feature as it changes import > order and often cause reviews to complain about unnecessary code changes. > > Would be good if we could 1) agree on a standard and 2) make sure that > both Eclipse and IntelliJ can automatically organize imports to follow the > standard. But, I personally don’t worry about imports because Eclipse takes > care of it for me. > > For me, the bigger concern is about code style. Operators are implemented > as huge, complex, deeply nested methods with many local variables (such as > flags) set one place and used elsewhere — all with no comments. Would seem > like a good idea to adopt best practices and require human-digestible > method sizes with good Javadoc comments. To my mind, that will contribute > more to the project than import order. > > Oh, and the other item that needs addressing is a requirement to create > true unit tests (not just system tests coded with JUnit.) Good unit test > will increase our code quality immensely, and will simplify the task for > code reviews. So, I’d want to push that ahead before worrying about imports. > > Just my two cents… > > Thanks, > > - Paul > > > On Sep 8, 2017, at 6:58 PM, Vlad Rozov wrote: > > > > Paul, is AutoCloseable warning specific to Eclipse? I don't remember > seeing the same warning in IntelliJ or during compilation. > > > > I know that some communities are significantly more strict regarding > code style and enforce not only unused imports, but also order of imports > and placement of static imports. What is the Drill community stand on those > items? > > > > Thank you, > > > > Vlad > > > > On 9/8/17 18:04, Paul Rogers wrote: > >> I clean up the imports as I find them, but it would be nice to do them > all at once to avoid the constant drip-drip-drop of warnings. > >> > >> The key problem is the generated code: the templates can’t really tell > which imports are used where. So, we’d need to exclude generated code > directories from the check style rules. > >> > >> Drill also has thousands of omitted “@Override” annotations and heavy > abuse of AutoCloseable (which triggers warnings when used outside of > try-with-resources). > >> > >> At present, Eclipse complains about 17,883 warnings in Drill code. > >> > >> - Paul > >> > >>> On Sep 8, 2017, at 4:43 PM, Timothy Farkas wrote: > >>> > >>> Hi All, > >>> > >>> I've noticed that a lot of files have unused imports, and I frequently > accidentally leave unused imports behind when I do refactoring. So I'd like > to enable checkstyle to check for unused imports. > >>> > >>> Thanks, > >>> Tim > > > >