[GitHub] drill issue #1224: DRILL-6321: Customize Drill's conformance. Allow support ...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1224 +1 Overall. Note that this is needed for implementing Lateral join and Unnest support. ---
[GitHub] drill issue #1224: DRILL-6321: Customize Drill's conformance. Allow support ...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1224 @vvysotskyi The lateral join implementation committed in DRILL-6323 supports cross join for all conditions. The main difficulty in cross join is the management of the memory needed to produce a cartesian product (M x N). In cross apply though, the cross join is only computed on one row in the left side at a time (1 x N), since the left and right sides are correlated. This allows cross apply to operate with a very small amount of memory. ---
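The memory argument above can be sketched in plain Java. This is an illustrative toy, not Drill's operator code: a cross join materializes all M x N pairs, while a correlated cross apply only ever holds the 1 x N pairs for the current left row.

```java
import java.util.ArrayList;
import java.util.List;

public class CrossApplySketch {
  // Cross join: materializes the full M x N cartesian product at once.
  static List<int[]> crossJoin(int[] left, int[] right) {
    List<int[]> out = new ArrayList<>();
    for (int l : left) {
      for (int r : right) {
        out.add(new int[] {l, r});
      }
    }
    return out; // peak memory is proportional to M * N
  }

  // Cross apply: the left and right sides are correlated, so only the
  // 1 x N pairs for the current left row are held at any one time.
  static List<int[]> applyOneRow(int leftRow, int[] right) {
    List<int[]> out = new ArrayList<>();
    for (int r : right) {
      out.add(new int[] {leftRow, r});
    }
    return out; // caller emits/releases this before the next left row
  }
}
```

In a real operator the `applyOneRow` output would be emitted downstream and its buffers released before advancing to the next left row, which is why the peak memory stays small.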
[GitHub] drill pull request #1237: DRILL-6348: Fixed code so that Unordered Receiver ...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r185063074 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RawFragmentBatch.java --- @@ -77,4 +83,47 @@ public long getByteCount() { public boolean isAckSent() { return ackSent.get(); } + + /** + * Transfer ownership of this DrillBuf to the target allocator. This is done for better memory + * accounting (that is, the operator should be charged with the body's Drillbuf memory). + * + * NOTES - + * + * This operation is a NOOP when a) the current allocator (associated with the DrillBuf) is not the + * owning allocator or b) the target allocator is already the owner + * When transfer happens, a new RawFragmentBatch instance is allocated; this is done for proper + * DrillBuf reference count accounting + * The RPC handling code caches a reference to this RawFragmentBatch object instance; release() + * calls should be routed to the previous DrillBuf + * Caller is responsible for checking an OOM condition + * + * + * @param targetAllocator target allocator + * @return a new {@link RawFragmentBatch} object instance on success (where the buffer ownership has + * been switched to the target allocator); otherwise this operation is a NOOP (current instance + * returned) + */ + public RawFragmentBatch transferBodyOwnership(BufferAllocator targetAllocator) { --- End diff -- DrillBuf has a transferOwnership method that could be used instead? See the transferTo methods in any of the vector classes - [FixedValueVectors.transferTo()](https://github.com/apache/drill/blob/master/exec/vector/src/main/codegen/templates/FixedValueVectors.java#L282) Saves you the trouble of rolling your own and also avoids the need to make changes to any of the existing classes. ---
[GitHub] drill pull request #1237: DRILL-6348: Fixed code so that Unordered Receiver ...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1237#discussion_r185064842 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java --- @@ -182,13 +189,15 @@ public IterOutcome next() { return IterOutcome.OUT_OF_MEMORY; } + // Transfer the ownership of this raw-batch to this operator for proper memory statistics reporting + batch = batch.transferBodyOwnership(oContext.getAllocator()); --- End diff -- This should probably be done inside the [ RecordBatchLoader.load() ](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchLoader.java#L78) method. Note that `MergingRecordBatch` has similar code and so probably suffers from the same memory accounting issue. All other uses of `RecordBatchLoader.load()` appear to be in the client code or test code so we are unlikely to break anything by making the change in `RecordBatchLoader`. ---
[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1184 ``` What do you mean by "Json representation"? ``` Sorry, my mistake, got all tangled up. ``` we may want to further translate the Local [Date|Time|DateTime] objects inside the Map|List to java.sql.[Date|Time|Timestamp] upon access. But to do that inside the SqlAccessor, you would need to deep copy the Map|List and build another version with the date|time translated into java.sql.date|time. ``` That is what I thought you wanted to get to. If the current state is something you can work with, then great. I can review the final changes once you're done and merge them as well. Let's move the other discussion to another thread or JIRA. ---
[GitHub] drill issue #1240: DRILL-6327: Update unary operators to handle IterOutcome....
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1240 +1. Very nicely done. ---
[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1184 Putting aside the discussion on date/time/timezone for the moment, @jiang-wu let's say getObject returns to you an object that implements java.sql.{Struct|Array}. You now use the Struct|Array apis to get the attribute you are interested in. If the attribute is of type date|time the object returned for that attribute should now correspond to java.sql.{Date|Time} instead of the Json representation. Will that not address your requirement? ---
[GitHub] drill issue #1144: DRILL-6202: Deprecate usage of IndexOutOfBoundsException ...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1144 Looks good. +1 ---
[GitHub] drill pull request #1234: DRILL-5927: Fixed memory leak in TestBsonRecordRea...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1234#discussion_r184549758 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/bson/TestBsonRecordReader.java --- @@ -45,21 +47,24 @@ import org.bson.BsonTimestamp; import org.bson.BsonWriter; import org.bson.types.ObjectId; -import org.junit.AfterClass; -import org.junit.BeforeClass; +import org.junit.After; +import org.junit.Before; import org.junit.Test; -public class TestBsonRecordReader extends BaseTestQuery { - private static VectorContainerWriter writer; - private static TestOutputMutator mutator; - private static BsonRecordReader bsonReader; +public class TestBsonRecordReader { + private BufferAllocator allocator; + private VectorContainerWriter writer; + private TestOutputMutator mutator; + private BufferManager bufferManager; + private BsonRecordReader bsonReader; - @BeforeClass - public static void setUp() { -BufferAllocator bufferAllocator = getDrillbitContext().getAllocator(); -mutator = new TestOutputMutator(bufferAllocator); + @Before + public void setUp() { +allocator = new RootAllocator(Long.MAX_VALUE); --- End diff -- Any reason why you had to change to RootAllocator? Should not really be necessary. ---
[GitHub] drill issue #1232: DRILL-6094: Decimal data type enhancements
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1232 +1 for the C++ part. Looks really good. ---
[GitHub] drill issue #1214: DRILL-6331: Revisit Hive Drill native parquet implementat...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1214 Looks Good. Thanks for making the changes Arina. +1 ---
[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1184 > But, if April 15 is your birthday, it is your birthday in all timezones. We don't say your birthday (or order date, or newspaper issue date or...) is one day in, say London and another day in Los Angeles. If it is your birthday in California, it may already be the day after your birthday in Japan. :) IMO, representing dates, times, and timestamps as UTC is not the problem. It is, in fact, perfectly correct (since UTC is the timezone). Converting a date|time|timestamp without a timezone to/from UTC is the problem. The problem is made worse by the java.util and JDBC APIs; java.time gets it right though. However, as Jiang-wu points out, that still does not address the mismatch between the Joda/Java8 representation and JDBC. It also does not address his original problem, the issue of how to represent a complex type in JDBC; just return an Object, it says, which is no help at all. It is even worse for ODBC, which (last I checked) did not even have an API to return an Object type (which is why in ODBC we return a JSON string representation). For Jiang-wu's use case, since the string representation is not enough, we might look at returning a java.sql.Struct [1] type for Maps and java.sql.Array [2] types. [1] https://docs.oracle.com/javase/7/docs/api/java/sql/Struct.html [2] https://docs.oracle.com/javase/7/docs/api/java/sql/Array.html ---
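The point above, that the conversion rather than the representation is the problem, can be demonstrated with java.time alone. A minimal sketch (class and method names are illustrative): pinning a zoneless date to midnight UTC and then viewing that instant in another zone shifts the calendar day.

```java
import java.time.LocalDate;
import java.time.ZoneId;

public class DateZoneSketch {
  // A zoneless date: April 15 is April 15 everywhere.
  static LocalDate birthday() {
    return LocalDate.of(2018, 4, 15);
  }

  // The pitfall: pin the zoneless date to midnight UTC, then view that same
  // instant in another zone; the calendar day can change.
  static LocalDate viewedIn(ZoneId zone) {
    return birthday()
        .atStartOfDay(ZoneId.of("UTC"))   // 2018-04-15T00:00Z, an instant
        .withZoneSameInstant(zone)        // same instant, different wall clock
        .toLocalDate();
  }
}
```

Midnight UTC on April 15 is still the evening of April 14 in Los Angeles, so the round trip through an instant loses the intended date, which is exactly the kind of shift the java.util/JDBC conversions invite.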
[GitHub] drill issue #1238: DRILL-6281: Refactor TimedRunnable
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1238 Fair enough. But that *still* does not give me a clue about the problem(s) you were trying to fix, or how the refactoring helps. Is the cause of the problem in TimedRunnable? Not an unreasonable request, is it? ---
[GitHub] drill issue #1238: DRILL-6281: Refactor TimedRunnable
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1238 I had the same question as Arina about the need for this change, and got no info from DRILL-5908 about the problem cause or how this change will help fix it. Can you elucidate? ---
[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1184#discussion_r184243856 --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java --- @@ -509,15 +509,15 @@ public long getTwoAsLong(int index) { public ${friendlyType} getObject(int index) { org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), org.joda.time.DateTimeZone.UTC); date = date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault()); - return date; + return new java.sql.Date(date.getMillis()); --- End diff -- Hmm. That takes us back to the original problem, that of the date|time|timestamp field inside a complex object. ``` select t.context.date, t.context from test t; will return a java.sql.Date object for column 1, but a java.time.LocalDate for the same object inside column 2. This doesn't seem like a good thing. ``` Why should that be a bad thing though? Ultimately, the object returned by getObject() is displayed to the end user thru the toString method. The string representation of Local[Date|Time|Timestamp] should be the same as that of java.sql.[Date|Time|Timestamp]. Isn't it? ---
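The claim above, that the displayed strings match, is easy to check for dates. A minimal sketch, not Drill code; one caveat worth noting is that times differ slightly (for example, `LocalTime.toString()` omits the seconds field when it is zero, while `java.sql.Time.toString()` always prints it).

```java
import java.time.LocalDate;

public class ToStringSketch {
  // Both objects render the same ISO-8601 string for a date, so a user who
  // only ever sees the displayed value cannot tell the two types apart.
  static boolean sameDisplay(String isoDate) {
    String viaSql = java.sql.Date.valueOf(isoDate).toString();
    String viaJavaTime = LocalDate.parse(isoDate).toString();
    return viaSql.equals(viaJavaTime);
  }
}
```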
[GitHub] drill pull request #1144: DRILL-6202: Deprecate usage of IndexOutOfBoundsExc...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1144#discussion_r184242257 --- Diff: src/main/resources/checkstyle-config.xml --- @@ -30,10 +30,15 @@ + --- End diff -- So what do you tell the user when you get a runtime exception (any exception) that is the result of a bug? It is silly to show them a stack trace that does not help them. It is better to let the user know that there was an internal error and they should log a bug with support or look for help on the dev list. ---
[GitHub] drill pull request #1214: DRILL-6331: Revisit Hive Drill native parquet impl...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1214#discussion_r184199682 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/BaseOperatorContext.java --- @@ -158,25 +159,26 @@ public void close() { } catch (RuntimeException e) { ex = ex == null ? e : ex; } -try { - if (fs != null) { + +for (DrillFileSystem fs : fileSystems) { + try { fs.close(); -fs = null; - } -} catch (IOException e) { + } catch (IOException e) { throw UserException.resourceError(e) -.addContext("Failed to close the Drill file system for " + getName()) -.build(logger); + .addContext("Failed to close the Drill file system for " + getName()) + .build(logger); + } } + if (ex != null) { throw ex; } } @Override public DrillFileSystem newFileSystem(Configuration conf) throws IOException { -Preconditions.checkState(fs == null, "Tried to create a second FileSystem. Can only be called once per OperatorContext"); -fs = new DrillFileSystem(conf, getStats()); +DrillFileSystem fs = new DrillFileSystem(conf, getStats()); --- End diff -- I'm not suggesting we use the same fs for each split, but the opposite. The fs object used per split/rowgroup should be different so that we get the right fs wait time for every minor fragment. But this change allows more than one fs object per operator context, which we were explicitly preventing earlier. I'm not sure I understand why you needed to change that. ---
[GitHub] drill issue #1231: DRILL-6342: Fix schema path unIndexed method to return co...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1231 +1. LGTM ---
[GitHub] drill pull request #1214: DRILL-6331: Revisit Hive Drill native parquet impl...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1214#discussion_r183909850 --- Diff: common/src/main/java/org/apache/drill/common/Stopwatch.java --- @@ -0,0 +1,186 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.common; + +import com.google.common.base.Ticker; + +import java.util.concurrent.TimeUnit; + +/** + * Helper that creates stopwatch based if debug level is enabled. --- End diff -- Do we really need this? In general we have (or should have) used Stopwatch to track metrics and or performance bottlenecks in production. In neither case do we want to enable debug. Also, for debugging performance issues (I see that the places you've changed to use this Stopwatch are places where we encountered performance issues), would it be better to use ``` Stopwatch timer; if(logger.isDebugEnabled()){ timer = Stopwatch.createStarted(); } ``` More verbose, but guaranteed to be optimized away by the JVM. Not insisting that we change this, BTW. ---
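A hypothetical stand-in for the pattern suggested above: pay for timing only when debug logging is enabled. This is neither Drill's nor Guava's Stopwatch, just an illustration of the no-op-when-disabled idea the comment describes.

```java
import java.util.concurrent.TimeUnit;

public class DebugStopwatch {
  private final boolean enabled;
  private long startNanos;

  DebugStopwatch(boolean debugEnabled) {
    this.enabled = debugEnabled;
    if (enabled) {
      startNanos = System.nanoTime(); // only pay the timing cost when debugging
    }
  }

  long elapsed(TimeUnit unit) {
    // When debug is off, report zero instead of measuring anything.
    return enabled
        ? unit.convert(System.nanoTime() - startNanos, TimeUnit.NANOSECONDS)
        : 0;
  }
}
```

The trade-off the comment raises: a guard like `if (logger.isDebugEnabled()) { timer = Stopwatch.createStarted(); }` at each call site is more verbose, but a constant-false branch is reliably optimized away by the JIT, whereas the wrapper still allocates an object per use.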
[GitHub] drill pull request #1214: DRILL-6331: Revisit Hive Drill native parquet impl...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1214#discussion_r183919198 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/BaseOperatorContext.java --- @@ -158,25 +159,26 @@ public void close() { } catch (RuntimeException e) { ex = ex == null ? e : ex; } -try { - if (fs != null) { + +for (DrillFileSystem fs : fileSystems) { + try { fs.close(); -fs = null; - } -} catch (IOException e) { + } catch (IOException e) { throw UserException.resourceError(e) -.addContext("Failed to close the Drill file system for " + getName()) -.build(logger); + .addContext("Failed to close the Drill file system for " + getName()) + .build(logger); + } } + if (ex != null) { throw ex; } } @Override public DrillFileSystem newFileSystem(Configuration conf) throws IOException { -Preconditions.checkState(fs == null, "Tried to create a second FileSystem. Can only be called once per OperatorContext"); -fs = new DrillFileSystem(conf, getStats()); +DrillFileSystem fs = new DrillFileSystem(conf, getStats()); --- End diff -- I don't get why you need multiple DrillFileSystems per operator context? The reason for the DrillFileSystem abstraction (and the reason for tying it to the operator context) is to track the time a (scan) operator was waiting for a file system call to return. This is reported in the wait time for the operator in the query profile. For scans this is a critical number as the time spent waiting for a disk read determines if the query is disk bound. Associating multiple file system objects with a single operator context will throw the math out of whack. I think. ---
[GitHub] drill pull request #1144: DRILL-6202: Deprecate usage of IndexOutOfBoundsExc...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1144#discussion_r183530901 --- Diff: src/main/resources/checkstyle-config.xml --- @@ -30,10 +30,15 @@ + --- End diff -- I think IOBE should be caught and converted to a UserException so that the end user does not get an incomprehensible error message. ---
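A minimal sketch of the catch-and-convert idea suggested above. `InternalErrorSketch` and `internalError` are illustrative stand-ins, not Drill's actual `UserException` builder API: the point is that the low-level exception becomes the cause of a message the end user can act on.

```java
public class InternalErrorSketch {
  // Stand-in for something like UserException.systemError(...): surface an
  // "internal error, please report it" message instead of a raw stack trace.
  static RuntimeException internalError(Throwable cause) {
    return new RuntimeException(
        "Internal error: please file a bug or ask for help on the dev list.", cause);
  }

  static int safeGet(int[] values, int index) {
    try {
      return values[index];
    } catch (IndexOutOfBoundsException e) {
      // Convert the incomprehensible low-level failure into a user-facing one;
      // the original exception is preserved as the cause for the logs.
      throw internalError(e);
    }
  }
}
```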
[GitHub] drill issue #1233: Updated with links to previous releases
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1233 For older releases, a single link to http://archive.apache.org/dist/drill would be better (you won't have to update this every time there is a release). ---
[GitHub] drill pull request #1223: DRILL-6324: Unnest initial implementation
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1223#discussion_r183105915 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unnest/UnnestImpl.java --- @@ -0,0 +1,137 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.physical.impl.unnest; + +import com.google.common.collect.ImmutableList; +import org.apache.drill.exec.exception.SchemaChangeException; +import org.apache.drill.exec.ops.FragmentContext; +import org.apache.drill.exec.physical.base.LateralContract; +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode; +import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.record.TransferPair; +import org.apache.drill.exec.vector.ValueVector; +import org.apache.drill.exec.vector.complex.RepeatedValueVector; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.List; + +/** + * Contains the actual unnest operation. Unnest is a simple transfer operation in this impelementation. 
+ * For use as a table function, we will need to change the logic of the unnest method to operate on + * more than one row at a time and remove any dependence on Lateral + * {@link org.apache.drill.exec.physical.impl.flatten.FlattenTemplate}. + * This class follows the pattern of other operators that generate code at runtime. Normally this class + * would be abstract and have placeholders for doSetup and doEval. Unnest however, doesn't require code + * generation so we can simply implement the code in a simple class that looks similar to the code gen + * templates used by other operators but does not implement the doSetup and doEval methods. + */ +public class UnnestImpl implements Unnest { + private static final Logger logger = LoggerFactory.getLogger(UnnestImpl.class); + + private static final int OUTPUT_ROW_COUNT = ValueVector.MAX_ROW_COUNT; + + private ImmutableList transfers; + private LateralContract lateral; // corresponding lateral Join (or other operator implementing the Lateral Contract) + private SelectionVectorMode svMode; + private RepeatedValueVector fieldToUnnest; + private RepeatedValueVector.RepeatedAccessor accessor; + + /** + * The output batch limit starts at OUTPUT_ROW_COUNT, but may be decreased + * if records are found to be large. 
+ */ + private int outputLimit = OUTPUT_ROW_COUNT; + + + // The index in the unnest column that is being processed.We start at zero and continue until + // InnerValueCount is reached or if the batch limit is reached + // this allows for groups to be written between batches if we run out of space, for cases where we have finished + // a batch on the boundary it will be set to 0 + private int innerValueIndex = 0; + + @Override + public void setUnnestField(RepeatedValueVector unnestField) { +this.fieldToUnnest = unnestField; +this.accessor = RepeatedValueVector.RepeatedAccessor.class.cast(unnestField.getAccessor()); + } + + @Override + public RepeatedValueVector getUnnestField() { +return fieldToUnnest; + } + + @Override + public void setOutputCount(int outputCount) { +outputLimit = outputCount; + } + + @Override + public final int unnestRecords(final int recordCount) { +switch (svMode) { + case FOUR_BYTE: +throw new UnsupportedOperationException("Unnest does not support selection vector inputs."); + + case TWO_BYTE: +throw new UnsupportedOperationException("Unnest does not support selection vector inputs."); + + case NONE: +if (innerValueIndex == -1) { + innerValueIndex = 0; +} + +// Current record being processed in the incoming record batch. We could keep +// track of
[GitHub] drill pull request #1223: DRILL-6324: Unnest initial implementation
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1223#discussion_r183105769 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unnest/Unnest.java --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.physical.impl.unnest; + +import org.apache.drill.exec.exception.SchemaChangeException; +import org.apache.drill.exec.ops.FragmentContext; +import org.apache.drill.exec.physical.base.LateralContract; +import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.record.TransferPair; +import org.apache.drill.exec.vector.complex.RepeatedValueVector; + +import java.util.List; + +/** + * Placeholder for future unnest implementation that may require code generation. 
Current implementation does not + * require any + * @see UnnestImpl + */ +public interface Unnest { + //TemplateClassDefinition TEMPLATE_DEFINITION = new TemplateClassDefinition(Unnest.class, UnnestImpl + // .class); + + void setup(FragmentContext context, RecordBatch incoming, RecordBatch outgoing, List transfers, + LateralContract lateral) throws SchemaChangeException; + + int unnestRecords(int recordCount); --- End diff -- Done ---
[GitHub] drill pull request #1223: DRILL-6324: Unnest initial implementation
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1223#discussion_r183106563 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unnest/UnnestRecordBatch.java --- @@ -0,0 +1,451 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.physical.impl.unnest; + +import com.google.common.base.Preconditions; +import com.google.common.collect.Lists; +import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.common.expression.FieldReference; +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.common.logical.data.NamedExpression; +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.exec.exception.ClassTransformationException; +import org.apache.drill.exec.exception.OutOfMemoryException; +import org.apache.drill.exec.exception.SchemaChangeException; +import org.apache.drill.exec.expr.ClassGenerator; +import org.apache.drill.exec.expr.CodeGenerator; +import org.apache.drill.exec.ops.FragmentContext; +import org.apache.drill.exec.ops.MetricDef; +import org.apache.drill.exec.physical.base.LateralContract; +import org.apache.drill.exec.physical.config.UnnestPOP; +import org.apache.drill.exec.record.AbstractSingleRecordBatch; +import org.apache.drill.exec.record.AbstractTableFunctionRecordBatch; +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode; +import org.apache.drill.exec.record.CloseableRecordBatch; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.record.RecordBatchMemoryManager; +import org.apache.drill.exec.record.RecordBatchSizer; +import org.apache.drill.exec.record.TransferPair; +import org.apache.drill.exec.record.TypedFieldId; +import org.apache.drill.exec.record.VectorContainer; +import org.apache.drill.exec.vector.ValueVector; +import org.apache.drill.exec.vector.complex.RepeatedMapVector; +import org.apache.drill.exec.vector.complex.RepeatedValueVector; + +import java.io.IOException; +import java.util.List; + +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OK; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OK_NEW_SCHEMA; + +// TODO - handle the 
case where a user tries to unnest a scalar, should just return the column as is +public class UnnestRecordBatch extends AbstractTableFunctionRecordBatch { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(UnnestRecordBatch.class); + + private Unnest unnest; + private boolean hasRemainder = false; // set to true if there is data left over for the current row AND if we want +// to keep processing it. Kill may be called by a limit in a subquery that +// requires us to stop processing thecurrent row, but not stop processing +// the data. + // In some cases we need to return a predetermined state from a call to next. These are: + // 1) Kill is called due to an error occurring in the processing of the query. IterOutcome should be NONE + // 2) Kill is called by LIMIT to stop processing of the current row (This occurs when the LIMIT is part of a subquery + //between UNNEST and LATERAL. Iteroutcome should be EMIT + // 3) Kill is called by LIMIT downstream from LATERAL. IterOutcome should be NONE + private IterOutcome nextState = OK; + private int remainderIndex = 0; + private int recordCount; + private MaterializedField unnestFieldMetadata; + private final UnnestMemoryManager memoryManager; + + public enum Metric implements MetricDef { +INPUT_BATCH_COUNT, +AVG_INPUT_BATCH_BYTES, +AVG_INPUT_ROW_BYTES, +INPUT_RECORD_COUNT, +
[GitHub] drill pull request #1223: DRILL-6324: Unnest initial implementation
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1223#discussion_r183105882 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unnest/UnnestImpl.java --- @@ -0,0 +1,137 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.physical.impl.unnest; + +import com.google.common.collect.ImmutableList; +import org.apache.drill.exec.exception.SchemaChangeException; +import org.apache.drill.exec.ops.FragmentContext; +import org.apache.drill.exec.physical.base.LateralContract; +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode; +import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.record.TransferPair; +import org.apache.drill.exec.vector.ValueVector; +import org.apache.drill.exec.vector.complex.RepeatedValueVector; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.List; + +/** + * Contains the actual unnest operation. Unnest is a simple transfer operation in this impelementation. 
+ * For use as a table function, we will need to change the logic of the unnest method to operate on + * more than one row at a time and remove any dependence on Lateral + * {@link org.apache.drill.exec.physical.impl.flatten.FlattenTemplate}. + * This class follows the pattern of other operators that generate code at runtime. Normally this class + * would be abstract and have placeholders for doSetup and doEval. Unnest however, doesn't require code + * generation so we can simply implement the code in a simple class that looks similar to the code gen + * templates used by other operators but does not implement the doSetup and doEval methods. + */ +public class UnnestImpl implements Unnest { + private static final Logger logger = LoggerFactory.getLogger(UnnestImpl.class); + + private static final int OUTPUT_ROW_COUNT = ValueVector.MAX_ROW_COUNT; + + private ImmutableList transfers; + private LateralContract lateral; // corresponding lateral Join (or other operator implementing the Lateral Contract) + private SelectionVectorMode svMode; + private RepeatedValueVector fieldToUnnest; + private RepeatedValueVector.RepeatedAccessor accessor; + + /** + * The output batch limit starts at OUTPUT_ROW_COUNT, but may be decreased + * if records are found to be large. 
+ */ + private int outputLimit = OUTPUT_ROW_COUNT; + + + // The index in the unnest column that is being processed. We start at zero and continue until + // InnerValueCount is reached or if the batch limit is reached + // this allows for groups to be written between batches if we run out of space, for cases where we have finished + // a batch on the boundary it will be set to 0 + private int innerValueIndex = 0; + + @Override + public void setUnnestField(RepeatedValueVector unnestField) { +this.fieldToUnnest = unnestField; +this.accessor = RepeatedValueVector.RepeatedAccessor.class.cast(unnestField.getAccessor()); + } + + @Override + public RepeatedValueVector getUnnestField() { +return fieldToUnnest; + } + + @Override + public void setOutputCount(int outputCount) { +outputLimit = outputCount; + } + + @Override + public final int unnestRecords(final int recordCount) { +switch (svMode) { --- End diff -- It is not. Changed this and put a precondition check instead ---
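The javadoc above describes unnest as a simple transfer: for the single left-side row supplied via the Lateral contract, each element of the repeated (array) column becomes one output row. A minimal stand-alone sketch of that semantics, with hypothetical names (this is not Drill's `UnnestImpl`, which operates on value vectors via transfer pairs):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of unnest semantics, not Drill's actual UnnestImpl:
// one left row's repeated column expands into one output row per element.
public class UnnestSketch {

  static List<String> unnestRow(String leftValue, List<String> repeatedColumn) {
    List<String> out = new ArrayList<>();
    for (String element : repeatedColumn) {
      // Each array element yields one output row, correlated with the left row.
      out.add(leftValue + "," + element);
    }
    return out;
  }

  public static void main(String[] args) {
    // A row ("r1") whose array column holds three elements unnests to three rows.
    System.out.println(unnestRow("r1", Arrays.asList("a", "b", "c")));
  }
}
```

Because only one left row is expanded at a time, the output size is bounded by that row's array length, which is why the real operator can cap batches with `outputLimit`.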
[GitHub] drill pull request #1223: Drill 6324: Unnest initial implementation
GitHub user parthchandra opened a pull request: https://github.com/apache/drill/pull/1223 Drill 6324: Unnest initial implementation Implementation of the unnest operator that works in sync with the Lateral Join operator. The code is based on the Flatten implementation except that it does not do the cross join that flatten does. As a result, the operator does not need any code generation. Commit # 2 - adds specific handling for kill that may be as a result of a limit being reached by a downstream operator. @sohami, @amansinha100 please review. You can merge this pull request into a Git repository by running: $ git pull https://github.com/parthchandra/drill DRILL-6324 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1223.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1223 commit 7056e66b6e05f3a8f6cb9c4326d4b0a73cd87122 Author: Parth Chandra <parthc@...> Date: 2018-02-08T05:13:13Z DRILL-6324: Unnest - Initial Implementation - Based on Flatten - Implement unnestRecords in UnnestTemplate - Remove unnecessary code inherited from Flatten/Project. Add schema change handling. - Fix build failure after rebase since RecordBatchSizer used by UNNEST was relocated to a different package - Add unit tests - Handling of input row splitting across multiple batches. Also do not kill incoming in killIncoming. - Schema change generated by Unnest commit b5aaa143089da127396b2abc766cdb14b2843817 Author: Parth Chandra <parthc@...> Date: 2018-02-26T15:58:31Z DRILL-6324: Unnest - kill handling, remove codegen, and unit test for non array columns commit 298f90654a487ab340582ce0cd5a0bf665536b9f Author: Parth Chandra <parthc@...> Date: 2018-03-07T08:23:14Z DRILL-6324: Unnest - Add tests with real Unnest and real Lateral. 
commit 8d9e02b4f11cda8458288c873bc2a94765569c43 Author: Parth Chandra <parthc@...> Date: 2018-03-26T11:16:33Z DRILL-6324: Unnest - code cleanup, more comments, fix license headers, and more logging. Refactor Unnest to allow setting in incoming batch after construction fix compilation after rebase ---
[GitHub] drill issue #1221: DRILL-6323: Fix license headers
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1221 Closed via cc606db ---
[GitHub] drill pull request #1221: DRILL-6323: Fix license headers
GitHub user parthchandra opened a pull request: https://github.com/apache/drill/pull/1221 DRILL-6323: Fix license headers @ilooner can you please check these. Looks like the latest commit(s) went in with some broken license headers. You can merge this pull request into a Git repository by running: $ git pull https://github.com/parthchandra/drill DRILL-6323-l Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1221.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1221 commit cc606dbb9635cee0c5ecae8dba5530a46887d481 Author: Parth Chandra <parthc@...> Date: 2018-04-18T22:19:10Z DRILL-6323: Fix license headers ---
[GitHub] drill pull request #1212: DRILL-6323: Lateral Join - Initial implementation
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1212#discussion_r182283067 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/LateralJoinBatch.java --- @@ -0,0 +1,861 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.physical.impl.join; + +import com.google.common.annotations.VisibleForTesting; +import com.google.common.base.Preconditions; +import org.apache.calcite.rel.core.JoinRelType; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.types.Types; +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.exec.exception.OutOfMemoryException; +import org.apache.drill.exec.exception.SchemaChangeException; +import org.apache.drill.exec.ops.FragmentContext; +import org.apache.drill.exec.physical.base.LateralContract; +import org.apache.drill.exec.physical.config.LateralJoinPOP; +import org.apache.drill.exec.record.AbstractBinaryRecordBatch; +import org.apache.drill.exec.record.BatchSchema; +import org.apache.drill.exec.record.JoinBatchMemoryManager; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.record.RecordBatchSizer; +import org.apache.drill.exec.record.VectorAccessibleUtilities; +import org.apache.drill.exec.record.VectorWrapper; +import org.apache.drill.exec.vector.ValueVector; + +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.EMIT; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.NONE; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OK; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OK_NEW_SCHEMA; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OUT_OF_MEMORY; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.STOP; + +/** + * RecordBatch implementation for the lateral join operator. Currently it's expected LATERAL to co-exists with UNNEST + * operator. 
Both LATERAL and UNNEST will share a contract with each other defined at {@link LateralContract} + */ +public class LateralJoinBatch extends AbstractBinaryRecordBatch implements LateralContract { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(LateralJoinBatch.class); + + // Input indexes to correctly update the stats + private static final int LEFT_INPUT = 0; + + private static final int RIGHT_INPUT = 1; + + // Maximum number records in the outgoing batch + private int maxOutputRowCount; + + // Schema on the left side + private BatchSchema leftSchema; + + // Schema on the right side + private BatchSchema rightSchema; + + // Index in output batch to populate next row + private int outputIndex; + + // Current index of record in left incoming which is being processed + private int leftJoinIndex = -1; + + // Current index of record in right incoming which is being processed + private int rightJoinIndex = -1; + + // flag to keep track if current left batch needs to be processed in future next call + private boolean processLeftBatchInFuture; + + // Keep track if any matching right record was found for current left index record + private boolean matchedRecordFound; + + private boolean useMemoryManager = true; + + /* + * Public Methods + * / + public LateralJoinBatch(LateralJoinPOP popConfig, FragmentContext context, + RecordBatch left, RecordBatch right) throws OutOfMemoryException { +super(popConfig,
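The lateral pattern the diff above implements can be reduced to a nested loop in which the correlated right side (e.g. UNNEST) is re-evaluated once per left row, so only a 1 x N product is ever materialized at a time rather than the full M x N cartesian product. A hedged sketch with made-up names, not Drill's `LateralJoinBatch`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Illustrative sketch of the lateral (cross apply) loop, not Drill code.
public class LateralJoinSketch {

  // For each left row, evaluate the correlated right side and emit one joined
  // row per right-side row: memory cost is 1 x N per step, not M x N.
  static List<String> lateralJoin(List<String> left, Function<String, List<String>> rightSide) {
    List<String> out = new ArrayList<>();
    for (String l : left) {
      for (String r : rightSide.apply(l)) {   // right side depends on the left row
        out.add(l + "|" + r);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> res = lateralJoin(Arrays.asList("x", "y"), l -> Arrays.asList(l + "1", l + "2"));
    System.out.println(res);
  }
}
```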
[GitHub] drill issue #1208: DRILL-6295: PartitionerDecorator may close partitioners w...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1208 +1 ---
[GitHub] drill pull request #1212: DRILL-6323: Lateral Join - Initial implementation
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1212#discussion_r182172836 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/LateralJoinBatch.java --- @@ -0,0 +1,861 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.physical.impl.join; + +import com.google.common.annotations.VisibleForTesting; +import com.google.common.base.Preconditions; +import org.apache.calcite.rel.core.JoinRelType; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.types.Types; +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.exec.exception.OutOfMemoryException; +import org.apache.drill.exec.exception.SchemaChangeException; +import org.apache.drill.exec.ops.FragmentContext; +import org.apache.drill.exec.physical.base.LateralContract; +import org.apache.drill.exec.physical.config.LateralJoinPOP; +import org.apache.drill.exec.record.AbstractBinaryRecordBatch; +import org.apache.drill.exec.record.BatchSchema; +import org.apache.drill.exec.record.JoinBatchMemoryManager; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.record.RecordBatchSizer; +import org.apache.drill.exec.record.VectorAccessibleUtilities; +import org.apache.drill.exec.record.VectorWrapper; +import org.apache.drill.exec.vector.ValueVector; + +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.EMIT; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.NONE; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OK; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OK_NEW_SCHEMA; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OUT_OF_MEMORY; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.STOP; + +/** + * RecordBatch implementation for the lateral join operator. Currently it's expected LATERAL to co-exists with UNNEST + * operator. 
Both LATERAL and UNNEST will share a contract with each other defined at {@link LateralContract} + */ +public class LateralJoinBatch extends AbstractBinaryRecordBatch implements LateralContract { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(LateralJoinBatch.class); + + // Input indexes to correctly update the stats + private static final int LEFT_INPUT = 0; + + private static final int RIGHT_INPUT = 1; + + // Maximum number records in the outgoing batch + private int maxOutputRowCount; + + // Schema on the left side + private BatchSchema leftSchema; + + // Schema on the right side + private BatchSchema rightSchema; + + // Index in output batch to populate next row + private int outputIndex; + + // Current index of record in left incoming which is being processed + private int leftJoinIndex = -1; + + // Current index of record in right incoming which is being processed + private int rightJoinIndex = -1; + + // flag to keep track if current left batch needs to be processed in future next call + private boolean processLeftBatchInFuture; + + // Keep track if any matching right record was found for current left index record + private boolean matchedRecordFound; + + private boolean useMemoryManager = true; + + /* + * Public Methods + * / + public LateralJoinBatch(LateralJoinPOP popConfig, FragmentContext context, + RecordBatch left, RecordBatch right) throws OutOfMemoryException { +super(popConfig,
[GitHub] drill pull request #1212: DRILL-6323: Lateral Join - Initial implementation
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1212#discussion_r182172309 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java --- @@ -186,6 +186,11 @@ private ExecConstants() { public static final String BIT_ENCRYPTION_SASL_ENABLED = "drill.exec.security.bit.encryption.sasl.enabled"; public static final String BIT_ENCRYPTION_SASL_MAX_WRAPPED_SIZE = "drill.exec.security.bit.encryption.sasl.max_wrapped_size"; + /** --- End diff -- Not needed as you don't have code gen ---
[GitHub] drill issue #1211: DRILL-6322: Lateral Join: Common changes - Add new iterOu...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1211 +1 ---
[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1184#discussion_r181513279 --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java --- @@ -509,15 +509,15 @@ public long getTwoAsLong(int index) { public ${friendlyType} getObject(int index) { org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), org.joda.time.DateTimeZone.UTC); date = date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault()); - return date; + return new java.sql.Date(date.getMillis()); --- End diff -- Sounds good. ---
[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1184#discussion_r181461481 --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java --- @@ -509,15 +509,15 @@ public long getTwoAsLong(int index) { public ${friendlyType} getObject(int index) { org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), org.joda.time.DateTimeZone.UTC); date = date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault()); - return date; + return new java.sql.Date(date.getMillis()); --- End diff -- Either one is fine (since java.time is based on Joda). We've switched to Java 8, but just for consistency with the rest of the code, we might as well use Joda. ---
[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1184 I think that would be best. ---
[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1184 Can you check the unit tests after rebasing? I applied the PR to the latest master and get errors in the same tests. Thanks. ---
[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1184#discussion_r181248969 --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java --- @@ -509,15 +509,15 @@ public long getTwoAsLong(int index) { public ${friendlyType} getObject(int index) { org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), org.joda.time.DateTimeZone.UTC); date = date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault()); - return date; + return new java.sql.Date(date.getMillis()); --- End diff -- Agree that messing with Timezones is a Bad Thing. Probably an artifact of the way java.util did things. Anyway, I did mean using org.joda.time.Local[Date|Time|TimeStamp] or the corresponding java.time.* classes. ---
[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1184#discussion_r180914676 --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java --- @@ -509,15 +509,15 @@ public long getTwoAsLong(int index) { public ${friendlyType} getObject(int index) { org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), org.joda.time.DateTimeZone.UTC); date = date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault()); - return date; + return new java.sql.Date(date.getMillis()); --- End diff -- @jiang-wu Thanks for making these changes. Your fix is on the right track. However, I'm not sure if we want to introduce a dependency on JDBC classes in the vectors. Take a look at DateAccessor, TimeAccessor, and TimeStampAccessor. These are generated from [SqlAccessor](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/codegen/templates/SqlAccessors.java). The get methods in these convert from UTC to a Local{Date|Time|TimeStamp}. Subsequently they convert to the JDBC type since they are used by the JDBC driver. The vectors should be able to do the same, just returning the Local{Date|Time|TimeStamp} object. I'm not sure if that might affect tests that depend on timezone though. Perhaps @vdiravka can comment. ---
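The conversion described above — millis stored relative to UTC, surfaced as a timezone-free local value rather than shifted into the JVM's default zone — can be sketched with java.time (assumed equivalent in effect to the Joda `withZoneRetainFields` idiom in the diff; the class and method names here are illustrative, not Drill's accessors):

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Illustrative sketch, not Drill's generated DateAccessor/TimeStampAccessor.
public class DateAccessorSketch {

  // Interpret millis that were stored against UTC as a plain wall-clock value,
  // without shifting the date/time fields into the default timezone.
  static LocalDateTime fromDrillMillis(long utcMillis) {
    return LocalDateTime.ofInstant(Instant.ofEpochMilli(utcMillis), ZoneOffset.UTC);
  }

  public static void main(String[] args) {
    System.out.println(fromDrillMillis(0L)); // epoch millis 0 -> 1970-01-01T00:00
  }
}
```

Returning a `LocalDateTime` (or `LocalDate`/`LocalTime`) keeps the JDBC dependency out of the vectors; the JDBC layer can convert to `java.sql.*` at the edge.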
[GitHub] drill pull request #1144: DRILL-6202: Deprecate usage of IndexOutOfBoundsExc...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1144#discussion_r180598658 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -534,15 +534,11 @@ public void setSafe(int index, byte[] bytes) { assert index >= 0; final int currentOffset = offsetVector.getAccessor().get(index); - offsetVector.getMutator().setSafe(index + 1, currentOffset + bytes.length); - try { -data.setBytes(currentOffset, bytes, 0, bytes.length); - } catch (IndexOutOfBoundsException e) { -while (data.capacity() < currentOffset + bytes.length) { - reAlloc(); -} -data.setBytes(currentOffset, bytes, 0, bytes.length); + while (data.capacity() < currentOffset + bytes.length) { --- End diff -- I'm on the fence on this one. The change from 18 months ago resulted in a performance improvement of a few percent for variable length columns. This change essentially undoes that and takes us back to where we were. It seems that with the recent changes to batch sizing, we are on the path to eliminating the realloc behaviour that is part of the performance bottleneck for the setSafe methods. So, perhaps, it might be ok to let this one stay. ---
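The replacement code in the diff above swaps a catch-and-realloc (`catch (IndexOutOfBoundsException)`) for an up-front capacity loop. The pattern, reduced to plain arrays (a sketch of the shape only, not Drill's `DrillBuf`/`reAlloc` implementation):

```java
import java.util.Arrays;

// Sketch of the explicit-grow pattern from the diff: check capacity before
// writing instead of catching IndexOutOfBoundsException after the fact.
public class ReallocSketch {

  static byte[] setBytes(byte[] data, int offset, byte[] bytes) {
    // Same shape as: while (data.capacity() < currentOffset + bytes.length) reAlloc();
    while (data.length < offset + bytes.length) {
      data = Arrays.copyOf(data, data.length * 2); // stand-in for reAlloc(): double capacity
    }
    System.arraycopy(bytes, 0, data, offset, bytes.length);
    return data;
  }

  public static void main(String[] args) {
    byte[] grown = setBytes(new byte[4], 3, new byte[] {1, 2, 3});
    System.out.println(grown.length); // capacity doubled from 4 to 8
  }
}
```

The trade-off discussed in the comment: the pre-check runs on every call, whereas the exception path was free in the common case — which is why the change costs a few percent until batch sizing removes the need to realloc at all.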
[GitHub] drill issue #1144: DRILL-6202: Deprecate usage of IndexOutOfBoundsException ...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1144 I think we need to include a few other folks in this. @paul-rogers and @sachouche have also looked into the issue of excessive bounds checking and ways to write to direct memory with minimum overhead. Both Salim and Paul have done work which has tried to eliminate the excessive checking and use `PlatformDependent` directly, so it might be the right time to agree on the right approach here. At a high level, I believe there is agreement that we need to 1) reduce bounds checking to (preferably) once per write, and 2) minimise the number of function calls before memory is actually written to. We have three layers where we could potentially check bounds - i) the operators, ii) the vectors, iii) DrillBuf. Right now, we do so at each level, multiple times at that. Paul's work on batch sizing provides us with a layer that gives us the bounds check guarantees at the operator level. This means we could potentially use value-vectors' set methods (as opposed to the setSafe methods) and DrillBuf can use PlatformDependent directly. Some caveats - UDLEs check for and enforce little-endianness. Checking for endianness is important for value vectors because they assume little endian, but the enforcement is sometimes not so desirable. Drill's Java client uses the same DrillBuf backed by a UDLE and that means that client applications can no longer run on big endian machines (and yes, I have heard this request from end users). However, the fact is that UDLEs are an intrinsic part of the drill-memory design [1] [2]. Eliminating UDLEs can lead to re-doing large parts of very well tested code. The caveat to using the vector set methods is that the setSafe methods provide resizing capability that operators have come to rely upon. Switching from setSafe to set breaks practically every operator. 
[1] https://github.com/jacques-n/drill/blob/DRILL-4144/exec/memory/base/src/main/java/org/apache/drill/exec/memory/README.md [2] https://docs.google.com/nonceSigner?nonce=nj279efks0ro0=https://doc-0o-as-docs.googleusercontent.com/docs/securesc/gipu3hlcf22l6svruqr71h7qe2k3djum/5v7eb2cm4bghq76nj658ai5hkk9h52ur/152274960/11021365158327727934/11021365158327727934/0B6CmYjIAywyCalVwcURkaFlkc1U?e%3Ddownload=41l8jspccbj1pp63750c5von8ol4ijtl ---
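The "once per write" idea from the comment above — validate the full range at one layer, then write with no per-element checks — looks roughly like this (plain arrays standing in for DrillBuf/`PlatformDependent`; a sketch under those assumptions, not Drill code):

```java
// Illustrative sketch of single-range bounds checking, not Drill's memory layer.
public class BoundsCheckSketch {

  // One range check up front guards the whole write; the inner loop then
  // proceeds without further checks (conceptually, the PlatformDependent path).
  static void writeRange(byte[] dest, int offset, byte[] src) {
    if (offset < 0 || offset + src.length > dest.length) {
      throw new IndexOutOfBoundsException("write of " + src.length + " bytes at " + offset);
    }
    for (int i = 0; i < src.length; i++) {
      dest[offset + i] = src[i]; // unchecked writes within the validated range
    }
  }

  public static void main(String[] args) {
    byte[] buf = new byte[8];
    writeRange(buf, 2, new byte[] {7, 7, 7});
    System.out.println(buf[2]); // 7
  }
}
```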
[GitHub] drill issue #1182: DRILL-6287: apache-release profile should be disabled by ...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1182 Sorry, Maven not being a strong point, I didn't understand initially what I was looking at. +1 ---
[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1166 @rajrahul thanks for making all the changes (and of course for the fix)! ---
[GitHub] drill issue #1182: DRILL-6287: apache-release profile should be disabled by ...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1182 I don't understand why the apache-release profile should be disabled by default. And I don't see how this change achieves that anyway. Also, moving -Xdoclint:none to all profiles implies we are no longer supporting development using JDK7? I'm OK with that, but not sure if we concluded that at the time of the 1.13 release. If that's what we want to do, I'm fine with this change. ---
[GitHub] drill issue #1060: DRILL-5846: Improve parquet performance for Flat Data Typ...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1060 I feel putting this PR in without finalizing DRILL-6301 is putting the cart before the horse. (BTW, it would help the discussion if the benchmarks were published!) My observation based on profiling I did sometime back is that the performance gains seen here are roughly in line with removing bounds checks. Paul has seen similar gains in the batch sizing project. Which takes us back to the question, raised by Paul in his first comment, of how we want to reconcile batch sizing and vectorizing of scans; a question we have deferred. If removing bounds checks gets us the same performance gains, then why not put our efforts into implementing batch sizing, with its accompanying elimination of bounds checking? I'm mostly not in favor of having MemoryUtils unless you make a compelling argument that it is the only way to save the planet (i.e., get the performance you want). I feel operators should not establish the pattern of accessing memory directly. So far, I'm -0 on this as my arguments are mostly high level (and somewhat philosophical). Minor nitpick: the prefix VL is not as informative as, say, VarLen or VariableLength. ---
[GitHub] drill issue #1168: DRILL-6246: Reduced the size of the jdbc-all jar file
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1168 I would recommend trying to setup a connection using Spotfire or Squirrel and running a couple of metadata queries and a couple of queries on complex data. (These have traditionally been areas that were affected when the jar was incomplete). Also try on a secure cluster. You might need help from other folks on some of this. ---
[GitHub] drill issue #1144: DRILL-6202: Deprecate usage of IndexOutOfBoundsException ...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1144 To be honest, I don't know. There are too many paths available to write to direct memory, some of which were developed over time, so not all vector code may be using them. Ideally, every bit of code should go thru DrillBuf, which wraps a UDLE (UnsafeDirectLittleEndian). UDLE has one level of bounds check (which I feel is necessary) and uses PlatformDependent directly. The vector setSafe methods provide vector-level bounds checking, and the vector set methods bypass bounds checking. ResultSetLoader methods provide guarantees that may make the setSafe methods redundant. Our goal should be to not depend on the setSafe methods except for debugging. What exactly are you thinking of when proposing using PlatformDependent directly? ---
[GitHub] drill issue #1194: DRILL-6300: Refresh protobuf C++ source files
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1194 +1 ---
[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1166 +1. LGTM ---
[GitHub] drill issue #1170: DRILL-6223: Fixed several Drillbit failures due to schema...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1170 I added a comment in the JIRA - [DRILL-6223](https://issues.apache.org/jira/browse/DRILL-6223?focusedCommentId=16402223=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16402223) ---
[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1166 @rajrahul this link is good. As expected, the int96 column is dictionary encoded. Is it possible for you to extract just a couple of records from this file and then use that for a unit test? see [TestParquetWriter.testImpalaParquetBinaryAsTimeStamp_DictChange](https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java#L784) @vdiravka TestParquetWriter.testImpalaParquetBinaryAsTimeStamp_DictChange also uses an int96 that is dictionary encoded. Any idea whether (and why) it might be going thru a different code path? ---
[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1166 @rajrahul, thanks for submitting the patch. It looks good. I guess we missed dictionary encoded int96 timestamps (even though timestamps with nanosecond precision are the one thing that should never, ever, be dictionary encoded!). Just to make sure, I tried to use the sample file in DRILL-6016, but I could not even unzip it! Can you please check and see if the file is correct? We can use that to create the unit test as well. ---
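For context on the INT96 discussion in this thread: a Parquet INT96 timestamp is 12 little-endian bytes — 8 bytes of nanoseconds-of-day followed by a 4-byte Julian day — and the decode step is the same whether the values arrive plain or through a dictionary page. A sketch of that decode (illustrative only, not the code in this PR):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative INT96 timestamp decode, not Drill's Parquet reader code.
public class Int96Sketch {
  private static final long JULIAN_EPOCH_DAY = 2440588L; // Julian day number of 1970-01-01

  // Decode 12 bytes (8 bytes nanos-of-day + 4 bytes Julian day, little-endian)
  // into epoch milliseconds.
  static long int96ToMillis(byte[] int96) {
    ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
    long nanosOfDay = buf.getLong();
    long julianDay = buf.getInt() & 0xFFFFFFFFL; // treat as unsigned
    return (julianDay - JULIAN_EPOCH_DAY) * 86_400_000L + nanosOfDay / 1_000_000L;
  }

  public static void main(String[] args) {
    // Midnight on the epoch day: nanos = 0, Julian day = 2440588.
    ByteBuffer b = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
    b.putLong(0L).putInt(2440588);
    System.out.println(int96ToMillis(b.array())); // 0
  }
}
```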
[GitHub] drill issue #1143: DRILL-1491: Support for JDK 8
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1143 +1 ---
[GitHub] drill issue #1145: DRILL-6187: Exception in RPC communication between DataCl...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1145 +1 ---
[GitHub] drill issue #1153: DRILL-6044: Fixed shutdown button in Web UI when ssl,auth...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1153 +1 ---
[GitHub] drill issue #1160: DRILL-6224: Publish current memory usage for Drillbits in...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1160 Looks wrong dunnit? Direct memory is (still?) showing as zero? ---
[GitHub] drill pull request #1143: DRILL-1491: Support for JDK 8
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1143#discussion_r173093078 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/server/TestDrillbitResilience.java --- @@ -650,6 +650,7 @@ public void dataArrived(final QueryDataBatch result, final ConnectionThrottle th @Test // DRILL-2383: Cancellation TC 3: cancel after all result set are produced but not all are fetched @Repeat(count = NUM_RUNS) + @Ignore --- End diff -- Please log a JIRA and reference it in this ignore statement ---
[GitHub] drill pull request #1156: DRILL-6218: Update release profile to not generate...
GitHub user parthchandra opened a pull request: https://github.com/apache/drill/pull/1156 DRILL-6218: Update release profile to not generate MD5 checksum @arina-ielchiieva, @amansinha100, please review You can merge this pull request into a Git repository by running: $ git pull https://github.com/parthchandra/drill DRILL-6218 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1156.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1156 commit a9c4cfde7ae41a42d396178b29c451d21a2daca7 Author: Parth Chandra <parthc@...> Date: 2018-03-07T09:53:42Z DRILL-6218: Update release profile to not generate MD5 checksum ---
[GitHub] drill issue #1144: DRILL-6202: Deprecate usage of IndexOutOfBoundsException ...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1144 AFAIK, IndexOutOfBoundsException is thrown by Netty when a write to a DrillBuf goes out of bounds (see https://github.com/netty/netty/blob/netty-4.0.48.Final/buffer/src/main/java/io/netty/buffer/AbstractByteBuf.java#L1123 and https://github.com/netty/netty/blob/netty-4.0.48.Final/buffer/src/main/java/io/netty/buffer/UnsafeByteBufUtil.java#L283 ). This is independent of the bounds check in Drill which may be disabled. ---
[GitHub] drill issue #1146: DRILL-6204: Pass tables columns without partition columns...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1146 +1 ---
[GitHub] drill issue #1134: DRILL-6191 - Add acknowledgement sequence number and flag...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1134 +1 ---
[GitHub] drill issue #1133: DRILL-6190 - Fix handling of packets longer than legally ...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1133 +1 ---
[GitHub] drill issue #1132: DRILL-6188: Fix C++ client build on Centos7, OS X
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1132 @pwong-mapr please review ---
[GitHub] drill pull request #1132: DRILL-6188: Fix C++ client build on Centos7, OS X
GitHub user parthchandra opened a pull request: https://github.com/apache/drill/pull/1132 DRILL-6188: Fix C++ client build on Centos7, OS X Removed unused method that didn't compile. Included iostream for declaration of std::cout You can merge this pull request into a Git repository by running: $ git pull https://github.com/parthchandra/drill DRILL-6188 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1132.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1132 commit bf1f90138502300649b0945fcfc2b1b109512b52 Author: Parth Chandra <parthc@...> Date: 2018-02-27T12:01:51Z DRILL-6188: Fix C++ client build on Centos7, OS X ---
[GitHub] drill issue #1124: DRILL-6172: setValueCount of VariableLengthVectors throws...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1124 +1 ---
[GitHub] drill issue #1122: DRILL-6164: Heap memory leak during parquet scan and OOM
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1122 +1. Very nice catch! ---
[GitHub] drill issue #1024: DRILL-3640: Support JDBC Statement.setQueryTimeout(int)
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1024 +1. Looks like there are no more review comments to be addressed. ---
[GitHub] drill issue #1100: DRILL-6102: Fix ConcurrentModificationException in the Ba...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1100 +1 BTW, I think the synchronization is needed only for the historical log, but this is a debug print statement so it doesn't matter. ---
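For context on the kind of failure DRILL-6102 addresses: iterating a collection (for example, in a debug print) while it is being structurally modified fails fast. A minimal generic illustration (plain Java, not Drill's actual BaseAllocator code):

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class CmeSketch {

    // Iterates a list while appending to it; returns true if the fail-fast
    // iterator detected the structural modification. A debug print that walks
    // shared state needs the same lock as the writers (or a snapshot copy).
    static boolean iterationDetectsModification() {
        List<Integer> log = new ArrayList<>();
        log.add(1);
        log.add(2);
        try {
            for (Integer entry : log) {
                log.add(entry + 10); // structural modification during iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(iterationDetectsModification()); // true
    }
}
```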
[GitHub] drill issue #1097: DRILL-6100: Intermittent failure while reading Parquet fi...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1097 +1. ---
[GitHub] drill issue #1087: DRILL-6079: Attempt to fix memory leak in Parquet
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1087 +1 Thanks Salim. ---
[GitHub] drill issue #1086: DRILL-6076: Reduce the default memory from a total of 13G...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1086 The settings for running the unit tests are (from the pom file): -Xms512m -Xmx4096m -XX:MaxPermSize=512M -XX:MaxDirectMemorySize=4096M -XX:+CMSClassUnloadingEnabled -ea. At the very least, the default settings for Drill and the unit tests should match. ---
[GitHub] drill issue #1076: DRILL-6036: Create sys.connections table
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1076 +1 (binding) ---
[GitHub] drill pull request #1079: DRILL-6063: Set correct ThreadContext ClassLoader ...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1079#discussion_r159508398 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserClient.java --- @@ -344,7 +344,18 @@ private boolean clientNeedsAuthExceptPlain(DrillProperties props) { mechanismName = factory.getSimpleName(); logger.trace("Will try to authenticate to server using {} mechanism with encryption context {}", mechanismName, connection.getEncryptionCtxtString()); + + // Update the thread context class loader to current class loader + // See DRILL-6063 for detailed description + final ClassLoader oldThreadCtxtCL = Thread.currentThread().getContextClassLoader(); + final ClassLoader newThreadCtxtCL = this.getClass().getClassLoader(); + Thread.currentThread().setContextClassLoader(newThreadCtxtCL); + --- End diff -- makes sense ---
[GitHub] drill pull request #1079: DRILL-6063: Set correct ThreadContext ClassLoader ...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1079#discussion_r159487621 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserClient.java --- @@ -344,7 +344,18 @@ private boolean clientNeedsAuthExceptPlain(DrillProperties props) { mechanismName = factory.getSimpleName(); logger.trace("Will try to authenticate to server using {} mechanism with encryption context {}", mechanismName, connection.getEncryptionCtxtString()); + + // Update the thread context class loader to current class loader + // See DRILL-6063 for detailed description + final ClassLoader oldThreadCtxtCL = Thread.currentThread().getContextClassLoader(); + final ClassLoader newThreadCtxtCL = this.getClass().getClassLoader(); + Thread.currentThread().setContextClassLoader(newThreadCtxtCL); + --- End diff -- Is this the only place we need to update? For example, with SASL the authentication mechanism plugin would have to be in the class path when the authentication method is determined. But you have already restored the thread context. (Or does the oldThreadContextCL have the SASL jars in the classpath?) ---
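The save/set/restore pattern under discussion is typically written with a try/finally so the original loader is restored on every exit path. A generic sketch (hypothetical helper, not the actual UserClient code):

```java
// Generic sketch of temporarily switching the thread context class loader
// (TCCL) and restoring it in a finally block. Code that runs after the
// finally (e.g. a later SASL mechanism lookup) sees the original TCCL again,
// which is exactly the timing concern raised in the comment above.
public class TcclSketch {

    static String runWithOwnClassLoader(Runnable work) {
        final Thread t = Thread.currentThread();
        final ClassLoader old = t.getContextClassLoader();
        t.setContextClassLoader(TcclSketch.class.getClassLoader());
        try {
            work.run();                   // class lookups here use our loader
            return "done";
        } finally {
            t.setContextClassLoader(old); // always restore the original TCCL
        }
    }

    public static void main(String[] args) {
        ClassLoader before = Thread.currentThread().getContextClassLoader();
        System.out.println(runWithOwnClassLoader(() -> { }));
        // The context class loader is back to what it was:
        System.out.println(Thread.currentThread().getContextClassLoader() == before);
    }
}
```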
[GitHub] drill issue #1067: DRILL-3958: Return a valid error message when storage plu...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1067 +1 ---
[GitHub] drill pull request #1049: DRILL-5971: Fix INT64, INT32 logical types in comp...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1049#discussion_r158985937 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ColumnReaderFactory.java --- @@ -138,6 +138,8 @@ return new ParquetFixedWidthDictionaryReaders.DictionaryBigIntReader(recordReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, (BigIntVector) v, schemaElement); --- End diff -- Hmm. Looks like I missed INT_32 here. Added it but I cannot seem to force the test generator to generate dictionary encoded values for the logical int types. I'll make a note to enhance it later. Note that the DATE type is (apparently) handled by the code on line 101. ---
[GitHub] drill pull request #1049: DRILL-5971: Fix INT64, INT32 logical types in comp...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1049#discussion_r158986798 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetComplex.java --- @@ -193,4 +200,168 @@ public void notxistsField() throws Exception { .run(); } + @Test //DRILL-5971 + public void testComplexLogicalIntTypes() throws Exception { +String query = String.format("select t.complextype as complextype, " + +"t.uint_64 as uint_64, t.uint_32 as uint_32, t.uint_16 as uint_16, t.uint_8 as uint_8, " + +"t.int_64 as int_64, t.int_32 as int_32, t.int_16 as int_16, t.int_8 as int_8 " + +"from cp.`store/parquet/complex/logical_int_complex.parquet` t" ); +String[] columns = {"complextype", "uint_64", "uint_32", "uint_16", "uint_8", "int_64", "int_32", "int_16", "int_8" }; +testBuilder() +.sqlQuery(query) +.unOrdered() +.baselineColumns(columns) +.baselineValues( mapOf("a","a","b","b") , 0L , 0 , 0, 0 , 0L, 0, 0 ,0 ) +.baselineValues( mapOf("a","a","b","b") , -1L , -1 , -1 , -1 , -1L , -1 , -1 , -1 ) +.baselineValues( mapOf("a","a","b","b") , 1L , 1 , 1, 1 , -9223372036854775808L , 1, 1 , 1 ) +.baselineValues( mapOf("a","a","b","b") , 9223372036854775807L , 2147483647 , 65535, 255 , 9223372036854775807L , -2147483648 , -32768 , -128 ) +.build() +.run(); + } + + @Test //DRILL-5971 + public void testComplexLogicalIntTypes2() throws Exception { +byte[] bytes12 = {'1', '2', '3', '4', '5', '6', '7', '8', '9', '0', 'a', 'b' }; +byte[] bytesOnes = new byte[12]; Arrays.fill(bytesOnes, (byte)1); +byte[] bytesZeros = new byte[12]; +String query = String.format( +" select " + +" t.rowKey as rowKey, " + +" t.StringTypes._UTF8 as _UTF8, " + +" t.StringTypes._Enum as _Enum, " + +" t.NumericTypes.Int32._INT32_RAW as _INT32_RAW, " + +" t.NumericTypes.Int32._INT_8 as _INT_8, " + +" t.NumericTypes.Int32._INT_16 as _INT_16, " + +" t.NumericTypes.Int32._INT_32 as _INT_32, " + +" t.NumericTypes.Int32._UINT_8 as _UINT_8, " + +" 
t.NumericTypes.Int32._UINT_16 as _UINT_16, " + +" t.NumericTypes.Int32._UINT_32 as _UINT_32, " + +" t.NumericTypes.Int64._INT64_RAW as _INT64_RAW, " + +" t.NumericTypes.Int64._INT_64 as _INT_64, " + +" t.NumericTypes.Int64._UINT_64 as _UINT_64, " + +" t.NumericTypes.DateTimeTypes._DATE_int32 as _DATE_int32, " + +" t.NumericTypes.DateTimeTypes._TIME_MILLIS_int32 as _TIME_MILLIS_int32, " + +" t.NumericTypes.DateTimeTypes._TIMESTAMP_MILLIS_int64 as _TIMESTAMP_MILLIS_int64, " + +" t.NumericTypes.DateTimeTypes._INTERVAL_fixed_len_byte_array_12 as _INTERVAL_fixed_len_byte_array_12, " + +" t.NumericTypes.Int96._INT96_RAW as _INT96_RAW " + +" from " + +" cp.`store/parquet/complex/parquet_logical_types_complex.parquet` t " + +" order by t.rowKey " +); +String[] columns = { +"rowKey " , +"_UTF8" , +"_Enum" , +"_INT32_RAW" , +"_INT_8" , +"_INT_16" , +"_INT_32" , +"_UINT_8" , +"_UINT_16" , +"_UINT_32" , +"_INT64_RAW" , +"_INT_64" , +"_UINT_64" , +"_DATE_int32" , +"_TIME_MILLIS_int32" , +"_TIMESTAMP_MILLIS_int64" , +"_INTERVAL_fixed_len_byte_array_12" , +"_INT96_RAW" + +}; +testBuilder() +.sqlQuery(query) +.ordered() --- End diff -- In these tests the result of the query is always ordered since there is only one fragment. Ordered tests run slightly faster and if/when the test fails, ordered tests are _much_ easier to debug. ---
[GitHub] drill pull request #1049: DRILL-5971: Fix INT64, INT32 logical types in comp...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1049#discussion_r158986526 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetComplex.java --- @@ -193,4 +200,168 @@ public void notxistsField() throws Exception { .run(); } + @Test //DRILL-5971 + public void testComplexLogicalIntTypes() throws Exception { +String query = String.format("select t.complextype as complextype, " + +"t.uint_64 as uint_64, t.uint_32 as uint_32, t.uint_16 as uint_16, t.uint_8 as uint_8, " + +"t.int_64 as int_64, t.int_32 as int_32, t.int_16 as int_16, t.int_8 as int_8 " + +"from cp.`store/parquet/complex/logical_int_complex.parquet` t" ); +String[] columns = {"complextype", "uint_64", "uint_32", "uint_16", "uint_8", "int_64", "int_32", "int_16", "int_8" }; +testBuilder() +.sqlQuery(query) +.unOrdered() +.baselineColumns(columns) +.baselineValues( mapOf("a","a","b","b") , 0L , 0 , 0, 0 , 0L, 0, 0 ,0 ) +.baselineValues( mapOf("a","a","b","b") , -1L , -1 , -1 , -1 , -1L , -1 , -1 , -1 ) +.baselineValues( mapOf("a","a","b","b") , 1L , 1 , 1, 1 , -9223372036854775808L , 1, 1 , 1 ) +.baselineValues( mapOf("a","a","b","b") , 9223372036854775807L , 2147483647 , 65535, 255 , 9223372036854775807L , -2147483648 , -32768 , -128 ) +.build() +.run(); + } + + @Test //DRILL-5971 + public void testComplexLogicalIntTypes2() throws Exception { +byte[] bytes12 = {'1', '2', '3', '4', '5', '6', '7', '8', '9', '0', 'a', 'b' }; +byte[] bytesOnes = new byte[12]; Arrays.fill(bytesOnes, (byte)1); --- End diff -- Can't see how that makes things more readable. This way, the declaration and initialization is together and all the declarations are clearly readable. Changed it anyway since you asked :) ---
[GitHub] drill pull request #1049: DRILL-5971: Fix INT64, INT32 logical types in comp...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1049#discussion_r159003539 --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java --- @@ -394,9 +394,11 @@ public void get(int index, Nullable${minor.class}Holder holder){ final int offsetIndex = index * VALUE_WIDTH; final int months = data.getInt(offsetIndex); final int days= data.getInt(offsetIndex + ${minor.daysOffset}); - final int millis = data.getInt(offsetIndex + ${minor.millisecondsOffset}); + int millis = data.getInt(offsetIndex + 8); --- End diff -- Hmm. The only reason for this change was to overcome a Joda problem that caused the unit tests to fail. I fixed the unit tests and undid the change. ---
[GitHub] drill pull request #1060: DRILL-5846: Improve parquet performance for Flat D...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1060#discussion_r158549859 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/VLAbstractEntryReader.java --- @@ -0,0 +1,215 @@ +/*** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + **/ +package org.apache.drill.exec.store.parquet.columnreaders; + +import java.nio.ByteBuffer; + +import org.apache.drill.exec.store.parquet.columnreaders.VLColumnBulkInput.ColumnPrecisionInfo; +import org.apache.drill.exec.store.parquet.columnreaders.VLColumnBulkInput.PageDataInfo; +import org.apache.drill.exec.util.MemoryUtils; + +/** Abstract class for sub-classes implementing several algorithms for loading a Bulk Entry */ +abstract class VLAbstractEntryReader { + + /** byte buffer used for buffering page data */ + protected final ByteBuffer buffer; + /** Page Data Information */ + protected final PageDataInfo pageInfo; + /** expected precision type: fixed or variable length */ + protected final ColumnPrecisionInfo columnPrecInfo; + /** Bulk entry */ + protected final VLColumnBulkEntry entry; + + /** + * CTOR. 
+ * @param _buffer byte buffer for data buffering (within CPU cache) + * @param _pageInfo page being processed information + * @param _columnPrecInfo column precision information + * @param _entry reusable bulk entry object + */ + VLAbstractEntryReader(ByteBuffer _buffer, +PageDataInfo _pageInfo, +ColumnPrecisionInfo _columnPrecInfo, +VLColumnBulkEntry _entry) { + +this.buffer = _buffer; +this.pageInfo = _pageInfo; +this.columnPrecInfo = _columnPrecInfo; +this.entry = _entry; + } + + /** + * @param valuesToRead maximum values to read within the current page + * @return a bulk entry object + */ + abstract VLColumnBulkEntry getEntry(int valsToReadWithinPage); + + /** + * Indicates whether to use bulk processing + */ + protected final boolean bulkProcess() { +return columnPrecInfo.bulkProcess; + } + + /** + * Loads new data into the buffer if empty or the force flag is set. + * + * @param force flag to force loading new data into the buffer + */ + protected final boolean load(boolean force) { + +if (!force && buffer.hasRemaining()) { + return true; // NOOP +} + +// We're here either because the buffer is empty or we want to force a new load operation. +// In the case of force, there might be unprocessed data (still in the buffer) which is fine +// since the caller updates the page data buffer's offset only for the data it has consumed; this +// means unread data will be loaded again but this time will be positioned in the beginning of the +// buffer. This can happen only for the last entry in the buffer when either of its length or value +// is incomplete. +buffer.clear(); + +int remaining = remainingPageData(); +int toCopy= remaining > buffer.capacity() ? buffer.capacity() : remaining; + +if (toCopy == 0) { + return false; +} + +pageInfo.pageData.getBytes(pageInfo.pageDataOff, buffer.array(), buffer.position(), toCopy); --- End diff -- So seriously, this is faster? I would have expected the copy from direct to java heap memory to be a big issue. 
There are HDFS APIs to read into ByteBuffer (not DirectByteBuffer) that we could leverage and reduce the memory copy across direct memory and Java heap memory. ---
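The direct-to-heap copy being questioned can be illustrated with plain NIO buffers. This is a generic sketch of the buffering step (hypothetical names, not Drill's DrillBuf/page-data code):

```java
import java.nio.ByteBuffer;

// Generic sketch of the buffering step discussed above: bytes held in direct
// (off-heap) memory are copied into a heap ByteBuffer so they can then be
// processed through its backing byte[] within the CPU cache.
public class DirectToHeapCopySketch {

    // Fills a direct buffer of `directSize` bytes with 0..directSize-1,
    // copies the first `count` bytes into a heap buffer's backing array,
    // and returns the last byte copied.
    static byte copyDirectToHeap(int directSize, int count) {
        ByteBuffer direct = ByteBuffer.allocateDirect(directSize); // stands in for page data
        for (int i = 0; i < directSize; i++) {
            direct.put((byte) i);
        }
        ByteBuffer heap = ByteBuffer.allocate(count);              // stands in for the staging buffer

        direct.position(0);
        direct.get(heap.array(), 0, count); // bulk copy: direct memory -> heap array
        return heap.array()[count - 1];
    }

    public static void main(String[] args) {
        System.out.println(copyDirectToHeap(64, 32)); // 31
    }
}
```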
[GitHub] drill issue #1076: DRILL-6036: Create sys.connections table
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1076 +1. Nice work. ---
[GitHub] drill issue #1049: DRILL-5971: Fix INT64, INT32 logical types in complex par...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1049 @vvysotskyi please review - added fixes for date time and interval types, fixed the reading of dictionary encoded fixed width array types (including int96), fixed object representation of interval types (used by unit tests and sqlline/jdbc), and added a new test file generator that generates data for all (primitive) logical types in either flat or hierarchical schemas. Added those files as unit tests for the logical types. ---
[GitHub] drill issue #1055: DRILL-5973 : Support injection of time-bound pauses in se...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1055 +1. No need to squash/rebase. ---
[GitHub] drill pull request #1062: DRILL-6007: Use default refresh timeout (10 second...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1062#discussion_r155875821 --- Diff: exec/java-exec/src/main/resources/rest/index.ftl --- @@ -185,11 +185,10 @@
[GitHub] drill issue #1063: DRILL-5702: Jdbc Driver Class not found
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1063 +1 LGTM ---
[GitHub] drill pull request #1024: DRILL-3640: Support JDBC Statement.setQueryTimeout...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1024#discussion_r155402523 --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java --- @@ -156,29 +156,19 @@ public void cleanUp() { } @Override - public int getQueryTimeout() throws AlreadyClosedSqlException + public int getQueryTimeout() throws AlreadyClosedSqlException, SQLException { throwIfClosed(); -return 0; // (No no timeout.) +return super.getQueryTimeout(); } @Override - public void setQueryTimeout( int milliseconds ) + public void setQueryTimeout( int seconds ) throws AlreadyClosedSqlException, InvalidParameterSqlException, - SQLFeatureNotSupportedException { + SQLException { --- End diff -- the parent setQueryTimeout will throw a SQLException if the parameter is invalid, so this method now no longer throws an InvalidParameterSqlException ---
[GitHub] drill pull request #1024: DRILL-3640: Support JDBC Statement.setQueryTimeout...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1024#discussion_r155409333 --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java --- @@ -96,6 +105,14 @@ private void throwIfClosed() throws AlreadyClosedSqlException, throw new AlreadyClosedSqlException( "ResultSet is already closed." ); } } + +//Implicit check for whether timeout is set +if (elapsedTimer != null) { --- End diff -- Do you want to try statement.cancel() to release the memory? ---
[GitHub] drill issue #1055: DRILL-5973 : Support injection of time-bound pauses in se...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1055 This PR seems to be failing a bunch of unit tests that don't seem related. Do you need to rebase? ---
[GitHub] drill pull request #1041: DRILL-5961: For long running queries (> 10 min) Dr...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1041#discussion_r155397359 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/rpc/control/WorkEventBus.java --- @@ -74,83 +70,42 @@ public void statusUpdate(final FragmentStatus status) { public void addFragmentManager(final FragmentManager fragmentManager) { if (logger.isDebugEnabled()) { - logger.debug("Manager created: {}", QueryIdHelper.getQueryIdentifier(fragmentManager.getHandle())); + logger.debug("Fragment {} manager created: {}", QueryIdHelper.getQueryIdentifier(fragmentManager.getHandle()), fragmentManager); } final FragmentManager old = managers.putIfAbsent(fragmentManager.getHandle(), fragmentManager); - if (old != null) { -throw new IllegalStateException( -"Tried to set fragment manager when has already been set for the provided fragment handle."); -} - } - - public FragmentManager getFragmentManagerIfExists(final FragmentHandle handle) { -synchronized (this) { - return managers.get(handle); +if (old != null) { + throw new IllegalStateException( + String.format("Manager {} for fragment {} already exists.", old, QueryIdHelper.getQueryIdentifier(fragmentManager.getHandle(; } } - public FragmentManager getFragmentManager(final FragmentHandle handle) throws FragmentSetupException { -synchronized (this) { - // Check if this was a recently finished (completed or cancelled) fragment. If so, throw away message. - if (recentlyFinishedFragments.asMap().containsKey(handle)) { -if (logger.isDebugEnabled()) { - logger.debug("Fragment: {} was cancelled. Ignoring fragment handle", handle); -} -return null; - } - - // since non-leaf fragments are sent first, it is an error condition if the manager is unavailable. 
- final FragmentManager m = managers.get(handle); - if (m != null) { -return m; - } -} -throw new FragmentSetupException("Failed to receive plan fragment that was required for id: " -+ QueryIdHelper.getQueryIdentifier(handle)); + public FragmentManager getFragmentManager(final FragmentHandle handle) { +return managers.get(handle); } /** - * Removes fragment manager (for the corresponding the handle) from the work event bus. This method can be called - * multiple times. The manager will be removed only once (the first call). - * @param handle the handle to the fragment - */ - public void removeFragmentManager(final FragmentHandle handle) { -if (logger.isDebugEnabled()) { - logger.debug("Removing fragment manager: {}", QueryIdHelper.getQueryIdentifier(handle)); -} - -synchronized (this) { - final FragmentManager manager = managers.get(handle); - if (manager != null) { -recentlyFinishedFragments.put(handle, 1); -managers.remove(handle); - } else { -logger.warn("Fragment {} not found in the work bus.", QueryIdHelper.getQueryIdentifier(handle)); - } -} - } - - /** - * Cancels and removes fragment manager (for the corresponding the handle) from the work event bus, Currently, used - * for fragments waiting on data (root and intermediate). + * Optionally cancels and removes fragment manager (for the corresponding the handle) from the work event bus. Currently, used + * for fragments waiting on data (root and intermediate). This method can be called multiple times. The manager will be removed + * only once (the first call). 
* @param handle the handle to the fragment + * @param cancel * @return if the fragment was found and removed from the event bus */ - public boolean cancelAndRemoveFragmentManagerIfExists(final FragmentHandle handle) { -if (logger.isDebugEnabled()) { - logger.debug("Cancelling and removing fragment manager: {}", QueryIdHelper.getQueryIdentifier(handle)); -} - -synchronized (this) { - final FragmentManager manager = managers.get(handle); - if (manager == null) { -return false; + public boolean removeFragmentManager(final FragmentHandle handle, final boolean cancel) { +final FragmentManager manager = managers.remove(handle); +if (manager != null) { + assert !manager.isCancelled() : String.format("Fragment {} manager {} is already cancelled.", QueryIdHelper.getQueryIdentifier(handle), manager); + if (cancel) { +manager.can
[GitHub] drill pull request #1041: DRILL-5961: For long running queries (> 10 min) Dr...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1041#discussion_r155396889 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java --- @@ -277,7 +277,9 @@ public void startFragmentPendingRemote(final FragmentManager fragmentManager) { @Override protected void cleanup() { runningFragments.remove(fragmentHandle); - workBus.removeFragmentManager(fragmentHandle); + if (!fragmentManager.isCancelled()) { +workBus.removeFragmentManager(fragmentHandle, false); --- End diff -- Not sure why you don't want to cancel here. ---
[GitHub] drill pull request #1055: DRILL-5973 : Support injection of time-bound pause...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1055#discussion_r155326676 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/testing/CountDownLatchInjectionImpl.java --- @@ -41,8 +41,11 @@ private CountDownLatchInjectionImpl(@JsonProperty("address") final String address, @JsonProperty("port") final int port, @JsonProperty("siteClass") final String siteClass, - @JsonProperty("desc") final String desc) throws InjectionConfigurationException { -super(address, port, siteClass, desc, 0, 1); + @JsonProperty("desc") final String desc, + @JsonProperty("nSkip") final int nSkip, + @JsonProperty("nFire") final int nFire, + @JsonProperty("msPause") final Long msPause) throws InjectionConfigurationException { --- End diff -- long instead of Long? You will save a lot of unnecessary checking for null ---
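The long-versus-Long point generalizes beyond this constructor: a boxed parameter forces a null check at every use, while a primitive can never be null and carries a usable default. A generic sketch (hypothetical names, not the actual injection classes):

```java
// Generic sketch of the boxed-vs-primitive trade-off raised above: with a
// boxed Long every consumer must null-check, while a primitive long has no
// null state and defaults naturally to 0.
public class PauseConfigSketch {

    static long normalizeBoxed(Long msPause) {
        // Boxed: null must be handled explicitly everywhere the value is read.
        return (msPause == null) ? 0L : msPause;
    }

    static long normalizePrimitive(long msPause) {
        // Primitive: no null state exists; 0 is the natural "no pause" default.
        return msPause;
    }

    public static void main(String[] args) {
        System.out.println(normalizeBoxed(null));     // 0
        System.out.println(normalizeBoxed(250L));     // 250
        System.out.println(normalizePrimitive(250L)); // 250
    }
}
```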
[GitHub] drill issue #1050: DRILL-5964: Do not allow queries to access paths outside ...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1050 Changed the type of allowAccessOutsideWorkspace to boolean. This changes behavior. All existing dfs workspaces will also disallow access outside the workspace. Ideally, we should deprecate this parameter in a future release, but for the moment it allows existing users to access paths outside the workspace if they want to. ---
[GitHub] drill issue #1049: DRILL-5971: Fix INT64, INT32 logical types in complex par...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1049 Updated the PR to include all the remaining logical int types. While generating the test file for the unit test, found that some of the types were missing from the fast parquet reader so added those as well. ---
[GitHub] drill issue #1038: DRILL-5972: Slow performance for query on INFORMATION_SCH...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1038 +1. LGTM ---
[GitHub] drill pull request #1038: DRILL-5972: Slow performance for query on INFORMAT...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1038#discussion_r153658073 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaFilter.java --- @@ -202,14 +202,19 @@ private Result evaluateHelperFunction(Map<String, String> recordValues, Function // If at least one arg returns FALSE, then the AND function value is FALSE // If at least one arg returns INCONCLUSIVE, then the AND function value is INCONCLUSIVE // If all args return TRUE, then the AND function value is TRUE +Result result = Result.TRUE; + for(ExprNode arg : exprNode.args) { Result exprResult = evaluateHelper(recordValues, arg); - if (exprResult != Result.TRUE) { + if (exprResult == Result.FALSE) { return exprResult; } + if (exprResult == Result.INCONCLUSIVE) { --- End diff -- Just to be clear: you want to return `Result.INCONCLUSIVE` if any one of the expressions is inconclusive *and* there is no expression that is false. Correct? ---
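The semantics being confirmed are a three-valued AND: FALSE dominates, then INCONCLUSIVE, then TRUE. A standalone sketch of that fold (the `Result` enum here is local to the sketch, not Drill's actual class):

```java
// Standalone sketch of the three-valued AND described in the diff:
// any FALSE makes the result FALSE; otherwise any INCONCLUSIVE makes it
// INCONCLUSIVE; only all-TRUE yields TRUE.
public class ThreeValuedAndSketch {

    enum Result { TRUE, FALSE, INCONCLUSIVE }

    static Result and(Result... args) {
        Result result = Result.TRUE;
        for (Result r : args) {
            if (r == Result.FALSE) {
                return Result.FALSE;          // FALSE short-circuits immediately
            }
            if (r == Result.INCONCLUSIVE) {
                result = Result.INCONCLUSIVE; // remember, but keep scanning for a FALSE
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(and(Result.TRUE, Result.TRUE));          // TRUE
        System.out.println(and(Result.TRUE, Result.INCONCLUSIVE));  // INCONCLUSIVE
        System.out.println(and(Result.INCONCLUSIVE, Result.FALSE)); // FALSE
    }
}
```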
[GitHub] drill pull request #1050: DRILL-5964: Do not allow queries to access paths o...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1050#discussion_r153388800 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceConfig.java --- @@ -30,18 +30,24 @@ public class WorkspaceConfig { /** Default workspace is a root directory which supports read, but not write. */ - public static final WorkspaceConfig DEFAULT = new WorkspaceConfig("/", false, null); + public static final WorkspaceConfig DEFAULT = new WorkspaceConfig("/", false, null, false); private final String location; private final boolean writable; private final String defaultInputFormat; - + private final Boolean allowAccessOutsideWorkspace; // allow access outside the workspace by default. This --- End diff -- Yes it would, I believe. But we want the value to be `true` for backward compatibility. (This also addresses your next comment.) So we need to know if the value is missing. Can only do that with a non-primitive AFAIK. ---
[GitHub] drill pull request #1050: DRILL-5964: Do not allow queries to access paths o...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1050#discussion_r152862636 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java --- @@ -252,11 +252,15 @@ private static String buildPath(final String[] path, final int folderIndex) { return builder.toString(); } - public static FileSelection create(final DrillFileSystem fs, final String parent, final String path) throws IOException { + public static FileSelection create(final DrillFileSystem fs, final String parent, final String path, + final boolean allowAccessOutsideWorkspace) throws IOException { Stopwatch timer = Stopwatch.createStarted(); boolean hasWildcard = path.contains(WILD_CARD); final Path combined = new Path(parent, removeLeadingSlash(path)); +if (!allowAccessOutsideWorkspace) { + checkBackPaths(parent, combined.toString(), path); --- End diff -- Done ---
[GitHub] drill pull request #1050: DRILL-5964: Do not allow queries to access paths o...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1050#discussion_r152862954 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceConfig.java --- @@ -30,18 +30,24 @@ public class WorkspaceConfig { /** Default workspace is a root directory which supports read, but not write. */ - public static final WorkspaceConfig DEFAULT = new WorkspaceConfig("/", false, null); + public static final WorkspaceConfig DEFAULT = new WorkspaceConfig("/", false, null, false); private final String location; private final boolean writable; private final String defaultInputFormat; - + private final Boolean allowAccessOutsideWorkspace; // allow access outside the workspace by default. This --- End diff -- I need to check if the value is not present (i.e. null). That will be the case with all storage plugin configurations that have already been created. ---
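The null-means-missing pattern described here can be sketched without Jackson: a boxed Boolean distinguishes "absent from a pre-existing config" from an explicit false, so older plugin configurations fall back to the backward-compatible value. (Hypothetical helper, not the actual WorkspaceConfig code.)

```java
// Sketch of the tri-state Boolean pattern discussed above: null means the
// field was absent from a pre-existing configuration, so we fall back to the
// backward-compatible default (true); an explicit true/false from a newer
// configuration wins.
public class TriStateBooleanSketch {

    static boolean allowAccessOutsideWorkspace(Boolean configured) {
        if (configured == null) {
            return true;   // field missing in old configs: keep old behavior
        }
        return configured; // explicit value from a new config
    }

    public static void main(String[] args) {
        System.out.println(allowAccessOutsideWorkspace(null));          // true (legacy default)
        System.out.println(allowAccessOutsideWorkspace(Boolean.FALSE)); // false
        System.out.println(allowAccessOutsideWorkspace(Boolean.TRUE));  // true
    }
}
```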
[GitHub] drill pull request #1050: DRILL-5964: Do not allow queries to access paths o...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1050#discussion_r152862722 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java --- @@ -359,15 +363,30 @@ private static Path handleWildCard(final String root) { } } - private static String removeLeadingSlash(String path) { -if (path.charAt(0) == '/') { + public static String removeLeadingSlash(String path) { +if (!path.isEmpty() && path.charAt(0) == '/') { String newPath = path.substring(1); return removeLeadingSlash(newPath); } else { return path; } } + // Check if the path is a valid sub path under the parent after removing backpaths. Throw an exception if + // it is not + // We pass subpath in as a parameter only for the error message + public static boolean checkBackPaths(String parent, String combinedPath, String subpath) { +Preconditions.checkArgument(!parent.isEmpty()); +Preconditions.checkArgument(!combinedPath.isEmpty()); --- End diff -- Done ---
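A back-path check of the kind added in this PR can be sketched with `java.nio.file.Path`: normalize the combined path (collapsing `..` segments) and verify it still starts with the workspace root. (Illustrative only; Drill's actual `checkBackPaths` uses its own string handling and error reporting.)

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of a back-path check: normalizing collapses ".." segments, after
// which a path that escaped the workspace root no longer starts with it.
public class BackPathCheckSketch {

    static boolean isUnderParent(String parent, String combined) {
        Path parentPath = Paths.get(parent).normalize();
        Path combinedPath = Paths.get(combined).normalize();
        return combinedPath.startsWith(parentPath);
    }

    public static void main(String[] args) {
        System.out.println(isUnderParent("/ws", "/ws/data/file.parquet")); // true
        System.out.println(isUnderParent("/ws", "/ws/../etc/passwd"));     // false
    }
}
```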