[jira] [Created] (DRILL-6354) Errors from drill-config.sh on Mac
Paul Rogers created DRILL-6354:
--
Summary: Errors from drill-config.sh on Mac
Key: DRILL-6354
URL: https://issues.apache.org/jira/browse/DRILL-6354
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.14.0
Reporter: Paul Rogers

Built Drill 1.14 from sources. Tried to start a Drillbit and got errors from {{drill-config.sh}}:

{code}
apache-drill-1.14.0-SNAPSHOT$ bin/drillbit.sh --site ~/bin/site start
.../drill/distribution/target/apache-drill-1.14.0-SNAPSHOT/apache-drill-1.14.0-SNAPSHOT/bin/drill-config.sh: line 141: let: lineCount=: syntax error: operand expected (error token is "=")
...drill/distribution/target/apache-drill-1.14.0-SNAPSHOT/apache-drill-1.14.0-SNAPSHOT/bin/drill-config.sh: line 141: let: lineCount=: syntax error: operand expected (error token is "=")
Starting drillbit, logging to .../drill/distribution/target/apache-drill-1.14.0-SNAPSHOT/apache-drill-1.14.0-SNAPSHOT/log/drillbit.out
{code}

The Drillbit did appear to start. Got even more errors when I tried to check status:

{code}
apache-drill-1.14.0-SNAPSHOT $ bin/drillbit.sh status
.../drill/distribution/target/apache-drill-1.14.0-SNAPSHOT/apache-drill-1.14.0-SNAPSHOT/bin/drill-config.sh: line 141: let: lineCount=: syntax error: operand expected (error token is "=")
.../drill/distribution/target/apache-drill-1.14.0-SNAPSHOT/apache-drill-1.14.0-SNAPSHOT/bin/drill-config.sh: line 141: let: lineCount=: syntax error: operand expected (error token is "=")
.../drill/distribution/target/apache-drill-1.14.0-SNAPSHOT/apache-drill-1.14.0-SNAPSHOT/bin/drill-config.sh: line 141: let: lineCount=: syntax error: operand expected (error token is "=")
.../drill/distribution/target/apache-drill-1.14.0-SNAPSHOT/apache-drill-1.14.0-SNAPSHOT/bin/drill-config.sh: line 141: let: lineCount=: syntax error: operand expected (error token is "=")
drillbit is running.
{code}

Obviously, the error itself needs to be fixed. Second, why are we executing that same line two or four times?
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
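[Editor's note] The `let: lineCount=: syntax error: operand expected` message is the classic bash failure mode where the right-hand side of `let` expands to an empty string. A hedged sketch of the general defensive pattern follows; the function and file names are made up for illustration, and this is not the actual drill-config.sh code:

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the failure class and a defensive fix; not
# the real drill-config.sh logic. On macOS, BSD `wc -l` pads its output
# with spaces, and a failed command substitution yields an empty string,
# so `let lineCount=$value` can expand to `let lineCount=` and bash then
# reports "operand expected".

count_lines() {
  local file="$1"
  local raw
  # Strip all whitespace and fall back to 0 so arithmetic never sees an
  # empty operand.
  raw=$(wc -l < "$file" 2>/dev/null | tr -d '[:space:]')
  echo "${raw:-0}"
}

tmp=$(mktemp)
printf 'a\nb\nc\n' > "$tmp"
lineCount=$(count_lines "$tmp")
let "lineCount += 0"   # safe now: the operand is always numeric
echo "lineCount=$lineCount"
rm -f "$tmp"
```

With an empty or unreadable file the function returns `0` instead of an empty string, so `let` no longer fails; the duplicate error lines in the report suggest the unguarded line simply runs once per invocation of the script.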
[jira] [Created] (DRILL-6353) Upgrade Parquet MR dependencies
Vlad Rozov created DRILL-6353:
-
Summary: Upgrade Parquet MR dependencies
Key: DRILL-6353
URL: https://issues.apache.org/jira/browse/DRILL-6353
Project: Apache Drill
Issue Type: Task
Reporter: Vlad Rozov
Assignee: Vlad Rozov

Upgrade from the custom build {{1.8.1-drill-r0}} to the Apache release {{1.10.0}}.
[GitHub] drill pull request #1237: DRILL-6348: Fixed code so that Unordered Receiver ...
Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1237#discussion_r183898352

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java ---
@@ -182,13 +184,18 @@ public IterOutcome next() {
       return IterOutcome.OUT_OF_MEMORY;
     }

+    // Transfer the ownership of this raw-batch to this operator for proper memory statistics reporting
+    batch = batch.transferBodyOwnership(oContext.getAllocator());
--- End diff --

@HanumathRao,
- This will not affect the outcome, since the unordered receiver is a child of the context allocator (the minor fragment).
- Doing the check earlier seemed cleaner, as there is no point in changing ownership when there is already an out-of-memory condition.
- I think the code was written this way because, implicitly, the network handlers receive the batch and then perform a change of ownership (from the RPC allocator to the context allocator); this step could lead to an out-of-memory condition at the fragment level.
- If your point is about reporting, that is, attributing the OOM condition to the child operator, then I guess I don't have a problem with that. @parthchandra, what do you think?

---
[GitHub] drill pull request #1237: DRILL-6348: Fixed code so that Unordered Receiver ...
Github user HanumathRao commented on a diff in the pull request:

https://github.com/apache/drill/pull/1237#discussion_r183886561

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java ---
@@ -182,13 +184,18 @@ public IterOutcome next() {
       return IterOutcome.OUT_OF_MEMORY;
     }

+    // Transfer the ownership of this raw-batch to this operator for proper memory statistics reporting
+    batch = batch.transferBodyOwnership(oContext.getAllocator());
--- End diff --

@sachouche Thank you for the changes. I was just contemplating whether the change below:

batch = batch.transferBodyOwnership(oContext.getAllocator());

needs to be moved before the check at line number 183.

---
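[Editor's note] The ordering question above (transfer ownership before or after the out-of-memory check) can be pictured with a small toy model. Everything here is invented for illustration; `Allocator` and `RawBatch` are stand-ins, not Drill's real memory classes:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of batch-ownership transfer between allocators. In the PR
// under review, transferring after the OOM check means a batch that
// arrives during an out-of-memory condition is never attributed to the
// receiving operator's allocator.
public class OwnershipOrderSketch {
    static class Allocator {
        final String name;
        final List<String> owned = new ArrayList<>();
        Allocator(String name) { this.name = name; }
    }

    static class RawBatch {
        Allocator owner;
        RawBatch(Allocator initialOwner) {
            this.owner = initialOwner;
            initialOwner.owned.add("batch");
        }
        // Mirrors the idea of transferBodyOwnership(): the accounting for
        // the batch body moves from the current allocator to the target.
        RawBatch transferBodyOwnership(Allocator target) {
            owner.owned.remove("batch");
            target.owned.add("batch");
            owner = target;
            return this;
        }
    }

    public static void main(String[] args) {
        Allocator rpc = new Allocator("rpc");
        Allocator operator = new Allocator("unordered-receiver");
        RawBatch batch = new RawBatch(rpc);

        boolean outOfMemory = false; // stands in for the check at line ~183
        if (!outOfMemory) {
            // Transfer only on the success path, as in the reviewed code.
            batch = batch.transferBodyOwnership(operator);
        }
        System.out.println(batch.owner.name);
    }
}
```

Whether the transfer belongs before or after the check is exactly the reporting question raised in the thread: transferring first attributes a subsequent OOM to the child operator, transferring after leaves it with the parent.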
[GitHub] drill issue #1238: DRILL-6281: Refactor TimedRunnable
Github user HanumathRao commented on the issue:

https://github.com/apache/drill/pull/1238

@vrozov Thanks for making the changes. The code changes look good to me.

---
[GitHub] drill pull request #1238: DRILL-6281: Refactor TimedRunnable
Github user HanumathRao commented on a diff in the pull request:

https://github.com/apache/drill/pull/1238#discussion_r183875553

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/TimedCallable.java ---
@@ -0,0 +1,265 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Objects;
+import java.util.concurrent.Callable;
+import java.util.concurrent.CancellationException;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import org.apache.drill.common.exceptions.UserException;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Preconditions;
+import com.google.common.base.Stopwatch;
+import com.google.common.util.concurrent.MoreExecutors;
+import com.google.common.util.concurrent.ThreadFactoryBuilder;
+
+/**
+ * Class used to allow parallel executions of tasks in a simplified way. Also maintains and reports timings of task completion.
+ * TODO: look at switching to fork join.
+ * @param <V> The time value that will be returned when the task is executed.
+ */
+public abstract class TimedCallable<V> implements Callable<V> {
+  private static final Logger logger = LoggerFactory.getLogger(TimedCallable.class);
+
+  private static long TIMEOUT_PER_RUNNABLE_IN_MSECS = 15000;
+
+  private volatile long startTime = 0;
+  private volatile long executionTime = -1;
+
+  private static class FutureMapper<V> implements Function<Future<V>, V> {
+    int count;
+    Throwable throwable = null;
+
+    private void setThrowable(Throwable t) {
+      if (throwable == null) {
+        throwable = t;
+      } else {
+        throwable.addSuppressed(t);
+      }
+    }
+
+    @Override
+    public V apply(Future<V> future) {
+      Preconditions.checkState(future.isDone());
+      if (!future.isCancelled()) {
+        try {
+          count++;
+          return future.get();
+        } catch (InterruptedException e) {
+          // there is no wait as we are getting the result from the completed/done future
+          logger.error("Unexpected exception", e);
+          throw UserException.internalError(e)
+              .message("Unexpected exception")
+              .build(logger);
+        } catch (ExecutionException e) {
+          setThrowable(e.getCause());
+        }
+      } else {
+        setThrowable(new CancellationException());
+      }
+      return null;
+    }
+  }
+
+  private static class Statistics<V> implements Consumer<TimedCallable<V>> {
+    final long start = System.nanoTime();
+    final Stopwatch watch = Stopwatch.createStarted();
+    long totalExecution;
+    long maxExecution;
+    int count;
+    int startedCount;
+    private int doneCount;
+    // measure thread creation times
+    long earliestStart;
+    long latestStart;
+    long totalStart;
+
+    @Override
+    public void accept(TimedCallable<V> task) {
+      count++;
+      long threadStart = task.getStartTime(TimeUnit.NANOSECONDS) - start;
+      if (threadStart >= 0) {
+        startedCount++;
+        earliestStart = Math.min(earliestStart, threadStart);
+        latestStart = Math.max(latestStart, threadStart);
+        totalStart += threadStart;
+        long executionTime = task.getExecutionTime(TimeUnit.NANOSECONDS);
+        if (executionTime != -1) {
+          doneCount++;
+          totalExecution += executionTime;
+          maxExecution = Math.m
[GitHub] drill pull request #1189: DRILL-6282: Excluding io.dropwizard.metrics depend...
Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1189#discussion_r183870299

--- Diff: pom.xml ---
@@ -1333,6 +1353,12 @@
+
--- End diff --

Where is the dependency used? Can it be replaced with `io.dropwizard.metrics`?

---
[GitHub] drill pull request #1189: DRILL-6282: Excluding io.dropwizard.metrics depend...
Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1189#discussion_r183868650

--- Diff: pom.xml ---
@@ -1164,7 +1164,27 @@
       io.dropwizard.metrics
       metrics-core
-      4.0.2
+      4.1.0-rc0
--- End diff --

Define the version as a property, and avoid using an `rc` version unless it provides a significant benefit.

---
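[Editor's note] The "version as a property" pattern the reviewer asks for would look roughly like the sketch below in the root pom.xml. The property name `dropwizard.metrics.version` is illustrative, not necessarily what the project ended up using:

```xml
<!-- Sketch of the suggested change; the property name is made up here. -->
<properties>
  <dropwizard.metrics.version>4.0.2</dropwizard.metrics.version>
</properties>

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>io.dropwizard.metrics</groupId>
      <artifactId>metrics-core</artifactId>
      <!-- Single point of change for all modules; prefer a final release
           over a release candidate such as 4.1.0-rc0. -->
      <version>${dropwizard.metrics.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

This keeps every module's copy of the dependency on the same version and makes future upgrades a one-line change.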
[GitHub] drill pull request #1234: DRILL-5927: Fixed memory leak in TestBsonRecordRea...
Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1234#discussion_r183865249

--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/bson/TestBsonRecordReader.java ---
@@ -45,21 +46,24 @@
 import org.bson.BsonTimestamp;
 import org.bson.BsonWriter;
 import org.bson.types.ObjectId;
-import org.junit.AfterClass;
-import org.junit.BeforeClass;
+import org.junit.After;
+import org.junit.Before;
 import org.junit.Test;

-public class TestBsonRecordReader extends BaseTestQuery {
-  private static VectorContainerWriter writer;
-  private static TestOutputMutator mutator;
-  private static BsonRecordReader bsonReader;
+public class TestBsonRecordReader {
+  private BufferAllocator allocator;
+  private VectorContainerWriter writer;
+  private TestOutputMutator mutator;
+  private DrillBuf buffer;
+  private BsonRecordReader bsonReader;

-  @BeforeClass
-  public static void setUp() {
-    BufferAllocator bufferAllocator = getDrillbitContext().getAllocator();
-    mutator = new TestOutputMutator(bufferAllocator);
+  @Before
+  public void setUp() {
+    allocator = new RootAllocator(400_000);
--- End diff --

Two minor suggestions:
- Use `RootAllocator` with an unlimited amount of memory, as limiting it to 400K does not test `BsonRecordReader` and depends more on how vectors are allocated.
- Instead of working with `DrillBuf` directly, use `BufferManager`.

---
[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...
Github user jiang-wu commented on a diff in the pull request:

https://github.com/apache/drill/pull/1184#discussion_r183862162

--- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
@@ -509,15 +509,15 @@ public long getTwoAsLong(int index) {
   public ${friendlyType} getObject(int index) {
     org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), org.joda.time.DateTimeZone.UTC);
     date = date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault());
-    return date;
+    return new java.sql.Date(date.getMillis());
--- End diff --

As I work through using Local[Date|Time|DateTime] inside the vector package, I notice that it creates the following inconsistency in the JDBC output:

SqlAccessor provides "getDate()", "getTime()", and "getTimestamp()", which return java.sql.[Date|Time|Timestamp]. These convert Local[Date|Time|DateTime] into java.sql.[Date|Time|Timestamp].

For complex objects, SqlAccessor provides "getObject()", which returns JsonStringHashMap or JsonStringArrayList. If the Local[Date|Time|DateTime] objects are inside the map or list, they will NOT be converted into java.sql.[Date|Time|Timestamp].

Example: `select t.context.date, t.context from test t;` will return a java.sql.Date object for column 1, but a java.time.LocalDate for the same object inside column 2. This doesn't seem like a good thing. What would be the right thing to do here? Introduce SqlAccessor.getLocal[Date|Time|Timestamp] accessors to supplement the existing get[Date|Time|Timestamp]?

---
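[Editor's note] The asymmetry described above can be shown in isolation with plain JDK types. This is a hypothetical illustration, not Drill's SqlAccessor code; the two methods model a typed accessor that converts and a generic `getObject()` that does not:

```java
import java.sql.Date;
import java.time.LocalDate;

// Model of the JDBC inconsistency described in the comment: a typed
// accessor converts java.time values to java.sql types, while a generic
// getObject() hands back whatever is stored, unchanged.
public class AccessorConversionSketch {
    // Typed accessor: converts the stored LocalDate to java.sql.Date.
    static Date getDate(LocalDate stored) {
        // java.sql.Date.valueOf is the standard LocalDate -> Date bridge.
        return Date.valueOf(stored);
    }

    // Generic accessor: values nested inside maps/lists come back as-is.
    static Object getObject(LocalDate stored) {
        return stored;
    }

    public static void main(String[] args) {
        LocalDate stored = LocalDate.of(2018, 4, 24);
        System.out.println(getDate(stored).getClass().getSimpleName());   // prints "Date"
        System.out.println(getObject(stored).getClass().getSimpleName()); // prints "LocalDate"
    }
}
```

So the same stored value surfaces as two different classes depending on which accessor path the client hits, which is exactly the column-1 versus column-2 behavior in the `select t.context.date, t.context` example.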
[GitHub] drill issue #1234: DRILL-5927: Fixed memory leak in TestBsonRecordReader, an...
Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1234 @vrozov addressed comments ---
[GitHub] drill pull request #1234: DRILL-5927: Fixed memory leak in TestBsonRecordRea...
Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1234#discussion_r183846974 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/bson/TestBsonRecordReader.java --- @@ -272,6 +276,9 @@ public static void cleanUp() { } catch (Exception e) { } + +buffer.close(); --- End diff -- k thx ---
[GitHub] drill pull request #1234: DRILL-5927: Fixed memory leak in TestBsonRecordRea...
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1234#discussion_r183846385

--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/bson/TestBsonRecordReader.java ---
@@ -49,17 +50,20 @@
 import org.junit.BeforeClass;
 import org.junit.Test;

-public class TestBsonRecordReader extends BaseTestQuery {
+public class TestBsonRecordReader {
+  private static BufferAllocator allocator;
--- End diff --

Initializing in Before and closing in After works. Changed the variables to be non-static as well.

---
[GitHub] drill pull request #1234: DRILL-5927: Fixed memory leak in TestBsonRecordRea...
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1234#discussion_r183846152

--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/bson/TestBsonRecordReader.java ---
@@ -49,17 +50,20 @@
 import org.junit.BeforeClass;
 import org.junit.Test;

-public class TestBsonRecordReader extends BaseTestQuery {
+public class TestBsonRecordReader {
+  private static BufferAllocator allocator;
   private static VectorContainerWriter writer;
   private static TestOutputMutator mutator;
+  private static DrillBuf buffer;
   private static BsonRecordReader bsonReader;

   @BeforeClass
   public static void setUp() {
-    BufferAllocator bufferAllocator = getDrillbitContext().getAllocator();
-    mutator = new TestOutputMutator(bufferAllocator);
+    allocator = new RootAllocator(9_000_00);
--- End diff --

After addressing review comments and recompiling, I observed different behavior. Now the tests pass with 400kb on the root allocator, but reducing it to 300kb causes an IOB. I have filed an issue here: https://issues.apache.org/jira/browse/DRILL-6352

---
[jira] [Created] (DRILL-6352) Investigate Why TestBsonRecordReader needs 900kb to run
Timothy Farkas created DRILL-6352:
-
Summary: Investigate Why TestBsonRecordReader needs 900kb to run
Key: DRILL-6352
URL: https://issues.apache.org/jira/browse/DRILL-6352
Project: Apache Drill
Issue Type: Bug
Reporter: Timothy Farkas

TestBsonRecordReader requires 400kb on the allocator in order to run all tests successfully. This seems like too much. Reducing the memory below that to 300kb will cause an IOB:

{code}
objc[92518]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home/bin/java (0x10e8fe4c0) and /Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home/jre/lib/libinstrument.dylib (0x10e9824e0). One of the two will be used. Which one is undefined.
java.lang.IndexOutOfBoundsException: DrillBuf[7], udle: [1 0..0], index: 0, length: 4 (expected: range(0, 0)) DrillBuf[7], udle: [1 0..0]
    at org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80)
    at org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86)
    at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114)
    at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484)
    at org.apache.drill.exec.vector.VarCharVector$Mutator.setSafe(VarCharVector.java:696)
    at org.apache.drill.exec.vector.NullableVarCharVector$Mutator.setSafe(NullableVarCharVector.java:609)
    at org.apache.drill.exec.vector.complex.impl.NullableVarCharWriterImpl.write(NullableVarCharWriterImpl.java:110)
    at org.apache.drill.exec.store.bson.BsonRecordReader.writeString(BsonRecordReader.java:276)
    at org.apache.drill.exec.store.bson.BsonRecordReader.writeToListOrMap(BsonRecordReader.java:167)
    at org.apache.drill.exec.store.bson.BsonRecordReader.writeToListOrMap(BsonRecordReader.java:139)
    at org.apache.drill.exec.store.bson.BsonRecordReader.write(BsonRecordReader.java:75)
    at org.apache.drill.exec.store.bson.TestBsonRecordReader.testRecursiveDocuments(TestBsonRecordReader.java:193)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
    at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
{code}
[jira] [Created] (DRILL-6351) Drill fails with NullPointerException when starting in embedded mode
Volodymyr Vysotskyi created DRILL-6351:
--
Summary: Drill fails with NullPointerException when starting in embedded mode
Key: DRILL-6351
URL: https://issues.apache.org/jira/browse/DRILL-6351
Project: Apache Drill
Issue Type: Bug
Reporter: Volodymyr Vysotskyi

When starting Drill in embedded mode while another Drill instance is already running, it fails with an NPE:

{noformat}
java.lang.NullPointerException
    at org.apache.drill.exec.coord.local.LocalClusterCoordinator.update(LocalClusterCoordinator.java:98)
    at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:231)
    at org.apache.drill.jdbc.impl.DrillConnectionImpl.cleanup(DrillConnectionImpl.java:827)
    at org.apache.drill.jdbc.impl.DrillConnectionImpl.<init>(DrillConnectionImpl.java:186)
    at org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(DrillJdbc41Factory.java:72)
    at org.apache.drill.jdbc.impl.DrillFactory.newConnection(DrillFactory.java:68)
    at org.apache.calcite.avatica.UnregisteredDriver.connect(UnregisteredDriver.java:138)
    at org.apache.drill.jdbc.Driver.connect(Driver.java:72)
    at sqlline.DatabaseConnection.connect(DatabaseConnection.java:167)
    at sqlline.DatabaseConnection.getConnection(DatabaseConnection.java:213)
    at sqlline.Commands.close(Commands.java:925)
    at sqlline.Commands.quit(Commands.java:889)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36)
    at sqlline.SqlLine.dispatch(SqlLine.java:742)
    at sqlline.SqlLine.begin(SqlLine.java:621)
    at sqlline.SqlLine.start(SqlLine.java:375)
    at sqlline.SqlLine.main(SqlLine.java:268)
{noformat}

The correct exception with stack trace is:

{noformat}
Error: Failure in starting embedded Drillbit: java.net.BindException: Address already in use (state=,code=0)
java.sql.SQLException: Failure in starting embedded Drillbit: java.net.BindException: Address already in use
    at org.apache.drill.jdbc.impl.DrillConnectionImpl.<init>(DrillConnectionImpl.java:144)
    at org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(DrillJdbc41Factory.java:72)
    at org.apache.drill.jdbc.impl.DrillFactory.newConnection(DrillFactory.java:68)
    at org.apache.calcite.avatica.UnregisteredDriver.connect(UnregisteredDriver.java:138)
    at org.apache.drill.jdbc.Driver.connect(Driver.java:72)
    at sqlline.DatabaseConnection.connect(DatabaseConnection.java:167)
    at sqlline.DatabaseConnection.getConnection(DatabaseConnection.java:213)
    at sqlline.Commands.connect(Commands.java:1083)
    at sqlline.Commands.connect(Commands.java:1015)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36)
    at sqlline.SqlLine.dispatch(SqlLine.java:742)
    at sqlline.SqlLine.initArgs(SqlLine.java:528)
    at sqlline.SqlLine.begin(SqlLine.java:596)
    at sqlline.SqlLine.start(SqlLine.java:375)
    at sqlline.SqlLine.main(SqlLine.java:268)
Caused by: java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:433)
    at sun.nio.ch.Net.bind(Net.java:425)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:279)
    at org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80)
    at org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:218)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
    at org.eclipse.jetty.server.Server.doStart(Server.java:337)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
    at org.apache.drill.exec.server.rest.WebServer.start(WebServer.java:168)
    at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:193)
    at org.apache.drill.jdbc.impl.DrillConnectionImpl.<init>(DrillConnectionImpl.java:135)
    ... 18 more
{noformat}

This is a regression caused by changes introduced
[GitHub] drill issue #1189: DRILL-6282: Excluding io.dropwizard.metrics dependencies
Github user vdiravka commented on the issue:

https://github.com/apache/drill/pull/1189

@vrozov I have replaced `com.codahale.metrics` with the latest `io.dropwizard.metrics`. Please review.

---
[GitHub] drill issue #1233: Updated with links to previous releases
Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/1233

Agreed. We do need a single place for showing the releases. I'll work with Bridget to make the change in both places.

---
[GitHub] drill pull request #1239: DRILL-143: CGroup Support for Drill-on-YARN
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1239#discussion_r183805056

--- Diff: distribution/src/resources/yarn-drillbit.sh ---
@@ -110,6 +114,36 @@
 # Enables Java GC logging. Passed from the drill.yarn.drillbit.log-gc
 # garbage collection option.

+### Function to enforce CGroup (Refer local drillbit.sh)
+check_and_enforce_cgroup(){
+  dbitPid=$1;
+  kill -0 $dbitPid
+  if [ $? -gt 0 ]; then
+    echo "ERROR: Failed to add Drillbit to CGroup ( $DRILLBIT_CGROUP ) for 'cpu'. Ensure that the Drillbit ( pid=$dbitPid ) started up." >&2
+    exit 1
+  fi
+  SYS_CGROUP_DIR=${SYS_CGROUP_DIR:-"/sys/fs/cgroup"}
+  if [ -f $SYS_CGROUP_DIR/cpu/$DRILLBIT_CGROUP/cgroup.procs ]; then
+    echo $dbitPid > $SYS_CGROUP_DIR/cpu/$DRILLBIT_CGROUP/cgroup.procs
+    # Verify Enforcement
+    cgroupStatus=`grep -w $pid $SYS_CGROUP_DIR/cpu/${DRILLBIT_CGROUP}/cgroup.procs`
+    if [ -z "$cgroupStatus" ]; then
--- End diff --

Fixed the changes.

---
[GitHub] drill issue #1226: DRILL-3855: Enable FilterSetOpTransposeRule, DrillProject...
Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/1226 Thanks for making the changes. +1 ---
[GitHub] drill pull request #1232: DRILL-6094: Decimal data type enhancements
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1232#discussion_r183777825

--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestVarDecimalFunctions.java ---
@@ -0,0 +1,911 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.fn.impl;
+
+import org.apache.drill.categories.SqlFunctionTest;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.test.BaseTestQuery;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+import java.math.BigDecimal;
+import java.math.MathContext;
+import java.math.RoundingMode;
+
+@Category(SqlFunctionTest.class)
+public class TestVarDecimalFunctions extends BaseTestQuery {
+
+  // Tests for math functions
+
+  @Test
+  public void testDecimalAdd() throws Exception {
+    String query =
+        "select\n" +
+            // checks trimming of scale
+            "cast('999.92345678912' as DECIMAL(38, 11))\n" +
+            "+ cast('0.32345678912345678912345678912345678912' as DECIMAL(38, 38)) as s1,\n" +
+            // sanitary checks
+            "cast('1234567891234567891234567891234567.89' as DECIMAL(36, 2))\n" +
+            "+ cast('123456789123456789123456789123456.789' as DECIMAL(36, 3)) as s2,\n" +
+            "cast('1234567891234567891234567891234567.89' as DECIMAL(36, 2))\n" +
+            "+ cast('0' as DECIMAL(36, 3)) as s3,\n" +
+            "cast('15.02' as DECIMAL(4, 2)) - cast('12.93' as DECIMAL(4, 2)) as s4,\n" +
+            "cast('11.02' as DECIMAL(4, 2)) - cast('12.93' as DECIMAL(4, 2)) as s5,\n" +
+            "cast('0' as DECIMAL(36, 2)) - cast('12.93' as DECIMAL(36, 2)) as s6,\n" +
+            // check trimming (negative scale)
+            "cast('2345678912' as DECIMAL(38, 0))\n" +
+            "+ cast('32345678912345678912345678912345678912' as DECIMAL(38, 0)) as s7";
+    try {
+      alterSession(PlannerSettings.ENABLE_DECIMAL_DATA_TYPE_KEY, true);
--- End diff --

Consider using `@BeforeClass` and `@AfterClass` for the decimal option.

---
[GitHub] drill pull request #1232: DRILL-6094: Decimal data type enhancements
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1232#discussion_r183765063

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/FrameSupportTemplate.java ---
@@ -300,7 +300,7 @@ public void cleanup() {
    * @param index of row to aggregate
    */
   public abstract void evaluatePeer(@Named("index") int index);
-  public abstract void setupEvaluatePeer(@Named("incoming") VectorAccessible incoming, @Named("outgoing") VectorAccessible outgoing) throws SchemaChangeException;
+  public abstract void setupEvaluatePeer(@Named("incoming") WindowDataBatch incoming, @Named("outgoing") VectorAccessible outgoing) throws SchemaChangeException;
--- End diff --

Could you please explain this change?

---
[GitHub] drill pull request #1232: DRILL-6094: Decimal data type enhancements
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1232#discussion_r183771627

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/TypeInferenceUtils.java ---
@@ -382,13 +407,26 @@ public RelDataType inferReturnType(SqlOperatorBinding opBinding) {
       final RelDataType operandType = opBinding.getOperandType(0);
       final TypeProtos.MinorType inputMinorType = getDrillTypeFromCalciteType(operandType);
-      if(TypeCastRules.getLeastRestrictiveType(Lists.newArrayList(inputMinorType, TypeProtos.MinorType.BIGINT))
+      if (TypeCastRules.getLeastRestrictiveType(Lists.newArrayList(inputMinorType, TypeProtos.MinorType.BIGINT))
           == TypeProtos.MinorType.BIGINT) {
         return createCalciteTypeWithNullability(
             factory,
             SqlTypeName.BIGINT,
             isNullable);
-      } else if(TypeCastRules.getLeastRestrictiveType(Lists.newArrayList(inputMinorType, TypeProtos.MinorType.FLOAT8))
+      } else if (TypeCastRules.getLeastRestrictiveType(Lists.newArrayList(inputMinorType, TypeProtos.MinorType.FLOAT4))
--- End diff --

Please add an explanation.

---
[GitHub] drill pull request #1232: DRILL-6094: Decimal data type enhancements
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1232#discussion_r183352077

--- Diff: exec/vector/src/main/codegen/templates/AbstractFieldReader.java ---
@@ -29,9 +29,9 @@
  * This class is generated using freemarker and the ${.template_name} template.
  */
 @SuppressWarnings("unused")
-abstract class AbstractFieldReader extends AbstractBaseReader implements FieldReader{
+abstract class AbstractFieldReader extends AbstractBaseReader implements FieldReader {

-  AbstractFieldReader(){
+  AbstractFieldReader() {
--- End diff --

You can remove `super()`.

---
[GitHub] drill pull request #1232: DRILL-6094: Decimal data type enhancements
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1232#discussion_r183358609

--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestVarlenDecimal.java ---
@@ -17,61 +17,137 @@
  */
 package org.apache.drill.exec.store.parquet;

+import org.apache.commons.io.FileUtils;
+import org.apache.drill.exec.ExecConstants;
 import org.apache.drill.test.BaseTestQuery;
 import org.apache.drill.exec.planner.physical.PlannerSettings;
-import org.apache.drill.exec.proto.UserBitShared;
-import org.junit.BeforeClass;
-import org.junit.AfterClass;
+import org.hamcrest.CoreMatchers;
+import org.junit.Assert;
 import org.junit.Test;

-public class TestVarlenDecimal extends BaseTestQuery {
-  // enable decimal data type
-  @BeforeClass
--- End diff --

Why did you remove the before and after class methods? Instead, you are setting `alterSession(PlannerSettings.ENABLE_DECIMAL_DATA_TYPE_KEY, true);` in the tests...

---
[GitHub] drill pull request #1232: DRILL-6094: Decimal data type enhancements
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1232#discussion_r183749830

--- Diff: contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcRecordReader.java ---
@@ -225,10 +247,10 @@ public int next() {
     int counter = 0;
     Boolean b = true;
     try {
-      while (counter < 4095 && b == true) { // loop at 4095 since nullables use one more than record count and we
+      while (counter < 4095 && b) { // loop at 4095 since nullables use one more than record count and we
         // allocate on powers of two.
         b = resultSet.next();
-        if(b == false) {
+        if(!b) {
--- End diff --

Please add a space.

---
[GitHub] drill pull request #1232: DRILL-6094: Decimal data type enhancements
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r183778375 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestFixedlenDecimal.java --- @@ -20,61 +20,74 @@ import org.apache.drill.categories.UnlikelyTest; import org.apache.drill.test.BaseTestQuery; import org.apache.drill.exec.planner.physical.PlannerSettings; -import org.junit.BeforeClass; import org.junit.Test; import org.junit.experimental.categories.Category; @Category({UnlikelyTest.class}) public class TestFixedlenDecimal extends BaseTestQuery { - // enable decimal data type - @BeforeClass - public static void enableDecimalDataType() throws Exception { -test(String.format("alter session set `%s` = true", PlannerSettings.ENABLE_DECIMAL_DATA_TYPE_KEY)); - } - private static final String DATAFILE = "cp.`parquet/fixedlenDecimal.parquet`"; @Test public void testNullCount() throws Exception { -testBuilder() -.sqlQuery("select count(*) as c from %s where department_id is null", DATAFILE) -.unOrdered() -.baselineColumns("c") -.baselineValues(1L) -.build() -.run(); +try { + alterSession(PlannerSettings.ENABLE_DECIMAL_DATA_TYPE_KEY, true); + testBuilder() + .sqlQuery("select count(*) as c from %s where department_id is null", DATAFILE) + .unOrdered() + .baselineColumns("c") + .baselineValues(1L) + .build() + .run(); +} finally { + resetSessionOption(PlannerSettings.ENABLE_DECIMAL_DATA_TYPE_KEY); +} } @Test public void testNotNullCount() throws Exception { -testBuilder() -.sqlQuery("select count(*) as c from %s where department_id is not null", DATAFILE) -.unOrdered() -.baselineColumns("c") -.baselineValues(106L) -.build() -.run(); +try { + alterSession(PlannerSettings.ENABLE_DECIMAL_DATA_TYPE_KEY, true); + testBuilder() + .sqlQuery("select count(*) as c from %s where department_id is not null", DATAFILE) + .unOrdered() + .baselineColumns("c") + .baselineValues(106L) + .build() --- End diff -- Please consider `go()`. 
Please check in other classes as well. ---
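[Editorial note] The try/finally pattern discussed in the review above — set a session option for one test, then restore it even if the test fails — can be sketched in isolation. The `alterSession`/`resetSessionOption` helpers and the backing maps below are illustrative stand-ins, not Drill's actual `BaseTestQuery` API:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for a session-option store. The names mirror the
// helpers used in the PR (alterSession / resetSessionOption) but this is a
// self-contained sketch, not Drill code.
public class SessionOptionPattern {
    static final Map<String, Object> OPTIONS = new HashMap<>();
    static final Map<String, Object> DEFAULTS = new HashMap<>();

    static void alterSession(String key, Object value) {
        DEFAULTS.putIfAbsent(key, OPTIONS.get(key)); // remember prior value once
        OPTIONS.put(key, value);
    }

    static void resetSessionOption(String key) {
        OPTIONS.put(key, DEFAULTS.get(key)); // restore whatever was there before
    }

    static boolean runWithOption(String key) {
        try {
            alterSession(key, true);           // enable only for this test
            return (Boolean) OPTIONS.get(key); // the query would run here
        } finally {
            resetSessionOption(key);           // always restore, even on failure
        }
    }
}
```

The point of the `finally` block is that a failing query cannot leak the option into subsequent tests.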
[GitHub] drill pull request #1232: DRILL-6094: Decimal data type enhancements
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r183763504 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java --- @@ -409,7 +409,7 @@ public void clear() { // Check if the field exists. ValueVector v = fieldVectorMap.get(field.getName()); - if (v == null || v.getClass() != clazz) { + if (v == null || !v.getField().getType().equals(field.getType())) { --- End diff -- Please add explanation comment. ---
[GitHub] drill pull request #1232: DRILL-6094: Decimal data type enhancements
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r183779315 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestVarlenDecimal.java --- @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.store.parquet; + +import org.apache.commons.io.FileUtils; +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.test.BaseTestQuery; +import org.apache.drill.exec.planner.physical.PlannerSettings; +import org.hamcrest.CoreMatchers; +import org.junit.Assert; +import org.junit.Test; + +import java.math.BigDecimal; +import java.nio.file.Paths; + +public class TestVarlenDecimal extends BaseTestQuery { + + private static final String DATAFILE = "cp.`parquet/varlenDecimal.parquet`"; + + @Test + public void testNullCount() throws Exception { +String query = String.format("select count(*) as c from %s where department_id is null", DATAFILE); --- End diff -- `sqlQuery` can do `String.format` for you. ---
[GitHub] drill pull request #1232: DRILL-6094: Decimal data type enhancements
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r183766242 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillValuesRelBase.java --- @@ -169,12 +168,12 @@ private static void writeLiteral(RexLiteral literal, JsonOutput out) throws IOEx return; case DECIMAL: +// Still used double instead of decimal since the scale and precision of values are unknown --- End diff -- Please adjust comment. ---
[GitHub] drill pull request #1232: DRILL-6094: Decimal data type enhancements
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r183762729 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/RangeExprEvaluator.java --- @@ -219,6 +219,7 @@ private Statistics evalCastFunc(FunctionHolderExpression holderExpr, Statistics return null; // cast func between srcType and destType is NOT allowed. } + // TODO: add decimal support --- End diff -- Please remove. ---
[GitHub] drill pull request #1232: DRILL-6094: Decimal data type enhancements
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r183350828 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetFixedWidthDictionaryReaders.java --- @@ -248,27 +227,61 @@ protected void readField(long recordsToReadInThisPass) { } } - static class DictionaryDecimal18Reader extends FixedByteAlignedReader { -DictionaryDecimal18Reader(ParquetRecordReader parentReader, int allocateSize, ColumnDescriptor descriptor, - ColumnChunkMetaData columnChunkMetaData, boolean fixedLength, Decimal18Vector v, - SchemaElement schemaElement) throws ExecutionSetupException { + static class DictionaryVarDecimalReader extends FixedByteAlignedReader { + +DictionaryVarDecimalReader(ParquetRecordReader parentReader, int allocateSize, ColumnDescriptor descriptor, +ColumnChunkMetaData columnChunkMetaData, boolean fixedLength, VarDecimalVector v, +SchemaElement schemaElement) throws ExecutionSetupException { super(parentReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, v, schemaElement); } // this method is called by its superclass during a read loop @Override protected void readField(long recordsToReadInThisPass) { + recordsReadInThisIteration = + Math.min(pageReader.currentPageCount - pageReader.valuesRead, + recordsToReadInThisPass - valuesReadInCurrentPass); + + switch (columnDescriptor.getType()) { +case INT32: + if (usingDictionary) { +for (int i = 0; i < recordsReadInThisIteration; i++) { + byte[] bytes = Ints.toByteArray(pageReader.dictionaryValueReader.readInteger()); + setValueBytes(i, bytes); +} +setWriteIndex(); + } else { +super.readField(recordsToReadInThisPass); + } + break; +case INT64: + if (usingDictionary) { +for (int i = 0; i < recordsReadInThisIteration; i++) { + byte[] bytes = Longs.toByteArray(pageReader.dictionaryValueReader.readLong()); + setValueBytes(i, bytes); +} +setWriteIndex(); + } else { 
+super.readField(recordsToReadInThisPass); + } + break; + } +} - recordsReadInThisIteration = Math.min(pageReader.currentPageCount -- pageReader.valuesRead, recordsToReadInThisPass - valuesReadInCurrentPass); +/** + * Set the write Index. The next page that gets read might be a page that does not use dictionary encoding + * and we will go into the else condition below. The readField method of the parent class requires the + * writer index to be set correctly. + */ +private void setWriteIndex() { + readLengthInBits = recordsReadInThisIteration * dataTypeLengthInBits; + readLength = (int) Math.ceil(readLengthInBits / 8.0); --- End diff -- Do you know how this magic number was chosen? ---
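[Editorial note] On the "magic number" question above: `8.0` appears to be nothing more than bits-per-byte — dividing a bit length by 8.0 and taking the ceiling rounds a bit count up to whole bytes, equivalent to the integer formula `(bits + 7) / 8`. A minimal check of that reading:

```java
// Rounds a length in bits up to whole bytes. 8.0 is simply bits-per-byte;
// the floating-point ceil matches the integer formula (bits + 7) / 8.
public class BitsToBytes {
    static int readLengthBytes(long readLengthInBits) {
        return (int) Math.ceil(readLengthInBits / 8.0);
    }
}
```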
[GitHub] drill pull request #1232: DRILL-6094: Decimal data type enhancements
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r183359169 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestNestedLoopJoin.java --- @@ -409,4 +409,30 @@ public void testNLJoinCorrectnessRightMultipleBatches() throws Exception { setSessionOption(ExecConstants.SLICE_TARGET, 10); } } + + @Test + public void testNlJoinWithStringsInCondition() throws Exception { +try { + test(DISABLE_NLJ_SCALAR); + test(DISABLE_JOIN_OPTIMIZATION); + + final String query = --- End diff -- How these changes relate to decimals? ---
[GitHub] drill pull request #1232: DRILL-6094: Decimal data type enhancements
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r183770874 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/TypeInferenceUtils.java --- @@ -668,46 +706,95 @@ public RelDataType inferReturnType(SqlOperatorBinding opBinding) { } private static class DrillSameSqlReturnTypeInference implements SqlReturnTypeInference { -private static final DrillSameSqlReturnTypeInference INSTANCE = new DrillSameSqlReturnTypeInference(); +private static final DrillSameSqlReturnTypeInference INSTANCE = new DrillSameSqlReturnTypeInference(true); --- End diff -- Please rename `INSTANCE`. ---
[GitHub] drill pull request #1232: DRILL-6094: Decimal data type enhancements
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r183354594 --- Diff: exec/vector/src/main/codegen/templates/NullReader.java --- @@ -31,19 +31,19 @@ * This class is generated using freemarker and the ${.template_name} template. */ @SuppressWarnings("unused") -public class NullReader extends AbstractBaseReader implements FieldReader{ +public class NullReader extends AbstractBaseReader implements FieldReader { public static final NullReader INSTANCE = new NullReader(); public static final NullReader EMPTY_LIST_INSTANCE = new NullReader(Types.repeated(TypeProtos.MinorType.NULL)); public static final NullReader EMPTY_MAP_INSTANCE = new NullReader(Types.required(TypeProtos.MinorType.MAP)); private MajorType type; - private NullReader(){ + private NullReader() { --- End diff -- Please remove `super()` here and below. ---
[GitHub] drill pull request #1232: DRILL-6094: Decimal data type enhancements
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r183777285 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestVarDecimalFunctions.java --- @@ -0,0 +1,911 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.fn.impl; + +import org.apache.drill.categories.SqlFunctionTest; +import org.apache.drill.exec.planner.physical.PlannerSettings; +import org.apache.drill.test.BaseTestQuery; +import org.junit.Test; +import org.junit.experimental.categories.Category; + +import java.math.BigDecimal; +import java.math.MathContext; +import java.math.RoundingMode; + +@Category(SqlFunctionTest.class) +public class TestVarDecimalFunctions extends BaseTestQuery { + + // Tests for math functions + + @Test + public void testDecimalAdd() throws Exception { +String query = +"select\n" + +// checks trimming of scale +"cast('999.92345678912' as DECIMAL(38, 11))\n" + +"+ cast('0.32345678912345678912345678912345678912' as DECIMAL(38, 38)) as s1,\n" + +// sanitary checks +"cast('1234567891234567891234567891234567.89' as DECIMAL(36, 2))\n" + +"+ cast('123456789123456789123456789123456.789' as DECIMAL(36, 3)) as s2,\n" + +"cast('1234567891234567891234567891234567.89' as DECIMAL(36, 2))\n" + +"+ cast('0' as DECIMAL(36, 3)) as s3,\n" + +"cast('15.02' as DECIMAL(4, 2)) - cast('12.93' as DECIMAL(4, 2)) as s4,\n" + +"cast('11.02' as DECIMAL(4, 2)) - cast('12.93' as DECIMAL(4, 2)) as s5,\n" + +"cast('0' as DECIMAL(36, 2)) - cast('12.93' as DECIMAL(36, 2)) as s6,\n" + +// check trimming (negative scale) +"cast('2345678912' as DECIMAL(38, 0))\n" + +"+ cast('32345678912345678912345678912345678912' as DECIMAL(38, 0)) as s7"; +try { + alterSession(PlannerSettings.ENABLE_DECIMAL_DATA_TYPE_KEY, true); + testBuilder() +.sqlQuery(query) +.ordered() +.baselineColumns("s1", "s2", "s3", "s4", "s5", "s6", "s7") +.baselineValues( +new BigDecimal("999.92345678912") +.add(new BigDecimal("0.32345678912345678912345678912345678912")) +.round(new MathContext(38, RoundingMode.HALF_UP)), +new BigDecimal("1358024680358024680358024680358024.679"), +new BigDecimal("1234567891234567891234567891234567.890"), +new BigDecimal("2.09"), new BigDecimal("-1.91"), new BigDecimal("-12.93"), +new 
BigDecimal("1.3234567891234567891234567890469135782E+38")) +.build() --- End diff -- Please replace with `go()`. ---
[GitHub] drill pull request #1232: DRILL-6094: Decimal data type enhancements
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r183461299 --- Diff: exec/java-exec/src/test/java/org/apache/drill/PlanningBase.java --- @@ -20,12 +20,17 @@ import java.io.IOException; import java.net.URL; +import com.google.common.base.Function; --- End diff -- How did you test backward compatibility? ---
[GitHub] drill pull request #1232: DRILL-6094: Decimal data type enhancements
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r183762287 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/output/DecimalReturnTypeInference.java --- @@ -281,20 +295,45 @@ @Override public TypeProtos.MajorType getType(List logicalExpressions, FunctionAttributes attributes) { int scale = 0; - int precision = 0; // Get the max scale and precision from the inputs for (LogicalExpression e : logicalExpressions) { scale = Math.max(scale, e.getMajorType().getScale()); -precision = Math.max(precision, e.getMajorType().getPrecision()); } - return (TypeProtos.MajorType.newBuilder() - .setMinorType(attributes.getReturnValue().getType().getMinorType()) + return TypeProtos.MajorType.newBuilder() + .setMinorType(TypeProtos.MinorType.VARDECIMAL) .setScale(scale) - .setPrecision(38) - .setMode(TypeProtos.DataMode.REQUIRED) - .build()); + .setPrecision(DRILL_REL_DATATYPE_SYSTEM.getMaxNumericPrecision()) + .setMode(TypeProtos.DataMode.OPTIONAL) + .build(); +} + } + + /** + * Return type calculation implementation for functions with return type set as + * {@link org.apache.drill.exec.expr.annotations.FunctionTemplate.ReturnType#DECIMAL_AVG_AGGREGATE}. + */ + public static class DecimalAvgAggReturnTypeInference implements ReturnTypeInference { --- End diff -- Please add information how precision and scale are calculated. ---
[GitHub] drill pull request #1232: DRILL-6094: Decimal data type enhancements
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r183761333 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64WithSeedAsDouble.java --- @@ -265,6 +268,42 @@ public void eval() { } } + @FunctionTemplate(name = "hash64AsDouble", scope = FunctionScope.SIMPLE, nulls = FunctionTemplate.NullHandling.INTERNAL) + public static class VarDecimalHash implements DrillSimpleFunc { +@Param VarDecimalHolder in; +@Param BigIntHolder seed; +@Output BigIntHolder out; + +public void setup() { +} + +public void eval() { + java.math.BigDecimal input = org.apache.drill.exec.util.DecimalUtility.getBigDecimalFromDrillBuf(in.buffer, + in.start, in.end - in.start, in.scale); + out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash64(input.doubleValue(), seed.value); +} + } + + @FunctionTemplate(name = "hash64AsDouble", scope = FunctionScope.SIMPLE, nulls = FunctionTemplate.NullHandling.INTERNAL) + public static class NullableVarDecimalHash implements DrillSimpleFunc { +@Param NullableVarDecimalHolder in; +@Param BigIntHolder seed; +@Output BigIntHolder out; + +public void setup() { +} + +public void eval() { + if (in.isSet == 0) { +out.value = seed.value; + } else { +java.math.BigDecimal input = org.apache.drill.exec.util.DecimalUtility.getBigDecimalFromDrillBuf(in.buffer, +in.start, in.end - in.start, in.scale); +out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash64(input.doubleValue(), seed.value); + } +} + } --- End diff -- Please consider removing functions using old decimals. ---
[GitHub] drill pull request #1232: DRILL-6094: Decimal data type enhancements
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r183775219 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/stat/ParquetMetaStatCollector.java --- @@ -152,6 +152,7 @@ private ColumnStatistics getStat(Object min, Object max, Long numNull, } if (min != null && max != null ) { + // TODO: add vardecimal type --- End diff -- Please remove. ---
[GitHub] drill pull request #1232: DRILL-6094: Decimal data type enhancements
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r183356082 --- Diff: exec/vector/src/main/codegen/templates/ValueHolders.java --- @@ -32,99 +32,81 @@ /* * This class is generated using freemarker and the ${.template_name} template. */ -public final class ${className} implements ValueHolder{ +public final class ${className} implements ValueHolder { public static final MajorType TYPE = Types.${mode.name?lower_case}(MinorType.${minor.class?upper_case}); -<#if mode.name == "Repeated"> + <#if mode.name == "Repeated"> -/** The first index (inclusive) into the Vector. **/ -public int start; + /** The first index (inclusive) into the Vector. **/ + public int start; -/** The last index (exclusive) into the Vector. **/ -public int end; + /** The last index (exclusive) into the Vector. **/ + public int end; -/** The Vector holding the actual values. **/ -public ${minor.class}Vector vector; + /** The Vector holding the actual values. 
**/ + public ${minor.class}Vector vector; -<#else> -public static final int WIDTH = ${type.width}; + <#else> + public static final int WIDTH = ${type.width}; -<#if mode.name == "Optional">public int isSet; -<#assign fields = minor.fields!type.fields /> -<#list fields as field> -public ${field.type} ${field.name}; - + <#if mode.name == "Optional">public int isSet; + <#assign fields = minor.fields!type.fields /> + <#list fields as field> + public ${field.type} ${field.name}; + -<#if minor.class.startsWith("Decimal")> -public static final int maxPrecision = ${minor.maxPrecisionDigits}; -<#if minor.class.startsWith("Decimal28") || minor.class.startsWith("Decimal38")> -public static final int nDecimalDigits = ${minor.nDecimalDigits}; + <#if minor.class.startsWith("Decimal")> + public static final int maxPrecision = ${minor.maxPrecisionDigits}; + <#if minor.class.startsWith("Decimal28") || minor.class.startsWith("Decimal38")> --- End diff -- Please mark old decimal value holders as deprecated. ---
[GitHub] drill pull request #1232: DRILL-6094: Decimal data type enhancements
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r183756898 --- Diff: exec/java-exec/src/main/codegen/templates/Decimal/CastDecimalFloat.java --- @@ -86,24 +46,21 @@ public void eval() { */ @SuppressWarnings("unused") -@FunctionTemplate(name = "cast${type.to?upper_case}", scope = FunctionTemplate.FunctionScope.SIMPLE, nulls=NullHandling.NULL_IF_NULL) +@FunctionTemplate(name = "cast${type.to?upper_case}", + scope = FunctionTemplate.FunctionScope.SIMPLE, + nulls=NullHandling.NULL_IF_NULL) --- End diff -- ` nulls=NullHandling.NULL_IF_NULL` -> `nulls = NullHandling.NULL_IF_NULL` ---
[GitHub] drill pull request #1232: DRILL-6094: Decimal data type enhancements
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r183349829 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java --- @@ -228,14 +232,33 @@ private void newSchema() throws IOException { setUp(schema, consumer); } - private PrimitiveType getPrimitiveType(MaterializedField field) { + protected PrimitiveType getPrimitiveType(MaterializedField field) { MinorType minorType = field.getType().getMinorType(); String name = field.getName(); +int length = ParquetTypeHelper.getLengthForMinorType(minorType); PrimitiveTypeName primitiveTypeName = ParquetTypeHelper.getPrimitiveTypeNameForMinorType(minorType); +if (DecimalUtility.isDecimalType(minorType)) { + OptionManager optionManager = oContext.getFragmentContext().getOptions(); + if (optionManager.getString(PARQUET_WRITER_LOGICAL_TYPE_FOR_DECIMALS) --- End diff -- It looks like using `writerOptions` is more preferred approach in this class. ---
[GitHub] drill pull request #1232: DRILL-6094: Decimal data type enhancements
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r183354148 --- Diff: exec/vector/src/main/codegen/templates/AbstractPromotableFieldWriter.java --- @@ -75,12 +75,19 @@ public void endList() { <#list vv.types as type><#list type.minor as minor><#assign name = minor.class?cap_first /> <#assign fields = minor.fields!type.fields /> - <#if !minor.class?starts_with("Decimal") > + <#if minor.class?contains("VarDecimal")> + @Override + public void write${minor.class}(BigDecimal value) { +getWriter(MinorType.${name?upper_case}).write${minor.class}(value); + } + + @Override public void write(${name}Holder holder) { getWriter(MinorType.${name?upper_case}).write(holder); } + <#if !minor.class?contains("Decimal") || minor.class?contains("VarDecimal")> --- End diff -- Please add comment that this is done to cover previous decimal functionality. ---
[GitHub] drill pull request #1232: DRILL-6094: Decimal data type enhancements
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1232#discussion_r183768790 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SqlConverter.java --- @@ -559,6 +560,19 @@ public RexNode makeCast(RelDataType type, RexNode exp, boolean matchNullability) if (matchNullability) { return makeAbstractCast(type, exp); } + // for the case when BigDecimal literal has a scale or precision + // that differs from the value from specified RelDataType, cast cannot be removed + // TODO: remove this code when CALCITE-1468 is fixed + if (type.getSqlTypeName() == SqlTypeName.DECIMAL && exp instanceof RexLiteral) { --- End diff -- Please create the follow up Drill Jira. ---
[GitHub] drill pull request #1234: DRILL-5927: Fixed memory leak in TestBsonRecordRea...
Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1234#discussion_r183779660 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/bson/TestBsonRecordReader.java --- @@ -49,17 +50,20 @@ import org.junit.BeforeClass; import org.junit.Test; -public class TestBsonRecordReader extends BaseTestQuery { +public class TestBsonRecordReader { + private static BufferAllocator allocator; private static VectorContainerWriter writer; private static TestOutputMutator mutator; + private static DrillBuf buffer; private static BsonRecordReader bsonReader; @BeforeClass public static void setUp() { -BufferAllocator bufferAllocator = getDrillbitContext().getAllocator(); -mutator = new TestOutputMutator(bufferAllocator); +allocator = new RootAllocator(9_000_00); --- End diff -- It does not sound right as there is not much data used in the test. I'd suggest filing a JIRA to look into why it takes 900 KB or almost 1 MB to hold data and using unlimited allocator (the same as before). ---
[GitHub] drill pull request #1226: DRILL-3855: Enable FilterSetOpTransposeRule, Drill...
Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/1226#discussion_r183763991 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/PlannerPhase.java --- @@ -178,6 +178,12 @@ public RuleSet getRules(OptimizerRulesContext context, Collection PlannerPhase.getPhysicalRules(context), getStorageRules(context, plugins, this)); } + }, + + PRE_LOGICAL_PLANNING("Planning with Hep planner only for rules, which are failed for Volcano planner") { --- End diff -- You are right, it is necessary to add these rules to Drill's `staticRuleSet`, so that these setOpTranspose rules can cover more cases. Once they are added to the Volcano planner, we can decide whether to remove `PlannerPhase.PRE_LOGICAL_PLANNING`, leave it for more complete `directory_pruning`, or move `PlannerPhase.DIRECTORY_PRUNING` to the `staticRuleSet` for the Volcano planner as well. ---
[GitHub] drill pull request #1226: DRILL-3855: Enable FilterSetOpTransposeRule, Drill...
Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/1226#discussion_r183764377 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java --- @@ -238,26 +238,27 @@ protected DrillRel convertToRawDrel(final RelNode relNode) throws SqlUnsupported // HEP Directory pruning . final RelNode pruned = transform(PlannerType.HEP_BOTTOM_UP, PlannerPhase.DIRECTORY_PRUNING, relNode); final RelTraitSet logicalTraits = pruned.getTraitSet().plus(DrillRel.DRILL_LOGICAL); + final RelNode intermediateNode = transform(PlannerType.HEP, PlannerPhase.PRE_LOGICAL_PLANNING, pruned); --- End diff -- Nice suggestion. Done. ---
[GitHub] drill pull request #1226: DRILL-3855: Enable FilterSetOpTransposeRule, Drill...
Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/1226#discussion_r183768380 --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestUnionAll.java --- @@ -634,10 +634,14 @@ public void testFilterPushDownOverUnionAll() throws Exception { + "order by n_regionkey"; // Validate the plan -final String[] expectedPlan = {".*Filter.*\n" + -".*UnionAll.*\n" + -".*Scan.*columns=\\[`n_regionkey`\\].*\n" + -".*Scan.*columns=\\[`r_regionkey`\\].*"}; +final String[] expectedPlan = {"Sort.*\n" + --- End diff -- Looks like there are no issues related to it. I have added an additional test case to verify it: `select n_nationkey from (select n_nationkey, n_name, n_comment from cp.`tpch/nation.parquet` union all select r_regionkey, r_name, r_comment from cp.`tpch/region.parquet`) where n_nationkey > 4;` When the filter is pushed down, the result from the right side of the UNION will be empty. Generally speaking, using the UNION operator with empty data was resolved in [DRILL-4185](https://issues.apache.org/jira/browse/DRILL-4185) and we have test cases with empty data batches. ---
[GitHub] drill issue #1210: DRILL-6270: Add debug startup option flag for drill in em...
Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1210 LGTM. ---
[GitHub] drill pull request #1230: DRILL-6345: DRILL Query fails on Function LOG10
Github user vladimirtkach commented on a diff in the pull request: https://github.com/apache/drill/pull/1230#discussion_r183753053 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestNewMathFunctions.java --- @@ -116,8 +115,7 @@ public void testTrigoMathFunc() throws Throwable { @Test public void testExtendedMathFunc() throws Throwable { final BigDecimal d = new BigDecimal("1001.1"); -final Object [] expected = new Object[] {Math.cbrt(1000), Math.log(10), (Math.log(64.0)/Math.log(2.0)), Math.exp(10), Math.toDegrees(0.5), Math.toRadians(45.0), Math.PI, Math.cbrt(d.doubleValue()), Math.log(d.doubleValue()), (Math.log(d.doubleValue())/Math.log(2)), Math.exp(d.doubleValue()), Math.toDegrees(d.doubleValue()), Math.toRadians(d.doubleValue())}; - +final Object [] expected = new Object[] {Math.cbrt(1000), Math.log(10), Math.log10(5), (Math.log(64.0)/Math.log(2.0)), Math.exp(10), Math.toDegrees(0.5), Math.toRadians(45.0), Math.PI, Math.cbrt(d.doubleValue()), Math.log(d.doubleValue()), (Math.log(d.doubleValue())/Math.log(2)), Math.exp(d.doubleValue()), Math.toDegrees(d.doubleValue()), Math.toRadians(d.doubleValue())}; --- End diff -- @vvysotskyi added tests with double, float, int, bigInt, uint4 types for log10 ---
[GitHub] drill pull request #1234: DRILL-5927: Fixed memory leak in TestBsonRecordRea...
Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1234#discussion_r183747252 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/bson/TestBsonRecordReader.java --- @@ -272,6 +276,9 @@ public static void cleanUp() { } catch (Exception e) { } + +buffer.close(); --- End diff -- IMO `close()` should be used within try-with-resources only (no explicit calls to `close()`). ---
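[Editorial note] The try-with-resources suggestion above generalizes to any `AutoCloseable` (a `BufferAllocator`, a `DrillBuf`): the resource is closed automatically when the block exits, even if the body throws, so no explicit `close()` call is needed. A generic illustration:

```java
// Generic illustration of try-with-resources: close() runs automatically
// at block exit, including on exceptions, with no explicit call.
public class TryWithResources {
    static class Resource implements AutoCloseable {
        boolean closed;
        @Override public void close() { closed = true; }
    }

    static Resource useAndClose() {
        Resource r = new Resource();
        try (Resource held = r) {
            // work with the resource here
        }
        return r; // r.closed is now true without an explicit close() call
    }
}
```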
[GitHub] drill pull request #1234: DRILL-5927: Fixed memory leak in TestBsonRecordRea...
Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1234#discussion_r183741516 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/bson/TestBsonRecordReader.java --- @@ -49,17 +50,20 @@ import org.junit.BeforeClass; import org.junit.Test; -public class TestBsonRecordReader extends BaseTestQuery { +public class TestBsonRecordReader { + private static BufferAllocator allocator; --- End diff -- What is a reason `allocator` and other variables are initialized in `@BeforeClass`? Can they be initialized in `@Before` instead? ---
[GitHub] drill pull request #1239: DRILL-143: CGroup Support for Drill-on-YARN
Github user kkhatua commented on a diff in the pull request: https://github.com/apache/drill/pull/1239#discussion_r183732780 --- Diff: distribution/src/resources/yarn-drillbit.sh --- @@ -110,6 +114,36 @@ # Enables Java GC logging. Passed from the drill.yarn.drillbit.log-gc # garbage collection option. +### Function to enforce CGroup (Refer local drillbit.sh) +check_and_enforce_cgroup(){ +dbitPid=$1; +kill -0 $dbitPid +if [ $? -gt 0 ]; then + echo "ERROR: Failed to add Drillbit to CGroup ( $DRILLBIT_CGROUP ) for 'cpu'. Ensure that the Drillbit ( pid=$dbitPid ) started up." >&2 + exit 1 +fi +SYS_CGROUP_DIR=${SYS_CGROUP_DIR:-"/sys/fs/cgroup"} +if [ -f $SYS_CGROUP_DIR/cpu/$DRILLBIT_CGROUP/cgroup.procs ]; then + echo $dbitPid > $SYS_CGROUP_DIR/cpu/$DRILLBIT_CGROUP/cgroup.procs + # Verify Enforcement + cgroupStatus=`grep -w $pid $SYS_CGROUP_DIR/cpu/${DRILLBIT_CGROUP}/cgroup.procs` + if [ -z "$cgroupStatus" ]; then --- End diff -- You're right. I seem to have missed negating the check. Since this check only affects publication of a message and not the actual application of the CGroup, we didn't catch it during testing. I'll fix this port and the original patch as well. ---
[jira] [Created] (DRILL-6350) Umbrella jira which tracks issues connected to Transitive Closure Inference
Vitalii Diravka created DRILL-6350: -- Summary: Umbrella jira which tracks issues connected to Transitive Closure Inference Key: DRILL-6350 URL: https://issues.apache.org/jira/browse/DRILL-6350 Project: Apache Drill Issue Type: Improvement Components: Query Planning & Optimization Affects Versions: Future Reporter: Vitalii Diravka Transitive closure is implemented with DRILL-6173 in Drill. But it doesn't work for some cases; these Calcite issues can resolve them: CALCITE-1048, CALCITE-2274, CALCITE-2275, CALCITE-2241. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] drill issue #1210: DRILL-6270: Add debug startup option flag for drill in em...
Github user agozhiy commented on the issue: https://github.com/apache/drill/pull/1210 @paul-rogers / @arina-ielchiieva , I addressed your comments, could you please review? ---
[GitHub] drill pull request #1210: DRILL-6270: Add debug startup option flag for dril...
Github user agozhiy commented on a diff in the pull request: https://github.com/apache/drill/pull/1210#discussion_r183683315 --- Diff: distribution/src/resources/runbit --- @@ -65,6 +65,47 @@ drill_rotate_log () fi } +args=( $@ ) +RBARGS=() +for (( i=0; i < ${#args[@]}; i++ )); do + case "${args[i]}" in + --debug*) + DEBUG=true + DEBUG_STRING=`expr "${args[i]}" : '.*:\(.*\)'` + ;; + *) RBARGS+=("${args[i]}");; + esac +done + +# Enables remote debug if requested +# Usage: --debug:[parameter1=value,parameter2=value] +# Optional parameters: +# port=[port_number] - debug port number +# suspend=[y/n] - pause until the IDE connects + +if [ $DEBUG ]; then + debug_params=( $(echo $DEBUG_STRING | grep -o '[^,"]*') ) + for param in ${debug_params[@]}; do +case $param in +port*) + DEBUG_PORT=`expr "$param" : '.*=\(.*\)'` + ;; +suspend*) + DEBUG_SUSPEND=`expr "$param" : '.*=\(.*\)'` + ;; +esac + done + + if [ -z $DEBUG_PORT ]; then +DEBUG_PORT=5 + fi + if [ -z $DEBUG_SUSPEND ]; then +DEBUG_SUSPEND='n' + fi + + JAVA_DEBUG="-Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,address=$DEBUG_PORT,server=y,suspend=$DEBUG_SUSPEND" +fi --- End diff -- Done. ---
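[Editorial note] The option grammar handled by the `runbit` snippet above (`--debug:port=VALUE,suspend=VALUE`) can be sketched as a small parser. This Java analogue of the shell logic is illustrative only; it applies the script's `suspend=n` default and leaves `port` unset when absent (the `DEBUG_PORT=5` default shown in the script is kept out of the sketch, as it may be a truncated value in this archive):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative Java analogue of the runbit parsing above: split the
// "--debug:k=v,k=v" suffix on commas, then each pair on '='.
public class DebugFlagParser {
    static Map<String, String> parse(String arg) {
        Map<String, String> params = new HashMap<>();
        params.put("suspend", "n"); // script default: do not pause for the IDE
        int colon = arg.indexOf(':');
        if (colon >= 0) {
            for (String pair : arg.substring(colon + 1).split(",")) {
                String[] kv = pair.split("=", 2);
                if (kv.length == 2) {
                    params.put(kv[0], kv[1]);
                }
            }
        }
        return params;
    }
}
```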
[GitHub] drill pull request #1214: DRILL-6331: Revisit Hive Drill native parquet impl...
Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/1214#discussion_r183635975 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetScanBatchCreator.java --- @@ -0,0 +1,195 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.store.parquet; + +import com.google.common.base.Functions; +import com.google.common.collect.Maps; +import org.apache.drill.common.Stopwatch; +import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.exec.ops.ExecutorFragmentContext; +import org.apache.drill.exec.ops.OperatorContext; +import org.apache.drill.exec.physical.impl.ScanBatch; +import org.apache.drill.exec.store.ColumnExplorer; +import org.apache.drill.exec.store.RecordReader; +import org.apache.drill.exec.store.dfs.DrillFileSystem; +import org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader; +import org.apache.drill.exec.store.parquet2.DrillParquetReader; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; +import org.apache.parquet.column.ColumnDescriptor; +import org.apache.parquet.format.converter.ParquetMetadataConverter; +import org.apache.parquet.hadoop.CodecFactory; +import org.apache.parquet.hadoop.ParquetFileReader; +import org.apache.parquet.hadoop.metadata.ParquetMetadata; +import org.apache.parquet.schema.MessageType; +import org.apache.parquet.schema.Type; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.LinkedHashMap; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.concurrent.TimeUnit; + +public abstract class AbstractParquetScanBatchCreator { + + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(AbstractParquetScanBatchCreator.class); + + private static final String ENABLE_BYTES_READ_COUNTER = "parquet.benchmark.bytes.read"; + private static final String ENABLE_BYTES_TOTAL_COUNTER = "parquet.benchmark.bytes.total"; + private static final String ENABLE_TIME_READ_COUNTER = "parquet.benchmark.time.read"; + + protected ScanBatch getBatch(ExecutorFragmentContext context, AbstractParquetRowGroupScan 
rowGroupScan, OperatorContext oContext) throws ExecutionSetupException { +final ColumnExplorer columnExplorer = new ColumnExplorer(context.getOptions(), rowGroupScan.getColumns()); + +if (!columnExplorer.isStarQuery()) { + rowGroupScan = rowGroupScan.copy(columnExplorer.getTableColumns()); + rowGroupScan.setOperatorId(rowGroupScan.getOperatorId()); +} + +boolean useAsyncPageReader = + context.getOptions().getOption(ExecConstants.PARQUET_PAGEREADER_ASYNC).bool_val; + +AbstractDrillFileSystemManager fsManager = getDrillFileSystemCreator(oContext, useAsyncPageReader); + +// keep footers in a map to avoid re-reading them +Map footers = new HashMap<>(); +List readers = new LinkedList<>(); +List> implicitColumns = new ArrayList<>(); +Map mapWithMaxColumns = new LinkedHashMap<>(); +for(RowGroupReadEntry rowGroup : rowGroupScan.getRowGroupReadEntries()) { + /* + Here we could store a map from file names to footers, to prevent re-reading the footer for each row group in a file + TODO - to prevent reading the footer again in the parquet record reader (it is read earlier in the ParquetStorageEngine) + we should add more information to the RowGroupInfo that will be populated upon the first read to + provide the reader with all of th file meta-data it needs + These fields will be added to the constructor below + */ + try { +Stopwatch timer = Stopwatch.createUnstarted(logger.isDebugEnabled()); +DrillFileSystem fs
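The comment in the diff above proposes keeping footers in a map so that each file's footer is read once and then shared by all of that file's row groups. A minimal sketch of the caching pattern, assuming a `Function`-based reader as a stand-in for the real footer read (`FooterCache` is an illustrative name, not Drill API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class FooterCache<F> {
    private final Map<String, F> footers = new HashMap<>();
    private final Function<String, F> readFooter; // stand-in for the expensive footer read

    public FooterCache(Function<String, F> readFooter) {
        this.readFooter = readFooter;
    }

    // The first request for a path performs the read; later row groups
    // from the same file reuse the cached footer.
    public F footerFor(String path) {
        return footers.computeIfAbsent(path, readFooter);
    }
}
```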
[GitHub] drill pull request #1214: DRILL-6331: Revisit Hive Drill native parquet impl...
Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/1214#discussion_r183558185 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java --- @@ -0,0 +1,462 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.store.parquet; + +import com.fasterxml.jackson.annotation.JsonIgnore; +import com.fasterxml.jackson.annotation.JsonProperty; +import com.google.common.base.Preconditions; +import com.google.common.collect.ArrayListMultimap; +import com.google.common.collect.ListMultimap; +import org.apache.drill.common.expression.ErrorCollector; +import org.apache.drill.common.expression.ErrorCollectorImpl; +import org.apache.drill.common.expression.ExpressionStringBuilder; +import org.apache.drill.common.expression.LogicalExpression; +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.common.expression.ValueExpressions; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.compile.sig.ConstantExpressionIdentifier; +import org.apache.drill.exec.expr.ExpressionTreeMaterializer; +import org.apache.drill.exec.expr.fn.FunctionImplementationRegistry; +import org.apache.drill.exec.expr.stat.ParquetFilterPredicate; +import org.apache.drill.exec.ops.UdfUtilities; +import org.apache.drill.exec.physical.EndpointAffinity; +import org.apache.drill.exec.physical.base.AbstractFileGroupScan; +import org.apache.drill.exec.physical.base.GroupScan; +import org.apache.drill.exec.physical.base.ScanStats; +import org.apache.drill.exec.physical.base.ScanStats.GroupScanProperty; +import org.apache.drill.exec.planner.physical.PlannerSettings; +import org.apache.drill.exec.proto.CoordinationProtos; +import org.apache.drill.exec.server.options.OptionManager; +import org.apache.drill.exec.store.ColumnExplorer; +import org.apache.drill.exec.store.dfs.FileSelection; +import org.apache.drill.exec.store.dfs.ReadEntryWithPath; +import org.apache.drill.exec.store.parquet.stat.ColumnStatistics; +import org.apache.drill.exec.store.parquet.stat.ParquetMetaStatCollector; +import org.apache.drill.exec.store.schedule.AffinityCreator; +import org.apache.drill.exec.store.schedule.AssignmentCreator; +import 
org.apache.drill.exec.store.schedule.EndpointByteMap; +import org.apache.drill.exec.store.schedule.EndpointByteMapImpl; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collection; +import java.util.HashMap; +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.function.Function; +import java.util.stream.Collectors; + + +import static org.apache.drill.exec.store.parquet.metadata.MetadataBase.ParquetFileMetadata; +import static org.apache.drill.exec.store.parquet.metadata.MetadataBase.RowGroupMetadata; +import static org.apache.drill.exec.store.parquet.metadata.MetadataBase.ParquetTableMetadataBase; + +public abstract class AbstractParquetGroupScan extends AbstractFileGroupScan { + + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(AbstractParquetGroupScan.class); + + protected List columns; + protected List entries; + protected LogicalExpression filter; + + protected ParquetTableMetadataBase parquetTableMetadata; + protected List rowGroupInfos; + protected ListMultimap mappings; + protected Set fileSet; + + private List endpointAffinities; + private ParquetGroupScanStatistics parquetGroupScanStatistics; + + protected AbstractParquetGroupScan(String userName, List columns, List entries, LogicalExpression filter) { +super(userName); +this.columns = columns; +this.entries = entries; +this.filter = filter; + } + + // immutable copy constructor + protected AbstractParquetGroupScan(Abstr
[GitHub] drill pull request #1214: DRILL-6331: Revisit Hive Drill native parquet impl...
Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/1214#discussion_r183644277 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScanStatistics.java --- @@ -0,0 +1,217 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.store.parquet; + +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.physical.base.GroupScan; +import org.apache.parquet.schema.OriginalType; +import org.apache.parquet.schema.PrimitiveType; + +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import static org.apache.drill.exec.store.parquet.metadata.MetadataBase.ColumnMetadata; +import static org.apache.drill.exec.store.parquet.metadata.MetadataBase.ParquetTableMetadataBase; +import static org.apache.drill.exec.store.parquet.metadata.Metadata_V3.ColumnTypeMetadata_v3; +import static org.apache.drill.exec.store.parquet.metadata.Metadata_V3.ParquetTableMetadata_v3; + +/** + * Holds common statistics about data in parquet group scan, + * including information about total row count, columns counts, partition columns. + */ +public class ParquetGroupScanStatistics { + + // map from file names to maps of column name to partition value mappings + private Map> partitionValueMap; + // only for partition columns : value is unique for each partition + private Map partitionColTypeMap; + // total number of non-null value for each column in parquet files + private Map columnValueCounts; + // total number of rows (obtained from parquet footer) + private long rowCount; + + + public ParquetGroupScanStatistics(List rowGroupInfos, ParquetTableMetadataBase parquetTableMetadata) { +collect(rowGroupInfos, parquetTableMetadata); + } + + public ParquetGroupScanStatistics(ParquetGroupScanStatistics that) { +this.partitionValueMap = new HashMap<>(that.partitionValueMap); +this.partitionColTypeMap = new HashMap<>(that.partitionColTypeMap); +this.columnValueCounts = new HashMap<>(that.columnValueCounts); +this.rowCount = that.rowCount; + } + + public long getColumnValueCount(SchemaPath column) { +return columnValueCounts.containsKey(column) ? 
columnValueCounts.get(column) : 0; + } + + public List getPartitionColumns() { +return new ArrayList<>(partitionColTypeMap.keySet()); + } + + public TypeProtos.MajorType getTypeForColumn(SchemaPath schemaPath) { +return partitionColTypeMap.get(schemaPath); + } + + public long getRowCount() { +return rowCount; + } + + public Object getPartitionValue(String path, SchemaPath column) { +return partitionValueMap.get(path).get(column); + } + + public void collect(List rowGroupInfos, ParquetTableMetadataBase parquetTableMetadata) { +resetHolders(); +boolean first = true; +for (RowGroupInfo rowGroup : rowGroupInfos) { + long rowCount = rowGroup.getRowCount(); + for (ColumnMetadata column : rowGroup.getColumns()) { +SchemaPath schemaPath = SchemaPath.getCompoundPath(column.getName()); +Long previousCount = columnValueCounts.get(schemaPath); +if (previousCount != null) { + if (previousCount != GroupScan.NO_COLUMN_STATS) { +if (column.getNulls() != null) { --- End diff -- Combine if statement with above. ---
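The `collect` loop under review merges per-row-group non-null counts for each column, with `GroupScan.NO_COLUMN_STATS` acting as a sticky "unknown" sentinel: once any row group lacks stats for a column, the total stays unknown. A simplified sketch of that merging rule (the class name and the sentinel value of -1 are illustrative assumptions):

```java
import java.util.HashMap;
import java.util.Map;

public class ColumnCounts {
    static final long NO_COLUMN_STATS = -1; // assumed sentinel, mirrors GroupScan.NO_COLUMN_STATS

    private final Map<String, Long> counts = new HashMap<>();

    // Merge one row group's non-null count for a column into the running total.
    void add(String column, Long rowGroupCount) {
        Long previous = counts.get(column);
        if (previous != null && previous == NO_COLUMN_STATS) {
            return; // already unknown; stays unknown
        }
        if (rowGroupCount == null || rowGroupCount == NO_COLUMN_STATS) {
            counts.put(column, NO_COLUMN_STATS); // missing stats poison the total
        } else {
            counts.put(column, (previous == null ? 0L : previous) + rowGroupCount);
        }
    }

    long count(String column) {
        return counts.getOrDefault(column, 0L);
    }
}
```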
[GitHub] drill pull request #1214: DRILL-6331: Revisit Hive Drill native parquet impl...
Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/1214#discussion_r183251213 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/RowGroupInfo.java --- @@ -0,0 +1,95 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to you under the Apache License, Version 2.0 +* (the "License"); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ +package org.apache.drill.exec.store.parquet; + +import com.fasterxml.jackson.annotation.JsonCreator; +import com.fasterxml.jackson.annotation.JsonProperty; +import org.apache.drill.exec.store.dfs.ReadEntryFromHDFS; +import org.apache.drill.exec.store.dfs.easy.FileWork; +import org.apache.drill.exec.store.schedule.CompleteWork; +import org.apache.drill.exec.store.schedule.EndpointByteMap; + +import java.util.List; + +import static org.apache.drill.exec.store.parquet.metadata.MetadataBase.ColumnMetadata; + +public class RowGroupInfo extends ReadEntryFromHDFS implements CompleteWork, FileWork { + +private EndpointByteMap byteMap; +private int rowGroupIndex; +private List columns; +private long rowCount; // rowCount = -1 indicates to include all rows. 
+private long numRecordsToRead; + +@JsonCreator +public RowGroupInfo(@JsonProperty("path") String path, @JsonProperty("start") long start, +@JsonProperty("length") long length, @JsonProperty("rowGroupIndex") int rowGroupIndex, long rowCount) { + super(path, start, length); + this.rowGroupIndex = rowGroupIndex; + this.rowCount = rowCount; + this.numRecordsToRead = rowCount; +} + +public RowGroupReadEntry getRowGroupReadEntry() { + return new RowGroupReadEntry(this.getPath(), this.getStart(), this.getLength(), + this.rowGroupIndex, this.getNumRecordsToRead()); +} + +public int getRowGroupIndex() { + return this.rowGroupIndex; +} + +@Override +public int compareTo(CompleteWork o) { + return Long.compare(getTotalBytes(), o.getTotalBytes()); +} + +@Override +public long getTotalBytes() { + return this.getLength(); +} + +@Override +public EndpointByteMap getByteMap() { + return byteMap; +} + +public long getNumRecordsToRead() { + return numRecordsToRead; +} + +public void setNumRecordsToRead(long numRecords) { + numRecordsToRead = numRecords; +} + +public void setEndpointByteMap(EndpointByteMap byteMap) { + this.byteMap = byteMap; +} + +public long getRowCount() { + return rowCount; +} + +public List getColumns() { + return columns; +} + +public void setColumns(List columns) { + this.columns = columns; +} + + } --- End diff -- new line ---
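`RowGroupInfo.compareTo` in the diff above orders work units by `getTotalBytes()`, so sorting a list of row groups arranges them by size for assignment. A minimal sketch of that ordering; `WorkUnit` is a hypothetical stand-in for `CompleteWork`:

```java
public class WorkUnit implements Comparable<WorkUnit> {
    final long totalBytes;

    WorkUnit(long totalBytes) {
        this.totalBytes = totalBytes;
    }

    // Same comparison as RowGroupInfo: ascending by total byte size.
    @Override
    public int compareTo(WorkUnit o) {
        return Long.compare(totalBytes, o.totalBytes);
    }
}
```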
[GitHub] drill pull request #1214: DRILL-6331: Revisit Hive Drill native parquet impl...
Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/1214#discussion_r183253923 --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetRowGroupScan.java --- @@ -0,0 +1,130 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more --- End diff -- indent ---
[GitHub] drill pull request #1214: DRILL-6331: Revisit Hive Drill native parquet impl...
Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/1214#discussion_r183566717 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java --- @@ -0,0 +1,462 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.store.parquet; + +import com.fasterxml.jackson.annotation.JsonIgnore; +import com.fasterxml.jackson.annotation.JsonProperty; +import com.google.common.base.Preconditions; +import com.google.common.collect.ArrayListMultimap; +import com.google.common.collect.ListMultimap; +import org.apache.drill.common.expression.ErrorCollector; +import org.apache.drill.common.expression.ErrorCollectorImpl; +import org.apache.drill.common.expression.ExpressionStringBuilder; +import org.apache.drill.common.expression.LogicalExpression; +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.common.expression.ValueExpressions; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.compile.sig.ConstantExpressionIdentifier; +import org.apache.drill.exec.expr.ExpressionTreeMaterializer; +import org.apache.drill.exec.expr.fn.FunctionImplementationRegistry; +import org.apache.drill.exec.expr.stat.ParquetFilterPredicate; +import org.apache.drill.exec.ops.UdfUtilities; +import org.apache.drill.exec.physical.EndpointAffinity; +import org.apache.drill.exec.physical.base.AbstractFileGroupScan; +import org.apache.drill.exec.physical.base.GroupScan; +import org.apache.drill.exec.physical.base.ScanStats; +import org.apache.drill.exec.physical.base.ScanStats.GroupScanProperty; +import org.apache.drill.exec.planner.physical.PlannerSettings; +import org.apache.drill.exec.proto.CoordinationProtos; +import org.apache.drill.exec.server.options.OptionManager; +import org.apache.drill.exec.store.ColumnExplorer; +import org.apache.drill.exec.store.dfs.FileSelection; +import org.apache.drill.exec.store.dfs.ReadEntryWithPath; +import org.apache.drill.exec.store.parquet.stat.ColumnStatistics; +import org.apache.drill.exec.store.parquet.stat.ParquetMetaStatCollector; +import org.apache.drill.exec.store.schedule.AffinityCreator; +import org.apache.drill.exec.store.schedule.AssignmentCreator; +import 
org.apache.drill.exec.store.schedule.EndpointByteMap; +import org.apache.drill.exec.store.schedule.EndpointByteMapImpl; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collection; +import java.util.HashMap; +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.function.Function; +import java.util.stream.Collectors; + + +import static org.apache.drill.exec.store.parquet.metadata.MetadataBase.ParquetFileMetadata; +import static org.apache.drill.exec.store.parquet.metadata.MetadataBase.RowGroupMetadata; +import static org.apache.drill.exec.store.parquet.metadata.MetadataBase.ParquetTableMetadataBase; + +public abstract class AbstractParquetGroupScan extends AbstractFileGroupScan { + + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(AbstractParquetGroupScan.class); + + protected List columns; + protected List entries; + protected LogicalExpression filter; + + protected ParquetTableMetadataBase parquetTableMetadata; + protected List rowGroupInfos; + protected ListMultimap mappings; + protected Set fileSet; + + private List endpointAffinities; + private ParquetGroupScanStatistics parquetGroupScanStatistics; + + protected AbstractParquetGroupScan(String userName, List columns, List entries, LogicalExpression filter) { +super(userName); +this.columns = columns; +this.entries = entries; +this.filter = filter; + } + + // immutable copy constructor + protected AbstractParquetGroupScan(Abstr
[GitHub] drill pull request #1214: DRILL-6331: Revisit Hive Drill native parquet impl...
Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/1214#discussion_r183551693 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java --- @@ -0,0 +1,462 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.store.parquet; + +import com.fasterxml.jackson.annotation.JsonIgnore; +import com.fasterxml.jackson.annotation.JsonProperty; +import com.google.common.base.Preconditions; +import com.google.common.collect.ArrayListMultimap; +import com.google.common.collect.ListMultimap; +import org.apache.drill.common.expression.ErrorCollector; +import org.apache.drill.common.expression.ErrorCollectorImpl; +import org.apache.drill.common.expression.ExpressionStringBuilder; +import org.apache.drill.common.expression.LogicalExpression; +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.common.expression.ValueExpressions; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.compile.sig.ConstantExpressionIdentifier; +import org.apache.drill.exec.expr.ExpressionTreeMaterializer; +import org.apache.drill.exec.expr.fn.FunctionImplementationRegistry; +import org.apache.drill.exec.expr.stat.ParquetFilterPredicate; +import org.apache.drill.exec.ops.UdfUtilities; +import org.apache.drill.exec.physical.EndpointAffinity; +import org.apache.drill.exec.physical.base.AbstractFileGroupScan; +import org.apache.drill.exec.physical.base.GroupScan; +import org.apache.drill.exec.physical.base.ScanStats; +import org.apache.drill.exec.physical.base.ScanStats.GroupScanProperty; +import org.apache.drill.exec.planner.physical.PlannerSettings; +import org.apache.drill.exec.proto.CoordinationProtos; +import org.apache.drill.exec.server.options.OptionManager; +import org.apache.drill.exec.store.ColumnExplorer; +import org.apache.drill.exec.store.dfs.FileSelection; +import org.apache.drill.exec.store.dfs.ReadEntryWithPath; +import org.apache.drill.exec.store.parquet.stat.ColumnStatistics; +import org.apache.drill.exec.store.parquet.stat.ParquetMetaStatCollector; +import org.apache.drill.exec.store.schedule.AffinityCreator; +import org.apache.drill.exec.store.schedule.AssignmentCreator; +import 
org.apache.drill.exec.store.schedule.EndpointByteMap; +import org.apache.drill.exec.store.schedule.EndpointByteMapImpl; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collection; +import java.util.HashMap; +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.function.Function; +import java.util.stream.Collectors; + + +import static org.apache.drill.exec.store.parquet.metadata.MetadataBase.ParquetFileMetadata; +import static org.apache.drill.exec.store.parquet.metadata.MetadataBase.RowGroupMetadata; +import static org.apache.drill.exec.store.parquet.metadata.MetadataBase.ParquetTableMetadataBase; + +public abstract class AbstractParquetGroupScan extends AbstractFileGroupScan { + + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(AbstractParquetGroupScan.class); + + protected List columns; + protected List entries; + protected LogicalExpression filter; + + protected ParquetTableMetadataBase parquetTableMetadata; + protected List rowGroupInfos; + protected ListMultimap mappings; + protected Set fileSet; + + private List endpointAffinities; + private ParquetGroupScanStatistics parquetGroupScanStatistics; + + protected AbstractParquetGroupScan(String userName, List columns, List entries, LogicalExpression filter) { +super(userName); +this.columns = columns; +this.entries = entries; +this.filter = filter; + } + + // immutable copy constructor + protected AbstractParquetGroupScan(Abstr
[GitHub] drill pull request #1214: DRILL-6331: Revisit Hive Drill native parquet impl...
Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/1214#discussion_r183646149 --- Diff: contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHiveDrillNativeParquetReader.java --- @@ -0,0 +1,247 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to you under the Apache License, Version 2.0 +* (the "License"); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. 
+*/ +package org.apache.drill.exec; + +import org.apache.drill.PlanTestBase; +import org.apache.drill.categories.HiveStorageTest; +import org.apache.drill.categories.SlowTest; +import org.apache.drill.common.exceptions.UserRemoteException; +import org.apache.drill.exec.hive.HiveTestBase; +import org.apache.drill.exec.planner.physical.PlannerSettings; +import org.hamcrest.CoreMatchers; +import org.joda.time.DateTime; +import org.junit.AfterClass; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.Test; +import org.junit.experimental.categories.Category; +import org.junit.rules.ExpectedException; + +import java.math.BigDecimal; +import java.sql.Date; +import java.sql.Timestamp; + +import static org.hamcrest.CoreMatchers.containsString; +import static org.junit.Assert.assertEquals; + +@Category({SlowTest.class, HiveStorageTest.class}) +public class TestHiveDrillNativeParquetReader extends HiveTestBase { + + @BeforeClass + public static void init() { +setSessionOption(ExecConstants.HIVE_OPTIMIZE_SCAN_WITH_NATIVE_READERS, true); +setSessionOption(PlannerSettings.ENABLE_DECIMAL_DATA_TYPE_KEY, true); + } + + @AfterClass + public static void cleanup() { + resetSessionOption(ExecConstants.HIVE_OPTIMIZE_SCAN_WITH_NATIVE_READERS); +resetSessionOption(PlannerSettings.ENABLE_DECIMAL_DATA_TYPE_KEY); + } + + @Rule + public ExpectedException thrown = ExpectedException.none(); + + @Test + public void testFilterPushDownForManagedTable() throws Exception { +String query = "select * from hive.kv_native where key > 1"; + +int actualRowCount = testSql(query); +assertEquals("Expected and actual row count should match", 2, actualRowCount); + +testPlanMatchingPatterns(query, +new String[]{"HiveDrillNativeParquetScan", "numFiles=1"}, new String[]{}); + } + + @Test + public void testFilterPushDownForExternalTable() throws Exception { +String query = "select * from hive.kv_native_ext where key = 1"; + +int actualRowCount = testSql(query); +assertEquals("Expected and 
actual row count should match", 1, actualRowCount); + +testPlanMatchingPatterns(query, +new String[]{"HiveDrillNativeParquetScan", "numFiles=1"}, new String[]{}); --- End diff -- I have added method without `excludedPatterns`, when it is not necessary. But it is not merged for now. Is it better to pass null, than to create empty String? ---
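The exchange above weighs passing `null` against constructing an empty `String[]` for `excludedPatterns`. A third option is sketched below: a varargs parameter lets callers that exclude nothing simply omit the argument, and the method still receives a non-null array (`PlanCheck` is a hypothetical helper for illustration, not the actual test base class):

```java
public class PlanCheck {
    // When the caller omits the varargs argument, Java passes an empty
    // array, never null, so no null checks are needed inside the method.
    static int excludedCount(String query, String[] includedPatterns, String... excludedPatterns) {
        return excludedPatterns.length;
    }
}
```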
[GitHub] drill pull request #1214: DRILL-6331: Revisit Hive Drill native parquet impl...
Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/1214#discussion_r183250175 --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetScan.java --- @@ -1,114 +1,223 @@ /* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ +* Licensed to the Apache Software Foundation (ASF) under one or more --- End diff -- indent ---
[GitHub] drill pull request #1214: DRILL-6331: Revisit Hive Drill native parquet impl...
Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/1214#discussion_r183647695 --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java --- @@ -166,25 +171,43 @@ public boolean matches(RelOptRuleCall call) { @Override public void onMatch(RelOptRuleCall call) { try { - final DrillScanRel hiveScanRel = (DrillScanRel) call.rel(0); + final DrillScanRel hiveScanRel = call.rel(0); final HiveScan hiveScan = (HiveScan) hiveScanRel.getGroupScan(); final PlannerSettings settings = PrelUtil.getPlannerSettings(call.getPlanner()); final String partitionColumnLabel = settings.getFsPartitionColumnLabel(); final Table hiveTable = hiveScan.getHiveReadEntry().getTable(); - checkForUnsupportedDataTypes(hiveTable); + final HiveReadEntry hiveReadEntry = hiveScan.getHiveReadEntry(); + + final HiveMetadataProvider hiveMetadataProvider = new HiveMetadataProvider(hiveScan.getUserName(), hiveReadEntry, hiveScan.getStoragePlugin().getHiveConf()); --- End diff -- line break ---
Github user vdiravka commented on a diff in the pull request:
https://github.com/apache/drill/pull/1214#discussion_r183644581

--- Diff: contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHiveDrillNativeParquetReader.java ---
@@ -0,0 +1,247 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
--- End diff --

indent

---
Github user vdiravka commented on a diff in the pull request:
https://github.com/apache/drill/pull/1214#discussion_r183632379

--- Diff: contrib/storage-hive/core/src/test/java/org/apache/drill/exec/store/hive/HiveTestDataGenerator.java ---
@@ -64,16 +68,17 @@ public static synchronized HiveTestDataGenerator getInstance(File baseDir) throw
     final String dbDir = dbDirFile.getAbsolutePath();
     final String whDir = whDirFile.getAbsolutePath();

-    instance = new HiveTestDataGenerator(dbDir, whDir);
+    instance = new HiveTestDataGenerator(dbDir, whDir, dirTestWatcher);
     instance.generateTestData();
   }
   return instance;
 }

- private HiveTestDataGenerator(final String dbDir, final String whDir) {
+ private HiveTestDataGenerator(final String dbDir, final String whDir, final BaseDirTestWatcher dirTestWatcher) {
    this.dbDir = dbDir;
    this.whDir = whDir;
+   this.dirTestWatcher = dirTestWatcher;
    config = Maps.newHashMap();
    config.put("hive.metastore.uris", "");
--- End diff --

"hive.metastore.uris" -> ConfVars.METASTOREURIS

---
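The review above suggests referencing the Hive constant instead of the raw property string, so a mistyped key is caught at compile time. A minimal sketch of that change, assuming hive-common is on the classpath; the surrounding class and `config` map are illustrative, not copied from the PR:

```java
import java.util.HashMap;
import java.util.Map;

// Assumes the hive-common dependency is available;
// ConfVars.METASTOREURIS.varname resolves to "hive.metastore.uris".
import org.apache.hadoop.hive.conf.HiveConf.ConfVars;

public class MetastoreConfigSketch {
  public static void main(String[] args) {
    Map<String, String> config = new HashMap<>();
    // Reviewer's suggestion: use the named constant instead of the literal.
    config.put(ConfVars.METASTOREURIS.varname, "");
  }
}
```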
Github user vdiravka commented on a diff in the pull request:
https://github.com/apache/drill/pull/1214#discussion_r183633623

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java ---
@@ -147,10 +147,12 @@ CloseableRecordBatch getReaderBatch(FragmentContext context, EasySubScan scan) t
     List readers = new LinkedList<>();
     List> implicitColumns = Lists.newArrayList();
     Map mapWithMaxColumns = Maps.newLinkedHashMap();
+    boolean supportsFileImplicitColumns = scan.getSelectionRoot() != null;
     for(FileWork work : scan.getWorkUnits()){
--- End diff --

`for (`

---
Github user vdiravka commented on a diff in the pull request:
https://github.com/apache/drill/pull/1214#discussion_r183253252

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRowGroupScan.java ---
@@ -40,31 +36,26 @@
 import com.fasterxml.jackson.annotation.JsonProperty;
 import com.fasterxml.jackson.annotation.JsonTypeName;
 import com.google.common.base.Preconditions;
-import com.google.common.collect.Iterators;
+import org.apache.hadoop.conf.Configuration;

 // Class containing information for reading a single parquet row group form HDFS
--- End diff --

form -> from

---
Github user vdiravka commented on a diff in the pull request:
https://github.com/apache/drill/pull/1214#discussion_r183633688

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetScanBatchCreator.java ---
@@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet;
+
+import com.google.common.base.Functions;
+import com.google.common.collect.Maps;
+import org.apache.drill.common.Stopwatch;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.ops.ExecutorFragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.ScanBatch;
+import org.apache.drill.exec.store.ColumnExplorer;
+import org.apache.drill.exec.store.RecordReader;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader;
+import org.apache.drill.exec.store.parquet2.DrillParquetReader;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.parquet.column.ColumnDescriptor;
+import org.apache.parquet.format.converter.ParquetMetadataConverter;
+import org.apache.parquet.hadoop.CodecFactory;
+import org.apache.parquet.hadoop.ParquetFileReader;
+import org.apache.parquet.hadoop.metadata.ParquetMetadata;
+import org.apache.parquet.schema.MessageType;
+import org.apache.parquet.schema.Type;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.LinkedHashMap;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.TimeUnit;
+
+public abstract class AbstractParquetScanBatchCreator {
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(AbstractParquetScanBatchCreator.class);
+
+  private static final String ENABLE_BYTES_READ_COUNTER = "parquet.benchmark.bytes.read";
+  private static final String ENABLE_BYTES_TOTAL_COUNTER = "parquet.benchmark.bytes.total";
+  private static final String ENABLE_TIME_READ_COUNTER = "parquet.benchmark.time.read";
+
+  protected ScanBatch getBatch(ExecutorFragmentContext context, AbstractParquetRowGroupScan rowGroupScan, OperatorContext oContext) throws ExecutionSetupException {
+    final ColumnExplorer columnExplorer = new ColumnExplorer(context.getOptions(), rowGroupScan.getColumns());
+
+    if (!columnExplorer.isStarQuery()) {
+      rowGroupScan = rowGroupScan.copy(columnExplorer.getTableColumns());
+      rowGroupScan.setOperatorId(rowGroupScan.getOperatorId());
+    }
+
+    boolean useAsyncPageReader =
+        context.getOptions().getOption(ExecConstants.PARQUET_PAGEREADER_ASYNC).bool_val;
+
+    AbstractDrillFileSystemManager fsManager = getDrillFileSystemCreator(oContext, useAsyncPageReader);
+
+    // keep footers in a map to avoid re-reading them
+    Map footers = new HashMap<>();
+    List readers = new LinkedList<>();
+    List> implicitColumns = new ArrayList<>();
+    Map mapWithMaxColumns = new LinkedHashMap<>();
+    for(RowGroupReadEntry rowGroup : rowGroupScan.getRowGroupReadEntries()) {
--- End diff --

`for (`

---
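The diff quoted above keeps parquet footers in a map so each file's footer is read only once, even when several row groups belong to the same file. A minimal, self-contained sketch of that caching pattern; `readFooter` and the string "footer" stand in for the real `ParquetFileReader.readFooter` call and `ParquetMetadata` value, and are not the PR's API:

```java
import java.util.HashMap;
import java.util.Map;

public class FooterCacheSketch {

  static int reads = 0;

  // Stand-in for reading a parquet footer, which is expensive I/O
  // in the real reader.
  static String readFooter(String path) {
    reads++;
    return "footer-of-" + path;
  }

  public static void main(String[] args) {
    // Keep footers in a map to avoid re-reading them: two row groups
    // from a.parquet share one footer read.
    Map<String, String> footers = new HashMap<>();
    String[] rowGroupFiles = {"/data/a.parquet", "/data/a.parquet", "/data/b.parquet"};
    for (String file : rowGroupFiles) {
      footers.computeIfAbsent(file, FooterCacheSketch::readFooter);
    }
    System.out.println(reads); // prints 2: one read per distinct file
  }
}
```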
Github user vdiravka commented on a diff in the pull request:
https://github.com/apache/drill/pull/1214#discussion_r183633517

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/ColumnExplorer.java ---
@@ -156,43 +157,74 @@ public static boolean isPartitionColumn(String partitionDesignator, String path) }
   /**
-   * Compares selection root and actual file path to determine partition columns values.
-   * Adds implicit file columns according to columns list.
+   * Creates map with implicit columns where key is column name, value is columns actual value.
+   * This map contains partition and implicit file columns (if requested).
+   * Partition columns names are formed based in partition designator and value index.
    *
-   * @return map with columns names as keys and their values
+   * @param filePath file path, used to populate file implicit columns
+   * @param partitionValues list of partition values
+   * @param includeFileImplicitColumns if file implicit columns should be included into the result
+   * @return implicit columns map
    */
-  public Map populateImplicitColumns(FileWork work, String selectionRoot) {
-    return populateImplicitColumns(work.getPath(), selectionRoot);
-  }
+  public Map populateImplicitColumns(String filePath,
+                                     List partitionValues,
+                                     boolean includeFileImplicitColumns) {
+    Map implicitValues = new LinkedHashMap<>();

-  /**
-   * Compares selection root and actual file path to determine partition columns values.
-   * Adds implicit file columns according to columns list.
-   *
-   * @return map with columns names as keys and their values
-   */
-  public Map populateImplicitColumns(String filePath, String selectionRoot) {
-    Map implicitValues = Maps.newLinkedHashMap();
-    if (selectionRoot != null) {
-      String[] r = Path.getPathWithoutSchemeAndAuthority(new Path(selectionRoot)).toString().split("/");
-      Path path = Path.getPathWithoutSchemeAndAuthority(new Path(filePath));
-      String[] p = path.toString().split("/");
-      if (p.length > r.length) {
-        String[] q = ArrayUtils.subarray(p, r.length, p.length - 1);
-        for (int a = 0; a < q.length; a++) {
-          if (isStarQuery || selectedPartitionColumns.contains(a)) {
-            implicitValues.put(partitionDesignator + a, q[a]);
-          }
-        }
+    for(int i = 0; i < partitionValues.size(); i++) {
--- End diff --

`for (`

---
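Per the new javadoc in the diff above, partition column names are formed from the partition designator plus the value index. A minimal, self-contained sketch of that naming scheme; the class and method below are an illustrative simplification, not Drill's actual `ColumnExplorer`:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ImplicitColumnsSketch {

  // Simplified take on populateImplicitColumns: each partition value
  // becomes an entry named <designator><index>, e.g. dir0, dir1, ...
  static Map<String, String> populateImplicitColumns(String partitionDesignator,
                                                     List<String> partitionValues) {
    Map<String, String> implicitValues = new LinkedHashMap<>();
    for (int i = 0; i < partitionValues.size(); i++) {
      implicitValues.put(partitionDesignator + i, partitionValues.get(i));
    }
    return implicitValues;
  }

  public static void main(String[] args) {
    // A file /data/table/2018/04/file.parquet under selection root
    // /data/table yields the partition values ["2018", "04"].
    Map<String, String> cols = populateImplicitColumns("dir", Arrays.asList("2018", "04"));
    System.out.println(cols); // prints {dir0=2018, dir1=04}
  }
}
```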
Github user vdiravka commented on a diff in the pull request:
https://github.com/apache/drill/pull/1214#discussion_r183251188

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/RowGroupInfo.java ---
@@ -0,0 +1,95 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
--- End diff --

indent

---