[GitHub] drill issue #1141: DRILL-6197: Skip duplicate entry for OperatorStats
Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/1141 +1. ---
[GitHub] drill pull request #1141: DRILL-6197: Skip duplicate entry for OperatorStats
Github user kkhatua commented on a diff in the pull request: https://github.com/apache/drill/pull/1141#discussion_r171767495 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentStats.java --- @@ -79,4 +71,21 @@ public void addOperatorStats(OperatorStats stats) { operators.add(stats); } + //DRILL-6197 + public OperatorStats addOrReplaceOperatorStats(OperatorStats stats) { +//Remove existing stat +OperatorStats replacedStat = null; +int index = 0; +for (OperatorStats opStat : operators) { --- End diff -- Everything worked fine. I tried a join of the TPCH tables `lineitem` and `orders`, and confirmed there are no more duplicates for SCREEN, SINGLE_SENDER and HASH_PARTITION_SENDER. Substituting the smaller `supplier` table for `orders`, I confirmed that BROADCAST_SENDER also had no duplicates. ---
[GitHub] drill pull request #1141: DRILL-6197: Skip duplicate entry for OperatorStats
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/1141#discussion_r171753536 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentStats.java --- @@ -79,4 +71,21 @@ public void addOperatorStats(OperatorStats stats) { operators.add(stats); } + //DRILL-6197 + public OperatorStats addOrReplaceOperatorStats(OperatorStats stats) { +//Remove existing stat +OperatorStats replacedStat = null; +int index = 0; +for (OperatorStats opStat : operators) { --- End diff -- LGTM. Hopefully it did not break existing stuff..so will wait for your confirmation. ---
[GitHub] drill issue #1105: DRILL-6125: Fix possible memory leak when query is cancel...
Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1105 @arina-ielchiieva @vrozov I believe I have a solution. There were several issues with the original code. 1. It made incorrect assumptions about how cache invalidation works with Java **synchronized**. 2. It assumed **innerNext** and **close** would be called sequentially. I believe this PR fixes these issues now, and I have gone into more detail about the problems below. # 1. Incorrect Cache Invalidation Assumptions The original code tried to be clever about reducing synchronization overhead on **innerNext**. The code in **innerNext** did not synchronize before changing the partitioner object, since this method is called often. The code in **close()** and **receivingFragmentFinished()** synchronized before accessing the partitioner, with the intention that this would propagate the partitioner variable's state across all threads. Unfortunately, this assumption is invalid (see https://stackoverflow.com/questions/22706739/does-synchronized-guarantee-a-thread-will-see-the-latest-value-of-a-non-volatile). Every thread must synchronize before accessing a shared variable in order to properly invalidate cached data on a core. For example, if **Thread A** modifies **Variable 1** without synchronizing, and **Thread B** synchronizes before reading **Variable 1**, there is still no guarantee that **Thread B** will see the most recent value, since **Thread A**'s write may not yet be visible. ## Solution In summary, the right thing to do is the simple thing: make the methods synchronized. There is no way to outsmart the system and reduce synchronization overhead without causing race conditions. # 2. Concurrent InnerNext and Close Calls The original code did not consider the case where **innerNext** was in the middle of execution when **close** was called. It did try to handle the case where **innerNext** could be called after **close** by setting the **ok** variable.
But even that was not done correctly, because there was no synchronization around the **ok** variable. ## Solution The right thing to do is the simple thing: make the methods synchronized, so **close** has to wait until **innerNext** is done before executing. Also, when a query is cancelled, the thread running **innerNext** should be interrupted in case it is blocked on a call. ---
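The two fixes described in the comment above can be sketched as follows. This is an illustrative class (the names `SenderSketch`, `innerNext`, `close`, and `cancel` mirror the discussion but are not Drill's real implementation): both methods synchronize on the same monitor, so `close()` cannot run while `innerNext()` is mid-execution, the `ok` flag is only touched under the lock, and a cancel can interrupt a thread blocked inside `innerNext()`.

```java
// Sketch of the synchronization pattern discussed above (hypothetical class,
// not Drill's actual PartitionSenderRootExec).
public class SenderSketch {
  private boolean ok = true;            // guarded by "this"
  private volatile Thread runningThread; // set while innerNext() executes

  public synchronized boolean innerNext() {
    if (!ok) {
      return false;                     // close() already ran; do nothing
    }
    runningThread = Thread.currentThread();
    try {
      // ... partition and send the current batch (possibly blocking) ...
      return true;
    } finally {
      runningThread = null;
    }
  }

  public synchronized void close() {
    // Waits for any in-flight innerNext() because both hold the same lock.
    ok = false;                         // later innerNext() calls become no-ops
    // ... release partitioner resources under the same lock ...
  }

  /** Called on query cancel: interrupt a thread blocked inside innerNext(). */
  public void cancel() {
    Thread t = runningThread;
    if (t != null) {
      t.interrupt();
    }
  }
}
```

Because every read and write of the shared state happens inside `synchronized` blocks on the same monitor, the Java memory model guarantees visibility without any of the partial-synchronization tricks the original code attempted.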
[GitHub] drill pull request #1141: DRILL-6197: Skip duplicate entry for OperatorStats
Github user kkhatua commented on a diff in the pull request: https://github.com/apache/drill/pull/1141#discussion_r171748860 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentStats.java --- @@ -79,4 +71,21 @@ public void addOperatorStats(OperatorStats stats) { operators.add(stats); } + //DRILL-6197 + public OperatorStats addOrReplaceOperatorStats(OperatorStats stats) { +//Remove existing stat +OperatorStats replacedStat = null; +int index = 0; +for (OperatorStats opStat : operators) { --- End diff -- I added a new commit, but I haven't tested it for performance. Can you take a look, @amansinha100 ? ---
[GitHub] drill issue #1145: DRILL-6187: Exception in RPC communication between DataCl...
Github user sohami commented on the issue: https://github.com/apache/drill/pull/1145 @vrozov - Please help review this PR. It addresses a concurrency issue during authentication of the control/data client with the server side. Rather than adding the connection to the connection holder right after the TCP connection is available, the connection-success listener is now called only after successful authentication (if needed). ---
[GitHub] drill pull request #1145: DRILL-6187: Exception in RPC communication between...
GitHub user sohami opened a pull request: https://github.com/apache/drill/pull/1145 DRILL-6187: Exception in RPC communication between DataClient/ControlClient and respective servers when bit-to-bit security is on You can merge this pull request into a Git repository by running: $ git pull https://github.com/sohami/drill DRILL-6187-2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1145.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1145 commit 4a7602b428ef4ef9fe358976713a78174bb82f57 Author: Sorabh Hamirwasia Date: 2018-03-01T23:08:10Z DRILL-6187: Exception in RPC communication between DataClient/ControlClient and respective servers when bit-to-bit security is on ---
[GitHub] drill pull request #1141: DRILL-6197: Skip duplicate entry for OperatorStats
Github user kkhatua commented on a diff in the pull request: https://github.com/apache/drill/pull/1141#discussion_r171740245 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentStats.java --- @@ -79,4 +71,21 @@ public void addOperatorStats(OperatorStats stats) { operators.add(stats); } + //DRILL-6197 + public OperatorStats addOrReplaceOperatorStats(OperatorStats stats) { +//Remove existing stat +OperatorStats replacedStat = null; +int index = 0; +for (OperatorStats opStat : operators) { --- End diff -- I see your point. Also, digging into the code shows I can substitute a LinkedHashMap, since the list is only referenced here for consumption of its contents: https://github.com/kkhatua/drill/blob/65efe3ea0c5777490488d3d56cbdb0cb011b9f33/exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentStats.java#L45 I can't use a Set, because I need the stats object hashed on the operator ID and type, not the rest of its contents. I'll refactor and try to confirm nothing else breaks. ---
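The LinkedHashMap idea discussed in this thread can be sketched like this. The `Key` and the stats payload here are simplified stand-ins (not Drill's real `OperatorStats`): keying on operator ID plus type makes a second `put()` replace the first entry in place, while `LinkedHashMap` preserves insertion order for JSON serialization.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative add-or-replace map keyed on (operatorId, operatorType).
public class OperatorStatsMap {
  static final class Key {
    final int operatorId;
    final int operatorType;
    Key(int id, int type) { this.operatorId = id; this.operatorType = type; }
    @Override public boolean equals(Object o) {
      return o instanceof Key && ((Key) o).operatorId == operatorId
          && ((Key) o).operatorType == operatorType;
    }
    @Override public int hashCode() { return Objects.hash(operatorId, operatorType); }
  }

  // LinkedHashMap keeps insertion order, so serializing values() yields the
  // same JSON list ordering as the original ArrayList did.
  private final Map<Key, String> operators = new LinkedHashMap<>();

  /** Insert, or replace an existing entry with the same id and type. */
  public String addOrReplace(int id, int type, String stats) {
    return operators.put(new Key(id, type), stats); // returns replaced entry or null
  }

  public Collection<String> values() { return operators.values(); }
}
```

This replaces the linear search with an O(1) hash lookup, addressing the overhead concern raised for fragments with long operator lists.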
[GitHub] drill pull request #1135: DRILL-6040: Added usage for graceful_stop in drill...
Github user priteshm commented on a diff in the pull request: https://github.com/apache/drill/pull/1135#discussion_r171731905 --- Diff: distribution/src/resources/drillbit.sh --- @@ -45,7 +45,7 @@ # configuration file. The option takes precedence over the # DRILL_CONF_DIR environment variable. # -# The command is one of: start|stop|status|restart|run +# The command is one of: start|stop|status|restart|run|graceful_stop --- End diff -- not sure if this is critical, but other options to consider are "finish" or "drain". ---
[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
Github user kr-arjun commented on the issue: https://github.com/apache/drill/pull/1011 @paul-rogers Currently, the client exception is output as 'ClientContext.err.println(e.getMessage())' in DrillOnYarn.java. For most application master launch failures, the only message available is 'Failed to start Drill application master'. Do you think it would help troubleshooting of Drill-on-YARN client failures if the exception stacktrace were logged? ---
[GitHub] drill pull request #1141: DRILL-6197: Skip duplicate entry for OperatorStats
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/1141#discussion_r171723902 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentStats.java --- @@ -79,4 +71,21 @@ public void addOperatorStats(OperatorStats stats) { operators.add(stats); } + //DRILL-6197 + public OperatorStats addOrReplaceOperatorStats(OperatorStats stats) { +//Remove existing stat +OperatorStats replacedStat = null; +int index = 0; +for (OperatorStats opStat : operators) { --- End diff -- Some TPC-DS queries have a fairly long list of operators within a fragment, and in general it would be preferable to avoid this search. Can you point to where this JSON serialization happens? My guess is it just needs to preserve the insertion order. In that case we could use a LinkedHashSet, which would provide both duplicate removal and insertion order. ---
[jira] [Created] (DRILL-6203) Repeated Map Vector does not give correct payload bytecount
Padma Penumarthy created DRILL-6203: --- Summary: Repeated Map Vector does not give correct payload bytecount Key: DRILL-6203 URL: https://issues.apache.org/jira/browse/DRILL-6203 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.12.0 Reporter: Padma Penumarthy Assignee: Padma Penumarthy Repeated Map Vector does not give the correct payload byte count. It calls the AbstractMapVector method, which computes the payload byte count for a given value count for the simple (non-repeated) map case. We need to override this method for the repeated map to get the right numbers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
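The undercount described in the issue can be illustrated with a toy model (these classes and formulas are purely illustrative, not Drill's real vector hierarchy): a repeated map's payload for N top-level values must include the offsets buffer plus the children's payload for however many child values those N entries actually span, so applying the simple-map formula to a repeated map misses both.

```java
// Toy model of the payload-bytecount difference for simple vs. repeated maps.
public class PayloadSketch {
  // Simple (non-repeated) map: one child value per top-level value.
  static int simpleMapPayload(int valueCount, int bytesPerChildValue, int childCount) {
    return valueCount * bytesPerChildValue * childCount;
  }

  // Repeated map: each top-level value spans a variable run of child values,
  // recorded in an offsets buffer of (valueCount + 1) 4-byte ints.
  static int repeatedMapPayload(int valueCount, int[] offsets,
                                int bytesPerChildValue, int childCount) {
    int offsetsBytes = (valueCount + 1) * 4;
    int innerValues = offsets[valueCount] - offsets[0]; // child values spanned
    return offsetsBytes + innerValues * bytesPerChildValue * childCount;
  }
}
```

With offsets {0, 3, 5, 9}, three top-level entries span nine child values, so the repeated-map payload is far larger than the simple formula would report for a value count of three.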
[GitHub] drill issue #1144: DRILL-6202: Deprecate usage of IndexOutOfBoundsException ...
Github user vrozov commented on the issue: https://github.com/apache/drill/pull/1144 @parthchandra Please take a look. ---
[GitHub] drill pull request #1096: DRILL-6099 : Push limit past flatten(project) with...
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/1096#discussion_r171711326 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java --- @@ -55,18 +62,21 @@ public void onMatch(RelOptRuleCall call) { } }; - public static DrillPushLimitToScanRule LIMIT_ON_PROJECT = - new DrillPushLimitToScanRule( - RelOptHelper.some(DrillLimitRel.class, RelOptHelper.some( - DrillProjectRel.class, RelOptHelper.any(DrillScanRel.class))), - "DrillPushLimitToScanRule_LimitOnProject") { + public static DrillPushLimitToScanRule LIMIT_ON_PROJECT = new DrillPushLimitToScanRule( + RelOptHelper.some(DrillLimitRel.class, RelOptHelper.any(DrillProjectRel.class)), "DrillPushLimitToScanRule_LimitOnProject") { @Override public boolean matches(RelOptRuleCall call) { DrillLimitRel limitRel = call.rel(0); - DrillScanRel scanRel = call.rel(2); - // For now only applies to Parquet. And pushdown only apply limit but not offset, + DrillProjectRel projectRel = call.rel(1); + // pushdown only apply limit but not offset, // so if getFetch() return null no need to run this rule. - if (scanRel.getGroupScan().supportsLimitPushdown() && (limitRel.getFetch() != null)) { --- End diff -- Ok, yeah in that case we are not generating a redundant limit. ---
[GitHub] drill issue #1096: DRILL-6099 : Push limit past flatten(project) without pus...
Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/1096 Updated version lgtm. +1 ---
[GitHub] drill pull request #1144: DRILL-6202: Deprecate usage of IndexOutOfBoundsExc...
GitHub user vrozov opened a pull request: https://github.com/apache/drill/pull/1144 DRILL-6202: Deprecate usage of IndexOutOfBoundsException to re-alloc vectors You can merge this pull request into a Git repository by running: $ git pull https://github.com/vrozov/drill DRILL-6202 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1144.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1144 commit 2af94a07340f9f13aa152822c2c8d37568ab44ab Author: Vlad Rozov Date: 2018-03-01T17:36:05Z DRILL-6202: Deprecate usage of IndexOutOfBoundsException to re-alloc vectors ---
[GitHub] drill issue #1096: DRILL-6099 : Push limit past flatten(project) without pus...
Github user gparai commented on the issue: https://github.com/apache/drill/pull/1096 @amansinha100 I have addressed your review comments. Please take a look. Thanks! ---
[GitHub] drill pull request #1096: DRILL-6099 : Push limit past flatten(project) with...
Github user gparai commented on a diff in the pull request: https://github.com/apache/drill/pull/1096#discussion_r171708636 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java --- @@ -55,18 +62,21 @@ public void onMatch(RelOptRuleCall call) { } }; - public static DrillPushLimitToScanRule LIMIT_ON_PROJECT = - new DrillPushLimitToScanRule( - RelOptHelper.some(DrillLimitRel.class, RelOptHelper.some( - DrillProjectRel.class, RelOptHelper.any(DrillScanRel.class))), - "DrillPushLimitToScanRule_LimitOnProject") { + public static DrillPushLimitToScanRule LIMIT_ON_PROJECT = new DrillPushLimitToScanRule( + RelOptHelper.some(DrillLimitRel.class, RelOptHelper.any(DrillProjectRel.class)), "DrillPushLimitToScanRule_LimitOnProject") { @Override public boolean matches(RelOptRuleCall call) { DrillLimitRel limitRel = call.rel(0); - DrillScanRel scanRel = call.rel(2); - // For now only applies to Parquet. And pushdown only apply limit but not offset, + DrillProjectRel projectRel = call.rel(1); + // pushdown only apply limit but not offset, // so if getFetch() return null no need to run this rule. - if (scanRel.getGroupScan().supportsLimitPushdown() && (limitRel.getFetch() != null)) { --- End diff -- Without a FLATTEN, the LIMIT would be fully pushed past the PROJECT i.e. we would not have a LIMIT on top of the project. ---
[GitHub] drill pull request #1096: DRILL-6099 : Push limit past flatten(project) with...
Github user gparai commented on a diff in the pull request: https://github.com/apache/drill/pull/1096#discussion_r171708439 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java --- @@ -224,4 +226,64 @@ public Void visitInputRef(RexInputRef inputRef) { } } + public static boolean isLimit0(RexNode fetch) { +if (fetch != null && fetch.isA(SqlKind.LITERAL)) { + RexLiteral l = (RexLiteral) fetch; + switch (l.getTypeName()) { +case BIGINT: +case INTEGER: +case DECIMAL: + if (((long) l.getValue2()) == 0) { +return true; + } + } +} +return false; + } + + public static boolean isProjectOutputRowcountUnknown(RelNode project) { +assert project instanceof Project : "Rel is NOT an instance of project!"; +try { + RexVisitor visitor = + new RexVisitorImpl(true) { +public Void visitCall(RexCall call) { + if ("flatten".equals(call.getOperator().getName().toLowerCase())) { +throw new Util.FoundOne(call); /* throw exception to interrupt tree walk (this is similar to + other utility methods in RexUtil.java */ + } + return super.visitCall(call); +} + }; + for (RexNode rex : ((Project) project).getProjects()) { +rex.accept(visitor); + } +} catch (Util.FoundOne e) { + Util.swallow(e, null); + return true; +} +return false; + } + + public static boolean isProjectOutputSchemaUnknown(RelNode project) { --- End diff -- Done ---
[GitHub] drill pull request #1096: DRILL-6099 : Push limit past flatten(project) with...
Github user gparai commented on a diff in the pull request: https://github.com/apache/drill/pull/1096#discussion_r171708410 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java --- @@ -224,4 +226,64 @@ public Void visitInputRef(RexInputRef inputRef) { } } + public static boolean isLimit0(RexNode fetch) { +if (fetch != null && fetch.isA(SqlKind.LITERAL)) { + RexLiteral l = (RexLiteral) fetch; + switch (l.getTypeName()) { +case BIGINT: +case INTEGER: +case DECIMAL: + if (((long) l.getValue2()) == 0) { +return true; + } + } +} +return false; + } + + public static boolean isProjectOutputRowcountUnknown(RelNode project) { --- End diff -- Done ---
[GitHub] drill pull request #1096: DRILL-6099 : Push limit past flatten(project) with...
Github user gparai commented on a diff in the pull request: https://github.com/apache/drill/pull/1096#discussion_r171708384 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java --- @@ -224,4 +226,64 @@ public Void visitInputRef(RexInputRef inputRef) { } } + public static boolean isLimit0(RexNode fetch) { +if (fetch != null && fetch.isA(SqlKind.LITERAL)) { + RexLiteral l = (RexLiteral) fetch; + switch (l.getTypeName()) { +case BIGINT: +case INTEGER: +case DECIMAL: + if (((long) l.getValue2()) == 0) { +return true; + } + } +} +return false; + } + + public static boolean isProjectOutputRowcountUnknown(RelNode project) { +assert project instanceof Project : "Rel is NOT an instance of project!"; +try { + RexVisitor visitor = --- End diff -- Yes, you are correct. If the rewrite does not consider it as embedded within other expressions then it is fine for the utility function to do the same. ---
[GitHub] drill issue #1138: DRILL-4120: Allow implicit columns for Avro storage forma...
Github user vvysotskyi commented on the issue: https://github.com/apache/drill/pull/1138 @paul-rogers, the schema is taken from the first file in the `FormatSelection`. Therefore, for the case when we have a table with several files with different schemas, the Drill query will fail. As for the plan-time type information, besides validation at the stage when a query is converted into rel nodes, the field list may be used in project rel nodes instead of the dynamic star for `DynamicDrillTable`. ---
[GitHub] drill pull request #1141: DRILL-6197: Skip duplicate entry for OperatorStats
Github user kkhatua commented on a diff in the pull request: https://github.com/apache/drill/pull/1141#discussion_r171648058 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentStats.java --- @@ -79,4 +71,21 @@ public void addOperatorStats(OperatorStats stats) { operators.add(stats); } + //DRILL-6197 + public OperatorStats addOrReplaceOperatorStats(OperatorStats stats) { +//Remove existing stat +OperatorStats replacedStat = null; +int index = 0; +for (OperatorStats opStat : operators) { --- End diff -- The choice of a list for the collection of stats seems to be because it simply gets serialized into a JSON list. As for the overhead: since each list is specific to a minor fragment (which typically has about 3-8 operators), the overhead of the linear search is not significant, and it is invoked only for specific operators. That is one of the reasons why I didn't replace the original `addOperatorStats()` implementation with `addOrReplaceOperatorStats()`. ---
[jira] [Created] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
Vlad Rozov created DRILL-6202: - Summary: Deprecate usage of IndexOutOfBoundsException to re-alloc vectors Key: DRILL-6202 URL: https://issues.apache.org/jira/browse/DRILL-6202 Project: Apache Drill Issue Type: Bug Reporter: Vlad Rozov Assignee: Vlad Rozov As bounds checking may be enabled or disabled, using IndexOutOfBoundsException to resize vectors is unreliable. It works only when bounds checking is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
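The direction implied by the issue can be sketched generically (this is an illustrative growable buffer, not Drill's actual vector API): instead of writing blindly and catching `IndexOutOfBoundsException`, which only fires when bounds checking happens to be enabled, the writer checks remaining capacity explicitly and re-allocates before the write.

```java
// Illustrative buffer that re-allocates via an explicit capacity check
// rather than relying on IndexOutOfBoundsException being thrown.
public class GrowableIntBuffer {
  private int[] data = new int[4];
  private int count;

  public void write(int value) {
    if (count == data.length) {          // explicit capacity check
      int[] bigger = new int[data.length * 2];
      System.arraycopy(data, 0, bigger, 0, count);
      data = bigger;                     // re-alloc before the write; no exception needed
    }
    data[count++] = value;
  }

  public int size() { return count; }
  public int get(int i) { return data[i]; }
}
```

The exception-driven approach fails silently (or corrupts memory) when bounds checking is compiled out, whereas the explicit check behaves identically in both configurations.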
[jira] [Created] (DRILL-6201) Failed to create input splits: No FileSystem for scheme: maprfs
Willian Mattos Ribeiro created DRILL-6201: - Summary: Failed to create input splits: No FileSystem for scheme: maprfs Key: DRILL-6201 URL: https://issues.apache.org/jira/browse/DRILL-6201 Project: Apache Drill Issue Type: Bug Components: Storage - Hive, Storage - MapRDB Environment: MapR cluster - CentOS; Apache Drill installed on a separate VM (not a cluster node) Reporter: Willian Mattos Ribeiro
2018-03-01 14:03:28 ERROR HiveMetadataProvider:294 - Failed to create input splits: No FileSystem for scheme: maprfs
java.io.IOException: No FileSystem for scheme: maprfs
  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2644) ~[hadoop-common-2.7.1.jar:?]
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651) ~[hadoop-common-2.7.1.jar:?]
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92) ~[hadoop-common-2.7.1.jar:?]
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687) ~[hadoop-common-2.7.1.jar:?]
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669) ~[hadoop-common-2.7.1.jar:?]
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371) ~[hadoop-common-2.7.1.jar:?]
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295) ~[hadoop-common-2.7.1.jar:?]
  at org.apache.drill.exec.store.hive.HiveMetadataProvider$1.run(HiveMetadataProvider.java:269) ~[drill-storage-hive-core-1.12.0.jar:1.12.0]
  at org.apache.drill.exec.store.hive.HiveMetadataProvider$1.run(HiveMetadataProvider.java:262) ~[drill-storage-hive-core-1.12.0.jar:1.12.0]
  at java.security.AccessController.doPrivileged(Native Method) ~[?:1.7.0_161]
  at javax.security.auth.Subject.doAs(Subject.java:421) ~[?:1.7.0_161]
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) ~[hadoop-common-2.7.1.jar:?]
  at org.apache.drill.exec.store.hive.HiveMetadataProvider.splitInputWithUGI(HiveMetadataProvider.java:262) [drill-storage-hive-core-1.12.0.jar:1.12.0]
  at org.apache.drill.exec.store.hive.HiveMetadataProvider.getPartitionInputSplits(HiveMetadataProvider.java:154) [drill-storage-hive-core-1.12.0.jar:1.12.0]
  at org.apache.drill.exec.store.hive.HiveMetadataProvider.getInputSplits(HiveMetadataProvider.java:176) [drill-storage-hive-core-1.12.0.jar:1.12.0]
  at org.apache.drill.exec.store.hive.HiveScan.getInputSplits(HiveScan.java:122) [drill-storage-hive-core-1.12.0.jar:1.12.0]
  at org.apache.drill.exec.store.hive.HiveScan.getMaxParallelizationWidth(HiveScan.java:171) [drill-storage-hive-core-1.12.0.jar:1.12.0]
  at org.apache.drill.exec.planner.physical.ScanPrule.onMatch(ScanPrule.java:41) [drill-java-exec-1.12.0.jar:1.12.0]
  at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228) [calcite-core-1.4.0-drill-r23.jar:1.4.0-drill-r23]
  at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:811) [calcite-core-1.4.0-drill-r23.jar:1.4.0-drill-r23]
  at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:310) [calcite-core-1.4.0-drill-r23.jar:1.4.0-drill-r23]
  at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:400) [drill-java-exec-1.12.0.jar:1.12.0]
  at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel(DefaultSqlHandler.java:429) [drill-java-exec-1.12.0.jar:1.12.0]
  at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:169) [drill-java-exec-1.12.0.jar:1.12.0]
  at org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:131) [drill-java-exec-1.12.0.jar:1.12.0]
  at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:79) [drill-java-exec-1.12.0.jar:1.12.0]
  at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1017) [drill-java-exec-1.12.0.jar:1.12.0]
  at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:289) [drill-java-exec-1.12.0.jar:1.12.0]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152) [?:1.7.0_161]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622) [?:1.7.0_161]
  at java.lang.Thread.run(Thread.java:748) [?:1.7.0_161]
2018-03-01 14:03:28 ERROR HiveMetadataProvider:180 - Failed to get InputSplits
org.apache.drill.common.exceptions.DrillRuntimeException: Failed to create input splits: No FileSystem for scheme: maprfs
  at org.apache.drill.exec.store.hive.HiveMetadataProvider.splitInputWithUGI(HiveMetadataProvider.java:295) ~[drill-storage-hive-core-1.12.0.jar:1.12.0]
  at org.apache.drill.exec.store.hive.HiveMetadataProvider.getPartitionInputSplits(HiveMetadataProvider.java:154) ~[drill-storage-hive-core-1.12.0.jar:1.12.0]
  at org.apache.drill.exec.store.hive.HiveMetadataProvider.getInputSplits(HiveMetadataProvider.java:176)
[GitHub] drill issue #1138: DRILL-4120: Allow implicit columns for Avro storage forma...
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1138 Another thought. The removed code runs at plan time. Did the original code have to open each file to retrieve the schema? If so, does removing the code remove that load? If so, then this change could be a huge performance improvement, since it avoids the need to open every file in the Foreman. Then the next question is: do we actually do anything with the plan-time type information? Few files have that information. Given that, does the planner actually use the information? Is this something we get for free from Calcite? If we are not using the type information at plan time, then clearly there is no harm in removing the code that retrieves it. ---
[GitHub] drill pull request #1142: DRILL-6198: OpenTSDB unit tests fail when Lilith c...
Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1142#discussion_r171620225 --- Diff: contrib/storage-opentsdb/src/test/java/org/apache/drill/store/openTSDB/TestOpenTSDBPlugin.java --- @@ -185,4 +188,26 @@ public void testDescribe() throws Exception { test("describe `warp.speed.test`"); Assert.assertEquals(1, testSql("show tables")); } + + /** + * Checks that port with specified number is free and returns it. + * Otherwise, increases port number and checks until free port is found + * or the number of attempts is reached specified numAttempts + * + * @param portNum initial port number + * @param numAttempts max number of attempts to find port with greater number + * @return free port number + * @throws BindException if free port was not found and all attempts were used. + */ + private static int getFreePortNum(int portNum, int numAttempts) throws IOException { +while (numAttempts > 0) { --- End diff -- 1. Thanks, it looks better with for loop. 2. Added more details to the error message. ---
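The for-loop version of the free-port scan agreed on in this review thread might look roughly like this (an assumed shape of the test helper, not the exact patch): try to bind each candidate port and return the first one that succeeds, with a descriptive `BindException` when the range is exhausted.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

// Sketch of the getFreePortNum helper discussed above, rewritten as a for loop.
public final class FreePorts {
  private FreePorts() {}

  public static int getFreePortNum(int startPort, int numAttempts) throws IOException {
    for (int port = startPort; port < startPort + numAttempts; port++) {
      try (ServerSocket socket = new ServerSocket(port)) {
        return socket.getLocalPort();   // bind succeeded, so the port is free
      } catch (IOException e) {
        // port is in use; try the next one
      }
    }
    throw new BindException(String.format(
        "No free port found in range [%d, %d) after %d attempts",
        startPort, startPort + numAttempts, numAttempts));
  }
}
```

Binding a `ServerSocket` (and closing it immediately via try-with-resources) is the standard way to probe port availability; the error message carries the initial port and attempt count, matching the review feedback about more detailed exceptions.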
[GitHub] drill pull request #1142: DRILL-6198: OpenTSDB unit tests fail when Lilith c...
Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1142#discussion_r171617811 --- Diff: contrib/storage-opentsdb/src/test/java/org/apache/drill/store/openTSDB/TestOpenTSDBPlugin.java --- @@ -51,17 +54,17 @@ public class TestOpenTSDBPlugin extends PlanTestBase { - protected static OpenTSDBStoragePlugin storagePlugin; - protected static OpenTSDBStoragePluginConfig storagePluginConfig; + private static int portNum = 10_000; --- End diff -- Thanks, removed. ---
[GitHub] drill pull request #1142: DRILL-6198: OpenTSDB unit tests fail when Lilith c...
Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1142#discussion_r171618633 --- Diff: contrib/storage-opentsdb/src/test/java/org/apache/drill/store/openTSDB/TestOpenTSDBPlugin.java --- @@ -51,17 +54,17 @@ public class TestOpenTSDBPlugin extends PlanTestBase { - protected static OpenTSDBStoragePlugin storagePlugin; - protected static OpenTSDBStoragePluginConfig storagePluginConfig; + private static int portNum = 10_000; @Rule - public WireMockRule wireMockRule = new WireMockRule(1); + public WireMockRule wireMockRule = new WireMockRule(portNum); @BeforeClass public static void setup() throws Exception { +portNum = getFreePortNum(portNum, 1000); --- End diff -- Done. ---
[GitHub] drill pull request #1142: DRILL-6198: OpenTSDB unit tests fail when Lilith c...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1142#discussion_r171607424 --- Diff: contrib/storage-opentsdb/src/test/java/org/apache/drill/store/openTSDB/TestOpenTSDBPlugin.java --- @@ -185,4 +188,26 @@ public void testDescribe() throws Exception { test("describe `warp.speed.test`"); Assert.assertEquals(1, testSql("show tables")); } + + /** + * Checks that port with specified number is free and returns it. + * Otherwise, increases port number and checks until free port is found + * or the number of attempts is reached specified numAttempts + * + * @param portNum initial port number + * @param numAttempts max number of attempts to find port with greater number + * @return free port number + * @throws BindException if free port was not found and all attempts were used. + */ + private static int getFreePortNum(int portNum, int numAttempts) throws IOException { +while (numAttempts > 0) { --- End diff -- 1. Please re-write using for loop. 2. Please add more details to the exception, include initial port number, which ports were occupied. Suggest to check which ports are free etc. ---
[GitHub] drill pull request #1142: DRILL-6198: OpenTSDB unit tests fail when Lilith c...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1142#discussion_r171606555 --- Diff: contrib/storage-opentsdb/src/test/java/org/apache/drill/store/openTSDB/TestOpenTSDBPlugin.java --- @@ -51,17 +54,17 @@ public class TestOpenTSDBPlugin extends PlanTestBase { - protected static OpenTSDBStoragePlugin storagePlugin; - protected static OpenTSDBStoragePluginConfig storagePluginConfig; + private static int portNum = 10_000; --- End diff -- Why do you set value right away? It looks you will always re-write in `setup`. ---
[GitHub] drill pull request #1142: DRILL-6198: OpenTSDB unit tests fail when Lilith c...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1142#discussion_r171606840 --- Diff: contrib/storage-opentsdb/src/test/java/org/apache/drill/store/openTSDB/TestOpenTSDBPlugin.java --- @@ -51,17 +54,17 @@ public class TestOpenTSDBPlugin extends PlanTestBase { - protected static OpenTSDBStoragePlugin storagePlugin; - protected static OpenTSDBStoragePluginConfig storagePluginConfig; + private static int portNum = 10_000; @Rule - public WireMockRule wireMockRule = new WireMockRule(1); + public WireMockRule wireMockRule = new WireMockRule(portNum); @BeforeClass public static void setup() throws Exception { +portNum = getFreePortNum(portNum, 1000); --- End diff -- May be we can decrease number of attempt to 200? ---
[GitHub] drill pull request #1141: DRILL-6197: Skip duplicate entry for OperatorStats
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/1141#discussion_r171611927 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentStats.java --- @@ -79,4 +71,21 @@ public void addOperatorStats(OperatorStats stats) { operators.add(stats); } + //DRILL-6197 + public OperatorStats addOrReplaceOperatorStats(OperatorStats stats) { +//Remove existing stat +OperatorStats replacedStat = null; +int index = 0; +for (OperatorStats opStat : operators) { --- End diff -- I am worried about the small overheads of this linear search adding up for each operator, especially for queries with complex query plans. Stats collection should ideally impose minimal overhead. Do the operator stats have to be a list, or can we just use a Set? ---
[GitHub] drill issue #1138: DRILL-4120: Allow implicit columns for Avro storage forma...
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1138 As @arina-ielchiieva points out, this change backs out plan-time knowledge of schema. This may not affect run-time accuracy. However, it does mean that queries can be planned, based on not knowing types, that fail at runtime when types are learned. This seems more like a bug than a feature. In general, we should use all information available. It is not helpful to ignore information if doing so results in a poorer user experience. ---
[GitHub] drill pull request #1138: DRILL-4120: Allow implicit columns for Avro storag...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1138#discussion_r171606330 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroRecordReader.java --- @@ -154,6 +156,12 @@ public int next() { writer.setValueCount(recordCount); + // adds fields which don't exist in the table but should be present in the schema + if (recordCount > 0) { +JsonReaderUtils.ensureAtLeastOneField(writer, getColumns(), false, --- End diff -- In general, this is a bad idea, though existing code does this. If we find an empty file in one scanner, but a real file in another, we create an unnecessary schema change by making up a column. Jinfeng's changes last year are supposed to handle the "fast none" case of a reader with no rows. There should be no reason to add a dummy column. Old code that adds such a column should be fixed. IMHO, code that does not add dummy columns should not begin to do so. ---
[GitHub] drill pull request #1138: DRILL-4120: Allow implicit columns for Avro storag...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1138#discussion_r171607241 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroRecordReader.java --- @@ -295,7 +301,8 @@ private void processPrimitive(final Object value, final Schema.Type type, final writer.binary(fieldName).writeVarBinary(0, length, buffer); break; case NULL: -// Nothing to do for null type +// The default Drill behaviour is to create int column +writer.integer(fieldName); --- End diff -- This maps a NULL type to integer. Probably OK if we do this consistently. ---
[GitHub] drill pull request #1139: DRILL-6189: Security: passwords logging and file p...
Github user vladimirtkach commented on a diff in the pull request: https://github.com/apache/drill/pull/1139#discussion_r171588799 --- Diff: logical/src/main/java/org/apache/drill/common/config/LogicalPlanPersistence.java --- @@ -52,6 +53,7 @@ public LogicalPlanPersistence(DrillConfig conf, ScanResult scanResult) { mapper.configure(Feature.ALLOW_UNQUOTED_FIELD_NAMES, true); mapper.configure(JsonGenerator.Feature.QUOTE_FIELD_NAMES, true); mapper.configure(Feature.ALLOW_COMMENTS, true); +mapper.setFilterProvider(new SimpleFilterProvider().setFailOnUnknownId(false)); --- End diff -- Submitted the physical plan directly to a node; it was successfully deserialized. ---
[GitHub] drill pull request #1143: DRILL-1491: Support for JDK 8
GitHub user vladimirtkach opened a pull request: https://github.com/apache/drill/pull/1143 DRILL-1491: Support for JDK 8 Changed the JDK version from 7 to 8 in pom.xml, drill-config.sh and others. You can merge this pull request into a Git repository by running: $ git pull https://github.com/vladimirtkach/drill DRILL-1491 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1143.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1143 commit 0aeeacc9e528b6dea80385bbf53e7259a6813b08 Author: Vladimir Tkach Date: 2018-02-28T13:32:55Z DRILL-1491: Support for JDK 8 Changed the JDK version from 7 to 8 in pom.xml, travis and drill-config.sh ---
[GitHub] drill issue #1139: DRILL-6189: Security: passwords logging and file permisio...
Github user vladimirtkach commented on the issue: https://github.com/apache/drill/pull/1139 @arina-ielchiieva made changes, please take a look ---
[GitHub] drill pull request #1139: DRILL-6189: Security: passwords logging and file p...
Github user vladimirtkach commented on a diff in the pull request: https://github.com/apache/drill/pull/1139#discussion_r171579096 --- Diff: contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcStorageConfig.java --- @@ -17,13 +17,15 @@ */ package org.apache.drill.exec.store.jdbc; +import com.fasterxml.jackson.annotation.JsonFilter; import org.apache.drill.common.logical.StoragePluginConfig; import com.fasterxml.jackson.annotation.JsonCreator; import com.fasterxml.jackson.annotation.JsonProperty; import com.fasterxml.jackson.annotation.JsonTypeName; @JsonTypeName(JdbcStorageConfig.NAME) +@JsonFilter("passwordFilter") --- End diff -- To apply the filter: 1) Mark the entity you want to filter fields out of with `@JsonFilter`. 2) Create a filter provider and register a property filter under the same filter id. 3) Pass the filter provider when creating the ObjectWriter. ---
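The three steps above can be sketched as follows, assuming Jackson 2.x. The `"passwordFilter"` id mirrors the one in the diff, but `Config` and its fields are made-up stand-ins for `JdbcStorageConfig`, not the real class:

```java
import com.fasterxml.jackson.annotation.JsonFilter;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ser.impl.SimpleBeanPropertyFilter;
import com.fasterxml.jackson.databind.ser.impl.SimpleFilterProvider;

// Step 1: mark the entity whose fields should be filtered.
@JsonFilter("passwordFilter")
class Config {
  public String url = "jdbc:mysql://host:3306/db"; // hypothetical values
  public String password = "s3cret";
}

public class PasswordFilterDemo {
  public static void main(String[] args) throws Exception {
    // Step 2: register a property filter under the same filter id.
    SimpleFilterProvider filters = new SimpleFilterProvider()
        .addFilter("passwordFilter", SimpleBeanPropertyFilter.serializeAllExcept("password"));
    // Step 3: pass the filter provider when creating the writer.
    String json = new ObjectMapper().writer(filters).writeValueAsString(new Config());
    System.out.println(json); // the password field is omitted from the output
  }
}
```

Note that a mapper serializing an entity annotated with `@JsonFilter` will fail on an unknown filter id unless a matching filter (or `setFailOnUnknownId(false)`, as in the `LogicalPlanPersistence` diff above) is configured.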
[jira] [Created] (DRILL-6200) ERROR Quering hive through HiverServer2 via JDBC!
Hannibal07 created DRILL-6200: - Summary: ERROR Quering hive through HiverServer2 via JDBC! Key: DRILL-6200 URL: https://issues.apache.org/jira/browse/DRILL-6200 Project: Apache Drill Issue Type: Bug Components: Client - JDBC Affects Versions: 1.12.0 Reporter: Hannibal07 ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query: select * from dim_parameter [30034]Query execution error. Details:[ DATA_READ ERROR: The JDBC storage plugin failed while trying setup the SQL query. sql SELECT * FROM.dw.dim_parameter plugin hive Fragment 0:0 [Error Id: e522f220-b857-4273-af0a-2a2d05d992f2 on 172.28.32.7:31010] (org.apache.hive.service.cli.HiveSQLException) Error while compiling statement: FAILED: ParseException line 2:4 cannot recognize input near '.' 'dw' '.' in join source org.apache.hive.jdbc.Utils.verifySuccess():267 org.apache.hive.jdbc.Utils.verifySuccessWithInfo():253 org.apache.hive.jdbc.HiveStatement.runAsyncOnServer():309 org.apache.hive.jdbc.HiveStatement.execute():250 org.apache.hive.jdbc.HiveStatement.executeQuery():434 org.apache.commons.dbcp.DelegatingStatement.executeQuery():208 org.apache.commons.dbcp.DelegatingStatement.executeQuery():208 org.apache.drill.exec.store.jdbc.JdbcRecordReader.setup():177 org.apache.drill.exec.p... at System.Data.Odbc.OdbcConnection.HandleError(OdbcHandle hrHandle, RetCode retcode) at System.Data.Odbc.OdbcCommand.ExecuteReaderObject(CommandBehavior behavior, String method, Boolean needReader, Object[] methodArguments, SQL_API odbcApiMethod) at System.Data.Odbc.OdbcCommand.ExecuteReaderObject(CommandBehavior behavior, String method, Boolean needReader) at System.Data.Odbc.OdbcCommand.ExecuteReader(CommandBehavior behavior) at DrillExplorer.DROdbcProvider.GetStatmentColumns(String in_query) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: [DISCUSS] 1.13.0 release
Good show. Updated list:

DRILL-6185: Error is displaying while accessing query profiles via the Web-UI -- Ready to commit
DRILL-6174: Parquet pushdown planning improvements -- Ready to commit
DRILL-6191: Need more information on TCP flags -- Ready to commit
DRILL-6190: Packets can be bigger than strictly legal -- Ready to commit
DRILL-6188: Fix C++ client build on Centos 7 and OS X -- Ready to commit
DRILL-1491: Support for JDK 8 --* In progress.*
DRILL-1170: YARN support for Drill -- Needs Committer +1 and Travis fix.
DRILL-6027: Implement spill to disk for the Hash Join --- No PR and is a major feature that should be reviewed (properly!).
DRILL-6173: Support transitive closure during filter push down and partition pruning. -- No PR and depends on 3 Apache Calcite issues that are open.
DRILL-6023: Graceful shutdown improvements -- No PR. Consists of 6 sub-JIRAs, none of which have PRs.
Re: [DISCUSS] 1.13.0 release
DRILL-6190 and DRILL-6191 are ready to merge to master for release. Code review and unit tests all pass. On Wed, Feb 28, 2018 at 11:16 PM, Parth Chandra wrote: > Moved Ted's PR's down in the list. Let's see where we are at the end of the > week. > Arina, Volodymyr, any ETA on JDK 8 work? It's the gating factor for the > release. > Meanwhile, people, feel free to commit your work as usual. > > Updated list: > > DRILL-6185: Error is displaying while accessing query profiles via the > Web-UI -- Ready to commit > DRILL-6174: Parquet pushdown planning improvements -- Ready to commit > DRILL-6188: Fix C++ client build on Centos 7 and OS X -- Ready to commit > > DRILL-1491: Support for JDK 8 --* In progress.* > > DRILL-6191: Need more information on TCP flags -- *In progress* > > DRILL-6190: Packets can be bigger than strictly legal -- *In progress* > > DRILL-1170: YARN support for Drill -- Needs Committer +1 and Travis fix. > > DRILL-6027: Implement spill to disk for the Hash Join --- No PR and is a > major feature that should be reviewed (properly!). > > DRILL-6173: Support transitive closure during filter push down and > partition pruning. -- No PR and depends on 3 Apache Calcite issues that > are open. > > DRILL-6023: Graceful shutdown improvements -- No PR. Consists of 6 sub > JIra's none of which have PRs. > > On Wed, Feb 28, 2018 at 5:45 PM, Ted Dunning > wrote: > > > 6190 and/or 6191 cause test failures that I have been unable to spend > time > > on yet. I don't think that they are ready to commit. > > > > At least one of these is likely to be something very simple like a test > > that didn't clean up after itself. The other should be as simple, but I > > can't understand it yet. It may be a memory pressure thing rather than a > > real problem with the test. > > > > > > On Wed, Feb 28, 2018 at 3:18 AM, Parth Chandra > wrote: > > > > > OK. So let's try to get as many of the following as we can without > > breaking > > > anything. 
As far as I can see none of the open items below are show > > > stoppers for a release, but I'm happy to give in to popular demand for > > JDK > > > 8 :). > > > > > > Note that the last three appear to be big ticket items that have no PR > > yet. > > > Usually, it is a mistake to rush these into a release (one advantage of > > > frequent, predictable releases is that they won't have to wait too long > > for > > > the next release). > > > > > > Here's what I'm tracking : > > > > > > DRILL-6185: Error is displaying while accessing query profiles via the > > > Web-UI -- Ready to commit > > > DRILL-6174: Parquet pushdown planning improvements -- Ready to commit > > > DRILL-6191: Need more information on TCP flags -- Ready to commit > > > DRILL-6190: Packets can be bigger than strictly legal -- Ready to > commit > > > > > > DRILL-6188: Fix C++ client build on Centos 7 and OS X -- Needs > > committer > > > +1 > > > > > > DRILL-1491: Support for JDK 8 --* In progress.* > > > > > > DRILL-1170: YARN support for Drill -- Needs Committer +1 and Travis > fix. > > > > > > DRILL-6027: Implement spill to disk for the Hash Join --- No PR and > is > > a > > > major feature that should be reviewed (properly!). > > > > > > DRILL-6173: Support transitive closure during filter push down and > > > partition pruning. -- No PR and depends on 3 Apache Calcite issues > that > > > are open. > > > > > > DRILL-6023: Graceful shutdown improvements -- No PR. Consists of 6 sub > > > JIra's none of which have PRs. > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Feb 28, 2018 at 12:32 AM, Ted Dunning > > > wrote: > > > > > > > I have two very small improvements to PCAP support with DRILL-6190 > and > > > > DRILL-6191 that I would like to get in. > > > > > > > > I think that PCAP-NG support is too far from ready. 
> > > > > > > > > > > > > > > > On Tue, Feb 27, 2018 at 10:52 AM, Pritesh Maker > > wrote: > > > > > > > > > I see a few more issues that are in review and worth including for > > the > > > > > 1.13 release (maybe give another week to resolve this before the > 1st > > RC > > > > is > > > > > created?) > > > > > > > > > > DRILL-6027 Implement spill to disk for the Hash Join -- Boaz and > Tim > > > > > DRILL-6173 Support transitive closure during filter push down and > > > > > partition pruning - Vitalii > > > > > DRILL-6023 Graceful shutdown improvements -- Jyothsna > > > > > > > > > > There are several other bugs/ improvements that are marked in > > progress > > > - > > > > > https://issues.apache.org/jira/secure/Dashboard.jspa? > > > > selectPageId=12332152 > > > > > - if folks are not working on them, we should remove the fixVersion > > for > > > > > 1.13. > > > > > > > > > > Pritesh > > > > > > > > > > > > > > > -Original Message- > > > > > From: Abhishek Girish > > > > > Sent: February 27, 2018 10:44 AM > > > > > To:
[GitHub] drill pull request #1142: DRILL-6198: OpenTSDB unit tests fail when Lilith c...
GitHub user vvysotskyi opened a pull request: https://github.com/apache/drill/pull/1142 DRILL-6198: OpenTSDB unit tests fail when Lilith client is run Added a method which checks whether the default port 10_000 is free; otherwise it increments the port number and checks until a free port is found. You can merge this pull request into a Git repository by running: $ git pull https://github.com/vvysotskyi/drill DRILL-6198 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1142.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1142 commit 671d3e0d900c19f561cb3d0f744898c0f9bf20e9 Author: Volodymyr Vysotskyi Date: 2018-03-01T12:52:28Z DRILL-6198: OpenTSDB unit tests fail when Lilith client is run ---
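A sketch of the try-bind approach the PR describes, assuming the method shape seen in the test diff (`getFreePortNum(port, attempts)`); the actual implementation in the PR may differ:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class FreePortFinder {
  /**
   * Starting from preferredPort, returns the first port that can be bound,
   * giving up after maxAttempts tries. Each probe attempts to bind a
   * ServerSocket; a BindException means the port is taken, so we move on.
   */
  public static int getFreePortNum(int preferredPort, int maxAttempts) {
    int limit = Math.min(preferredPort + maxAttempts, 65536); // stay in the valid port range
    for (int port = preferredPort; port < limit; port++) {
      try (ServerSocket socket = new ServerSocket(port)) {
        return socket.getLocalPort();
      } catch (IOException e) {
        // port is in use, try the next one
      }
    }
    throw new IllegalStateException("No free port found after " + maxAttempts + " attempts");
  }

  public static void main(String[] args) throws IOException {
    // Occupy a port, then confirm the finder skips past it.
    try (ServerSocket busy = new ServerSocket(0)) {
      int taken = busy.getLocalPort();
      int free = getFreePortNum(taken, 1000);
      System.out.println(free != taken); // the busy port was skipped
    }
  }
}
```

Note the inherent race: a port reported free can be grabbed by another process before the test actually binds it, which is why the probe-then-use pattern only reduces, not eliminates, the `Address already in use` failures.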
[GitHub] drill pull request #1138: DRILL-4120: Allow implicit columns for Avro storag...
Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1138#discussion_r171544478 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroFormatTest.java --- @@ -170,25 +169,35 @@ public void testSimplePrimitiveSchema_SelectColumnSubset() throws Exception { @Test public void testSimplePrimitiveSchema_NoColumnsExistInTheSchema() throws Exception { -final String file = generateSimplePrimitiveSchema_NoNullValues().getFileName(); -try { - test("select h_dummy1, e_dummy2 from dfs.`%s`", file); - Assert.fail("Test should fail as h_dummy1 and e_dummy2 does not exist."); -} catch(UserException ue) { - Assert.assertTrue("Test should fail as h_dummy1 and e_dummy2 does not exist.", - ue.getMessage().contains("Column 'h_dummy1' not found in any table")); -} +final String file = generateSimplePrimitiveSchema_NoNullValues(1).getFileName(); +testBuilder() + .sqlQuery("select h_dummy1, e_dummy2 from dfs.`%s`", file) + .unOrdered() + .baselineColumns("h_dummy1", "e_dummy2") + .baselineValues(null, null) + .go(); } @Test public void testSimplePrimitiveSchema_OneExistAndOneDoesNotExistInTheSchema() throws Exception { -final String file = generateSimplePrimitiveSchema_NoNullValues().getFileName(); -try { - test("select h_boolean, e_dummy2 from dfs.`%s`", file); - Assert.fail("Test should fail as e_dummy2 does not exist."); -} catch(UserException ue) { - Assert.assertTrue("Test should fail as e_dummy2 does not exist.", true); -} +final String file = generateSimplePrimitiveSchema_NoNullValues(1).getFileName(); +testBuilder() + .sqlQuery("select h_boolean, e_dummy2 from dfs.`%s`", file) + .unOrdered() + .baselineColumns("h_boolean", "e_dummy2") + .baselineValues(true, null) + .go(); + } + + @Test + public void testImplicitColumnFilename() throws Exception { +final String file = generateSimplePrimitiveSchema_NoNullValues(1).getFileName(); +testBuilder() + .sqlQuery("select filename from dfs.`%s`", file) --- End diff -- 
Thanks for pointing this out; modified the existing test to check, besides `filename`, also the `suffix`, `fqn` and `filepath` implicit columns. Added a separate test for a partition column. ---
[GitHub] drill pull request #1138: DRILL-4120: Allow implicit columns for Avro storag...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1138#discussion_r171517376 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroFormatTest.java --- @@ -170,25 +169,35 @@ public void testSimplePrimitiveSchema_SelectColumnSubset() throws Exception { @Test public void testSimplePrimitiveSchema_NoColumnsExistInTheSchema() throws Exception { -final String file = generateSimplePrimitiveSchema_NoNullValues().getFileName(); -try { - test("select h_dummy1, e_dummy2 from dfs.`%s`", file); - Assert.fail("Test should fail as h_dummy1 and e_dummy2 does not exist."); -} catch(UserException ue) { - Assert.assertTrue("Test should fail as h_dummy1 and e_dummy2 does not exist.", - ue.getMessage().contains("Column 'h_dummy1' not found in any table")); -} +final String file = generateSimplePrimitiveSchema_NoNullValues(1).getFileName(); +testBuilder() + .sqlQuery("select h_dummy1, e_dummy2 from dfs.`%s`", file) + .unOrdered() + .baselineColumns("h_dummy1", "e_dummy2") + .baselineValues(null, null) + .go(); } @Test public void testSimplePrimitiveSchema_OneExistAndOneDoesNotExistInTheSchema() throws Exception { -final String file = generateSimplePrimitiveSchema_NoNullValues().getFileName(); -try { - test("select h_boolean, e_dummy2 from dfs.`%s`", file); - Assert.fail("Test should fail as e_dummy2 does not exist."); -} catch(UserException ue) { - Assert.assertTrue("Test should fail as e_dummy2 does not exist.", true); -} +final String file = generateSimplePrimitiveSchema_NoNullValues(1).getFileName(); +testBuilder() + .sqlQuery("select h_boolean, e_dummy2 from dfs.`%s`", file) + .unOrdered() + .baselineColumns("h_boolean", "e_dummy2") + .baselineValues(true, null) + .go(); + } + + @Test + public void testImplicitColumnFilename() throws Exception { +final String file = generateSimplePrimitiveSchema_NoNullValues(1).getFileName(); +testBuilder() + .sqlQuery("select filename from dfs.`%s`", file) --- End diff 
-- Please test all implicit columns and at least one partition column. ---
Re: Avro storage format behaviour
As Paul has mentioned in PR [1], when we move to the new scan framework it will handle implicit columns for all file readers. Until then, let's treat Avro like other file formats (for example, Parquet) so users can benefit from implicit columns for this format as well. [1] https://github.com/apache/drill/pull/1138 On Wed, Feb 28, 2018 at 7:47 PM, Vova Vysotskyi wrote: > Hi all, > > I am working on DRILL-4120: dir0 does not work when the directory structure > contains Avro files. > > In DRILL-3810, validation of the query using the Avro schema was added before > query execution starts. > Therefore with these changes Drill throws an exception when the > query contains a non-existent column and the table has Avro format. > Other storage formats such as JSON or Parquet allow usage of non-existent > fields. > > So here is my question: should we continue to treat Avro as a format with > a fixed schema, or should we start treating Avro as a dynamic format to be > consistent with other storage formats? > > -- > Kind regards, > Volodymyr Vysotskyi >
[GitHub] drill issue #1137: DRILL-6185: Fixed error while displaying system profiles ...
Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1137 I meant that we parse the text plan, which obviously was generated from some object. In the future we may consider creating a special plan object from the initial one, with a structure suitable for the Web UI, so we won't need to parse the plan string... ---
[GitHub] drill pull request #1139: DRILL-6189: Security: passwords logging and file p...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1139#discussion_r171511391 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserServer.java --- @@ -91,6 +91,34 @@ static { userConnectionMap = new ConcurrentHashMap<>(); } + public static String safeLogString(UserToBitHandshake inbound) { +StringBuilder sb = new StringBuilder(); +sb.append("rpc_version: "); +sb.append(inbound.getRpcVersion()); +sb.append("\ncredentials:\n\t"); +sb.append(inbound.getCredentials()); +sb.append("properties:"); +java.util.List props = inbound.getProperties().getPropertiesList(); +for (Property p: props){ + if(!p.getKey().equalsIgnoreCase("password")) { --- End diff -- Please add the missing spaces... ---
[GitHub] drill pull request #1139: DRILL-6189: Security: passwords logging and file p...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1139#discussion_r171512422 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java --- @@ -158,7 +162,9 @@ protected void logAndSetTextPlan(final String description, final Prel prel, fina protected void log(final String name, final PhysicalPlan plan, final Logger logger) throws JsonProcessingException { if (logger.isDebugEnabled()) { - String planText = plan.unparse(context.getLpPersistence().getMapper().writer()); + PropertyFilter theFilter = new SimpleBeanPropertyFilter.SerializeExceptFilter(Sets.newHashSet("password")); --- End diff -- Please rename to `filter`. ---
[GitHub] drill pull request #1139: DRILL-6189: Security: passwords logging and file p...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1139#discussion_r171510779 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserServer.java --- @@ -91,6 +91,34 @@ static { userConnectionMap = new ConcurrentHashMap<>(); } + public static String safeLogString(UserToBitHandshake inbound) { --- End diff -- 1. Please remove one space -> `static String`. 2. Can this method be just private instead of public static? If yes, please move it to the end of the class. 3. Please add javadoc to the method. 4. Please consider renaming the method to depict the actual work it does. ---
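A sketch of how the method might look after the reviewer's suggestions (private, documented, descriptively named). The plain `Map` is a stand-in for the protobuf `UserToBitHandshake`, the method name is hypothetical, and masking the password with `***` is one variation on skipping the property entirely:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HandshakeLogging {
  /**
   * Builds a log-safe representation of a handshake, masking the password
   * property so credentials never reach the logs. Simplified: real code
   * would take the UserToBitHandshake protobuf instead of raw values.
   */
  private static String maskedHandshakeToString(int rpcVersion, Map<String, String> properties) {
    StringBuilder sb = new StringBuilder();
    sb.append("rpc_version: ").append(rpcVersion).append("\nproperties:");
    for (Map.Entry<String, String> p : properties.entrySet()) {
      // Mask the sensitive value rather than printing it.
      String value = "password".equalsIgnoreCase(p.getKey()) ? "***" : p.getValue();
      sb.append("\n\t").append(p.getKey()).append(" = ").append(value);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("user", "alice");
    props.put("password", "s3cret");
    String logLine = maskedHandshakeToString(5, props);
    System.out.println(logLine.contains("s3cret")); // password is masked
    System.out.println(logLine.contains("alice"));  // other properties survive
  }
}
```

Per the later review comment, the caller would also guard the call with `logger.isTraceEnabled()` so the string is only built when trace logging is on.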
[GitHub] drill pull request #1139: DRILL-6189: Security: passwords logging and file p...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1139#discussion_r171512007 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserServer.java --- @@ -320,7 +348,7 @@ protected void consumeHandshake(ChannelHandlerContext ctx, UserToBitHandshake in @Override public BitToUserHandshake getHandshakeResponse(UserToBitHandshake inbound) throws Exception { -logger.trace("Handling handshake from user to bit. {}", inbound); +logger.trace("Handling handshake from user to bit. {}", safeLogString(inbound)); --- End diff -- Should we add `if (logger.isTraceEnabled()) {`? so `safeLogString` will be called only when we do need it for trace? ---
[GitHub] drill pull request #1139: DRILL-6189: Security: passwords logging and file p...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1139#discussion_r171511274 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserServer.java --- @@ -91,6 +91,34 @@ static { userConnectionMap = new ConcurrentHashMap<>(); } + public static String safeLogString(UserToBitHandshake inbound) { +StringBuilder sb = new StringBuilder(); +sb.append("rpc_version: "); +sb.append(inbound.getRpcVersion()); +sb.append("\ncredentials:\n\t"); +sb.append(inbound.getCredentials()); +sb.append("properties:"); +java.util.List props = inbound.getProperties().getPropertiesList(); --- End diff -- Why do you use the fully qualified name instead of an import? ---
[jira] [Created] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries
Anton Gozhiy created DRILL-6199: --- Summary: Filter push down doesn't work with more than one nested subqueries Key: DRILL-6199 URL: https://issues.apache.org/jira/browse/DRILL-6199 Project: Apache Drill Issue Type: Bug Affects Versions: 1.13.0 Reporter: Anton Gozhiy Attachments: DRILL_6118_data_source.csv *Data set:* The data is generated using the attached file: *DRILL_6118_data_source.csv* Data gen commands: {code:sql} create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d1` (c1, c2, c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` where columns[0] in (1, 3); create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d2` (c1, c2, c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` where columns[0]=2; create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d3` (c1, c2, c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` where columns[0]>3; {code} *Steps:* # Execute the following query: {code:sql} explain plan for select * from (select * from (select * from dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where c1<3 {code} *Expected result:* numFiles=2, numRowGroups=2, only files from the folders d1 and d2 should be scanned. *Actual result:* Filter push down doesn't work: numFiles=3, numRowGroups=3, scanning from all files -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6198) OpenTSDB unit tests fail when Lilith client is run
Volodymyr Vysotskyi created DRILL-6198: -- Summary: OpenTSDB unit tests fail when Lilith client is run Key: DRILL-6198 URL: https://issues.apache.org/jira/browse/DRILL-6198 Project: Apache Drill Issue Type: Bug Components: Tools, Build Test Reporter: Volodymyr Vysotskyi When OpenTSDB unit tests are running on the same machine where Lilith client is run, unit tests fail with the error: {noformat} testDescribe(org.apache.drill.store.openTSDB.TestOpenTSDBPlugin) Time elapsed: 0.01 sec <<< ERROR! com.github.tomakehurst.wiremock.common.FatalStartupException: java.lang.RuntimeException: java.net.BindException: Address already in use at com.github.tomakehurst.wiremock.WireMockServer.start(WireMockServer.java:145) at com.github.tomakehurst.wiremock.junit.WireMockRule$1.evaluate(WireMockRule.java:68) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) Caused by: java.lang.RuntimeException: java.net.BindException: Address already in use at com.github.tomakehurst.wiremock.jetty9.JettyHttpServer.start(JettyHttpServer.java:132) at com.github.tomakehurst.wiremock.WireMockServer.start(WireMockServer.java:143) at com.github.tomakehurst.wiremock.junit.WireMockRule$1.evaluate(WireMockRule.java:68) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) Caused by: java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:433) at sun.nio.ch.Net.bind(Net.java:425) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) at wiremock.org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:321) at wiremock.org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80) at wiremock.org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:236) at 
wiremock.org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) at wiremock.org.eclipse.jetty.server.Server.doStart(Server.java:366) at wiremock.org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) at com.github.tomakehurst.wiremock.jetty9.JettyHttpServer.start(JettyHttpServer.java:130) at com.github.tomakehurst.wiremock.WireMockServer.start(WireMockServer.java:143) at com.github.tomakehurst.wiremock.junit.WireMockRule$1.evaluate(WireMockRule.java:68) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) {noformat} This failure appears because Lilith uses the same port 1 as the port specified in {{TestOpenTSDBPlugin.wireMockRule}} and {{bootstrap-storage-plugins.json}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] drill issue #1140: DRILL-6195: Quering Hive non-partitioned transactional ta...
Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1140 +1 ---
[GitHub] drill pull request #1141: DRILL-6197: Skip duplicate entry for OperatorStats
Github user kkhatua commented on a diff in the pull request: https://github.com/apache/drill/pull/1141#discussion_r171485412 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentStats.java --- @@ -31,6 +32,13 @@ public class FragmentStats { // private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(FragmentStats.class); + //Skip operators that already have stats reported by org.apache.drill.exec.physical.impl.BaseRootExec + private static final List operatorStatsInitToSkip = Lists.newArrayList( --- End diff -- The add-on commit refactors by having the BaseRootExec constructor handle the substitution without risking going out of sync with other senders extending BaseRootExec. ---