[GitHub] drill issue #1141: DRILL-6197: Skip duplicate entry for OperatorStats

2018-03-01 Thread amansinha100
Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/1141
  
+1.  


---


[GitHub] drill pull request #1141: DRILL-6197: Skip duplicate entry for OperatorStats

2018-03-01 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1141#discussion_r171767495
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentStats.java ---
@@ -79,4 +71,21 @@ public void addOperatorStats(OperatorStats stats) {
 operators.add(stats);
   }
 
+  //DRILL-6197
+  public OperatorStats addOrReplaceOperatorStats(OperatorStats stats) {
+//Remove existing stat
+OperatorStats replacedStat = null;
+int index = 0;
+for (OperatorStats opStat : operators) {
--- End diff --

Everything worked fine. I tried a join on the TPCH tables `lineitem` and 
`orders`, and confirmed there are no more duplicates for SCREEN, SINGLE_SENDER and 
HASH_PARTITION_SENDER. Substituting the smaller `supplier` table for `orders`, I 
confirmed that BROADCAST_SENDER also had no duplicates.


---


[GitHub] drill pull request #1141: DRILL-6197: Skip duplicate entry for OperatorStats

2018-03-01 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1141#discussion_r171753536
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentStats.java ---
@@ -79,4 +71,21 @@ public void addOperatorStats(OperatorStats stats) {
 operators.add(stats);
   }
 
+  //DRILL-6197
+  public OperatorStats addOrReplaceOperatorStats(OperatorStats stats) {
+//Remove existing stat
+OperatorStats replacedStat = null;
+int index = 0;
+for (OperatorStats opStat : operators) {
--- End diff --

LGTM.  Hopefully it did not break existing stuff, so I will wait for your 
confirmation.


---


[GitHub] drill issue #1105: DRILL-6125: Fix possible memory leak when query is cancel...

2018-03-01 Thread ilooner
Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1105
  
@arina-ielchiieva @vrozov I believe I have a solution. There were several 
issues with the original code.

1. It made incorrect assumptions about how cache invalidation works with 
Java **synchronized**.
2. It assumed **innerNext** and **close** would be called sequentially.

I believe this PR fixes these issues now and I have gone into more detail 
about the problems below.

# 1. Incorrect Cache Invalidation Assumptions

The original code tried to be smart by reducing synchronization overhead in 
**innerNext**: since **innerNext** is called often, the code in it did not 
synchronize before changing the partitioner object. The code in **close()** and 
**receivingFragmentFinished()** synchronized before accessing the partitioner, 
with the intention that this would trigger an update of the partitioner 
variable's state across all threads. Unfortunately, this assumption is invalid (see 
https://stackoverflow.com/questions/22706739/does-synchronized-guarantee-a-thread-will-see-the-latest-value-of-a-non-volatile).
Every thread must synchronize around each access to a shared variable in order 
to properly invalidate cached data on a core.

For example, if **Thread A** modifies **Variable 1** without synchronizing, and 
**Thread B** then synchronizes before reading **Variable 1**, there is no guarantee 
**Thread B** will see the most recent value of **Variable 1**, because the 
unsynchronized write never established a happens-before relationship with the read.

## Solution

In summary, the right thing to do is the simple thing: make the methods 
synchronized. Unfortunately, there is no way to outsmart the system and reduce 
synchronization overhead without introducing race conditions.
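
A minimal sketch of the rule (a hypothetical holder class, not the actual Drill code):

```java
class PartitionerHolder {
  private Object partitioner; // deliberately not volatile

  // Broken pattern: the hot path writes without synchronizing, so a reader
  // that synchronizes may still see a stale value, because the unsynchronized
  // write establishes no happens-before edge with the read.
  void unsafeSet(Object p) { partitioner = p; }

  // Fixed pattern: every access synchronizes on the same monitor, so each
  // monitor release/acquire orders a write before all subsequent reads.
  synchronized void set(Object p) { partitioner = p; }
  synchronized Object get() { return partitioner; }
}
```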

# 2. Concurrent InnerNext and Close Calls

The original code did not consider the case where **innerNext** was in the 
middle of execution when **close** was called. It did try to handle the case where 
**innerNext** could be called after **close** by setting the **ok** variable, 
but even that was done incorrectly because there was no synchronization around 
the **ok** variable.

## Solution

The right thing to do is the simple thing: make the methods synchronized, so 
**close** has to wait until **innerNext** is done before executing. Also, when a 
query is cancelled, the thread running **innerNext** should be interrupted in 
case it is blocked on a call.
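
In sketch form (class and method names are illustrative, not the actual operator code):

```java
class SenderRootExec {
  private volatile Thread runningThread; // thread currently inside innerNext()

  public synchronized boolean innerNext() {
    runningThread = Thread.currentThread();
    try {
      // ... partition and send batches; may block on a network call ...
      return true;
    } finally {
      runningThread = null;
    }
  }

  // Being synchronized, close() cannot start until innerNext() has released
  // the monitor, so the two no longer interleave.
  public synchronized void close() {
    // ... release the partitioner and buffers ...
  }

  // Called from the cancellation path on another thread: interrupt a blocked
  // innerNext() so it exits and releases the monitor, letting the
  // synchronized close() proceed.
  public void cancel() {
    Thread t = runningThread;
    if (t != null) {
      t.interrupt();
    }
  }
}
```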


---


[GitHub] drill pull request #1141: DRILL-6197: Skip duplicate entry for OperatorStats

2018-03-01 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1141#discussion_r171748860
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentStats.java ---
@@ -79,4 +71,21 @@ public void addOperatorStats(OperatorStats stats) {
 operators.add(stats);
   }
 
+  //DRILL-6197
+  public OperatorStats addOrReplaceOperatorStats(OperatorStats stats) {
+//Remove existing stat
+OperatorStats replacedStat = null;
+int index = 0;
+for (OperatorStats opStat : operators) {
--- End diff --

I added a new commit, but I haven't tested it for performance. Can you take 
a look, @amansinha100 ?


---


[GitHub] drill issue #1145: DRILL-6187: Exception in RPC communication between DataCl...

2018-03-01 Thread sohami
Github user sohami commented on the issue:

https://github.com/apache/drill/pull/1145
  
@vrozov - Please help review this PR.
It addresses the concurrency issue during authentication of the control/data 
client to the server side. Rather than adding the connection to the connection 
holder right after the TCP connection is available, the listener for connection 
success is invoked only after successful authentication (if needed).
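
Schematically, the change looks like this (stand-in types only, not the real Drill RPC classes):

```java
interface Connection {}

interface ConnectionListener {
  void connectionSucceeded(Connection connection);
  void connectionFailed(Throwable cause);
}

abstract class ConnectionHandler {
  protected boolean authEnabled;

  // Before: the listener fired (and the connection was registered) as soon
  // as the TCP channel was up, so a not-yet-authenticated connection could
  // be handed out concurrently. After: with security on, the success
  // callback is deferred until authentication completes.
  void onTcpConnected(Connection connection, ConnectionListener listener) {
    if (!authEnabled) {
      listener.connectionSucceeded(connection);
      return;
    }
    authenticate(connection, listener); // fires the listener from the auth outcome
  }

  abstract void authenticate(Connection connection, ConnectionListener listener);
}
```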


---


[GitHub] drill pull request #1145: DRILL-6187: Exception in RPC communication between...

2018-03-01 Thread sohami
GitHub user sohami opened a pull request:

https://github.com/apache/drill/pull/1145

DRILL-6187: Exception in RPC communication between DataClient/Control…

…Client and respective servers when bit-to-bit security is on

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sohami/drill DRILL-6187-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1145.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1145


commit 4a7602b428ef4ef9fe358976713a78174bb82f57
Author: Sorabh Hamirwasia 
Date:   2018-03-01T23:08:10Z

DRILL-6187: Exception in RPC communication between DataClient/ControlClient 
and respective servers when bit-to-bit security is on




---


[GitHub] drill pull request #1141: DRILL-6197: Skip duplicate entry for OperatorStats

2018-03-01 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1141#discussion_r171740245
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentStats.java ---
@@ -79,4 +71,21 @@ public void addOperatorStats(OperatorStats stats) {
 operators.add(stats);
   }
 
+  //DRILL-6197
+  public OperatorStats addOrReplaceOperatorStats(OperatorStats stats) {
+//Remove existing stat
+OperatorStats replacedStat = null;
+int index = 0;
+for (OperatorStats opStat : operators) {
--- End diff --

I see your point. Also, digging into the code shows I can substitute with a 
LinkedHashMap, since the list is only referenced here for consumption of its 
contents:

https://github.com/kkhatua/drill/blob/65efe3ea0c5777490488d3d56cbdb0cb011b9f33/exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentStats.java#L45

I can't use a Set, because I need the Stats object hashed on the operator 
ID & Type, and not the rest of the contents. I'll refactor and try to confirm 
nothing else breaks.
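
Roughly like this (the key accessors are assumptions and may differ from the actual `OperatorStats` API):

```java
// Requires java.util.LinkedHashMap and java.util.Map.
private final Map<String, OperatorStats> operators = new LinkedHashMap<>();

public OperatorStats addOrReplaceOperatorStats(OperatorStats stats) {
  // Key on (operator id, operator type) only, so a re-registered operator
  // replaces its earlier entry while iteration still preserves insertion
  // order for the JSON serialization.
  String key = stats.getOperatorId() + ":" + stats.getOperatorType();
  return operators.put(key, stats); // the replaced entry, or null if none
}
```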


---


[GitHub] drill pull request #1135: DRILL-6040: Added usage for graceful_stop in drill...

2018-03-01 Thread priteshm
Github user priteshm commented on a diff in the pull request:

https://github.com/apache/drill/pull/1135#discussion_r171731905
  
--- Diff: distribution/src/resources/drillbit.sh ---
@@ -45,7 +45,7 @@
 # configuration file. The option takes precedence over the
 # DRILL_CONF_DIR environment variable.
 #
-# The command is one of: start|stop|status|restart|run
+# The command is one of: start|stop|status|restart|run|graceful_stop
--- End diff --

not sure if this is critical, but other options to consider are "finish" or 
"drain". 


---


[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN

2018-03-01 Thread kr-arjun
Github user kr-arjun commented on the issue:

https://github.com/apache/drill/pull/1011
  
@paul-rogers  
Currently, the client exception is output as 
'ClientContext.err.println(e.getMessage())' in DrillOnYarn.java. For most 
application master launcher failures, the only message available is 'Failed to 
start Drill application master'. Do you think it would help troubleshooting of 
Drill-on-YARN client failures if the exception stack trace could be logged? 



---


[GitHub] drill pull request #1141: DRILL-6197: Skip duplicate entry for OperatorStats

2018-03-01 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1141#discussion_r171723902
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentStats.java ---
@@ -79,4 +71,21 @@ public void addOperatorStats(OperatorStats stats) {
 operators.add(stats);
   }
 
+  //DRILL-6197
+  public OperatorStats addOrReplaceOperatorStats(OperatorStats stats) {
+//Remove existing stat
+OperatorStats replacedStat = null;
+int index = 0;
+for (OperatorStats opStat : operators) {
--- End diff --

Some TPC-DS queries have a fairly long list of operators within a fragment, 
and in general it would be preferable to avoid this search.  Can you point to 
where this JSON serialization happens?  My guess is it just needs to preserve 
the insertion order.  In that case we could use a LinkedHashSet, which would 
provide both the duplicate removal and the insertion order. 


---


[jira] [Created] (DRILL-6203) Repeated Map Vector does not give correct payload bytecount

2018-03-01 Thread Padma Penumarthy (JIRA)
Padma Penumarthy created DRILL-6203:
---

 Summary: Repeated Map Vector does not give correct payload 
bytecount
 Key: DRILL-6203
 URL: https://issues.apache.org/jira/browse/DRILL-6203
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.12.0
Reporter: Padma Penumarthy
Assignee: Padma Penumarthy


Repeated Map Vector does not give the correct payload byte count. It calls 
the AbstractMapVector method, which computes the payload byte count for a given 
value count in the simple (non-repeated) map case. We need to override this 
method for the repeated map to get the right numbers. 
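
A rough sketch of such an override (field and method names are assumptions based on this description, not the exact vector API):

{code:java}
@Override
public int getPayloadByteCount(int valueCount) {
  if (valueCount == 0) {
    return 0;
  }
  // For a repeated map, the number of child entries for valueCount
  // top-level values is the last offset, not valueCount itself.
  int entryCount = offsets.getAccessor().get(valueCount);
  return offsets.getPayloadByteCount(valueCount)
      + super.getPayloadByteCount(entryCount);
}
{code}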



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] drill issue #1144: DRILL-6202: Deprecate usage of IndexOutOfBoundsException ...

2018-03-01 Thread vrozov
Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/1144
  
@parthchandra Please take a look.


---


[GitHub] drill pull request #1096: DRILL-6099 : Push limit past flatten(project) with...

2018-03-01 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1096#discussion_r171711326
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java
 ---
@@ -55,18 +62,21 @@ public void onMatch(RelOptRuleCall call) {
 }
   };
 
-  public static DrillPushLimitToScanRule LIMIT_ON_PROJECT =
-  new DrillPushLimitToScanRule(
-  RelOptHelper.some(DrillLimitRel.class, RelOptHelper.some(
-  DrillProjectRel.class, 
RelOptHelper.any(DrillScanRel.class))),
-  "DrillPushLimitToScanRule_LimitOnProject") {
+  public static DrillPushLimitToScanRule LIMIT_ON_PROJECT = new DrillPushLimitToScanRule(
+      RelOptHelper.some(DrillLimitRel.class, RelOptHelper.any(DrillProjectRel.class)),
+      "DrillPushLimitToScanRule_LimitOnProject") {
 @Override
 public boolean matches(RelOptRuleCall call) {
   DrillLimitRel limitRel = call.rel(0);
-  DrillScanRel scanRel = call.rel(2);
-  // For now only applies to Parquet. And pushdown only apply limit but not offset,
+  DrillProjectRel projectRel = call.rel(1);
+  // pushdown only apply limit but not offset,
   // so if getFetch() return null no need to run this rule.
-  if (scanRel.getGroupScan().supportsLimitPushdown() && (limitRel.getFetch() != null)) {
--- End diff --

Ok, yeah in that case we are not generating a redundant limit.  


---


[GitHub] drill issue #1096: DRILL-6099 : Push limit past flatten(project) without pus...

2018-03-01 Thread amansinha100
Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/1096
  
Updated version lgtm.  +1


---


[GitHub] drill pull request #1144: DRILL-6202: Deprecate usage of IndexOutOfBoundsExc...

2018-03-01 Thread vrozov
GitHub user vrozov opened a pull request:

https://github.com/apache/drill/pull/1144

DRILL-6202: Deprecate usage of IndexOutOfBoundsException to re-alloc vectors



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vrozov/drill DRILL-6202

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1144.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1144


commit 2af94a07340f9f13aa152822c2c8d37568ab44ab
Author: Vlad Rozov 
Date:   2018-03-01T17:36:05Z

DRILL-6202: Deprecate usage of IndexOutOfBoundsException to re-alloc vectors




---


[GitHub] drill issue #1096: DRILL-6099 : Push limit past flatten(project) without pus...

2018-03-01 Thread gparai
Github user gparai commented on the issue:

https://github.com/apache/drill/pull/1096
  
@amansinha100 I have addressed your review comments. Please take a look. 
Thanks!


---


[GitHub] drill pull request #1096: DRILL-6099 : Push limit past flatten(project) with...

2018-03-01 Thread gparai
Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/1096#discussion_r171708636
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java
 ---
@@ -55,18 +62,21 @@ public void onMatch(RelOptRuleCall call) {
 }
   };
 
-  public static DrillPushLimitToScanRule LIMIT_ON_PROJECT =
-  new DrillPushLimitToScanRule(
-  RelOptHelper.some(DrillLimitRel.class, RelOptHelper.some(
-  DrillProjectRel.class, 
RelOptHelper.any(DrillScanRel.class))),
-  "DrillPushLimitToScanRule_LimitOnProject") {
+  public static DrillPushLimitToScanRule LIMIT_ON_PROJECT = new DrillPushLimitToScanRule(
+      RelOptHelper.some(DrillLimitRel.class, RelOptHelper.any(DrillProjectRel.class)),
+      "DrillPushLimitToScanRule_LimitOnProject") {
 @Override
 public boolean matches(RelOptRuleCall call) {
   DrillLimitRel limitRel = call.rel(0);
-  DrillScanRel scanRel = call.rel(2);
-  // For now only applies to Parquet. And pushdown only apply limit but not offset,
+  DrillProjectRel projectRel = call.rel(1);
+  // pushdown only apply limit but not offset,
   // so if getFetch() return null no need to run this rule.
-  if (scanRel.getGroupScan().supportsLimitPushdown() && (limitRel.getFetch() != null)) {
--- End diff --

Without a FLATTEN, the LIMIT would be fully pushed past the PROJECT i.e. we 
would not have a LIMIT on top of the project.


---


[GitHub] drill pull request #1096: DRILL-6099 : Push limit past flatten(project) with...

2018-03-01 Thread gparai
Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/1096#discussion_r171708439
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java
 ---
@@ -224,4 +226,64 @@ public Void visitInputRef(RexInputRef inputRef) {
 }
   }
 
+  public static boolean isLimit0(RexNode fetch) {
+if (fetch != null && fetch.isA(SqlKind.LITERAL)) {
+  RexLiteral l = (RexLiteral) fetch;
+  switch (l.getTypeName()) {
+case BIGINT:
+case INTEGER:
+case DECIMAL:
+  if (((long) l.getValue2()) == 0) {
+return true;
+  }
+  }
+}
+return false;
+  }
+
+  public static boolean isProjectOutputRowcountUnknown(RelNode project) {
+assert project instanceof Project : "Rel is NOT an instance of project!";
+try {
+  RexVisitor visitor =
+  new RexVisitorImpl(true) {
+public Void visitCall(RexCall call) {
+  if ("flatten".equals(call.getOperator().getName().toLowerCase())) {
+throw new Util.FoundOne(call); /* throw exception to interrupt tree
+  walk (this is similar to other utility methods in RexUtil.java) */
+  }
+  return super.visitCall(call);
+}
+  };
+  for (RexNode rex : ((Project) project).getProjects()) {
+rex.accept(visitor);
+  }
+} catch (Util.FoundOne e) {
+  Util.swallow(e, null);
+  return true;
+}
+return false;
+  }
+
+  public static boolean isProjectOutputSchemaUnknown(RelNode project) {
--- End diff --

Done


---


[GitHub] drill pull request #1096: DRILL-6099 : Push limit past flatten(project) with...

2018-03-01 Thread gparai
Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/1096#discussion_r171708410
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java
 ---
@@ -224,4 +226,64 @@ public Void visitInputRef(RexInputRef inputRef) {
 }
   }
 
+  public static boolean isLimit0(RexNode fetch) {
+if (fetch != null && fetch.isA(SqlKind.LITERAL)) {
+  RexLiteral l = (RexLiteral) fetch;
+  switch (l.getTypeName()) {
+case BIGINT:
+case INTEGER:
+case DECIMAL:
+  if (((long) l.getValue2()) == 0) {
+return true;
+  }
+  }
+}
+return false;
+  }
+
+  public static boolean isProjectOutputRowcountUnknown(RelNode project) {
--- End diff --

Done


---


[GitHub] drill pull request #1096: DRILL-6099 : Push limit past flatten(project) with...

2018-03-01 Thread gparai
Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/1096#discussion_r171708384
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java
 ---
@@ -224,4 +226,64 @@ public Void visitInputRef(RexInputRef inputRef) {
 }
   }
 
+  public static boolean isLimit0(RexNode fetch) {
+if (fetch != null && fetch.isA(SqlKind.LITERAL)) {
+  RexLiteral l = (RexLiteral) fetch;
+  switch (l.getTypeName()) {
+case BIGINT:
+case INTEGER:
+case DECIMAL:
+  if (((long) l.getValue2()) == 0) {
+return true;
+  }
+  }
+}
+return false;
+  }
+
+  public static boolean isProjectOutputRowcountUnknown(RelNode project) {
+assert project instanceof Project : "Rel is NOT an instance of project!";
+try {
+  RexVisitor visitor =
--- End diff --

Yes, you are correct. If the rewrite does not consider it as embedded 
within other expressions then it is fine for the utility function to do the 
same.


---


[GitHub] drill issue #1138: DRILL-4120: Allow implicit columns for Avro storage forma...

2018-03-01 Thread vvysotskyi
Github user vvysotskyi commented on the issue:

https://github.com/apache/drill/pull/1138
  
@paul-rogers, the schema is taken from the first file in the `FormatSelection`. 
Therefore, when a table consists of several files with different schemas, the 
Drill query will fail.

As for the plan-time type information, besides the validation at the stage 
when a query is converted into rel nodes, the field list may be used in project 
rel nodes instead of the dynamic star for `DynamicDrillTable`.


---


[GitHub] drill pull request #1141: DRILL-6197: Skip duplicate entry for OperatorStats

2018-03-01 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1141#discussion_r171648058
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentStats.java ---
@@ -79,4 +71,21 @@ public void addOperatorStats(OperatorStats stats) {
 operators.add(stats);
   }
 
+  //DRILL-6197
+  public OperatorStats addOrReplaceOperatorStats(OperatorStats stats) {
+//Remove existing stat
+OperatorStats replacedStat = null;
+int index = 0;
+for (OperatorStats opStat : operators) {
--- End diff --

The choice of a list for the collection of stats seems to be because 
it simply gets serialized into a JSON list. As for the overhead, since each 
list is specific to a minor fragment (which typically has about 3-8 operators), 
the overhead of doing a linear search is not significant and is incurred only 
for specific operators. That is one of the reasons why I didn't replace the 
original `addOperatorStats()` implementation with that of 
`addOrReplaceOperatorStats()`.


---


[jira] [Created] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors

2018-03-01 Thread Vlad Rozov (JIRA)
Vlad Rozov created DRILL-6202:
-

 Summary: Deprecate usage of IndexOutOfBoundsException to re-alloc 
vectors
 Key: DRILL-6202
 URL: https://issues.apache.org/jira/browse/DRILL-6202
 Project: Apache Drill
  Issue Type: Bug
Reporter: Vlad Rozov
Assignee: Vlad Rozov


As bounds checking may be enabled or disabled, using IndexOutOfBoundsException 
to resize vectors is unreliable. It works only when bounds checking is enabled.
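
For illustration (schematic mutator calls; the exact API varies by vector type):

{code:java}
// Unreliable: depends on the write throwing IndexOutOfBoundsException when
// the vector is full, which happens only when bounds checking is enabled.
try {
  vector.getMutator().set(index, value);
} catch (IndexOutOfBoundsException e) {
  vector.reAlloc();
  vector.getMutator().set(index, value);
}

// Reliable: test the capacity explicitly before writing, independent of
// whether bounds checking is compiled in.
while (index >= vector.getValueCapacity()) {
  vector.reAlloc();
}
vector.getMutator().set(index, value);
{code}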



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6201) Failed to create input splits: No FileSystem for scheme: maprfs

2018-03-01 Thread Willian Mattos Ribeiro (JIRA)
Willian Mattos Ribeiro created DRILL-6201:
-

 Summary: Failed to create input splits: No FileSystem for scheme: 
maprfs
 Key: DRILL-6201
 URL: https://issues.apache.org/jira/browse/DRILL-6201
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Hive, Storage - MapRDB
 Environment: Mapr cluster - CentOS

Apache Drill installed in other VM (Isn't a cluster node)
Reporter: Willian Mattos Ribeiro


2018-03-01 14:03:28 ERROR HiveMetadataProvider:294 - Failed to create input 
splits: No FileSystem for scheme: maprfs
java.io.IOException: No FileSystem for scheme: maprfs
 at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2644) 
~[hadoop-common-2.7.1.jar:?]
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651) 
~[hadoop-common-2.7.1.jar:?]
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92) 
~[hadoop-common-2.7.1.jar:?]
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687) 
~[hadoop-common-2.7.1.jar:?]
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669) 
~[hadoop-common-2.7.1.jar:?]
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371) 
~[hadoop-common-2.7.1.jar:?]
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295) 
~[hadoop-common-2.7.1.jar:?]
 at 
org.apache.drill.exec.store.hive.HiveMetadataProvider$1.run(HiveMetadataProvider.java:269)
 ~[drill-storage-hive-core-1.12.0.jar:1.12.0]
 at 
org.apache.drill.exec.store.hive.HiveMetadataProvider$1.run(HiveMetadataProvider.java:262)
 ~[drill-storage-hive-core-1.12.0.jar:1.12.0]
 at java.security.AccessController.doPrivileged(Native Method) ~[?:1.7.0_161]
 at javax.security.auth.Subject.doAs(Subject.java:421) ~[?:1.7.0_161]
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 ~[hadoop-common-2.7.1.jar:?]
 at 
org.apache.drill.exec.store.hive.HiveMetadataProvider.splitInputWithUGI(HiveMetadataProvider.java:262)
 [drill-storage-hive-core-1.12.0.jar:1.12.0]
 at 
org.apache.drill.exec.store.hive.HiveMetadataProvider.getPartitionInputSplits(HiveMetadataProvider.java:154)
 [drill-storage-hive-core-1.12.0.jar:1.12.0]
 at 
org.apache.drill.exec.store.hive.HiveMetadataProvider.getInputSplits(HiveMetadataProvider.java:176)
 [drill-storage-hive-core-1.12.0.jar:1.12.0]
 at org.apache.drill.exec.store.hive.HiveScan.getInputSplits(HiveScan.java:122) 
[drill-storage-hive-core-1.12.0.jar:1.12.0]
 at 
org.apache.drill.exec.store.hive.HiveScan.getMaxParallelizationWidth(HiveScan.java:171)
 [drill-storage-hive-core-1.12.0.jar:1.12.0]
 at org.apache.drill.exec.planner.physical.ScanPrule.onMatch(ScanPrule.java:41) 
[drill-java-exec-1.12.0.jar:1.12.0]
 at 
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
 [calcite-core-1.4.0-drill-r23.jar:1.4.0-drill-r23]
 at 
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:811)
 [calcite-core-1.4.0-drill-r23.jar:1.4.0-drill-r23]
 at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:310) 
[calcite-core-1.4.0-drill-r23.jar:1.4.0-drill-r23]
 at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:400)
 [drill-java-exec-1.12.0.jar:1.12.0]
 at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel(DefaultSqlHandler.java:429)
 [drill-java-exec-1.12.0.jar:1.12.0]
 at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:169)
 [drill-java-exec-1.12.0.jar:1.12.0]
 at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:131)
 [drill-java-exec-1.12.0.jar:1.12.0]
 at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:79)
 [drill-java-exec-1.12.0.jar:1.12.0]
 at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1017) 
[drill-java-exec-1.12.0.jar:1.12.0]
 at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:289) 
[drill-java-exec-1.12.0.jar:1.12.0]
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152) 
[?:1.7.0_161]
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622) 
[?:1.7.0_161]
 at java.lang.Thread.run(Thread.java:748) [?:1.7.0_161]
2018-03-01 14:03:28 ERROR HiveMetadataProvider:180 - Failed to get InputSplits
org.apache.drill.common.exceptions.DrillRuntimeException: Failed to create 
input splits: No FileSystem for scheme: maprfs
 at 
org.apache.drill.exec.store.hive.HiveMetadataProvider.splitInputWithUGI(HiveMetadataProvider.java:295)
 ~[drill-storage-hive-core-1.12.0.jar:1.12.0]
 at 
org.apache.drill.exec.store.hive.HiveMetadataProvider.getPartitionInputSplits(HiveMetadataProvider.java:154)
 ~[drill-storage-hive-core-1.12.0.jar:1.12.0]
 at 
org.apache.drill.exec.store.hive.HiveMetadataProvider.getInputSplits(HiveMetadataProvider.java:176)
 

[GitHub] drill issue #1138: DRILL-4120: Allow implicit columns for Avro storage forma...

2018-03-01 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/1138
  
Another thought. The removed code is at plan time. Did the original code 
have to open each file to retrieve the schema? If so, does removing the code 
remove that load? If so, then this change could be a huge performance 
improvement, since it avoids the need to open every file in the Foreman.

Then the next question is: do we actually do anything with the 
plan-time type information? Few files have that information. Given that, does 
the planner actually use the information? Is this something we get for free 
from Calcite? If we are not using the type information at plan time, then 
clearly there is no harm in removing the code that retrieves it.


---


[GitHub] drill pull request #1142: DRILL-6198: OpenTSDB unit tests fail when Lilith c...

2018-03-01 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1142#discussion_r171620225
  
--- Diff: 
contrib/storage-opentsdb/src/test/java/org/apache/drill/store/openTSDB/TestOpenTSDBPlugin.java
 ---
@@ -185,4 +188,26 @@ public void testDescribe() throws Exception {
 test("describe `warp.speed.test`");
 Assert.assertEquals(1, testSql("show tables"));
   }
+
+  /**
+   * Checks that port with specified number is free and returns it.
+   * Otherwise, increases port number and checks until free port is found
+   * or the number of attempts is reached specified numAttempts
+   *
+   * @param portNum initial port number
+   * @param numAttempts max number of attempts to find port with greater 
number
+   * @return free port number
+   * @throws BindException if free port was not found and all attempts 
were used.
+   */
+  private static int getFreePortNum(int portNum, int numAttempts) throws 
IOException {
+while (numAttempts > 0) {
--- End diff --

1. Thanks, it looks better with a for loop.
2. Added more details to the error message.
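
For reference, a minimal sketch of the probe with a for loop (an assumed implementation, not necessarily the committed code):

```java
// Requires java.net.ServerSocket and java.net.BindException.
private static int getFreePortNum(int portNum, int numAttempts) throws IOException {
  for (int attempt = 0; attempt < numAttempts; attempt++) {
    // If the bind succeeds the port is free; try-with-resources releases it
    // immediately so the test can claim it.
    try (ServerSocket socket = new ServerSocket(portNum + attempt)) {
      return socket.getLocalPort();
    } catch (BindException e) {
      // Port is occupied; try the next one.
    }
  }
  throw new BindException(String.format(
      "No free port found in range [%d, %d]", portNum, portNum + numAttempts - 1));
}
```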


---


[GitHub] drill pull request #1142: DRILL-6198: OpenTSDB unit tests fail when Lilith c...

2018-03-01 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1142#discussion_r171617811
  
--- Diff: 
contrib/storage-opentsdb/src/test/java/org/apache/drill/store/openTSDB/TestOpenTSDBPlugin.java
 ---
@@ -51,17 +54,17 @@
 
 public class TestOpenTSDBPlugin extends PlanTestBase {
 
-  protected static OpenTSDBStoragePlugin storagePlugin;
-  protected static OpenTSDBStoragePluginConfig storagePluginConfig;
+  private static int portNum = 10_000;
--- End diff --

Thanks, removed.


---


[GitHub] drill pull request #1142: DRILL-6198: OpenTSDB unit tests fail when Lilith c...

2018-03-01 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1142#discussion_r171618633
  
--- Diff: 
contrib/storage-opentsdb/src/test/java/org/apache/drill/store/openTSDB/TestOpenTSDBPlugin.java
 ---
@@ -51,17 +54,17 @@
 
 public class TestOpenTSDBPlugin extends PlanTestBase {
 
-  protected static OpenTSDBStoragePlugin storagePlugin;
-  protected static OpenTSDBStoragePluginConfig storagePluginConfig;
+  private static int portNum = 10_000;
 
   @Rule
-  public WireMockRule wireMockRule = new WireMockRule(1);
+  public WireMockRule wireMockRule = new WireMockRule(portNum);
 
   @BeforeClass
   public static void setup() throws Exception {
+portNum = getFreePortNum(portNum, 1000);
--- End diff --

Done.


---


[GitHub] drill pull request #1142: DRILL-6198: OpenTSDB unit tests fail when Lilith c...

2018-03-01 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1142#discussion_r171607424
  
--- Diff: 
contrib/storage-opentsdb/src/test/java/org/apache/drill/store/openTSDB/TestOpenTSDBPlugin.java
 ---
@@ -185,4 +188,26 @@ public void testDescribe() throws Exception {
 test("describe `warp.speed.test`");
 Assert.assertEquals(1, testSql("show tables"));
   }
+
+  /**
+   * Checks that port with specified number is free and returns it.
+   * Otherwise, increases port number and checks until free port is found
+   * or the number of attempts is reached specified numAttempts
+   *
+   * @param portNum initial port number
+   * @param numAttempts max number of attempts to find port with greater 
number
+   * @return free port number
+   * @throws BindException if free port was not found and all attempts 
were used.
+   */
+  private static int getFreePortNum(int portNum, int numAttempts) throws 
IOException {
+while (numAttempts > 0) {
--- End diff --

1. Please re-write using a for loop.
2. Please add more details to the exception: include the initial port number 
and which ports were occupied, and perhaps suggest checking which ports are free, etc.


---


[GitHub] drill pull request #1142: DRILL-6198: OpenTSDB unit tests fail when Lilith c...

2018-03-01 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1142#discussion_r171606555
  
--- Diff: 
contrib/storage-opentsdb/src/test/java/org/apache/drill/store/openTSDB/TestOpenTSDBPlugin.java
 ---
@@ -51,17 +54,17 @@
 
 public class TestOpenTSDBPlugin extends PlanTestBase {
 
-  protected static OpenTSDBStoragePlugin storagePlugin;
-  protected static OpenTSDBStoragePluginConfig storagePluginConfig;
+  private static int portNum = 10_000;
--- End diff --

Why do you set the value right away? It looks like you will always re-write it 
in `setup`.


---


[GitHub] drill pull request #1142: DRILL-6198: OpenTSDB unit tests fail when Lilith c...

2018-03-01 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1142#discussion_r171606840
  
--- Diff: 
contrib/storage-opentsdb/src/test/java/org/apache/drill/store/openTSDB/TestOpenTSDBPlugin.java
 ---
@@ -51,17 +54,17 @@
 
 public class TestOpenTSDBPlugin extends PlanTestBase {
 
-  protected static OpenTSDBStoragePlugin storagePlugin;
-  protected static OpenTSDBStoragePluginConfig storagePluginConfig;
+  private static int portNum = 10_000;
 
   @Rule
-  public WireMockRule wireMockRule = new WireMockRule(1);
+  public WireMockRule wireMockRule = new WireMockRule(portNum);
 
   @BeforeClass
   public static void setup() throws Exception {
+portNum = getFreePortNum(portNum, 1000);
--- End diff --

Maybe we can decrease the number of attempts to 200?


---


[GitHub] drill pull request #1141: DRILL-6197: Skip duplicate entry for OperatorStats

2018-03-01 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1141#discussion_r171611927
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentStats.java ---
@@ -79,4 +71,21 @@ public void addOperatorStats(OperatorStats stats) {
 operators.add(stats);
   }
 
+  //DRILL-6197
+  public OperatorStats addOrReplaceOperatorStats(OperatorStats stats) {
+//Remove existing stat
+OperatorStats replacedStat = null;
+int index = 0;
+for (OperatorStats opStat : operators) {
--- End diff --

I am worried about the small overheads of this linear search adding 
up for each operator, especially for queries with complex plans.  The 
stats collection should ideally impose minimal overhead.  Does the operator 
stats collection have to be a list, or can we just use a Set? 


---


[GitHub] drill issue #1138: DRILL-4120: Allow implicit columns for Avro storage forma...

2018-03-01 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/1138
  
As @arina-ielchiieva points out, this change backs out plan-time knowledge 
of the schema. This may not affect run-time accuracy. However, it does mean that 
queries can be planned without knowing types and then fail at runtime when the 
types are learned. This seems more like a bug than a feature. In general, we 
should use all the information available. It is not helpful to ignore information 
if doing so results in a poorer user experience.


---


[GitHub] drill pull request #1138: DRILL-4120: Allow implicit columns for Avro storag...

2018-03-01 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1138#discussion_r171606330
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroRecordReader.java
 ---
@@ -154,6 +156,12 @@ public int next() {
 
   writer.setValueCount(recordCount);
 
+  // adds fields which don't exist in the table but should be present 
in the schema
+  if (recordCount > 0) {
+JsonReaderUtils.ensureAtLeastOneField(writer, getColumns(), false,
--- End diff --

In general, this is a bad idea, though existing code does this. If we find 
an empty file in one scanner, but a real file in another, we create an 
unnecessary schema change by making up a column.

Jinfeng's changes last year are supposed to handle the "fast none" case of 
a reader with no rows. There should be no reason to add a dummy column. Old 
code that adds such a column should be fixed. IMHO, code that does not add 
dummy columns should not begin to do so.


---


[GitHub] drill pull request #1138: DRILL-4120: Allow implicit columns for Avro storag...

2018-03-01 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1138#discussion_r171607241
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroRecordReader.java
 ---
@@ -295,7 +301,8 @@ private void processPrimitive(final Object value, final 
Schema.Type type, final
 writer.binary(fieldName).writeVarBinary(0, length, buffer);
 break;
   case NULL:
-// Nothing to do for null type
+// The default Drill behaviour is to create int column
+writer.integer(fieldName);
--- End diff --

This maps a NULL type to integer. Probably OK if we do this consistently.


---


[GitHub] drill pull request #1139: DRILL-6189: Security: passwords logging and file p...

2018-03-01 Thread vladimirtkach
Github user vladimirtkach commented on a diff in the pull request:

https://github.com/apache/drill/pull/1139#discussion_r171588799
  
--- Diff: 
logical/src/main/java/org/apache/drill/common/config/LogicalPlanPersistence.java
 ---
@@ -52,6 +53,7 @@ public LogicalPlanPersistence(DrillConfig conf, 
ScanResult scanResult) {
 mapper.configure(Feature.ALLOW_UNQUOTED_FIELD_NAMES, true);
 mapper.configure(JsonGenerator.Feature.QUOTE_FIELD_NAMES, true);
 mapper.configure(Feature.ALLOW_COMMENTS, true);
+mapper.setFilterProvider(new 
SimpleFilterProvider().setFailOnUnknownId(false));
--- End diff --

I submitted a physical plan directly to a node, and it was successfully deserialized.


---


[GitHub] drill pull request #1143: DRILL-1491: Support for JDK 8

2018-03-01 Thread vladimirtkach
GitHub user vladimirtkach opened a pull request:

https://github.com/apache/drill/pull/1143

DRILL-1491: Support for JDK 8

Changed jdk version from 7 to 8 in pom.xml, drill-config.sh and others

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vladimirtkach/drill DRILL-1491

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1143.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1143


commit 0aeeacc9e528b6dea80385bbf53e7259a6813b08
Author: Vladimir Tkach 
Date:   2018-02-28T13:32:55Z

DRILL-1491: Support for JDK 8

Changed jdk version from 7 to 8 in pom.xml travis and drill-config.sh




---


[GitHub] drill issue #1139: DRILL-6189: Security: passwords logging and file permisio...

2018-03-01 Thread vladimirtkach
Github user vladimirtkach commented on the issue:

https://github.com/apache/drill/pull/1139
  
@arina-ielchiieva made changes, please take a look


---


[GitHub] drill pull request #1139: DRILL-6189: Security: passwords logging and file p...

2018-03-01 Thread vladimirtkach
Github user vladimirtkach commented on a diff in the pull request:

https://github.com/apache/drill/pull/1139#discussion_r171579096
  
--- Diff: 
contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcStorageConfig.java
 ---
@@ -17,13 +17,15 @@
  */
 package org.apache.drill.exec.store.jdbc;
 
+import com.fasterxml.jackson.annotation.JsonFilter;
 import org.apache.drill.common.logical.StoragePluginConfig;
 
 import com.fasterxml.jackson.annotation.JsonCreator;
 import com.fasterxml.jackson.annotation.JsonProperty;
 import com.fasterxml.jackson.annotation.JsonTypeName;
 
 @JsonTypeName(JdbcStorageConfig.NAME)
+@JsonFilter("passwordFilter")
--- End diff --

To apply the filter:
1) Mark the entity you want to filter fields out of with `@JsonFilter`.
2) Create a filter provider and register a property filter under the id 
referenced by your entity.
3) Pass your filter provider when creating the ObjectWriter.
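
Wired together, it looks roughly like this (illustrative; `config` stands for a `JdbcStorageConfig` instance, and the filter id matches the `@JsonFilter` annotation above):

```java
ObjectMapper mapper = new ObjectMapper();
SimpleFilterProvider filters = new SimpleFilterProvider()
    .addFilter("passwordFilter", SimpleBeanPropertyFilter.serializeAllExcept("password"));
// A writer built with the provider drops the "password" field from any bean
// annotated with @JsonFilter("passwordFilter").
String json = mapper.writer(filters).writeValueAsString(config);
```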


---


[jira] [Created] (DRILL-6200) ERROR Querying Hive through HiveServer2 via JDBC!

2018-03-01 Thread Hannibal07 (JIRA)
Hannibal07 created DRILL-6200:
-

 Summary: ERROR Querying Hive through HiveServer2 via JDBC!
 Key: DRILL-6200
 URL: https://issues.apache.org/jira/browse/DRILL-6200
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - JDBC
Affects Versions: 1.12.0
Reporter: Hannibal07


ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query: select * 
from dim_parameter
[30034]Query execution error. Details:[ 
DATA_READ ERROR: The JDBC storage plugin failed while trying setup the SQL 
query.

sql SELECT *
FROM.dw.dim_parameter
plugin hive
Fragment 0:0

[Error Id: e522f220-b857-4273-af0a-2a2d05d992f2 on 172.28.32.7:31010]

(org.apache.hive.service.cli.HiveSQLException) Error while compiling statement: 
FAILED: ParseException line 2:4 cannot recognize input near '.' 'dw' '.' in 
join source
 org.apache.hive.jdbc.Utils.verifySuccess():267
 org.apache.hive.jdbc.Utils.verifySuccessWithInfo():253
 org.apache.hive.jdbc.HiveStatement.runAsyncOnServer():309
 org.apache.hive.jdbc.HiveStatement.execute():250
 org.apache.hive.jdbc.HiveStatement.executeQuery():434
 org.apache.commons.dbcp.DelegatingStatement.executeQuery():208
 org.apache.commons.dbcp.DelegatingStatement.executeQuery():208
 org.apache.drill.exec.store.jdbc.JdbcRecordReader.setup():177
 org.apache.drill.exec.p...
 at System.Data.Odbc.OdbcConnection.HandleError(OdbcHandle hrHandle, RetCode 
retcode)
 at System.Data.Odbc.OdbcCommand.ExecuteReaderObject(CommandBehavior behavior, 
String method, Boolean needReader, Object[] methodArguments, SQL_API 
odbcApiMethod)
 at System.Data.Odbc.OdbcCommand.ExecuteReaderObject(CommandBehavior behavior, 
String method, Boolean needReader)
 at System.Data.Odbc.OdbcCommand.ExecuteReader(CommandBehavior behavior)
 at DrillExplorer.DROdbcProvider.GetStatmentColumns(String in_query)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] 1.13.0 release

2018-03-01 Thread Parth Chandra
Good show.

Updated list:

DRILL-6185: Error is displaying while accessing query profiles via the
Web-UI  -- Ready to commit
DRILL-6174: Parquet pushdown planning improvements -- Ready to commit
DRILL-6191: Need more information on TCP flags -- Ready to commit
DRILL-6190: Packets can be bigger than strictly legal  -- Ready to commit
DRILL-6188: Fix C++ client build on Centos 7 and OS X  --  Ready to commit

DRILL-1491:  Support for JDK 8 --* In progress.*

DRILL-1170: YARN support for Drill -- Needs Committer +1 and Travis fix.

DRILL-6027: Implement spill to disk for the Hash Join   --- No PR and is a
major feature that should be reviewed (properly!).

DRILL-6173: Support transitive closure during filter push down and
partition pruning.  -- No PR and depends on 3 Apache Calcite issues that
are open.

DRILL-6023: Graceful shutdown improvements -- No PR. Consists of 6 sub
JIra's none of which have PRs.


Re: [DISCUSS] 1.13.0 release

2018-03-01 Thread Ted Dunning
DRILL-6190 and DRILL-6191 are ready to merge to master for release.

Code review and unit tests all pass.



On Wed, Feb 28, 2018 at 11:16 PM, Parth Chandra  wrote:

> Moved Ted's PR's down in the list. Let's see where we are at the end of the
> week.
> Arina, Volodymyr, any ETA on JDK 8 work? It's the gating factor for the
> release.
> Meanwhile, people, feel free to commit your work as usual.
>
> Updated list:
>
> DRILL-6185: Error is displaying while accessing query profiles via the
> Web-UI  -- Ready to commit
> DRILL-6174: Parquet pushdown planning improvements -- Ready to commit
> DRILL-6188: Fix C++ client build on Centos 7 and OS X  --  Ready to commit
>
> DRILL-1491:  Support for JDK 8 --* In progress.*
>
> DRILL-6191: Need more information on TCP flags -- *In progress*
>
> DRILL-6190: Packets can be bigger than strictly legal  -- *In progress*
>
> DRILL-1170: YARN support for Drill -- Needs Committer +1 and Travis fix.
>
> DRILL-6027: Implement spill to disk for the Hash Join   --- No PR and is a
> major feature that should be reviewed (properly!).
>
> DRILL-6173: Support transitive closure during filter push down and
> partition pruning.  -- No PR and depends on 3 Apache Calcite issues that
> are open.
>
> DRILL-6023: Graceful shutdown improvements -- No PR. Consists of 6 sub
> JIra's none of which have PRs.
>
> On Wed, Feb 28, 2018 at 5:45 PM, Ted Dunning 
> wrote:
>
> > 6190 and/or 6191 cause test failures that I have been unable to spend
> time
> > on yet. I don't think that they are ready to commit.
> >
> > At least one of these is likely to be something very simple like a test
> > that didn't clean up after itself. The other should be as simple, but I
> > can't understand it yet. It may be a memory pressure thing rather than a
> > real problem with the test.
> >
> >
> > On Wed, Feb 28, 2018 at 3:18 AM, Parth Chandra 
> wrote:
> >
> > > OK. So let's try to get as many of the following as we can without
> > breaking
> > > anything. As far as I can see none of the open items below are show
> > > stoppers for a release, but I'm happy to give in to popular demand for
> > JDK
> > > 8 :).
> > >
> > > Note that the last three appear to be big ticket items that have no PR
> > yet.
> > > Usually, it is a mistake to rush these into a release (one advantage of
> > > frequent, predictable releases is that they won't have to wait too long
> > for
> > > the next release).
> > >
> > > Here's what I'm tracking :
> > >
> > > DRILL-6185: Error is displaying while accessing query profiles via the
> > > Web-UI  -- Ready to commit
> > > DRILL-6174: Parquet pushdown planning improvements -- Ready to commit
> > > DRILL-6191: Need more information on TCP flags -- Ready to commit
> > > DRILL-6190: Packets can be bigger than strictly legal  -- Ready to
> commit
> > >
> > > DRILL-6188: Fix C++ client build on Centos 7 and OS X  --  Needs
> > committer
> > > +1
> > >
> > > DRILL-1491:  Support for JDK 8 --* In progress.*
> > >
> > > DRILL-1170: YARN support for Drill -- Needs Committer +1 and Travis
> fix.
> > >
> > > DRILL-6027: Implement spill to disk for the Hash Join   --- No PR and
> is
> > a
> > > major feature that should be reviewed (properly!).
> > >
> > > DRILL-6173: Support transitive closure during filter push down and
> > > partition pruning.  -- No PR and depends on 3 Apache Calcite issues
> that
> > > are open.
> > >
> > > DRILL-6023: Graceful shutdown improvements -- No PR. Consists of 6 sub
> > > JIra's none of which have PRs.
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Wed, Feb 28, 2018 at 12:32 AM, Ted Dunning 
> > > wrote:
> > >
> > > > I have two very small improvements to PCAP support with DRILL-6190
> and
> > > > DRILL-6191 that I would like to get in.
> > > >
> > > > I think that PCAP-NG support is too far from ready.
> > > >
> > > >
> > > >
> > > > On Tue, Feb 27, 2018 at 10:52 AM, Pritesh Maker 
> > wrote:
> > > >
> > > > > I see a few more issues that are in review and worth including for
> > the
> > > > > 1.13 release (maybe give another week to resolve this before the
> 1st
> > RC
> > > > is
> > > > > created?)
> > > > >
> > > > > DRILL-6027 Implement spill to disk for the Hash Join  -- Boaz and
> Tim
> > > > > DRILL-6173 Support transitive closure during filter push down and
> > > > > partition pruning - Vitalii
> > > > > DRILL-6023 Graceful shutdown improvements -- Jyothsna
> > > > >
> > > > > There are several other bugs/ improvements that are marked in
> > progress
> > > -
> > > > > https://issues.apache.org/jira/secure/Dashboard.jspa?
> > > > selectPageId=12332152
> > > > > - if folks are not working on them, we should remove the fixVersion
> > for
> > > > > 1.13.
> > > > >
> > > > > Pritesh
> > > > >
> > > > >
> > > > > -Original Message-
> > > > > From: Abhishek Girish 
> > > > > Sent: February 27, 2018 10:44 AM
> > > > > To: 

[GitHub] drill pull request #1142: DRILL-6198: OpenTSDB unit tests fail when Lilith c...

2018-03-01 Thread vvysotskyi
GitHub user vvysotskyi opened a pull request:

https://github.com/apache/drill/pull/1142

DRILL-6198: OpenTSDB unit tests fail when Lilith client is run

Added a method which checks that the default port 10_000 is free; otherwise 
it increments the port number and checks until a free port is found.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vvysotskyi/drill DRILL-6198

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1142.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1142


commit 671d3e0d900c19f561cb3d0f744898c0f9bf20e9
Author: Volodymyr Vysotskyi 
Date:   2018-03-01T12:52:28Z

DRILL-6198: OpenTSDB unit tests fail when Lilith client is run




---


[GitHub] drill pull request #1138: DRILL-4120: Allow implicit columns for Avro storag...

2018-03-01 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1138#discussion_r171544478
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroFormatTest.java
 ---
@@ -170,25 +169,35 @@ public void 
testSimplePrimitiveSchema_SelectColumnSubset() throws Exception {
 
   @Test
   public void testSimplePrimitiveSchema_NoColumnsExistInTheSchema() throws 
Exception {
-final String file = 
generateSimplePrimitiveSchema_NoNullValues().getFileName();
-try {
-  test("select h_dummy1, e_dummy2 from dfs.`%s`", file);
-  Assert.fail("Test should fail as h_dummy1 and e_dummy2 does not 
exist.");
-} catch(UserException ue) {
-  Assert.assertTrue("Test should fail as h_dummy1 and e_dummy2 does 
not exist.",
-  ue.getMessage().contains("Column 'h_dummy1' not found in any 
table"));
-}
+final String file = 
generateSimplePrimitiveSchema_NoNullValues(1).getFileName();
+testBuilder()
+  .sqlQuery("select h_dummy1, e_dummy2 from dfs.`%s`", file)
+  .unOrdered()
+  .baselineColumns("h_dummy1", "e_dummy2")
+  .baselineValues(null, null)
+  .go();
   }
 
   @Test
   public void 
testSimplePrimitiveSchema_OneExistAndOneDoesNotExistInTheSchema() throws 
Exception {
-final String file = 
generateSimplePrimitiveSchema_NoNullValues().getFileName();
-try {
-  test("select h_boolean, e_dummy2 from dfs.`%s`", file);
-  Assert.fail("Test should fail as e_dummy2 does not exist.");
-} catch(UserException ue) {
-  Assert.assertTrue("Test should fail as e_dummy2 does not exist.", 
true);
-}
+final String file = 
generateSimplePrimitiveSchema_NoNullValues(1).getFileName();
+testBuilder()
+  .sqlQuery("select h_boolean, e_dummy2 from dfs.`%s`", file)
+  .unOrdered()
+  .baselineColumns("h_boolean", "e_dummy2")
+  .baselineValues(true, null)
+  .go();
+  }
+
+  @Test
+  public void testImplicitColumnFilename() throws Exception {
+final String file = 
generateSimplePrimitiveSchema_NoNullValues(1).getFileName();
+testBuilder()
+  .sqlQuery("select filename from dfs.`%s`", file)
--- End diff --

Thanks for pointing this out. I modified the existing test to check, besides 
`filename`, also the `suffix`, `fqn` and `filepath` implicit columns, and added 
a separate test for a partition column.


---


[GitHub] drill pull request #1138: DRILL-4120: Allow implicit columns for Avro storag...

2018-03-01 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1138#discussion_r171517376
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroFormatTest.java
 ---
@@ -170,25 +169,35 @@ public void 
testSimplePrimitiveSchema_SelectColumnSubset() throws Exception {
 
   @Test
   public void testSimplePrimitiveSchema_NoColumnsExistInTheSchema() throws 
Exception {
-final String file = 
generateSimplePrimitiveSchema_NoNullValues().getFileName();
-try {
-  test("select h_dummy1, e_dummy2 from dfs.`%s`", file);
-  Assert.fail("Test should fail as h_dummy1 and e_dummy2 does not 
exist.");
-} catch(UserException ue) {
-  Assert.assertTrue("Test should fail as h_dummy1 and e_dummy2 does 
not exist.",
-  ue.getMessage().contains("Column 'h_dummy1' not found in any 
table"));
-}
+final String file = 
generateSimplePrimitiveSchema_NoNullValues(1).getFileName();
+testBuilder()
+  .sqlQuery("select h_dummy1, e_dummy2 from dfs.`%s`", file)
+  .unOrdered()
+  .baselineColumns("h_dummy1", "e_dummy2")
+  .baselineValues(null, null)
+  .go();
   }
 
   @Test
   public void 
testSimplePrimitiveSchema_OneExistAndOneDoesNotExistInTheSchema() throws 
Exception {
-final String file = 
generateSimplePrimitiveSchema_NoNullValues().getFileName();
-try {
-  test("select h_boolean, e_dummy2 from dfs.`%s`", file);
-  Assert.fail("Test should fail as e_dummy2 does not exist.");
-} catch(UserException ue) {
-  Assert.assertTrue("Test should fail as e_dummy2 does not exist.", 
true);
-}
+final String file = 
generateSimplePrimitiveSchema_NoNullValues(1).getFileName();
+testBuilder()
+  .sqlQuery("select h_boolean, e_dummy2 from dfs.`%s`", file)
+  .unOrdered()
+  .baselineColumns("h_boolean", "e_dummy2")
+  .baselineValues(true, null)
+  .go();
+  }
+
+  @Test
+  public void testImplicitColumnFilename() throws Exception {
+final String file = 
generateSimplePrimitiveSchema_NoNullValues(1).getFileName();
+testBuilder()
+  .sqlQuery("select filename from dfs.`%s`", file)
--- End diff --

Please test all implicit columns and at least one partition column.


---


Re: Avro storage format behaviour

2018-03-01 Thread Arina Yelchiyeva
As Paul has mentioned in PR [1], when we move to the new scan framework it will
handle implicit columns for all file readers.
I guess until then let's treat Avro like other file formats (for example,
Parquet) so users can benefit from implicit columns for this format as well.

[1] https://github.com/apache/drill/pull/1138

On Wed, Feb 28, 2018 at 7:47 PM, Vova Vysotskyi  wrote:

> Hi all,
>
> I am working on DRILL-4120: dir0 does not work when the directory structure
> contains Avro files.
>
> In DRILL-3810, validation of a query against the Avro schema was added
> before the query starts executing.
> Therefore, with these changes Drill throws an exception when the
> query contains a non-existent column and the table has the Avro format.
> Other storage formats such as JSON or Parquet allow usage of non-existing
> fields.
>
> So here is my question: should we continue to treat avro as a format with
> fixed schema, or we should start treating avro as a dynamic format to be
> consistent with other storage formats?
>
> --
> Kind regards,
> Volodymyr Vysotskyi
>


[GitHub] drill issue #1137: DRILL-6185: Fixed error while displaying system profiles ...

2018-03-01 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1137
  
I meant that we parse the text plan; obviously it was generated from some 
object. In the future we may consider creating a special plan object from the 
initial one, with a structure suitable for the Web UI, so we won't need to 
parse the plan string...


---


[GitHub] drill pull request #1139: DRILL-6189: Security: passwords logging and file p...

2018-03-01 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1139#discussion_r171511391
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserServer.java ---
@@ -91,6 +91,34 @@
   static {
 userConnectionMap = new ConcurrentHashMap<>();
   }
+  public static  String safeLogString(UserToBitHandshake inbound) {
+StringBuilder sb = new StringBuilder();
+sb.append("rpc_version: ");
+sb.append(inbound.getRpcVersion());
+sb.append("\ncredentials:\n\t");
+sb.append(inbound.getCredentials());
+sb.append("properties:");
+java.util.List<Property> props = inbound.getProperties().getPropertiesList();
+for (Property p: props){
+  if(!p.getKey().equalsIgnoreCase("password")) {
--- End diff --

Please add the missing spaces...


---


[GitHub] drill pull request #1139: DRILL-6189: Security: passwords logging and file p...

2018-03-01 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1139#discussion_r171512422
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java
 ---
@@ -158,7 +162,9 @@ protected void logAndSetTextPlan(final String 
description, final Prel prel, fina
 
   protected void log(final String name, final PhysicalPlan plan, final 
Logger logger) throws JsonProcessingException {
 if (logger.isDebugEnabled()) {
-  String planText = 
plan.unparse(context.getLpPersistence().getMapper().writer());
+  PropertyFilter theFilter = new 
SimpleBeanPropertyFilter.SerializeExceptFilter(Sets.newHashSet("password"));
--- End diff --

Please rename to `filter`.


---


[GitHub] drill pull request #1139: DRILL-6189: Security: passwords logging and file p...

2018-03-01 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1139#discussion_r171510779
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserServer.java ---
@@ -91,6 +91,34 @@
   static {
 userConnectionMap = new ConcurrentHashMap<>();
   }
+  public static  String safeLogString(UserToBitHandshake inbound) {
--- End diff --

1. Please remove one space -> `static  String`.
2. Can this method be just private rather than public static? If yes, please 
move it to the end of the class.
3. Please add javadoc to the method.
4. Please consider renaming the method to reflect the actual work it does 
(see the sketch below).
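A sketch of what the method might look like after these points are addressed; 
the name is only a suggestion, and the value formatting is an assumption since 
the original diff is truncated:

```java
/**
 * Builds a representation of the handshake that is safe to log:
 * all properties except the password are included.
 */
private static String maskedHandshakeToString(UserToBitHandshake inbound) {
  StringBuilder sb = new StringBuilder();
  sb.append("rpc_version: ").append(inbound.getRpcVersion());
  sb.append("\ncredentials:\n\t").append(inbound.getCredentials());
  sb.append("\nproperties:");
  for (Property p : inbound.getProperties().getPropertiesList()) {
    if (!p.getKey().equalsIgnoreCase("password")) {
      sb.append("\n\t").append(p.getKey()).append(": ").append(p.getValue());
    }
  }
  return sb.toString();
}
```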


---


[GitHub] drill pull request #1139: DRILL-6189: Security: passwords logging and file p...

2018-03-01 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1139#discussion_r171512007
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserServer.java ---
@@ -320,7 +348,7 @@ protected void consumeHandshake(ChannelHandlerContext 
ctx, UserToBitHandshake in
 
   @Override
   public BitToUserHandshake getHandshakeResponse(UserToBitHandshake 
inbound) throws Exception {
-logger.trace("Handling handshake from user to bit. {}", inbound);
+logger.trace("Handling handshake from user to bit. {}", 
safeLogString(inbound));
--- End diff --

Should we add `if (logger.isTraceEnabled()) {`, so that `safeLogString` is 
called only when we actually need it for tracing?
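That is, a sketch of the guarded call:

```java
if (logger.isTraceEnabled()) {
  logger.trace("Handling handshake from user to bit. {}", safeLogString(inbound));
}
```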


---


[GitHub] drill pull request #1139: DRILL-6189: Security: passwords logging and file p...

2018-03-01 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1139#discussion_r171511274
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserServer.java ---
@@ -91,6 +91,34 @@
   static {
 userConnectionMap = new ConcurrentHashMap<>();
   }
+  public static  String safeLogString(UserToBitHandshake inbound) {
+StringBuilder sb = new StringBuilder();
+sb.append("rpc_version: ");
+sb.append(inbound.getRpcVersion());
+sb.append("\ncredentials:\n\t");
+sb.append(inbound.getCredentials());
+sb.append("properties:");
+java.util.List props = 
inbound.getProperties().getPropertiesList();
--- End diff --

Why do you need the fully qualified name here instead of an import?
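That is, the conventional form would be (sketch):

```java
import java.util.List;
// ...
List<Property> props = inbound.getProperties().getPropertiesList();
```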


---


[jira] [Created] (DRILL-6199) Filter push down doesn't work with more than one nested subquery

2018-03-01 Thread Anton Gozhiy (JIRA)
Anton Gozhiy created DRILL-6199:
---

 Summary: Filter push down doesn't work with more than one nested subquery
 Key: DRILL-6199
 URL: https://issues.apache.org/jira/browse/DRILL-6199
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.13.0
Reporter: Anton Gozhiy
 Attachments: DRILL_6118_data_source.csv

*Data set:*
The data is generated using the attached file: *DRILL_6118_data_source.csv*
Data gen commands:

{code:sql}
create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d1` (c1, c2, 
c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] c3, 
columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` where 
columns[0] in (1, 3);
create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d2` (c1, c2, 
c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] c3, 
columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` where 
columns[0]=2;
create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d3` (c1, c2, 
c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] c3, 
columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` where 
columns[0]>3;
{code}

*Steps:*
# Execute the following query:
{code:sql}
explain plan for select * from (select * from (select * from 
dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where c1<3
{code}

*Expected result:*
numFiles=2, numRowGroups=2, only files from the folders d1 and d2 should be 
scanned.

*Actual result:*
Filter push down doesn't work:
numFiles=3, numRowGroups=3, all files are scanned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6198) OpenTSDB unit tests fail when Lilith client is run

2018-03-01 Thread Volodymyr Vysotskyi (JIRA)
Volodymyr Vysotskyi created DRILL-6198:
--

 Summary: OpenTSDB unit tests fail when Lilith client is run
 Key: DRILL-6198
 URL: https://issues.apache.org/jira/browse/DRILL-6198
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Reporter: Volodymyr Vysotskyi


When the OpenTSDB unit tests run on the same machine where a Lilith client is 
running, the unit tests fail with the error:
{noformat}
testDescribe(org.apache.drill.store.openTSDB.TestOpenTSDBPlugin)  Time elapsed: 
0.01 sec  <<< ERROR!
com.github.tomakehurst.wiremock.common.FatalStartupException: 
java.lang.RuntimeException: java.net.BindException: Address already in use
at 
com.github.tomakehurst.wiremock.WireMockServer.start(WireMockServer.java:145)
at 
com.github.tomakehurst.wiremock.junit.WireMockRule$1.evaluate(WireMockRule.java:68)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Caused by: java.lang.RuntimeException: java.net.BindException: Address already 
in use
at 
com.github.tomakehurst.wiremock.jetty9.JettyHttpServer.start(JettyHttpServer.java:132)
at 
com.github.tomakehurst.wiremock.WireMockServer.start(WireMockServer.java:143)
at 
com.github.tomakehurst.wiremock.junit.WireMockRule$1.evaluate(WireMockRule.java:68)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at 
wiremock.org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:321)
at 
wiremock.org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80)
at 
wiremock.org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:236)
at 
wiremock.org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at wiremock.org.eclipse.jetty.server.Server.doStart(Server.java:366)
at 
wiremock.org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at 
com.github.tomakehurst.wiremock.jetty9.JettyHttpServer.start(JettyHttpServer.java:130)
at 
com.github.tomakehurst.wiremock.WireMockServer.start(WireMockServer.java:143)
at 
com.github.tomakehurst.wiremock.junit.WireMockRule$1.evaluate(WireMockRule.java:68)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{noformat}
This failure happens because Lilith listens on the same port as the one 
specified in {{TestOpenTSDBPlugin.wireMockRule}} and 
{{bootstrap-storage-plugins.json}}.
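One possible fix (a sketch only, not a committed change) is to let WireMock 
pick a free port dynamically instead of hard-coding one; the test class name 
below is illustrative:

{code:java}
import static com.github.tomakehurst.wiremock.core.WireMockConfiguration.wireMockConfig;

import com.github.tomakehurst.wiremock.junit.WireMockRule;
import org.junit.Rule;

public class DynamicPortExampleTest {
  // Bind WireMock to a random free port so it cannot collide with Lilith
  // (or any other process) listening on a fixed port.
  @Rule
  public WireMockRule wireMockRule = new WireMockRule(wireMockConfig().dynamicPort());

  // The storage plugin config would then need wireMockRule.port() at runtime
  // instead of the hard-coded value from bootstrap-storage-plugins.json.
}
{code}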



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] drill issue #1140: DRILL-6195: Quering Hive non-partitioned transactional ta...

2018-03-01 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1140
  
+1


---


[GitHub] drill pull request #1141: DRILL-6197: Skip duplicate entry for OperatorStats

2018-03-01 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1141#discussion_r171485412
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentStats.java ---
@@ -31,6 +32,13 @@
 public class FragmentStats {
 //  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(FragmentStats.class);
 
+  //Skip operators that already have stats reported by 
org.apache.drill.exec.physical.impl.BaseRootExec
+  private static final List operatorStatsInitToSkip = 
Lists.newArrayList(
--- End diff --

The add-on commit refactors this by having the BaseRootExec constructor handle 
the substitution, without the risk of going out of sync with other senders 
that extend BaseRootExec.
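The replace-by-operator-id pattern itself is straightforward; a self-contained 
toy model (not Drill's actual implementation, all names illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for Drill's OperatorStats, keyed by operator id.
class StatsStub {
  final int operatorId;
  StatsStub(int operatorId) { this.operatorId = operatorId; }
}

class StatsRegistry {
  private final List<StatsStub> operators = new ArrayList<>();

  // Replaces an existing entry with the same operator id, or appends if
  // none exists; returns the replaced entry (or null).
  StatsStub addOrReplace(StatsStub stats) {
    for (int i = 0; i < operators.size(); i++) {
      if (operators.get(i).operatorId == stats.operatorId) {
        return operators.set(i, stats);
      }
    }
    operators.add(stats);
    return null;
  }
}
```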


---