[GitHub] drill pull request #1093: DRILL-6093 : Account for simple columns in project...

2018-01-16 Thread gparai
GitHub user gparai opened a pull request:

https://github.com/apache/drill/pull/1093

DRILL-6093 : Account for simple columns in project cpu costing

@amansinha100 can you please review this PR? Thanks!

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gparai/drill DRILL-6093-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1093.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1093


commit 30f1934e6b57aeaf33f72ac701bd431f2c11e403
Author: Gautam Parai 
Date:   2018-01-16T23:16:16Z

DRILL-6093 : Account for simple columns in project cpu costing




---


[GitHub] drill pull request #1057: DRILL-5993 Add Generic Copiers With Append Methods

2018-01-16 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1057#discussion_r161950855
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/AbstractSV4Copier.java
 ---
@@ -17,32 +17,30 @@
  */
 package org.apache.drill.exec.physical.impl.svremover;
 
-import javax.inject.Named;
-
 import org.apache.drill.common.types.TypeProtos.MajorType;
 import org.apache.drill.common.types.Types;
 import org.apache.drill.exec.exception.SchemaChangeException;
-import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 import org.apache.drill.exec.vector.AllocationHelper;
+import org.apache.drill.exec.vector.ValueVector;
 
-public abstract class CopierTemplate4 implements Copier{
-  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(CopierTemplate4.class);
+public abstract class AbstractSV4Copier implements Copier {
--- End diff --

The code for AbstractSV4Copier is nearly identical to that of 
AbstractSV2Copier; I wonder whether the two could be combined somehow (for 
example, through a common superclass).
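
A minimal sketch of what such a shared superclass could look like, using the template-method pattern. This is not the code in the PR: every class and method name below is hypothetical, the "vectors" are plain Java arrays so the example is self-contained, and the SV4 index encoding (batch index in the upper 16 bits, record index in the lower 16 bits) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: holds the copy loop that both copiers would share.
abstract class AbstractIndexedCopierSketch {
  protected final List<String> output = new ArrayList<>();

  // Shared loop, identical for both selection-vector flavours.
  public final int copyRecords(int index, int recordCount) {
    for (int i = 0; i < recordCount; i++) {
      copyEntry(index + i);
    }
    return recordCount;
  }

  public final List<String> output() {
    return output;
  }

  // The only step that differs between the two subclasses.
  protected abstract void copyEntry(int selectionIndex);
}

// Hypothetical SV2 flavour: two-byte indices into a single batch.
final class Sv2CopierSketch extends AbstractIndexedCopierSketch {
  private final char[] sv2;
  private final String[] batch;

  Sv2CopierSketch(char[] sv2, String[] batch) {
    this.sv2 = sv2;
    this.batch = batch;
  }

  @Override
  protected void copyEntry(int selectionIndex) {
    output.add(batch[sv2[selectionIndex]]);
  }
}

// Hypothetical SV4 flavour: four-byte composite indices across several batches.
final class Sv4CopierSketch extends AbstractIndexedCopierSketch {
  private final int[] sv4;
  private final String[][] batches;

  Sv4CopierSketch(int[] sv4, String[][] batches) {
    this.sv4 = sv4;
    this.batches = batches;
  }

  @Override
  protected void copyEntry(int selectionIndex) {
    final int composite = sv4[selectionIndex];
    // Assumed encoding: upper 16 bits = batch index, lower 16 bits = record index.
    output.add(batches[composite >>> 16][composite & 0xFFFF]);
  }
}
```

With this shape the subclasses only resolve a selection index to a source record; whether the real copiers can be factored this way depends on details not visible in this diff.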



---


[GitHub] drill issue #1057: DRILL-5993 Add Generic Copiers With Append Methods

2018-01-16 Thread ilooner
Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1057
  
Addressed review comments and rebased.


---


[GitHub] drill pull request #1057: DRILL-5993 Add Generic Copiers With Append Methods

2018-01-16 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1057#discussion_r161913565
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/svremover/GenericCopierTest.java
 ---
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.svremover;
+
+import org.apache.drill.exec.memory.RootAllocator;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.test.rowSet.RowSet;
+import org.apache.drill.test.rowSet.RowSetBuilder;
+
+public class GenericCopierTest extends AbstractGenericCopierTest {
--- End diff --

The code-generated copiers use the copyFromSafe methods. Your copyEntry 
method, which the GenericCopiers use, also calls copyFromSafe, so we should 
inherit support for all the same types. I think that if all the unit tests, 
functional tests, and QA tests pass, that will be a sufficient vote of 
confidence. Ideally we would have unit tests that use the complex data types 
for all the operators (not just the copiers), but a significant amount of work 
remains before we can effectively unit test with complex data types, and adding 
all the machinery to do so is out of the scope of this PR.


---


[jira] [Created] (DRILL-6093) Unneeded columns in Drill logical project

2018-01-16 Thread Gautam Kumar Parai (JIRA)
Gautam Kumar Parai created DRILL-6093:

 Summary: Unneeded columns in Drill logical project
 Key: DRILL-6093
 URL: https://issues.apache.org/jira/browse/DRILL-6093
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.12.0, 1.11.0
Reporter: Gautam Kumar Parai
Assignee: Gautam Kumar Parai
 Fix For: 1.12.0


Here is an example query with the corresponding logical plan. The project 
operators keep the unnecessary columns L_ORDERKEY and O_ORDERKEY even though 
they are not required by subsequent operators, e.g. DrillJoinRel, whose 
condition references only the cast expressions ($2 and $4).

EXPLAIN PLAN without implementation FOR SELECT L.L_QUANTITY FROM 
cp.`tpch/lineitem.parquet` L, cp.`tpch/orders.parquet` O WHERE 
cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int);

Logical plan (text column of the EXPLAIN output):

DrillScreenRel
  DrillProjectRel(L_QUANTITY=[$1])
    DrillJoinRel(condition=[=($2, $4)], joinType=[inner])
      DrillProjectRel(L_ORDERKEY=[$0], L_QUANTITY=[$1], $f2=[CAST($0):INTEGER])
        DrillScanRel(table=[[cp, tpch/lineitem.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]], selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`L_ORDERKEY`, `L_QUANTITY`]]])
      DrillProjectRel(O_ORDERKEY=[$0], $f1=[CAST($0):INTEGER])
        DrillScanRel(table=[[cp, tpch/orders.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`O_ORDERKEY`]]])



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6092) Support latest MapR release in format-maprdb storage plugin

2018-01-16 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-6092:

 Summary: Support latest MapR release in format-maprdb storage 
plugin
 Key: DRILL-6092
 URL: https://issues.apache.org/jira/browse/DRILL-6092
 Project: Apache Drill
  Issue Type: Bug
 Environment: The latest MapRDB release is 6.0. Apache Drill still builds the 
format-maprdb plugin against the 5.2 MapRDB libraries. We should update 
to the latest MapR release. Simply bumping the version in pom.xml does not work. 

Ideally we should allow users of Apache Drill to decide which version of the MapR 
platform to pick, and Drill should work with both the latest major release (6.0 or 6.x) 
and the previous major release (5.2.1, 5.2, or 5.x).

The same applies to other storage plugins: we should provide an easy way to 
configure which version of the underlying storage to connect to when building Drill.
Reporter: Chunhui Shi








[GitHub] drill pull request #1072: DRILL-5879: Improved SQL Pattern Contains Performa...

2018-01-16 Thread ppadma
Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/1072#discussion_r161895458
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java
 ---
@@ -19,44 +19,286 @@
 
 import io.netty.buffer.DrillBuf;
 
-public class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher {
+/** SQL Pattern Contains implementation */
+public final class SqlPatternContainsMatcher extends 
AbstractSqlPatternMatcher {
+  private final MatcherFcn matcherFcn;
 
   public SqlPatternContainsMatcher(String patternString) {
 super(patternString);
+
+// Pattern matching is 1) a CPU intensive operation and 2) pattern and 
input dependent. The conclusion is
+// that there is no single implementation that can do it all well. So, 
we use multiple implementations
+// chosen based on the pattern length.
+if (patternLength == 0) {
+  matcherFcn = new MatcherZero();
+} else if (patternLength == 1) {
+  matcherFcn = new MatcherOne();
+} else if (patternLength == 2) {
+  matcherFcn = new MatcherTwo();
+} else if (patternLength == 3) {
+  matcherFcn = new MatcherThree();
+} else if (patternLength < 10) {
+  matcherFcn = new MatcherN();
+} else {
+  matcherFcn = new BoyerMooreMatcher();
+}
   }
 
   @Override
   public int match(int start, int end, DrillBuf drillBuf) {
+return matcherFcn.match(start, end, drillBuf);
+  }
+
+  
//--
+  // Inner Data Structure
+  // 
--
+
+  /** Abstract matcher class to allow us pick the most efficient 
implementation */
+  private abstract class MatcherFcn {
+protected final byte[] patternArray;
+
+protected MatcherFcn() {
+  assert patternByteBuffer.hasArray();
+
+  patternArray = patternByteBuffer.array();
+}
+
+/**
+ * @return 1 if the pattern was matched; 0 otherwise
+ */
+protected abstract int match(int start, int end, DrillBuf drillBuf);
+  }
+
+  /** Handles patterns with length one */
+  private final class MatcherZero extends MatcherFcn {
 
-if (patternLength == 0) { // Everything should match for null pattern 
string
+private MatcherZero() {
+}
+
+/** {@inheritDoc} */
+@Override
+protected final int match(int start, int end, DrillBuf drillBuf) {
   return 1;
 }
+  }
+
+  /** Handles patterns with length one */
+  private final class MatcherOne extends MatcherFcn {
+
+private MatcherOne() {
+  super();
+}
+
+/** {@inheritDoc} */
+@Override
+protected final int match(int start, int end, DrillBuf drillBuf) {
+  final int lengthToProcess = end - start;
+  final byte firstPattByte  = patternArray[0];
--- End diff --

Can we name it firstPatternByte?


---


[GitHub] drill pull request #1072: DRILL-5879: Improved SQL Pattern Contains Performa...

2018-01-16 Thread ppadma
Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/1072#discussion_r161899357
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java
 ---
@@ -19,44 +19,286 @@
 
 import io.netty.buffer.DrillBuf;
 
-public class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher {
+/** SQL Pattern Contains implementation */
+public final class SqlPatternContainsMatcher extends 
AbstractSqlPatternMatcher {
+  private final MatcherFcn matcherFcn;
 
   public SqlPatternContainsMatcher(String patternString) {
 super(patternString);
+
+// Pattern matching is 1) a CPU intensive operation and 2) pattern and 
input dependent. The conclusion is
+// that there is no single implementation that can do it all well. So, 
we use multiple implementations
+// chosen based on the pattern length.
+if (patternLength == 0) {
+  matcherFcn = new MatcherZero();
+} else if (patternLength == 1) {
+  matcherFcn = new MatcherOne();
+} else if (patternLength == 2) {
+  matcherFcn = new MatcherTwo();
+} else if (patternLength == 3) {
+  matcherFcn = new MatcherThree();
+} else if (patternLength < 10) {
+  matcherFcn = new MatcherN();
+} else {
+  matcherFcn = new BoyerMooreMatcher();
+}
   }
 
   @Override
   public int match(int start, int end, DrillBuf drillBuf) {
+return matcherFcn.match(start, end, drillBuf);
+  }
+
+  
//--
+  // Inner Data Structure
+  // 
--
+
+  /** Abstract matcher class to allow us pick the most efficient 
implementation */
+  private abstract class MatcherFcn {
+protected final byte[] patternArray;
+
+protected MatcherFcn() {
+  assert patternByteBuffer.hasArray();
+
+  patternArray = patternByteBuffer.array();
+}
+
+/**
+ * @return 1 if the pattern was matched; 0 otherwise
+ */
+protected abstract int match(int start, int end, DrillBuf drillBuf);
+  }
+
+  /** Handles patterns with length one */
+  private final class MatcherZero extends MatcherFcn {
 
-if (patternLength == 0) { // Everything should match for null pattern 
string
+private MatcherZero() {
+}
+
+/** {@inheritDoc} */
+@Override
+protected final int match(int start, int end, DrillBuf drillBuf) {
   return 1;
 }
+  }
+
+  /** Handles patterns with length one */
+  private final class MatcherOne extends MatcherFcn {
+
+private MatcherOne() {
+  super();
--- End diff --

Is this explicit super() call redundant?


---


[GitHub] drill pull request #1072: DRILL-5879: Improved SQL Pattern Contains Performa...

2018-01-16 Thread ppadma
Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/1072#discussion_r161906070
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java
 ---
@@ -19,44 +19,286 @@
 
 import io.netty.buffer.DrillBuf;
 
-public class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher {
+/** SQL Pattern Contains implementation */
+public final class SqlPatternContainsMatcher extends 
AbstractSqlPatternMatcher {
+  private final MatcherFcn matcherFcn;
 
   public SqlPatternContainsMatcher(String patternString) {
 super(patternString);
+
+// Pattern matching is 1) a CPU intensive operation and 2) pattern and 
input dependent. The conclusion is
+// that there is no single implementation that can do it all well. So, 
we use multiple implementations
+// chosen based on the pattern length.
+if (patternLength == 0) {
+  matcherFcn = new MatcherZero();
+} else if (patternLength == 1) {
+  matcherFcn = new MatcherOne();
+} else if (patternLength == 2) {
+  matcherFcn = new MatcherTwo();
+} else if (patternLength == 3) {
+  matcherFcn = new MatcherThree();
+} else if (patternLength < 10) {
+  matcherFcn = new MatcherN();
+} else {
+  matcherFcn = new BoyerMooreMatcher();
+}
   }
 
   @Override
   public int match(int start, int end, DrillBuf drillBuf) {
+return matcherFcn.match(start, end, drillBuf);
+  }
+
+  
//--
+  // Inner Data Structure
+  // 
--
+
+  /** Abstract matcher class to allow us pick the most efficient 
implementation */
+  private abstract class MatcherFcn {
+protected final byte[] patternArray;
+
+protected MatcherFcn() {
+  assert patternByteBuffer.hasArray();
+
+  patternArray = patternByteBuffer.array();
+}
+
+/**
+ * @return 1 if the pattern was matched; 0 otherwise
+ */
+protected abstract int match(int start, int end, DrillBuf drillBuf);
+  }
+
+  /** Handles patterns with length one */
+  private final class MatcherZero extends MatcherFcn {
 
-if (patternLength == 0) { // Everything should match for null pattern 
string
+private MatcherZero() {
+}
+
+/** {@inheritDoc} */
+@Override
+protected final int match(int start, int end, DrillBuf drillBuf) {
   return 1;
 }
+  }
+
+  /** Handles patterns with length one */
+  private final class MatcherOne extends MatcherFcn {
+
+private MatcherOne() {
+  super();
+}
+
+/** {@inheritDoc} */
+@Override
+protected final int match(int start, int end, DrillBuf drillBuf) {
+  final int lengthToProcess = end - start;
+  final byte firstPattByte  = patternArray[0];
+
+  // simplePattern string has meta characters i.e % and _ and escape 
characters removed.
+  // so, we can just directly compare.
+  for (int idx = 0; idx < lengthToProcess; idx++) {
+byte inputByte = drillBuf.getByte(start + idx);
+
+if (firstPattByte != inputByte) {
+  continue;
+}
+return 1;
+  }
+  return 0;
+}
+  }
+
+  /** Handles patterns with length two */
+  private final class MatcherTwo extends MatcherFcn {
+
+private MatcherTwo() {
+}
+
+/** {@inheritDoc} */
+@Override
+protected final int match(int start, int end, DrillBuf drillBuf) {
+  final int lengthToProcess = end - start - 1;
+  final byte firstPattByte  = patternArray[0];
+  final byte secondPattByte = patternArray[1];
+
+  // simplePattern string has meta characters i.e % and _ and escape 
characters removed.
+  // so, we can just directly compare.
+  for (int idx = 0; idx < lengthToProcess; idx++) {
+final byte firstInByte = drillBuf.getByte(start + idx);
 
-final int txtLength = end - start;
+if (firstPattByte != firstInByte) {
+  continue;
+} else {
+  final byte secondInByte = drillBuf.getByte(start + idx +1);
--- End diff --

Add a space between + and 1.


---


[GitHub] drill pull request #1072: DRILL-5879: Improved SQL Pattern Contains Performa...

2018-01-16 Thread ppadma
Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/1072#discussion_r161895158
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java
 ---
@@ -19,44 +19,286 @@
 
 import io.netty.buffer.DrillBuf;
 
-public class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher {
+/** SQL Pattern Contains implementation */
+public final class SqlPatternContainsMatcher extends 
AbstractSqlPatternMatcher {
+  private final MatcherFcn matcherFcn;
 
   public SqlPatternContainsMatcher(String patternString) {
 super(patternString);
+
+// Pattern matching is 1) a CPU intensive operation and 2) pattern and 
input dependent. The conclusion is
+// that there is no single implementation that can do it all well. So, 
we use multiple implementations
+// chosen based on the pattern length.
+if (patternLength == 0) {
+  matcherFcn = new MatcherZero();
+} else if (patternLength == 1) {
+  matcherFcn = new MatcherOne();
+} else if (patternLength == 2) {
+  matcherFcn = new MatcherTwo();
+} else if (patternLength == 3) {
+  matcherFcn = new MatcherThree();
+} else if (patternLength < 10) {
+  matcherFcn = new MatcherN();
+} else {
+  matcherFcn = new BoyerMooreMatcher();
+}
   }
 
   @Override
   public int match(int start, int end, DrillBuf drillBuf) {
+return matcherFcn.match(start, end, drillBuf);
+  }
+
+  
//--
+  // Inner Data Structure
+  // 
--
+
+  /** Abstract matcher class to allow us pick the most efficient 
implementation */
+  private abstract class MatcherFcn {
+protected final byte[] patternArray;
+
+protected MatcherFcn() {
+  assert patternByteBuffer.hasArray();
+
+  patternArray = patternByteBuffer.array();
+}
+
+/**
+ * @return 1 if the pattern was matched; 0 otherwise
+ */
+protected abstract int match(int start, int end, DrillBuf drillBuf);
+  }
+
+  /** Handles patterns with length one */
--- End diff --

This comment should say length zero.


---


[GitHub] drill pull request #1072: DRILL-5879: Improved SQL Pattern Contains Performa...

2018-01-16 Thread ppadma
Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/1072#discussion_r161905865
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java
 ---
@@ -19,44 +19,286 @@
 
 import io.netty.buffer.DrillBuf;
 
-public class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher {
+/** SQL Pattern Contains implementation */
+public final class SqlPatternContainsMatcher extends 
AbstractSqlPatternMatcher {
+  private final MatcherFcn matcherFcn;
 
   public SqlPatternContainsMatcher(String patternString) {
 super(patternString);
+
+// Pattern matching is 1) a CPU intensive operation and 2) pattern and 
input dependent. The conclusion is
+// that there is no single implementation that can do it all well. So, 
we use multiple implementations
+// chosen based on the pattern length.
+if (patternLength == 0) {
+  matcherFcn = new MatcherZero();
+} else if (patternLength == 1) {
+  matcherFcn = new MatcherOne();
+} else if (patternLength == 2) {
+  matcherFcn = new MatcherTwo();
+} else if (patternLength == 3) {
+  matcherFcn = new MatcherThree();
+} else if (patternLength < 10) {
+  matcherFcn = new MatcherN();
+} else {
+  matcherFcn = new BoyerMooreMatcher();
+}
   }
 
   @Override
   public int match(int start, int end, DrillBuf drillBuf) {
+return matcherFcn.match(start, end, drillBuf);
+  }
+
+  
//--
+  // Inner Data Structure
+  // 
--
+
+  /** Abstract matcher class to allow us pick the most efficient 
implementation */
+  private abstract class MatcherFcn {
+protected final byte[] patternArray;
+
+protected MatcherFcn() {
+  assert patternByteBuffer.hasArray();
+
+  patternArray = patternByteBuffer.array();
+}
+
+/**
+ * @return 1 if the pattern was matched; 0 otherwise
+ */
+protected abstract int match(int start, int end, DrillBuf drillBuf);
+  }
+
+  /** Handles patterns with length one */
+  private final class MatcherZero extends MatcherFcn {
 
-if (patternLength == 0) { // Everything should match for null pattern 
string
+private MatcherZero() {
+}
+
+/** {@inheritDoc} */
+@Override
+protected final int match(int start, int end, DrillBuf drillBuf) {
   return 1;
 }
+  }
+
+  /** Handles patterns with length one */
+  private final class MatcherOne extends MatcherFcn {
+
+private MatcherOne() {
+  super();
+}
+
+/** {@inheritDoc} */
+@Override
+protected final int match(int start, int end, DrillBuf drillBuf) {
+  final int lengthToProcess = end - start;
+  final byte firstPattByte  = patternArray[0];
+
+  // simplePattern string has meta characters i.e % and _ and escape 
characters removed.
+  // so, we can just directly compare.
+  for (int idx = 0; idx < lengthToProcess; idx++) {
+byte inputByte = drillBuf.getByte(start + idx);
+
+if (firstPattByte != inputByte) {
+  continue;
+}
+return 1;
+  }
+  return 0;
+}
+  }
+
+  /** Handles patterns with length two */
+  private final class MatcherTwo extends MatcherFcn {
+
+private MatcherTwo() {
+}
+
+/** {@inheritDoc} */
+@Override
+protected final int match(int start, int end, DrillBuf drillBuf) {
+  final int lengthToProcess = end - start - 1;
+  final byte firstPattByte  = patternArray[0];
+  final byte secondPattByte = patternArray[1];
--- End diff --

Can you initialize them in the matcher's constructor? That way they are 
initialized once and reused for each match.
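
A minimal sketch of that suggestion, under stated assumptions: the class name is hypothetical, the input is a plain byte[] instead of a DrillBuf so the example is self-contained, and the pattern bytes become final fields set once in the constructor rather than locals re-read on every match() call.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified two-byte "contains" matcher (not the PR's class).
final class TwoByteContainsSketch {
  // Read once at construction time and reused by every match() call.
  private final byte firstPatternByte;
  private final byte secondPatternByte;

  TwoByteContainsSketch(String pattern) {
    final byte[] patternArray = pattern.getBytes(StandardCharsets.UTF_8);
    if (patternArray.length != 2) {
      throw new IllegalArgumentException("pattern must encode to exactly two bytes");
    }
    firstPatternByte = patternArray[0];
    secondPatternByte = patternArray[1];
  }

  /** @return 1 if the pattern occurs in input[start, end); 0 otherwise */
  int match(int start, int end, byte[] input) {
    // The last byte cannot start a two-byte match, hence the "- 1".
    final int lengthToProcess = end - start - 1;
    for (int idx = 0; idx < lengthToProcess; idx++) {
      if (input[start + idx] == firstPatternByte
          && input[start + idx + 1] == secondPatternByte) {
        return 1;
      }
    }
    return 0;
  }
}
```

Whether this is measurably faster than reading patternArray inside match() is not something the thread establishes; the JIT may already hoist such loads.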


---


[GitHub] drill pull request #1072: DRILL-5879: Improved SQL Pattern Contains Performa...

2018-01-16 Thread ppadma
Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/1072#discussion_r161899883
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java
 ---
@@ -19,44 +19,286 @@
 
 import io.netty.buffer.DrillBuf;
 
-public class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher {
+/** SQL Pattern Contains implementation */
+public final class SqlPatternContainsMatcher extends 
AbstractSqlPatternMatcher {
+  private final MatcherFcn matcherFcn;
 
   public SqlPatternContainsMatcher(String patternString) {
 super(patternString);
+
+// Pattern matching is 1) a CPU intensive operation and 2) pattern and 
input dependent. The conclusion is
+// that there is no single implementation that can do it all well. So, 
we use multiple implementations
+// chosen based on the pattern length.
+if (patternLength == 0) {
+  matcherFcn = new MatcherZero();
+} else if (patternLength == 1) {
+  matcherFcn = new MatcherOne();
+} else if (patternLength == 2) {
+  matcherFcn = new MatcherTwo();
+} else if (patternLength == 3) {
+  matcherFcn = new MatcherThree();
+} else if (patternLength < 10) {
+  matcherFcn = new MatcherN();
+} else {
+  matcherFcn = new BoyerMooreMatcher();
+}
   }
 
   @Override
   public int match(int start, int end, DrillBuf drillBuf) {
+return matcherFcn.match(start, end, drillBuf);
+  }
+
+  
//--
+  // Inner Data Structure
+  // 
--
+
+  /** Abstract matcher class to allow us pick the most efficient 
implementation */
+  private abstract class MatcherFcn {
+protected final byte[] patternArray;
+
+protected MatcherFcn() {
+  assert patternByteBuffer.hasArray();
+
+  patternArray = patternByteBuffer.array();
+}
+
+/**
+ * @return 1 if the pattern was matched; 0 otherwise
+ */
+protected abstract int match(int start, int end, DrillBuf drillBuf);
+  }
+
+  /** Handles patterns with length one */
+  private final class MatcherZero extends MatcherFcn {
 
-if (patternLength == 0) { // Everything should match for null pattern 
string
+private MatcherZero() {
+}
+
+/** {@inheritDoc} */
+@Override
+protected final int match(int start, int end, DrillBuf drillBuf) {
   return 1;
 }
+  }
+
+  /** Handles patterns with length one */
+  private final class MatcherOne extends MatcherFcn {
+
+private MatcherOne() {
+  super();
+}
+
+/** {@inheritDoc} */
+@Override
+protected final int match(int start, int end, DrillBuf drillBuf) {
+  final int lengthToProcess = end - start;
+  final byte firstPattByte  = patternArray[0];
+
+  // simplePattern string has meta characters i.e % and _ and escape 
characters removed.
+  // so, we can just directly compare.
+  for (int idx = 0; idx < lengthToProcess; idx++) {
+byte inputByte = drillBuf.getByte(start + idx);
+
+if (firstPattByte != inputByte) {
+  continue;
+}
+return 1;
+  }
+  return 0;
+}
+  }
+
+  /** Handles patterns with length two */
+  private final class MatcherTwo extends MatcherFcn {
+
+private MatcherTwo() {
+}
+
+/** {@inheritDoc} */
+@Override
+protected final int match(int start, int end, DrillBuf drillBuf) {
+  final int lengthToProcess = end - start - 1;
+  final byte firstPattByte  = patternArray[0];
+  final byte secondPattByte = patternArray[1];
+
+  // simplePattern string has meta characters i.e % and _ and escape 
characters removed.
+  // so, we can just directly compare.
+  for (int idx = 0; idx < lengthToProcess; idx++) {
+final byte firstInByte = drillBuf.getByte(start + idx);
 
-final int txtLength = end - start;
+if (firstPattByte != firstInByte) {
+  continue;
+} else {
+  final byte secondInByte = drillBuf.getByte(start + idx +1);
 
-// no match if input string length is less than pattern length
-if (txtLength < patternLength) {
+  if (secondInByte == secondPattByte) {
+return 1;
+  }
+}
+  }
   return 0;
 }
+  }
 
+  /** Handles patterns with length three */
+  private final class MatcherThree extends 

[GitHub] drill issue #1059: DRILL-5851: Empty table during a join operation with a no...

2018-01-16 Thread HanumathRao
Github user HanumathRao commented on the issue:

https://github.com/apache/drill/pull/1059
  
@vdiravka  Thank you for the review comments.

I have done the needed changes.


---


Re: Odd Error on Login with 1.13-SNAPSHOT

2018-01-16 Thread John Omernik
Thank you, I will try at some point. I am struggling with another issue not
related to Drill right now. Thanks for doing this! The Drill team is
awesome!

John

On Mon, Jan 15, 2018 at 1:27 PM, Sorabh Hamirwasia 
wrote:

> I have opened a PR for review.
> Would be great if you can pull that in and try out, unfortunately just
> adding the new config will not help 
>
>
> Thanks,
> Sorabh
>
>
> 
> From: Sorabh Hamirwasia
> Sent: Sunday, January 14, 2018 9:36 PM
> To: dev@drill.apache.org
> Subject: Re: Odd Error on Login with 1.13-SNAPSHOT
>
>
> I have opened DRILL-6088
> for this issue and will provide a patch by tonight for review. For now to
> unblock you, please add following http configuration in your
> drill-override.conf.
>
>
> drill.exec.http.auth.mechanisms: ["FORM"],
>
>
> I am assuming other user auth related setting to enable PLAIN
> authentication is in place as per [1] or [2].
>
>
> [1]: https://drill.apache.org/docs/using-libpam4j-as-the-pam-authenticator/
>
> [2]: https://drill.apache.org/docs/using-jpam-as-the-pam-authenticator/
>
>
> Thanks,
> Sorabh
>
> 
> From: Vitalii Diravka 
> Sent: Friday, January 12, 2018 2:12 PM
> To: dev@drill.apache.org
> Subject: Re: Odd Error on Login with 1.13-SNAPSHOT
>
> Hi John,
>
> Probably this Jira can help you [1].
> You should verify your drill-override.conf file. Do you want to use any
> security mechanism?
>
> [1] https://issues.apache.org/jira/browse/DRILL-5425
>
> Kind regards
> Vitalii
>
> On Thu, Jan 11, 2018 at 6:52 PM, John Omernik  wrote:
>
> > I am probably missing something minor here, but I am working with Ted
> > Dunning on some PCAP plugin stuff, so I built his 1.13 SNAPSHOT, and
> when I
> > try to login I see
> >
> > {
> >   "errorMessage" : "No configuration setting found for key
> > 'drill.exec.http.auth'"
> > }
> >
> >
> >
> > I am guessing that something was added that I need to fill out in my
> > config? Is there a JIRA or something that can guide me on this?
> >
> >
> > Thanks
> >
>


[GitHub] drill pull request #1058: DRILL-6002: Avoid memory copy from direct buffer t...

2018-01-16 Thread vrozov
Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1058#discussion_r161842736
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/SpillSet.java
 ---
@@ -104,9 +107,17 @@
  * nodes provide insufficient local disk space)
  */
 
+// The buffer size is calculated as LCM of the Hadoop internal 
checksum buffer (9 * checksum length), where
--- End diff --

@paul-rogers Changed `TRANSFER_SIZE` to 72K. It is calculated as the LCM of 
the Hadoop internal checksum buffer size and the MapR FS page size, so that 
writes are aligned on internal buffer boundaries.
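
As a rough check of that arithmetic: the diff comment gives the Hadoop checksum buffer as 9 * checksum length, but the concrete numbers below (a 512-byte checksum length and an 8 KiB MapR FS page) are assumptions, not values stated in this thread. Under those assumptions the LCM does come out at 72K. The class name is hypothetical.

```java
// Hypothetical helper that reproduces the 72K figure under assumed inputs.
final class TransferSizeSketch {
  static long gcd(long a, long b) {
    while (b != 0) {
      final long t = a % b;
      a = b;
      b = t;
    }
    return a;
  }

  static long lcm(long a, long b) {
    // Divide before multiplying to keep intermediate values small.
    return a / gcd(a, b) * b;
  }

  public static void main(String[] args) {
    final long checksumBuffer = 9L * 512; // assumed: 9 * 512-byte checksum length
    final long pageSize = 8L * 1024;      // assumed: 8 KiB MapR FS page
    // Prints 73728, which is 72 * 1024.
    System.out.println(lcm(checksumBuffer, pageSize));
  }
}
```

A buffer of that size is a whole multiple of both units (16 checksum buffers, 9 pages), so a full-buffer write ends on a boundary of each.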


---


[GitHub] drill issue #1091: DRILL-6071: Limit batch size for flatten operator

2018-01-16 Thread ppadma
Github user ppadma commented on the issue:

https://github.com/apache/drill/pull/1091
  
@paul-rogers  Paul, can you please review this PR ? 


---


[GitHub] drill issue #1066: DRILL-3993: Changes to support Calcite 1.15

2018-01-16 Thread chunhui-shi
Github user chunhui-shi commented on the issue:

https://github.com/apache/drill/pull/1066
  
+1. Thank you for addressing the comments.


---


[jira] [Created] (DRILL-6091) Continued Validation Error - Replacing Tables

2018-01-16 Thread Brandi Spinn (JIRA)
Brandi Spinn created DRILL-6091:

 Summary: Continued Validation Error - Replacing Tables
 Key: DRILL-6091
 URL: https://issues.apache.org/jira/browse/DRILL-6091
 Project: Apache Drill
  Issue Type: Bug
Reporter: Brandi Spinn


We are currently running a project that uses the Drill push to Tableau 
function to work with our data sets. We are already working with 
Tableau regarding our needs and determined that this is our best course of 
action considering how large the data sets are: over 2 million rows per day.

At the moment we have several published visualizations, but we run into an 
issue each morning where some of them do not update according to our 
schedules. When we review the logs we find the same type of "fatal" errors, 
which do not always allow the visualizations to update. We also 
continually run into "drill server down" errors.

I have reached out to our Tableau account rep for possible guidance; 
however, they have indicated that it is not an issue on the Tableau side, 
which we expected.

Below is a sample of the errors we are seeing. Please let me know if you are 
able to assist or if you need any additional information. Thank you!

 
|_Auto_AOD_Content:288 - create table dfs.tmp.attributes as select distinct 
channels.marketingname as channelmarketingname, channels.streamingname as 
channelstreamingname, channels.channelguid as channelGuid, channels.channelid 
as channelId, categories.category_name as channelcategory, CASE when 
music.channel_guid is not null then 'Music' else null end as genre_Music, CASE 
when news.channel_guid is not null then 'News' else null end as genre_News, 
CASE when sports.channel_guid is not null then 'Sports' else null end as 
genre_Sports,  CASE when talk.channel_guid is not null then 'Talk' else null 
end as genre_Talk, CASE when howard.channel_guid is not null then 'Howard' else 
null end as genre_Howard, categories.channel_name channelName from 
dfs.root.`/SXM/archive/parsed/Channel-parsed-type2/2017-12-14*` channels join 
dfs.root.`/SXM/archive/parsed/category-parsed-type2/2017-12-14*` categories on 
categories.channel_guid=channels.channelguid left join (select channel_guid 
from dfs.root.`/SXM/archive/parsed/category-parsed-type2/2017-12-14*` where 
supercategory_name = 'Music') music on channels.channelguid = 
music.channel_guid left join (select channel_guid from 
dfs.root.`/SXM/archive/parsed/category-parsed-type2/2017-12-14*` where 
supercategory_name = 'News') news on channels.channelguid = news.channel_guid 
left join (select channel_guid from 
dfs.root.`/SXM/archive/parsed/category-parsed-type2/2017-12-14*` where 
supercategory_name = 'Sports') sports on channels.channelguid = 
sports.channel_guid left join (select channel_guid from 
dfs.root.`/SXM/archive/parsed/category-parsed-type2/2017-12-14*` where 
supercategory_name = 'Talk') talk on channels.channelguid = talk.channel_guid 
left join (select channel_guid from 
dfs.root.`/SXM/archive/parsed/category-parsed-type2/2017-12-14*` where 
supercategory_name = 'Howard') howard on channels.channelguid = 
howard.channel_guid_|
|_Auto_AOD_Content:304 - There was a SQL ERROR on DRILL side_|
|_Auto_AOD_Content:305 - VALIDATION ERROR: A table or view with given name 
[attributes] already exists in schema [dfs.tmp]_|
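
The validation error above indicates that the CREATE TABLE AS statement ran while dfs.tmp.attributes still existed from a previous run. One possible way to replace the table on each run is to drop it first. The following is only a sketch under assumptions: the class and method names are hypothetical, the Connection is assumed to be an already-open Drill JDBC connection, and it assumes the Drill version in use accepts DROP TABLE IF EXISTS.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper; not part of Drill or of the job that produced the log above.
final class ReplaceTableSketch {

  // Builds the statement that removes the table if a previous run left it behind.
  static String dropIfExists(String table) {
    return "DROP TABLE IF EXISTS " + table;
  }

  // Drops the table (if present) and then recreates it from the given SELECT.
  // `conn` is assumed to be an open JDBC connection to Drill.
  static void replaceTable(Connection conn, String table, String selectSql)
      throws SQLException {
    try (Statement stmt = conn.createStatement()) {
      stmt.execute(dropIfExists(table));
      stmt.execute("CREATE TABLE " + table + " AS " + selectSql);
    }
  }

  public static void main(String[] args) {
    // Only prints the drop statement; no connection is opened here.
    System.out.println(dropIfExists("dfs.tmp.attributes"));
  }
}
```

Note that the drop and the create are two separate statements, so another job creating the same table in between could still trigger the same error.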

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] drill pull request #1066: DRILL-3993: Changes to support Calcite 1.15

2018-01-16 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1066#discussion_r161796052
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
 ---
@@ -1303,6 +1305,8 @@ private void checkGroupAndAggrValues(int 
incomingRowIdx) {
   long memDiff = allocator.getAllocatedMemory() - allocatedBeforeHTput;
   if ( memDiff > 0 ) { logger.warn("Leak: HashTable put() OOM left 
behind {} bytes allocated",memDiff); }
 
+  checkForSpillPossibility(currentPartition);
--- End diff --

These checks were needed to avoid an infinite loop when there is not enough 
memory for the spill. 
I moved these checks into the `spillIfNeeded()` method, so when `doSpill()` is 
called, `forceSpill` in `spillIfNeeded()` is true and the check should be performed.


---


[jira] [Created] (DRILL-6090) While connecting to drill-bits using JDBC Driver through Zookeeper, a lot of "Curator-Framework-0" threads are created if connection to drill-bit is not successful(no dri

2018-01-16 Thread Milind Takawale (JIRA)
Milind Takawale created DRILL-6090:
--

 Summary: While connecting to drill-bits using JDBC Driver through 
Zookeeper, a lot of "Curator-Framework-0" threads are created if connection to 
drill-bit is not successful(no drill-bits are up/reachable)
 Key: DRILL-6090
 URL: https://issues.apache.org/jira/browse/DRILL-6090
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - JDBC
Affects Versions: 1.12.0
 Environment: Centos 65, Java 7, Drill JDBC 1.12.0
Reporter: Milind Takawale
 Fix For: 1.12.0


I am using Drill JDBC driver 1.12.0 to connect to MapR-DB. I am finding the 
available drill-bits using ZooKeeper. When drill-bits are not up or not 
reachable, the connection fails with the exception: "Failure in connecting to 
Drill: oadd.org.apache.drill.exec.rpc.RpcException: Failure setting up ZK for 
client", which is expected, but the number of threads created by 
ZKClusterCoordinator just keeps on increasing.

Steps to reproduce the issue
 # Set up a connection with a drill-bit using Apache Drill JDBC driver 1.12.0 
through Zookeeper hosts (port 5181)
 # Now stop the drill-bit services or block the drill-bit IPs using iptables 
rules
 # Truncate catalina logs
 # Try to connect to the drill-bit/hit a code path that requires connection to 
drill-bits.
 # Take a thread dump using kill -QUIT 
 # grep -c "Curator-Framework-0" catalina.out

Observe that the Curator framework threads just keep on accumulating
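
As an alternative to grepping a thread dump, the same count can be taken inside the JVM that hosts the JDBC client. This is a minimal sketch; the class name ThreadNameCounter is hypothetical, and it relies only on the standard Thread.getAllStackTraces() call.

```java
// Hypothetical helper for counting leaked threads by name; not part of Drill.
final class ThreadNameCounter {

  // Counts live threads in this JVM whose name starts with the given prefix.
  static long countLiveThreads(String namePrefix) {
    return Thread.getAllStackTraces().keySet().stream()
        .filter(t -> t.getName().startsWith(namePrefix))
        .count();
  }

  public static void main(String[] args) {
    // Thread name taken from the report above.
    System.out.println(countLiveThreads("Curator-Framework-0"));
  }
}
```

Calling countLiveThreads before and after a batch of failed connection attempts would show whether the number keeps growing.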

RCA:
 # ZKClusterCoordinator creates curator threads in the constructor
 # ZKClusterCoordinator is instantiated by DrillClient.connect
 # DrillClient.connect is called in DrillConnectionImpl constructor

Fix:

Call DrillConnectionImpl.cleanup() from all the catch blocks in the 
DrillConnectionImpl constructor.
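
The proposed fix can be sketched as follows. This is an illustration only, not the Drill code: FakeCoordinator and FakeConnection are hypothetical stand-ins for ZKClusterCoordinator and DrillConnectionImpl, and the failure is simulated with a flag.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a coordinator that starts a thread in its constructor.
final class FakeCoordinator implements AutoCloseable {
  static final AtomicInteger LIVE = new AtomicInteger();
  private final Thread worker;

  FakeCoordinator() {
    LIVE.incrementAndGet();
    worker = new Thread(() -> {
      try {
        Thread.sleep(Long.MAX_VALUE);
      } catch (InterruptedException e) {
        // Interrupted by close(): let the thread exit.
      }
    }, "fake-coordinator-thread");
    worker.setDaemon(true);
    worker.start();
  }

  @Override
  public void close() {
    worker.interrupt();
    try {
      worker.join(1000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    LIVE.decrementAndGet();
  }
}

// Hypothetical stand-in for a connection whose constructor can fail half-way.
final class FakeConnection {
  private FakeCoordinator coordinator;

  FakeConnection(boolean drillbitsReachable) throws Exception {
    try {
      coordinator = new FakeCoordinator(); // resource acquired before the failure point
      if (!drillbitsReachable) {
        throw new Exception("Failure setting up ZK for client");
      }
    } catch (Exception e) {
      cleanup(); // release what the half-built object already holds
      throw e;
    }
  }

  void cleanup() {
    if (coordinator != null) {
      coordinator.close();
      coordinator = null;
    }
  }
}

final class CleanupSketch {
  // Runs the given number of failing connection attempts and returns
  // how many coordinators are still open afterwards.
  static int leakedAfterFailures(int attempts) {
    for (int i = 0; i < attempts; i++) {
      try {
        new FakeConnection(false);
      } catch (Exception expected) {
        // The failure itself is expected; only the leak matters here.
      }
    }
    return FakeCoordinator.LIVE.get();
  }

  public static void main(String[] args) {
    System.out.println("leaked coordinators: " + leakedAfterFailures(5));
  }
}
```

Without the cleanup() call in the catch block, each failed attempt in this sketch would leave one coordinator thread running, because the caller never receives a reference to the half-built object and so cannot close it.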

 



 

 


