[GitHub] drill pull request #1045: DRILL-5730 Test Mocking Improvements

2018-01-17 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1045#discussion_r162250517
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/TestAffinityCalculator.java
 ---
@@ -21,18 +21,14 @@
 
 import org.apache.drill.exec.ExecTest;
 import org.apache.drill.exec.proto.CoordinationProtos;
-import org.apache.drill.exec.store.parquet.ParquetGroupScan;
 import org.apache.hadoop.fs.BlockLocation;
 import org.junit.Test;
 
 import com.google.common.collect.ImmutableRangeMap;
 import com.google.common.collect.Range;
 
 public class TestAffinityCalculator extends ExecTest {
-  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(TestAffinityCalculator.class);
-
-  String port = "1234";
-  final String path = "path";
+  private final String port = "1234";
--- End diff --

What happened to the blocks of code that were removed? Not used? Duplicate? 
Or, does the unused code suggest that this test is not actually testing what it 
should?


---


[GitHub] drill pull request #1045: DRILL-5730 Test Mocking Improvements

2018-01-17 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1045#discussion_r162244374
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/WriterRecordBatch.java
 ---
@@ -118,7 +118,7 @@ public IterOutcome innerNext() {
 } catch(IOException ex) {
   logger.error("Failure during query", ex);
   kill(false);
-  context.fail(ex);
+  context.getExecutorState().fail(ex);
--- End diff --

At some point, you may want to lead a cleanup of Drill's failure reporting 
design. I took a crack at it a while back. We have multiple ways of reporting 
errors:

* Throw a UserException explaining the error
* Throw an unchecked exception and let the fragment executor guess the 
cause.
* Return STOP
* Tell the fragment executor to fail. (Are we also required to return STOP?)
* Return OUT_OF_MEMORY status

The proposal is to replace them all with a single solution: throw a 
UserException. Each operator catches other exceptions and translates them to 
UserException.

Java unwinds the stack just fine; no reason for us to write code to do it 
via STOP.

Then, the Fragment Executor calls close() on all operators. No reason to 
try to do this cleanup on STOP. (Even if we do, the lower-level operators won't 
have seen the STOP.)

Since failures are hard to test, and have caused no end of problems, having 
multiple ways to do the same thing is really not that helpful to Drill users.
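The single-path proposal can be sketched in a few lines. This is a hedged illustration: `UserException` here is a simplified stand-in class, not Drill's actual `UserException` builder API, and the method names are invented.

```java
// Sketch of the proposal: an operator translates any checked failure into a
// single unchecked exception and lets Java unwind the stack; the fragment
// executor is the one place that catches it and closes all operators.
class SingleErrorPathSketch {

  // Simplified stand-in for Drill's UserException.
  static class UserException extends RuntimeException {
    UserException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  // Operator code: no STOP status, no context.fail() -- just throw.
  static void operatorNext() {
    try {
      throw new java.io.IOException("simulated disk failure");
    } catch (java.io.IOException e) {
      throw new UserException("Failure during query", e);
    }
  }

  // Fragment-executor code: the single catch point, followed by cleanup.
  static String runFragment() {
    try {
      operatorNext();
      return "ok";
    } catch (UserException e) {
      // ... call close() on all operators here ...
      return "failed: " + e.getMessage();
    }
  }
}
```

With one catch point, cleanup happens exactly once, and operators never need to propagate a STOP status by hand.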


---


[GitHub] drill pull request #1045: DRILL-5730 Test Mocking Improvements

2018-01-17 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1045#discussion_r162229362
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveUtilities.java
 ---
@@ -288,7 +288,7 @@ public static void populateVector(final ValueVector 
vector, final DrillBuf manag
 }
   }
 
-  public static MajorType getMajorTypeFromHiveTypeInfo(final TypeInfo 
typeInfo, final OptionManager options) {
+  public static MajorType getMajorTypeFromHiveTypeInfo(final TypeInfo 
typeInfo, final OptionSet options) {
--- End diff --

Note that DRILL-6049 removes OptionSet in favor of OptionManager. (Thanks 
to the work you and Jyothsna did, we can now use that class in tests.)


---


[GitHub] drill pull request #1045: DRILL-5730 Test Mocking Improvements

2018-01-17 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1045#discussion_r162251030
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/OperatorFixture.java ---
@@ -86,37 +117,35 @@
  * Multiple threads of execution.
  * 
  */
-
 public class OperatorFixture extends BaseFixture implements AutoCloseable {
 
+  public OperatorContext operatorContext(PhysicalOperator config) {
+return new MockOperatorContext(context, allocator(), config);
+  }
+
   /**
* Builds an operator fixture based on a set of config options and 
system/session
* options.
*/
-
-  public static class OperatorFixtureBuilder
+  public static class Builder
--- End diff --

Your PR is accidentally undoing a name change from a prior commit. Please 
check if we have any other collisions in this or related files.


---


[GitHub] drill pull request #1045: DRILL-5730 Test Mocking Improvements

2018-01-17 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1045#discussion_r162251160
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/OperatorFixture.java ---
@@ -127,20 +156,26 @@ public OperatorFixture build() {
* uses the same code generation mechanism as the full Drill, but
* provide test-specific versions of various other services.
*/
-
-  public static class TestFragmentContext extends BaseFragmentContext {
-
+  public static class MockFragmentContext extends BaseFragmentContext {
--- End diff --

More collisions? Can't imagine you made this many changes here...


---


[GitHub] drill pull request #1045: DRILL-5730 Test Mocking Improvements

2018-01-17 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1045#discussion_r162246538
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/batch/BaseRawBatchBuffer.java
 ---
@@ -184,7 +184,7 @@ public RawFragmentBatch getNext() throws IOException {
   return null;
 }
 
-if (context.isOverMemoryLimit()) {
+if (context.getAllocator().isOverLimit()) {
--- End diff --

This one is fine. We must expose an allocator even for testing.


---


[GitHub] drill pull request #1045: DRILL-5730 Test Mocking Improvements

2018-01-17 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1045#discussion_r162251307
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/OperatorFixture.java ---
@@ -175,19 +210,189 @@ public DrillConfig getConfig() {
 }
 
 @Override
-public DrillbitContext getDrillbitContext() {
-  throw new UnsupportedOperationException("Drillbit context not 
available for operator unit tests");
+public ExecutorService getScanDecodeExecutor() {
+  return null;
+}
+
+@Override
+public ExecutorService getScanExecutor() {
+  return null;
+}
+
+@Override
+public ExecutorService getExecutor() {
+  return null;
+}
+
+@Override
+public ExecutorState getExecutorState() {
+  return executorState;
+}
+
+@Override
+public BufferAllocator getNewChildAllocator(String operatorName, int 
operatorId,
+long initialReservation, 
long maximumReservation) {
+  return allocator.newChildAllocator(
+"op:" + operatorId + ":" + operatorName,
+initialReservation,
+maximumReservation);
 }
 
 @Override
-protected CodeCompiler getCompiler() {
+public ExecProtos.FragmentHandle getHandle() {
+  return ExecProtos.FragmentHandle.newBuilder().build();
+}
+
+@Override
+public BufferAllocator getAllocator() {
+  return allocator;
+}
+
+@Override
+public OperatorContext newOperatorContext(PhysicalOperator popConfig) {
+  return mockOperatorContext;
+}
+
+@Override
+public OperatorContext newOperatorContext(PhysicalOperator popConfig, 
OperatorStats stats) {
+  return mockOperatorContext;
+}
+
+@Override
+public SchemaPlus getFullRootSchema() {
+  return null;
+}
+
+@Override
+public String getQueryUserName() {
+  return null;
+}
+
+@Override
+public String getFragIdString() {
+  return null;
+}
+
+@Override
+public CodeCompiler getCompiler() {
return compiler;
 }
 
 @Override
 protected BufferManager getBufferManager() {
   return bufferManager;
 }
+
+@Override
+public void close() {
+  bufferManager.close();
+}
+
+@Override
+public ContextInformation getContextInformation() {
+  return null;
+}
+
+@Override
+public PartitionExplorer getPartitionExplorer() {
+  return null;
+}
+
+@Override
+public ValueHolder getConstantValueHolder(String value, 
TypeProtos.MinorType type, Function holderInitializer) {
+  return null;
+}
+  }
+
+  public static class MockExecutorFragmentContext extends 
MockFragmentContext implements ExecutorFragmentContext {
+
+public MockExecutorFragmentContext(DrillConfig config, OptionManager 
optionManager, BufferAllocator allocator) {
+  super(config, optionManager, allocator);
+}
+
+@Override
+public PhysicalPlanReader getPlanReader() {
+  throw new UnsupportedOperationException();
+}
+
+@Override
+public ClusterCoordinator getClusterCoordinator() {
+  throw new UnsupportedOperationException();
+}
+
+@Override
+public CoordinationProtos.DrillbitEndpoint getForemanEndpoint() {
+  throw new UnsupportedOperationException();
+}
+
+@Override
+public CoordinationProtos.DrillbitEndpoint getEndpoint() {
+  throw new UnsupportedOperationException();
+}
+
+@Override
+public Collection getBits() {
+  throw new UnsupportedOperationException();
+}
+
+@Override
+public OperatorCreatorRegistry getOperatorCreatorRegistry() {
+  return null;
+}
+
+@Override
+public void setBuffers(IncomingBuffers buffers) {
+
+}
+
+@Override
+public Set> getUserConnections() {
+  return null;
+}
+
+@Override
+public QueryProfileStoreContext getProfileStoreContext() {
+  return null;
+}
+
+@Override
+public void waitForSendComplete() {
+  throw new UnsupportedOperationException();
+}
+
+@Override
+public AccountingDataTunnel 
getDataTunnel(CoordinationProtos.DrillbitEndpoint endpoint) {
--- End diff --

We really don't 


---

[GitHub] drill pull request #1045: DRILL-5730 Test Mocking Improvements

2018-01-17 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1045#discussion_r162246947
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/PlanningBase.java 
---
@@ -84,28 +82,17 @@ protected void testSqlPlan(String sqlCommands) throws 
Exception {
 systemOptions.init();
 @SuppressWarnings("resource")
 final UserSession userSession = 
UserSession.Builder.newBuilder().withOptionManager(systemOptions).build();
-final SessionOptionManager sessionOptions = (SessionOptionManager) 
userSession.getOptions();
+final SessionOptionManager sessionOptions = userSession.getOptions();
 final QueryOptionManager queryOptions = new 
QueryOptionManager(sessionOptions);
 final ExecutionControls executionControls = new 
ExecutionControls(queryOptions, DrillbitEndpoint.getDefaultInstance());
 
-new NonStrictExpectations() {
-  {
-dbContext.getMetrics();
-result = new MetricRegistry();
-dbContext.getAllocator();
-result = allocator;
-dbContext.getConfig();
-result = config;
-dbContext.getOptionManager();
-result = systemOptions;
-dbContext.getStoreProvider();
-result = provider;
-dbContext.getClasspathScan();
-result = scanResult;
-dbContext.getLpPersistence();
-result = logicalPlanPersistence;
-  }
-};
+when(dbContext.getMetrics()).thenReturn(new MetricRegistry());
+when(dbContext.getAllocator()).thenReturn(allocator);
+when(dbContext.getConfig()).thenReturn(config);
+when(dbContext.getOptionManager()).thenReturn(systemOptions);
+when(dbContext.getStoreProvider()).thenReturn(provider);
+when(dbContext.getClasspathScan()).thenReturn(scanResult);
+when(dbContext.getLpPersistence()).thenReturn(logicalPlanPersistence);
--- End diff --

Here, we should have a test-time drillbit context implementation that 
extends a common interface -- just as we are now doing for the fragment context.

Why? We want classes to have well-defined interfaces, and want to minimize 
coupling. (Basic OO design rules.) It should be made very clear when some 
planning test needs additional info. And, if it needs that info, then the set 
of info should be varied to ensure all code paths are tested. That can't be 
done if we use one generic set of global state for all tests.

This gets to philosophy also. Our tests should strive to cover all possible 
states. That can only be done at the unit level. And, it can only be done if we 
drive the tests with a wide variety of inputs.

If we bolt things together, then we must run a minimum of tests: 
just enough to show that "things work" for one or two cases. We then rely on 
the users for more complex test cases. That is great -- but tends to upset the 
users...
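The pattern suggested above can be sketched as follows. All names here are illustrative, not Drill's actual classes: the point is that each test declares exactly the state it needs, instead of sharing one generic mock setup.

```java
// Sketch: put the drillbit services behind a small interface so tests supply
// a purpose-built implementation rather than a framework mock.
interface BitContextSketch {
  String configValue(String key);
}

// A test supplies only the values it exercises, making its dependencies
// explicit and easy to vary per code path.
class TestBitContext implements BitContextSketch {
  private final java.util.Map<String, String> values;

  TestBitContext(java.util.Map<String, String> values) {
    this.values = values;
  }

  @Override
  public String configValue(String key) {
    String v = values.get(key);
    if (v == null) {
      // Fail loudly: the test declared its dependencies incompletely.
      throw new IllegalStateException("test did not provide: " + key);
    }
    return v;
  }
}
```

A production implementation would read real cluster config behind the same interface, so planning code never knows which one it is running against.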


---


[GitHub] drill pull request #1045: DRILL-5730 Test Mocking Improvements

2018-01-17 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1045#discussion_r162251468
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/OperatorFixture.java ---
@@ -351,8 +603,110 @@ public OperatorStats getStats() {
 }
   }
 
-  public OperatorContext operatorContext(PhysicalOperator config) {
-return new TestOperatorContext(context, allocator(), config, stats);
+  public static class MockPhysicalOperator implements PhysicalOperator {
--- End diff --

Seems overkill. Most times, a test that needs an operator will provide one. 
The dummy one that was in this code (or maybe in my latest PR) handles those 
cases where we don't even have an operator (record batch) so any operator 
(definition) will do.


---


[GitHub] drill pull request #1045: DRILL-5730 Test Mocking Improvements

2018-01-17 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1045#discussion_r162229669
  
--- Diff: 
contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcBatchCreator.java
 ---
@@ -33,10 +32,10 @@
 public class JdbcBatchCreator implements BatchCreator {
   @Override
   public ScanBatch getBatch(FragmentContext context, JdbcSubScan config,
-  List children) throws ExecutionSetupException {
+List children) throws 
ExecutionSetupException {
 Preconditions.checkArgument(children.isEmpty());
 JdbcStoragePlugin plugin = config.getPlugin();
-RecordReader reader = new JdbcRecordReader(context, 
plugin.getSource(), config.getSql(), plugin.getName());
+RecordReader reader = new JdbcRecordReader(plugin.getSource(), 
config.getSql(), plugin.getName());
--- End diff --

Now I can't find the file either. Consider this issue resolved.


---


[GitHub] drill pull request #1045: DRILL-5730 Test Mocking Improvements

2018-01-17 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1045#discussion_r162246792
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/PlanningBase.java 
---
@@ -58,23 +55,24 @@
 import com.google.common.collect.ImmutableList;
 import com.google.common.io.Resources;
 
-public class PlanningBase extends ExecTest{
-  //private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(PlanningBase.class);
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.when;
--- End diff --

Suggestion. Do not depend on mocking in tests. Design code so it works well 
in tests without mocks. Resort to mocks only when absolutely necessary. That 
is, we should be trying to do LESS mocking, not MORE.

This way, the same test code can be used in tools, ad-hoc tests and so on 
without the JUnit machinery.

This is a philosophical discussion, and one that should have input from 
more of the team.
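One way to get there is plain constructor injection, sketched below with illustrative names. A trivial real implementation then stands in for `when(...).thenReturn(...)` stubbing, and the same class runs under JUnit, in tools, or ad hoc.

```java
// Sketch of the "less mocking" suggestion: hand dependencies to the class
// through its constructor so a plain implementation serves all callers.
interface OptionSourceSketch {
  int maxQueryWidth();
}

class WidthPlanner {
  private final OptionSourceSketch options;

  WidthPlanner(OptionSourceSketch options) {
    this.options = options;
  }

  // Clamp the requested parallelism to the configured maximum.
  int planWidth(int requested) {
    return Math.min(requested, options.maxQueryWidth());
  }
}

// A two-line real implementation replaces a Mockito mock.
class FixedOptions implements OptionSourceSketch {
  @Override
  public int maxQueryWidth() {
    return 4;
  }
}
```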


---


[GitHub] drill pull request #1045: DRILL-5730 Test Mocking Improvements

2018-01-17 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1045#discussion_r162246482
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/batch/BaseRawBatchBuffer.java
 ---
@@ -172,7 +172,7 @@ public RawFragmentBatch getNext() throws IOException {
 } catch (final InterruptedException e) {
 
   // We expect that the interrupt means the fragment is canceled or 
failed, so we should kill this buffer
-  if (!context.shouldContinue()) {
+  if (!context.getExecutorState().shouldContinue()) {
--- End diff --

Let's think about this from a testing perspective. The fragment context can 
be implemented in a way that allows testing. That is what the use of an 
interface allows. Tests implement the context one way, real code another way, 
but the interface is the same.

Here, we expose the executor state. Is this also defined as an interface?

If not, we are letting implementation leak out, and we won't be able to 
create a test-only version. It would be better to leave these methods on the 
context itself so that their implementation is hidden, and thus we can create 
test versions.
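If the executor state is itself an interface, the concern goes away: tests supply a deterministic implementation while production keeps its own. A minimal sketch, with names simplified from Drill's actual fragment-context classes:

```java
// Sketch: executor state behind an interface shared by production and tests.
interface ExecutorStateSketch {
  boolean shouldContinue();
  void fail(Throwable cause);
}

// Test-only version: no threads, no RPC, fully inspectable.
class MockExecutorState implements ExecutorStateSketch {
  private Throwable failure;

  @Override
  public boolean shouldContinue() {
    return failure == null;
  }

  @Override
  public void fail(Throwable cause) {
    if (failure == null) {  // keep only the first reported failure
      failure = cause;
    }
  }
}
```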


---


[GitHub] drill issue #1095: DRILL-6085: Fixed spontaneous vm exits on Travis.

2018-01-17 Thread ilooner
Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1095
  
Note the test failure on Travis is the infamous Sort operator test failure, 
which is unrelated to this change:

```
Failed tests: 
  
TestSortImpl.testLargeBatch:498->runJumboBatchTest:471->runLargeSortTest:451 
Value of 1:0 expected:<0> but was:<1>
```


---


[GitHub] drill pull request #1091: DRILL-6071: Limit batch size for flatten operator

2018-01-17 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1091#discussion_r162226190
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/RecordBatchSizer.java
 ---
@@ -380,14 +394,19 @@ private void measureColumn(ValueVector v, String 
prefix) {
 // vectors do consume space, so visit columns recursively.
 
 switch (v.getField().getType().getMinorType()) {
-case MAP:
-  expandMap((AbstractMapVector) v, prefix + v.getField().getName() + 
".");
-  break;
-case LIST:
-  expandList((RepeatedListVector) v, prefix + v.getField().getName() + 
".");
-  break;
-default:
-  v.collectLedgers(ledgers);
+  case MAP:
+expandMap((AbstractMapVector) v, prefix + v.getField().getName() + 
".");
+break;
+  case LIST:
--- End diff --

Here, check mode. If mode is REPEATED, this is a RepeatedListVector and 
should go down one path. Otherwise, it is a ListVector (possible list of 
unions) and should go down another path.
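The suggested dispatch can be sketched as below. The enum and labels are illustrative, not Drill's actual `TypeProtos`: the point is only that the LIST minor type needs a second branch on data mode.

```java
// Sketch: for the LIST minor type, branch on the data mode -- REPEATED means
// a RepeatedListVector; anything else is a ListVector (possibly of unions).
class ListModeDispatch {
  enum DataMode { REQUIRED, OPTIONAL, REPEATED }

  static String routeListColumn(DataMode mode) {
    return mode == DataMode.REPEATED ? "expandRepeatedList" : "expandListVector";
  }
}
```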


---


[GitHub] drill pull request #1091: DRILL-6071: Limit batch size for flatten operator

2018-01-17 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1091#discussion_r162225383
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractBase.java
 ---
@@ -29,9 +28,12 @@
 
   public static long INIT_ALLOCATION = 1_000_000L;
   public static long MAX_ALLOCATION = 10_000_000_000L;
+  // Default output batch size, 512MB
+  public static long OUTPUT_BATCH_SIZE = 512 * 1024 * 1024L;
--- End diff --

Too large. The sort & hash agg operators often receive just 20-40 MB on a 
large cluster. (That is, itself, an issue, but one that has proven very 
difficult to resolve.) So, the output batch size must be no larger than 1/3 
this size (for sort). Probably some team discussion is required to agree on a 
good number, and on the work needed to ensure that sort, hash agg and hash join 
are given sufficient memory for the selected batch size.


---


[GitHub] drill pull request #1091: DRILL-6071: Limit batch size for flatten operator

2018-01-17 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1091#discussion_r162225832
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java
 ---
@@ -94,8 +98,54 @@ private void clear() {
 }
   }
 
+  private class FlattenMemoryManager {
+private final int outputRowCount;
+private static final int OFFSET_VECTOR_WIDTH = 4;
+private static final int WORST_CASE_FRAGMENTATION_FACTOR = 2;
+private static final int MAX_NUM_ROWS = 64 * 1024;
+private static final int MIN_NUM_ROWS = 1;
+
+private FlattenMemoryManager(RecordBatch incoming, long 
outputBatchSize, SchemaPath flattenColumn) {
+  // Get sizing information for the batch.
+  RecordBatchSizer sizer = new RecordBatchSizer(incoming);
+
+  final TypedFieldId typedFieldId = 
incoming.getValueVectorId(flattenColumn);
+  final MaterializedField field = 
incoming.getSchema().getColumn(typedFieldId.getFieldIds()[0]);
+
+  // Get column size of flatten column.
+  RecordBatchSizer.ColumnSize columnSize = 
sizer.getColumn(incoming.getValueAccessorById(field.getValueClass(),
+  typedFieldId.getFieldIds()).getValueVector(), field.getName());
+
+  // Average rowWidth of flatten column
+  final int avgRowWidthFlattenColumn = 
RecordBatchSizer.safeDivide(columnSize.dataSize, incoming.getRecordCount());
+
+  // Average rowWidth excluding the flatten column.
+  final int avgRowWidthWithOutFlattenColumn = sizer.netRowWidth() - 
avgRowWidthFlattenColumn;
+
+  // Average rowWidth of single element in the flatten list.
+  // subtract the offset vector size from column data size.
+  final int avgRowWidthSingleFlattenEntry =
+  RecordBatchSizer.safeDivide(columnSize.dataSize - 
(OFFSET_VECTOR_WIDTH * columnSize.valueCount), columnSize.elementCount);
+
+  // Average rowWidth of outgoing batch.
+  final int avgOutgoingRowWidth = avgRowWidthWithOutFlattenColumn + 
avgRowWidthSingleFlattenEntry;
+
+  // Number of rows in outgoing batch
+  outputRowCount = Math.max(MIN_NUM_ROWS, Math.min(MAX_NUM_ROWS,
+  
RecordBatchSizer.safeDivide((outputBatchSize/WORST_CASE_FRAGMENTATION_FACTOR), 
avgOutgoingRowWidth)));
+}
--- End diff --

Would be great to log this info in debug mode to aid in tuning. Doing this 
in the sort proved highly valuable.
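The row-count math in the diff, plus the suggested trace, can be worked through in isolation. Constants mirror the diff; the helper is simplified (plain integer division instead of `safeDivide`, and `println` instead of a debug-level logger).

```java
// Worked sketch of the flatten output-row-count computation with a trace of
// the sizing decision, as suggested for tuning.
class FlattenSizingSketch {
  static final int MIN_NUM_ROWS = 1;
  static final int MAX_NUM_ROWS = 64 * 1024;
  static final int WORST_CASE_FRAGMENTATION_FACTOR = 2;

  static int outputRowCount(long outputBatchSize, int avgOutgoingRowWidth) {
    // Halve the budget for worst-case fragmentation, then divide by row width.
    long budget = outputBatchSize / WORST_CASE_FRAGMENTATION_FACTOR;
    int raw = (int) (budget / Math.max(1, avgOutgoingRowWidth));
    // Clamp to [1, 64K] rows.
    int rows = Math.max(MIN_NUM_ROWS, Math.min(MAX_NUM_ROWS, raw));
    System.out.println("flatten sizing: batchSize=" + outputBatchSize
        + ", avgRowWidth=" + avgOutgoingRowWidth + ", outputRows=" + rows);
    return rows;
  }
}
```

With the 512 MB default and a 100-byte average row, the budget (256 MB / 100) far exceeds 64K rows, so the clamp dominates; tiny budgets fall through to the raw division.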


---


[GitHub] drill pull request #1091: DRILL-6071: Limit batch size for flatten operator

2018-01-17 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1091#discussion_r162225507
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractBase.java
 ---
@@ -29,9 +28,12 @@
 
   public static long INIT_ALLOCATION = 1_000_000L;
   public static long MAX_ALLOCATION = 10_000_000_000L;
+  // Default output batch size, 512MB
+  public static long OUTPUT_BATCH_SIZE = 512 * 1024 * 1024L;
--- End diff --

Also, if the size is a config parameter, shouldn't the default be set in 
the drill-module.conf file, not in a Java constant?


---


[GitHub] drill pull request #1091: DRILL-6071: Limit batch size for flatten operator

2018-01-17 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1091#discussion_r162226341
  
--- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
@@ -312,12 +312,13 @@ drill.exec: {
   memory: {
 operator: {
   max: 200,
-  initial: 1000
+  initial: 1000,
+  output_batch_size : 536870912
--- End diff --

See note above.

BTW: If you add a new memory-related param, you can use the methods 
specifically designed for memory. Then you can say things such as 512K or 2MB. 
The sort code does this. See also the HOCON documentation.
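If the default moves into the config file, HOCON's size-in-bytes syntax would let the value read as a memory quantity rather than a raw byte count. A hypothetical fragment (the key name comes from the diff; whether Drill's config reader parses size units for this particular key is an assumption):

```
drill.exec.memory.operator: {
  # hypothetical: HOCON size syntax in place of 536870912
  output_batch_size: 512 MB
}
```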


---


[GitHub] drill pull request #1091: DRILL-6071: Limit batch size for flatten operator

2018-01-17 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1091#discussion_r162225888
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenTemplate.java
 ---
@@ -38,23 +38,20 @@
 public abstract class FlattenTemplate implements Flattener {
   private static final Logger logger = 
LoggerFactory.getLogger(FlattenTemplate.class);
 
-  private static final int OUTPUT_BATCH_SIZE = 4*1024;
-  private static final int OUTPUT_MEMORY_LIMIT = 512 * 1024 * 1024;
+  private static final int OUTPUT_ROW_COUNT = 64*1024;
--- End diff --

`ValueVector.MAX_ROW_COUNT`


---


[GitHub] drill pull request #1091: DRILL-6071: Limit batch size for flatten operator

2018-01-17 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1091#discussion_r162225575
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java
 ---
@@ -94,8 +98,54 @@ private void clear() {
 }
   }
 
+  private class FlattenMemoryManager {
+private final int outputRowCount;
+private static final int OFFSET_VECTOR_WIDTH = 4;
+private static final int WORST_CASE_FRAGMENTATION_FACTOR = 2;
+private static final int MAX_NUM_ROWS = 64 * 1024;
--- End diff --

`ValueVector.MAX_ROW_COUNT`


---


[GitHub] drill issue #1090: DRILL-6080: Sort incorrectly limits batch size to 65535 r...

2018-01-17 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/1090
  
Pushed a new commit that fixes an indexing error. Also removes the "unsafe" 
methods in favor of the "set" methods in DrillBuf as Vlad requested.

@vrozov, please take another look. 


---


[GitHub] drill issue #1095: DRILL-6085: Fixed spontaneous vm exits on Travis.

2018-01-17 Thread ilooner
Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1095
  
@arina-ielchiieva 


---


[GitHub] drill pull request #1095: DRILL-6085: Fixed spontaneous vm exits on Travis.

2018-01-17 Thread ilooner
GitHub user ilooner opened a pull request:

https://github.com/apache/drill/pull/1095

DRILL-6085: Fixed spontaneous vm exits on Travis.

This error occurs sporadically on Travis.

```
Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.17:test 
(default-test) on project drill-java-exec: Execution 
default-test of goal org.apache.maven.plugins:maven-surefire-plugin:2.17:test 
failed: The forked VM terminated without properly saying goodbye. VM crash or 
System.exit called?
```

I believe the issue is caused by the fact that our Travis containers only 
have 4 GB of memory, while our test VMs are given up to 4 GB of direct memory 
and 4 GB of heap (for a total of up to 8 GB). Because we are promising more 
memory than we have, it seems we occasionally hit the limit of our 4 GB 
container. I have added the ```sudo: "required"``` option to travis.yml, which 
increases the amount of memory in the container to 7 GB.

Before this change, the VM exits occurred about 30% of the time for me. 
With this change I ran 5 consecutive builds without a VM exit.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ilooner/drill DRILL-6085

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1095.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1095


commit 4b2c4f0d1a3175e6a8cffa5be42cdf51632ae1c2
Author: Timothy Farkas 
Date:   2018-01-16T20:11:25Z

DRILL-6085: Fixed spontaneous vm exits on Travis.




---


[GitHub] drill pull request #1057: DRILL-5993 Add Generic Copiers With Append Methods

2018-01-17 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1057#discussion_r162196004
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/AbstractCopier.java
 ---
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.svremover;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.vector.AllocationHelper;
+import org.apache.drill.exec.vector.ValueVector;
+
+public abstract class AbstractCopier implements Copier {
+  protected ValueVector[] vvOut;
+  protected VectorContainer outgoing;
+
+  @Override
+  public void setup(RecordBatch incoming, VectorContainer outgoing) throws 
SchemaChangeException {
--- End diff --

It is not used by the AbstractCopier itself but it is used by the 
AbstractSV2Copier and AbstractSV4Copier which override the method.


---


[GitHub] drill pull request #1057: DRILL-5993 Add Generic Copiers With Append Methods

2018-01-17 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1057#discussion_r162193983
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/AbstractCopier.java
 ---
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.svremover;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.vector.AllocationHelper;
+import org.apache.drill.exec.vector.ValueVector;
+
+public abstract class AbstractCopier implements Copier {
+  protected ValueVector[] vvOut;
+  protected VectorContainer outgoing;
+
+  @Override
+  public void setup(RecordBatch incoming, VectorContainer outgoing) throws 
SchemaChangeException {
--- End diff --

The "incoming" parameter is not used anywhere ...


---


[GitHub] drill issue #1066: DRILL-3993: Changes to support Calcite 1.15

2018-01-17 Thread amansinha100
Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/1066
  
Other than one comment above, rest LGTM.  +1


---


[GitHub] drill issue #1066: DRILL-3993: Changes to support Calcite 1.15

2018-01-17 Thread amansinha100
Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/1066
  
@vvysotskyi could you pls separate out the HashAggregate OOM related change 
from this PR and file a separate JIRA for it since it is unrelated to the 
calcite rebase per-se.  Lumping them together causes confusion.   Thanks. 


---


[GitHub] drill pull request #1057: DRILL-5993 Add Generic Copiers With Append Methods

2018-01-17 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1057#discussion_r162175059
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/AbstractSV4Copier.java
 ---
@@ -17,32 +17,30 @@
  */
 package org.apache.drill.exec.physical.impl.svremover;
 
-import javax.inject.Named;
-
 import org.apache.drill.common.types.TypeProtos.MajorType;
 import org.apache.drill.common.types.Types;
 import org.apache.drill.exec.exception.SchemaChangeException;
-import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 import org.apache.drill.exec.vector.AllocationHelper;
+import org.apache.drill.exec.vector.ValueVector;
 
-public abstract class CopierTemplate4 implements Copier{
-  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(CopierTemplate4.class);
+public abstract class AbstractSV4Copier implements Copier {
--- End diff --

Done


---


[GitHub] drill pull request #1066: DRILL-3993: Changes to support Calcite 1.15

2018-01-17 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1066#discussion_r162171143
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/DrillRelBuilder.java 
---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner;
+
+import org.apache.calcite.plan.Context;
+import org.apache.calcite.plan.Contexts;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelOptSchema;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.RelFactories;
+import org.apache.calcite.tools.RelBuilder;
+import org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.calcite.util.Util;
+
+public class DrillRelBuilder extends RelBuilder {
+  private final RelFactories.FilterFactory filterFactory;
+
+  protected DrillRelBuilder(Context context, RelOptCluster cluster, 
RelOptSchema relOptSchema) {
+super(context, cluster, relOptSchema);
+this.filterFactory =
+Util.first(context.unwrap(RelFactories.FilterFactory.class),
+RelFactories.DEFAULT_FILTER_FACTORY);
+  }
+
+  /**
+   * Original method {@link RelBuilder#empty} returns empty values rel.
+   * In the order to preserve dara row types, filter with false predicate 
is created.
+   */
+  @Override
+  public RelBuilder empty() {
--- End diff --

Ok, thanks for clarifying.  The schema-on-read does make it difficult to 
avoid overriding the empty() behavior.


---


[GitHub] drill pull request #1066: DRILL-3993: Changes to support Calcite 1.15

2018-01-17 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1066#discussion_r162157833
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/DrillRelBuilder.java 
---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner;
+
+import org.apache.calcite.plan.Context;
+import org.apache.calcite.plan.Contexts;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelOptSchema;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.RelFactories;
+import org.apache.calcite.tools.RelBuilder;
+import org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.calcite.util.Util;
+
+public class DrillRelBuilder extends RelBuilder {
+  private final RelFactories.FilterFactory filterFactory;
+
+  protected DrillRelBuilder(Context context, RelOptCluster cluster, 
RelOptSchema relOptSchema) {
+super(context, cluster, relOptSchema);
+this.filterFactory =
+Util.first(context.unwrap(RelFactories.FilterFactory.class),
+RelFactories.DEFAULT_FILTER_FACTORY);
+  }
+
+  /**
+   * Original method {@link RelBuilder#empty} returns empty values rel.
+   * In the order to preserve dara row types, filter with false predicate 
is created.
--- End diff --

Thanks, fixed


---


[GitHub] drill pull request #1066: DRILL-3993: Changes to support Calcite 1.15

2018-01-17 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1066#discussion_r162161040
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/DrillRelBuilder.java 
---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner;
+
+import org.apache.calcite.plan.Context;
+import org.apache.calcite.plan.Contexts;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelOptSchema;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.RelFactories;
+import org.apache.calcite.tools.RelBuilder;
+import org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.calcite.util.Util;
+
+public class DrillRelBuilder extends RelBuilder {
+  private final RelFactories.FilterFactory filterFactory;
+
+  protected DrillRelBuilder(Context context, RelOptCluster cluster, 
RelOptSchema relOptSchema) {
+super(context, cluster, relOptSchema);
+this.filterFactory =
+Util.first(context.unwrap(RelFactories.FilterFactory.class),
+RelFactories.DEFAULT_FILTER_FACTORY);
+  }
+
+  /**
+   * Original method {@link RelBuilder#empty} returns empty values rel.
+   * In the order to preserve dara row types, filter with false predicate 
is created.
+   */
+  @Override
+  public RelBuilder empty() {
--- End diff --

In some cases, the RowType of the input may not be known at the moment when 
this method is called (some or all of the columns have not been dynamically 
discovered yet). Therefore I had to make changes directly in Calcite to allow 
using a custom RelBuilder, and changes in Drill to pass the custom RelBuilder 
into it. For more details please see [1] and [2].

[1] 
https://lists.apache.org/list.html?d...@calcite.apache.org:lte=1y:Make%20RelBuilder.filter%28%29%20configurable
[2] https://issues.apache.org/jira/browse/CALCITE-2043
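
The design difference under discussion can be illustrated with a toy model in 
plain Java (hypothetical names, NOT Calcite's actual API): replacing a node 
with an empty Values rel discards the input's row type, while a filter with an 
always-false predicate keeps the row type and merely drops all rows.

```java
import java.util.Collections;
import java.util.List;

// Toy model (NOT Calcite's API) of the two ways to produce an "empty" rel.
public class EmptyRelSketch {
  // A relational node reduced to its row type (column names) and row count.
  static final class Rel {
    final List<String> rowType;
    final int rowCount;
    Rel(List<String> rowType, int rowCount) {
      this.rowType = rowType;
      this.rowCount = rowCount;
    }
  }

  // Default RelBuilder#empty() behavior: an empty Values rel. In a
  // schema-on-read system the original row type is lost at this point.
  static Rel emptyValues() {
    return new Rel(Collections.emptyList(), 0);
  }

  // DrillRelBuilder's approach: filter the input with a FALSE predicate,
  // so no rows survive but the input's row type is preserved.
  static Rel filterFalse(Rel input) {
    return new Rel(input.rowType, 0);
  }

  public static void main(String[] args) {
    Rel scan = new Rel(List.of("a", "b"), 100);
    System.out.println(emptyValues().rowType);     // [] -> row type lost
    System.out.println(filterFalse(scan).rowType); // [a, b] -> row type kept
  }
}
```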



---


[GitHub] drill pull request #1066: DRILL-3993: Changes to support Calcite 1.15

2018-01-17 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1066#discussion_r157854723
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/DrillRelBuilder.java 
---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner;
+
+import org.apache.calcite.plan.Context;
+import org.apache.calcite.plan.Contexts;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelOptSchema;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.RelFactories;
+import org.apache.calcite.tools.RelBuilder;
+import org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.calcite.util.Util;
+
+public class DrillRelBuilder extends RelBuilder {
+  private final RelFactories.FilterFactory filterFactory;
+
+  protected DrillRelBuilder(Context context, RelOptCluster cluster, 
RelOptSchema relOptSchema) {
+super(context, cluster, relOptSchema);
+this.filterFactory =
+Util.first(context.unwrap(RelFactories.FilterFactory.class),
+RelFactories.DEFAULT_FILTER_FACTORY);
+  }
+
+  /**
+   * Original method {@link RelBuilder#empty} returns empty values rel.
+   * In the order to preserve dara row types, filter with false predicate 
is created.
+   */
+  @Override
+  public RelBuilder empty() {
--- End diff --

Is a separate DrillRelBuilder absolutely needed? Should the original 
behavior of empty() in Calcite be changed to preserve the RowType of the input? 
The fewer things we extend in Drill the better. 


---


[GitHub] drill pull request #1066: DRILL-3993: Changes to support Calcite 1.15

2018-01-17 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1066#discussion_r157853167
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/DrillRelBuilder.java 
---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner;
+
+import org.apache.calcite.plan.Context;
+import org.apache.calcite.plan.Contexts;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelOptSchema;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.RelFactories;
+import org.apache.calcite.tools.RelBuilder;
+import org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.calcite.util.Util;
+
+public class DrillRelBuilder extends RelBuilder {
+  private final RelFactories.FilterFactory filterFactory;
+
+  protected DrillRelBuilder(Context context, RelOptCluster cluster, 
RelOptSchema relOptSchema) {
+super(context, cluster, relOptSchema);
+this.filterFactory =
+Util.first(context.unwrap(RelFactories.FilterFactory.class),
+RelFactories.DEFAULT_FILTER_FACTORY);
+  }
+
+  /**
+   * Original method {@link RelBuilder#empty} returns empty values rel.
+   * In the order to preserve dara row types, filter with false predicate 
is created.
--- End diff --

'data' spelling


---


[GitHub] drill pull request #1072: DRILL-5879: Improved SQL Pattern Contains Performa...

2018-01-17 Thread sachouche
Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1072#discussion_r162124359
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java
 ---
@@ -19,44 +19,286 @@
 
 import io.netty.buffer.DrillBuf;
 
-public class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher {
+/** SQL Pattern Contains implementation */
+public final class SqlPatternContainsMatcher extends 
AbstractSqlPatternMatcher {
+  private final MatcherFcn matcherFcn;
 
   public SqlPatternContainsMatcher(String patternString) {
 super(patternString);
+
+// Pattern matching is 1) a CPU intensive operation and 2) pattern and 
input dependent. The conclusion is
+// that there is no single implementation that can do it all well. So, 
we use multiple implementations
+// chosen based on the pattern length.
+if (patternLength == 0) {
+  matcherFcn = new MatcherZero();
+} else if (patternLength == 1) {
+  matcherFcn = new MatcherOne();
+} else if (patternLength == 2) {
+  matcherFcn = new MatcherTwo();
+} else if (patternLength == 3) {
+  matcherFcn = new MatcherThree();
+} else if (patternLength < 10) {
+  matcherFcn = new MatcherN();
+} else {
+  matcherFcn = new BoyerMooreMatcher();
+}
   }
 
   @Override
   public int match(int start, int end, DrillBuf drillBuf) {
+return matcherFcn.match(start, end, drillBuf);
+  }
+
+  
//--
+  // Inner Data Structure
+  // 
--
+
+  /** Abstract matcher class to allow us pick the most efficient 
implementation */
+  private abstract class MatcherFcn {
+protected final byte[] patternArray;
+
+protected MatcherFcn() {
+  assert patternByteBuffer.hasArray();
+
+  patternArray = patternByteBuffer.array();
+}
+
+/**
+ * @return 1 if the pattern was matched; 0 otherwise
+ */
+protected abstract int match(int start, int end, DrillBuf drillBuf);
+  }
+
+  /** Handles patterns with length one */
+  private final class MatcherZero extends MatcherFcn {
 
-if (patternLength == 0) { // Everything should match for null pattern 
string
+private MatcherZero() {
+}
+
+/** {@inheritDoc} */
+@Override
+protected final int match(int start, int end, DrillBuf drillBuf) {
   return 1;
 }
+  }
+
+  /** Handles patterns with length one */
+  private final class MatcherOne extends MatcherFcn {
+
+private MatcherOne() {
+  super();
+}
+
+/** {@inheritDoc} */
+@Override
+protected final int match(int start, int end, DrillBuf drillBuf) {
+  final int lengthToProcess = end - start;
+  final byte firstPattByte  = patternArray[0];
+
+  // simplePattern string has meta characters i.e % and _ and escape 
characters removed.
+  // so, we can just directly compare.
+  for (int idx = 0; idx < lengthToProcess; idx++) {
+byte inputByte = drillBuf.getByte(start + idx);
+
+if (firstPattByte != inputByte) {
+  continue;
+}
+return 1;
+  }
+  return 0;
+}
+  }
+
+  /** Handles patterns with length two */
+  private final class MatcherTwo extends MatcherFcn {
+
+private MatcherTwo() {
+}
+
+/** {@inheritDoc} */
+@Override
+protected final int match(int start, int end, DrillBuf drillBuf) {
+  final int lengthToProcess = end - start - 1;
+  final byte firstPattByte  = patternArray[0];
+  final byte secondPattByte = patternArray[1];
+
+  // simplePattern string has meta characters i.e % and _ and escape 
characters removed.
+  // so, we can just directly compare.
+  for (int idx = 0; idx < lengthToProcess; idx++) {
+final byte firstInByte = drillBuf.getByte(start + idx);
 
-final int txtLength = end - start;
+if (firstPattByte != firstInByte) {
+  continue;
+} else {
+  final byte secondInByte = drillBuf.getByte(start + idx +1);
 
-// no match if input string length is less than pattern length
-if (txtLength < patternLength) {
+  if (secondInByte == secondPattByte) {
+return 1;
+  }
+}
+  }
   return 0;
 }
+  }
 
+  /** Handles patterns with length three */
+  private final class MatcherThree extends 

[GitHub] drill pull request #1072: DRILL-5879: Improved SQL Pattern Contains Performa...

2018-01-17 Thread sachouche
Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1072#discussion_r162120115
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java
 ---
@@ -19,44 +19,286 @@
 
 import io.netty.buffer.DrillBuf;
 
-public class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher {
+/** SQL Pattern Contains implementation */
+public final class SqlPatternContainsMatcher extends 
AbstractSqlPatternMatcher {
+  private final MatcherFcn matcherFcn;
 
   public SqlPatternContainsMatcher(String patternString) {
 super(patternString);
+
+// Pattern matching is 1) a CPU intensive operation and 2) pattern and 
input dependent. The conclusion is
+// that there is no single implementation that can do it all well. So, 
we use multiple implementations
+// chosen based on the pattern length.
+if (patternLength == 0) {
+  matcherFcn = new MatcherZero();
+} else if (patternLength == 1) {
+  matcherFcn = new MatcherOne();
+} else if (patternLength == 2) {
+  matcherFcn = new MatcherTwo();
+} else if (patternLength == 3) {
+  matcherFcn = new MatcherThree();
+} else if (patternLength < 10) {
+  matcherFcn = new MatcherN();
+} else {
+  matcherFcn = new BoyerMooreMatcher();
+}
   }
 
   @Override
   public int match(int start, int end, DrillBuf drillBuf) {
+return matcherFcn.match(start, end, drillBuf);
+  }
+
+  
//--
+  // Inner Data Structure
+  // 
--
+
+  /** Abstract matcher class to allow us pick the most efficient 
implementation */
+  private abstract class MatcherFcn {
+protected final byte[] patternArray;
+
+protected MatcherFcn() {
+  assert patternByteBuffer.hasArray();
+
+  patternArray = patternByteBuffer.array();
+}
+
+/**
+ * @return 1 if the pattern was matched; 0 otherwise
+ */
+protected abstract int match(int start, int end, DrillBuf drillBuf);
+  }
+
+  /** Handles patterns with length one */
+  private final class MatcherZero extends MatcherFcn {
 
-if (patternLength == 0) { // Everything should match for null pattern 
string
+private MatcherZero() {
+}
+
+/** {@inheritDoc} */
+@Override
+protected final int match(int start, int end, DrillBuf drillBuf) {
   return 1;
 }
+  }
+
+  /** Handles patterns with length one */
+  private final class MatcherOne extends MatcherFcn {
+
+private MatcherOne() {
+  super();
--- End diff --

removed.


---
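
The constructor quoted in these diffs picks a matcher implementation based on 
pattern length. A standalone sketch of that dispatch pattern in plain Java 
(hypothetical names, not Drill's actual classes; Drill additionally has 
dedicated two- and three-byte matchers and switches to Boyer-Moore at length 
10):

```java
// Sketch (hypothetical names) of selecting a "contains" matcher by pattern
// length: trivial or hand-rolled loops for short patterns, a general search
// otherwise. Drill's version returns dedicated classes per length bucket.
public class MatcherDispatchSketch {
  interface Matcher {
    boolean matches(byte[] input);
  }

  static Matcher forPattern(byte[] pattern) {
    if (pattern.length == 0) {
      return input -> true;       // empty pattern matches everything
    } else if (pattern.length == 1) {
      final byte p = pattern[0];  // single-byte scan, no inner loop needed
      return input -> {
        for (byte b : input) {
          if (b == p) {
            return true;
          }
        }
        return false;
      };
    } else {
      // General case; here a naive scan stands in for Drill's MatcherN
      // and BoyerMooreMatcher implementations.
      return input -> indexOf(input, pattern) >= 0;
    }
  }

  static int indexOf(byte[] haystack, byte[] needle) {
    outer:
    for (int i = 0; i + needle.length <= haystack.length; i++) {
      for (int j = 0; j < needle.length; j++) {
        if (haystack[i + j] != needle[j]) {
          continue outer;
        }
      }
      return i;
    }
    return -1;
  }
}
```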


[GitHub] drill pull request #1072: DRILL-5879: Improved SQL Pattern Contains Performa...

2018-01-17 Thread sachouche
Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1072#discussion_r162123690
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java
 ---
@@ -19,44 +19,286 @@
 
 import io.netty.buffer.DrillBuf;
 
-public class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher {
+/** SQL Pattern Contains implementation */
+public final class SqlPatternContainsMatcher extends 
AbstractSqlPatternMatcher {
+  private final MatcherFcn matcherFcn;
 
   public SqlPatternContainsMatcher(String patternString) {
 super(patternString);
+
+// Pattern matching is 1) a CPU intensive operation and 2) pattern and 
input dependent. The conclusion is
+// that there is no single implementation that can do it all well. So, 
we use multiple implementations
+// chosen based on the pattern length.
+if (patternLength == 0) {
+  matcherFcn = new MatcherZero();
+} else if (patternLength == 1) {
+  matcherFcn = new MatcherOne();
+} else if (patternLength == 2) {
+  matcherFcn = new MatcherTwo();
+} else if (patternLength == 3) {
+  matcherFcn = new MatcherThree();
+} else if (patternLength < 10) {
+  matcherFcn = new MatcherN();
+} else {
+  matcherFcn = new BoyerMooreMatcher();
+}
   }
 
   @Override
   public int match(int start, int end, DrillBuf drillBuf) {
+return matcherFcn.match(start, end, drillBuf);
+  }
+
+  
//--
+  // Inner Data Structure
+  // 
--
+
+  /** Abstract matcher class to allow us pick the most efficient 
implementation */
+  private abstract class MatcherFcn {
+protected final byte[] patternArray;
+
+protected MatcherFcn() {
+  assert patternByteBuffer.hasArray();
+
+  patternArray = patternByteBuffer.array();
+}
+
+/**
+ * @return 1 if the pattern was matched; 0 otherwise
+ */
+protected abstract int match(int start, int end, DrillBuf drillBuf);
+  }
+
+  /** Handles patterns with length one */
+  private final class MatcherZero extends MatcherFcn {
 
-if (patternLength == 0) { // Everything should match for null pattern 
string
+private MatcherZero() {
+}
+
+/** {@inheritDoc} */
+@Override
+protected final int match(int start, int end, DrillBuf drillBuf) {
   return 1;
 }
+  }
+
+  /** Handles patterns with length one */
+  private final class MatcherOne extends MatcherFcn {
+
+private MatcherOne() {
+  super();
+}
+
+/** {@inheritDoc} */
+@Override
+protected final int match(int start, int end, DrillBuf drillBuf) {
+  final int lengthToProcess = end - start;
+  final byte firstPattByte  = patternArray[0];
+
+  // simplePattern string has meta characters i.e % and _ and escape 
characters removed.
+  // so, we can just directly compare.
+  for (int idx = 0; idx < lengthToProcess; idx++) {
+byte inputByte = drillBuf.getByte(start + idx);
+
+if (firstPattByte != inputByte) {
+  continue;
+}
+return 1;
+  }
+  return 0;
+}
+  }
+
+  /** Handles patterns with length two */
+  private final class MatcherTwo extends MatcherFcn {
+
+private MatcherTwo() {
+}
+
+/** {@inheritDoc} */
+@Override
+protected final int match(int start, int end, DrillBuf drillBuf) {
+  final int lengthToProcess = end - start - 1;
+  final byte firstPattByte  = patternArray[0];
+  final byte secondPattByte = patternArray[1];
--- End diff --

Good idea; done this for all the matchers.


---
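
For patterns of length 10 or more, the constructor quoted above selects a 
BoyerMooreMatcher, whose body is not shown in these diff hunks. As background, 
the bad-character variant of Boyer-Moore search looks roughly like this (a 
sketch, not Drill's actual implementation):

```java
import java.util.Arrays;

// Bad-character-rule Boyer-Moore substring search (sketch, not Drill's code).
// On a mismatch, the pattern is shifted so that its rightmost occurrence of
// the offending input byte lines up, skipping up to pattern.length bytes.
public class BoyerMooreSketch {
  static int indexOf(byte[] text, byte[] pattern) {
    if (pattern.length == 0) {
      return 0;
    }
    int[] last = new int[256];
    Arrays.fill(last, -1);
    for (int i = 0; i < pattern.length; i++) {
      last[pattern[i] & 0xff] = i;  // rightmost position of each byte value
    }
    int i = pattern.length - 1;     // index into text
    int j = pattern.length - 1;     // index into pattern (compared right-to-left)
    while (i < text.length) {
      if (text[i] == pattern[j]) {
        if (j == 0) {
          return i;                 // full match found
        }
        i--;
        j--;
      } else {
        int l = last[text[i] & 0xff];
        i += pattern.length - Math.min(j, 1 + l);  // bad-character shift
        j = pattern.length - 1;
      }
    }
    return -1;
  }
}
```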


[GitHub] drill pull request #1072: DRILL-5879: Improved SQL Pattern Contains Performa...

2018-01-17 Thread sachouche
Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1072#discussion_r162119757
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java
 ---
@@ -19,44 +19,286 @@
 
 import io.netty.buffer.DrillBuf;
 
-public class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher {
+/** SQL Pattern Contains implementation */
+public final class SqlPatternContainsMatcher extends 
AbstractSqlPatternMatcher {
+  private final MatcherFcn matcherFcn;
 
   public SqlPatternContainsMatcher(String patternString) {
 super(patternString);
+
+// Pattern matching is 1) a CPU intensive operation and 2) pattern and 
input dependent. The conclusion is
+// that there is no single implementation that can do it all well. So, 
we use multiple implementations
+// chosen based on the pattern length.
+if (patternLength == 0) {
+  matcherFcn = new MatcherZero();
+} else if (patternLength == 1) {
+  matcherFcn = new MatcherOne();
+} else if (patternLength == 2) {
+  matcherFcn = new MatcherTwo();
+} else if (patternLength == 3) {
+  matcherFcn = new MatcherThree();
+} else if (patternLength < 10) {
+  matcherFcn = new MatcherN();
+} else {
+  matcherFcn = new BoyerMooreMatcher();
+}
   }
 
   @Override
   public int match(int start, int end, DrillBuf drillBuf) {
+return matcherFcn.match(start, end, drillBuf);
+  }
+
+  
//--
+  // Inner Data Structure
+  // 
--
+
+  /** Abstract matcher class to allow us pick the most efficient 
implementation */
+  private abstract class MatcherFcn {
+protected final byte[] patternArray;
+
+protected MatcherFcn() {
+  assert patternByteBuffer.hasArray();
+
+  patternArray = patternByteBuffer.array();
+}
+
+/**
+ * @return 1 if the pattern was matched; 0 otherwise
+ */
+protected abstract int match(int start, int end, DrillBuf drillBuf);
+  }
+
+  /** Handles patterns with length one */
--- End diff --

cut-and-paste issue :-)


---


[GitHub] drill pull request #1072: DRILL-5879: Improved SQL Pattern Contains Performa...

2018-01-17 Thread sachouche
Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1072#discussion_r162121115
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java
 ---
@@ -19,44 +19,286 @@
 
 import io.netty.buffer.DrillBuf;
 
-public class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher {
+/** SQL Pattern Contains implementation */
+public final class SqlPatternContainsMatcher extends 
AbstractSqlPatternMatcher {
+  private final MatcherFcn matcherFcn;
 
   public SqlPatternContainsMatcher(String patternString) {
 super(patternString);
+
+// Pattern matching is 1) a CPU intensive operation and 2) pattern and 
input dependent. The conclusion is
+// that there is no single implementation that can do it all well. So, 
we use multiple implementations
+// chosen based on the pattern length.
+if (patternLength == 0) {
+  matcherFcn = new MatcherZero();
+} else if (patternLength == 1) {
+  matcherFcn = new MatcherOne();
+} else if (patternLength == 2) {
+  matcherFcn = new MatcherTwo();
+} else if (patternLength == 3) {
+  matcherFcn = new MatcherThree();
+} else if (patternLength < 10) {
+  matcherFcn = new MatcherN();
+} else {
+  matcherFcn = new BoyerMooreMatcher();
+}
   }
 
   @Override
   public int match(int start, int end, DrillBuf drillBuf) {
+return matcherFcn.match(start, end, drillBuf);
+  }
+
+  
//--
+  // Inner Data Structure
+  // 
--
+
+  /** Abstract matcher class to allow us pick the most efficient 
implementation */
+  private abstract class MatcherFcn {
+protected final byte[] patternArray;
+
+protected MatcherFcn() {
+  assert patternByteBuffer.hasArray();
+
+  patternArray = patternByteBuffer.array();
+}
+
+/**
+ * @return 1 if the pattern was matched; 0 otherwise
+ */
+protected abstract int match(int start, int end, DrillBuf drillBuf);
+  }
+
+  /** Handles patterns with length one */
+  private final class MatcherZero extends MatcherFcn {
 
-if (patternLength == 0) { // Everything should match for null pattern 
string
+private MatcherZero() {
+}
+
+/** {@inheritDoc} */
+@Override
+protected final int match(int start, int end, DrillBuf drillBuf) {
   return 1;
 }
+  }
+
+  /** Handles patterns with length one */
+  private final class MatcherOne extends MatcherFcn {
+
+private MatcherOne() {
+  super();
+}
+
+/** {@inheritDoc} */
+@Override
+protected final int match(int start, int end, DrillBuf drillBuf) {
+  final int lengthToProcess = end - start;
+  final byte firstPattByte  = patternArray[0];
--- End diff --

renamed all such variables to XXXPatternBytes


---


[GitHub] drill pull request #1072: DRILL-5879: Improved SQL Pattern Contains Performa...

2018-01-17 Thread sachouche
Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1072#discussion_r162124388
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java
 ---
@@ -19,44 +19,286 @@
 
 import io.netty.buffer.DrillBuf;
 
-public class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher {
+/** SQL Pattern Contains implementation */
+public final class SqlPatternContainsMatcher extends 
AbstractSqlPatternMatcher {
+  private final MatcherFcn matcherFcn;
 
   public SqlPatternContainsMatcher(String patternString) {
 super(patternString);
+
+// Pattern matching is 1) a CPU intensive operation and 2) pattern and 
input dependent. The conclusion is
+// that there is no single implementation that can do it all well. So, 
we use multiple implementations
+// chosen based on the pattern length.
+if (patternLength == 0) {
+  matcherFcn = new MatcherZero();
+} else if (patternLength == 1) {
+  matcherFcn = new MatcherOne();
+} else if (patternLength == 2) {
+  matcherFcn = new MatcherTwo();
+} else if (patternLength == 3) {
+  matcherFcn = new MatcherThree();
+} else if (patternLength < 10) {
+  matcherFcn = new MatcherN();
+} else {
+  matcherFcn = new BoyerMooreMatcher();
+}
   }
 
   @Override
   public int match(int start, int end, DrillBuf drillBuf) {
+return matcherFcn.match(start, end, drillBuf);
+  }
+
+  
//--
+  // Inner Data Structure
+  // 
--
+
+  /** Abstract matcher class to allow us pick the most efficient 
implementation */
+  private abstract class MatcherFcn {
+protected final byte[] patternArray;
+
+protected MatcherFcn() {
+  assert patternByteBuffer.hasArray();
+
+  patternArray = patternByteBuffer.array();
+}
+
+/**
+ * @return 1 if the pattern was matched; 0 otherwise
+ */
+protected abstract int match(int start, int end, DrillBuf drillBuf);
+  }
+
+  /** Handles patterns with length one */
+  private final class MatcherZero extends MatcherFcn {
 
-if (patternLength == 0) { // Everything should match for null pattern 
string
+private MatcherZero() {
+}
+
+/** {@inheritDoc} */
+@Override
+protected final int match(int start, int end, DrillBuf drillBuf) {
   return 1;
 }
+  }
+
+  /** Handles patterns with length one */
+  private final class MatcherOne extends MatcherFcn {
+
+private MatcherOne() {
+  super();
+}
+
+/** {@inheritDoc} */
+@Override
+protected final int match(int start, int end, DrillBuf drillBuf) {
+  final int lengthToProcess = end - start;
+  final byte firstPattByte  = patternArray[0];
+
+  // simplePattern string has meta characters i.e % and _ and escape 
characters removed.
+  // so, we can just directly compare.
+  for (int idx = 0; idx < lengthToProcess; idx++) {
+byte inputByte = drillBuf.getByte(start + idx);
+
+if (firstPattByte != inputByte) {
+  continue;
+}
+return 1;
+  }
+  return 0;
+}
+  }
+
+  /** Handles patterns with length two */
+  private final class MatcherTwo extends MatcherFcn {
+
+private MatcherTwo() {
+}
+
+/** {@inheritDoc} */
+@Override
+protected final int match(int start, int end, DrillBuf drillBuf) {
+  final int lengthToProcess = end - start - 1;
+  final byte firstPattByte  = patternArray[0];
+  final byte secondPattByte = patternArray[1];
+
+  // simplePattern string has meta characters i.e % and _ and escape 
characters removed.
+  // so, we can just directly compare.
+  for (int idx = 0; idx < lengthToProcess; idx++) {
+final byte firstInByte = drillBuf.getByte(start + idx);
 
-final int txtLength = end - start;
+if (firstPattByte != firstInByte) {
+  continue;
+} else {
+  final byte secondInByte = drillBuf.getByte(start + idx +1);
 
-// no match if input string length is less than pattern length
-if (txtLength < patternLength) {
+  if (secondInByte == secondPattByte) {
+return 1;
+  }
+}
+  }
   return 0;
 }
+  }
 
+  /** Handles patterns with length three */
+  private final class MatcherThree extends 

[GitHub] drill issue #1093: DRILL-6093 : Account for simple columns in project cpu co...

2018-01-17 Thread amansinha100
Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/1093
  
+1 .  It would be good to add a unit test using the example in the JIRA. 


---


[GitHub] drill pull request #1094: DRILL-6090: While connecting to drill-bits using J...

2018-01-17 Thread milindt
GitHub user milindt opened a pull request:

https://github.com/apache/drill/pull/1094

DRILL-6090: While connecting to drill-bits using JDBC Driver through …

…Zookeeper, a lot of "Curator-Framework-0" threads are created if the 
connection to the drill-bit is not successful (no drill-bits are up/reachable)

I am using the Drill JDBC driver 1.12.0 to connect to MapR-DB, finding the 
available drill-bits through Zookeeper. When the drill-bits are not up or not 
reachable, the connection fails with the exception "Failure in connecting to 
Drill: oadd.org.apache.drill.exec.rpc.RpcException: Failure setting up ZK for 
client", which is expected, but the number of threads created by 
ZKClusterCoordinator just keeps increasing.

Steps to reproduce the issue:

1. Set up a connection to a drill-bit using the Apache Drill JDBC driver 
1.12.0 through the Zookeeper hosts (port 5181).
2. Stop the drill-bit services, or block the drill-bit IPs using iptables 
rules.
3. Truncate the catalina logs.
4. Try to connect to the drill-bit, or hit a code path that requires a 
connection to the drill-bits.
5. Take a thread dump using kill -QUIT <pid>.
6. Run grep -c "Curator-Framework-0" catalina.out.
7. Observe that the Curator-Framework threads just keep accumulating.

RCA:

1. ZKClusterCoordinator creates the Curator threads in its constructor.
2. ZKClusterCoordinator is instantiated by DrillClient.connect.
3. DrillClient.connect is called in the DrillConnectionImpl constructor.

Fix:

Call DrillConnectionImpl.cleanup() from all the catch blocks in the 
DrillConnectionImpl constructor.
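A hedged sketch of the fix pattern, with illustrative names rather than the actual Drill classes: a constructor that spawns background threads must release them in its catch blocks, otherwise each failed connection attempt leaks those threads.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConnectionSketch implements AutoCloseable {
  // Stands in for the Curator threads that ZKClusterCoordinator starts up front.
  private final ExecutorService coordinatorThreads = Executors.newSingleThreadExecutor();

  public ConnectionSketch(boolean failConnect) {
    try {
      connect(failConnect);
    } catch (RuntimeException e) {
      cleanup();  // the fix: release resources before the constructor unwinds
      throw e;
    }
  }

  private void connect(boolean fail) {
    // Stand-in for the ZK setup that can fail when no drill-bits are reachable.
    if (fail) {
      throw new RuntimeException("Failure setting up ZK for client");
    }
  }

  private void cleanup() {
    coordinatorThreads.shutdownNow();
  }

  @Override
  public void close() {
    cleanup();
  }

  public static void main(String[] args) {
    try {
      new ConnectionSketch(true);
    } catch (RuntimeException e) {
      System.out.println("connect failed: " + e.getMessage());
    }
    // Without cleanup() in the catch block, the executor's non-daemon
    // thread would keep the JVM alive after the failure above.
  }
}
```

Repeating this construction with failConnect=true leaves no extra threads behind, which is exactly the property the grep count in the repro steps is checking.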

[JIRA link](https://issues.apache.org/jira/browse/DRILL-6090)

If this fix is accepted, will there be a release from this branch (1.12.0)? 
Perhaps a minor-version release.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/milindt/drill 1.12.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1094.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1094






---