[GitHub] drill pull request #1045: DRILL-5730 Test Mocking Improvements
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1045#discussion_r162250517 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/TestAffinityCalculator.java --- @@ -21,18 +21,14 @@ import org.apache.drill.exec.ExecTest; import org.apache.drill.exec.proto.CoordinationProtos; -import org.apache.drill.exec.store.parquet.ParquetGroupScan; import org.apache.hadoop.fs.BlockLocation; import org.junit.Test; import com.google.common.collect.ImmutableRangeMap; import com.google.common.collect.Range; public class TestAffinityCalculator extends ExecTest { - static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestAffinityCalculator.class); - - String port = "1234"; - final String path = "path"; + private final String port = "1234"; --- End diff -- What happened to the blocks of code that were removed? Not used? Duplicate? Or, does the unused code suggest that this test is not actually testing what it should? ---
[GitHub] drill pull request #1045: DRILL-5730 Test Mocking Improvements
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1045#discussion_r162244374 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/WriterRecordBatch.java --- @@ -118,7 +118,7 @@ public IterOutcome innerNext() { } catch(IOException ex) { logger.error("Failure during query", ex); kill(false); - context.fail(ex); + context.getExecutorState().fail(ex); --- End diff -- At some point, you may want to lead a cleanup of Drill's failure reporting design. I took a crack at it a while back. We have multiple ways of reporting errors: * Throw a UserException explaining the error * Throw an unchecked exception and let the fragment executor guess the cause. * Return STOP * Tell the fragment executor to fail. (Are we also required to return STOP?) * Return OUT_OF_MEMORY status The proposal is to replace them all with a single solution: throw a UserException. Each operator catches other exceptions and translates them to UserException. Java unwinds the stack just fine; no reason for us to write code to do it via STOP. Then, the Fragment Executor calls close() on all operators. No reason to try to do this cleanup on STOP. (Even if we do, the lower-level operators won't have seen the STOP.) Since failures are hard to test, and have caused no end of problems, having multiple ways to do the same thing is really not that helpful to Drill users. ---
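The single error path described above could be sketched roughly as follows. All names here (UserFacingException, readBatch, runFragment) are hypothetical illustrations, not Drill's actual API; Drill's real class is org.apache.drill.common.exceptions.UserException.

```java
// Hedged sketch of the proposed single error path: every operator wraps
// lower-level exceptions in one user-facing exception type and lets Java
// unwind the stack, instead of returning STOP or OUT_OF_MEMORY statuses.
// All names are hypothetical illustrations, not Drill's actual API.
public class ErrorPathSketch {

  /** Stand-in for Drill's UserException: the one sanctioned failure type. */
  static class UserFacingException extends RuntimeException {
    UserFacingException(String context, Throwable cause) {
      super(context, cause);
    }
  }

  /** An operator translates any internal failure into the single type. */
  static int readBatch(boolean failIo) {
    try {
      if (failIo) {
        throw new java.io.IOException("disk read failed");
      }
      return 1; // one batch read
    } catch (java.io.IOException e) {
      // Translate here instead of returning a STOP status code.
      throw new UserFacingException("Failure while reading batch", e);
    }
  }

  /** The fragment-executor level: catch once, close operators, report. */
  public static String runFragment(boolean failIo) {
    try {
      readBatch(failIo);
      return "OK";
    } catch (UserFacingException e) {
      // In real code: call close() on all operators, then report to user.
      return "FAILED: " + e.getMessage() + " (" + e.getCause().getMessage() + ")";
    }
  }

  public static void main(String[] args) {
    System.out.println(runFragment(false));
    System.out.println(runFragment(true));
  }
}
```

The point of the design is visible in `runFragment`: there is exactly one catch site, so cleanup logic is written once rather than once per status code.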
[GitHub] drill pull request #1045: DRILL-5730 Test Mocking Improvements
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1045#discussion_r162229362 --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveUtilities.java --- @@ -288,7 +288,7 @@ public static void populateVector(final ValueVector vector, final DrillBuf manag } } - public static MajorType getMajorTypeFromHiveTypeInfo(final TypeInfo typeInfo, final OptionManager options) { + public static MajorType getMajorTypeFromHiveTypeInfo(final TypeInfo typeInfo, final OptionSet options) { --- End diff -- Note that DRILL-6049 removes OptionSet in favor of OptionManager. (Thanks to the work you and Jyothsna did, we can now use that class in tests.) ---
[GitHub] drill pull request #1045: DRILL-5730 Test Mocking Improvements
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1045#discussion_r162251030 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/OperatorFixture.java --- @@ -86,37 +117,35 @@ * Multiple threads of execution. * */ - public class OperatorFixture extends BaseFixture implements AutoCloseable { + public OperatorContext operatorContext(PhysicalOperator config) { +return new MockOperatorContext(context, allocator(), config); + } + /** * Builds an operator fixture based on a set of config options and system/session * options. */ - - public static class OperatorFixtureBuilder + public static class Builder --- End diff -- Your PR is accidentally undoing a name change from a prior commit. Please check if we have any other collisions in this or related files. ---
[GitHub] drill pull request #1045: DRILL-5730 Test Mocking Improvements
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1045#discussion_r162251160 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/OperatorFixture.java --- @@ -127,20 +156,26 @@ public OperatorFixture build() { * uses the same code generation mechanism as the full Drill, but * provide test-specific versions of various other services. */ - - public static class TestFragmentContext extends BaseFragmentContext { - + public static class MockFragmentContext extends BaseFragmentContext { --- End diff -- More collisions? Can't imagine you made this many changes here... ---
[GitHub] drill pull request #1045: DRILL-5730 Test Mocking Improvements
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1045#discussion_r162246538 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/batch/BaseRawBatchBuffer.java --- @@ -184,7 +184,7 @@ public RawFragmentBatch getNext() throws IOException { return null; } -if (context.isOverMemoryLimit()) { +if (context.getAllocator().isOverLimit()) { --- End diff -- This one is fine. We must expose an allocator even for testing. ---
[GitHub] drill pull request #1045: DRILL-5730 Test Mocking Improvements
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1045#discussion_r162251307 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/OperatorFixture.java --- @@ -175,19 +210,189 @@ public DrillConfig getConfig() { } @Override -public DrillbitContext getDrillbitContext() { - throw new UnsupportedOperationException("Drillbit context not available for operator unit tests"); +public ExecutorService getScanDecodeExecutor() { + return null; +} + +@Override +public ExecutorService getScanExecutor() { + return null; +} + +@Override +public ExecutorService getExecutor() { + return null; +} + +@Override +public ExecutorState getExecutorState() { + return executorState; +} + +@Override +public BufferAllocator getNewChildAllocator(String operatorName, int operatorId, +long initialReservation, long maximumReservation) { + return allocator.newChildAllocator( +"op:" + operatorId + ":" + operatorName, +initialReservation, +maximumReservation); } @Override -protected CodeCompiler getCompiler() { +public ExecProtos.FragmentHandle getHandle() { + return ExecProtos.FragmentHandle.newBuilder().build(); +} + +@Override +public BufferAllocator getAllocator() { + return allocator; +} + +@Override +public OperatorContext newOperatorContext(PhysicalOperator popConfig) { + return mockOperatorContext; +} + +@Override +public OperatorContext newOperatorContext(PhysicalOperator popConfig, OperatorStats stats) { + return mockOperatorContext; +} + +@Override +public SchemaPlus getFullRootSchema() { + return null; +} + +@Override +public String getQueryUserName() { + return null; +} + +@Override +public String getFragIdString() { + return null; +} + +@Override +public CodeCompiler getCompiler() { return compiler; } @Override protected BufferManager getBufferManager() { return bufferManager; } + +@Override +public void close() { + bufferManager.close(); +} + +@Override +public ContextInformation getContextInformation() { + return null; +} + 
+@Override +public PartitionExplorer getPartitionExplorer() { + return null; +} + +@Override +public ValueHolder getConstantValueHolder(String value, TypeProtos.MinorType type, FunctionholderInitializer) { + return null; +} + } + + public static class MockExecutorFragmentContext extends MockFragmentContext implements ExecutorFragmentContext { + +public MockExecutorFragmentContext(DrillConfig config, OptionManager optionManager, BufferAllocator allocator) { + super(config, optionManager, allocator); +} + +@Override +public PhysicalPlanReader getPlanReader() { + throw new UnsupportedOperationException(); +} + +@Override +public ClusterCoordinator getClusterCoordinator() { + throw new UnsupportedOperationException(); +} + +@Override +public CoordinationProtos.DrillbitEndpoint getForemanEndpoint() { + throw new UnsupportedOperationException(); +} + +@Override +public CoordinationProtos.DrillbitEndpoint getEndpoint() { + throw new UnsupportedOperationException(); +} + +@Override +public Collection getBits() { + throw new UnsupportedOperationException(); +} + +@Override +public OperatorCreatorRegistry getOperatorCreatorRegistry() { + return null; +} + +@Override +public void setBuffers(IncomingBuffers buffers) { + +} + +@Override +public Set > getUserConnections() { + return null; +} + +@Override +public QueryProfileStoreContext getProfileStoreContext() { + return null; +} + +@Override +public void waitForSendComplete() { + throw new UnsupportedOperationException(); +} + +@Override +public AccountingDataTunnel getDataTunnel(CoordinationProtos.DrillbitEndpoint endpoint) { --- End diff -- We really don't
[GitHub] drill pull request #1045: DRILL-5730 Test Mocking Improvements
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1045#discussion_r162246947 --- Diff: exec/java-exec/src/test/java/org/apache/drill/PlanningBase.java --- @@ -84,28 +82,17 @@ protected void testSqlPlan(String sqlCommands) throws Exception { systemOptions.init(); @SuppressWarnings("resource") final UserSession userSession = UserSession.Builder.newBuilder().withOptionManager(systemOptions).build(); -final SessionOptionManager sessionOptions = (SessionOptionManager) userSession.getOptions(); +final SessionOptionManager sessionOptions = userSession.getOptions(); final QueryOptionManager queryOptions = new QueryOptionManager(sessionOptions); final ExecutionControls executionControls = new ExecutionControls(queryOptions, DrillbitEndpoint.getDefaultInstance()); -new NonStrictExpectations() { - { -dbContext.getMetrics(); -result = new MetricRegistry(); -dbContext.getAllocator(); -result = allocator; -dbContext.getConfig(); -result = config; -dbContext.getOptionManager(); -result = systemOptions; -dbContext.getStoreProvider(); -result = provider; -dbContext.getClasspathScan(); -result = scanResult; -dbContext.getLpPersistence(); -result = logicalPlanPersistence; - } -}; +when(dbContext.getMetrics()).thenReturn(new MetricRegistry()); +when(dbContext.getAllocator()).thenReturn(allocator); +when(dbContext.getConfig()).thenReturn(config); +when(dbContext.getOptionManager()).thenReturn(systemOptions); +when(dbContext.getStoreProvider()).thenReturn(provider); +when(dbContext.getClasspathScan()).thenReturn(scanResult); +when(dbContext.getLpPersistence()).thenReturn(logicalPlanPersistence); --- End diff -- Here, we should have a test-time drillbit context implementation that extends a common interface -- just as we are now doing for the fragment context. Why? We want classes to have well-defined interfaces, and want to minimize coupling. (Basic OO design rules.)
It should be made very clear when some planning test needs additional info. And, if it needs that info, then the set of info should be varied to ensure all code paths are tested. That can't be done if we use one generic set of global state for all tests. This gets to philosophy also. Our tests should strive to cover all possible states. That can only be done at the unit level. And, it can only be done if we drive the tests with a wide variety of inputs. If we bolt things together, then, as tests, we run only a minimum: just enough to show that "things work" for one or two cases. We then rely on the users for more complex test cases. That is great -- but tends to upset the users... ---
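The interface-plus-test-implementation approach suggested above (in place of per-getter Mockito stubs) could look something like this sketch. The names (BitContext, TestBitContext, fitsInMemory) are hypothetical stand-ins for Drill's DrillbitContext, chosen purely for illustration.

```java
// Hedged sketch: instead of stubbing each getter with Mockito, define the
// context as an interface and give tests a concrete, reusable implementation.
// All names here are hypothetical; Drill's real class is DrillbitContext.
import java.util.HashMap;
import java.util.Map;

public class TestContextSketch {

  /** The narrow interface that planning code depends on. */
  interface BitContext {
    String getConfig(String key);
    long getAllocatorLimit();
  }

  /** A plain test-time implementation -- no mocking framework required. */
  static class TestBitContext implements BitContext {
    private final Map<String, String> config = new HashMap<>();
    private final long allocatorLimit;

    TestBitContext(long allocatorLimit) {
      this.allocatorLimit = allocatorLimit;
    }

    TestBitContext with(String key, String value) {
      config.put(key, value);
      return this;
    }

    @Override public String getConfig(String key) { return config.get(key); }
    @Override public long getAllocatorLimit() { return allocatorLimit; }
  }

  /** Code under test sees only the interface, never the implementation. */
  static boolean fitsInMemory(BitContext context, long bytesNeeded) {
    return bytesNeeded <= context.getAllocatorLimit();
  }

  public static void main(String[] args) {
    BitContext context = new TestBitContext(1024).with("planner.type", "test");
    System.out.println(fitsInMemory(context, 512));
    System.out.println(fitsInMemory(context, 4096));
  }
}
```

Because the test double is an ordinary class, it can also be reused from ad-hoc tools outside JUnit, and each test can vary its state to drive different code paths.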
[GitHub] drill pull request #1045: DRILL-5730 Test Mocking Improvements
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1045#discussion_r162251468 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/OperatorFixture.java --- @@ -351,8 +603,110 @@ public OperatorStats getStats() { } } - public OperatorContext operatorContext(PhysicalOperator config) { -return new TestOperatorContext(context, allocator(), config, stats); + public static class MockPhysicalOperator implements PhysicalOperator { --- End diff -- Seems overkill. Most times, a test that needs an operator will provide one. The dummy one that was in this code (or maybe in my latest PR) handles those cases where we don't even have an operator (record batch) so any operator (definition) will do. ---
[GitHub] drill pull request #1045: DRILL-5730 Test Mocking Improvements
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1045#discussion_r162229669 --- Diff: contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcBatchCreator.java --- @@ -33,10 +32,10 @@ public class JdbcBatchCreator implements BatchCreator { @Override public ScanBatch getBatch(FragmentContext context, JdbcSubScan config, - List children) throws ExecutionSetupException { +List children) throws ExecutionSetupException { Preconditions.checkArgument(children.isEmpty()); JdbcStoragePlugin plugin = config.getPlugin(); -RecordReader reader = new JdbcRecordReader(context, plugin.getSource(), config.getSql(), plugin.getName()); +RecordReader reader = new JdbcRecordReader(plugin.getSource(), config.getSql(), plugin.getName()); --- End diff -- Now I can't find the file either. Consider this issue resolved. ---
[GitHub] drill pull request #1045: DRILL-5730 Test Mocking Improvements
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1045#discussion_r162246792 --- Diff: exec/java-exec/src/test/java/org/apache/drill/PlanningBase.java --- @@ -58,23 +55,24 @@ import com.google.common.collect.ImmutableList; import com.google.common.io.Resources; -public class PlanningBase extends ExecTest{ - //private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(PlanningBase.class); +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.when; --- End diff -- Suggestion. Do not depend on mocking in test. Design code so it works well in tests without mocks. Resort to mocks only when absolutely necessary. That is, we should be trying to do LESS mocking, not MORE. This way, the same test code can be used in tools, ad-hoc tests and so on without the JUnit machinery. This is a philosophical discussion, and one that should have input from more of the team. ---
[GitHub] drill pull request #1045: DRILL-5730 Test Mocking Improvements
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1045#discussion_r162246482 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/batch/BaseRawBatchBuffer.java --- @@ -172,7 +172,7 @@ public RawFragmentBatch getNext() throws IOException { } catch (final InterruptedException e) { // We expect that the interrupt means the fragment is canceled or failed, so we should kill this buffer - if (!context.shouldContinue()) { + if (!context.getExecutorState().shouldContinue()) { --- End diff -- Let's think about this from a testing perspective. The context can be implemented in a way that allows testing. That is what the use of an interface allows. Tests implement the context one way, real code another way, but the interface is the same. Here, we expose the executor state. Is this also defined as an interface? If not, we are letting implementation leak out, and we won't be able to create a test-only version. It would be better to leave these methods on the context itself so that their implementation is hidden and thus we can create test versions. ---
[GitHub] drill issue #1095: DRILL-6085: Fixed spontaneous vm exits on Travis.
Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1095 Note the test failure on Travis is the infamous Sort operator test failure, which is unrelated to this change: ``` Failed tests: TestSortImpl.testLargeBatch:498->runJumboBatchTest:471->runLargeSortTest:451 Value of 1:0 expected:<0> but was:<1> ``` ---
[GitHub] drill pull request #1091: DRILL-6071: Limit batch size for flatten operator
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1091#discussion_r162226190 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/RecordBatchSizer.java --- @@ -380,14 +394,19 @@ private void measureColumn(ValueVector v, String prefix) { // vectors do consume space, so visit columns recursively. switch (v.getField().getType().getMinorType()) { -case MAP: - expandMap((AbstractMapVector) v, prefix + v.getField().getName() + "."); - break; -case LIST: - expandList((RepeatedListVector) v, prefix + v.getField().getName() + "."); - break; -default: - v.collectLedgers(ledgers); + case MAP: +expandMap((AbstractMapVector) v, prefix + v.getField().getName() + "."); +break; + case LIST: --- End diff -- Here, check mode. If mode is REPEATED, this is a RepeatedListVector and should go down one path. Otherwise, it is a ListVector (possible list of unions) and should go down another path. ---
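The suggested LIST dispatch could be sketched as below. The enum values mirror Drill's MinorType/DataMode protobuf enums, but the method-name strings (expandRepeatedList, expandList, collectLedgers) are illustrative stand-ins for the real RecordBatchSizer paths.

```java
// Hedged sketch of the suggested LIST handling: branch on the data mode to
// distinguish a RepeatedListVector (mode REPEATED) from a plain ListVector
// (mode OPTIONAL, possibly a list of unions). Names are illustrative only.
public class ListModeDispatchSketch {

  enum MinorType { MAP, LIST, INT, VARCHAR }
  enum DataMode { OPTIONAL, REQUIRED, REPEATED }

  static String measureColumn(MinorType type, DataMode mode) {
    switch (type) {
      case MAP:
        return "expandMap";
      case LIST:
        // The fix under discussion: LIST alone is ambiguous, so check mode.
        if (mode == DataMode.REPEATED) {
          return "expandRepeatedList"; // RepeatedListVector path
        }
        return "expandList";           // ListVector (list-of-union) path
      default:
        return "collectLedgers";       // simple vectors just report buffers
    }
  }

  public static void main(String[] args) {
    System.out.println(measureColumn(MinorType.LIST, DataMode.REPEATED));
    System.out.println(measureColumn(MinorType.LIST, DataMode.OPTIONAL));
    System.out.println(measureColumn(MinorType.INT, DataMode.REQUIRED));
  }
}
```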
[GitHub] drill pull request #1091: DRILL-6071: Limit batch size for flatten operator
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1091#discussion_r162225383 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractBase.java --- @@ -29,9 +28,12 @@ public static long INIT_ALLOCATION = 1_000_000L; public static long MAX_ALLOCATION = 10_000_000_000L; + // Default output batch size, 512MB + public static long OUTPUT_BATCH_SIZE = 512 * 1024 * 1024L; --- End diff -- Too large. The sort & hash agg operators often receive just 20-40 MB on a large cluster. (That is, itself, an issue, but one that has proven very difficult to resolve.) So, the output batch size must be no larger than 1/3 this size (for sort). Probably some team discussion is required to agree on a good number, and on the work needed to ensure that sort, hash agg and hash join are given sufficient memory for the selected batch size. ---
[GitHub] drill pull request #1091: DRILL-6071: Limit batch size for flatten operator
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1091#discussion_r162225832 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java --- @@ -94,8 +98,54 @@ private void clear() { } } + private class FlattenMemoryManager { +private final int outputRowCount; +private static final int OFFSET_VECTOR_WIDTH = 4; +private static final int WORST_CASE_FRAGMENTATION_FACTOR = 2; +private static final int MAX_NUM_ROWS = 64 * 1024; +private static final int MIN_NUM_ROWS = 1; + +private FlattenMemoryManager(RecordBatch incoming, long outputBatchSize, SchemaPath flattenColumn) { + // Get sizing information for the batch. + RecordBatchSizer sizer = new RecordBatchSizer(incoming); + + final TypedFieldId typedFieldId = incoming.getValueVectorId(flattenColumn); + final MaterializedField field = incoming.getSchema().getColumn(typedFieldId.getFieldIds()[0]); + + // Get column size of flatten column. + RecordBatchSizer.ColumnSize columnSize = sizer.getColumn(incoming.getValueAccessorById(field.getValueClass(), + typedFieldId.getFieldIds()).getValueVector(), field.getName()); + + // Average rowWidth of flatten column + final int avgRowWidthFlattenColumn = RecordBatchSizer.safeDivide(columnSize.dataSize, incoming.getRecordCount()); + + // Average rowWidth excluding the flatten column. + final int avgRowWidthWithOutFlattenColumn = sizer.netRowWidth() - avgRowWidthFlattenColumn; + + // Average rowWidth of single element in the flatten list. + // subtract the offset vector size from column data size. + final int avgRowWidthSingleFlattenEntry = + RecordBatchSizer.safeDivide(columnSize.dataSize - (OFFSET_VECTOR_WIDTH * columnSize.valueCount), columnSize.elementCount); + + // Average rowWidth of outgoing batch. 
+ final int avgOutgoingRowWidth = avgRowWidthWithOutFlattenColumn + avgRowWidthSingleFlattenEntry; + + // Number of rows in outgoing batch + outputRowCount = Math.max(MIN_NUM_ROWS, Math.min(MAX_NUM_ROWS, + RecordBatchSizer.safeDivide((outputBatchSize/WORST_CASE_FRAGMENTATION_FACTOR), avgOutgoingRowWidth))); +} --- End diff -- Would be great to log this info in debug mode to aid in tuning. Doing this in the sort proved highly valuable. ---
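The FlattenMemoryManager arithmetic quoted in the diff above can be exercised standalone, with the debug-style logging the review asks for sketched as a simple print. The byte counts in main() are invented example inputs, not measurements from Drill, and safeDivide here is an assumed ceiling-division helper.

```java
// Hedged sketch of the FlattenMemoryManager sizing math above, plus a
// debug-style log line to aid tuning (as suggested for the sort operator).
// The input sizes used in main() are made-up example values.
public class FlattenSizerSketch {

  static final int OFFSET_VECTOR_WIDTH = 4;
  static final int WORST_CASE_FRAGMENTATION_FACTOR = 2;
  static final int MAX_NUM_ROWS = 64 * 1024;
  static final int MIN_NUM_ROWS = 1;

  /** Assumed behavior of RecordBatchSizer.safeDivide: ceiling, 0-safe. */
  static int safeDivide(long num, long denom) {
    return denom == 0 ? 0 : (int) Math.ceil((double) num / denom);
  }

  static int outputRowCount(long outputBatchSize, long flattenColumnDataSize,
                            int incomingRowCount, int flattenValueCount,
                            int flattenElementCount, int netRowWidth) {
    // Average row width contributed by the flatten column alone.
    int avgRowWidthFlattenColumn = safeDivide(flattenColumnDataSize, incomingRowCount);
    // Average row width of everything except the flatten column.
    int avgRowWidthWithoutFlattenColumn = netRowWidth - avgRowWidthFlattenColumn;
    // Average width of one element of the flattened list (minus offsets).
    int avgRowWidthSingleFlattenEntry = safeDivide(
        flattenColumnDataSize - (long) OFFSET_VECTOR_WIDTH * flattenValueCount,
        flattenElementCount);
    int avgOutgoingRowWidth = avgRowWidthWithoutFlattenColumn + avgRowWidthSingleFlattenEntry;
    int rows = Math.max(MIN_NUM_ROWS, Math.min(MAX_NUM_ROWS,
        safeDivide(outputBatchSize / WORST_CASE_FRAGMENTATION_FACTOR, avgOutgoingRowWidth)));
    // Debug-mode logging the review suggests, to make tuning observable.
    System.out.println("flatten sizer: avgOutgoingRowWidth=" + avgOutgoingRowWidth
        + " outputRowCount=" + rows);
    return rows;
  }

  public static void main(String[] args) {
    // Example: 16 MB batch limit, 1 MB flatten-column data over 1000 rows.
    outputRowCount(16L * 1024 * 1024, 1_000_000, 1000, 1000, 10_000, 1200);
  }
}
```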
[GitHub] drill pull request #1091: DRILL-6071: Limit batch size for flatten operator
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1091#discussion_r162225507 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractBase.java --- @@ -29,9 +28,12 @@ public static long INIT_ALLOCATION = 1_000_000L; public static long MAX_ALLOCATION = 10_000_000_000L; + // Default output batch size, 512MB + public static long OUTPUT_BATCH_SIZE = 512 * 1024 * 1024L; --- End diff -- Also, if the size is a config parameter, shouldn't the default be set in the drill-module.conf file, not in a Java constant? ---
[GitHub] drill pull request #1091: DRILL-6071: Limit batch size for flatten operator
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1091#discussion_r162226341 --- Diff: exec/java-exec/src/main/resources/drill-module.conf --- @@ -312,12 +312,13 @@ drill.exec: { memory: { operator: { max: 200, - initial: 1000 + initial: 1000, + output_batch_size : 536870912 --- End diff -- See note above. BTW: If you add a new memory-related param, you can use the methods specifically designed for memory. Then you can say things such as 512K or 2MB. The sort code does this. See also the HOCON documentation. ---
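Using HOCON's human-readable size strings (which Typesafe Config parses via methods such as getBytes()), the new parameter could be written as in the fragment below. This is a sketch of the suggested form only, not the committed configuration.

```hocon
# Sketch: memory-related params can use readable size suffixes (K, M, G)
# when read through Config.getBytes(), instead of a raw byte count.
drill.exec: {
  memory: {
    operator: {
      max: 200,
      initial: 1000,
      # 512 MB default output batch size, readable at a glance:
      output_batch_size: 512M
    }
  }
}
```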
[GitHub] drill pull request #1091: DRILL-6071: Limit batch size for flatten operator
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1091#discussion_r162225888 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenTemplate.java --- @@ -38,23 +38,20 @@ public abstract class FlattenTemplate implements Flattener { private static final Logger logger = LoggerFactory.getLogger(FlattenTemplate.class); - private static final int OUTPUT_BATCH_SIZE = 4*1024; - private static final int OUTPUT_MEMORY_LIMIT = 512 * 1024 * 1024; + private static final int OUTPUT_ROW_COUNT = 64*1024; --- End diff -- `ValueVector.MAX_ROW_COUNT` ---
[GitHub] drill pull request #1091: DRILL-6071: Limit batch size for flatten operator
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1091#discussion_r162225575 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java --- @@ -94,8 +98,54 @@ private void clear() { } } + private class FlattenMemoryManager { +private final int outputRowCount; +private static final int OFFSET_VECTOR_WIDTH = 4; +private static final int WORST_CASE_FRAGMENTATION_FACTOR = 2; +private static final int MAX_NUM_ROWS = 64 * 1024; --- End diff -- `ValueVector.MAX_ROW_COUNT` ---
[GitHub] drill issue #1090: DRILL-6080: Sort incorrectly limits batch size to 65535 r...
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1090 Pushed a new commit that fixes an indexing error. Also removes the "unsafe" methods in favor of the "set" methods in DrillBuf as Vlad requested. @vrozov, please take another look. ---
[GitHub] drill issue #1095: DRILL-6085: Fixed spontaneous vm exits on Travis.
Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1095 @arina-ielchiieva ---
[GitHub] drill pull request #1095: DRILL-6085: Fixed spontaneous vm exits on Travis.
GitHub user ilooner opened a pull request: https://github.com/apache/drill/pull/1095 DRILL-6085: Fixed spontaneous vm exits on Travis. This error occurs sporadically on Travis. ``` Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on project drill-java-exec: Execution default-test of goal org.apache.maven.plugins:maven-surefire-plugin:2.17:test failed: The forked VM terminated without properly saying goodbye. VM crash or System.exit called? ``` I believe the issue is caused by the fact that our Travis containers only have 4 GB of memory, while our test VMs are given up to 4 GB of direct memory plus 4 GB of heap memory (for a total of up to 8 GB). Because we are promising more memory than we have, it seems we occasionally hit the limit of our 4 GB VM. I have added the ```sudo: "required"``` option to travis.yml, which increases the amount of memory in the container to 7 GB. Before this change, the VM exits occurred for me about 30% of the time; with it, I ran 5 consecutive builds without a VM exit. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ilooner/drill DRILL-6085 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1095.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1095 commit 4b2c4f0d1a3175e6a8cffa5be42cdf51632ae1c2 Author: Timothy Farkas, Date: 2018-01-16T20:11:25Z DRILL-6085: Fixed spontaneous vm exits on Travis. ---
[GitHub] drill pull request #1057: DRILL-5993 Add Generic Copiers With Append Methods
Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1057#discussion_r162196004 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/AbstractCopier.java --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.physical.impl.svremover; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.types.Types; +import org.apache.drill.exec.exception.SchemaChangeException; +import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.record.VectorContainer; +import org.apache.drill.exec.record.VectorWrapper; +import org.apache.drill.exec.vector.AllocationHelper; +import org.apache.drill.exec.vector.ValueVector; + +public abstract class AbstractCopier implements Copier { + protected ValueVector[] vvOut; + protected VectorContainer outgoing; + + @Override + public void setup(RecordBatch incoming, VectorContainer outgoing) throws SchemaChangeException { --- End diff -- It is not used by the AbstractCopier itself but it is used by the AbstractSV2Copier and AbstractSV4Copier which override the method. ---
[GitHub] drill pull request #1057: DRILL-5993 Add Generic Copiers With Append Methods
Github user Ben-Zvi commented on a diff in the pull request: https://github.com/apache/drill/pull/1057#discussion_r162193983 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/AbstractCopier.java --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.physical.impl.svremover; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.types.Types; +import org.apache.drill.exec.exception.SchemaChangeException; +import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.record.VectorContainer; +import org.apache.drill.exec.record.VectorWrapper; +import org.apache.drill.exec.vector.AllocationHelper; +import org.apache.drill.exec.vector.ValueVector; + +public abstract class AbstractCopier implements Copier { + protected ValueVector[] vvOut; + protected VectorContainer outgoing; + + @Override + public void setup(RecordBatch incoming, VectorContainer outgoing) throws SchemaChangeException { --- End diff -- The "incoming" parameter is not used anywhere ... ---
[GitHub] drill issue #1066: DRILL-3993: Changes to support Calcite 1.15
Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/1066 Other than one comment above, rest LGTM. +1 ---
[GitHub] drill issue #1066: DRILL-3993: Changes to support Calcite 1.15
Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/1066 @vvysotskyi could you pls separate out the HashAggregate OOM related change from this PR and file a separate JIRA for it since it is unrelated to the calcite rebase per-se. Lumping them together causes confusion. Thanks. ---
[GitHub] drill pull request #1057: DRILL-5993 Add Generic Copiers With Append Methods
Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1057#discussion_r162175059 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/AbstractSV4Copier.java --- @@ -17,32 +17,30 @@ */ package org.apache.drill.exec.physical.impl.svremover; -import javax.inject.Named; - import org.apache.drill.common.types.TypeProtos.MajorType; import org.apache.drill.common.types.Types; import org.apache.drill.exec.exception.SchemaChangeException; -import org.apache.drill.exec.ops.FragmentContext; import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.record.VectorContainer; import org.apache.drill.exec.record.VectorWrapper; import org.apache.drill.exec.record.selection.SelectionVector4; import org.apache.drill.exec.vector.AllocationHelper; +import org.apache.drill.exec.vector.ValueVector; -public abstract class CopierTemplate4 implements Copier{ - static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(CopierTemplate4.class); +public abstract class AbstractSV4Copier implements Copier { --- End diff -- Done ---
[GitHub] drill pull request #1066: DRILL-3993: Changes to support Calcite 1.15
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/1066#discussion_r162171143 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/DrillRelBuilder.java --- @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to you under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner; + +import org.apache.calcite.plan.Context; +import org.apache.calcite.plan.Contexts; +import org.apache.calcite.plan.RelOptCluster; +import org.apache.calcite.plan.RelOptSchema; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.core.RelFactories; +import org.apache.calcite.tools.RelBuilder; +import org.apache.calcite.tools.RelBuilderFactory; +import org.apache.calcite.util.Util; + +public class DrillRelBuilder extends RelBuilder { + private final RelFactories.FilterFactory filterFactory; + + protected DrillRelBuilder(Context context, RelOptCluster cluster, RelOptSchema relOptSchema) { +super(context, cluster, relOptSchema); +this.filterFactory = +Util.first(context.unwrap(RelFactories.FilterFactory.class), +RelFactories.DEFAULT_FILTER_FACTORY); + } + + /** + * Original method {@link RelBuilder#empty} returns empty values rel. 
+ * In the order to preserve dara row types, filter with false predicate is created. + */ + @Override + public RelBuilder empty() { --- End diff -- Ok, thanks for clarifying. The schema-on-read does make it difficult to avoid overriding the empty() behavior. ---
[GitHub] drill pull request #1066: DRILL-3993: Changes to support Calcite 1.15
Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1066#discussion_r162157833 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/DrillRelBuilder.java --- @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to you under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner; + +import org.apache.calcite.plan.Context; +import org.apache.calcite.plan.Contexts; +import org.apache.calcite.plan.RelOptCluster; +import org.apache.calcite.plan.RelOptSchema; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.core.RelFactories; +import org.apache.calcite.tools.RelBuilder; +import org.apache.calcite.tools.RelBuilderFactory; +import org.apache.calcite.util.Util; + +public class DrillRelBuilder extends RelBuilder { + private final RelFactories.FilterFactory filterFactory; + + protected DrillRelBuilder(Context context, RelOptCluster cluster, RelOptSchema relOptSchema) { +super(context, cluster, relOptSchema); +this.filterFactory = +Util.first(context.unwrap(RelFactories.FilterFactory.class), +RelFactories.DEFAULT_FILTER_FACTORY); + } + + /** + * Original method {@link RelBuilder#empty} returns empty values rel. 
+ * In the order to preserve dara row types, filter with false predicate is created. --- End diff -- Thanks, fixed ---
[GitHub] drill pull request #1066: DRILL-3993: Changes to support Calcite 1.15
Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/1066#discussion_r162161040 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/DrillRelBuilder.java --- @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to you under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner; + +import org.apache.calcite.plan.Context; +import org.apache.calcite.plan.Contexts; +import org.apache.calcite.plan.RelOptCluster; +import org.apache.calcite.plan.RelOptSchema; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.core.RelFactories; +import org.apache.calcite.tools.RelBuilder; +import org.apache.calcite.tools.RelBuilderFactory; +import org.apache.calcite.util.Util; + +public class DrillRelBuilder extends RelBuilder { + private final RelFactories.FilterFactory filterFactory; + + protected DrillRelBuilder(Context context, RelOptCluster cluster, RelOptSchema relOptSchema) { +super(context, cluster, relOptSchema); +this.filterFactory = +Util.first(context.unwrap(RelFactories.FilterFactory.class), +RelFactories.DEFAULT_FILTER_FACTORY); + } + + /** + * Original method {@link RelBuilder#empty} returns empty values rel. 
+ * In the order to preserve dara row types, filter with false predicate is created. + */ + @Override + public RelBuilder empty() { --- End diff -- In some cases, RowType of the input may not be known at the moment when this method is called (some of the columns or all columns were not dynamically discovered yet), therefore I had to make changes directly in Calcite to allow using of custom RelBuilder and make changes in Drill to pass custom RelBuilder into them. For more details please see [1] and [2]. [1] https://lists.apache.org/list.html?d...@calcite.apache.org:lte=1y:Make%20RelBuilder.filter%28%29%20configurable [2] https://issues.apache.org/jira/browse/CALCITE-2043 ---
[GitHub] drill pull request #1066: DRILL-3993: Changes to support Calcite 1.15
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/1066#discussion_r157854723 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/DrillRelBuilder.java --- @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to you under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner; + +import org.apache.calcite.plan.Context; +import org.apache.calcite.plan.Contexts; +import org.apache.calcite.plan.RelOptCluster; +import org.apache.calcite.plan.RelOptSchema; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.core.RelFactories; +import org.apache.calcite.tools.RelBuilder; +import org.apache.calcite.tools.RelBuilderFactory; +import org.apache.calcite.util.Util; + +public class DrillRelBuilder extends RelBuilder { + private final RelFactories.FilterFactory filterFactory; + + protected DrillRelBuilder(Context context, RelOptCluster cluster, RelOptSchema relOptSchema) { +super(context, cluster, relOptSchema); +this.filterFactory = +Util.first(context.unwrap(RelFactories.FilterFactory.class), +RelFactories.DEFAULT_FILTER_FACTORY); + } + + /** + * Original method {@link RelBuilder#empty} returns empty values rel. 
+ * In the order to preserve dara row types, filter with false predicate is created. + */ + @Override + public RelBuilder empty() { --- End diff -- Is a separate DrillRelBuilder absolutely needed ? Should the original behavior of empty() in Calcite be changed to preserve the RowType of the input ? The fewer things we extend in Drill the better. ---
[GitHub] drill pull request #1066: DRILL-3993: Changes to support Calcite 1.15
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/1066#discussion_r157853167 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/DrillRelBuilder.java --- @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to you under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner; + +import org.apache.calcite.plan.Context; +import org.apache.calcite.plan.Contexts; +import org.apache.calcite.plan.RelOptCluster; +import org.apache.calcite.plan.RelOptSchema; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.core.RelFactories; +import org.apache.calcite.tools.RelBuilder; +import org.apache.calcite.tools.RelBuilderFactory; +import org.apache.calcite.util.Util; + +public class DrillRelBuilder extends RelBuilder { + private final RelFactories.FilterFactory filterFactory; + + protected DrillRelBuilder(Context context, RelOptCluster cluster, RelOptSchema relOptSchema) { +super(context, cluster, relOptSchema); +this.filterFactory = +Util.first(context.unwrap(RelFactories.FilterFactory.class), +RelFactories.DEFAULT_FILTER_FACTORY); + } + + /** + * Original method {@link RelBuilder#empty} returns empty values rel. 
+ * In the order to preserve dara row types, filter with false predicate is created. --- End diff -- 'data' spelling ---
[GitHub] drill pull request #1072: DRILL-5879: Improved SQL Pattern Contains Performa...
Github user sachouche commented on a diff in the pull request: https://github.com/apache/drill/pull/1072#discussion_r162124359 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java --- @@ -19,44 +19,286 @@ import io.netty.buffer.DrillBuf; -public class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher { +/** SQL Pattern Contains implementation */ +public final class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher { + private final MatcherFcn matcherFcn; public SqlPatternContainsMatcher(String patternString) { super(patternString); + +// Pattern matching is 1) a CPU intensive operation and 2) pattern and input dependent. The conclusion is +// that there is no single implementation that can do it all well. So, we use multiple implementations +// chosen based on the pattern length. +if (patternLength == 0) { + matcherFcn = new MatcherZero(); +} else if (patternLength == 1) { + matcherFcn = new MatcherOne(); +} else if (patternLength == 2) { + matcherFcn = new MatcherTwo(); +} else if (patternLength == 3) { + matcherFcn = new MatcherThree(); +} else if (patternLength < 10) { + matcherFcn = new MatcherN(); +} else { + matcherFcn = new BoyerMooreMatcher(); +} } @Override public int match(int start, int end, DrillBuf drillBuf) { +return matcherFcn.match(start, end, drillBuf); + } + + //-- + // Inner Data Structure + // -- + + /** Abstract matcher class to allow us pick the most efficient implementation */ + private abstract class MatcherFcn { +protected final byte[] patternArray; + +protected MatcherFcn() { + assert patternByteBuffer.hasArray(); + + patternArray = patternByteBuffer.array(); +} + +/** + * @return 1 if the pattern was matched; 0 otherwise + */ +protected abstract int match(int start, int end, DrillBuf drillBuf); + } + + /** Handles patterns with length one */ + private final class MatcherZero extends MatcherFcn { -if (patternLength == 0) { // Everything should match for null pattern 
string +private MatcherZero() { +} + +/** {@inheritDoc} */ +@Override +protected final int match(int start, int end, DrillBuf drillBuf) { return 1; } + } + + /** Handles patterns with length one */ + private final class MatcherOne extends MatcherFcn { + +private MatcherOne() { + super(); +} + +/** {@inheritDoc} */ +@Override +protected final int match(int start, int end, DrillBuf drillBuf) { + final int lengthToProcess = end - start; + final byte firstPattByte = patternArray[0]; + + // simplePattern string has meta characters i.e % and _ and escape characters removed. + // so, we can just directly compare. + for (int idx = 0; idx < lengthToProcess; idx++) { +byte inputByte = drillBuf.getByte(start + idx); + +if (firstPattByte != inputByte) { + continue; +} +return 1; + } + return 0; +} + } + + /** Handles patterns with length two */ + private final class MatcherTwo extends MatcherFcn { + +private MatcherTwo() { +} + +/** {@inheritDoc} */ +@Override +protected final int match(int start, int end, DrillBuf drillBuf) { + final int lengthToProcess = end - start - 1; + final byte firstPattByte = patternArray[0]; + final byte secondPattByte = patternArray[1]; + + // simplePattern string has meta characters i.e % and _ and escape characters removed. + // so, we can just directly compare. + for (int idx = 0; idx < lengthToProcess; idx++) { +final byte firstInByte = drillBuf.getByte(start + idx); -final int txtLength = end - start; +if (firstPattByte != firstInByte) { + continue; +} else { + final byte secondInByte = drillBuf.getByte(start + idx +1); -// no match if input string length is less than pattern length -if (txtLength < patternLength) { + if (secondInByte == secondPattByte) { +return 1; + } +} + } return 0; } + } + /** Handles patterns with length three */ + private final class MatcherThree extends
[GitHub] drill pull request #1072: DRILL-5879: Improved SQL Pattern Contains Performa...
Github user sachouche commented on a diff in the pull request: https://github.com/apache/drill/pull/1072#discussion_r162120115 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java --- @@ -19,44 +19,286 @@ import io.netty.buffer.DrillBuf; -public class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher { +/** SQL Pattern Contains implementation */ +public final class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher { + private final MatcherFcn matcherFcn; public SqlPatternContainsMatcher(String patternString) { super(patternString); + +// Pattern matching is 1) a CPU intensive operation and 2) pattern and input dependent. The conclusion is +// that there is no single implementation that can do it all well. So, we use multiple implementations +// chosen based on the pattern length. +if (patternLength == 0) { + matcherFcn = new MatcherZero(); +} else if (patternLength == 1) { + matcherFcn = new MatcherOne(); +} else if (patternLength == 2) { + matcherFcn = new MatcherTwo(); +} else if (patternLength == 3) { + matcherFcn = new MatcherThree(); +} else if (patternLength < 10) { + matcherFcn = new MatcherN(); +} else { + matcherFcn = new BoyerMooreMatcher(); +} } @Override public int match(int start, int end, DrillBuf drillBuf) { +return matcherFcn.match(start, end, drillBuf); + } + + //-- + // Inner Data Structure + // -- + + /** Abstract matcher class to allow us pick the most efficient implementation */ + private abstract class MatcherFcn { +protected final byte[] patternArray; + +protected MatcherFcn() { + assert patternByteBuffer.hasArray(); + + patternArray = patternByteBuffer.array(); +} + +/** + * @return 1 if the pattern was matched; 0 otherwise + */ +protected abstract int match(int start, int end, DrillBuf drillBuf); + } + + /** Handles patterns with length one */ + private final class MatcherZero extends MatcherFcn { -if (patternLength == 0) { // Everything should match for null pattern 
string +private MatcherZero() { +} + +/** {@inheritDoc} */ +@Override +protected final int match(int start, int end, DrillBuf drillBuf) { return 1; } + } + + /** Handles patterns with length one */ + private final class MatcherOne extends MatcherFcn { + +private MatcherOne() { + super(); --- End diff -- removed. ---
[GitHub] drill pull request #1072: DRILL-5879: Improved SQL Pattern Contains Performa...
Github user sachouche commented on a diff in the pull request: https://github.com/apache/drill/pull/1072#discussion_r162123690 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java --- @@ -19,44 +19,286 @@ import io.netty.buffer.DrillBuf; -public class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher { +/** SQL Pattern Contains implementation */ +public final class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher { + private final MatcherFcn matcherFcn; public SqlPatternContainsMatcher(String patternString) { super(patternString); + +// Pattern matching is 1) a CPU intensive operation and 2) pattern and input dependent. The conclusion is +// that there is no single implementation that can do it all well. So, we use multiple implementations +// chosen based on the pattern length. +if (patternLength == 0) { + matcherFcn = new MatcherZero(); +} else if (patternLength == 1) { + matcherFcn = new MatcherOne(); +} else if (patternLength == 2) { + matcherFcn = new MatcherTwo(); +} else if (patternLength == 3) { + matcherFcn = new MatcherThree(); +} else if (patternLength < 10) { + matcherFcn = new MatcherN(); +} else { + matcherFcn = new BoyerMooreMatcher(); +} } @Override public int match(int start, int end, DrillBuf drillBuf) { +return matcherFcn.match(start, end, drillBuf); + } + + //-- + // Inner Data Structure + // -- + + /** Abstract matcher class to allow us pick the most efficient implementation */ + private abstract class MatcherFcn { +protected final byte[] patternArray; + +protected MatcherFcn() { + assert patternByteBuffer.hasArray(); + + patternArray = patternByteBuffer.array(); +} + +/** + * @return 1 if the pattern was matched; 0 otherwise + */ +protected abstract int match(int start, int end, DrillBuf drillBuf); + } + + /** Handles patterns with length one */ + private final class MatcherZero extends MatcherFcn { -if (patternLength == 0) { // Everything should match for null pattern 
string +private MatcherZero() { +} + +/** {@inheritDoc} */ +@Override +protected final int match(int start, int end, DrillBuf drillBuf) { return 1; } + } + + /** Handles patterns with length one */ + private final class MatcherOne extends MatcherFcn { + +private MatcherOne() { + super(); +} + +/** {@inheritDoc} */ +@Override +protected final int match(int start, int end, DrillBuf drillBuf) { + final int lengthToProcess = end - start; + final byte firstPattByte = patternArray[0]; + + // simplePattern string has meta characters i.e % and _ and escape characters removed. + // so, we can just directly compare. + for (int idx = 0; idx < lengthToProcess; idx++) { +byte inputByte = drillBuf.getByte(start + idx); + +if (firstPattByte != inputByte) { + continue; +} +return 1; + } + return 0; +} + } + + /** Handles patterns with length two */ + private final class MatcherTwo extends MatcherFcn { + +private MatcherTwo() { +} + +/** {@inheritDoc} */ +@Override +protected final int match(int start, int end, DrillBuf drillBuf) { + final int lengthToProcess = end - start - 1; + final byte firstPattByte = patternArray[0]; + final byte secondPattByte = patternArray[1]; --- End diff -- Good idea; done this for all the matchers. ---
[GitHub] drill pull request #1072: DRILL-5879: Improved SQL Pattern Contains Performa...
Github user sachouche commented on a diff in the pull request: https://github.com/apache/drill/pull/1072#discussion_r162119757 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java --- @@ -19,44 +19,286 @@ import io.netty.buffer.DrillBuf; -public class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher { +/** SQL Pattern Contains implementation */ +public final class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher { + private final MatcherFcn matcherFcn; public SqlPatternContainsMatcher(String patternString) { super(patternString); + +// Pattern matching is 1) a CPU intensive operation and 2) pattern and input dependent. The conclusion is +// that there is no single implementation that can do it all well. So, we use multiple implementations +// chosen based on the pattern length. +if (patternLength == 0) { + matcherFcn = new MatcherZero(); +} else if (patternLength == 1) { + matcherFcn = new MatcherOne(); +} else if (patternLength == 2) { + matcherFcn = new MatcherTwo(); +} else if (patternLength == 3) { + matcherFcn = new MatcherThree(); +} else if (patternLength < 10) { + matcherFcn = new MatcherN(); +} else { + matcherFcn = new BoyerMooreMatcher(); +} } @Override public int match(int start, int end, DrillBuf drillBuf) { +return matcherFcn.match(start, end, drillBuf); + } + + //-- + // Inner Data Structure + // -- + + /** Abstract matcher class to allow us pick the most efficient implementation */ + private abstract class MatcherFcn { +protected final byte[] patternArray; + +protected MatcherFcn() { + assert patternByteBuffer.hasArray(); + + patternArray = patternByteBuffer.array(); +} + +/** + * @return 1 if the pattern was matched; 0 otherwise + */ +protected abstract int match(int start, int end, DrillBuf drillBuf); + } + + /** Handles patterns with length one */ --- End diff -- cut-and-paste issue :-) ---
[GitHub] drill pull request #1072: DRILL-5879: Improved SQL Pattern Contains Performa...
Github user sachouche commented on a diff in the pull request: https://github.com/apache/drill/pull/1072#discussion_r162121115 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java --- @@ -19,44 +19,286 @@ import io.netty.buffer.DrillBuf; -public class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher { +/** SQL Pattern Contains implementation */ +public final class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher { + private final MatcherFcn matcherFcn; public SqlPatternContainsMatcher(String patternString) { super(patternString); + +// Pattern matching is 1) a CPU intensive operation and 2) pattern and input dependent. The conclusion is +// that there is no single implementation that can do it all well. So, we use multiple implementations +// chosen based on the pattern length. +if (patternLength == 0) { + matcherFcn = new MatcherZero(); +} else if (patternLength == 1) { + matcherFcn = new MatcherOne(); +} else if (patternLength == 2) { + matcherFcn = new MatcherTwo(); +} else if (patternLength == 3) { + matcherFcn = new MatcherThree(); +} else if (patternLength < 10) { + matcherFcn = new MatcherN(); +} else { + matcherFcn = new BoyerMooreMatcher(); +} } @Override public int match(int start, int end, DrillBuf drillBuf) { +return matcherFcn.match(start, end, drillBuf); + } + + //-- + // Inner Data Structure + // -- + + /** Abstract matcher class to allow us pick the most efficient implementation */ + private abstract class MatcherFcn { +protected final byte[] patternArray; + +protected MatcherFcn() { + assert patternByteBuffer.hasArray(); + + patternArray = patternByteBuffer.array(); +} + +/** + * @return 1 if the pattern was matched; 0 otherwise + */ +protected abstract int match(int start, int end, DrillBuf drillBuf); + } + + /** Handles patterns with length one */ + private final class MatcherZero extends MatcherFcn { -if (patternLength == 0) { // Everything should match for null pattern 
string +private MatcherZero() { +} + +/** {@inheritDoc} */ +@Override +protected final int match(int start, int end, DrillBuf drillBuf) { return 1; } + } + + /** Handles patterns with length one */ + private final class MatcherOne extends MatcherFcn { + +private MatcherOne() { + super(); +} + +/** {@inheritDoc} */ +@Override +protected final int match(int start, int end, DrillBuf drillBuf) { + final int lengthToProcess = end - start; + final byte firstPattByte = patternArray[0]; --- End diff -- renamed all such variables to XXXPatternBytes ---
[GitHub] drill pull request #1072: DRILL-5879: Improved SQL Pattern Contains Performa...
Github user sachouche commented on a diff in the pull request: https://github.com/apache/drill/pull/1072#discussion_r162124388 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java --- @@ -19,44 +19,286 @@ import io.netty.buffer.DrillBuf; -public class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher { +/** SQL Pattern Contains implementation */ +public final class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher { + private final MatcherFcn matcherFcn; public SqlPatternContainsMatcher(String patternString) { super(patternString); + +// Pattern matching is 1) a CPU intensive operation and 2) pattern and input dependent. The conclusion is +// that there is no single implementation that can do it all well. So, we use multiple implementations +// chosen based on the pattern length. +if (patternLength == 0) { + matcherFcn = new MatcherZero(); +} else if (patternLength == 1) { + matcherFcn = new MatcherOne(); +} else if (patternLength == 2) { + matcherFcn = new MatcherTwo(); +} else if (patternLength == 3) { + matcherFcn = new MatcherThree(); +} else if (patternLength < 10) { + matcherFcn = new MatcherN(); +} else { + matcherFcn = new BoyerMooreMatcher(); +} } @Override public int match(int start, int end, DrillBuf drillBuf) { +return matcherFcn.match(start, end, drillBuf); + } + + //-- + // Inner Data Structure + // -- + + /** Abstract matcher class to allow us pick the most efficient implementation */ + private abstract class MatcherFcn { +protected final byte[] patternArray; + +protected MatcherFcn() { + assert patternByteBuffer.hasArray(); + + patternArray = patternByteBuffer.array(); +} + +/** + * @return 1 if the pattern was matched; 0 otherwise + */ +protected abstract int match(int start, int end, DrillBuf drillBuf); + } + + /** Handles patterns with length one */ + private final class MatcherZero extends MatcherFcn { -if (patternLength == 0) { // Everything should match for null pattern 
string +private MatcherZero() { +} + +/** {@inheritDoc} */ +@Override +protected final int match(int start, int end, DrillBuf drillBuf) { return 1; } + } + + /** Handles patterns with length one */ + private final class MatcherOne extends MatcherFcn { + +private MatcherOne() { + super(); +} + +/** {@inheritDoc} */ +@Override +protected final int match(int start, int end, DrillBuf drillBuf) { + final int lengthToProcess = end - start; + final byte firstPattByte = patternArray[0]; + + // simplePattern string has meta characters i.e % and _ and escape characters removed. + // so, we can just directly compare. + for (int idx = 0; idx < lengthToProcess; idx++) { +byte inputByte = drillBuf.getByte(start + idx); + +if (firstPattByte != inputByte) { + continue; +} +return 1; + } + return 0; +} + } + + /** Handles patterns with length two */ + private final class MatcherTwo extends MatcherFcn { + +private MatcherTwo() { +} + +/** {@inheritDoc} */ +@Override +protected final int match(int start, int end, DrillBuf drillBuf) { + final int lengthToProcess = end - start - 1; + final byte firstPattByte = patternArray[0]; + final byte secondPattByte = patternArray[1]; + + // simplePattern string has meta characters i.e % and _ and escape characters removed. + // so, we can just directly compare. + for (int idx = 0; idx < lengthToProcess; idx++) { +final byte firstInByte = drillBuf.getByte(start + idx); -final int txtLength = end - start; +if (firstPattByte != firstInByte) { + continue; +} else { + final byte secondInByte = drillBuf.getByte(start + idx +1); -// no match if input string length is less than pattern length -if (txtLength < patternLength) { + if (secondInByte == secondPattByte) { +return 1; + } +} + } return 0; } + } + /** Handles patterns with length three */ + private final class MatcherThree extends
[GitHub] drill issue #1093: DRILL-6093 : Account for simple columns in project cpu co...
Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/1093 +1. It would be good to add a unit test using the example in the JIRA. ---
[GitHub] drill pull request #1094: DRILL-6090: While connecting to drill-bits using J...
GitHub user milindt opened a pull request: https://github.com/apache/drill/pull/1094 DRILL-6090: While connecting to drill-bits using JDBC Driver through …

…Zookeeper, a lot of "Curator-Framework-0" threads are created if the connection to a drill-bit is not successful (no drill-bits are up/reachable).

I am using Drill JDBC driver 1.12.0 to connect to MapR-DB. I am finding the available drill-bits using Zookeepers. When drill-bits are not up or not reachable, the connection fails with the exception "Failure in connecting to Drill: oadd.org.apache.drill.exec.rpc.RpcException: Failure setting up ZK for client", which is expected, but the number of threads created by ZKClusterCoordinator just keeps on increasing.

Steps to reproduce the issue:
1. Set up a connection with a drill-bit using the Apache Drill JDBC driver 1.12.0 through Zookeeper hosts (port 5181)
2. Stop the drill-bit services or block the drill-bit IPs using iptable rules
3. Truncate catalina logs
4. Try to connect to the drill-bit / hit a code path that requires a connection to drill-bits
5. Take a thread dump using kill -QUIT
6. grep -c "Curator-Framework-0" catalina.out
7. Observe that the Curator-Framework threads just keep accumulating

RCA:
- ZKClusterCoordinator creates curator threads in the constructor
- ZKClusterCoordinator is instantiated by DrillClient.connect
- DrillClient.connect is called in the DrillConnectionImpl constructor

Fix: Call DrillConnectionImpl.cleanup() from all the catch blocks in the DrillConnectionImpl constructor.

[JIRA link](https://issues.apache.org/jira/browse/DRILL-6090)

If this fix is accepted, will there be a release from this branch (1.12.0)? A release with a minor version, maybe. 
You can merge this pull request into a Git repository by running: $ git pull https://github.com/milindt/drill 1.12.0 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1094.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1094 ---
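The fix milindt proposes — calling cleanup() from the constructor's catch blocks — is an instance of a general pattern: a constructor that spawns threads must release them before propagating a failure, or every failed connection attempt leaks them. A minimal self-contained sketch of that pattern (`LeakFreeConnection` and its fields are invented; a JDK `ExecutorService` stands in for the Curator threads that ZKClusterCoordinator creates):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Sketch of the constructor-cleanup pattern behind the proposed DRILL-6090
 * fix. A thread pool is created eagerly (as ZKClusterCoordinator does via
 * Curator); if construction then fails, the catch block must release it
 * before rethrowing, otherwise each failed attempt leaves a live thread.
 */
final class LeakFreeConnection {
    final ExecutorService coordinatorThreads = Executors.newSingleThreadExecutor();

    LeakFreeConnection(boolean simulateFailure) {
        try {
            if (simulateFailure) {
                throw new RuntimeException("Failure setting up ZK for client");
            }
        } catch (RuntimeException e) {
            cleanup();   // release partially acquired resources, then rethrow
            throw e;
        }
    }

    void cleanup() {
        coordinatorThreads.shutdownNow();
    }
}
```

Without the `cleanup()` call in the catch block, each failed construction would strand the executor's worker thread — the same accumulation of "Curator-Framework-0" threads the bug report observes in the catalina thread dumps.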