[GitHub] drill pull request #1212: DRILL-6323: Lateral Join - Initial implementation
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/1212 ---
[GitHub] drill pull request #1212: DRILL-6323: Lateral Join - Initial implementation
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1212#discussion_r182283067 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/LateralJoinBatch.java --- @@ -0,0 +1,861 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.physical.impl.join; + +import com.google.common.annotations.VisibleForTesting; +import com.google.common.base.Preconditions; +import org.apache.calcite.rel.core.JoinRelType; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.types.Types; +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.exec.exception.OutOfMemoryException; +import org.apache.drill.exec.exception.SchemaChangeException; +import org.apache.drill.exec.ops.FragmentContext; +import org.apache.drill.exec.physical.base.LateralContract; +import org.apache.drill.exec.physical.config.LateralJoinPOP; +import org.apache.drill.exec.record.AbstractBinaryRecordBatch; +import org.apache.drill.exec.record.BatchSchema; +import org.apache.drill.exec.record.JoinBatchMemoryManager; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.record.RecordBatchSizer; +import org.apache.drill.exec.record.VectorAccessibleUtilities; +import org.apache.drill.exec.record.VectorWrapper; +import org.apache.drill.exec.vector.ValueVector; + +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.EMIT; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.NONE; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OK; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OK_NEW_SCHEMA; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OUT_OF_MEMORY; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.STOP; + +/** + * RecordBatch implementation for the lateral join operator. Currently it's expected LATERAL to co-exists with UNNEST + * operator. Both LATERAL and UNNEST will share a contract with each other defined at {@link LateralContract} + */ +public class LateralJoinBatch extends AbstractBinaryRecordBatch implements LateralContract { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(LateralJoinBatch.class); + + // Input indexes to correctly update the stats + private static final int LEFT_INPUT = 0; + + private static final int RIGHT_INPUT = 1; + + // Maximum number records in the outgoing batch + private int maxOutputRowCount; + + // Schema on the left side + private BatchSchema leftSchema; + + // Schema on the right side + private BatchSchema rightSchema; + + // Index in output batch to populate next row + private int outputIndex; + + // Current index of record in left incoming which is being processed + private int leftJoinIndex = -1; + + // Current index of record in right incoming which is being processed + private int rightJoinIndex = -1; + + // flag to keep track if current left batch needs to be processed in future next call + private boolean processLeftBatchInFuture; + + // Keep track if any matching right record was found for current left index record + private boolean matchedRecordFound; + + private boolean useMemoryManager = true; + + /* + * Public Methods + * / + public LateralJoinBatch(LateralJoinPOP popConfig, FragmentContext context, + RecordBatch left, RecordBatch right) throws OutOfMemoryException { +super(popConfig, context, left, right); +
[GitHub] drill pull request #1212: DRILL-6323: Lateral Join - Initial implementation
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/1212#discussion_r182259926 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/LateralJoinBatch.java --- @@ -0,0 +1,861 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.physical.impl.join; + +import com.google.common.annotations.VisibleForTesting; +import com.google.common.base.Preconditions; +import org.apache.calcite.rel.core.JoinRelType; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.types.Types; +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.exec.exception.OutOfMemoryException; +import org.apache.drill.exec.exception.SchemaChangeException; +import org.apache.drill.exec.ops.FragmentContext; +import org.apache.drill.exec.physical.base.LateralContract; +import org.apache.drill.exec.physical.config.LateralJoinPOP; +import org.apache.drill.exec.record.AbstractBinaryRecordBatch; +import org.apache.drill.exec.record.BatchSchema; +import org.apache.drill.exec.record.JoinBatchMemoryManager; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.record.RecordBatchSizer; +import org.apache.drill.exec.record.VectorAccessibleUtilities; +import org.apache.drill.exec.record.VectorWrapper; +import org.apache.drill.exec.vector.ValueVector; + +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.EMIT; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.NONE; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OK; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OK_NEW_SCHEMA; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OUT_OF_MEMORY; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.STOP; + +/** + * RecordBatch implementation for the lateral join operator. Currently it's expected LATERAL to co-exists with UNNEST + * operator. Both LATERAL and UNNEST will share a contract with each other defined at {@link LateralContract} + */ +public class LateralJoinBatch extends AbstractBinaryRecordBatch implements LateralContract { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(LateralJoinBatch.class); + + // Input indexes to correctly update the stats + private static final int LEFT_INPUT = 0; + + private static final int RIGHT_INPUT = 1; + + // Maximum number records in the outgoing batch + private int maxOutputRowCount; + + // Schema on the left side + private BatchSchema leftSchema; + + // Schema on the right side + private BatchSchema rightSchema; + + // Index in output batch to populate next row + private int outputIndex; + + // Current index of record in left incoming which is being processed + private int leftJoinIndex = -1; + + // Current index of record in right incoming which is being processed + private int rightJoinIndex = -1; + + // flag to keep track if current left batch needs to be processed in future next call + private boolean processLeftBatchInFuture; + + // Keep track if any matching right record was found for current left index record + private boolean matchedRecordFound; + + private boolean useMemoryManager = true; + + /* + * Public Methods + * / + public LateralJoinBatch(LateralJoinPOP popConfig, FragmentContext context, + RecordBatch left, RecordBatch right) throws OutOfMemoryException { +super(popConfig, context, left, right); +
[GitHub] drill pull request #1212: DRILL-6323: Lateral Join - Initial implementation
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/1212#discussion_r182262104 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java --- @@ -186,6 +186,11 @@ private ExecConstants() { public static final String BIT_ENCRYPTION_SASL_ENABLED = "drill.exec.security.bit.encryption.sasl.enabled"; public static final String BIT_ENCRYPTION_SASL_MAX_WRAPPED_SIZE = "drill.exec.security.bit.encryption.sasl.max_wrapped_size"; + /** --- End diff -- Removed. Thanks for catching this. ---
[GitHub] drill pull request #1212: DRILL-6323: Lateral Join - Initial implementation
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1212#discussion_r182172836 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/LateralJoinBatch.java --- @@ -0,0 +1,861 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.physical.impl.join; + +import com.google.common.annotations.VisibleForTesting; +import com.google.common.base.Preconditions; +import org.apache.calcite.rel.core.JoinRelType; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.types.Types; +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.exec.exception.OutOfMemoryException; +import org.apache.drill.exec.exception.SchemaChangeException; +import org.apache.drill.exec.ops.FragmentContext; +import org.apache.drill.exec.physical.base.LateralContract; +import org.apache.drill.exec.physical.config.LateralJoinPOP; +import org.apache.drill.exec.record.AbstractBinaryRecordBatch; +import org.apache.drill.exec.record.BatchSchema; +import org.apache.drill.exec.record.JoinBatchMemoryManager; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.record.RecordBatchSizer; +import org.apache.drill.exec.record.VectorAccessibleUtilities; +import org.apache.drill.exec.record.VectorWrapper; +import org.apache.drill.exec.vector.ValueVector; + +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.EMIT; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.NONE; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OK; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OK_NEW_SCHEMA; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OUT_OF_MEMORY; +import static org.apache.drill.exec.record.RecordBatch.IterOutcome.STOP; + +/** + * RecordBatch implementation for the lateral join operator. Currently it's expected LATERAL to co-exists with UNNEST + * operator. Both LATERAL and UNNEST will share a contract with each other defined at {@link LateralContract} + */ +public class LateralJoinBatch extends AbstractBinaryRecordBatch implements LateralContract { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(LateralJoinBatch.class); + + // Input indexes to correctly update the stats + private static final int LEFT_INPUT = 0; + + private static final int RIGHT_INPUT = 1; + + // Maximum number records in the outgoing batch + private int maxOutputRowCount; + + // Schema on the left side + private BatchSchema leftSchema; + + // Schema on the right side + private BatchSchema rightSchema; + + // Index in output batch to populate next row + private int outputIndex; + + // Current index of record in left incoming which is being processed + private int leftJoinIndex = -1; + + // Current index of record in right incoming which is being processed + private int rightJoinIndex = -1; + + // flag to keep track if current left batch needs to be processed in future next call + private boolean processLeftBatchInFuture; + + // Keep track if any matching right record was found for current left index record + private boolean matchedRecordFound; + + private boolean useMemoryManager = true; + + /* + * Public Methods + * / + public LateralJoinBatch(LateralJoinPOP popConfig, FragmentContext context, + RecordBatch left, RecordBatch right) throws OutOfMemoryException { +super(popConfig, context, left, right); +
[GitHub] drill pull request #1212: DRILL-6323: Lateral Join - Initial implementation
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1212#discussion_r182172309 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java --- @@ -186,6 +186,11 @@ private ExecConstants() { public static final String BIT_ENCRYPTION_SASL_ENABLED = "drill.exec.security.bit.encryption.sasl.enabled"; public static final String BIT_ENCRYPTION_SASL_MAX_WRAPPED_SIZE = "drill.exec.security.bit.encryption.sasl.max_wrapped_size"; + /** --- End diff -- Not needed as you don't have code gen ---
[GitHub] drill pull request #1212: DRILL-6323: Lateral Join - Initial implementation
GitHub user sohami opened a pull request: https://github.com/apache/drill/pull/1212 DRILL-6323: Lateral Join - Initial implementation @parthchandra - Please review. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sohami/drill DRILL-6323 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1212.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1212 commit 38b82ce8ebca3b91ba8847229a568daf88f654a3 Author: Sorabh HamirwasiaDate: 2018-03-06T05:09:16Z DRILL-6323: Lateral Join - Initial implementation Refactor Join PopConfigs commit 0931ac974f7be41cd5ca4516524d04c4e5c6245d Author: Sorabh Hamirwasia Date: 2018-02-05T22:46:19Z DRILL-6323: Lateral Join - Initial implementation commit 08b7d38e6ffe4fed492a2ea67c3907f0f514717e Author: Sorabh Hamirwasia Date: 2018-02-20T22:47:48Z DRILL-6323: Lateral Join - Initial implementation Note: Refactor, fix various issues with LATERAL: a)Override prefetch call in BuildSchema phase for LATERAL, b)EMIT handling in buildSchema, c)Issue when in multilevel Lateral case schema change is observed only on right side of UNNEST, d)Handle SelectionVector in incoming batch, e) Handling Schema change, f) Updating joinIndexes correctly when producing multiple output batches for current left inputs. Added tests for a)EMIT handling in buildSchema, b)Multiple UNNEST at same level, c)Multilevel Lateral, d)Multilevel Lateral with Schema change on left/right or both branches, e) Left LATERAL join f)Schema change for UNNEST and Non-UNNEST columns, g)Error outcomes from left, h) Producing multiple output batches for given incoming, i) Consuming multiple incoming into single output batch commit f686877198857a23227031329b2f48c163f85021 Author: Sorabh Hamirwasia Date: 2018-03-09T23:56:22Z DRILL-6323: Lateral Join - Initial implementation Note: Remove codegen and operator template class. Logic to copy data is moved to LateralJoinBatch itself commit cb47407a8f47e7fc10ac04c438f7e903de92a80c Author: Sorabh Hamirwasia Date: 2018-03-13T23:41:03Z DRILL-6323: Lateral Join - Initial implementation Note: Add some debug logs for LateralJoinBatch commit 2ea968e3302ad8bb6d62a8032bd6b6329b900d3a Author: Sorabh Hamirwasia Date: 2018-03-14T23:59:29Z DRILL-6323: Lateral Join - Initial implementation Note: Refactor BatchMemorySize to put outputBatchSize in abstract class. Created a new JoinBatchMemoryManager to be shared across join record batches. Changed merge join to use AbstractBinaryRecordBatch instead of AbstractRecordBatch, and use JoinBatchMemoryManager commit fb521c65f91b69543ae9b5dbb9a727c79f852573 Author: Sorabh Hamirwasia Date: 2018-03-19T19:00:22Z DRILL-6323: Lateral Join - Initial implementation Note: Lateral Join Batch Memory manager support using the record batch sizer ---