[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
[ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329585#comment-16329585 ] ASF GitHub Bot commented on DRILL-5993: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1057#discussion_r162196004 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/AbstractCopier.java --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.physical.impl.svremover; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.types.Types; +import org.apache.drill.exec.exception.SchemaChangeException; +import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.record.VectorContainer; +import org.apache.drill.exec.record.VectorWrapper; +import org.apache.drill.exec.vector.AllocationHelper; +import org.apache.drill.exec.vector.ValueVector; + +public abstract class AbstractCopier implements Copier { + protected ValueVector[] vvOut; + protected VectorContainer outgoing; + + @Override + public void setup(RecordBatch incoming, VectorContainer outgoing) throws SchemaChangeException { --- End diff -- It is not used by the AbstractCopier itself but it is used by the AbstractSV2Copier and AbstractSV4Copier which override the method. > Allow Copier to Copy a Record and Append to the End of an Outgoing Batch > > > Key: DRILL-5993 > URL: https://issues.apache.org/jira/browse/DRILL-5993 > Project: Apache Drill > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > > Currently the copier can only copy record from an incoming batch to the > beginning of an outgoing batch. We need to be able to copy a record and > append it to the end of the outgoing batch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
[ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329563#comment-16329563 ] ASF GitHub Bot commented on DRILL-5993: --- Github user Ben-Zvi commented on a diff in the pull request: https://github.com/apache/drill/pull/1057#discussion_r162193983 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/AbstractCopier.java --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.physical.impl.svremover; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.types.Types; +import org.apache.drill.exec.exception.SchemaChangeException; +import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.record.VectorContainer; +import org.apache.drill.exec.record.VectorWrapper; +import org.apache.drill.exec.vector.AllocationHelper; +import org.apache.drill.exec.vector.ValueVector; + +public abstract class AbstractCopier implements Copier { + protected ValueVector[] vvOut; + protected VectorContainer outgoing; + + @Override + public void setup(RecordBatch incoming, VectorContainer outgoing) throws SchemaChangeException { --- End diff -- The "incoming" parameter is not used anywhere ... > Allow Copier to Copy a Record and Append to the End of an Outgoing Batch > > > Key: DRILL-5993 > URL: https://issues.apache.org/jira/browse/DRILL-5993 > Project: Apache Drill > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > > Currently the copier can only copy record from an incoming batch to the > beginning of an outgoing batch. We need to be able to copy a record and > append it to the end of the outgoing batch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
[ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329449#comment-16329449 ] ASF GitHub Bot commented on DRILL-5993: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1057#discussion_r162175059 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/AbstractSV4Copier.java --- @@ -17,32 +17,30 @@ */ package org.apache.drill.exec.physical.impl.svremover; -import javax.inject.Named; - import org.apache.drill.common.types.TypeProtos.MajorType; import org.apache.drill.common.types.Types; import org.apache.drill.exec.exception.SchemaChangeException; -import org.apache.drill.exec.ops.FragmentContext; import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.record.VectorContainer; import org.apache.drill.exec.record.VectorWrapper; import org.apache.drill.exec.record.selection.SelectionVector4; import org.apache.drill.exec.vector.AllocationHelper; +import org.apache.drill.exec.vector.ValueVector; -public abstract class CopierTemplate4 implements Copier{ - static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(CopierTemplate4.class); +public abstract class AbstractSV4Copier implements Copier { --- End diff -- Done > Allow Copier to Copy a Record and Append to the End of an Outgoing Batch > > > Key: DRILL-5993 > URL: https://issues.apache.org/jira/browse/DRILL-5993 > Project: Apache Drill > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > > Currently the copier can only copy record from an incoming batch to the > beginning of an outgoing batch. We need to be able to copy a record and > append it to the end of the outgoing batch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
[ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16328243#comment-16328243 ] ASF GitHub Bot commented on DRILL-5993: --- Github user Ben-Zvi commented on a diff in the pull request: https://github.com/apache/drill/pull/1057#discussion_r161950855 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/AbstractSV4Copier.java --- @@ -17,32 +17,30 @@ */ package org.apache.drill.exec.physical.impl.svremover; -import javax.inject.Named; - import org.apache.drill.common.types.TypeProtos.MajorType; import org.apache.drill.common.types.Types; import org.apache.drill.exec.exception.SchemaChangeException; -import org.apache.drill.exec.ops.FragmentContext; import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.record.VectorContainer; import org.apache.drill.exec.record.VectorWrapper; import org.apache.drill.exec.record.selection.SelectionVector4; import org.apache.drill.exec.vector.AllocationHelper; +import org.apache.drill.exec.vector.ValueVector; -public abstract class CopierTemplate4 implements Copier{ - static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(CopierTemplate4.class); +public abstract class AbstractSV4Copier implements Copier { --- End diff -- The code for AbstractSV4Copier is nearly identical to the code of AbstractSV2Copier; I wonder if we could have combined the two somehow (like a super class ??) > Allow Copier to Copy a Record and Append to the End of an Outgoing Batch > > > Key: DRILL-5993 > URL: https://issues.apache.org/jira/browse/DRILL-5993 > Project: Apache Drill > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > > Currently the copier can only copy record from an incoming batch to the > beginning of an outgoing batch. We need to be able to copy a record and > append it to the end of the outgoing batch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
[ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16328038#comment-16328038 ] ASF GitHub Bot commented on DRILL-5993: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1057 Addressed review comments and rebased. > Allow Copier to Copy a Record and Append to the End of an Outgoing Batch > > > Key: DRILL-5993 > URL: https://issues.apache.org/jira/browse/DRILL-5993 > Project: Apache Drill > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > > Currently the copier can only copy record from an incoming batch to the > beginning of an outgoing batch. We need to be able to copy a record and > append it to the end of the outgoing batch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
[ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327970#comment-16327970 ] ASF GitHub Bot commented on DRILL-5993: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1057#discussion_r161913565 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/svremover/GenericCopierTest.java --- @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.physical.impl.svremover; + +import org.apache.drill.exec.memory.RootAllocator; +import org.apache.drill.exec.record.BatchSchema; +import org.apache.drill.test.rowSet.RowSet; +import org.apache.drill.test.rowSet.RowSetBuilder; + +public class GenericCopierTest extends AbstractGenericCopierTest { --- End diff -- The generated code uses the copyFromSafe methods in code generated copiers. Your copyEntry method which is used by the GenericCopiers also uses the copyFromSafe method. So we should inherit all the same supported types. I think if all the unit tests, functional tests, and QA tests pass that will be a sufficient enough vote of confidence. Ideally we would have unit tests for all the operators (not just the copiers) which use the complex data types, but there is significant amount of work left to be able to effectively unit tests with complex data types and adding all the machinery to do so is out of the scope of this PR. > Allow Copier to Copy a Record and Append to the End of an Outgoing Batch > > > Key: DRILL-5993 > URL: https://issues.apache.org/jira/browse/DRILL-5993 > Project: Apache Drill > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > > Currently the copier can only copy record from an incoming batch to the > beginning of an outgoing batch. We need to be able to copy a record and > append it to the end of the outgoing batch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
[ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16299378#comment-16299378 ] ASF GitHub Bot commented on DRILL-5993: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1057#discussion_r158179561 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/svremover/GenericCopierTest.java --- @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.physical.impl.svremover; + +import org.apache.drill.exec.memory.RootAllocator; +import org.apache.drill.exec.record.BatchSchema; +import org.apache.drill.test.rowSet.RowSet; +import org.apache.drill.test.rowSet.RowSetBuilder; + +public class GenericCopierTest extends AbstractGenericCopierTest { --- End diff -- One reason I was a bit hesitant to add this support earlier is that we need this to work for ALL Drill's data types, including the obscure ones. Clearly, this mechanism works for scalars. But, what about: * Arrays of scalars * Maps * Arrays of maps * Unions * Lists of unions * Lists of scalars * Lists of maps * Repeated lists of scalars * Repeated lists of maps * Repeated lists of repeated lists * Repeated lists of unions There are probably a few more that I'm overlooking. Until we test ALL of these, we can't be sure if the new code works as well as the old code. > Allow Copier to Copy a Record and Append to the End of an Outgoing Batch > > > Key: DRILL-5993 > URL: https://issues.apache.org/jira/browse/DRILL-5993 > Project: Apache Drill > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > Currently the copier can only copy record from an incoming batch to the > beginning of an outgoing batch. We need to be able to copy a record and > append it to the end of the outgoing batch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
[ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16299377#comment-16299377 ] ASF GitHub Bot commented on DRILL-5993: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1057#discussion_r158179222 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/VectorContainer.java --- @@ -353,6 +353,23 @@ public int getRecordCount() { public boolean hasRecordCount() { return recordCount != -1; } + /** + * This works with non-hyper {@link VectorContainer}s which have no selection vectors. + * Appends a row taken from a source {@link VectorContainer} to this {@link VectorContainer}. + * @param srcContainer The {@link VectorContainer} to copy a row from. + * @param srcIndex The index of the row to copy from the source {@link VectorContainer}. + */ + public void appendRow(VectorContainer srcContainer, int srcIndex) { +for (int vectorIndex = 0; vectorIndex < wrappers.size(); vectorIndex++) { + ValueVector destVector = wrappers.get(vectorIndex).getValueVector(); + ValueVector srcVector = srcContainer.wrappers.get(vectorIndex).getValueVector(); + + destVector.copyEntry(recordCount, srcVector, srcIndex); +} + +recordCount++; --- End diff -- I would not advise changing the meaning of -1; that is baked into the code in many places. Instead, before adding the first row, simply set the count to 0 when the target container is created. > Allow Copier to Copy a Record and Append to the End of an Outgoing Batch > > > Key: DRILL-5993 > URL: https://issues.apache.org/jira/browse/DRILL-5993 > Project: Apache Drill > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > Currently the copier can only copy record from an incoming batch to the > beginning of an outgoing batch. We need to be able to copy a record and > append it to the end of the outgoing batch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
[ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297542#comment-16297542 ] ASF GitHub Bot commented on DRILL-5993: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1057#discussion_r157894540 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/Copier.java --- @@ -19,13 +19,15 @@ import org.apache.drill.exec.compile.TemplateClassDefinition; import org.apache.drill.exec.exception.SchemaChangeException; -import org.apache.drill.exec.ops.FragmentContext; import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.record.VectorContainer; public interface Copier { - public static TemplateClassDefinition TEMPLATE_DEFINITION2 = new TemplateClassDefinition(Copier.class, CopierTemplate2.class); - public static TemplateClassDefinition TEMPLATE_DEFINITION4 = new TemplateClassDefinition(Copier.class, CopierTemplate4.class); + TemplateClassDefinition TEMPLATE_DEFINITION2 = new TemplateClassDefinition(Copier.class, CopierTemplate2.class); + TemplateClassDefinition TEMPLATE_DEFINITION4 = new TemplateClassDefinition(Copier.class, CopierTemplate4.class); --- End diff -- I will create a separate PR to do that since that change is unrelated to this PR > Allow Copier to Copy a Record and Append to the End of an Outgoing Batch > > > Key: DRILL-5993 > URL: https://issues.apache.org/jira/browse/DRILL-5993 > Project: Apache Drill > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > Currently the copier can only copy record from an incoming batch to the > beginning of an outgoing batch. We need to be able to copy a record and > append it to the end of the outgoing batch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
[ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297541#comment-16297541 ] ASF GitHub Bot commented on DRILL-5993: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1057#discussion_r157894423 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/CopierTemplate2.java --- @@ -53,17 +51,32 @@ public int copyRecords(int index, int recordCount) throws SchemaChangeException } } -int outgoingPosition = 0; +return insertRecords(0, index, recordCount); + } + + @Override + public int appendRecord(int index) throws SchemaChangeException { +return appendRecords(index, 1); + } + + @Override + public int appendRecords(int index, int recordCount) throws SchemaChangeException { +return insertRecords(outgoing.getRecordCount(), index, recordCount); + } + + private int insertRecords(int outgoingPosition, int index, int recordCount) throws SchemaChangeException { +final int endIndex = index + recordCount; -for(int svIndex = index; svIndex < index + recordCount; svIndex++, outgoingPosition++){ +for(int svIndex = index; svIndex < endIndex; svIndex++, outgoingPosition++){ doEval(sv2.getIndex(svIndex), outgoingPosition); } + +outgoing.setRecordCount(outgoingPosition); return outgoingPosition; } - public abstract void doSetup(@Named("context") FragmentContext context, - @Named("incoming") RecordBatch incoming, - @Named("outgoing") RecordBatch outgoing) + public abstract void doSetup(@Named("incoming") RecordBatch incoming, + @Named("outgoing") VectorContainer outgoing) --- End diff -- The copiers are only used in the SVRemover and TopN operator. I have replaced the code generated copiers in both now to use the GenericCopiers. > Allow Copier to Copy a Record and Append to the End of an Outgoing Batch > > > Key: DRILL-5993 > URL: https://issues.apache.org/jira/browse/DRILL-5993 > Project: Apache Drill > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > Currently the copier can only copy record from an incoming batch to the > beginning of an outgoing batch. We need to be able to copy a record and > append it to the end of the outgoing batch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
[ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297299#comment-16297299 ] ASF GitHub Bot commented on DRILL-5993: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1057#discussion_r157853697 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/CopierTemplate4.java --- @@ -54,17 +52,33 @@ public int copyRecords(int index, int recordCount) throws SchemaChangeException } } -int outgoingPosition = 0; -for(int svIndex = index; svIndex < index + recordCount; svIndex++, outgoingPosition++){ +return insertRecords(0, index, recordCount); + } + + @Override + public int appendRecord(int index) throws SchemaChangeException { +return appendRecords(index, 1); + } --- End diff -- Updated the code and made an implementation of appendRecord which doesn't use a for loop > Allow Copier to Copy a Record and Append to the End of an Outgoing Batch > > > Key: DRILL-5993 > URL: https://issues.apache.org/jira/browse/DRILL-5993 > Project: Apache Drill > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > Currently the copier can only copy record from an incoming batch to the > beginning of an outgoing batch. We need to be able to copy a record and > append it to the end of the outgoing batch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
[ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296038#comment-16296038 ] ASF GitHub Bot commented on DRILL-5993: --- Github user Ben-Zvi commented on a diff in the pull request: https://github.com/apache/drill/pull/1057#discussion_r157646636 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/CopierTemplate4.java --- @@ -54,17 +52,33 @@ public int copyRecords(int index, int recordCount) throws SchemaChangeException } } -int outgoingPosition = 0; -for(int svIndex = index; svIndex < index + recordCount; svIndex++, outgoingPosition++){ +return insertRecords(0, index, recordCount); + } + + @Override + public int appendRecord(int index) throws SchemaChangeException { +return appendRecords(index, 1); + } --- End diff -- Minor performance comment: If *appendRecord()* is to be used heavily, then the overhead of calling the for() loop (in insertRecords() ) many times for only a size of 1 may be significant. > Allow Copier to Copy a Record and Append to the End of an Outgoing Batch > > > Key: DRILL-5993 > URL: https://issues.apache.org/jira/browse/DRILL-5993 > Project: Apache Drill > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > Currently the copier can only copy record from an incoming batch to the > beginning of an outgoing batch. We need to be able to copy a record and > append it to the end of the outgoing batch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
[ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296037#comment-16296037 ] ASF GitHub Bot commented on DRILL-5993: --- Github user Ben-Zvi commented on a diff in the pull request: https://github.com/apache/drill/pull/1057#discussion_r157646058 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/CopierTemplate2.java --- @@ -53,17 +51,32 @@ public int copyRecords(int index, int recordCount) throws SchemaChangeException } } -int outgoingPosition = 0; +return insertRecords(0, index, recordCount); + } + + @Override + public int appendRecord(int index) throws SchemaChangeException { +return appendRecords(index, 1); + } + + @Override + public int appendRecords(int index, int recordCount) throws SchemaChangeException { +return insertRecords(outgoing.getRecordCount(), index, recordCount); + } + + private int insertRecords(int outgoingPosition, int index, int recordCount) throws SchemaChangeException { +final int endIndex = index + recordCount; -for(int svIndex = index; svIndex < index + recordCount; svIndex++, outgoingPosition++){ +for(int svIndex = index; svIndex < endIndex; svIndex++, outgoingPosition++){ doEval(sv2.getIndex(svIndex), outgoingPosition); } + +outgoing.setRecordCount(outgoingPosition); return outgoingPosition; } - public abstract void doSetup(@Named("context") FragmentContext context, - @Named("incoming") RecordBatch incoming, - @Named("outgoing") RecordBatch outgoing) + public abstract void doSetup(@Named("incoming") RecordBatch incoming, + @Named("outgoing") VectorContainer outgoing) --- End diff -- As the "generated code" methods are replaced with "in line" methods, we should remove these abstract signatures (both *doSetup* and *doEval*) to avoid generating code, and to distinguish these methods. Maybe we should _rename_ them as well to make the distinction clear. Then probably also eliminate the "code generation" code in **getGenerated2Copier()** and **getGenerated4Copier()**. > Allow Copier to Copy a Record and Append to the End of an Outgoing Batch > > > Key: DRILL-5993 > URL: https://issues.apache.org/jira/browse/DRILL-5993 > Project: Apache Drill > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > Currently the copier can only copy record from an incoming batch to the > beginning of an outgoing batch. We need to be able to copy a record and > append it to the end of the outgoing batch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
[ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296039#comment-16296039 ] ASF GitHub Bot commented on DRILL-5993: --- Github user Ben-Zvi commented on a diff in the pull request: https://github.com/apache/drill/pull/1057#discussion_r157616077 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/Copier.java --- @@ -19,13 +19,15 @@ import org.apache.drill.exec.compile.TemplateClassDefinition; import org.apache.drill.exec.exception.SchemaChangeException; -import org.apache.drill.exec.ops.FragmentContext; import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.record.VectorContainer; public interface Copier { - public static TemplateClassDefinition TEMPLATE_DEFINITION2 = new TemplateClassDefinition(Copier.class, CopierTemplate2.class); - public static TemplateClassDefinition TEMPLATE_DEFINITION4 = new TemplateClassDefinition(Copier.class, CopierTemplate4.class); + TemplateClassDefinition TEMPLATE_DEFINITION2 = new TemplateClassDefinition(Copier.class, CopierTemplate2.class); + TemplateClassDefinition TEMPLATE_DEFINITION4 = new TemplateClassDefinition(Copier.class, CopierTemplate4.class); --- End diff -- Just for cleanliness, should do the same (remove all the public/static/abstract/final ) in **filter/Filterer.java** as well ! > Allow Copier to Copy a Record and Append to the End of an Outgoing Batch > > > Key: DRILL-5993 > URL: https://issues.apache.org/jira/browse/DRILL-5993 > Project: Apache Drill > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > Currently the copier can only copy record from an incoming batch to the > beginning of an outgoing batch. We need to be able to copy a record and > append it to the end of the outgoing batch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
[ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296036#comment-16296036 ] ASF GitHub Bot commented on DRILL-5993: --- Github user Ben-Zvi commented on a diff in the pull request: https://github.com/apache/drill/pull/1057#discussion_r157612792 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/VectorContainer.java --- @@ -353,6 +353,23 @@ public int getRecordCount() { public boolean hasRecordCount() { return recordCount != -1; } + /** + * This works with non-hyper {@link VectorContainer}s which have no selection vectors. + * Appends a row taken from a source {@link VectorContainer} to this {@link VectorContainer}. + * @param srcContainer The {@link VectorContainer} to copy a row from. + * @param srcIndex The index of the row to copy from the source {@link VectorContainer}. + */ + public void appendRow(VectorContainer srcContainer, int srcIndex) { +for (int vectorIndex = 0; vectorIndex < wrappers.size(); vectorIndex++) { + ValueVector destVector = wrappers.get(vectorIndex).getValueVector(); + ValueVector srcVector = srcContainer.wrappers.get(vectorIndex).getValueVector(); + + destVector.copyEntry(recordCount, srcVector, srcIndex); +} + +recordCount++; --- End diff -- (1) Need to change the initial value of *recordCount* from -1 to 0 ; (2) That "-1" is used to note "uninitialized". So need to add a boolean flag for that purpose. (3) Please change the *appendRow()* signature to return *recordCount*. (E.g., so the caller would know when the target container is full). > Allow Copier to Copy a Record and Append to the End of an Outgoing Batch > > > Key: DRILL-5993 > URL: https://issues.apache.org/jira/browse/DRILL-5993 > Project: Apache Drill > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > Currently the copier can only copy record from an incoming batch to the > beginning of an outgoing batch. We need to be able to copy a record and > append it to the end of the outgoing batch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
[ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16275197#comment-16275197 ] ASF GitHub Bot commented on DRILL-5993: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1057 @Ben-Zvi @paul-rogers As per our discussion I've added Paul's generic copiers and have provided a GenericCopier that works for RecordBatches that do not have selection vectors. I've enhanced the copier interface to include append methods. > Allow Copier to Copy a Record and Append to the End of an Outgoing Batch > > > Key: DRILL-5993 > URL: https://issues.apache.org/jira/browse/DRILL-5993 > Project: Apache Drill > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > Currently the copier can only copy record from an incoming batch to the > beginning of an outgoing batch. We need to be able to copy a record and > append it to the end of the outgoing batch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
[ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271921#comment-16271921 ] ASF GitHub Bot commented on DRILL-5993: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1057 Please let me know if I should make any more changes. > Allow Copier to Copy a Record and Append to the End of an Outgoing Batch > > > Key: DRILL-5993 > URL: https://issues.apache.org/jira/browse/DRILL-5993 > Project: Apache Drill > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > Currently the copier can only copy record from an incoming batch to the > beginning of an outgoing batch. We need to be able to copy a record and > append it to the end of the outgoing batch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
[ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271919#comment-16271919 ] ASF GitHub Bot commented on DRILL-5993: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1057 @paul-rogers since HashJoin does not need to support Selection Vectors maybe we can postpone adding the corresponding appendRow methods if and when they are needed. I suspect by the time anyone needs those methods we will have already migrated over to the new batch framework. > Allow Copier to Copy a Record and Append to the End of an Outgoing Batch > > > Key: DRILL-5993 > URL: https://issues.apache.org/jira/browse/DRILL-5993 > Project: Apache Drill > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > Currently the copier can only copy record from an incoming batch to the > beginning of an outgoing batch. We need to be able to copy a record and > append it to the end of the outgoing batch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
[ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271912#comment-16271912 ] ASF GitHub Bot commented on DRILL-5993: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1057#discussion_r153958856 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/record/TestVectorContainer.java --- @@ -124,4 +132,52 @@ public void testContainerMerge() { leftIndirect.clear(); right.clear(); } + + @Test + public void testAppendRow() + { +MaterializedField colA = MaterializedField.create("colA", Types.required(TypeProtos.MinorType.INT)); +MaterializedField colB = MaterializedField.create("colB", Types.required(TypeProtos.MinorType.INT)); --- End diff -- Added more interesting types. Currently RowSet classes don't support the Map data type. Paul asked me to look into adding support for this a while ago for DRILL-5870 . I'll update the test framework to support that in the next PR. > Allow Copier to Copy a Record and Append to the End of an Outgoing Batch > > > Key: DRILL-5993 > URL: https://issues.apache.org/jira/browse/DRILL-5993 > Project: Apache Drill > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > Currently the copier can only copy record from an incoming batch to the > beginning of an outgoing batch. We need to be able to copy a record and > append it to the end of the outgoing batch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
[ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271670#comment-16271670 ] ASF GitHub Bot commented on DRILL-5993: --- Github user Ben-Zvi commented on a diff in the pull request: https://github.com/apache/drill/pull/1057#discussion_r153934729 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/VectorContainer.java --- @@ -353,6 +353,23 @@ public int getRecordCount() { public boolean hasRecordCount() { return recordCount != -1; } + /** + * This works with non-hyper {@link VectorContainer}s which have no selection vectors. + * Appends a row taken from a source {@link VectorContainer} to this {@link VectorContainer}. + * @param srcContainer The {@link VectorContainer} to copy a row from. + * @param srcIndex The index of the row to copy from the source {@link VectorContainer}. + */ + public void appendRow(VectorContainer srcContainer, int srcIndex) { +for (int vectorIndex = 0; vectorIndex < wrappers.size(); vectorIndex++) { + ValueVector destVector = wrappers.get(vectorIndex).getValueVector(); + ValueVector srcVector = srcContainer.wrappers.get(vectorIndex).getValueVector(); + + destVector.copyEntry(recordCount, srcVector, srcIndex); +} + +recordCount++; --- End diff -- The immediate need for appendRow() is to distribute rows from a single incoming batch into multiple other batches (for the Hash Join internal partitioning), based on the hash value of the key columns at each row. This would not work well with the second suggestion (vectorizing - column 1, column 2, etc.) > Allow Copier to Copy a Record and Append to the End of an Outgoing Batch > > > Key: DRILL-5993 > URL: https://issues.apache.org/jira/browse/DRILL-5993 > Project: Apache Drill > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > Currently the copier can only copy record from an incoming batch to the > beginning of an outgoing batch. We need to be able to copy a record and > append it to the end of the outgoing batch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
[ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271671#comment-16271671 ] ASF GitHub Bot commented on DRILL-5993: --- Github user Ben-Zvi commented on a diff in the pull request: https://github.com/apache/drill/pull/1057#discussion_r153935071 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/record/TestVectorContainer.java --- @@ -124,4 +132,52 @@ public void testContainerMerge() { leftIndirect.clear(); right.clear(); } + + @Test + public void testAppendRow() + { +MaterializedField colA = MaterializedField.create("colA", Types.required(TypeProtos.MinorType.INT)); +MaterializedField colB = MaterializedField.create("colB", Types.required(TypeProtos.MinorType.INT)); --- End diff -- Maybe add some "interesting" datatypes ? Testing integers only may miss some issue. > Allow Copier to Copy a Record and Append to the End of an Outgoing Batch > > > Key: DRILL-5993 > URL: https://issues.apache.org/jira/browse/DRILL-5993 > Project: Apache Drill > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > Currently the copier can only copy record from an incoming batch to the > beginning of an outgoing batch. We need to be able to copy a record and > append it to the end of the outgoing batch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
[ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271600#comment-16271600 ] ASF GitHub Bot commented on DRILL-5993: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1057#discussion_r153926390 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/VectorContainer.java --- @@ -353,6 +353,23 @@ public int getRecordCount() { public boolean hasRecordCount() { return recordCount != -1; } + /** + * This works with non-hyper {@link VectorContainer}s which have no selection vectors. + * Appends a row taken from a source {@link VectorContainer} to this {@link VectorContainer}. + * @param srcContainer The {@link VectorContainer} to copy a row from. + * @param srcIndex The index of the row to copy from the source {@link VectorContainer}. + */ + public void appendRow(VectorContainer srcContainer, int srcIndex) { +for (int vectorIndex = 0; vectorIndex < wrappers.size(); vectorIndex++) { + ValueVector destVector = wrappers.get(vectorIndex).getValueVector(); + ValueVector srcVector = srcContainer.wrappers.get(vectorIndex).getValueVector(); + + destVector.copyEntry(recordCount, srcVector, srcIndex); +} + +recordCount++; --- End diff -- This is OK for a row-by-row copy. But, you'll get better performance if you optimize for the entire batch. Because you have no SV4, the source and dest batches are the same so the vectors can be preloaded into an array of vectors to avoid the vector wrapper lookup per column. Plus, if the code is written per batch, you can go a step further: vectorize the operation. Copy all values for column 1, then all for column 2, and so on. (In this case, you only get each vector once, so sticking with the wrappers is fine.) By vectorizing, you may get the vectorized cache-locality benefit that Drill promises from its operations. Worth a try to see if you get any speed-up. > Allow Copier to Copy a Record and Append to the End of an Outgoing Batch > > > Key: DRILL-5993 > URL: https://issues.apache.org/jira/browse/DRILL-5993 > Project: Apache Drill > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > Currently the copier can only copy record from an incoming batch to the > beginning of an outgoing batch. We need to be able to copy a record and > append it to the end of the outgoing batch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
[ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271587#comment-16271587 ] ASF GitHub Bot commented on DRILL-5993: --- Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1057 To answer the two questions: 1. The copier is used in multiple locations, some of which include selection vectors. Sort uses a copier to merge rows coming from multiple sorted batches. The SVR compresses out SVs. A filter will produce an SV2 which the SVR removes. An in-memory sort produces an SV4. But, because of the ways plans are generated, the hash join will never see a batch with an SV. (An SVR will be inserted, if needed, to remove the SV.) 2. We never write a batch using an SV. The SV is always a source indirection. Because we do indirection on the source side (and vectors are append only), there can be no SV on the destination side. Note also that the {{VectorContainer}} class, despite it's API, knows nothing about SVs. The SV is tacked on separately by the {{RecordBatch}}. (This is a less-than-ideal design, but it is how things work at present.) FWIW, the test-oriented {{RowSet}} abstractions came about as wrappers around both the {{VectorContainer}} and SV to provide a unified view. Because of how we do SVs, you'll need three copy methods: one for no SV, one for an SV2 and another for an SV4. In the fullness of time, the new "column reader" and "column writer" abstractions will hide all this stuff, but it will take time before those tools come online. > Allow Copier to Copy a Record and Append to the End of an Outgoing Batch > > > Key: DRILL-5993 > URL: https://issues.apache.org/jira/browse/DRILL-5993 > Project: Apache Drill > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > Currently the copier can only copy record from an incoming batch to the > beginning of an outgoing batch. We need to be able to copy a record and > append it to the end of the outgoing batch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
[ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271512#comment-16271512 ] ASF GitHub Bot commented on DRILL-5993: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1057 @Ben-Zvi @paul-rogers > Allow Copier to Copy a Record and Append to the End of an Outgoing Batch > > > Key: DRILL-5993 > URL: https://issues.apache.org/jira/browse/DRILL-5993 > Project: Apache Drill > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > Currently the copier can only copy record from an incoming batch to the > beginning of an outgoing batch. We need to be able to copy a record and > append it to the end of the outgoing batch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
[ https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271511#comment-16271511 ] ASF GitHub Bot commented on DRILL-5993: --- GitHub user ilooner opened a pull request: https://github.com/apache/drill/pull/1057 DRILL-5993 Append Row Method For VectorContainer (WIP) ## Motivation HashJoin requires a method that can take a row from a VectorContainer and append it to a destination VectorContainer. This is a WIP and this PR is mainly opened to improve my understanding. ## Implementation This is an initial implementation that works with simple VectorContainers that are not hyper batches and do not have selection vectors. It is also assumed that the user called **SchemaUtil.coerceContainer** on the source VectorContainer before using the newly added **appendRow** method. ## Questions - Do we have to worry about selection vectors in the source container? - Do we have to think about support hyper batches in the destination container? You can merge this pull request into a Git repository by running: $ git pull https://github.com/ilooner/drill DRILL-5993 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1057.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1057 commit ee43d6808562a1ff60c17fa7622b8358b63c7276 Author: Timothy FarkasDate: 2017-11-29T20:38:41Z - Initial Implementation of append row for a vector container > Allow Copier to Copy a Record and Append to the End of an Outgoing Batch > > > Key: DRILL-5993 > URL: https://issues.apache.org/jira/browse/DRILL-5993 > Project: Apache Drill > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > Currently the copier can only copy record from an incoming batch to the > beginning of an outgoing batch. We need to be able to copy a record and > append it to the end of the outgoing batch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)