[GitHub] drill pull request #837: DRILL-5514: Enhance VectorContainer to merge two ro...
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/837 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #837: DRILL-5514: Enhance VectorContainer to merge two ro...
Github user bitblender commented on a diff in the pull request:
https://github.com/apache/drill/pull/837#discussion_r122287615

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/BatchSchema.java ---
@@ -162,20 +162,22 @@ private boolean majorTypeEqual(MajorType t1, MajorType t2) {
    * Merge two schema to produce a new, merged schema. The caller is responsible
    * for ensuring that column names are unique. The order of the fields in the
    * new schema is the same as that of this schema, with the other schema's fields
-   * appended in the order defined in the other schema. The resulting selection
-   * vector mode is the same as this schema. (That is, this schema is assumed to
-   * be the main part of the batch, possibly with a selection vector, with the
-   * other schema representing additional, new columns.)
+   * appended in the order defined in the other schema.
+   *
+   * Merging data with selection vectors is unlikely to be useful, or work well.
--- End diff --

Can you please leave a comment about why this is unlikely to be useful, or work well?
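The thread does not record the answer to this question. One plausible reason, stated here as an assumption and not as the author's rationale: a selection vector holds indexes into the physical rows of the batch's vectors, so any column merged in from another batch would have to line up with those physical rows rather than with the selected rows. The sketch below uses plain `int` arrays as hypothetical stand-ins for value vectors and for a two-byte selection vector; `SelectionVectorMisalignment` and `readThroughSv` are illustrative names, not Drill code.

```java
/** Hypothetical illustration: plain arrays stand in for Drill value vectors. */
final class SelectionVectorMisalignment {

  /** Reads logical row {@code row} of a column through a selection vector. */
  static int readThroughSv(int[] column, int[] sv, int row) {
    // The selection vector maps a logical row index to a physical row index.
    return column[sv[row]];
  }

  public static void main(String[] args) {
    // Left batch: four physical rows; the selection vector keeps rows 1 and 3.
    int[] leftColumn = {10, 11, 12, 13};
    int[] sv = {1, 3};

    // Right batch: one value per *selected* row, so only two physical rows.
    int[] rightColumn = {100, 101};

    // Logical row 0 of the left column resolves to physical row 1: prints 11.
    System.out.println(readThroughSv(leftColumn, sv, 0));
    // The same indirection on the merged-in column also reads physical row 1,
    // which holds the value meant for logical row 1: prints 101, not 100.
    System.out.println(readThroughSv(rightColumn, sv, 0));
    // Logical row 1 maps to physical row 3, past the end of the right column.
    try {
      readThroughSv(rightColumn, sv, 1);
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("logical row 1: index out of bounds");
    }
  }
}
```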
[GitHub] drill pull request #837: DRILL-5514: Enhance VectorContainer to merge two ro...
Github user bitblender commented on a diff in the pull request:
https://github.com/apache/drill/pull/837#discussion_r122287096

--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/record/TestVectorContainer.java ---
@@ -110,13 +110,16 @@ public void testContainerMerge() {
     RowSet mergedRs = left.merge(right);
     comparison.verifyAndClear(mergedRs);
-    // Add a selection vector. Ensure the SV appears in the merged
-    // result. Test as a row set since container's don't actually
-    // carry the selection vector.
+    // Add a selection vector. Merging is forbidden.
--- End diff --

Maybe this can be changed to "//Merging data with a selection vector is forbidden". As is, the comment implies that we are adding a selection vector.
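The revised test expects the merge to be rejected once a selection vector is present. A minimal sketch of that behavior, assuming the guard rejects a selection vector on either side of the merge (the thread does not show the final check). `ForbidSvMerge`, its nested `Mode` enum, `checkMergeable`, and `isRejected` are hypothetical names; `isRejected` follows the usual pattern of a test that expects an `IllegalArgumentException`.

```java
/** Hypothetical sketch of a "merging with a selection vector is forbidden" rule. */
final class ForbidSvMerge {

  /** Stand-in for a batch's selection vector mode. */
  enum Mode { NONE, TWO_BYTE, FOUR_BYTE }

  /** Rejects the merge if either side carries a selection vector. */
  static void checkMergeable(Mode left, Mode right) {
    if (left != Mode.NONE || right != Mode.NONE) {
      throw new IllegalArgumentException(
          "Merging data with a selection vector is forbidden");
    }
  }

  /** Returns true if the merge is rejected, as the revised test expects. */
  static boolean isRejected(Mode left, Mode right) {
    try {
      checkMergeable(left, right);
      return false;
    } catch (IllegalArgumentException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(isRejected(Mode.NONE, Mode.NONE));      // false
    System.out.println(isRejected(Mode.TWO_BYTE, Mode.NONE));  // true
  }
}
```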
[GitHub] drill pull request #837: DRILL-5514: Enhance VectorContainer to merge two ro...
Github user bitblender commented on a diff in the pull request:
https://github.com/apache/drill/pull/837#discussion_r120198724

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/BatchSchema.java ---
@@ -157,4 +158,26 @@ private boolean majorTypeEqual(MajorType t1, MajorType t2) {
     return true;
   }
+  /**
+   * Merge two schema to produce a new, merged schema. The caller is responsible
+   * for ensuring that column names are unique. The order of the fields in the
+   * new schema is the same as that of this schema, with the other schema's fields
+   * appended in the order defined in the other schema. The resulting selection
+   * vector mode is the same as this schema. (That is, this schema is assumed to
+   * be the main part of the batch, possibly with a selection vector, with the
+   * other schema representing additional, new columns.)
+   * @param otherSchema the schema to merge with this one
+   * @return the new, merged, schema
+   */
+
+  public BatchSchema merge(BatchSchema otherSchema) {
+    if (otherSchema.selectionVectorMode != SelectionVectorMode.NONE &&
+        selectionVectorMode != otherSchema.selectionVectorMode) {
+      throw new IllegalArgumentException("Left schema must carry the selection vector mode");
+    }
+    List mergedFields = new ArrayList<>();
--- End diff --

List mergedFields = new ArrayList(this.fields.size() + otherSchema.fields.size()) would avoid having to potentially grow the ArrayList twice.
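The reviewer's suggestion is to size the list once, up front, since the final field count is known before any field is added. A minimal sketch, with `String` field names standing in for the schema's field objects; `PreallocatedMerge` and `mergeFields` are hypothetical helpers, not part of `BatchSchema`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PreallocatedMerge {

  /**
   * Concatenates two field lists: this schema's fields first, then the
   * other schema's fields, each in their original order.
   */
  static <T> List<T> mergeFields(List<T> fields, List<T> otherFields) {
    // Sizing the list for the final count means neither addAll() call
    // has to grow the backing array.
    List<T> merged = new ArrayList<>(fields.size() + otherFields.size());
    merged.addAll(fields);
    merged.addAll(otherFields);
    return merged;
  }

  public static void main(String[] args) {
    System.out.println(mergeFields(Arrays.asList("a", "b"), Arrays.asList("c")));  // [a, b, c]
  }
}
```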
[GitHub] drill pull request #837: DRILL-5514: Enhance VectorContainer to merge two ro...
Github user bitblender commented on a diff in the pull request:
https://github.com/apache/drill/pull/837#discussion_r118797793

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/BatchSchema.java ---
@@ -157,4 +158,26 @@ private boolean majorTypeEqual(MajorType t1, MajorType t2) {
     return true;
   }
+  /**
+   * Merge two schema to produce a new, merged schema. The caller is responsible
+   * for ensuring that column names are unique. The order of the fields in the
+   * new schema is the same as that of this schema, with the other schema's fields
+   * appended in the order defined in the other schema. The resulting selection
+   * vector mode is the same as this schema. (That is, this schema is assumed to
+   * be the main part of the batch, possibly with a selection vector, with the
+   * other schema representing additional, new columns.)
+   * @param otherSchema the schema to merge with this one
+   * @return the new, merged, schema
+   */
+
+  public BatchSchema merge(BatchSchema otherSchema) {
+    if (otherSchema.selectionVectorMode != SelectionVectorMode.NONE &&
+        selectionVectorMode != otherSchema.selectionVectorMode) {
+      throw new IllegalArgumentException("Left schema must carry the selection vector mode");
--- End diff --

"Left schema must carry the same selection vector mode" + "as the right schema"?
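The condition in this revision permits the merge when the other (right-hand) schema has no selection vector, or when both schemas use the same mode; it rejects only a right-hand mode that differs from the left. The sketch below restates that condition with the reviewer's suggested message. The nested `Mode` enum reuses the value names of Drill's `SelectionVectorMode` but is a hypothetical stand-in, and `SvModeCheck`, `check`, and `allowed` are illustrative names.

```java
/** Hypothetical restatement of the selection-vector check under review. */
final class SvModeCheck {

  /** Stand-in for the batch's selection vector mode. */
  enum Mode { NONE, TWO_BYTE, FOUR_BYTE }

  /** Same condition as the diff: a right-hand mode other than NONE must match the left. */
  static void check(Mode left, Mode right) {
    if (right != Mode.NONE && left != right) {
      throw new IllegalArgumentException(
          "Left schema must carry the same selection vector mode as the right schema");
    }
  }

  /** Returns true if check() accepts the pair of modes. */
  static boolean allowed(Mode left, Mode right) {
    try {
      check(left, right);
      return true;
    } catch (IllegalArgumentException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Print the outcome for every left/right combination.
    for (Mode left : Mode.values()) {
      for (Mode right : Mode.values()) {
        System.out.println(left + " + " + right + " -> "
            + (allowed(left, right) ? "allowed" : "rejected"));
      }
    }
  }
}
```

Note the asymmetry: `NONE + TWO_BYTE` is rejected while `TWO_BYTE + NONE` is allowed, consistent with the Javadoc's assumption that this (left) schema carries the selection vector.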
[GitHub] drill pull request #837: DRILL-5514: Enhance VectorContainer to merge two ro...
GitHub user paul-rogers opened a pull request:
https://github.com/apache/drill/pull/837

DRILL-5514: Enhance VectorContainer to merge two row sets

Adds ability to merge two schemas and to merge two vector containers, in each case producing a new, merged result. See DRILL-5514 for details.

Also provides a handy constructor to create a vector container given a pre-defined schema.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/paul-rogers/drill DRILL-5514

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/837.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #837

commit 5b2ceccd7d002b56b93abbff769bfb96b9ff0ff6
Author: Paul Rogers
Date: 2017-05-15T22:59:35Z

    DRILL-5514: Enhance VectorContainer to merge two row sets

    Adds ability to merge two schemas and to merge two vector containers, in each case producing a new, merged result. See DRILL-5514 for details.

    Also provides a handy constructor to create a vector container given a pre-defined schema.
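A minimal sketch of the column-wise merge the pull request describes, under stated assumptions: each batch is modeled as an insertion-ordered map from column name to a list of values (a hypothetical stand-in for `VectorContainer`), two non-empty inputs must have the same row count, and duplicate names are rejected here even though the `BatchSchema.merge` Javadoc leaves uniqueness to the caller. `ContainerMergeSketch`, `rowCount`, and `merge` are illustrative names, not Drill's API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: merge two "containers" column by column. */
final class ContainerMergeSketch {

  /** Row count of a batch, taken from its first column (0 if it has none). */
  static int rowCount(Map<String, List<?>> batch) {
    return batch.isEmpty() ? 0 : batch.values().iterator().next().size();
  }

  /** Returns a new batch: left columns first, then right columns, in order. */
  static Map<String, List<?>> merge(Map<String, List<?>> left,
                                    Map<String, List<?>> right) {
    if (!left.isEmpty() && !right.isEmpty() && rowCount(left) != rowCount(right)) {
      throw new IllegalArgumentException("Batches must have the same row count");
    }
    // Copy the left map so neither input map is modified; the column lists
    // themselves are shared, not copied.
    Map<String, List<?>> merged = new LinkedHashMap<>(left);
    for (Map.Entry<String, List<?>> e : right.entrySet()) {
      if (merged.containsKey(e.getKey())) {
        throw new IllegalArgumentException("Duplicate column: " + e.getKey());
      }
      merged.put(e.getKey(), e.getValue());
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, List<?>> left = new LinkedHashMap<>();
    left.put("a", Arrays.asList(1, 2));
    Map<String, List<?>> right = new LinkedHashMap<>();
    right.put("b", Arrays.asList("x", "y"));
    System.out.println(merge(left, right));  // {a=[1, 2], b=[x, y]}
  }
}
```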