[GitHub] drill pull request #837: DRILL-5514: Enhance VectorContainer to merge two ro...

2017-06-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/837


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #837: DRILL-5514: Enhance VectorContainer to merge two ro...

2017-06-15 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/837#discussion_r122287615
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/BatchSchema.java ---
@@ -162,20 +162,22 @@ private boolean majorTypeEqual(MajorType t1, MajorType t2) {
    * Merge two schema to produce a new, merged schema. The caller is responsible
    * for ensuring that column names are unique. The order of the fields in the
    * new schema is the same as that of this schema, with the other schema's fields
-   * appended in the order defined in the other schema. The resulting selection
-   * vector mode is the same as this schema. (That is, this schema is assumed to
-   * be the main part of the batch, possibly with a selection vector, with the
-   * other schema representing additional, new columns.)
+   * appended in the order defined in the other schema.
+   *
+   * Merging data with selection vectors is unlikely to be useful, or work well.
--- End diff --

Can you please leave a comment about why this is unlikely to be useful, or 
work well?
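
The thread does not record an answer to this question. As a minimal sketch of one plausible reason, assuming a selection vector is simply a list of physical row indexes into its own batch: an indirection built for the left batch need not be valid for the right batch. The class and method names below are hypothetical and are not Drill APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not Drill code: a column is a list of values, and a
// selection vector is an array of physical row indexes into that column.
final class SelectionVectorSketch {

  // Reads a column through a selection vector: sv[i] names the physical
  // row that backs logical row i.
  static List<String> select(List<String> column, int[] sv) {
    List<String> out = new ArrayList<>(sv.length);
    for (int physicalRow : sv) {
      out.add(column.get(physicalRow));
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> leftColumn = Arrays.asList("a", "b", "c", "d");
    int[] leftSv = {1, 3};                 // two of four rows survived a filter
    System.out.println(select(leftColumn, leftSv));

    // A right-hand column with only two physical rows cannot be read
    // through the left batch's selection vector: index 3 does not exist.
    List<String> rightColumn = Arrays.asList("x", "y");
    try {
      select(rightColumn, leftSv);
    } catch (IndexOutOfBoundsException e) {
      System.out.println("selection vector does not fit the right column");
    }
  }
}
```

Under this assumption, a merged batch that kept the left selection vector would only make sense if the right-hand columns happened to share the same physical row layout.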




[GitHub] drill pull request #837: DRILL-5514: Enhance VectorContainer to merge two ro...

2017-06-15 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/837#discussion_r122287096
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/record/TestVectorContainer.java ---
@@ -110,13 +110,16 @@ public void testContainerMerge() {
 RowSet mergedRs = left.merge(right);
 comparison.verifyAndClear(mergedRs);
 
-// Add a selection vector. Ensure the SV appears in the merged
-// result. Test as a row set since container's don't actually
-// carry the selection vector.
+// Add a selection vector. Merging is forbidden.
--- End diff --

Maybe this can be changed to "// Merging data with a selection vector is
forbidden". As is, the comment implies that we are adding a selection vector.




[GitHub] drill pull request #837: DRILL-5514: Enhance VectorContainer to merge two ro...

2017-06-05 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/837#discussion_r120198724
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/BatchSchema.java ---
@@ -157,4 +158,26 @@ private boolean majorTypeEqual(MajorType t1, MajorType t2) {
     return true;
   }
 
+  /**
+   * Merge two schema to produce a new, merged schema. The caller is responsible
+   * for ensuring that column names are unique. The order of the fields in the
+   * new schema is the same as that of this schema, with the other schema's fields
+   * appended in the order defined in the other schema. The resulting selection
+   * vector mode is the same as this schema. (That is, this schema is assumed to
+   * be the main part of the batch, possibly with a selection vector, with the
+   * other schema representing additional, new columns.)
+   * @param otherSchema the schema to merge with this one
+   * @return the new, merged, schema
+   */
+
+  public BatchSchema merge(BatchSchema otherSchema) {
+    if (otherSchema.selectionVectorMode != SelectionVectorMode.NONE &&
+        selectionVectorMode != otherSchema.selectionVectorMode) {
+      throw new IllegalArgumentException("Left schema must carry the selection vector mode");
+    }
+    List mergedFields = new ArrayList<>();
--- End diff --

Using List mergedFields = new ArrayList(this.fields.size() +
otherSchema.fields.size()) would avoid having to potentially grow the
ArrayList twice.
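
A minimal sketch of the pre-sizing idea, with field lists modeled as plain strings: the class and method names are hypothetical stand-ins for illustration, not the actual BatchSchema code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the schema merge; fields are plain strings here.
final class PresizedMergeSketch {

  // Appends the right-hand fields after the left-hand fields, preserving order.
  static List<String> mergeFields(List<String> left, List<String> right) {
    // Reserve room for both lists up front so neither addAll call has to
    // grow the backing array.
    List<String> merged = new ArrayList<>(left.size() + right.size());
    merged.addAll(left);
    merged.addAll(right);
    return merged;
  }

  public static void main(String[] args) {
    // prints [a, b, c]
    System.out.println(mergeFields(Arrays.asList("a", "b"), Arrays.asList("c")));
  }
}
```

The initial-capacity constructor changes only the allocation behavior; the merged list has the same contents and order as with the default constructor.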




[GitHub] drill pull request #837: DRILL-5514: Enhance VectorContainer to merge two ro...

2017-06-05 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/837#discussion_r118797793
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/BatchSchema.java ---
@@ -157,4 +158,26 @@ private boolean majorTypeEqual(MajorType t1, MajorType t2) {
     return true;
   }
 
+  /**
+   * Merge two schema to produce a new, merged schema. The caller is responsible
+   * for ensuring that column names are unique. The order of the fields in the
+   * new schema is the same as that of this schema, with the other schema's fields
+   * appended in the order defined in the other schema. The resulting selection
+   * vector mode is the same as this schema. (That is, this schema is assumed to
+   * be the main part of the batch, possibly with a selection vector, with the
+   * other schema representing additional, new columns.)
+   * @param otherSchema the schema to merge with this one
+   * @return the new, merged, schema
+   */
+
+  public BatchSchema merge(BatchSchema otherSchema) {
+    if (otherSchema.selectionVectorMode != SelectionVectorMode.NONE &&
+        selectionVectorMode != otherSchema.selectionVectorMode) {
+      throw new IllegalArgumentException("Left schema must carry the selection vector mode");
--- End diff --

"Left schema must carry the same selection vector mode"  + "as the right 
schema"?
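
A sketch of the quoted check with the suggested wording, assuming a local enum as a stand-in for SelectionVectorMode and a hypothetical helper name. Note that the first string literal needs a trailing space, otherwise the concatenated message reads "modeas".

```java
// Hypothetical stand-in for the check in the quoted diff; not the Drill class.
final class ModeCheckSketch {

  // Local stand-in for the selection vector modes.
  enum Mode { NONE, TWO_BYTE, FOUR_BYTE }

  // Mirrors the condition in the quoted diff: the right-hand schema may omit
  // a selection vector, but if it has one the left must use the same mode.
  static void checkMergeable(Mode left, Mode right) {
    if (right != Mode.NONE && left != right) {
      throw new IllegalArgumentException(
          "Left schema must carry the same selection vector mode "
          + "as the right schema");
    }
  }

  public static void main(String[] args) {
    checkMergeable(Mode.TWO_BYTE, Mode.NONE);     // accepted
    checkMergeable(Mode.TWO_BYTE, Mode.TWO_BYTE); // accepted
    try {
      checkMergeable(Mode.NONE, Mode.TWO_BYTE);   // rejected
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```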




[GitHub] drill pull request #837: DRILL-5514: Enhance VectorContainer to merge two ro...

2017-05-15 Thread paul-rogers
GitHub user paul-rogers opened a pull request:

https://github.com/apache/drill/pull/837

DRILL-5514: Enhance VectorContainer to merge two row sets

Adds ability to merge two schemas and to merge two vector containers,
in each case producing a new, merged result. See DRILL-5514 for details.

Also provides a handy constructor to create a vector container given a
pre-defined schema.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/paul-rogers/drill DRILL-5514

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/837.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #837


commit 5b2ceccd7d002b56b93abbff769bfb96b9ff0ff6
Author: Paul Rogers 
Date:   2017-05-15T22:59:35Z

DRILL-5514: Enhance VectorContainer to merge two row sets

Adds ability to merge two schemas and to merge two vector containers,
in each case producing a new, merged result. See DRILL-5514 for details.

Also provides a handy constructor to create a vector container given a
pre-defined schema.



