[GitHub] [drill] paul-rogers commented on a change in pull request #2018: DRILL-7633: Fixes for union and repeated list accessors

2020-03-24 Thread GitBox
paul-rogers commented on a change in pull request #2018: DRILL-7633: Fixes for 
union and repeated list accessors
URL: https://github.com/apache/drill/pull/2018#discussion_r396947139
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/VariantColumnMetadata.java
 ##
 @@ -96,11 +134,13 @@ public StructureType structureType() {
   public boolean isVariant() { return true; }
 
   @Override
-  public boolean isArray() { return type() == MinorType.LIST; }
+  public boolean isArray() {
+return super.isArray() || type() == MinorType.LIST;
+  }
 
   @Override
   public ColumnMetadata cloneEmpty() {
-return new VariantColumnMetadata(name, type, variantSchema.cloneEmpty());
+return new VariantColumnMetadata(name, type, mode, new VariantSchema());
 
 Review comment:
   @arina-ielchiieva, thanks for the reminder; I did miss your response.
   
   I think I see where we had a misunderstanding. I thought the `typeString()` 
code already worked because I saw this in `AbstractColumnMetadata`:
   
   ```
 public String typeString() {
   return majorType().toString();
 }
   ```
   
`PrimitiveColumnMetadata` uses a different set of type names.
   
   It seems this area has gotten a bit muddled: the  `AbstractColumnMetadata` 
version seems to never be called, except for `VariantColumnMetadata`. (This 
confused the heck out of me on an earlier attempt to clean up this area, but I 
didn't clue into the problem then.)
   
   So, now that I understand what you are asking for, I did this:
   
   * Removed the `AbstractColumnMetadata` version.
   * Added a  `VariantColumnMetadata` that produces either `UNION` or 
`ARRAY`.
   
   Until I see a good reason otherwise, I think we should treat `UNION` (what I 
call a "variant") as an opaque type. I guess I don't see the contents of a 
variant as something we want to specify, either in the metastore or in a 
provided schema. For example, can the metastore compute an NDV across an `INT`, 
`VARCHAR` and `MAP`? Probably not. Nor do the min/max values or other stats 
make sense for a variant. As a result, a variant is an opaque type for the 
metastore.
   
   Similarly, in a provided schema, I can't convince myself that the user wants 
not only to say, "the type of this field can vary", but also that "the type can 
be an `INT`, `DOUBLE` or `VARCHAR`, but not a `BIGINT`.) Just does not seem 
super-useful.
   
   An additional concern, which I think I mentioned somewhere else, is that as 
soon as we start serializing complex types, the SQL-like text format becomes 
far too cumbersome. We'd be much better of with a JSON format that can capture 
the complex tree structure. Compare:
   
   ```
   (A UNION)>)
   ```
   With something like:
   ```
   { schema: [
 {name: "a",
 type: {
name: "UNION",
nullable: true,
subtypes: [
 {name: INT, nullable: false},
 {name: MAP, nullable: false},
 members: [ ...
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #2018: DRILL-7633: Fixes for union and repeated list accessors

2020-03-12 Thread GitBox
paul-rogers commented on a change in pull request #2018: DRILL-7633: Fixes for 
union and repeated list accessors
URL: https://github.com/apache/drill/pull/2018#discussion_r391431754
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/VariantColumnMetadata.java
 ##
 @@ -96,11 +134,13 @@ public StructureType structureType() {
   public boolean isVariant() { return true; }
 
   @Override
-  public boolean isArray() { return type() == MinorType.LIST; }
+  public boolean isArray() {
+return super.isArray() || type() == MinorType.LIST;
+  }
 
   @Override
   public ColumnMetadata cloneEmpty() {
-return new VariantColumnMetadata(name, type, variantSchema.cloneEmpty());
+return new VariantColumnMetadata(name, type, mode, new VariantSchema());
 
 Review comment:
   Nicely done! We discussed the problem with `UNION` a while back and realized 
it was a mess. Now I gotta deal with it...
   
   The `typeString()` version is used, I presume, in a provided schema. Correct?
   
   If so, then then the base class implementation is fine, the output will be 
just `UNION`. If we think of Drill's `UNION` as like a Visual Basic or 
PostgreSQL variant (or a Python variable), then the `UNION` type is sufficient: 
a `UNION` can hold anything; no need to restrict what it might or might not 
hold.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services