Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17638 )

Change subject: WiP: IMPALA-9495: Support struct in select list for ORC tables
......................................................................


Patch Set 8:

(67 comments)

Thanks for all the valuable feedback for everyone! The latest patch set is 
supposed to address all the comments so far. Let me know if I missed anything!

http://gerrit.cloudera.org:8080/#/c/17638/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17638/4//COMMIT_MSG@44
PS4, Line 44:
> nit. "a row in a row batch"
Done


http://gerrit.cloudera.org:8080/#/c/17638/4//COMMIT_MSG@58
PS4, Line 58:
> Since this patch lays the foundation work for STRUCT support (even though t
Currently the list of use cases this patch opens isn't that long. You can 
provide structs in the select list of a select statement and an inline view 
created by a WITH() clause can contain structs (however this second use case 
wasn't an intention of this patch, worked almost out of the box, and has some 
issues https://issues.apache.org/jira/browse/IMPALA-10838).


http://gerrit.cloudera.org:8080/#/c/17638/4//COMMIT_MSG@64
PS4, Line 64: the newly introduced TupleDescriptor using 'itemTupleId'.
> nit. May add items into the test section.
I only find useful to mention testing in the commit message if there was 
anything worth to mention apart from the tests written in the patch. Here 
nothing extra was made apart from the tests I added.


http://gerrit.cloudera.org:8080/#/c/17638/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17638/7//COMMIT_MSG@10
PS7, Line 10: JSON
> Can you add a separate section for "returning structs to cliens"? I think t
Done


http://gerrit.cloudera.org:8080/#/c/17638/7//COMMIT_MSG@14
PS7, Line 14: '{"int_struct_member":12,"string_struct_member":"string value"}'
> Can you add an example that has a nested struct?
Done


http://gerrit.cloudera.org:8080/#/c/17638/7//COMMIT_MSG@20
PS7, Line 20: Note, the conversion from struct to JSON happens on the server 
side
> nit: wrap at 72
Done


http://gerrit.cloudera.org:8080/#/c/17638/7//COMMIT_MSG@34
PS7, Line 34: And running the following query:
> I would move this earlier, as this seems to be the most critical decision.
Done


http://gerrit.cloudera.org:8080/#/c/17638/7//COMMIT_MSG@35
PS7, Line 35: SELECT id, s FROM tbl;
            :
> Do structs affect the order of slots? Generally we order the slots by size
Yes, structs affect the order of the slots in the row batch. The position of a 
struct in the row batch depends on the total size of that particular struct 
(the sum of the size of the struct fields.) The struct fields are consecutive 
to each other in the row batch and their order inside the struct depends on 
their size as it would work in a regular case for primitives.

I'll add a note about this in the commit msg.


http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/exec/orc-column-readers.h
File be/src/exec/orc-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/exec/orc-column-readers.h@626
PS4, Line 626:
> nit. May change to "is to be materialized".
see below.


http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/exec/orc-column-readers.h@627
PS4, Line 627: urrent_write_id_field_in
> nit. Maybe renamed as is_to_be_materialized.
Actually, there is already a variable for this purpose: 'materialize_tuple_' 
It's always false for structs, so I'll re-use this for this purpose and drop 
'is_given_in_select_list'.


http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/exec/orc-column-readers.cc
File be/src/exec/orc-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/exec/orc-column-readers.cc@493
PS4, Line 493: return
> nit. I am a little confusing by the return here. Does it mean that we only
'is_given_in_select_list' is to indicate whether the struct was in the select 
list or not. It is possible that this StructReader is for a struct that is not 
directly given in the select list but used in the query to reference its 
children. E.g. in this one:

select tbl.id, inner_arr.ITEM
from functional_orc_def.complextypestbl tbl,
    tbl.nested_struct.c.d as outer_arr, outer_arr.ITEM as inner_arr;

So the difference here is that if the struct column was given in the select 
list it's not enough jut to delegate the reading to its children as the struct 
itself could be null as well. Hence, I call the BatchedReader::ReadValueBatch() 
instead that covers the null values as well and set the null indicator bits 
accordingly.


http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/exprs/scalar-expr-evaluator.h
File be/src/exprs/scalar-expr-evaluator.h:

http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/exprs/scalar-expr-evaluator.h@262
PS4, Line 262: non-struc
> nit. Maybe rename as childEvaluators_?
Done


http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/exprs/scalar-expr-evaluator.cc
File be/src/exprs/scalar-expr-evaluator.cc:

http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/exprs/scalar-expr-evaluator.cc@96
PS4, Line 96: hildren(), state, p
> nit. Could we use children_ directly here?
Good point. Done


http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/exprs/scalar-expr-evaluator.cc
File be/src/exprs/scalar-expr-evaluator.cc:

http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/exprs/scalar-expr-evaluator.cc@369
PS7, Line 369:     }
             :     case TYPE_STRUCT: {
             :       StructVal v = expr.GetS
> optimization: this can potentially lead to several copies and heap calls
Done


http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/exprs/scalar-expr.h
File be/src/exprs/scalar-expr.h:

http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/exprs/scalar-expr.h@361
PS7, Line 361:   /// ScalarExpr doesn't need a FunctionContext
> I am unsure about thread safety -AFAIK we have a single expr tree in a frag
I needed to get the evaluator from the expression in 
SlotRef::GetStructValInterpreted() but I figured out that this is in fact not 
needed. So I dropped this member.


http://gerrit.cloudera.org:8080/#/c/17638/3/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/17638/3/be/src/runtime/buffered-tuple-stream.cc@85
PS3, Line 85: {
> Nit: Maybe it would be nicer to use
Done


http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/descriptors.h@492
PS4, Line 492:
> nit. May draw a diagram to help illustrate the data structure.
Done


http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/descriptors.cc@158
PS4, Line 158: inline bool SlotDescriptor::IsChildOfSt
> nit. inline?
Done


http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/descriptors.cc@342
PS4, Line 342:
> nit. then
Done


http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/runtime/raw-value.h
File be/src/runtime/raw-value.h:

http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/runtime/raw-value.h@129
PS7, Line 129:   /// Wrapper function for Write() to handle struct slots and 
its children. Additionally,
             :   /// gathers the string slots of the slot tree into 
'string_values'.
             :   template <bool COLLECT_STRING_VALS>
             :   static void Write(const void* value, Tuple* tuple,
             :       const SlotDescriptor* slot_desc, MemPool* pool,
             :     std::vector<StringValue*>* string_values);
> The two comments seem a bit redundant.
oh, the above comment is a leftover I forgot to drop.


http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/raw-value.cc
File be/src/runtime/raw-value.cc:

http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/raw-value.cc@223
PS4, Line 223:   DCHECK(value != nullptr && tuple != nullptr && slot_desc != 
nullptr &&
> nit. I wonder if we need to handle value being null here. See line of code
Actually, value can't be null here as it is checked already on the callsite: 
tuple.cc:235 before calling RawValue::Write().
On L258 it is a different case as it checks the children of the struct, so it's 
possible we hit a null child there.


http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/raw-value.cc@246
PS4, Line 246: children_tuple
> nit. May renamed as children_tuple_desc.
Done


http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/raw-value.cc@260
PS4, Line 260: } else {
             :       WritePrimitive<COLLECT_STRING_VALS>(src_child, tuple, 
child_slot, pool,
             :           string_values);
             :     }
             :   }
> nit this block of code duplicates the block at 227 - 230.
Good observation. I moved these to a separate function.


http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/tuple.h
File be/src/runtime/tuple.h:

http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/tuple.h@253
PS4, Line 253:   /// 'slot_desc' describes a struct slot in this tuple. Sets 
'slot_desc' to null in this
> nit. 'slot_desc' describes a struct slot in this tuple.
Done


http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/tuple.cc
File be/src/runtime/tuple.cc:

http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/tuple.cc@237
PS4, Line 237:       RawValue::Write<COLLECT_STRING_VALS>(src, this, slot_desc, 
pool, &string_values);
> Might want to push 'COLLECT_STRING_VALS' into RawValue::Write() to prevent
Done


http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/tuple.cc@246
PS4, Line 246:  if (slot_desc->type().IsStructType()) {
             :         SetStructToNull(slot_desc);
> nit. May distinguish the two cases and call SetNull() or SetStructToNull().
Done


http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/tuple.cc@252
PS4, Line 252:
> nit. May typed as "const SlotDescriptor* const" so that we know slot_desc w
I haven't seen this format being used in Impala, but Done


http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/types.h
File be/src/runtime/types.h:

http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/types.h@236
PS4, Line 236:  static inline int GetDecimalByteSize(int precision) {
             :     DCHECK_GT(precision, 0);
             :     if (precision <= MAX_DECIMAL4_PRECISION) return 4;
             :     if (precision <= MAX_DECIMAL8_PRECISION) return 8;
             :     return 16;
             :   }
> nit. May not be necessary to relocate the method.
I think this is git interpreting something a bit weird. This function used to 
be right after the parameterless GetSlotSize() and that is where it is now.


http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/types.h@262
PS4, Line 262: /// Helper function for GetSlotSize() so that struct size could 
be calculated
             :   /// recursively.
             :   static inline int GetSlotSize(const ColumnType& col_type) {
             :     switch (col_type.type) {
             :       case TYPE_STRUCT: {
             :         int struct_size = 0;
             :         for (ColumnType child_type : col_type.children) {
             :           struct_size += GetSlotSize(child_type);
             :         }
             :         return struct_size;
             :       }
             :       case TYPE_STRING:
             :       case TYPE_VARCHAR:
             :         return 12;
             :       case TYPE_CHAR:
             :       case TYPE_FIXED_UDA_INTERMEDIATE:
             :         return col_type.len;
             :       case TYPE_ARRAY:
             :       case TYPE_MAP:
             :         return 12;
             :       default:
             :         return GetByteSize(col_type);
             :     }
             :   }
> nit. It may not be necessary to relocate this method. Please refer to GetDe
This actually isn't a relocation of the existing function, but introducing a 
new version of the function that receives a param. The original, parameterless 
version remains where it had been.


http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/types.h@293
PS4, Line 293: GetByteSize(child_type);
> nit. I wonder if this call can become expensive with a deeply nested struct
I think the expected size of a struct is never going to be bigger than an 
average number of the columns in a table. This might include some nested 
structs, however, that means more function calls here. Even though I think the 
impact is negligible, I prefer to be on the same side, sot as you suggest, I'll 
keep track of the size, once calculated in member variable.

Update: I implemented a member to store the size once it's calculated. Then I 
ran the test_struct_in_select_list test suite to see how many time this member 
is used and apparently never. So this means that the size of the struct is only 
fetched once for query so there won't be any perf impact if we leave this as it 
is.


http://gerrit.cloudera.org:8080/#/c/17638/3/be/src/runtime/types.h
File be/src/runtime/types.h:

http://gerrit.cloudera.org:8080/#/c/17638/3/be/src/runtime/types.h@264
PS3, Line 264:   static inline int GetSlotSize(const ColumnType& col_type)
> Nit: this could be static.
Done


http://gerrit.cloudera.org:8080/#/c/17638/3/be/src/runtime/types.h@288
PS3, Line 288:   static inline int GetByteSize(const ColumnType& col_type) {
> Nit: this could be static.
Done


http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/service/query-result-set.cc
File be/src/service/query-result-set.cc:

http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/service/query-result-set.cc@406
PS4, Line 406: ThriftTColumn col_output;
> This line probably should be moved to the block for STRUCT. In the ELSE blo
Done


http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/udf/udf.h
File be/src/udf/udf.h:

http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/udf/udf.h@740
PS7, Line 740:
> I have some dilemmas about how to design StructVal.
thanks for the explantion! I moved StructVal to udf-internal


http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/udf/udf.h@745
PS7, Line 745:
> Can you add more info? What are the children exactly? Other AnyVals, or mem
Done


http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/udf/udf.h@770
PS7, Line 770:
> StringVal uses the alloc methods of FunctionContext - is there a reason for
Changed the code to use FunctionContext for memory reservations. I dropped this 
destructor as a result.


http://gerrit.cloudera.org:8080/#/c/17638/3/be/src/udf/udf.h
File be/src/udf/udf.h:

http://gerrit.cloudera.org:8080/#/c/17638/3/be/src/udf/udf.h@747
PS3, Line 747:
> Would using std::vector not be compatible? Or too much overhead?
The reason I didn't use std::vector (even though it would have been way more 
convenient) is that in udf.h we have to be careful what we include. This is the 
interface that the users see when they develop UDFs and afaik we can't make 
them pull in more and more libraries. Note, that currently there is almost no 
imports in this file, string is the only one included from std library.


http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1019
PS4, Line 1019:
> nit. in lower case.
Done


http://gerrit.cloudera.org:8080/#/c/17638/3/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java:

http://gerrit.cloudera.org:8080/#/c/17638/3/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java@248
PS3, Line 248:     // For scalar types and structs the materialized path is the 
same as path_
> Shouldn't we include struct types in the comment?
Done


http://gerrit.cloudera.org:8080/#/c/17638/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
File fe/src/main/java/org/apache/impala/analysis/SlotRef.java:

http://gerrit.cloudera.org:8080/#/c/17638/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@164
PS3, Line 164:
> Typo: expects
Done


http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
File fe/src/main/java/org/apache/impala/analysis/SlotRef.java:

http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@164
PS4, Line 164:
> nit. expects
Done


http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@167
PS4, Line 167:
> nit. Maybe renamed to expandSlotRefForStruct().
Done


http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@180
PS4, Line 180:
> nit. May renamed to checkForUnsupportedFieldsForStruct().
Done


http://gerrit.cloudera.org:8080/#/c/17638/7/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
File fe/src/main/java/org/apache/impala/analysis/SlotRef.java:

http://gerrit.cloudera.org:8080/#/c/17638/7/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@156
PS7, Line 156: rootTable instanceof FeFsTable
> Can you move this to a precondition? We shouldn't see structs if it is not
I can't as doing that would result in this query hitting that DCHECK:
select v.ts from
  (select tiny_struct as ts from
      functional_orc_def.complextypes_structs) v;

I didn't make an analysis why, though.


http://gerrit.cloudera.org:8080/#/c/17638/7/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@182
PS7, Line 182: StructType
> typo
Done


http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java:

http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@217
PS4, Line 217: setParentSlot
> nit. Maybe renamed to setParentSlotDesc().
Done


http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@223
PS4, Line 223: getParentSlot
> nit. Maybe renamed to getParentSlotDesc().
Done


http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@239
PS4, Line 239:   if (parentStructSlot_ != null) {
             :       toStrHelper.add("parentSlot", parentStructSlot_.getId());
             :     }
> Is it possible to get more info about children?
This debug string returned from this method will contain info about the 
children slots as well. See L228-230.


http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@304
PS4, Line 304: Pair<Integer, Integer>
> nit. May explain the semantics of the pair in comment.
The commen in L299-302 already describes the return value. Do you think it 
should be extended to give more details?


http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@341
PS4, Line 341: the null indicator will be on the top level tuple
> Will this be a problem when some field in a struct is null?
With this implementation it's feasible to have any children of the struct or 
the struct itself (in any depth in case of nested structs) to be NULL. This 
comment only says that keeping track of these slots being NULL is on the top 
level tuple and not on the "struct children tuple". The reason for this is that 
when NULL-ness is checked on many occasions the top level tuple is used and I 
wanted to keep it that way. And anyway this approach could save us some bites 
as there is no extra null indicators needed on struct level.


http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@428
PS4, Line 428: adds
> nit. adds
Done


http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@430
PS4, Line 430: addToAvgSerializedSi
> nit. May be renamed as addToAvgSerializedSize().
Done


http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@448
PS4, Line 448: struct's children.
> My impression is that if struct is not nullable, some of its children can s
A struct slot desc is nullable so as its children.


http://gerrit.cloudera.org:8080/#/c/17638/7/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java:

http://gerrit.cloudera.org:8080/#/c/17638/7/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@430
PS7, Line 430: addToAvgSerializedSi
> Can you add some comment about structs? My guess is COMPUTE STATS ignores t
Done


http://gerrit.cloudera.org:8080/#/c/17638/7/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@448
PS7, Line 448: children.
> If a struct could be non-nullable we would still need to recursively go thr
Did a quick research and apparently only Kudu slots can be nullable but for 
Kudu we don't support complex types in this use case. Another occassion where a 
slotDesc can be null is a count(star) optimization but it doesn't apply here.

Adding a precondition for structs not being nullable.


http://gerrit.cloudera.org:8080/#/c/17638/7/fe/src/main/java/org/apache/impala/catalog/StructType.java
File fe/src/main/java/org/apache/impala/catalog/StructType.java:

http://gerrit.cloudera.org:8080/#/c/17638/7/fe/src/main/java/org/apache/impala/catalog/StructType.java@99
PS7, Line 99:  of a struc
> Can you add a comment about the layout, e.g. "structs are flattened by the
Done


http://gerrit.cloudera.org:8080/#/c/17638/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/17638/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@a2293
PS7, Line 2293:
              :
              :
              :
> Is this actually supported? My understanding is that we support returning s
It seems that checking for collection types in the select list comes before 
checking if the underlying file format supports storing complex types. Note, 
that according to the comment the purpose was to test "CTAS into Kudu table 
with unsupported types" and here this is what happens.


http://gerrit.cloudera.org:8080/#/c/17638/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/17638/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@418
PS3, Line 418: QL ana
> Is it still true or is it a scalar OR a struct?
indeed, this works now for structs as well. Done


http://gerrit.cloudera.org:8080/#/c/17638/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@978
PS3, Line 978:
> Nit: are supported
Done


http://gerrit.cloudera.org:8080/#/c/17638/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/17638/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@573
PS7, Line 573:     addTestTable("create table d.t6 (c 
map<int,struct<f:int,key:int,value:int>>) " +
             :         "stored as orc");
> Can you add a similar test table with nested structs and check a few SlotRe
Done


http://gerrit.cloudera.org:8080/#/c/17638/7/testdata/ComplexTypesTbl/structs.orc
File testdata/ComplexTypesTbl/structs.orc:

http://gerrit.cloudera.org:8080/#/c/17638/7/testdata/ComplexTypesTbl/structs.orc@1
PS7, Line 1: ORC
> Do we need to have these data files, can't we create them with Hive? I thin
By creating these test with Hive what do you mean exactly. Should I explicitly 
create the table and add rows right before the struct tests or is there a 
common way you refer to?


http://gerrit.cloudera.org:8080/#/c/17638/4/testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test
File 
testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test:

http://gerrit.cloudera.org:8080/#/c/17638/4/testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test@56
PS4, Line 56: ---- RESULTS
> would be nice to add more cases.
Done.
Note 3) is not supported, I added a test to check the error msg.

About selecting the member of the struct: As I wrote in another comment it is 
something that has been supported for a while and independent from this patch.


http://gerrit.cloudera.org:8080/#/c/17638/4/testdata/workloads/functional-query/queries/QueryTest/struct-in-select-list.test
File 
testdata/workloads/functional-query/queries/QueryTest/struct-in-select-list.test:

http://gerrit.cloudera.org:8080/#/c/17638/4/testdata/workloads/functional-query/queries/QueryTest/struct-in-select-list.test@4
PS4, Line 4: select id, tiny_struct from 
functional_orc_def.complextypes_structs;
> nit. Is it possible to say this? If so, I'd like to suggest that we add a c
Using the member of structs is already supported and independent of this patch. 
It's allowed to use a struct member as it was a regular column. E.g.:
select tiny_struct.b, small_struct.i from complextypes_structs where alltypes.i 
> 10;

As this functionality already existed I didn;t find it useful to add to this 
tests.


http://gerrit.cloudera.org:8080/#/c/17638/4/testdata/workloads/functional-query/queries/QueryTest/struct-in-select-list.test@17
PS4, Line 17:  order by id;
> nit. Can we add one with group by on tiny_struct? Is it supported?
Grouping by complex columns is not supported.


http://gerrit.cloudera.org:8080/#/c/17638/4/testdata/workloads/functional-query/queries/QueryTest/struct-in-select-list.test@153
PS4, Line 153:
> nit. Should remove this? I did not see order by clause in the query.
Somehow I forgot to add the order by clause to the query. Added it now.


http://gerrit.cloudera.org:8080/#/c/17638/4/testdata/workloads/functional-query/queries/QueryTest/struct-in-select-list.test@263
PS4, Line 263:
> nit. Can we add other join flavors such as equi join or left join?
Done


http://gerrit.cloudera.org:8080/#/c/17638/4/testdata/workloads/functional-query/queries/QueryTest/struct-in-select-list.test@266
PS4, Line 266:
             : 2,'{"e":null,"f":null}'
> nit. Maybe add a comment on the join columns selected from the three tables
This way of joining is a trick to get the content of complex types, including 
arrays, where you join the complex type on the row. Basically, you get an 
unnest() with this (for arrays). This test serves rather as a regression test 
that this functionality still works. The columns you see in the results are the 
fields of the struct.



--
To view, visit http://gerrit.cloudera.org:8080/17638
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0fbe56bdcd372b72e99c0195d87a818e7fa4bc3a
Gerrit-Change-Number: 17638
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Wed, 18 Aug 2021 12:28:12 +0000
Gerrit-HasComments: Yes

Reply via email to