Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/17638 )
Change subject: WiP: IMPALA-9495: Support struct in select list for ORC tables ...................................................................... Patch Set 8: (67 comments) Thanks for all the valuable feedback for everyone! The latest patch set is supposed to address all the comments so far. Let me know if I missed anything! http://gerrit.cloudera.org:8080/#/c/17638/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17638/4//COMMIT_MSG@44 PS4, Line 44: > nit. "a row in a row batch" Done http://gerrit.cloudera.org:8080/#/c/17638/4//COMMIT_MSG@58 PS4, Line 58: > Since this patch lays the foundation work for STRUCT support (even though t Currently the list of use cases this patch opens isn't that long. You can provide structs in the select list of a select statement and an inline view created by a WITH() clause can contain structs (however this second use case wasn't an intention of this patch, worked almost out of the box, and has some issues https://issues.apache.org/jira/browse/IMPALA-10838). http://gerrit.cloudera.org:8080/#/c/17638/4//COMMIT_MSG@64 PS4, Line 64: the newly introduced TupleDescriptor using 'itemTupleId'. > nit. May add items into the test section. I only find useful to mention testing in the commit message if there was anything worth to mention apart from the tests written in the patch. Here nothing extra was made apart from the tests I added. http://gerrit.cloudera.org:8080/#/c/17638/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17638/7//COMMIT_MSG@10 PS7, Line 10: JSON > Can you add a separate section for "returning structs to cliens"? I think t Done http://gerrit.cloudera.org:8080/#/c/17638/7//COMMIT_MSG@14 PS7, Line 14: '{"int_struct_member":12,"string_struct_member":"string value"}' > Can you add an example that has a nested struct? Done http://gerrit.cloudera.org:8080/#/c/17638/7//COMMIT_MSG@20 PS7, Line 20: Note, the conversion from struct to JSON happens on the server side > nit: wrap at 72 Done http://gerrit.cloudera.org:8080/#/c/17638/7//COMMIT_MSG@34 PS7, Line 34: And running the following query: > I would move this earlier, as this seems to be the most critical decision. Done http://gerrit.cloudera.org:8080/#/c/17638/7//COMMIT_MSG@35 PS7, Line 35: SELECT id, s FROM tbl; : > Do structs affect the order of slots? Generally we order the slots by size Yes, structs affect the order of the slots in the row batch. The position of a struct in the row batch depends on the total size of that particular struct (the sum of the size of the struct fields.) The struct fields are consecutive to each other in the row batch and their order inside the struct depends on their size as it would work in a regular case for primitives. I'll add a note about this in the commit msg. http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/exec/orc-column-readers.h File be/src/exec/orc-column-readers.h: http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/exec/orc-column-readers.h@626 PS4, Line 626: > nit. May change to "is to be materialized". see below. http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/exec/orc-column-readers.h@627 PS4, Line 627: urrent_write_id_field_in > nit. Maybe renamed as is_to_be_materialized. Actually, there is already a variable for this purpose: 'materialize_tuple_' It's always false for structs, so I'll re-use this for this purpose and drop 'is_given_in_select_list'. http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/exec/orc-column-readers.cc File be/src/exec/orc-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/exec/orc-column-readers.cc@493 PS4, Line 493: return > nit. I am a little confusing by the return here. Does it mean that we only 'is_given_in_select_list' is to indicate whether the struct was in the select list or not. It is possible that this StructReader is for a struct that is not directly given in the select list but used in the query to reference its children. E.g. in this one: select tbl.id, inner_arr.ITEM from functional_orc_def.complextypestbl tbl, tbl.nested_struct.c.d as outer_arr, outer_arr.ITEM as inner_arr; So the difference here is that if the struct column was given in the select list it's not enough jut to delegate the reading to its children as the struct itself could be null as well. Hence, I call the BatchedReader::ReadValueBatch() instead that covers the null values as well and set the null indicator bits accordingly. http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/exprs/scalar-expr-evaluator.h File be/src/exprs/scalar-expr-evaluator.h: http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/exprs/scalar-expr-evaluator.h@262 PS4, Line 262: non-struc > nit. Maybe rename as childEvaluators_? Done http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/exprs/scalar-expr-evaluator.cc File be/src/exprs/scalar-expr-evaluator.cc: http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/exprs/scalar-expr-evaluator.cc@96 PS4, Line 96: hildren(), state, p > nit. Could we use children_ directly here? Good point. Done http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/exprs/scalar-expr-evaluator.cc File be/src/exprs/scalar-expr-evaluator.cc: http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/exprs/scalar-expr-evaluator.cc@369 PS7, Line 369: } : case TYPE_STRUCT: { : StructVal v = expr.GetS > optimization: this can potentially lead to several copies and heap calls Done http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/exprs/scalar-expr.h File be/src/exprs/scalar-expr.h: http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/exprs/scalar-expr.h@361 PS7, Line 361: /// ScalarExpr doesn't need a FunctionContext > I am unsure about thread safety -AFAIK we have a single expr tree in a frag I needed to get the evaluator from the expression in SlotRef::GetStructValInterpreted() but I figured out that this is in fact not needed. So I dropped this member. http://gerrit.cloudera.org:8080/#/c/17638/3/be/src/runtime/buffered-tuple-stream.cc File be/src/runtime/buffered-tuple-stream.cc: http://gerrit.cloudera.org:8080/#/c/17638/3/be/src/runtime/buffered-tuple-stream.cc@85 PS3, Line 85: { > Nit: Maybe it would be nicer to use Done http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/descriptors.h File be/src/runtime/descriptors.h: http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/descriptors.h@492 PS4, Line 492: > nit. May draw a diagram to help illustrate the data structure. Done http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/descriptors.cc@158 PS4, Line 158: inline bool SlotDescriptor::IsChildOfSt > nit. inline? Done http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/descriptors.cc@342 PS4, Line 342: > nit. then Done http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/runtime/raw-value.h File be/src/runtime/raw-value.h: http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/runtime/raw-value.h@129 PS7, Line 129: /// Wrapper function for Write() to handle struct slots and its children. Additionally, : /// gathers the string slots of the slot tree into 'string_values'. : template <bool COLLECT_STRING_VALS> : static void Write(const void* value, Tuple* tuple, : const SlotDescriptor* slot_desc, MemPool* pool, : std::vector<StringValue*>* string_values); > The two comments seem a bit redundant. oh, the above comment is a leftover I forgot to drop. http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/raw-value.cc File be/src/runtime/raw-value.cc: http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/raw-value.cc@223 PS4, Line 223: DCHECK(value != nullptr && tuple != nullptr && slot_desc != nullptr && > nit. I wonder if we need to handle value being null here. See line of code Actually, value can't be null here as it is checked already on the callsite: tuple.cc:235 before calling RawValue::Write(). On L258 it is a different case as it checks the children of the struct, so it's possible we hit a null child there. http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/raw-value.cc@246 PS4, Line 246: children_tuple > nit. May renamed as children_tuple_desc. Done http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/raw-value.cc@260 PS4, Line 260: } else { : WritePrimitive<COLLECT_STRING_VALS>(src_child, tuple, child_slot, pool, : string_values); : } : } > nit this block of code duplicates the block at 227 - 230. Good observation. I moved these to a separate function. http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/tuple.h File be/src/runtime/tuple.h: http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/tuple.h@253 PS4, Line 253: /// 'slot_desc' describes a struct slot in this tuple. Sets 'slot_desc' to null in this > nit. 'slot_desc' describes a struct slot in this tuple. Done http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/tuple.cc File be/src/runtime/tuple.cc: http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/tuple.cc@237 PS4, Line 237: RawValue::Write<COLLECT_STRING_VALS>(src, this, slot_desc, pool, &string_values); > Might want to push 'COLLECT_STRING_VALS' into RawValue::Write() to prevent Done http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/tuple.cc@246 PS4, Line 246: if (slot_desc->type().IsStructType()) { : SetStructToNull(slot_desc); > nit. May distinguish the two cases and call SetNull() or SetStructToNull(). Done http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/tuple.cc@252 PS4, Line 252: > nit. May typed as "const SlotDescriptor* const" so that we know slot_desc w I haven't seen this format being used in Impala, but Done http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/types.h File be/src/runtime/types.h: http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/types.h@236 PS4, Line 236: static inline int GetDecimalByteSize(int precision) { : DCHECK_GT(precision, 0); : if (precision <= MAX_DECIMAL4_PRECISION) return 4; : if (precision <= MAX_DECIMAL8_PRECISION) return 8; : return 16; : } > nit. May not be necessary to relocate the method. I think this is git interpreting something a bit weird. This function used to be right after the parameterless GetSlotSize() and that is where it is now. http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/types.h@262 PS4, Line 262: /// Helper function for GetSlotSize() so that struct size could be calculated : /// recursively. : static inline int GetSlotSize(const ColumnType& col_type) { : switch (col_type.type) { : case TYPE_STRUCT: { : int struct_size = 0; : for (ColumnType child_type : col_type.children) { : struct_size += GetSlotSize(child_type); : } : return struct_size; : } : case TYPE_STRING: : case TYPE_VARCHAR: : return 12; : case TYPE_CHAR: : case TYPE_FIXED_UDA_INTERMEDIATE: : return col_type.len; : case TYPE_ARRAY: : case TYPE_MAP: : return 12; : default: : return GetByteSize(col_type); : } : } > nit. It may not be necessary to relocate this method. Please refer to GetDe This actually isn't a relocation of the existing function, but introducing a new version of the function that receives a param. The original, parameterless version remains where it had been. http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/types.h@293 PS4, Line 293: GetByteSize(child_type); > nit. I wonder if this call can become expensive with a deeply nested struct I think the expected size of a struct is never going to be bigger than an average number of the columns in a table. This might include some nested structs, however, that means more function calls here. Even though I think the impact is negligible, I prefer to be on the same side, sot as you suggest, I'll keep track of the size, once calculated in member variable. Update: I implemented a member to store the size once it's calculated. Then I ran the test_struct_in_select_list test suite to see how many time this member is used and apparently never. So this means that the size of the struct is only fetched once for query so there won't be any perf impact if we leave this as it is. http://gerrit.cloudera.org:8080/#/c/17638/3/be/src/runtime/types.h File be/src/runtime/types.h: http://gerrit.cloudera.org:8080/#/c/17638/3/be/src/runtime/types.h@264 PS3, Line 264: static inline int GetSlotSize(const ColumnType& col_type) > Nit: this could be static. Done http://gerrit.cloudera.org:8080/#/c/17638/3/be/src/runtime/types.h@288 PS3, Line 288: static inline int GetByteSize(const ColumnType& col_type) { > Nit: this could be static. Done http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/service/query-result-set.cc File be/src/service/query-result-set.cc: http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/service/query-result-set.cc@406 PS4, Line 406: ThriftTColumn col_output; > This line probably should be moved to the block for STRUCT. In the ELSE blo Done http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/udf/udf.h File be/src/udf/udf.h: http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/udf/udf.h@740 PS7, Line 740: > I have some dilemmas about how to design StructVal. thanks for the explantion! I moved StructVal to udf-internal http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/udf/udf.h@745 PS7, Line 745: > Can you add more info? What are the children exactly? Other AnyVals, or mem Done http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/udf/udf.h@770 PS7, Line 770: > StringVal uses the alloc methods of FunctionContext - is there a reason for Changed the code to use FunctionContext for memory reservations. I dropped this destructor as a result. http://gerrit.cloudera.org:8080/#/c/17638/3/be/src/udf/udf.h File be/src/udf/udf.h: http://gerrit.cloudera.org:8080/#/c/17638/3/be/src/udf/udf.h@747 PS3, Line 747: > Would using std::vector not be compatible? Or too much overhead? The reason I didn't use std::vector (even though it would have been way more convenient) is that in udf.h we have to be careful what we include. This is the interface that the users see when they develop UDFs and afaik we can't make them pull in more and more libraries. Note, that currently there is almost no imports in this file, string is the only one included from std library. http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1019 PS4, Line 1019: > nit. in lower case. Done http://gerrit.cloudera.org:8080/#/c/17638/3/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java: http://gerrit.cloudera.org:8080/#/c/17638/3/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java@248 PS3, Line 248: // For scalar types and structs the materialized path is the same as path_ > Shouldn't we include struct types in the comment? Done http://gerrit.cloudera.org:8080/#/c/17638/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java File fe/src/main/java/org/apache/impala/analysis/SlotRef.java: http://gerrit.cloudera.org:8080/#/c/17638/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@164 PS3, Line 164: > Typo: expects Done http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java File fe/src/main/java/org/apache/impala/analysis/SlotRef.java: http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@164 PS4, Line 164: > nit. expects Done http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@167 PS4, Line 167: > nit. Maybe renamed to expandSlotRefForStruct(). Done http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@180 PS4, Line 180: > nit. May renamed to checkForUnsupportedFieldsForStruct(). Done http://gerrit.cloudera.org:8080/#/c/17638/7/fe/src/main/java/org/apache/impala/analysis/SlotRef.java File fe/src/main/java/org/apache/impala/analysis/SlotRef.java: http://gerrit.cloudera.org:8080/#/c/17638/7/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@156 PS7, Line 156: rootTable instanceof FeFsTable > Can you move this to a precondition? We shouldn't see structs if it is not I can't as doing that would result in this query hitting that DCHECK: select v.ts from (select tiny_struct as ts from functional_orc_def.complextypes_structs) v; I didn't make an analysis why, though. http://gerrit.cloudera.org:8080/#/c/17638/7/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@182 PS7, Line 182: StructType > typo Done http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java File fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java: http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@217 PS4, Line 217: setParentSlot > nit. Maybe renamed to setParentSlotDesc(). Done http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@223 PS4, Line 223: getParentSlot > nit. Maybe renamed to getParentSlotDesc(). Done http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@239 PS4, Line 239: if (parentStructSlot_ != null) { : toStrHelper.add("parentSlot", parentStructSlot_.getId()); : } > Is it possible to get more info about children? This debug string returned from this method will contain info about the children slots as well. See L228-230. http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@304 PS4, Line 304: Pair<Integer, Integer> > nit. May explain the semantics of the pair in comment. The commen in L299-302 already describes the return value. Do you think it should be extended to give more details? http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@341 PS4, Line 341: the null indicator will be on the top level tuple > Will this be a problem when some field in a struct is null? With this implementation it's feasible to have any children of the struct or the struct itself (in any depth in case of nested structs) to be NULL. This comment only says that keeping track of these slots being NULL is on the top level tuple and not on the "struct children tuple". The reason for this is that when NULL-ness is checked on many occasions the top level tuple is used and I wanted to keep it that way. And anyway this approach could save us some bites as there is no extra null indicators needed on struct level. http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@428 PS4, Line 428: adds > nit. adds Done http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@430 PS4, Line 430: addToAvgSerializedSi > nit. May be renamed as addToAvgSerializedSize(). Done http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@448 PS4, Line 448: struct's children. > My impression is that if struct is not nullable, some of its children can s A struct slot desc is nullable so as its children. http://gerrit.cloudera.org:8080/#/c/17638/7/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java File fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java: http://gerrit.cloudera.org:8080/#/c/17638/7/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@430 PS7, Line 430: addToAvgSerializedSi > Can you add some comment about structs? My guess is COMPUTE STATS ignores t Done http://gerrit.cloudera.org:8080/#/c/17638/7/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@448 PS7, Line 448: children. > If a struct could be non-nullable we would still need to recursively go thr Did a quick research and apparently only Kudu slots can be nullable but for Kudu we don't support complex types in this use case. Another occassion where a slotDesc can be null is a count(star) optimization but it doesn't apply here. Adding a precondition for structs not being nullable. http://gerrit.cloudera.org:8080/#/c/17638/7/fe/src/main/java/org/apache/impala/catalog/StructType.java File fe/src/main/java/org/apache/impala/catalog/StructType.java: http://gerrit.cloudera.org:8080/#/c/17638/7/fe/src/main/java/org/apache/impala/catalog/StructType.java@99 PS7, Line 99: of a struc > Can you add a comment about the layout, e.g. "structs are flattened by the Done http://gerrit.cloudera.org:8080/#/c/17638/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/17638/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@a2293 PS7, Line 2293: : : : > Is this actually supported? My understanding is that we support returning s It seems that checking for collection types in the select list comes before checking if the underlying file format supports storing complex types. Note, that according to the comment the purpose was to test "CTAS into Kudu table with unsupported types" and here this is what happens. http://gerrit.cloudera.org:8080/#/c/17638/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java: http://gerrit.cloudera.org:8080/#/c/17638/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@418 PS3, Line 418: QL ana > Is it still true or is it a scalar OR a struct? indeed, this works now for structs as well. Done http://gerrit.cloudera.org:8080/#/c/17638/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@978 PS3, Line 978: > Nit: are supported Done http://gerrit.cloudera.org:8080/#/c/17638/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java: http://gerrit.cloudera.org:8080/#/c/17638/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@573 PS7, Line 573: addTestTable("create table d.t6 (c map<int,struct<f:int,key:int,value:int>>) " + : "stored as orc"); > Can you add a similar test table with nested structs and check a few SlotRe Done http://gerrit.cloudera.org:8080/#/c/17638/7/testdata/ComplexTypesTbl/structs.orc File testdata/ComplexTypesTbl/structs.orc: http://gerrit.cloudera.org:8080/#/c/17638/7/testdata/ComplexTypesTbl/structs.orc@1 PS7, Line 1: ORC > Do we need to have these data files, can't we create them with Hive? I thin By creating these test with Hive what do you mean exactly. Should I explicitly create the table and add rows right before the struct tests or is there a common way you refer to? http://gerrit.cloudera.org:8080/#/c/17638/4/testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test File testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test: http://gerrit.cloudera.org:8080/#/c/17638/4/testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test@56 PS4, Line 56: ---- RESULTS > would be nice to add more cases. Done. Note 3) is not supported, I added a test to check the error msg. About selecting the member of the struct: As I wrote in another comment it is something that has been supported for a while and independent from this patch. http://gerrit.cloudera.org:8080/#/c/17638/4/testdata/workloads/functional-query/queries/QueryTest/struct-in-select-list.test File testdata/workloads/functional-query/queries/QueryTest/struct-in-select-list.test: http://gerrit.cloudera.org:8080/#/c/17638/4/testdata/workloads/functional-query/queries/QueryTest/struct-in-select-list.test@4 PS4, Line 4: select id, tiny_struct from functional_orc_def.complextypes_structs; > nit. Is it possible to say this? If so, I'd like to suggest that we add a c Using the member of structs is already supported and independent of this patch. It's allowed to use a struct member as it was a regular column. E.g.: select tiny_struct.b, small_struct.i from complextypes_structs where alltypes.i > 10; As this functionality already existed I didn;t find it useful to add to this tests. http://gerrit.cloudera.org:8080/#/c/17638/4/testdata/workloads/functional-query/queries/QueryTest/struct-in-select-list.test@17 PS4, Line 17: order by id; > nit. Can we add one with group by on tiny_struct? Is it supported? Grouping by complex columns is not supported. http://gerrit.cloudera.org:8080/#/c/17638/4/testdata/workloads/functional-query/queries/QueryTest/struct-in-select-list.test@153 PS4, Line 153: > nit. Should remove this? I did not see order by clause in the query. Somehow I forgot to add the order by clause to the query. Added it now. http://gerrit.cloudera.org:8080/#/c/17638/4/testdata/workloads/functional-query/queries/QueryTest/struct-in-select-list.test@263 PS4, Line 263: > nit. Can we add other join flavors such as equi join or left join? Done http://gerrit.cloudera.org:8080/#/c/17638/4/testdata/workloads/functional-query/queries/QueryTest/struct-in-select-list.test@266 PS4, Line 266: : 2,'{"e":null,"f":null}' > nit. Maybe add a comment on the join columns selected from the three tables This way of joining is a trick to get the content of complex types, including arrays, where you join the complex type on the row. Basically, you get an unnest() with this (for arrays). This test serves rather as a regression test that this functionality still works. The columns you see in the results are the fields of the struct. -- To view, visit http://gerrit.cloudera.org:8080/17638 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0fbe56bdcd372b72e99c0195d87a818e7fa4bc3a Gerrit-Change-Number: 17638 Gerrit-PatchSet: 8 Gerrit-Owner: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Wed, 18 Aug 2021 12:28:12 +0000 Gerrit-HasComments: Yes
