Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17638 )
Change subject: WiP: IMPALA-9495: Support struct in select list for ORC tables ...................................................................... Patch Set 9: (13 comments) Thanks a lot for the rework. http://gerrit.cloudera.org:8080/#/c/17638/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17638/4//COMMIT_MSG@58 PS4, Line 58: > Currently the list of use cases this patch opens isn't that long. You can p Okay. I think it is okay to restrict the description to whatever is available even with limitations. Maybe we can extend the first sentence of the commit message "This patch implements the functionality to allow structs in the select list of an inline view, top most block, subquery, ...". http://gerrit.cloudera.org:8080/#/c/17638/4//COMMIT_MSG@64 PS4, Line 64: the newly introduced TupleDescriptor using 'itemTupleId'. > I only find useful to mention testing in the commit message if there was an Maybe a summary of the new tests added is sufficient. http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/exec/orc-column-readers.cc File be/src/exec/orc-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/exec/orc-column-readers.cc@493 PS4, Line 493: return > 'is_given_in_select_list' is to indicate whether the struct was in the sele Okay. Good explanation. In this case, maybe we should rename 'is_given_in_select_list_' to 'read_whole_struct_' to separate the need of the query from the functionality of the struct reader. It is feasible in the future you need to read the whole struct for non-select-list cases. http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/exprs/scalar-expr.h File be/src/exprs/scalar-expr.h: http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/exprs/scalar-expr.h@361 PS7, Line 361: /// ScalarExpr doesn't need a FunctionContext > I am unsure about thread safety -AFAIK we have a single expr tree in a frag Good point. http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/raw-value.cc File be/src/runtime/raw-value.cc: http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/raw-value.cc@223 PS4, Line 223: DCHECK(value != nullptr && tuple != nullptr && slot_desc != nullptr && > Actually, value can't be null here as it is checked already on the callsite I think it is always better to check inside the function than check in the call sites which can be many. http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/types.h File be/src/runtime/types.h: http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/types.h@293 PS4, Line 293: GetByteSize(child_type); > I think the expected size of a struct is never going to be bigger than an a Done. http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/udf/udf.h File be/src/udf/udf.h: http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/udf/udf.h@740 PS7, Line 740: > I have some dilemmas about how to design StructVal. I wonder if we could potentially follow the same model for other data types: an internal class like StringValue in string-value.h and StringVal in udf.h. http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java File fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java: http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@239 PS4, Line 239: if (parentStructSlot_ != null) { : toStrHelper.add("parentSlot", parentStructSlot_.getId()); : } > This debug string returned from this method will contain info about the chi Done http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@304 PS4, Line 304: Pair<Integer, Integer> > The commen in L299-302 already describes the return value. Do you think it Okay. Thanks for pointing it out. Maybe make that sentence more clear: "Returns the nullIndicatorByte and nullIndicator bit in a Pair<>". http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@341 PS4, Line 341: the null indicator will be on the top level tuple > With this implementation it's feasible to have any children of the struct o Thanks for the explanation. I think we can say something like the following in the comment. "The total number of bytes for nullable scalar or nested struct fields will be computed for the struct at the top level (i.e., parentStructSlot_ == null).". http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@448 PS4, Line 448: struct's children. > My impression is that if struct is not nullable, some of its children can s Done http://gerrit.cloudera.org:8080/#/c/17638/4/testdata/workloads/functional-query/queries/QueryTest/struct-in-select-list.test File testdata/workloads/functional-query/queries/QueryTest/struct-in-select-list.test: http://gerrit.cloudera.org:8080/#/c/17638/4/testdata/workloads/functional-query/queries/QueryTest/struct-in-select-list.test@4 PS4, Line 4: select id, tiny_struct from functional_orc_def.complextypes_structs; > Using the member of structs is already supported and independent of this pa Done http://gerrit.cloudera.org:8080/#/c/17638/4/testdata/workloads/functional-query/queries/QueryTest/struct-in-select-list.test@266 PS4, Line 266: : 2,'{"e":null,"f":null}' > This way of joining is a trick to get the content of complex types, includi Done -- To view, visit http://gerrit.cloudera.org:8080/17638 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0fbe56bdcd372b72e99c0195d87a818e7fa4bc3a Gerrit-Change-Number: 17638 Gerrit-PatchSet: 9 Gerrit-Owner: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Wed, 18 Aug 2021 14:47:48 +0000 Gerrit-HasComments: Yes
