Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17638 )

Change subject: WiP: IMPALA-9495: Support struct in select list for ORC tables
......................................................................


Patch Set 9:

(13 comments)

Thanks a lot for the rework.

http://gerrit.cloudera.org:8080/#/c/17638/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17638/4//COMMIT_MSG@58
PS4, Line 58:
> Currently the list of use cases this patch opens isn't that long. You can p
Okay. I think it is okay to restrict the description to whatever is available 
even with limitations.

Maybe we can extend the first sentence of the commit message
"This patch implements the functionality to allow structs in the select
list of an inline view, top most block, subquery, ...".


http://gerrit.cloudera.org:8080/#/c/17638/4//COMMIT_MSG@64
PS4, Line 64: the newly introduced TupleDescriptor using 'itemTupleId'.
> I only find useful to mention testing in the commit message if there was an
Maybe a summary of the new tests added is sufficient.


http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/exec/orc-column-readers.cc
File be/src/exec/orc-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/exec/orc-column-readers.cc@493
PS4, Line 493: return
> 'is_given_in_select_list' is to indicate whether the struct was in the sele
Okay. Good explanation.

In this case, maybe we should rename 'is_given_in_select_list_' to 
'read_whole_struct_' to separate the need of the query from the functionality 
of the struct reader. It is feasible in the future you need to read the whole 
struct for non-select-list cases.


http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/exprs/scalar-expr.h
File be/src/exprs/scalar-expr.h:

http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/exprs/scalar-expr.h@361
PS7, Line 361:   /// ScalarExpr doesn't need a FunctionContext
> I am unsure about thread safety -AFAIK we have a single expr tree in a frag
Good point.


http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/raw-value.cc
File be/src/runtime/raw-value.cc:

http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/raw-value.cc@223
PS4, Line 223:   DCHECK(value != nullptr && tuple != nullptr && slot_desc != 
nullptr &&
> Actually, value can't be null here as it is checked already on the callsite
I think it is always better to check inside the function than check in the call 
sites which can be many.


http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/types.h
File be/src/runtime/types.h:

http://gerrit.cloudera.org:8080/#/c/17638/4/be/src/runtime/types.h@293
PS4, Line 293: GetByteSize(child_type);
> I think the expected size of a struct is never going to be bigger than an a
Done.


http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/udf/udf.h
File be/src/udf/udf.h:

http://gerrit.cloudera.org:8080/#/c/17638/7/be/src/udf/udf.h@740
PS7, Line 740:
> I have some dilemmas about how to design StructVal.
I wonder if we could potentially follow the same model for other data types:  
an internal class like StringValue in string-value.h and StringVal in udf.h.


http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java:

http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@239
PS4, Line 239:   if (parentStructSlot_ != null) {
             :       toStrHelper.add("parentSlot", parentStructSlot_.getId());
             :     }
> This debug string returned from this method will contain info about the chi
Done


http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@304
PS4, Line 304: Pair<Integer, Integer>
> The commen in L299-302 already describes the return value. Do you think it
Okay. Thanks for pointing it out. Maybe make that sentence more clear: "Returns 
the nullIndicatorByte and nullIndicator bit in a Pair<>".


http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@341
PS4, Line 341: the null indicator will be on the top level tuple
> With this implementation it's feasible to have any children of the struct o
Thanks for the explanation. I think we can say something like the following in 
the comment.

"The total number of bytes for nullable scalar or nested struct fields will be 
computed for the struct at the top level (i.e., parentStructSlot_ == null).".


http://gerrit.cloudera.org:8080/#/c/17638/4/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@448
PS4, Line 448: struct's children.
> My impression is that if struct is not nullable, some of its children can s
Done


http://gerrit.cloudera.org:8080/#/c/17638/4/testdata/workloads/functional-query/queries/QueryTest/struct-in-select-list.test
File 
testdata/workloads/functional-query/queries/QueryTest/struct-in-select-list.test:

http://gerrit.cloudera.org:8080/#/c/17638/4/testdata/workloads/functional-query/queries/QueryTest/struct-in-select-list.test@4
PS4, Line 4: select id, tiny_struct from 
functional_orc_def.complextypes_structs;
> Using the member of structs is already supported and independent of this pa
Done


http://gerrit.cloudera.org:8080/#/c/17638/4/testdata/workloads/functional-query/queries/QueryTest/struct-in-select-list.test@266
PS4, Line 266:
             : 2,'{"e":null,"f":null}'
> This way of joining is a trick to get the content of complex types, includi
Done



--
To view, visit http://gerrit.cloudera.org:8080/17638
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0fbe56bdcd372b72e99c0195d87a818e7fa4bc3a
Gerrit-Change-Number: 17638
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Wed, 18 Aug 2021 14:47:48 +0000
Gerrit-HasComments: Yes

Reply via email to