Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/17847 )
Change subject: IMPALA-10838: Error when struct returned from WITH() ...................................................................... Patch Set 9: (20 comments) http://gerrit.cloudera.org:8080/#/c/17847/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17847/9//COMMIT_MSG@7 PS9, Line 7: IMPALA-10838: Error when struct returned from WITH() I wouldn't say I understand everything that happens in ExprSubstitutionMap but left some comments. Nice work to fill this feature gap, anyway! http://gerrit.cloudera.org:8080/#/c/17847/9//COMMIT_MSG@29 PS9, Line 29: in expression substitutions (in : org.apache.impala.analysis.Expr.substituteImpl), if there is no : substitution mapping for a field of a struct but there is a mapping : for an enclosing struct, find the mapping for that enclosing struct : and from its subexpressions use the one corresponding to the : original expression for substitution If I'm not mistaken with this approach every time you unsuccessfully try to substitute a field of a struct you then find the enclosing field and then try the substitution from that angle. Wouldn't it be more performant to populate the substitute map in a way to hold the fields of a struct from the beginning? http://gerrit.cloudera.org:8080/#/c/17847/9//COMMIT_MSG@44 PS9, Line 44: Tests Would it make sense to somehow assert on the 'tuple-size' in the scan node? (in a planner test maybe?) This way we could see if querying a struct member actually reads the whole struct or not. http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/Expr.java@1148 PS9, Line 1148: SlotRef overrides this method : // so we need no check here. This part in my opinion is only relevant when the reader knew that here was an 'if' that was removed. Otherwise this sentence just brings confusion. In case ResetAnalysisState() is overridden in SlotRef we should comment its behaviour here because in case it changes later on we won't know that this comment has to be rewritten as well. http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/Expr.java@1154 PS9, Line 1154: protected final void callSubstituteImplOnChildren(ExprSubstitutionMap smap, This doesn't add much just extracts the for loop into a different function. I don't feel that this brings more readability and seems an overkill at first sight. http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java File fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java: http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@106 PS9, Line 106: SlotRef enclosingStruct = findEnclosingStruct(structField, remainingPath); If you read just this 4 lines of code you would have no idea what 'remainingPath' is when you perform a findEnclosingStruct() function. This hurts readability in my opinion. I think the mai reason of this is that findEnclosingStruct() is responsible not just for finding something but for returning something that is not really relevant taking into account the function name. http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@106 PS9, Line 106: structField Do we know that 'structField' is in fact a field of a struct? Or do we perform all these substitution magic every time if there is a slot that is not found in the substitution map? http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@120 PS9, Line 120: // if (getDesc().getParent().getParentSlotDesc() == null) return null; This stinks for me. In case structField is indeed a field of a struct then it's parent should be the itemTupleDesc inside a struct slot descriptor. http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@124 PS9, Line 124: final Expr e = getLhs().get(i); This 'e' is expected to be a struct, right? I'd do that check here and do a continue of it isn't. And below you can make a removeStructPathPrefixFromStructField() or something similar. http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@130 PS9, Line 130: // TODO: Is it always true? In general, could you please put some effort to eliminate the new TODOs you introduced? Or are those questions towards the reviewer? http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@141 PS9, Line 141: checkConditionsAndRemovePrefix Is there a way to be more specific with the name. "checkConditions" could mean anything. Additionally, from the name it's not obvious what prefix you remove from where. http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@148 PS9, Line 148: if (refPath == null) return null; if 'expr'/'ref' is analyzed is there a case when it doesn't have a resolvedPath? http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@153 PS9, Line 153: SlotRef root could you emphasise more that 'root' is a struct? http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@154 PS9, Line 154: root Shouldn't you check if 'root' has already been analysed before calling getResolvedPath()? Or is this something we can assume from the above code? http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/Path.java File fe/src/main/java/org/apache/impala/analysis/Path.java: http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/Path.java@469 PS9, Line 469: path : * after removing the prefix The return type is a list of strings so I guess this function returns a 'raw path'. Could you emphasise this face in the comment? http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java: http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java@182 PS9, Line 182: public void getEnclosingSlotAndTupleDescs(List<SlotDescriptor> slotDescs, Is this just a helper function used within SlotDescriptor? I haven't found any other usages. Could it be private then? http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/SlotRef.java File fe/src/main/java/org/apache/impala/analysis/SlotRef.java: http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@285 PS9, Line 285: public boolean hasDesc() { nit: should fit into a single line http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@409 PS9, Line 409: // TODO: Is it possible that we don't YET have itemTupleDesc? Could you dig deeper a bit to eliminate this TODO? http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@410 PS9, Line 410: // Preconditions.checkState(itemTupleDesc != null); nit: commented code http://gerrit.cloudera.org:8080/#/c/17847/9/testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test File testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test: http://gerrit.cloudera.org:8080/#/c/17847/9/testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test@260 PS9, Line 260: ==== I wonder if providing an inline view in a different way would also trigger this code. E.g.: SELECT id, struct_name.field_name FROM (SELECT id, struct_name FROM tbl) v WHERE struct_name.other_field_name; -- To view, visit http://gerrit.cloudera.org:8080/17847 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iadb9233677355b85d424cc3f22b00b5a3bf61c57 Gerrit-Change-Number: 17847 Gerrit-PatchSet: 9 Gerrit-Owner: Daniel Becker <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Comment-Date: Wed, 19 Jan 2022 10:19:32 +0000 Gerrit-HasComments: Yes
