Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 )
Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name ...................................................................... Patch Set 4: (6 comments) Hi Zoltan, pretty nifty change! I think we should define input__file__name as a keyword, currently if I create a table with a column input__file__name the insert succeeds, then the 'select *' leaks the filename. In addition, this column will not be considered during estimations, so the row size estimations will be off. Maybe in a future Jira we could track the file length statistics as well. http://gerrit.cloudera.org:8080/#/c/18514/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18514/4//COMMIT_MSG@28 PS4, Line 28: Special care is needed for virtual columns when column masing/row nit: typo 'masking' http://gerrit.cloudera.org:8080/#/c/18514/4//COMMIT_MSG@29 PS4, Line 29: fitering is applicable on them. They are added as "hidden" select nit: typo 'filtering' http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java File fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java: http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java@164 PS4, Line 164: // Virtual columns are hidden in the masking view, which means they don't : // participate in star expansion. : // E.g. during masking the following query is rewritten (where vc is a virtual col): : // SELECT vc, * FROM t; ===> : // SELECT vc, * FROM (SELECT MASK(vc) as vc, c1, c2, ... FROM t) v; : // In which case the '*' in the outer "SELECT vc, *" shouldn't contain 'v.vc' : // because in that case it would be doubled: : // SELECT vc, vc, c1, c2, ... FROM (...); : // Hence virtual columns are hidden select list items. They are also hidden : // when they are not masked, but other columns are. nit: most of this could be part of the VirtualColumn class comment. http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java File fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java: http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java@24 PS4, Line 24: nit: Could we add some notes here, things I think would be useful: - what is a virtual column - singleton, it does not contain any table specific values - added to every Table object - skipped from * expansion, masking http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java@35 PS4, Line 35: null Maybe we could use NONE here? Similar to SlotDescriptor. I am a bit afraid of a possible NPE. http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java File fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java: http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@188 PS4, Line 188: nit: empty new line -- To view, visit http://gerrit.cloudera.org:8080/18514 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61 Gerrit-Change-Number: 18514 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Gergely Fürnstáhl <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Tamas Mate <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 24 May 2022 12:28:06 +0000 Gerrit-HasComments: Yes
