Tamas Mate has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
......................................................................


Patch Set 4:

(6 comments)

Hi Zoltan, pretty nifty change!

I think we should define input__file__name as a keyword, currently if I create 
a table with a column input__file__name the insert succeeds, then the 'select 
*' leaks the filename.

In addition, this column will not be considered during estimations, so the row 
size estimations will be off. Maybe in a future Jira we could track the file 
length statistics as well.

http://gerrit.cloudera.org:8080/#/c/18514/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18514/4//COMMIT_MSG@28
PS4, Line 28: Special care is needed for virtual columns when column masing/row
nit: typo 'masking'


http://gerrit.cloudera.org:8080/#/c/18514/4//COMMIT_MSG@29
PS4, Line 29: fitering is applicable on them. They are added as "hidden" select
nit: typo 'filtering'


http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
File fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java:

http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java@164
PS4, Line 164:       // Virtual columns are hidden in the masking view, which 
means they don't
             :       // participate in star expansion.
             :       // E.g. during masking the following query is rewritten 
(where vc is a virtual col):
             :       // SELECT vc, * FROM t; ===>
             :       //     SELECT vc, * FROM (SELECT MASK(vc) as vc, c1, c2, 
... FROM t) v;
             :       // In which case the '*' in the outer "SELECT vc, *" 
shouldn't contain 'v.vc'
             :       // because in that case it would be doubled:
             :       // SELECT vc, vc, c1, c2, ... FROM (...);
             :       // Hence virtual columns are hidden select list items. 
They are also hidden
             :       // when they are not masked, but other columns are.
nit: most of this could be part of the VirtualColumn class comment.


http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java
File fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java:

http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java@24
PS4, Line 24:
nit: Could we add some notes here, things I think would be useful:
 - what is a virtual column
 - singleton, it does not contain any table specific values
 - added to every Table object
 - skipped from * expansion, masking


http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java@35
PS4, Line 35: null
Maybe we could use NONE here? Similar to SlotDescriptor. I am a bit afraid of a 
possible NPE.


http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java:

http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@188
PS4, Line 188:
nit: empty new line



--
To view, visit http://gerrit.cloudera.org:8080/18514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Gergely Fürnstáhl <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Tamas Mate <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 24 May 2022 12:28:06 +0000
Gerrit-HasComments: Yes

Reply via email to