Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18531 )

Change subject: IMPALA-10453: Support file pruning via runtime filters on 
Iceberg
......................................................................


Patch Set 7:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/18531/7/be/src/exec/file-metadata-utils.h
File be/src/exec/file-metadata-utils.h:

http://gerrit.cloudera.org:8080/#/c/18531/7/be/src/exec/file-metadata-utils.h@48
PS7, Line 48: &
AFAIK we usually take output (non-const) parameters by pointer and only use 
references for const parameters.


http://gerrit.cloudera.org:8080/#/c/18531/7/be/src/exec/file-metadata-utils.h@66
PS7, Line 66: &
See L48.


http://gerrit.cloudera.org:8080/#/c/18531/7/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/18531/7/be/src/exec/hdfs-scan-node-base.cc@724
PS7, Line 724: (int64_t)
Do we need this cast here? ScanRangeMetadata::partition_id is already an 
int64_t: 
https://github.com/apache/impala/blob/1d16367afb1e520bae5992c1e6c92b4fb41d2655/be/src/exec/hdfs-scan-node-base.h#L108


http://gerrit.cloudera.org:8080/#/c/18531/7/be/src/exec/hdfs-scan-node-base.cc@957
PS7, Line 957:   if (template_tuple == nullptr) return true;
I'm a bit confused, is it possible that template_tuple is NULL? Can we get 
there if it is not an Iceberg table?
If it can't be NULL, a DCHECK should be enough.


http://gerrit.cloudera.org:8080/#/c/18531/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18531/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1014
PS7, Line 1014: skip dictionary
Do you think we are ok with this behaviour or do we want to enable dict 
filtering for these cases in the future? If so, you could add a TODO.


http://gerrit.cloudera.org:8080/#/c/18531/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1055
PS7, Line 1055:   private boolean IsMemberOfStructInSelectList(SlotDescriptor 
slotDesc) {
As far as I understand, this function behaves very differently for scalars and 
structs: for structs it checks whether they have a parent and if so, recurses 
with the parent struct; for structs it checks whether they are in the select 
list.

I think it is confusing that this is done in the same function. It would be 
easier to understand if a separate function was called on the struct parent, 
checking whether it is in the select list. You could do it recursively or use 
SlotDescriptor::getEnclosingStructSlotDescs() to get all enclosing structs.


http://gerrit.cloudera.org:8080/#/c/18531/7/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/18531/7/tests/query_test/test_iceberg.py@85
PS7, Line 85:
Nit: query_b/query_c *are* supposed to finish...



--
To view, visit http://gerrit.cloudera.org:8080/18531
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7762e1238bdf236b85d2728881a402a2bb41f36a
Gerrit-Change-Number: 18531
Gerrit-PatchSet: 7
Gerrit-Owner: Tamas Mate <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Gergely Fürnstáhl <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tamas Mate <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 22 Jul 2022 14:35:42 +0000
Gerrit-HasComments: Yes

Reply via email to