Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/18531 )
Change subject: IMPALA-10453: Support file pruning via runtime filters on Iceberg ...................................................................... Patch Set 7: (7 comments) http://gerrit.cloudera.org:8080/#/c/18531/7/be/src/exec/file-metadata-utils.h File be/src/exec/file-metadata-utils.h: http://gerrit.cloudera.org:8080/#/c/18531/7/be/src/exec/file-metadata-utils.h@48 PS7, Line 48: & AFAIK we usually take output (non-const) parameters by pointer and only use references for const parameters. http://gerrit.cloudera.org:8080/#/c/18531/7/be/src/exec/file-metadata-utils.h@66 PS7, Line 66: & See L48. http://gerrit.cloudera.org:8080/#/c/18531/7/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/18531/7/be/src/exec/hdfs-scan-node-base.cc@724 PS7, Line 724: (int64_t) Do we need this cast here? ScanRangeMetadata::partition_id is already an int64_t: https://github.com/apache/impala/blob/1d16367afb1e520bae5992c1e6c92b4fb41d2655/be/src/exec/hdfs-scan-node-base.h#L108 http://gerrit.cloudera.org:8080/#/c/18531/7/be/src/exec/hdfs-scan-node-base.cc@957 PS7, Line 957: if (template_tuple == nullptr) return true; I'm a bit confused, is it possible that template_tuple is NULL? Can we get there if it is not an Iceberg table? If it can't be NULL, a DCHECK should be enough. http://gerrit.cloudera.org:8080/#/c/18531/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/18531/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1014 PS7, Line 1014: skip dictionary Do you think we are ok with this behaviour or do we want to enable dict filtering for these cases in the future? If so, you could add a TODO. http://gerrit.cloudera.org:8080/#/c/18531/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1055 PS7, Line 1055: private boolean IsMemberOfStructInSelectList(SlotDescriptor slotDesc) { As far as I understand, this function behaves very differently for scalars and structs: for structs it checks whether they have a parent and if so, recurses with the parent struct; for structs it checks whether they are in the select list. I think it is confusing that this is done in the same function. It would be easier to understand if a separate function was called on the struct parent, checking whether it is in the select list. You could do it recursively or use SlotDescriptor::getEnclosingStructSlotDescs() to get all enclosing structs. http://gerrit.cloudera.org:8080/#/c/18531/7/tests/query_test/test_iceberg.py File tests/query_test/test_iceberg.py: http://gerrit.cloudera.org:8080/#/c/18531/7/tests/query_test/test_iceberg.py@85 PS7, Line 85: Nit: query_b/query_c *are* supposed to finish... -- To view, visit http://gerrit.cloudera.org:8080/18531 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7762e1238bdf236b85d2728881a402a2bb41f36a Gerrit-Change-Number: 18531 Gerrit-PatchSet: 7 Gerrit-Owner: Tamas Mate <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Gergely Fürnstáhl <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tamas Mate <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Fri, 22 Jul 2022 14:35:42 +0000 Gerrit-HasComments: Yes
