Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20515 )
Change subject: IMPALA-12243: Add support for DROP PARTITION for Iceberg tables ...................................................................... Patch Set 6: (23 comments) http://gerrit.cloudera.org:8080/#/c/20515/6/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: http://gerrit.cloudera.org:8080/#/c/20515/6/common/thrift/CatalogObjects.thrift@629 PS6, Line 629: struct TIcebergDropPartitionSummary { nit: I feel that naming this 'Summary' is a bit misleading. My impression for such a name would be that it contains e.g. some stats and result codes after executing the command. I think instead of 'Summary' we could call this 'Request'. http://gerrit.cloudera.org:8080/#/c/20515/6/common/thrift/CatalogObjects.thrift@631 PS6, Line 631: 2: optional bool is_truncate Could you please add a comment about what 'is_truncate' means in the context of a drop partition. http://gerrit.cloudera.org:8080/#/c/20515/6/common/thrift/CatalogObjects.thrift@632 PS6, Line 632: 3: required i64 num_partitions nit: a short comment here too would be useful. http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/AlterTableDropPartitionStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableDropPartitionStmt.java: http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/AlterTableDropPartitionStmt.java@103 PS6, Line 103: analyzeIceberg(); nit: this could fit into the line above http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java File fe/src/main/java/org/apache/impala/analysis/PartitionSet.java: http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@68 PS6, Line 68: private long numberOfDroppedIcebergPartitions_; PartitionSet is a general purpose class that could be used in various scenarios. I feel that the name is too generic for this class 'numberOfDroppedIcebergPartitions_' in a sense that it assumes that this PartitionSet class is used for dropping partitions. 'numberOfIcebergPartitions()' maybe? http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@72 PS6, Line 72: private boolean isIcebergTruncate_ = false; Similarly as above, this assumes that this PartitionSet is used for some drop or truncate scenario, however, PartitionSpec is much more general purpose. http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@144 PS6, Line 144: public void analyzeIceberg(Analyzer analyzer) throws AnalysisException { I know it was me who gave a comment to move all this logic into PartitionSet, but now that I see the code I have feelings against it. Initially I figured that there is some overlap between the original analyse() of PartitionSet and the AlterTableDropPartitionStmt.analyzeIceberg() but in fact we ended up completely separate fields and separate logic between the traditional and the Iceberg cases. Sorry, for the inconvenience but I feel now that this code doesn't really fir here naturally, and might have been better in the original place. http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@147 PS6, Line 147: IcebergPartitionExpressionRewriter When looking at this code, for me it is pretty hard to decide the purpose of IcebergPartitionExpressionRewriter and IcebergPartitionPredicateConverter. From the callsite you can only guess that these receive some expression tree and does some 'rewrite' and 'convert' on them but the naming doesn't really help to understand the code. http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@152 PS6, Line 152: icebergExpressions nit: 'icebergPartitionExprs' ? http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@160 PS6, Line 160: catch (ImpalaException e) { nit: this should go into one line above http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@167 PS6, Line 167: icebergExpressions, null)) { Shouldn't we assert that the expressions in 'icebergExpressions' are in fact referring to partition columns in Iceberg? Or would this be guaranteed when checking the residual()? If the latter, could you add a comment about this somewhere around the residual call? http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@169 PS6, Line 169: HashMap<String, Long> icebergPartitionSummary = new HashMap<>(); You seem to only use the keys of this mapping. Could this be a HashSet<String> instead? http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@174 PS6, Line 174: if (fileScanTask.deletes().isEmpty()) { For me it seems to add extra complexity to make all these separations of ContentFile types so that they can fit into GroupedContentFiles. You only use size() and getAllContentFiles() anyway so wouldn't it be possible to collect the ContentFiles in a single set instead? http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@197 PS6, Line 197: private class IcebergPartitionExpressionRewriter { Could you move this into a separate top level class into a separate file? Might come handy later on for other features. http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@221 PS6, Line 221: rewriteReferences I'm not sure about the name, or actually with the 'References' part. I don;t see what it refers to. Would rewrite() do here? http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@222 PS6, Line 222: if (expr instanceof BinaryPredicate) { Seeing all these instanceof checks made me wondering if you have considered polimorphism instead in Expr and its derived classes to do this rewrite action. Or would that be too Iceberg specific logic to add into Expr? http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@264 PS6, Line 264: TIcebergPartitionTransformType transformType = partitionExpr.getTransform() nit: this should be below the literal check http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@312 PS6, Line 312: analyzer_.addWarning(String.format( I don't get why we write a warning kind of unconditionally when we have a year transform. Is this the case when we receive a "years from epoch year" input from the user? Do we want to support this? I'd expect users to provide the actual year when using this year transform in DELETE PARTITION. http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@318 PS6, Line 318: if (!IcebergUtil.isDateTimeTransformType(transformType)) return; Shouldn't this be the very first check in this function to exit immediately if the transformType is not date time related? Additionally at the callsite I'd only call this function if the transformType implies we should. And this check inside the function could be a precondition check instead. http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@325 PS6, Line 325: catch (ImpalaRuntimeException ignore){} Could you help me understand why this exception is ignored? What cases do we ignore here? http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@345 PS6, Line 345: org.apache.iceberg.PartitionSpec spec = ((IcebergTable) table_).getIcebergApiTable() You query this partition spec many times. Could this be a member of the class instead? And if this line wasn't needed then this function would be reduced to one line so could rather write that one line on the callsite instead of making a function call. http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@240 PS6, Line 240: List<String> paths = params.iceberg_drop_partition_summary.paths; This could be moved into the else branch http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@245 PS6, Line 245: for (String path : paths) { Instead of for loop "deleteFiles.addAll(param..paths)"? -- To view, visit http://gerrit.cloudera.org:8080/20515 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a768ba2966f570454687e02e4e6d67df46741f9 Gerrit-Change-Number: 20515 Gerrit-PatchSet: 6 Gerrit-Owner: Peter Rozsa <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Peter Rozsa <[email protected]> Gerrit-Reviewer: Tamas Mate <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Wed, 08 Nov 2023 15:03:14 +0000 Gerrit-HasComments: Yes
