Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/15985 )
Change subject: IMPALA-9778: Refactor partition modifications in DDL/DMLs ...................................................................... Patch Set 11: (17 comments) Thanks for the feedbacks! http://gerrit.cloudera.org:8080/#/c/15985/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15985/10//COMMIT_MSG@13 PS10, Line 13: - HdfsPartition can be shared across table versions. In full catalog > Are there specific bugs that are fixed by this patch? Or it this a class of I can't find any bug JIRAs about this. But I think there are potential bugs. Currently we modify partitions in the following pattern: 1) modify partition in place, e.g. updating locaion, file format, stats etc. 2) apply ALTER operation in HMS 3) always mark the partition "dirty" 4) reload "dirty" partition from HMS and reload their file metadata If step 2) and 4) fails (e.g. when HMS is temporarily out of service due to network issue), the partition meta will be inconsistent with HMS. Since we modify partition in place, we have no way to rollback. http://gerrit.cloudera.org:8080/#/c/15985/9/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/15985/9/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@86 PS9, Line 86: public class HdfsPartition implements FeFsPartition, PrunablePartition { > is it possible to annotate tihs as @Immutable? I think that'll trigger erro No, it's a pity that this class only has shallow immutability. All its fields are final. But some fields are still referencing mutable objects like HdfsTable, LiteralExpr, InFlightEvents etc. We need more refactoring to make these immutable. However, things like InFlightEvents are only used in the catalogd so they can be mutable. I think we just need to make sure all the values sent to coordinators won't change. http://gerrit.cloudera.org:8080/#/c/15985/10/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/15985/10/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@519 PS10, Line 519: public final ImmutableList<String> sdBucketCols; > Wonder if we should change these to ImmutableList/ImmutableMap so that it's Done http://gerrit.cloudera.org:8080/#/c/15985/10/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@564 PS10, Line 564: sdBucketCols = other.sdBucketCols; > Comment that we can safely reuse these because they're immutable. Maybe not Done http://gerrit.cloudera.org:8080/#/c/15985/10/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@589 PS10, Line 589: private final long numRows_; > Maybe move this static value to a different block, e.g. above 585. It's a b Done http://gerrit.cloudera.org:8080/#/c/15985/10/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@627 PS10, Line 627: Populated : // when the partition is being built in Builder#setPartitionStatsBytes(). > I think this needs to get updated - we should not be mutating this with set Nice catch! Done. http://gerrit.cloudera.org:8080/#/c/15985/10/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@967 PS10, Line 967: public HdfsPartition build() { > Is it valid to continue using the Builder after build() is called? Yes, the build() method is idempotent if there are no more modifications after the first call. I use it in toHmsPartition() to build a temp instance for calling HdfsPartition#toHmsPartition(). TBO, this is not ideal... But there are some logics (like toHmsPartition) that need to be implemented in both HdfsPartition and HdfsPartition.Builder. We need more refactoring to merge these methods... http://gerrit.cloudera.org:8080/#/c/15985/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/15985/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@444 PS10, Line 444: > nit: we usually leave blank lines in-between multi-line methods. Done http://gerrit.cloudera.org:8080/#/c/15985/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@447 PS10, Line 447: */ > Might help to briefly comment these methods, e.g. this one should document Done http://gerrit.cloudera.org:8080/#/c/15985/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@455 PS10, Line 455: */ > Probably worth adding a brief comment, e.g Done http://gerrit.cloudera.org:8080/#/c/15985/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@803 PS10, Line 803: */ > Maybe update to createPartitionBuilder? Done http://gerrit.cloudera.org:8080/#/c/15985/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@897 PS10, Line 897: // Store partitions with null partition values separately > nit: add a blank line between methods. Done http://gerrit.cloudera.org:8080/#/c/15985/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@945 PS10, Line 945: * HdfsPartition that was dropped or null if the partition does not exist. > We should document this in the comments for these variables. Done for dirtyPartitions_. nullPartitionIds_ and partitionValuesMap_ have such comments. http://gerrit.cloudera.org:8080/#/c/15985/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1428 PS10, Line 1428: * table partitions. > Can you document inprogressPartBuilders? Done http://gerrit.cloudera.org:8080/#/c/15985/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2062 PS10, Line 2062: > Maybe we can remove this reference, since we are always doing this now. Removed the whole comment block since we can't updating the existing partition in place now. http://gerrit.cloudera.org:8080/#/c/15985/10/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/15985/10/fe/src/main/java/org/apache/impala/catalog/Table.java@802 PS10, Line 802: /** > These methods deserve a short comment - how they fit into the execution of Done http://gerrit.cloudera.org:8080/#/c/15985/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/15985/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@849 PS10, Line 849: modification > typo: modification Done -- To view, visit http://gerrit.cloudera.org:8080/15985 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib52e5810d01d5e0c910daacb9c98977426d3914c Gerrit-Change-Number: 15985 Gerrit-PatchSet: 11 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Anurag Mantripragada <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]> Gerrit-Comment-Date: Mon, 08 Jun 2020 08:23:32 +0000 Gerrit-HasComments: Yes
