Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21869 )
Change subject: IMPALA-11265: Part2: Store Iceberg file descriptors in encoded format ...................................................................... Patch Set 1: (7 comments) Nice findings and improvement! http://gerrit.cloudera.org:8080/#/c/21869/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21869/1//COMMIT_MSG@9 PS1, Line 9: The file descriptors in HdfsPartition are cached as byte arrays to keep : the memory footprint low. To make it easier to understand, you could elaborate that putting a byte array to a ByteBuffer, then putting the ByteBuffer to an FbFileDesc/FbFileMetadata, then putting them into an HdfsPartition.FileDescriptor object all increases the size of the object. http://gerrit.cloudera.org:8080/#/c/21869/1//COMMIT_MSG@16 PS1, Line 16: addiitonal typo http://gerrit.cloudera.org:8080/#/c/21869/1/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java: http://gerrit.cloudera.org:8080/#/c/21869/1/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@240 PS1, Line 240: .Builder(iceTable, selectedContentFiles).build() IcebergContentFileStore could just have a constructor that takes these two arguments. http://gerrit.cloudera.org:8080/#/c/21869/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java File fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java: http://gerrit.cloudera.org:8080/#/c/21869/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@72 PS1, Line 72: private static final Function<EncodedFileDescriptor, FileDescriptor> FROM_BYTES = : new Function<EncodedFileDescriptor, FileDescriptor>() { Is there any advantage using a Function object over defining a method? I see it can be used in Lists.transform(), but could you use a lambda like "l -> FromBytes(l)" or a method reference like "IcebergContentFileStore::FromBytes" (or "FromBytes" might work as well)? This way we could get rid of the boilerplate code required for the Function object. http://gerrit.cloudera.org:8080/#/c/21869/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@101 PS1, Line 101: assert Probably a Precondition check would be better here. http://gerrit.cloudera.org:8080/#/c/21869/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@402 PS1, Line 402: return fileDesc.cloneWithFileMetadata( : IcebergUtil.createIcebergMetadata(icebergTable_, contentFile)); Would it make sense to return EncodedFileDescriptor here and also switch to EncodedFileDescriptor in the private add*() methods? We would save creating an unnecessary FileDescriptor object and invoking TO_BYTES.apply() for each file descriptor. http://gerrit.cloudera.org:8080/#/c/21869/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java: http://gerrit.cloudera.org:8080/#/c/21869/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java@109 PS1, Line 109: new IcebergContentFileStore.Builder().build(); : } catch (IOException e) { : throw new ImpalaRuntimeException(e.getMes If we switched to a default constructor then we wouldn't need exception handling here. -- To view, visit http://gerrit.cloudera.org:8080/21869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d7794df999bdaf118158eace26cea610f911c0a Gerrit-Change-Number: 21869 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 01 Oct 2024 17:06:11 +0000 Gerrit-HasComments: Yes
