Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21869 )

Change subject: IMPALA-11265: Part2: Store Iceberg file descriptors in encoded 
format
......................................................................


Patch Set 1:

(7 comments)

Nice findings and improvement!

http://gerrit.cloudera.org:8080/#/c/21869/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21869/1//COMMIT_MSG@9
PS1, Line 9: The file descriptors in HdfsPartition are cached as byte arrays to 
keep
           : the memory footprint low.
To make it easier to understand, you could elaborate that putting a byte array 
to a ByteBuffer, then putting the ByteBuffer to an FbFileDesc/FbFileMetadata, 
then putting them into an HdfsPartition.FileDescriptor object all increases the 
size of the object.


http://gerrit.cloudera.org:8080/#/c/21869/1//COMMIT_MSG@16
PS1, Line 16: addiitonal
typo


http://gerrit.cloudera.org:8080/#/c/21869/1/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java:

http://gerrit.cloudera.org:8080/#/c/21869/1/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@240
PS1, Line 240: .Builder(iceTable, selectedContentFiles).build()
IcebergContentFileStore could just have a constructor that takes these two 
arguments.


http://gerrit.cloudera.org:8080/#/c/21869/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
File fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java:

http://gerrit.cloudera.org:8080/#/c/21869/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@72
PS1, Line 72:   private static final Function<EncodedFileDescriptor, 
FileDescriptor> FROM_BYTES =
            :       new Function<EncodedFileDescriptor, FileDescriptor>() {
Is there any advantage using a Function object over defining a method?
I see it can be used in Lists.transform(), but could you use a lambda like "l 
-> FromBytes(l)" or a method reference like 
"IcebergContentFileStore::FromBytes" (or "FromBytes" might work as well)? This 
way we could get rid of the boilerplate code required for the Function object.


http://gerrit.cloudera.org:8080/#/c/21869/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@101
PS1, Line 101: assert
Probably a Precondition check would be better here.


http://gerrit.cloudera.org:8080/#/c/21869/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@402
PS1, Line 402:       return fileDesc.cloneWithFileMetadata(
             :           IcebergUtil.createIcebergMetadata(icebergTable_, 
contentFile));
Would it make sense to return EncodedFileDescriptor here and also switch to 
EncodedFileDescriptor in the private add*() methods? We would save creating an 
unnecessary FileDescriptor object and invoking TO_BYTES.apply() for each file 
descriptor.


http://gerrit.cloudera.org:8080/#/c/21869/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java:

http://gerrit.cloudera.org:8080/#/c/21869/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java@109
PS1, Line 109: new IcebergContentFileStore.Builder().build();
             :     } catch (IOException e) {
             :       throw new ImpalaRuntimeException(e.getMes
If we switched to a default constructor then we wouldn't need exception 
handling here.



--
To view, visit http://gerrit.cloudera.org:8080/21869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d7794df999bdaf118158eace26cea610f911c0a
Gerrit-Change-Number: 21869
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 01 Oct 2024 17:06:11 +0000
Gerrit-HasComments: Yes

Reply via email to