Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/22456 )
Change subject: IMPALA-13739: Part2: Introduce IcebergFileDescriptor ...................................................................... Patch Set 4: (11 comments) http://gerrit.cloudera.org:8080/#/c/22456/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/22456/3//COMMIT_MSG@8 PS3, Line 8: > I think it would be good to shortly describe 'IcebergFileDescriptor' here, Done http://gerrit.cloudera.org:8080/#/c/22456/3/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java: http://gerrit.cloudera.org:8080/#/c/22456/3/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@710 PS3, Line 710: Hdfs > In the review of Part1 there was a comment (https://gerrit.cloudera.org/#/c That was a different story in Part1: FileDescriptor had parts that were relevant to Iceberg tables too, so that was another reason we couldn't call it HdfsFileDescriptor. With this change, technically we could call FileDescriptor HdfsFileDescriptor because the Iceberg part is removed now. The reason I didn't do that is because then IcebergFileDescriptor would derive from HdfsFileDescriptor that doesn't seem a good alignment between classes. The most optimal design would be to have an abstract FileDescriptor class that has 2 implementations IcebergFileDescriptor and HdfsFileDescriptor. However, currently the HdfsFileDescriptor class would be an empty one, so I didn't see the point implementing it. Getting back to the names of these functions. I figured that calling FeIcebergTable.getFileDescriptor() in this form is misleading because callers would expect this to return IcebergFileDescriptors, however it doesn't fo that and return the parent FileDescriptor, that is in fact a FileDescriptor for the HDFS world. I wanted to avoid this comfusion by explicitly say that this is not returning Iceberg FDs and instead return the one for HDFS tables (without the Iceberg part.) Hope this makes sense. http://gerrit.cloudera.org:8080/#/c/22456/3/fe/src/main/java/org/apache/impala/catalog/FileDescriptor.java File fe/src/main/java/org/apache/impala/catalog/FileDescriptor.java: http://gerrit.cloudera.org:8080/#/c/22456/3/fe/src/main/java/org/apache/impala/catalog/FileDescriptor.java@56 PS3, Line 56: private f > We already have a getFbFileDescriptor() method which could be used by subcl Done http://gerrit.cloudera.org:8080/#/c/22456/3/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java File fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java: http://gerrit.cloudera.org:8080/#/c/22456/3/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@59 PS3, Line 59: private s > This could remain private. Done http://gerrit.cloudera.org:8080/#/c/22456/3/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@297 PS3, Line 297: hdfs > Maybe 'rawFileDesc' to avoid reference to HDFS? This is what it is, it's an Hdfs FileDescriptor that we get from the 'hdfsFileDescMap'. http://gerrit.cloudera.org:8080/#/c/22456/3/fe/src/main/java/org/apache/impala/catalog/IcebergFileDescriptor.java File fe/src/main/java/org/apache/impala/catalog/IcebergFileDescriptor.java: http://gerrit.cloudera.org:8080/#/c/22456/3/fe/src/main/java/org/apache/impala/catalog/IcebergFileDescriptor.java@32 PS3, Line 32: > Now in this new class it is always Iceberg metadata, isn't it? Done http://gerrit.cloudera.org:8080/#/c/22456/3/fe/src/main/java/org/apache/impala/catalog/IcebergFileDescriptor.java@33 PS3, Line 33: FbFileMetadata > Couldn't we use FbIcebergMetadata directly? Yes, we could. I planned to do that in a separate change to keep this on only for extracting stuff from FileDescriptor into IcebergFileDescriptor without changing the Fb representation. http://gerrit.cloudera.org:8080/#/c/22456/3/fe/src/main/java/org/apache/impala/catalog/IcebergFileDescriptor.java@83 PS3, Line 83: if (!(obj instanceof IcebergFileDescriptor)) return false; > If 'obj' is NULL, the "instanceof" operator will also return false. Done http://gerrit.cloudera.org:8080/#/c/22456/3/fe/src/main/java/org/apache/impala/catalog/IcebergFileDescriptor.java@90 PS3, Line 90: && > As far as I understand we can rely on these arrays being the same objects f Well, this is existing code, just moved from one place to another. Additionally, similar functions for FileDescriptor (the HDFS part) are also called the same http://gerrit.cloudera.org:8080/#/c/22456/3/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java: http://gerrit.cloudera.org:8080/#/c/22456/3/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@27 PS3, Line 27: > Unused import. Done http://gerrit.cloudera.org:8080/#/c/22456/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/22456/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@742 PS3, Line 742: hdfs > 'rawFileDesc'? see my other comment somewhere else. Additionally, we eventually might want to get to the point of making FileDrecriptor abstract and have IcebergFileDescriptor introduced now and HdfsFileDescriptor introduced later on (if we see the point of doing so). For me 'raw' wouldn't make more clarity here about what this FileDescriptor is. -- To view, visit http://gerrit.cloudera.org:8080/22456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id08baf4db32197bab74178777c4ba3fec98c2451 Gerrit-Change-Number: 22456 Gerrit-PatchSet: 4 Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Comment-Date: Tue, 11 Feb 2025 09:14:08 +0000 Gerrit-HasComments: Yes