Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/23469 )
Change subject: IMPALA-14092: Support querying of paimon data table via JNI ...................................................................... Patch Set 3: (45 comments) Did a first round, mostly caught style issues. I will need to do a few more iterations to understand the details better. It would be nice if it could be split into multiple CRs. http://gerrit.cloudera.org:8080/#/c/23469/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/23469/3//COMMIT_MSG@45 PS3, Line 45: againt nit: against http://gerrit.cloudera.org:8080/#/c/23469/3//COMMIT_MSG@62 PS3, Line 62: - Create table tests in functional_schema_template.sql : - Add TestPaimonScannerWithLimit in test_scanners.py : - Add unit test for AnalyzeDDLTest.java. : - Add unit test for AnalyzerTest.java. > This patch is quite large. Is it possible to split this patch into two part +1. Predicate pushdown could also be moved to a separate CR. http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-row-reader.h File be/src/exec/paimon/paimon-jni-row-reader.h: http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-row-reader.h@41 PS3, Line 41: PaimonJniRowReader This shares a lot of code with IcebergRowReader. Would it be possible to move the common code to some JniRowReader utility class? http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-row-reader.h@56 PS3, Line 56: inline nit: no need for the inline keyword http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-row-reader.cc File be/src/exec/paimon/paimon-jni-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-row-reader.cc@41 PS3, Line 41: metadata_scanner PaimonJniScanner is not only for metadata, right? http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-row-reader.cc@513 PS3, Line 513: IMPALA-12853 This is about Iceberg metadata tables http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.h File be/src/exec/paimon/paimon-jni-scan-node.h: http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.h@54 PS3, Line 54: the nit: -the http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc File be/src/exec/paimon/paimon-jni-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@55 PS3, Line 55: /// java PaimonJniScanner. nit: fits earlier line http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@59 PS3, Line 59: nit: whitespace http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@61 PS3, Line 61: nit: too much indent http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@85 PS3, Line 85: nit: line continuation needs +4 spaces http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@94 PS3, Line 94: bool* eos) { nit: fits line before http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@142 PS3, Line 142: nit: redundant space http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@147 PS3, Line 147: nit: unnecessary indentation http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.h File be/src/exec/paimon/paimon-jni-scanner.h: http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.h@34 PS3, Line 34: const TupleDescriptor* tuple_desc); nit: fits line before http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.h@51 PS3, Line 51: field position. nit: fits line before http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.h@84 PS3, Line 84: inline nit: keyword 'inline' is redundant http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.h@107 PS3, Line 107: nit: extra space http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.h@140 PS3, Line 140: string Why don't we need the 'std::' prefix here? Seems like we somehow include a use/using statement. http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc File be/src/exec/paimon/paimon-jni-scanner.cc: http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@101 PS3, Line 101: paimon_jni_scanner_scan_table_); nit: fits line before http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@147 PS3, Line 147: nit: need +4 indent http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@165 PS3, Line 165: nit: need +4 indent http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@168 PS3, Line 168: nit: need +4 indent http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@183 PS3, Line 183: nit: need +4 indent http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@186 PS3, Line 186: nit: need +4 indent http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@203 PS3, Line 203: nit: need +4 indent http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@205 PS3, Line 205: nit: need +4 indent http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@235 PS3, Line 235: nit: needs +4 indent http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@284 PS3, Line 284: Metadata table We might scan a regular table http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/runtime/descriptors.cc@377 PS3, Line 377: ] Do we need the ']'? http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/Table.java@672 PS3, Line 672: i nit: above we use 'i' because of Iceberg http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java File fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java: http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@141 PS3, Line 141: msTbl.getSd().getInputFormat() nit: fits line before http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@459 PS3, Line 459: IcebergStructField TStructField? http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonColumn.java File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonColumn.java: http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonColumn.java@24 PS3, Line 24: /** nit: add empty line after the imports http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonHiveTypeUtils.java File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonHiveTypeUtils.java: http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonHiveTypeUtils.java@196 PS3, Line 196: if (!(type instanceof StructTypeInfo)) { : if (type instanceof MapTypeInfo) { : MapTypeInfo mapTypeInfo = (MapTypeInfo)type; : return DataTypes.MAP(visit(mapTypeInfo.getMapKeyTypeInfo(), visitor), : visit(mapTypeInfo.getMapValueTypeInfo(), visitor)); : } else if (type instanceof ListTypeInfo) { : ListTypeInfo listTypeInfo = (ListTypeInfo)type; : return DataTypes.ARRAY( : visit(listTypeInfo.getListElementTypeInfo(), visitor)); : } else { : return visitor.atomic(type); : } : } else { : StructTypeInfo structTypeInfo = (StructTypeInfo)type; : ArrayList<String> fieldNames = structTypeInfo.getAllStructFieldNames(); : ArrayList<TypeInfo> typeInfos = structTypeInfo : .getAllStructFieldTypeInfos(); : RowType.Builder builder = RowType.builder(); : : for(int i = 0; i < fieldNames.size(); ++i) { : builder.field((String)fieldNames.get(i), : visit((TypeInfo)typeInfos.get(i), visitor)); : } : : return builder.build(); : } The structure of this if statement is a bit weird. Why don't we have something like: if (type instanceof StructTypeInfo) { } else if (type instanceof MapTypeInfo) { } else if (type instanceof ListTypeInfo) { } else { } http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java: http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java@31 PS3, Line 31: import org.apache.impala.catalog.IcebergColumn; Unused import http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java: http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@873 PS3, Line 873: nit: too much indent http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@874 PS3, Line 874: nit: too much indent http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@910 PS3, Line 910: nit: too much indent http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java File fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java: http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@151 PS3, Line 151: nit: too much indent http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@152 PS3, Line 152: nit: need +4 indent http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@199 PS3, Line 199: nit: extra indent and missing empty line before this http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@225 PS3, Line 225: FileSystemUtil.listFiles This can be time consuming if we do it for each file. Can we add a timer for this method? Did you try running queries against Paimon tables with >10,000 files? http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@234 PS3, Line 234: 50010 Can we get it from configuration? http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@254 PS3, Line 254: nit: too much indent -- To view, visit http://gerrit.cloudera.org:8080/23469 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0ecab952622e99109927056461d3d5d12f4b295f Gerrit-Change-Number: 23469 Gerrit-PatchSet: 3 Gerrit-Owner: ji chen <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Thu, 09 Oct 2025 13:11:30 +0000 Gerrit-HasComments: Yes
