ji chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/23469 )
Change subject: IMPALA-14092: Support querying of paimon data table via JNI ...................................................................... Patch Set 3: (38 comments) http://gerrit.cloudera.org:8080/#/c/23469/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/23469/3//COMMIT_MSG@45 PS3, Line 45: againt > nit: against Done http://gerrit.cloudera.org:8080/#/c/23469/3//COMMIT_MSG@62 PS3, Line 62: - Create table tests in functional_schema_template.sql : - Add TestPaimonScannerWithLimit in test_scanners.py : - Add unit test for AnalyzeDDLTest.java. : - Add unit test for AnalyzerTest.java. > This patch is quite large. Is it possible to split this patch into two part Thanks, will split the patch into 3 parts. 1. Prohibit all unsupported DML/DDL operations againt paimon table. 2. The querying of paimon data table via JNI without predicate push down. 3. predicate push down support. PS: the current scanner design took inspirations from iceberg metadata scanner. the performance of current scanner is not good for data table, it is a bit slower than jdbc scanner according to tpcds test , I planed to submitted a optimized version later to fix the performance problem. concerning the memory management aspect, the jni scanner will still convert on-heap paimon internal rows into row-batches row by row, in a iterative way. after the row object is converted to tuple in row batches, the java object can be garbage collected. http://gerrit.cloudera.org:8080/#/c/23469/3//COMMIT_MSG@62 PS3, Line 62: - Create table tests in functional_schema_template.sql : - Add TestPaimonScannerWithLimit in test_scanners.py : - Add unit test for AnalyzeDDLTest.java. : - Add unit test for AnalyzerTest.java. > +1. Predicate pushdown could also be moved to a separate CR. ok http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-row-reader.cc File be/src/exec/paimon/paimon-jni-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-row-reader.cc@41 PS3, Line 41: metadata_scanner > PaimonJniScanner is not only for metadata, right? Done http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-row-reader.cc@513 PS3, Line 513: IMPALA-12853 > This is about Iceberg metadata tables will fix http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.h File be/src/exec/paimon/paimon-jni-scan-node.h: http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.h@54 PS3, Line 54: the > nit: -the Done http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc File be/src/exec/paimon/paimon-jni-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@55 PS3, Line 55: /// java PaimonJniScanner. > nit: fits earlier line Done http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@59 PS3, Line 59: > nit: whitespace Done http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@61 PS3, Line 61: > nit: too much indent Done http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@85 PS3, Line 85: > nit: line continuation needs +4 spaces Done http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@94 PS3, Line 94: bool* eos) { > nit: fits line before Done http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@142 PS3, Line 142: > nit: redundant space Done http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@147 PS3, Line 147: > nit: unnecessary indentation Done http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.h File be/src/exec/paimon/paimon-jni-scanner.h: http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.h@34 PS3, Line 34: const TupleDescriptor* tuple_desc); > nit: fits line before Done http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.h@51 PS3, Line 51: field position. > nit: fits line before Done http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.h@84 PS3, Line 84: inline > nit: keyword 'inline' is redundant Done http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.h@107 PS3, Line 107: > nit: extra space Done http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc File be/src/exec/paimon/paimon-jni-scanner.cc: http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@101 PS3, Line 101: paimon_jni_scanner_scan_table_); > nit: fits line before Done http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@147 PS3, Line 147: > nit: need +4 indent Done http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@165 PS3, Line 165: > nit: need +4 indent Done http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@183 PS3, Line 183: > nit: need +4 indent Done http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@186 PS3, Line 186: > nit: need +4 indent Done http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@203 PS3, Line 203: > nit: need +4 indent Done http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@205 PS3, Line 205: > nit: need +4 indent Done http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@235 PS3, Line 235: > nit: needs +4 indent Done http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@284 PS3, Line 284: Metadata table > We might scan a regular table Done http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/runtime/descriptors.cc@377 PS3, Line 377: ] > Do we need the ']'? Done http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java File fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java: http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@141 PS3, Line 141: msTbl.getSd().getInputFormat() > nit: fits line before Done http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@459 PS3, Line 459: IcebergStructField > TStructField? Done http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonColumn.java File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonColumn.java: http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonColumn.java@24 PS3, Line 24: /** > nit: add empty line after the imports Done http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonHiveTypeUtils.java File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonHiveTypeUtils.java: http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonHiveTypeUtils.java@196 PS3, Line 196: if (!(type instanceof StructTypeInfo)) { : if (type instanceof MapTypeInfo) { : MapTypeInfo mapTypeInfo = (MapTypeInfo)type; : return DataTypes.MAP(visit(mapTypeInfo.getMapKeyTypeInfo(), visitor), : visit(mapTypeInfo.getMapValueTypeInfo(), visitor)); : } else if (type instanceof ListTypeInfo) { : ListTypeInfo listTypeInfo = (ListTypeInfo)type; : return DataTypes.ARRAY( : visit(listTypeInfo.getListElementTypeInfo(), visitor)); : } else { : return visitor.atomic(type); : } : } else { : StructTypeInfo structTypeInfo = (StructTypeInfo)type; : ArrayList<String> fieldNames = structTypeInfo.getAllStructFieldNames(); : ArrayList<TypeInfo> typeInfos = structTypeInfo : .getAllStructFieldTypeInfos(); : RowType.Builder builder = RowType.builder(); : : for(int i = 0; i < fieldNames.size(); ++i) { : builder.field((String)fieldNames.get(i), : visit((TypeInfo)typeInfos.get(i), visitor)); : } : : return builder.build(); : } > The structure of this if statement is a bit weird. Why don't we have someth Done http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java: http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java@31 PS3, Line 31: import org.apache.impala.catalog.IcebergColumn; > Unused import Done http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java File fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java: http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@151 PS3, Line 151: > nit: too much indent Done http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@152 PS3, Line 152: > nit: need +4 indent Done http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@199 PS3, Line 199: > nit: extra indent and missing empty line before this Done http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@225 PS3, Line 225: FileSystemUtil.listFiles > This can be time consuming if we do it for each file. Can we add a timer fo will fix. http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@234 PS3, Line 234: 50010 > Can we get it from configuration? Done http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@254 PS3, Line 254: > nit: too much indent Done -- To view, visit http://gerrit.cloudera.org:8080/23469 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0ecab952622e99109927056461d3d5d12f4b295f Gerrit-Change-Number: 23469 Gerrit-PatchSet: 3 Gerrit-Owner: ji chen <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: ji chen <[email protected]> Gerrit-Comment-Date: Fri, 10 Oct 2025 02:58:24 +0000 Gerrit-HasComments: Yes
