ji chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23469 )

Change subject: IMPALA-14092: Support querying of paimon data table via JNI
......................................................................


Patch Set 3:

(38 comments)

http://gerrit.cloudera.org:8080/#/c/23469/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23469/3//COMMIT_MSG@45
PS3, Line 45: againt
> nit: against
Done


http://gerrit.cloudera.org:8080/#/c/23469/3//COMMIT_MSG@62
PS3, Line 62:     - Create table tests in functional_schema_template.sql
            :     - Add TestPaimonScannerWithLimit in test_scanners.py
            :     - Add unit test for AnalyzeDDLTest.java.
            :     - Add unit test for AnalyzerTest.java.
> This patch is quite large. Is it possible to split this patch into two part
Thanks, will split the patch into 3 parts.
1. Prohibit all unsupported DML/DDL operations againt paimon table.
2. The querying of paimon data table via JNI without predicate push down.
3. predicate push down support.
PS: the current scanner design took inspirations from iceberg metadata scanner. 
the performance of current scanner is not  good for data table, it is a bit 
slower than jdbc scanner according to tpcds test ,  I planed to submitted a 
optimized version later to fix the performance problem. concerning the memory 
management aspect, the jni scanner will still convert on-heap paimon internal 
rows into row-batches row by row, in a iterative way. after the row object is 
converted to tuple in row batches, the java object can be garbage collected.


http://gerrit.cloudera.org:8080/#/c/23469/3//COMMIT_MSG@62
PS3, Line 62:     - Create table tests in functional_schema_template.sql
            :     - Add TestPaimonScannerWithLimit in test_scanners.py
            :     - Add unit test for AnalyzeDDLTest.java.
            :     - Add unit test for AnalyzerTest.java.
> +1. Predicate pushdown could also be moved to a separate CR.
ok


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-row-reader.cc
File be/src/exec/paimon/paimon-jni-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-row-reader.cc@41
PS3, Line 41: metadata_scanner
> PaimonJniScanner is not only for metadata, right?
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-row-reader.cc@513
PS3, Line 513: IMPALA-12853
> This is about Iceberg metadata tables
will fix


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.h
File be/src/exec/paimon/paimon-jni-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.h@54
PS3, Line 54: the
> nit: -the
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc
File be/src/exec/paimon/paimon-jni-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@55
PS3, Line 55:   /// java PaimonJniScanner.
> nit: fits earlier line
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@59
PS3, Line 59:
> nit: whitespace
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@61
PS3, Line 61:
> nit: too much indent
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@85
PS3, Line 85:
> nit: line continuation needs +4 spaces
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@94
PS3, Line 94: bool* eos) {
> nit: fits line before
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@142
PS3, Line 142:
> nit: redundant space
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@147
PS3, Line 147:
> nit: unnecessary indentation
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.h
File be/src/exec/paimon/paimon-jni-scanner.h:

http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.h@34
PS3, Line 34: const TupleDescriptor* tuple_desc);
> nit: fits line before
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.h@51
PS3, Line 51: field position.
> nit: fits line before
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.h@84
PS3, Line 84: inline
> nit: keyword 'inline' is redundant
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.h@107
PS3, Line 107:
> nit: extra space
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc
File be/src/exec/paimon/paimon-jni-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@101
PS3, Line 101: paimon_jni_scanner_scan_table_);
> nit: fits line before
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@147
PS3, Line 147:
> nit: need +4 indent
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@165
PS3, Line 165:
> nit: need +4 indent
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@183
PS3, Line 183:
> nit: need +4 indent
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@186
PS3, Line 186:
> nit: need +4 indent
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@203
PS3, Line 203:
> nit: need +4 indent
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@205
PS3, Line 205:
> nit: need +4 indent
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@235
PS3, Line 235:
> nit: needs +4 indent
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@284
PS3, Line 284: Metadata table
> We might scan a regular table
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/runtime/descriptors.cc@377
PS3, Line 377: ]
> Do we need the ']'?
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java:

http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@141
PS3, Line 141: msTbl.getSd().getInputFormat()
> nit: fits line before
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@459
PS3, Line 459: IcebergStructField
> TStructField?
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonColumn.java
File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonColumn.java:

http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonColumn.java@24
PS3, Line 24: /**
> nit: add empty line after the imports
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonHiveTypeUtils.java
File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonHiveTypeUtils.java:

http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonHiveTypeUtils.java@196
PS3, Line 196:       if (!(type instanceof StructTypeInfo)) {
             :         if (type instanceof MapTypeInfo) {
             :           MapTypeInfo mapTypeInfo = (MapTypeInfo)type;
             :           return 
DataTypes.MAP(visit(mapTypeInfo.getMapKeyTypeInfo(), visitor),
             :               visit(mapTypeInfo.getMapValueTypeInfo(), visitor));
             :         } else if (type instanceof ListTypeInfo) {
             :           ListTypeInfo listTypeInfo = (ListTypeInfo)type;
             :           return DataTypes.ARRAY(
             :               visit(listTypeInfo.getListElementTypeInfo(), 
visitor));
             :         } else {
             :           return visitor.atomic(type);
             :         }
             :       } else {
             :         StructTypeInfo structTypeInfo = (StructTypeInfo)type;
             :         ArrayList<String> fieldNames = 
structTypeInfo.getAllStructFieldNames();
             :         ArrayList<TypeInfo> typeInfos = structTypeInfo
             :             .getAllStructFieldTypeInfos();
             :         RowType.Builder builder = RowType.builder();
             :
             :         for(int i = 0; i < fieldNames.size(); ++i) {
             :           builder.field((String)fieldNames.get(i),
             :               visit((TypeInfo)typeInfos.get(i), visitor));
             :         }
             :
             :         return builder.build();
             :       }
> The structure of this if statement is a bit weird. Why don't we have someth
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java
File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java:

http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java@31
PS3, Line 31: import org.apache.impala.catalog.IcebergColumn;
> Unused import
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java
File fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java:

http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@151
PS3, Line 151:
> nit: too much indent
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@152
PS3, Line 152:
> nit: need +4 indent
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@199
PS3, Line 199:
> nit: extra indent and missing empty line before this
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@225
PS3, Line 225: FileSystemUtil.listFiles
> This can be time consuming if we do it for each file. Can we add a timer fo
will fix.


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@234
PS3, Line 234: 50010
> Can we get it from configuration?
Done


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@254
PS3, Line 254:
> nit: too much indent
Done



--
To view, visit http://gerrit.cloudera.org:8080/23469
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0ecab952622e99109927056461d3d5d12f4b295f
Gerrit-Change-Number: 23469
Gerrit-PatchSet: 3
Gerrit-Owner: ji chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: ji chen <[email protected]>
Gerrit-Comment-Date: Fri, 10 Oct 2025 02:58:24 +0000
Gerrit-HasComments: Yes

Reply via email to