Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23469 )

Change subject: IMPALA-14092: Support querying of paimon data table via JNI
......................................................................


Patch Set 3:

(45 comments)

Did a first round, mostly caught style issues. I will need to do a few more 
iterations to understand the details better. It would be nice if it could be 
split into multiple CRs.

http://gerrit.cloudera.org:8080/#/c/23469/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23469/3//COMMIT_MSG@45
PS3, Line 45: againt
nit: against


http://gerrit.cloudera.org:8080/#/c/23469/3//COMMIT_MSG@62
PS3, Line 62:     - Create table tests in functional_schema_template.sql
            :     - Add TestPaimonScannerWithLimit in test_scanners.py
            :     - Add unit test for AnalyzeDDLTest.java.
            :     - Add unit test for AnalyzerTest.java.
> This patch is quite large. Is it possible to split this patch into two part
+1. Predicate pushdown could also be moved to a separate CR.


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-row-reader.h
File be/src/exec/paimon/paimon-jni-row-reader.h:

http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-row-reader.h@41
PS3, Line 41: PaimonJniRowReader
This shares a lot of code with IcebergRowReader. Would it be possible to move 
the common code to some JniRowReader utility class?


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-row-reader.h@56
PS3, Line 56: inline
nit: no need for the inline keyword


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-row-reader.cc
File be/src/exec/paimon/paimon-jni-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-row-reader.cc@41
PS3, Line 41: metadata_scanner
PaimonJniScanner is not only for metadata, right?


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-row-reader.cc@513
PS3, Line 513: IMPALA-12853
This is about Iceberg metadata tables


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.h
File be/src/exec/paimon/paimon-jni-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.h@54
PS3, Line 54: the
nit: -the


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc
File be/src/exec/paimon/paimon-jni-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@55
PS3, Line 55:   /// java PaimonJniScanner.
nit: fits earlier line


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@59
PS3, Line 59:
nit: whitespace


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@61
PS3, Line 61:
nit: too much indent


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@85
PS3, Line 85:
nit: line continuation needs +4 spaces


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@94
PS3, Line 94: bool* eos) {
nit: fits line before


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@142
PS3, Line 142:
nit: redundant space


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scan-node.cc@147
PS3, Line 147:
nit: unnecessary indentation


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.h
File be/src/exec/paimon/paimon-jni-scanner.h:

http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.h@34
PS3, Line 34: const TupleDescriptor* tuple_desc);
nit: fits line before


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.h@51
PS3, Line 51: field position.
nit: fits line before


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.h@84
PS3, Line 84: inline
nit: keyword 'inline' is redundant


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.h@107
PS3, Line 107:
nit: extra space


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.h@140
PS3, Line 140: string
Why don't we need the 'std::' prefix here? Seems like we somehow include a 
use/using statement.


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc
File be/src/exec/paimon/paimon-jni-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@101
PS3, Line 101: paimon_jni_scanner_scan_table_);
nit: fits line before


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@147
PS3, Line 147:
nit: need +4 indent


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@165
PS3, Line 165:
nit: need +4 indent


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@168
PS3, Line 168:
nit: need +4 indent


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@183
PS3, Line 183:
nit: need +4 indent


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@186
PS3, Line 186:
nit: need +4 indent


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@203
PS3, Line 203:
nit: need +4 indent


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@205
PS3, Line 205:
nit: need +4 indent


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@235
PS3, Line 235:
nit: needs +4 indent


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/exec/paimon/paimon-jni-scanner.cc@284
PS3, Line 284: Metadata table
We might scan a regular table


http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

http://gerrit.cloudera.org:8080/#/c/23469/3/be/src/runtime/descriptors.cc@377
PS3, Line 377: ]
Do we need the ']'?


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/Table.java@672
PS3, Line 672: i
nit: above we use 'i' because of Iceberg


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java:

http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@141
PS3, Line 141: msTbl.getSd().getInputFormat()
nit: fits line before


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@459
PS3, Line 459: IcebergStructField
TStructField?


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonColumn.java
File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonColumn.java:

http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonColumn.java@24
PS3, Line 24: /**
nit: add empty line after the imports


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonHiveTypeUtils.java
File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonHiveTypeUtils.java:

http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonHiveTypeUtils.java@196
PS3, Line 196:       if (!(type instanceof StructTypeInfo)) {
             :         if (type instanceof MapTypeInfo) {
             :           MapTypeInfo mapTypeInfo = (MapTypeInfo)type;
             :           return 
DataTypes.MAP(visit(mapTypeInfo.getMapKeyTypeInfo(), visitor),
             :               visit(mapTypeInfo.getMapValueTypeInfo(), visitor));
             :         } else if (type instanceof ListTypeInfo) {
             :           ListTypeInfo listTypeInfo = (ListTypeInfo)type;
             :           return DataTypes.ARRAY(
             :               visit(listTypeInfo.getListElementTypeInfo(), 
visitor));
             :         } else {
             :           return visitor.atomic(type);
             :         }
             :       } else {
             :         StructTypeInfo structTypeInfo = (StructTypeInfo)type;
             :         ArrayList<String> fieldNames = 
structTypeInfo.getAllStructFieldNames();
             :         ArrayList<TypeInfo> typeInfos = structTypeInfo
             :             .getAllStructFieldTypeInfos();
             :         RowType.Builder builder = RowType.builder();
             :
             :         for(int i = 0; i < fieldNames.size(); ++i) {
             :           builder.field((String)fieldNames.get(i),
             :               visit((TypeInfo)typeInfos.get(i), visitor));
             :         }
             :
             :         return builder.build();
             :       }
The structure of this if statement is a bit weird. Why don't we have something 
like:

 if (type instanceof StructTypeInfo) {

 } else if (type instanceof MapTypeInfo) {

 } else if (type instanceof ListTypeInfo) {

 } else {

 }


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java
File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java:

http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java@31
PS3, Line 31: import org.apache.impala.catalog.IcebergColumn;
Unused import


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java
File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java:

http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@873
PS3, Line 873:
nit: too much indent


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@874
PS3, Line 874:
nit: too much indent


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@910
PS3, Line 910:
nit: too much indent


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java
File fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java:

http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@151
PS3, Line 151:  
nit: too much indent


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@152
PS3, Line 152:
nit: need +4 indent


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@199
PS3, Line 199:
nit: extra indent and missing empty line before this


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@225
PS3, Line 225: FileSystemUtil.listFiles
This can be time consuming if we do it for each file. Can we add a timer for 
this method?

Did you try running queries against Paimon tables with >10,000 files?


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@234
PS3, Line 234: 50010
Can we get it from configuration?


http://gerrit.cloudera.org:8080/#/c/23469/3/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@254
PS3, Line 254:
nit: too much indent



--
To view, visit http://gerrit.cloudera.org:8080/23469
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0ecab952622e99109927056461d3d5d12f4b295f
Gerrit-Change-Number: 23469
Gerrit-PatchSet: 3
Gerrit-Owner: ji chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 09 Oct 2025 13:11:30 +0000
Gerrit-HasComments: Yes

Reply via email to