[Impala-ASF-CR] IMPALA-14092 Part2: Support querying of paimon data table via JNI

Riza Suminto (Code Review) Mon, 01 Dec 2025 14:50:47 -0800

Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23613 )


Change subject: IMPALA-14092 Part2: Support querying of paimon data table via 
JNI
......................................................................


Patch Set 11:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/23613/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23613/11//COMMIT_MSG@23
PS11, Line 23: .
nit: missing space after.


http://gerrit.cloudera.org:8080/#/c/23613/11//COMMIT_MSG@35
PS11, Line 35: And PaimonJniScanner will pass the arrow offheap
             :   record batch memory pointer to the BE backend.
Can you elaborate a bit more about the lifetime of this arrow recordbatch?
When/who will release the memory?

I see it only capped by num rows per batch (1024). So in practice, the arrow 
recordbatch can be very big (say, reading hundreds of varlen columns)?


http://gerrit.cloudera.org:8080/#/c/23613/11/be/src/exec/paimon/paimon-jni-scan-node.h
File be/src/exec/paimon/paimon-jni-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/23613/11/be/src/exec/paimon/paimon-jni-scan-node.h@51
PS11, Line 51: /// 2. Backend:  Creates an PaimonJniScanner object on the Java 
heap.
Will there be 1 PaimonJniScanner per scan fragment instance?


http://gerrit.cloudera.org:8080/#/c/23613/11/common/thrift/Types.thrift
File common/thrift/Types.thrift:

http://gerrit.cloudera.org:8080/#/c/23613/11/common/thrift/Types.thrift@80
PS11, Line 80: Iceberg&Paimon
nit: Iceberg and Paimon


http://gerrit.cloudera.org:8080/#/c/23613/11/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java
File fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java:

http://gerrit.cloudera.org:8080/#/c/23613/11/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@242
PS11, Line 242:   @Override
              :   public void computeNodeResourceProfile(TQueryOptions 
queryOptions) {
              :     // current arrow batch size is 1024, so estimated bytes
              :     // is calcualted as 1024 * average row size
              :     long memSize = 1024L * (long)getAvgRowSize();
              :     nodeResourceProfile_ = new ResourceProfileBuilder()
              :         .setMemEstimateBytes(4096 + memSize).build();
              :   }
Can you explain how this is calculated? Is this follow some existing 
implementation?

Is there any minimum memory required to start PaimonScanNode during Open phase?
Is it possible to have avg row size = 0? If not, please add Preconditions.

Impala has query option BATCH_SIZE that defaults to 1024. Should this follow 
the BATCH_SIZE option instead?


http://gerrit.cloudera.org:8080/#/c/23613/11/fe/src/main/java/org/apache/impala/util/paimon/PaimonJniScanner.java
File fe/src/main/java/org/apache/impala/util/paimon/PaimonJniScanner.java:

http://gerrit.cloudera.org:8080/#/c/23613/11/fe/src/main/java/org/apache/impala/util/paimon/PaimonJniScanner.java@99
PS11, Line 99: 1024 * 1024, 4 * 1024 * 1024 * 1024L
Create a constant for this values and put comment what the constant is about.
Constants from org.apache.impala.common.ByteUnits can help expressing this.


http://gerrit.cloudera.org:8080/#/c/23613/11/fe/src/main/java/org/apache/impala/util/paimon/PaimonJniScanner.java@114
PS11, Line 114:
              :   public long GetNextBatch(long [] address) {
A method docs/comment is helpful here, because this is the main part of reading 
from Paimon.



--
To view, visit http://gerrit.cloudera.org:8080/23613
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie679a89a8cc21d52b583422336b9f747bdf37384
Gerrit-Change-Number: 23613
Gerrit-PatchSet: 11
Gerrit-Owner: ji chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: ji chen <[email protected]>
Gerrit-Comment-Date: Mon, 01 Dec 2025 22:49:55 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-14092 Part2: Support querying of paimon data table via JNI

Reply via email to