ji chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23613 )

Change subject: IMPALA-14092 Part2: Support querying of paimon data table via 
JNI
......................................................................


Patch Set 12:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/23613/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23613/11//COMMIT_MSG@23
PS11, Line 23: .
> nit: missing space after.
Done


http://gerrit.cloudera.org:8080/#/c/23613/11//COMMIT_MSG@35
PS11, Line 35: And PaimonJniScanner will pass the arrow offheap
             :   record batch memory pointer to the BE backend.
> Can you elaborate a bit more about the lifetime of this arrow recordbatch?
Done. The batch size can be adjusted; I will implement this in the next revision.


http://gerrit.cloudera.org:8080/#/c/23613/11/be/src/exec/paimon/paimon-jni-scan-node.h
File be/src/exec/paimon/paimon-jni-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/23613/11/be/src/exec/paimon/paimon-jni-scan-node.h@51
PS11, Line 51: /// 2. Backend:  Creates an PaimonJniScanner object on the Java 
heap.
> Will there be 1 PaimonJniScanner per scan fragment instance?
Yes, there is 1 PaimonJniScanner per scan fragment instance.


http://gerrit.cloudera.org:8080/#/c/23613/11/common/thrift/Types.thrift
File common/thrift/Types.thrift:

http://gerrit.cloudera.org:8080/#/c/23613/11/common/thrift/Types.thrift@80
PS11, Line 80: Iceberg and Pa
> nit: Iceberg and Paimon
Done


http://gerrit.cloudera.org:8080/#/c/23613/11/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java
File fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java:

http://gerrit.cloudera.org:8080/#/c/23613/11/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@242
PS11, Line 242:     numNodes_ = Math.max(totalNodes, 1);
              :     numInstances_ = Math.max(totalInstances, 1);
              :   }
              :
              :   @Override
              :   public void computeNodeResourceProfile(TQueryOptions 
queryOptions) {
              :     // current batch size is from query options, so estimated 
bytes
              :
> Can you explain how this is calculated? Is this follow some existing implem
1. The current calculation is similar to memoryEstimateForFetchingColumns in 
HbaseScanNode: it sums the bytes consumed by each used column to get the 
average row size and, since the batch size is 1024, multiplies that by 1024. 
There is no concrete formula for an Arrow batch, so this formula is used 
initially.
2. avgRowsize_ is calculated by the function estimateAvgRowSize. Its minimum 
value is PAIMON_ROW_AVG_SIZE_OVERHEAD, so it is always positive.
3. Sure, I will implement this in the next revision.
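
To illustrate point 1, here is a minimal sketch of the estimate described above. It is not the code in the patch: the class name, the overhead value of 4 bytes, and the choice to add the overhead to the column sum are assumptions made for illustration only. The batch size of 1024 comes from the reply.

```java
import java.util.List;

// Hypothetical sketch; names and constants are illustrative, not Impala's code.
final class PaimonMemEstimateSketch {
  // Rows per batch, as stated in the reply.
  static final int BATCH_SIZE = 1024;
  // Assumed stand-in value for PAIMON_ROW_AVG_SIZE_OVERHEAD.
  static final long ROW_AVG_SIZE_OVERHEAD = 4;

  // Sums the per-column byte estimates and adds the fixed overhead, so the
  // result is at least ROW_AVG_SIZE_OVERHEAD even when no column is used.
  static long estimateAvgRowSize(List<Long> usedColumnBytes) {
    long sum = 0;
    for (long bytes : usedColumnBytes) {
      sum += bytes;
    }
    return sum + ROW_AVG_SIZE_OVERHEAD;
  }

  // Estimated bytes for one batch: average row size times rows per batch.
  static long estimateBatchBytes(List<Long> usedColumnBytes) {
    return estimateAvgRowSize(usedColumnBytes) * BATCH_SIZE;
  }

  public static void main(String[] args) {
    // Three used columns of 4, 8 and 16 bytes: (28 + 4) * 1024 = 32768.
    System.out.println(estimateBatchBytes(List.of(4L, 8L, 16L)));
  }
}
```

With no used columns the estimate falls back to the overhead alone, which is why the value is always positive.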


http://gerrit.cloudera.org:8080/#/c/23613/11/fe/src/main/java/org/apache/impala/util/paimon/PaimonJniScanner.java
File fe/src/main/java/org/apache/impala/util/paimon/PaimonJniScanner.java:

http://gerrit.cloudera.org:8080/#/c/23613/11/fe/src/main/java/org/apache/impala/util/paimon/PaimonJniScanner.java@99
PS11, Line 99: lits_.add(SerializationUtils.deseria
> Create a constant for this values and put comment what the constant is abou
Will introduce a parameter for the upper limit of the allocator. The 1st 
argument will be removed in the next revision.


http://gerrit.cloudera.org:8080/#/c/23613/11/fe/src/main/java/org/apache/impala/util/paimon/PaimonJniScanner.java@114
PS11, Line 114:     // get mem limit
              :     allocator_mem_limit_ = paimonJniScanParam
> A method docs/comment is helpful here, because this is the main part of rea
Done.
Below is a sample of the memory limit check:
[localhost:21050] default> select * from functional_parquet.paimon_partitioned;
Query: select * from functional_parquet.paimon_partitioned
Query submitted at: 2025-12-02 23:41:30 (Coordinator: http://lisa:25000)
Query state can be monitored at: 
http://lisa:25000/query_plan?query_id=97435735f16f20d4:383aa2a600000000
2025-12-02 23:41:30 [Exception]  ERROR: Query 97435735f16f20d4:383aa2a600000000 
failed:
Memory limit exceeded
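
The kind of check behind the error above can be sketched roughly as follows. This is a plain-Java stand-in, not the Arrow allocator or the PaimonJniScanner code: the class, its methods, and the exception type are all hypothetical, and only the "Memory limit exceeded" message is taken from the sample output.

```java
// Hypothetical stand-in for an allocator with an upper limit; this is not
// Arrow's or Impala's API.
final class BoundedAllocatorSketch {
  private final long limitBytes;
  private long reservedBytes = 0;

  BoundedAllocatorSketch(long limitBytes) {
    this.limitBytes = limitBytes;
  }

  // Reserves memory for a batch and fails if the limit would be exceeded.
  void reserve(long bytes) {
    if (bytes < 0) {
      throw new IllegalArgumentException("negative size");
    }
    if (bytes > limitBytes - reservedBytes) {
      throw new IllegalStateException("Memory limit exceeded");
    }
    reservedBytes += bytes;
  }

  // Returns memory to the allocator once a batch has been consumed.
  void release(long bytes) {
    reservedBytes = Math.max(0, reservedBytes - bytes);
  }

  long reserved() {
    return reservedBytes;
  }
}
```

A scanner built this way would call reserve before filling a batch and release after the backend has consumed it, so a limit that is too low surfaces as an error on the first oversized batch.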



--
To view, visit http://gerrit.cloudera.org:8080/23613
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie679a89a8cc21d52b583422336b9f747bdf37384
Gerrit-Change-Number: 23613
Gerrit-PatchSet: 12
Gerrit-Owner: ji chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: ji chen <[email protected]>
Gerrit-Comment-Date: Tue, 02 Dec 2025 15:47:40 +0000
Gerrit-HasComments: Yes
