Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/23613 )
Change subject: IMPALA-14092 Part2: Support querying of paimon data table via JNI ...................................................................... Patch Set 11: (7 comments) http://gerrit.cloudera.org:8080/#/c/23613/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/23613/11//COMMIT_MSG@23 PS11, Line 23: . nit: missing space after. http://gerrit.cloudera.org:8080/#/c/23613/11//COMMIT_MSG@35 PS11, Line 35: And PaimonJniScanner will pass the arrow offheap : record batch memory pointer to the BE backend. Can you elaborate a bit more about the lifetime of this arrow recordbatch? When/who will release the memory? I see it only capped by num rows per batch (1024). So in practice, the arrow recordbatch can be very big (say, reading hundreds of varlen columns)? http://gerrit.cloudera.org:8080/#/c/23613/11/be/src/exec/paimon/paimon-jni-scan-node.h File be/src/exec/paimon/paimon-jni-scan-node.h: http://gerrit.cloudera.org:8080/#/c/23613/11/be/src/exec/paimon/paimon-jni-scan-node.h@51 PS11, Line 51: /// 2. Backend: Creates an PaimonJniScanner object on the Java heap. Will there be 1 PaimonJniScanner per scan fragment instance? http://gerrit.cloudera.org:8080/#/c/23613/11/common/thrift/Types.thrift File common/thrift/Types.thrift: http://gerrit.cloudera.org:8080/#/c/23613/11/common/thrift/Types.thrift@80 PS11, Line 80: Iceberg&Paimon nit: Iceberg and Paimon http://gerrit.cloudera.org:8080/#/c/23613/11/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java File fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java: http://gerrit.cloudera.org:8080/#/c/23613/11/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@242 PS11, Line 242: @Override : public void computeNodeResourceProfile(TQueryOptions queryOptions) { : // current arrow batch size is 1024, so estimated bytes : // is calcualted as 1024 * average row size : long memSize = 1024L * (long)getAvgRowSize(); : nodeResourceProfile_ = new ResourceProfileBuilder() : .setMemEstimateBytes(4096 + memSize).build(); : } Can you explain how this is calculated? Is this follow some existing implementation? Is there any minimum memory required to start PaimonScanNode during Open phase? Is it possible to have avg row size = 0? If not, please add Preconditions. Impala has query option BATCH_SIZE that defaults to 1024. Should this follow the BATCH_SIZE option instead? http://gerrit.cloudera.org:8080/#/c/23613/11/fe/src/main/java/org/apache/impala/util/paimon/PaimonJniScanner.java File fe/src/main/java/org/apache/impala/util/paimon/PaimonJniScanner.java: http://gerrit.cloudera.org:8080/#/c/23613/11/fe/src/main/java/org/apache/impala/util/paimon/PaimonJniScanner.java@99 PS11, Line 99: 1024 * 1024, 4 * 1024 * 1024 * 1024L Create a constant for this values and put comment what the constant is about. Constants from org.apache.impala.common.ByteUnits can help expressing this. http://gerrit.cloudera.org:8080/#/c/23613/11/fe/src/main/java/org/apache/impala/util/paimon/PaimonJniScanner.java@114 PS11, Line 114: : public long GetNextBatch(long [] address) { A method docs/comment is helpful here, because this is the main part of reading from Paimon. -- To view, visit http://gerrit.cloudera.org:8080/23613 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie679a89a8cc21d52b583422336b9f747bdf37384 Gerrit-Change-Number: 23613 Gerrit-PatchSet: 11 Gerrit-Owner: ji chen <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: ji chen <[email protected]> Gerrit-Comment-Date: Mon, 01 Dec 2025 22:49:55 +0000 Gerrit-HasComments: Yes
