[Impala-ASF-CR] IMPALA-14092 Part2: Support querying of paimon data table via JNI

ji chen (Code Review) Fri, 07 Nov 2025 05:03:51 -0800

ji chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/23613 )


Change subject: IMPALA-14092 Part2: Support querying of paimon data table via JNI
......................................................................


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/23613/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23613/5//COMMIT_MSG@32
PS5, Line 32:   To minimize the overhead, we refashioned the implementation,
            :   the PaimonJniScanner will convert the paimon row batches to
            :   arrow recordbatch, which stores data in offheap region of
            :   impala JVM. And PaimonJniScanner will pass the arrow offheap
            :   record batch memory pointer to the BE backend.
            :   BE PaimonJniScanNode will directly read data from JVM offheap
            :   region, and convert the arrow record batch to impala row batch.
> This might be the first inclusion of Apache Arrow library into Apache Impal
That is a very good question. Currently, the off-heap memory is not allocated
from the Impala memory pool, so it is reported as untracked memory. To track
the off-heap memory usage of the Paimon scanner, I will use a MemTracker to
account for the memory used by each Arrow batch.
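
As a rough illustration of the per-batch accounting idea (a sketch only, not
the code in this patch): the tracker is charged when an off-heap batch is
handed over and credited back once the batch has been converted. The names
`BatchMemTracker` and `scan_batches` are hypothetical and do not correspond to
Impala's MemTracker API; batch sizes are plain integers standing in for Arrow
buffer sizes.

```python
class BatchMemTracker:
    """Hypothetical tracker: counts bytes of off-heap batches currently alive."""

    def __init__(self, limit_bytes=None):
        self.limit_bytes = limit_bytes  # None means "no limit"
        self.consumption = 0            # bytes currently accounted for
        self.peak = 0                   # highest consumption seen so far

    def try_consume(self, num_bytes):
        """Charge num_bytes; return False (and charge nothing) if over the limit."""
        if (self.limit_bytes is not None
                and self.consumption + num_bytes > self.limit_bytes):
            return False
        self.consumption += num_bytes
        self.peak = max(self.peak, self.consumption)
        return True

    def release(self, num_bytes):
        """Credit back bytes that were charged earlier."""
        assert num_bytes <= self.consumption, "releasing more than was consumed"
        self.consumption -= num_bytes


def scan_batches(batch_sizes, tracker):
    """Process batches one at a time, tracking each batch only while it is alive."""
    processed = 0
    for size in batch_sizes:
        if not tracker.try_consume(size):
            raise MemoryError(f"batch of {size} bytes exceeds the tracker limit")
        try:
            # Placeholder for converting the Arrow batch into a row batch.
            processed += 1
        finally:
            # Always credit the bytes back, even if the conversion fails.
            tracker.release(size)
    return processed
```

With this shape, the tracked consumption returns to zero after the scan, and
the peak reflects the largest single batch that was alive at any time.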


http://gerrit.cloudera.org:8080/#/c/23613/5/fe/src/main/java/org/apache/impala/catalog/Column.java
File fe/src/main/java/org/apache/impala/catalog/Column.java:

http://gerrit.cloudera.org:8080/#/c/23613/5/fe/src/main/java/org/apache/impala/catalog/Column.java@172
PS5, Line 172: IcebergStructField
> Would it be better to have dedicated PaimonStructField for this?
Currently, we need to reuse the field_id in IcebergStructField. Changing this
would require changes to the FE/BE code and even the Thrift file, which would
touch too many files. That is the reason I decided not to have a dedicated
PaimonStructField.


http://gerrit.cloudera.org:8080/#/c/23613/5/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/23613/5/testdata/datasets/functional/functional_schema_template.sql@4845
PS5, Line 4845: paimon_primitive_alltypes
> The paimon datasets is growing with this patch.
Done


http://gerrit.cloudera.org:8080/#/c/23613/5/testdata/workloads/functional-query/queries/QueryTest/paimon-query.test
File testdata/workloads/functional-query/queries/QueryTest/paimon-query.test:

http://gerrit.cloudera.org:8080/#/c/23613/5/testdata/workloads/functional-query/queries/QueryTest/paimon-query.test@23
PS5, Line 23: ts_value
> Will this timestamp affected by Daylight Saving Time? If yes, please skip i
Thanks, I will double-check.


http://gerrit.cloudera.org:8080/#/c/23613/5/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/23613/5/tests/query_test/test_scanners.py@2040
PS5, Line 2040: # Use a small batch size so changing the limit affects the timing of cancellation
> Move this to L2026 where you set batch_size=100.
Done



--
To view, visit http://gerrit.cloudera.org:8080/23613
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie679a89a8cc21d52b583422336b9f747bdf37384
Gerrit-Change-Number: 23613
Gerrit-PatchSet: 5
Gerrit-Owner: ji chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: ji chen <[email protected]>
Gerrit-Comment-Date: Fri, 07 Nov 2025 13:01:31 +0000
Gerrit-HasComments: Yes
