ji chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/23613 )
Change subject: IMPALA-14092 Part2: Support querying of paimon data table via JNI
......................................................................


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/23613/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23613/5//COMMIT_MSG@32
PS5, Line 32: To minimize the overhead, we refashioned the implementation,
            : the PaimonJniScanner will convert the paimon row batches to
            : arrow recordbatch, which stores data in offheap region of
            : impala JVM. And PaimonJniScanner will pass the arrow offheap
            : record batch memory pointer to the BE backend.
            : BE PaimonJniScanNode will directly read data from JVM offheap
            : region, and convert the arrow record batch to impala row batch.
> This might be the first inclusion of Apache Arrow library into Apache Impal
That is a very good question. Currently, the off-heap memory is not allocated from the Impala memory pool, so it is counted as untracked memory. To track the off-heap memory usage of the Paimon scanner, I will use a MemTracker to account for the memory used by each Arrow batch.


http://gerrit.cloudera.org:8080/#/c/23613/5/fe/src/main/java/org/apache/impala/catalog/Column.java
File fe/src/main/java/org/apache/impala/catalog/Column.java:

http://gerrit.cloudera.org:8080/#/c/23613/5/fe/src/main/java/org/apache/impala/catalog/Column.java@172
PS5, Line 172: IcebergStructField
> Would it be better to have dedicated PaimonStructField for this?
Currently, we need to reuse the field_id in IcebergStructField. Changing this would require changes to the FE/BE code and even the Thrift file, which would touch too many files. That is the reason I decided not to have a dedicated PaimonStructField.

http://gerrit.cloudera.org:8080/#/c/23613/5/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/23613/5/testdata/datasets/functional/functional_schema_template.sql@4845
PS5, Line 4845: paimon_primitive_alltypes
> The paimon datasets is growing with this patch.
Done


http://gerrit.cloudera.org:8080/#/c/23613/5/testdata/workloads/functional-query/queries/QueryTest/paimon-query.test
File testdata/workloads/functional-query/queries/QueryTest/paimon-query.test:

http://gerrit.cloudera.org:8080/#/c/23613/5/testdata/workloads/functional-query/queries/QueryTest/paimon-query.test@23
PS5, Line 23: ts_value
> Will this timestamp affected by Daylight Saving Time? If yes, please skip i
Thanks, I will double-check.


http://gerrit.cloudera.org:8080/#/c/23613/5/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/23613/5/tests/query_test/test_scanners.py@2040
PS5, Line 2040: # Use a small batch size so changing the limit affects the timing of cancellation
> Move this to L2026 where you set batch_size=100.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/23613
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie679a89a8cc21d52b583422336b9f747bdf37384
Gerrit-Change-Number: 23613
Gerrit-PatchSet: 5
Gerrit-Owner: ji chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: ji chen <[email protected]>
Gerrit-Comment-Date: Fri, 07 Nov 2025 13:01:31 +0000
Gerrit-HasComments: Yes
