Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/23613 )
Change subject: IMPALA-14092 Part2: Support querying of paimon data table via JNI ...................................................................... Patch Set 5: (5 comments) Need to do more passes, but here is my initial review. http://gerrit.cloudera.org:8080/#/c/23613/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/23613/5//COMMIT_MSG@32 PS5, Line 32: To minimize the overhead, we refashioned the implementation, : the PaimonJniScanner will convert the paimon row batches to : arrow recordbatch, which stores data in offheap region of : impala JVM. And PaimonJniScanner will pass the arrow offheap : record batch memory pointer to the BE backend. : BE PaimonJniScanNode will directly read data from JVM offheap : region, and convert the arrow record batch to impala row batch. This might be the first inclusion of Apache Arrow library into Apache Impala codebase. Can you explain how the resource/memory tracking is done here? http://gerrit.cloudera.org:8080/#/c/23613/5/fe/src/main/java/org/apache/impala/catalog/Column.java File fe/src/main/java/org/apache/impala/catalog/Column.java: http://gerrit.cloudera.org:8080/#/c/23613/5/fe/src/main/java/org/apache/impala/catalog/Column.java@172 PS5, Line 172: IcebergStructField Would it be better to have dedicated PaimonStructField for this? http://gerrit.cloudera.org:8080/#/c/23613/5/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: http://gerrit.cloudera.org:8080/#/c/23613/5/testdata/datasets/functional/functional_schema_template.sql@4845 PS5, Line 4845: paimon_primitive_alltypes The paimon datasets is growing with this patch. Can you please add README.md under ./testdata/data/paimon_test/ describing each tables and their schema? http://gerrit.cloudera.org:8080/#/c/23613/5/testdata/workloads/functional-query/queries/QueryTest/paimon-query.test File testdata/workloads/functional-query/queries/QueryTest/paimon-query.test: http://gerrit.cloudera.org:8080/#/c/23613/5/testdata/workloads/functional-query/queries/QueryTest/paimon-query.test@23 PS5, Line 23: ts_value Will this timestamp affected by Daylight Saving Time? If yes, please skip it so test won't fail when DST begins/ends. http://gerrit.cloudera.org:8080/#/c/23613/5/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/23613/5/tests/query_test/test_scanners.py@2040 PS5, Line 2040: # Use a small batch size so changing the limit affects the timing of cancellation Move this to L2026 where you set batch_size=100. -- To view, visit http://gerrit.cloudera.org:8080/23613 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie679a89a8cc21d52b583422336b9f747bdf37384 Gerrit-Change-Number: 23613 Gerrit-PatchSet: 5 Gerrit-Owner: ji chen <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Comment-Date: Wed, 05 Nov 2025 17:35:05 +0000 Gerrit-HasComments: Yes
