Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23613 )

Change subject: IMPALA-14092 Part2: Support querying of paimon data table via 
JNI
......................................................................


Patch Set 5:

(5 comments)

Need to do more passes, but here is my initial review.

http://gerrit.cloudera.org:8080/#/c/23613/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23613/5//COMMIT_MSG@32
PS5, Line 32:   To minimize the overhead, we refashioned the implementation,
            :   the PaimonJniScanner will convert the paimon row batches to
            :   arrow recordbatch, which stores data in offheap region of
            :   impala JVM. And PaimonJniScanner will pass the arrow offheap
            :   record batch memory pointer to the BE backend.
            :   BE PaimonJniScanNode will directly read data from JVM offheap
            :   region, and convert the arrow record batch to impala row batch.
This might be the first inclusion of Apache Arrow library into Apache Impala 
codebase.

Can you explain how the resource/memory tracking is done here?


http://gerrit.cloudera.org:8080/#/c/23613/5/fe/src/main/java/org/apache/impala/catalog/Column.java
File fe/src/main/java/org/apache/impala/catalog/Column.java:

http://gerrit.cloudera.org:8080/#/c/23613/5/fe/src/main/java/org/apache/impala/catalog/Column.java@172
PS5, Line 172: IcebergStructField
Would it be better to have dedicated PaimonStructField for this?


http://gerrit.cloudera.org:8080/#/c/23613/5/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/23613/5/testdata/datasets/functional/functional_schema_template.sql@4845
PS5, Line 4845: paimon_primitive_alltypes
The paimon datasets is growing with this patch.
Can you please add README.md under ./testdata/data/paimon_test/ describing each 
tables and their schema?


http://gerrit.cloudera.org:8080/#/c/23613/5/testdata/workloads/functional-query/queries/QueryTest/paimon-query.test
File testdata/workloads/functional-query/queries/QueryTest/paimon-query.test:

http://gerrit.cloudera.org:8080/#/c/23613/5/testdata/workloads/functional-query/queries/QueryTest/paimon-query.test@23
PS5, Line 23: ts_value
Will this timestamp affected by Daylight Saving Time? If yes, please skip it so 
test won't fail when DST begins/ends.


http://gerrit.cloudera.org:8080/#/c/23613/5/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/23613/5/tests/query_test/test_scanners.py@2040
PS5, Line 2040: # Use a small batch size so changing the limit affects the 
timing of cancellation
Move this to L2026 where you set batch_size=100.



--
To view, visit http://gerrit.cloudera.org:8080/23613
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie679a89a8cc21d52b583422336b9f747bdf37384
Gerrit-Change-Number: 23613
Gerrit-PatchSet: 5
Gerrit-Owner: ji chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Wed, 05 Nov 2025 17:35:05 +0000
Gerrit-HasComments: Yes

Reply via email to