[Impala-ASF-CR] IMPALA-14472: Add create/read support for ARRAY column of Kudu

Csaba Ringhofer (Code Review) Fri, 17 Oct 2025 21:55:40 -0700

Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23493 )


Change subject: IMPALA-14472: Add create/read support for ARRAY column of Kudu
......................................................................


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/23493/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23493/4//COMMIT_MSG@12
PS4, Line 12:
> As for docs, there isn't anything published yet, except for the documentati
Thanks for the info!

I think that some of this should be added in comments / commit message, e.g.:
"Unlike rows the elements of arrays are stored in a different format than 
Impala. Instead of per-row bit flag for NULL info, values and NULL bits are 
stored in separate arrays."


http://gerrit.cloudera.org:8080/#/c/23493/4/be/src/exec/kudu/kudu-scanner.cc
File be/src/exec/kudu/kudu-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/23493/4/be/src/exec/kudu/kudu-scanner.cc@469
PS4, Line 469:     result.ptr = 
item_tuple_mem_pool->Allocate(total_tuple_byte_size);
> Sorry, I don't have an answer for this question.
I got answer for this by Alexei in the commit message.

Currently we double copy arrays (we copy them here, and then line 551 deep 
copies the surviving tuples again), but this may be necessary in some cases due 
to the different layout.


http://gerrit.cloudera.org:8080/#/c/23493/4/be/src/exec/kudu/kudu-scanner.cc@537
PS4, Line 537: tuple_data_pool
This memory gets deepcopied again on line 551 for surviving rows, so is not 
necessary to attach these buffers to the row batch. Another solution would be 
to modify deepcopy to ignore arrays, but this would be still attaching array 
memory even for those rows that are dropped by predicates.

My suggestion about lazy materialization line 543 would also help with this.


http://gerrit.cloudera.org:8080/#/c/23493/4/be/src/exec/kudu/kudu-util-ir.cc
File be/src/exec/kudu/kudu-util-ir.cc:

http://gerrit.cloudera.org:8080/#/c/23493/4/be/src/exec/kudu/kudu-util-ir.cc@165
PS4, Line 165:   // KuduScanner::PAD_UNIXTIME_MICROS_TO_16_BYTES does not apply 
to array elements.
> As of now, data layout within a array cell isn't affected by any of the exi
Thanks for the info!

I think that while this is not ideal, we can live with it, and it is not clear 
to me how much we would win with this - it saves work on Impala side, but adds 
work on Kudu side. If Impala can drop most rows based on top level predicates 
then actually this conversion on Kudu side may make things slower, as it has to 
be done for each row.

I think that a Kudu ticket would be useful, even it won't be implemented in the 
end.


http://gerrit.cloudera.org:8080/#/c/23493/4/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/23493/4/tests/query_test/test_kudu.py@1868
PS4, Line 1868: functional_kudu
> test_unsupported_types is a negative tests, so this DDL will not succeed.
ack


http://gerrit.cloudera.org:8080/#/c/23493/4/tests/query_test/test_kudu.py@1966
PS4, Line 1966:     expected_tuples = list(zip(
              :         range(len(sorted_tuples)),
              :         *[EXPECTED_COLUMNS[item_type] for item_type in types]
              :     ))
> Since VARCHAR(1) testcase use UTF-8 chars, it is only possible to validate
ack - is there a ticket for this test issue?



--
To view, visit http://gerrit.cloudera.org:8080/23493
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9282aac821bd30668189f84b2ed8fff7047e7310
Gerrit-Change-Number: 23493
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Xuebin Su <[email protected]>
Gerrit-Comment-Date: Thu, 09 Oct 2025 13:15:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-14472: Add create/read support for ARRAY column of Kudu

Reply via email to