Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23493 )

Change subject: IMPALA-14472: Add create/read support for ARRAY column of Kudu
......................................................................


Patch Set 10:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/23493/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23493/10//COMMIT_MSG@13
PS10, Line 13: Unlike rows, the elements of a Kudu array are stored in a 
different
             : format than Impala. Instead of per-row bit flag for NULL info, 
values
             : and NULL bits are stored in separate arrays.
Just to make sure we are on the same page here: the cell validity/nullability 
bits for the row in Kudu are stored in per-row bitset, same as for any 
non-array (scalar) cell.  But the validity of array's elements within a 
particular array cell in a row isn't a part of row validity bitset: yes, that's 
indeed so.

Just for my edification: does Impala store validity bitset of individual 
elements of an array cell along with per-row cell nullability info?  If so, how 
cumbersome is to access the validity bit of a particular cell in a row if array 
cells have arbitrary number of elements?


http://gerrit.cloudera.org:8080/#/c/23493/10/be/src/exec/kudu/kudu-scanner.cc
File be/src/exec/kudu/kudu-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/23493/10/be/src/exec/kudu/kudu-scanner.cc@485
PS10, Line 485: auto kudu_elem_byte_size
nit: could this be 'const'?


http://gerrit.cloudera.org:8080/#/c/23493/10/be/src/exec/kudu/kudu-scanner.cc@488
PS10, Line 488: slot->children_tuple_descriptor()->byte_size())
Is there any types that are the same size between Impala and Kudu, i.e. 
kudu_elem_byte_size == slot->children_tuple_descriptor()->byte_size()?  If yes, 
does it make sense to avoid per-element data copying like it's done here below, 
and just do memcpy() of the whole array buffer, given that the layout of 
elements seems to be the same?  After copying over the whole buffer in one 
batch, all what's left is to call SetNull() for elements which are invalid/null.

What do you think?


http://gerrit.cloudera.org:8080/#/c/23493/10/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/23493/10/tests/query_test/test_kudu.py@2013
PS10, Line 2013: UNNEST(array_{0})
Maybe this isn't exactly in the context of this changelist, but I'm curious: 
does Impala support unnesting a set of array columns, producing multiple 
columns as a result?

select * from unnest(ARRAY[1,2], ARRAY['foo','bar','baz']) as x(a,b) ?

 a |  b
---+-----
 1 | foo
 2 | bar
   | baz

If yes, consider adding a scenario involving data from Kudu tables with array 
columns.



--
To view, visit http://gerrit.cloudera.org:8080/23493
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9282aac821bd30668189f84b2ed8fff7047e7310
Gerrit-Change-Number: 23493
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Xuebin Su <[email protected]>
Gerrit-Comment-Date: Tue, 21 Oct 2025 04:17:36 +0000
Gerrit-HasComments: Yes

Reply via email to