Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/23493 )
Change subject: IMPALA-14472: Add create/read support for ARRAY column of Kudu ...................................................................... Patch Set 10: (4 comments) http://gerrit.cloudera.org:8080/#/c/23493/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/23493/10//COMMIT_MSG@13 PS10, Line 13: Unlike rows, the elements of a Kudu array are stored in a different : format than Impala. Instead of per-row bit flag for NULL info, values : and NULL bits are stored in separate arrays. Just to make sure we are on the same page here: the cell validity/nullability bits for the row in Kudu are stored in per-row bitset, same as for any non-array (scalar) cell. But the validity of array's elements within a particular array cell in a row isn't a part of row validity bitset: yes, that's indeed so. Just for my edification: does Impala store validity bitset of individual elements of an array cell along with per-row cell nullability info? If so, how cumbersome is to access the validity bit of a particular cell in a row if array cells have arbitrary number of elements? http://gerrit.cloudera.org:8080/#/c/23493/10/be/src/exec/kudu/kudu-scanner.cc File be/src/exec/kudu/kudu-scanner.cc: http://gerrit.cloudera.org:8080/#/c/23493/10/be/src/exec/kudu/kudu-scanner.cc@485 PS10, Line 485: auto kudu_elem_byte_size nit: could this be 'const'? http://gerrit.cloudera.org:8080/#/c/23493/10/be/src/exec/kudu/kudu-scanner.cc@488 PS10, Line 488: slot->children_tuple_descriptor()->byte_size()) Is there any types that are the same size between Impala and Kudu, i.e. kudu_elem_byte_size == slot->children_tuple_descriptor()->byte_size()? If yes, does it make sense to avoid per-element data copying like it's done here below, and just do memcpy() of the whole array buffer, given that the layout of elements seems to be the same? After copying over the whole buffer in one batch, all what's left is to call SetNull() for elements which are invalid/null. What do you think? http://gerrit.cloudera.org:8080/#/c/23493/10/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: http://gerrit.cloudera.org:8080/#/c/23493/10/tests/query_test/test_kudu.py@2013 PS10, Line 2013: UNNEST(array_{0}) Maybe this isn't exactly in the context of this changelist, but I'm curious: does Impala support unnesting a set of array columns, producing multiple columns as a result? select * from unnest(ARRAY[1,2], ARRAY['foo','bar','baz']) as x(a,b) ? a | b ---+----- 1 | foo 2 | bar | baz If yes, consider adding a scenario involving data from Kudu tables with array columns. -- To view, visit http://gerrit.cloudera.org:8080/23493 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9282aac821bd30668189f84b2ed8fff7047e7310 Gerrit-Change-Number: 23493 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Xuebin Su <[email protected]> Gerrit-Comment-Date: Tue, 21 Oct 2025 04:17:36 +0000 Gerrit-HasComments: Yes
