Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/23493 )
Change subject: IMPALA-14472: Add create/read support for ARRAY column of Kudu ...................................................................... Patch Set 4: (6 comments) http://gerrit.cloudera.org:8080/#/c/23493/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/23493/4//COMMIT_MSG@12 PS4, Line 12: > As for docs, there isn't anything published yet, except for the documentati Thanks for the info! I think that some of this should be added in comments / commit message, e.g.: "Unlike rows the elements of arrays are stored in a different format than Impala. Instead of per-row bit flag for NULL info, values and NULL bits are stored in separate arrays." http://gerrit.cloudera.org:8080/#/c/23493/4/be/src/exec/kudu/kudu-scanner.cc File be/src/exec/kudu/kudu-scanner.cc: http://gerrit.cloudera.org:8080/#/c/23493/4/be/src/exec/kudu/kudu-scanner.cc@469 PS4, Line 469: result.ptr = item_tuple_mem_pool->Allocate(total_tuple_byte_size); > Sorry, I don't have an answer for this question. I got answer for this by Alexei in the commit message. Currently we double copy arrays (we copy them here, and then line 551 deep copies the surviving tuples again), but this may be necessary in some cases due to the different layout. http://gerrit.cloudera.org:8080/#/c/23493/4/be/src/exec/kudu/kudu-scanner.cc@537 PS4, Line 537: tuple_data_pool This memory gets deepcopied again on line 551 for surviving rows, so is not necessary to attach these buffers to the row batch. Another solution would be to modify deepcopy to ignore arrays, but this would be still attaching array memory even for those rows that are dropped by predicates. My suggestion about lazy materialization line 543 would also help with this. http://gerrit.cloudera.org:8080/#/c/23493/4/be/src/exec/kudu/kudu-util-ir.cc File be/src/exec/kudu/kudu-util-ir.cc: http://gerrit.cloudera.org:8080/#/c/23493/4/be/src/exec/kudu/kudu-util-ir.cc@165 PS4, Line 165: // KuduScanner::PAD_UNIXTIME_MICROS_TO_16_BYTES does not apply to array elements. > As of now, data layout within a array cell isn't affected by any of the exi Thanks for the info! I think that while this is not ideal, we can live with it, and it is not clear to me how much we would win with this - it saves work on Impala side, but adds work on Kudu side. If Impala can drop most rows based on top level predicates then actually this conversion on Kudu side may make things slower, as it has to be done for each row. I think that a Kudu ticket would be useful, even it won't be implemented in the end. http://gerrit.cloudera.org:8080/#/c/23493/4/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: http://gerrit.cloudera.org:8080/#/c/23493/4/tests/query_test/test_kudu.py@1868 PS4, Line 1868: functional_kudu > test_unsupported_types is a negative tests, so this DDL will not succeed. ack http://gerrit.cloudera.org:8080/#/c/23493/4/tests/query_test/test_kudu.py@1966 PS4, Line 1966: expected_tuples = list(zip( : range(len(sorted_tuples)), : *[EXPECTED_COLUMNS[item_type] for item_type in types] : )) > Since VARCHAR(1) testcase use UTF-8 chars, it is only possible to validate ack - is there a ticket for this test issue? -- To view, visit http://gerrit.cloudera.org:8080/23493 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9282aac821bd30668189f84b2ed8fff7047e7310 Gerrit-Change-Number: 23493 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Xuebin Su <[email protected]> Gerrit-Comment-Date: Thu, 09 Oct 2025 13:15:08 +0000 Gerrit-HasComments: Yes
