Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22868 )

Change subject: WIP KUDU-1261 writing/reading of array data blocks
......................................................................


Patch Set 15:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/22868/15/src/kudu/cfile/cfile_writer.cc
File src/kudu/cfile/cfile_writer.cc:

http://gerrit.cloudera.org:8080/#/c/22868/15/src/kudu/cfile/cfile_writer.cc@497
PS15, Line 497:         RETURN_NOT_OK(view.Init());
> What happens if `RETURN_NOT_OK` is triggered, masking stays enabled?
Which code path of the full-block masking do you think is interleaved with 
this RETURN_NOT_OK?  Could you be more specific?


http://gerrit.cloudera.org:8080/#/c/22868/15/src/kudu/cfile/cfile_writer.cc@514
PS15, Line 514:         // Mask the 'block is full' while writing a single array.
              :         data_block_->SetBlockFullMasked(true);
> IIUC very large arrays can cause blocks to exceed size limits dramatically.

Right: blocks already go over the configured size limit today, even without 
arrays.  With arrays, the overshoot might be quite high even with the default 
CFile block size, correct.

> How are you planning to solve this?

The only action item I have on this right now is to add look-ahead and avoid 
writing the next array element into the block if that would overshoot the 
configured size limit.  However, that's not a complete solution, because a 
single array might be very long (up to 64K elements due to the limitations of 
the array data block metadata) and its elements might be strings/binaries.  
Beyond that, I'm not planning to do anything else to address it in this 
iteration, except for adding a configurable limit on the size of 
variable-length array elements (i.e. BINARY/STRING) and a configurable limit 
on the number of elements in an array cell.
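
To illustrate the look-ahead idea, here is a minimal sketch.  It assumes the 
check is done per array cell before the cell is appended, and that a cell is 
never split across blocks.  `CellBytes`, `PlanBlocks`, and `CellWithinLimits` 
are hypothetical names used only for illustration; they are not functions from 
the patch or from the Kudu code base, and the size accounting ignores encoding 
and metadata overhead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One array cell holding variable-length (STRING/BINARY-like) elements.
using ArrayCell = std::vector<std::string>;

// Payload size of a cell: the sum of its element sizes (overhead ignored).
size_t CellBytes(const ArrayCell& cell) {
  size_t total = 0;
  for (const auto& elem : cell) {
    total += elem.size();
  }
  return total;
}

// Look-ahead sketch: before appending a cell, check whether it would push the
// current block past 'size_limit'; if so, finish the current block first.
// Returns the payload size of every block that would be produced.
std::vector<size_t> PlanBlocks(const std::vector<ArrayCell>& cells,
                               size_t size_limit) {
  assert(size_limit > 0);
  std::vector<size_t> block_sizes;
  size_t current = 0;
  for (const auto& cell : cells) {
    const size_t cell_bytes = CellBytes(cell);
    // An empty block always accepts the cell, so a single oversized array
    // still overshoots the limit: look-ahead alone cannot prevent that.
    if (current > 0 && current + cell_bytes > size_limit) {
      block_sizes.push_back(current);
      current = 0;
    }
    current += cell_bytes;
  }
  if (current > 0) {
    block_sizes.push_back(current);
  }
  return block_sizes;
}

// Sketch of the configurable limits: cap the number of elements in a cell
// and the size of each variable-length element.
bool CellWithinLimits(const ArrayCell& cell,
                      size_t max_elems,
                      size_t max_elem_bytes) {
  if (cell.size() > max_elems) {
    return false;
  }
  for (const auto& elem : cell) {
    if (elem.size() > max_elem_bytes) {
      return false;
    }
  }
  return true;
}
```

With a 10-byte limit and cells of 8, 2, and 12 bytes, this plan closes the 
first block at 10 bytes and puts the 12-byte cell alone into the second one, 
which still exceeds the limit.  That is why the element-size and element-count 
caps are needed in addition to the look-ahead.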

BTW, what particular issue do you see with going over the configured block size 
limit?  Basically, do you see any particular problems (maybe critical ones) 
that arise from the configured block size being overshot by, say, 2x?



--
To view, visit http://gerrit.cloudera.org:8080/22868
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5825c939cb40d350a23a78609115f26e62cca270
Gerrit-Change-Number: 22868
Gerrit-PatchSet: 15
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Marton Greber <[email protected]>
Gerrit-Reviewer: Xuebin Su <[email protected]>
Gerrit-Comment-Date: Tue, 23 Sep 2025 20:49:11 +0000
Gerrit-HasComments: Yes
