Hello Marton Greber, Xuebin Su, Kudu Jenkins, Abhishek Chennaka,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/22868

to look at the new patch set (#19).

Change subject: KUDU-1261 writing/reading of array data blocks
......................................................................

KUDU-1261 writing/reading of array data blocks

The CFile reader and writer are updated to support reading and writing
nullable array data blocks as per the documented specification [1].
The new functionality is covered by newly added tests in cfile-test.cc.
Follow-up patches (such as [2]) contain end-to-end tests using the Kudu
mini-cluster and the Kudu C++ client.

With this patch, writing/reading of array data works end-to-end
for all the existing scalar types except for INT128/DECIMAL128
(the latter isn't supported by the serdes layer, but everything
else can support 128-bit integers).

Below is the list of caveats (TODOs) known at the time of writing:
  * the DICTIONARY encoder for strings/binaries needs updating to
    allow for masking the 'block is full' bit; as of this changelist
    storing strings/binaries works with PLAIN and PREFIX_ENCODING,
    but fails in some edge cases with DICTIONARY encoding
  * memory management isn't yet optimal when reading array cells
    in CFileIterator
  * maximum number of elements in an array isn't yet configurable
  * the configured maximum number of elements in an array isn't yet
    enforced
  * for future proofing (e.g., thinking of switching to Apache Arrow
    format instead of Flatbuffers) it's necessary to add meta-info,
    so the sender and the recipient can tell what format to use for
    exchanging the data of NESTED data types
  * support for array data blocks needs to be explicitly enabled
    by setting --cfile_support_arrays=true

[1] https://gerrit.cloudera.org/#/c/22058
[2] https://gerrit.cloudera.org/#/c/23220

Change-Id: I5825c939cb40d350a23a78609115f26e62cca270
---
M src/kudu/cfile/binary_dict_block.cc
M src/kudu/cfile/binary_dict_block.h
M src/kudu/cfile/binary_plain_block.cc
M src/kudu/cfile/binary_plain_block.h
M src/kudu/cfile/binary_prefix_block.cc
M src/kudu/cfile/binary_prefix_block.h
M src/kudu/cfile/block_encodings.h
M src/kudu/cfile/bshuf_block.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/cfile_writer.h
M src/kudu/cfile/plain_bitmap_block.h
M src/kudu/cfile/plain_block.h
M src/kudu/cfile/rle_block.h
M src/kudu/common/columnblock-test-util.h
M src/kudu/common/columnblock.h
M src/kudu/common/rowblock.h
M src/kudu/common/rowblock_memory.h
M src/kudu/tablet/multi_column_writer.cc
23 files changed, 1,511 insertions(+), 204 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/68/22868/19
--
To view, visit http://gerrit.cloudera.org:8080/22868
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5825c939cb40d350a23a78609115f26e62cca270
Gerrit-Change-Number: 22868
Gerrit-PatchSet: 19
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Marton Greber <[email protected]>
Gerrit-Reviewer: Xuebin Su <[email protected]>
