Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/23056 )


Change subject: WIP [thirdparty] introduce FlatBuffers
......................................................................

WIP [thirdparty] introduce FlatBuffers

WIP:
  * collect initial feedback
  * how hard is it to bring in the Apache Arrow IPC format right
    now just for serializing array data type cells?
  * remove the generated file and auto-generate it using flatc
    during compile time
  * should the test be moved under src/kudu/benchmarks instead?

This changelist adds flatbuffers-25.2.10 to Kudu's third-party
dependencies.  I'm planning to use FlatBuffers for encoding/decoding of
array data cells in the RowOperationsPB.indirect_data field in follow-up
patches within the scope of the array data type project.  In the future,
we can use it to encode arbitrary nested types along with their schema
definition as an extra piece of metadata for each serialized cell, but
switching to the Apache Arrow IPC format might be the best option in the
long run.

Using FlatBuffers for encoding/decoding of cells of the nested types
looks like a good option because of its multi-language support [1], good
performance [2], the ability to re-use the memory of the buffer without
reallocation and copying, small footprint, and lack of extra
dependencies.  It is also interoperable across versions and
cross-platform [3].

Switching to the Apache Arrow IPC format for nested data type cells
is a natural next step when switching to a columnar on-the-wire format
for data exchanged between the Kudu client and server for write
operations (Kudu currently has a columnar on-the-wire format only for
scanned data, when COLUMNAR_LAYOUT_FEATURE is supported by both the
server and the client).

At the time of writing, I decided to limit the set of options for
serialization/deserialization of array data cells to ProtoBuf and
FlatBuffers (evaluation of Cap'n Proto [4] was still pending).  To
choose between them, I implemented a benchmark to assess the performance
of each in a particular use case.  The results of the benchmark show
that FlatBuffers is about 4.5x faster than ProtoBuf in terms of user
CPU time:

Flatbuffers serialize  : ElemNum= 1024 Iterations=  500000
  real 2.854s   user 1.108s     sys 1.761s
Protobuf    serialize  : ElemNum= 1024 Iterations=  500000
  real 7.101s   user 5.124s     sys 2.029s

Flatbuffers deserialize: ElemNum= 1024 Iterations=  500000
  real 2.707s   user 0.979s     sys 1.710s
Protobuf    deserialize: ElemNum= 1024 Iterations=  500000
  real 6.241s   user 4.284s     sys 1.887s
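
For reference, the ratios behind the "about 4.5x" figure can be
recomputed from the timings above.  The snippet below is only a small
arithmetic sketch, not part of the patch: the dictionary layout and the
speedup() helper are hypothetical names introduced for illustration, and
the numbers are copied verbatim from the benchmark output.

```python
# Timings in seconds, copied from the benchmark output above
# (ElemNum=1024, Iterations=500000).  The structure and names here are
# illustrative assumptions, not part of the Kudu patch.
TIMINGS = {
    "serialize": {
        "flatbuffers": {"real": 2.854, "user": 1.108},
        "protobuf": {"real": 7.101, "user": 5.124},
    },
    "deserialize": {
        "flatbuffers": {"real": 2.707, "user": 0.979},
        "protobuf": {"real": 6.241, "user": 4.284},
    },
}


def speedup(operation, metric):
    """Return the ProtoBuf time divided by the FlatBuffers time."""
    entry = TIMINGS[operation]
    return entry["protobuf"][metric] / entry["flatbuffers"][metric]


# Ratios for every (operation, metric) pair.
RATIOS = {
    (op, metric): speedup(op, metric)
    for op in TIMINGS
    for metric in ("user", "real")
}

for (op, metric), ratio in RATIOS.items():
    print(f"{op:12s} {metric:5s} {ratio:.2f}x")
```

By user CPU time this gives roughly 4.6x for serialization and 4.4x for
deserialization, which is where "about 4.5x" comes from.  By wall-clock
(real) time the gap is smaller, roughly 2.5x and 2.3x, because the sys
time is similar for both libraries.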

[1] https://flatbuffers.dev/support/
[2] https://flatbuffers.dev/benchmarks
[3] https://flatbuffers.dev/white_paper/
[4] https://capnproto.org/

Change-Id: I89c697b8d80cbbd2af4233d16806a230cedaa81a
---
M CMakeLists.txt
A cmake_modules/FindFlatbuffers.cmake
M src/kudu/util/CMakeLists.txt
A src/kudu/util/serdes/arrays.fbs
A src/kudu/util/serdes/arrays.proto
A src/kudu/util/serdes/arrays_generated.h
A src/kudu/util/serdes/serdes-test.cc
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
11 files changed, 1,627 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/56/23056/1
--
To view, visit http://gerrit.cloudera.org:8080/23056
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I89c697b8d80cbbd2af4233d16806a230cedaa81a
Gerrit-Change-Number: 23056
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <[email protected]>
