Hello Kudu Jenkins, Abhishek Chennaka,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/23056
to look at the new patch set (#2).
Change subject: WIP [thirdparty] introduce Flatbuffers
......................................................................
WIP [thirdparty] introduce Flatbuffers
WIP:
* collect initial feedback
* how hard would it be to bring in the Apache Arrow IPC format right
now, just for serializing cells of the array data type?
This changelist adds flatbuffers-25.2.10 to Kudu's thirdparty
dependencies. I'm planning to use Flatbuffers for serializing and
deserializing array cells' data in the RowOperationsPB.indirect_data
field in follow-up patches. In the future, we can use it for serdes of
arbitrary nested types, but switching to the Arrow IPC format and
importing the corresponding code from the Arrow project seems to be
the best option in the long run.
Using Flatbuffers for serdes of nested type cells' data looks like
a good option because of its multi-language support [1], performance [2],
ability to reuse buffer memory without reallocation and copying,
absence of temporary serdes objects, and small run-time footprint. It's
also interoperable between versions and platforms [3], and licensed
under the Apache 2.0 license [4].
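To illustrate what reading without temporary objects and reusing buffer
memory can look like, here is a minimal sketch. It is NOT the Flatbuffers
wire format or API: the layout (a uint32_t element count followed by int64_t
values in host byte order) and the names Int64ArrayView and
SerializeInt64Array are hypothetical, chosen only to show the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical layout (NOT the Flatbuffers format): a uint32_t element count
// followed by that many int64_t values, all in host byte order.
class Int64ArrayView {
 public:
  // Wraps an existing buffer; nothing is copied or allocated here.
  Int64ArrayView(const uint8_t* data, size_t size)
      : data_(data), size_(size) {}

  // Minimal "verifier": the buffer must hold the count prefix and all the
  // elements that the prefix declares.
  bool valid() const {
    if (size_ < sizeof(uint32_t)) return false;
    const uint64_t payload = static_cast<uint64_t>(count()) * sizeof(int64_t);
    return size_ - sizeof(uint32_t) >= payload;
  }

  // Number of elements, or 0 if the buffer does not pass valid().
  uint32_t length() const { return valid() ? count() : 0; }

  // Reads element i straight from the buffer; memcpy avoids unaligned access.
  int64_t at(uint32_t i) const {
    assert(i < length());
    int64_t v;
    std::memcpy(&v,
                data_ + sizeof(uint32_t) + static_cast<size_t>(i) * sizeof(v),
                sizeof(v));
    return v;
  }

 private:
  // Reads the count prefix; callers must ensure size_ >= sizeof(uint32_t).
  uint32_t count() const {
    uint32_t n;
    std::memcpy(&n, data_, sizeof(n));
    return n;
  }

  const uint8_t* data_;
  size_t size_;
};

// Serializes into a caller-owned buffer. clear() keeps the vector's capacity,
// so repeated calls with same-sized input do not reallocate.
void SerializeInt64Array(const std::vector<int64_t>& values,
                         std::vector<uint8_t>* out) {
  out->clear();
  const uint32_t n = static_cast<uint32_t>(values.size());
  out->resize(sizeof(n) + values.size() * sizeof(int64_t));
  std::memcpy(out->data(), &n, sizeof(n));
  if (n > 0) {
    std::memcpy(out->data() + sizeof(n), values.data(),
                values.size() * sizeof(int64_t));
  }
}
```

The view object holds only a pointer and a size, so accessing a cell's
elements involves no intermediate deserialized object, which is the property
the paragraph above attributes to Flatbuffers.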
Using the Arrow IPC format for serdes of nested type cells' data is
a natural next step after switching to a columnar on-the-wire format for
data exchanged between Kudu clients and servers for write operations.
At the time of writing, Kudu uses a columnar on-the-wire format only for
scanned data, and only when COLUMNAR_LAYOUT_FEATURE is supported by both
the server and the client. After some quick research, the choice was
between Protobuf, Flatbuffers, and Cap'n Proto [5]. I found a few reports
of Cap'n Proto serdes performance being very close to Flatbuffers', and
I didn't need Cap'n Proto's RPC and other very cool features, so the
choice eventually became simple: Flatbuffers vs Protobuf. To choose
between them, I implemented a small benchmark to assess the performance
of each in a serdes use case for a particular schema (arrays.fbs,
arrays.proto). The results of the benchmark show that Flatbuffers' serdes
is about 7x-8x faster than Protobuf's in terms of user CPU time, and
that's with the buffer verifier enabled:
RELEASE build, Ubuntu 24.04, x86_64, GCC/G++ 13
  Flatbuffers serialize  : ElemNum=1024 Iterations=100000
    real 0.164s  user 0.101s  sys 0.053s
  Protobuf serialize     : ElemNum=1024 Iterations=100000
    real 0.883s  user 0.785s  sys 0.059s
  Flatbuffers deserialize: ElemNum=1024 Iterations=100000
    real 0.136s  user 0.092s  sys 0.056s
  Protobuf deserialize   : ElemNum=1024 Iterations=100000
    real 0.825s  user 0.707s  sys 0.059s
  Flatbuffers serialize  : ElemNum=1024 Iterations=500000
    real 0.798s  user 0.544s  sys 0.246s
  Protobuf serialize     : ElemNum=1024 Iterations=500000
    real 4.437s  user 4.190s  sys 0.272s
  Flatbuffers deserialize: ElemNum=1024 Iterations=500000
    real 0.675s  user 0.469s  sys 0.260s
  Protobuf deserialize   : ElemNum=1024 Iterations=500000
    real 4.119s  user 3.827s  sys 0.262s
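For readers who want to reproduce this kind of measurement, the following is
a minimal sketch of a harness that reports real, user, and sys times for
a repeated operation. It is not the code in serdes-test.cc: the names Timing,
TimeIt, and Speedup are hypothetical, and it assumes a POSIX system where
getrusage() is available.

```cpp
#include <sys/resource.h>
#include <sys/time.h>

#include <cassert>
#include <chrono>
#include <functional>

// Wall-clock, user CPU, and system CPU time in seconds.
struct Timing {
  double real_s;
  double user_s;
  double sys_s;
};

static double ToSeconds(const timeval& tv) {
  return static_cast<double>(tv.tv_sec) + tv.tv_usec / 1e6;
}

// Runs `body` `iterations` times and returns the time spent by this process.
// The body would be one serialize or deserialize call in a serdes benchmark.
Timing TimeIt(int iterations, const std::function<void()>& body) {
  assert(iterations > 0);
  rusage before;
  rusage after;
  getrusage(RUSAGE_SELF, &before);
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) {
    body();
  }
  const auto end = std::chrono::steady_clock::now();
  getrusage(RUSAGE_SELF, &after);

  Timing t;
  t.real_s = std::chrono::duration<double>(end - start).count();
  t.user_s = ToSeconds(after.ru_utime) - ToSeconds(before.ru_utime);
  t.sys_s = ToSeconds(after.ru_stime) - ToSeconds(before.ru_stime);
  return t;
}

// Ratio of the baseline time to the candidate time, e.g. Protobuf user time
// divided by Flatbuffers user time.
double Speedup(double baseline_s, double candidate_s) {
  return baseline_s / candidate_s;
}
```

Applied to the user times listed above, Speedup() gives roughly 7.8
(0.785/0.101) and 7.7 (0.707/0.092) for the 100000-iteration runs, and
roughly 7.7 (4.190/0.544) and 8.2 (3.827/0.469) for the 500000-iteration
runs, in line with the 7x-8x figure.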
[1] https://flatbuffers.dev/support/
[2] https://flatbuffers.dev/benchmarks
[3] https://flatbuffers.dev/white_paper/
[4] https://github.com/google/flatbuffers/blob/master/LICENSE
[5] https://capnproto.org/
Change-Id: I89c697b8d80cbbd2af4233d16806a230cedaa81a
---
M CMakeLists.txt
A cmake_modules/FindFlatbuffers.cmake
M src/kudu/benchmarks/CMakeLists.txt
A src/kudu/benchmarks/serdes/arrays.fbs
A src/kudu/benchmarks/serdes/arrays.proto
A src/kudu/benchmarks/serdes/serdes-test.cc
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
10 files changed, 638 insertions(+), 1 deletion(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/56/23056/2
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89c697b8d80cbbd2af4233d16806a230cedaa81a
Gerrit-Change-Number: 23056
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)