Hello Ashwani Raina, Kudu Jenkins, Abhishek Chennaka,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/23056
to look at the new patch set (#8).
Change subject: KUDU-1261 introduce Flatbuffers into thirdparty
......................................................................
KUDU-1261 introduce Flatbuffers into thirdparty
This changelist adds flatbuffers-25.2.10 to Kudu's thirdparty.
I'm planning to use Flatbuffers for serializing and de-serializing
array cells' data in the RowOperationsPB.indirect_data field in
follow-up patches. In the future, we can use it for serdes of
arbitrary nested types, but switching to the Arrow IPC format and
importing the corresponding code from the Arrow project seems to be
the best option in the long run. Using the Arrow IPC format for
serdes of nested type cells' data looks like a natural next step
once Kudu switches to a columnar on-the-wire format for data exchanged
between Kudu clients and servers for write operations. At the time of
writing, Kudu has a columnar on-the-wire format only for scanned data,
and only when COLUMNAR_LAYOUT_FEATURE is supported by both the server
and the client.
Using Flatbuffers for serdes of nested type cells' data looks like
a good option because of its multi-language support [1], performance [2],
ability to re-use the buffer memory without reallocation and copying,
lack of temporary serdes objects, and small run-time footprint. It's
also inter-operable between versions and platforms [3], and licensed
under the Apache 2.0 license [4].
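To illustrate what buffer re-use and the absence of temporary serdes
objects mean in practice, here is a minimal sketch. It assumes a
hypothetical hand-rolled layout (a uint32 element count followed by
int32 elements in host byte order); this is NOT the Flatbuffers wire
format or API, and the names SerializeInt32Array and Int32ArrayView are
made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical layout, for illustration only (not the Flatbuffers format):
//   [uint32 count][int32 elem 0][int32 elem 1]...
// Serializes 'values' into 'buf'. If 'buf' already has enough capacity,
// resize() does not reallocate, so the same memory is re-used across calls.
inline void SerializeInt32Array(const std::vector<int32_t>& values,
                                std::vector<uint8_t>* buf) {
  const uint32_t count = static_cast<uint32_t>(values.size());
  buf->resize(sizeof(count) + count * sizeof(int32_t));
  std::memcpy(buf->data(), &count, sizeof(count));
  if (count > 0) {
    std::memcpy(buf->data() + sizeof(count), values.data(),
                count * sizeof(int32_t));
  }
}

// Read-only view over serialized bytes. Elements are read straight from
// the buffer on access; no intermediate container is materialized.
class Int32ArrayView {
 public:
  // Returns true if 'len' bytes are enough to hold the declared elements.
  static bool Verify(const uint8_t* data, size_t len) {
    uint32_t count;
    if (len < sizeof(count)) return false;
    std::memcpy(&count, data, sizeof(count));
    return (len - sizeof(count)) / sizeof(int32_t) >= count;
  }

  explicit Int32ArrayView(const uint8_t* data) : data_(data) {}

  uint32_t size() const {
    uint32_t count;
    std::memcpy(&count, data_, sizeof(count));
    return count;
  }

  // memcpy is used instead of a pointer cast to avoid unaligned reads.
  int32_t Get(uint32_t i) const {
    assert(i < size());
    int32_t v;
    std::memcpy(&v, data_ + sizeof(uint32_t) + i * sizeof(int32_t), sizeof(v));
    return v;
  }

 private:
  const uint8_t* data_;
};
```

A caller would reserve the buffer once, call SerializeInt32Array() for
each batch, and run Verify() on received bytes before constructing a
view over them.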
After some quick research, the choice was between Protobuf,
Flatbuffers, and Cap'n Proto [5]. I found a few reports of Cap'n Proto
serdes performance being very close to Flatbuffers', and I didn't need
Cap'n Proto's RPC and other very cool features, so the choice eventually
became simple: Flatbuffers vs Protobuf. To choose between them,
I implemented a small benchmark to assess the performance of each
in a serdes use case for a particular schema (arrays.fbs, arrays.proto).
The results of the benchmark show that Flatbuffers' serdes is about
7x-8x faster than Protobuf's in terms of user CPU time,
and that's with buffer contents verification enabled:
RELEASE build, Ubuntu 24.04, x86_64, GCC/G++ 13
Flatbuffers serialize : ElemNum= 1024 Iterations= 100000
real 0.164s user 0.101s sys 0.053s
Protobuf serialize : ElemNum= 1024 Iterations= 100000
real 0.883s user 0.785s sys 0.059s
Flatbuffers deserialize: ElemNum= 1024 Iterations= 100000
real 0.136s user 0.092s sys 0.056s
Protobuf deserialize: ElemNum= 1024 Iterations= 100000
real 0.825s user 0.707s sys 0.059s
Flatbuffers serialize : ElemNum= 1024 Iterations= 500000
real 0.798s user 0.544s sys 0.246s
Protobuf serialize : ElemNum= 1024 Iterations= 500000
real 4.437s user 4.190s sys 0.272s
Flatbuffers deserialize: ElemNum= 1024 Iterations= 500000
real 0.675s user 0.469s sys 0.260s
Protobuf deserialize: ElemNum= 1024 Iterations= 500000
real 4.119s user 3.827s sys 0.262s
[1] https://flatbuffers.dev/support/
[2] https://flatbuffers.dev/benchmarks
[3] https://flatbuffers.dev/white_paper/
[4] https://github.com/google/flatbuffers/blob/master/LICENSE
[5] https://capnproto.org/
Change-Id: I89c697b8d80cbbd2af4233d16806a230cedaa81a
---
M CMakeLists.txt
A cmake_modules/FindFlatbuffers.cmake
M src/kudu/benchmarks/CMakeLists.txt
A src/kudu/benchmarks/serdes/arrays.fbs
A src/kudu/benchmarks/serdes/arrays.proto
A src/kudu/benchmarks/serdes/serdes-test.cc
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
A thirdparty/patches/flatbuffers-length-to-size-uint8-ptr.patch
M thirdparty/vars.sh
11 files changed, 1,039 insertions(+), 1 deletion(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/56/23056/8
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89c697b8d80cbbd2af4233d16806a230cedaa81a
Gerrit-Change-Number: 23056
Gerrit-PatchSet: 8
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)