Hello Ashwani Raina, Kudu Jenkins, Abhishek Chennaka,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/23056

to look at the new patch set (#10).

Change subject: KUDU-1261 introduce Flatbuffers into thirdparty
......................................................................

KUDU-1261 introduce Flatbuffers into thirdparty

This changelist adds flatbuffers-25.2.10 to Kudu's thirdparty.
I'm planning to use Flatbuffers for serializing and deserializing
array cells' data in the RowOperationsPB.indirect_data field in
follow-up patches.  In the future, we can use it for serdes of
arbitrary nested types, but switching to the Arrow IPC format and
importing the corresponding code from the Arrow project seems to be
the best option in the long run.  Using the Arrow IPC format for
serdes of nested type cells' data looks like a natural next step
once Kudu switches to a columnar on-the-wire format for data exchanged
between Kudu clients and servers for write operations.  At the time of
writing, Kudu has a columnar on-the-wire format only for scanned data,
and only when COLUMNAR_LAYOUT_FEATURE is supported by both the server
and the client.

Using Flatbuffers for serdes of nested type cells' data looks like a
good option because of its multi-language support [1], performance [2],
ability to reuse the buffer memory without reallocation and copying,
lack of temporary serdes objects, and small run-time footprint.  It's
also interoperable between versions and platforms [3], and licensed
under the Apache 2.0 license [4].

After some quick research, the choice was between Protobuf,
Flatbuffers, and Cap'n Proto [5].  I found a few reports of Cap'n Proto
serdes performance being very close to Flatbuffers', and I didn't need
Cap'n Proto's RPC and other very cool features, so the choice eventually
became simple: Flatbuffers vs Protobuf.  To choose between them,
I implemented a small benchmark to assess the performance of each
in a serdes use case for a particular schema (arrays.fbs, arrays.proto).
The results of the benchmark show that Flatbuffers' serdes is about
7x-8x faster than Protobuf's in terms of user CPU time,
and that's with buffer contents verification enabled:

  RELEASE build, Ubuntu 24.04, x86_64, GCC/G++ 13

    Flatbuffers serialize  : ElemNum= 1024 Iterations=  100000
      real 0.164s      user 0.101s     sys 0.053s
    Protobuf    serialize  : ElemNum= 1024 Iterations=  100000
      real 0.883s      user 0.785s     sys 0.059s

    Flatbuffers deserialize: ElemNum= 1024 Iterations=  100000
      real 0.136s      user 0.092s     sys 0.056s
    Protobuf    deserialize: ElemNum= 1024 Iterations=  100000
      real 0.825s      user 0.707s     sys 0.059s

    Flatbuffers serialize  : ElemNum= 1024 Iterations=  500000
      real 0.798s      user 0.544s     sys 0.246s
    Protobuf    serialize  : ElemNum= 1024 Iterations=  500000
      real 4.437s      user 4.190s     sys 0.272s

    Flatbuffers deserialize: ElemNum= 1024 Iterations=  500000
      real 0.675s      user 0.469s     sys 0.260s
    Protobuf    deserialize: ElemNum= 1024 Iterations=  500000
      real 4.119s      user 3.827s     sys 0.262s

[1] https://flatbuffers.dev/support/
[2] https://flatbuffers.dev/benchmarks
[3] https://flatbuffers.dev/white_paper/
[4] https://github.com/google/flatbuffers/blob/master/LICENSE
[5] https://capnproto.org/

Change-Id: I89c697b8d80cbbd2af4233d16806a230cedaa81a
---
M CMakeLists.txt
A cmake_modules/FindFlatbuffers.cmake
M java/kudu-proto/build.gradle
M src/kudu/benchmarks/CMakeLists.txt
A src/kudu/benchmarks/serdes/arrays.fbs
A src/kudu/benchmarks/serdes/arrays.proto
A src/kudu/benchmarks/serdes/serdes-test.cc
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
A thirdparty/patches/flatbuffers-length-to-size-uint8-ptr.patch
M thirdparty/vars.sh
12 files changed, 1,072 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/56/23056/10
--
To view, visit http://gerrit.cloudera.org:8080/23056

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89c697b8d80cbbd2af4233d16806a230cedaa81a
Gerrit-Change-Number: 23056
Gerrit-PatchSet: 10
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
