[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)

2024-03-26 Thread Alexey Serbin (Code Review)
Alexey Serbin has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21075 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)
..

KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)

The first try is: http://gerrit.cloudera.org:8080/18569
The second try mainly fixes the linkage error from the first try
by using snappy instead of lz4 when linking RocksDB.
The build and link issues of lz4 itself can be fixed in
follow-up patches.

Since LogBlockContainerNativeMeta stores block records
sequentially in the metadata file, the ratio of live blocks may
be very low, which can cause serious disk space
amplification and long bootstrap times.

This patch introduces a new class, LogBlockContainerRdbMeta,
which uses RocksDB to store LBM metadata: a new item is
Put() into RocksDB when a new block is created in LBM,
and the item is removed from RocksDB with Delete() when the block
is removed from LBM. Data in RocksDB is maintained by
RocksDB itself, i.e. deleted items are garbage-collected, so there's
no need to rewrite the metadata the way we do in
LogBlockContainerNativeMeta.

The implementation reuses most of the logic of the base class
LogBlockContainer; the main difference from
LogBlockContainerNativeMeta is that LogBlockContainerRdbMeta
stores block record metadata in RocksDB rather than in a
native file. The main interfaces implemented from the base
class include:
a. Create a container
   The data file is created in the same way as in
   LogBlockContainerNativeMeta, but the metadata part is stored in
   RocksDB with keys constructed as ".", and values that are
   the same as the records stored in the metadata file of
   LogBlockContainerNativeMeta.
b. Open a container
   Similar to LogBlockContainerNativeMeta; there's no need
   to check the metadata part, because it has already been checked
   when loading containers during the bootstrap phase.
c. Destroy a container
   If the container is dead (full and with no live blocks), remove
   the data file and clean up the metadata part by deleting all
   the keys prefixed by "".
d. Load a container (by ProcessRecords())
   Iterate over RocksDB in the key range
   [, ). Because dead blocks
   have already been deleted, only live block records
   are populated, and they can be used the same way as in
   LogBlockContainerNativeMeta.
e. Create blocks in a container
   Put() serialized BlockRecordPB records into RocksDB; keys
   are constructed the same way as above.
f. Remove blocks from a container
   Construct the keys the same way as above and Delete() them from
   RocksDB in a batch.

This patch contains the following changes:
- Adds a new block manager type named 'logr', which uses RocksDB
  to store LBM metadata. The new block manager is enabled by setting
  --block_manager=logr.
- Related tests add a new parameterized value to cover the
  "--block_manager=logr" case.

Using RocksDB is optional; the former LBM can still be used as
before. More tools to convert data between the two
implementations will be introduced in the future.

The improvement is significant, as shown in JIRA KUDU-3371: when
99.99% of all the records have been removed, the time spent
re-opening the tablet server's metadata is reduced by about 9.5 times
when using LogBlockContainerRdbMeta instead of LogBlockContainerNativeMeta.

Change-Id: I23b7d2a16802af01a382a1d74cd9869baf364688
Reviewed-on: http://gerrit.cloudera.org:8080/21075
Tested-by: Yingchun Lai 
Reviewed-by: Alexey Serbin 
---
M cmake_modules/FindSnappy.cmake
M src/kudu/benchmarks/CMakeLists.txt
M src/kudu/client/CMakeLists.txt
M src/kudu/consensus/CMakeLists.txt
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_manager.h
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/fs_report.cc
M src/kudu/fs/fs_report.h
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test-util.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/server/CMakeLists.txt
M src/kudu/tablet/compaction-test.cc
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/CMakeLists.txt
M thirdparty/build-definitions.sh
33 files changed, 1,819 insertions(+), 203 deletions(-)

Approvals:
  Yingchun Lai: Verified
  Alexey Serbin: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/21075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: 

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)

2024-03-26 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21075 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)
..


Patch Set 6: Code-Review+2



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I23b7d2a16802af01a382a1d74cd9869baf364688
Gerrit-Change-Number: 21075
Gerrit-PatchSet: 6
Gerrit-Owner: Yingchun Lai 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Marton Greber 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai 
Gerrit-Comment-Date: Wed, 27 Mar 2024 02:08:21 +
Gerrit-HasComments: No


[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)

2024-03-20 Thread Yingchun Lai (Code Review)
Yingchun Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21075 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21075/6/thirdparty/build-definitions.sh
File thirdparty/build-definitions.sh:

http://gerrit.cloudera.org:8080/#/c/21075/6/thirdparty/build-definitions.sh@1213
PS6, Line 1213: DWITH_LZ4=OFF
> Does this mean we don't need LZ4 support for our usage pattern of RocksDB,
It's just an issue with building RocksDB so that it picks up LZ4.
And it seems the issue only occurs with version 1.9.4 of LZ4 (released on Aug 
16, 2022); when I tried to build the latest version from github.com/lz4/lz4, 
there was no problem, so we can add LZ4 back when its next version is released.
(I reproduced and fixed these issues on my M1 MacBook Pro, but didn't reproduce 
them in my CentOS docker environment.)




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I23b7d2a16802af01a382a1d74cd9869baf364688
Gerrit-Change-Number: 21075
Gerrit-PatchSet: 6
Gerrit-Owner: Yingchun Lai 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Marton Greber 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai 
Gerrit-Comment-Date: Wed, 20 Mar 2024 15:53:44 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)

2024-03-19 Thread Yingchun Lai (Code Review)
Yingchun Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21075 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)
..


Patch Set 6:

> Patch Set 6:
>
> Just a clarifying question, so the only thing changed compared to the 1st 
> revision of this patch is that lz4 is switched out for snappy? -> only 
> changes in thirdparty/build-definitions.sh ?
>
> Thanks for debugging the issue!

You can review the changes at https://gerrit.cloudera.org/c/21075/1..6; the 
changed files are:
Commit message
cmake_modules/FindSnappy.cmake
src/kudu/fs/CMakeLists.txt
src/kudu/server/CMakeLists.txt
src/kudu/util/CMakeLists.txt
thirdparty/build-definitions.sh



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I23b7d2a16802af01a382a1d74cd9869baf364688
Gerrit-Change-Number: 21075
Gerrit-PatchSet: 6
Gerrit-Owner: Yingchun Lai 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Marton Greber 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai 
Gerrit-Comment-Date: Wed, 20 Mar 2024 02:50:00 +
Gerrit-HasComments: No


[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)

2024-03-19 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21075 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21075/6/thirdparty/build-definitions.sh
File thirdparty/build-definitions.sh:

http://gerrit.cloudera.org:8080/#/c/21075/6/thirdparty/build-definitions.sh@1213
PS6, Line 1213: DWITH_LZ4=OFF
Does this mean we don't need LZ4 support for our usage pattern of RocksDB, or 
is it just an issue with building RocksDB to pick up and use both the LZ4 and 
snappy libraries from Kudu's 3rd party?




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I23b7d2a16802af01a382a1d74cd9869baf364688
Gerrit-Change-Number: 21075
Gerrit-PatchSet: 6
Gerrit-Owner: Yingchun Lai 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Marton Greber 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai 
Gerrit-Comment-Date: Wed, 20 Mar 2024 01:28:05 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)

2024-03-19 Thread Marton Greber (Code Review)
Marton Greber has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21075 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)
..


Patch Set 6:

Just a clarifying question, so the only thing changed compared to the 1st 
revision of this patch is that lz4 is switched out for snappy? -> only changes 
in thirdparty/build-definitions.sh ?

Thanks for debugging the issue!



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I23b7d2a16802af01a382a1d74cd9869baf364688
Gerrit-Change-Number: 21075
Gerrit-PatchSet: 6
Gerrit-Owner: Yingchun Lai 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Marton Greber 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai 
Gerrit-Comment-Date: Tue, 19 Mar 2024 13:06:30 +
Gerrit-HasComments: No


[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)

2024-03-15 Thread Yingchun Lai (Code Review)
Yingchun Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21075 )

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)
..


Patch Set 6: Verified+1

The failed test is not related: TxnOpDispatcherITest.TxnWriteWhileReplicaDeleted



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I23b7d2a16802af01a382a1d74cd9869baf364688
Gerrit-Change-Number: 21075
Gerrit-PatchSet: 6
Gerrit-Owner: Yingchun Lai 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Marton Greber 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai 
Gerrit-Comment-Date: Fri, 15 Mar 2024 09:31:26 +
Gerrit-HasComments: No


[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)

2024-03-15 Thread Yingchun Lai (Code Review)
Yingchun Lai has removed a vote on this change.

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)
..


Removed Verified-1 by Kudu Jenkins (120)

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I23b7d2a16802af01a382a1d74cd9869baf364688
Gerrit-Change-Number: 21075
Gerrit-PatchSet: 6
Gerrit-Owner: Yingchun Lai 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Marton Greber 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai 


[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)

2024-03-15 Thread Yingchun Lai (Code Review)
Hello Tidy Bot, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21075

to look at the new patch set (#6).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)
..



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/21075/6

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I23b7d2a16802af01a382a1d74cd9869baf364688

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)

2024-03-11 Thread Yingchun Lai (Code Review)
Hello Tidy Bot, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21075

to look at the new patch set (#5).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)
..



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/21075/5

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I23b7d2a16802af01a382a1d74cd9869baf364688

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)

2024-03-11 Thread Yingchun Lai (Code Review)
Hello Tidy Bot, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21075

to look at the new patch set (#4).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)
..



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/21075/4

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I23b7d2a16802af01a382a1d74cd9869baf364688

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)

2024-03-10 Thread Yingchun Lai (Code Review)
Hello Tidy Bot, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21075

to look at the new patch set (#3).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)
..



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/21075/3
--
To view, visit http://gerrit.cloudera.org:8080/21075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I23b7d2a16802af01a382a1d74cd9869baf364688

[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)

2024-03-10 Thread Yingchun Lai (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21075

to look at the new patch set (#2).

Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata (2nd try)
..

33 files changed, 1,818 insertions(+), 203 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/21075/2

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I23b7d2a16802af01a382a1d74cd9869baf364688
Gerrit-Change-Number: