[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-06-05 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has submitted this change and it was merged.

Change subject: KUDU-1952 Remove round-robin for block placement
..


KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.
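
As a rough illustration of the mapping described above, here is a minimal
C++ sketch. It is not Kudu's code: GroupPlacement, RegisterTablet, and
PickDirForNewBlock are hypothetical names, and data directories are modeled
as plain integer indices.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DataDirGroup: the indices of the data
// directories assigned to one tablet.
using DirGroup = std::vector<int>;

class GroupPlacement {
 public:
  explicit GroupPlacement(unsigned seed) : rng_(seed) {}

  // Associates a tablet with a fixed group of directory indices.
  void RegisterTablet(const std::string& tablet_id, DirGroup group) {
    groups_[tablet_id] = std::move(group);
  }

  // Picks a directory for a new block uniformly at random from the
  // tablet's group. Returns -1 if the tablet has no (or an empty) group.
  int PickDirForNewBlock(const std::string& tablet_id) {
    auto it = groups_.find(tablet_id);
    if (it == groups_.end() || it->second.empty()) return -1;
    std::uniform_int_distribution<std::size_t> pick(0, it->second.size() - 1);
    return it->second[pick(rng_)];
  }

 private:
  std::map<std::string, DirGroup> groups_;
  std::mt19937 rng_;
};
```

Every block of a registered tablet lands in one of that tablet's directories,
so a tablet's exposure to a failed disk is bounded by its group rather than by
the full set of disks.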

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it ("Power of Two Random Choices"). This avoids pigeonholing new tablets
to disks with relatively few tablets, while still trending towards
filling underloaded disks.
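
The selection step can be sketched as follows. This is an assumption-laden
illustration rather than the patch's implementation: CreateGroupPo2c is a
hypothetical name, directories are integer indices, "available" is simplified
to "not yet in this group" (full or failed disks are ignored), and ties keep
the first pick.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Builds a group of up to `group_size` distinct directory indices.
// (*tablets_per_dir)[i] is the number of tablets already assigned to
// directory i; it is incremented for each directory added to the group.
std::vector<int> CreateGroupPo2c(std::vector<int>* tablets_per_dir,
                                 std::size_t group_size,
                                 std::mt19937* rng) {
  // Directories not yet in the group being built.
  std::vector<int> candidates(tablets_per_dir->size());
  std::iota(candidates.begin(), candidates.end(), 0);

  std::vector<int> group;
  while (group.size() < group_size && !candidates.empty()) {
    std::uniform_int_distribution<std::size_t> first(0, candidates.size() - 1);
    std::size_t chosen = first(*rng);
    if (candidates.size() > 1) {
      // Draw a second, distinct candidate position.
      std::uniform_int_distribution<std::size_t> second(0, candidates.size() - 2);
      std::size_t other = second(*rng);
      if (other >= chosen) ++other;
      // Keep whichever of the two directories has fewer tablets.
      if ((*tablets_per_dir)[candidates[other]] <
          (*tablets_per_dir)[candidates[chosen]]) {
        chosen = other;
      }
    }
    const int dir = candidates[chosen];
    group.push_back(dir);
    ++(*tablets_per_dir)[dir];
    candidates.erase(candidates.begin() + static_cast<std::ptrdiff_t>(chosen));
  }
  return group;
}
```

Because only two candidates are compared per pick, the least-loaded directory
is favored without being chosen every time, which is the behavior the
paragraph above describes.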

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's metadata
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.
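
The upgrade path can be pictured with a small sketch. LoadOrDefaultGroup is a
hypothetical helper, not a function from the patch; a null pointer stands in
for "the metadata has no DataDirGroup", and directories are integer indices.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the persisted group if there is one. Otherwise returns a group
// spanning every data directory, since blocks written by an older version
// may live on any of them.
std::vector<int> LoadOrDefaultGroup(const std::vector<int>* persisted,
                                    std::size_t num_dirs) {
  if (persisted != nullptr) return *persisted;
  std::vector<int> all_dirs(num_dirs);
  std::iota(all_dirs.begin(), all_dirs.end(), 0);
  return all_dirs;
}
```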

As this patch only addresses block placement, it does not itself
mitigate the effects of single-disk failure. Given this, and given the
tradeoff between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

The implementation of block placement is compared against a Python
implementation found here:
https://gist.github.com/andrwng/7c24e8e26aec68c50741f92eb6f2e48d

Sweeping over a few parameters shows nearly identical stddev values of
the distribution of tablets across directories between implementations.
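
For reference, the metric can be computed as in the sketch below. It assumes
the population standard deviation over per-directory tablet counts; the
message does not say whether the population or sample form was used, and
PopulationStddev is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Population standard deviation of the number of tablets per directory.
// A value of 0 means every directory holds the same number of tablets.
double PopulationStddev(const std::vector<int>& tablets_per_dir) {
  if (tablets_per_dir.empty()) return 0.0;
  const double n = static_cast<double>(tablets_per_dir.size());
  double mean = 0.0;
  for (int c : tablets_per_dir) mean += c;
  mean /= n;
  double sum_sq = 0.0;
  for (int c : tablets_per_dir) sum_sq += (c - mean) * (c - mean);
  return std::sqrt(sum_sq / n);
}
```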

https://github.com/andrwng/kudu/blob/po2c/docs/images/10_20_3_10k_cpp.png
https://github.com/andrwng/kudu/blob/po2c/docs/images/10_20_3_10k_py.png

https://github.com/andrwng/kudu/blob/po2c/docs/images/30_10_5_5k_cpp.png
https://github.com/andrwng/kudu/blob/po2c/docs/images/30_10_5_5k_py.png

https://github.com/andrwng/kudu/blob/po2c/docs/images/30_200_5_5k_cpp.png
https://github.com/andrwng/kudu/blob/po2c/docs/images/30_200_5_5k_py.png

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Reviewed-on: http://gerrit.cloudera.org:8080/6636
Reviewed-by: David Ribeiro Alves 
Tested-by: Kudu Jenkins
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
36 files changed, 1,050 insertions(+), 199 deletions(-)

Approvals:
  David Ribeiro Alves: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/6636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 48
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 

[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-06-05 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 47: Code-Review+2

Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 47
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-06-05 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 47:

(3 comments)

@dralves: this patch doesn't have that sort of test. I will add one in a follow-up.

http://gerrit.cloudera.org:8080/#/c/6636/40/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

PS40, Line 239: bool GetDataDirGroupPB(const std::string& tablet_id, DataDirGroupPB* pb) const;
> why not return Status::NotFound()?
In a couple of cases, this is used to check whether a tablet is registered (in
which case there's no need to create a new group), so it's not necessarily an
error for false to be returned.

That said, I removed the PREDICT_ from the return.
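
To make the point concrete, here is a hedged sketch of a lookup that reports
absence through its return value. GroupRegistry, Register, and GetGroup are
hypothetical names, not the signatures in data_dirs.h.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

class GroupRegistry {
 public:
  void Register(const std::string& tablet_id, std::vector<int> dirs) {
    groups_[tablet_id] = std::move(dirs);
  }

  // Copies the tablet's group into *out and returns true if the tablet is
  // registered. Returns false otherwise, which callers may treat as
  // "create a group now" rather than as an error.
  bool GetGroup(const std::string& tablet_id, std::vector<int>* out) const {
    auto it = groups_.find(tablet_id);
    if (it == groups_.end()) return false;
    *out = it->second;
    return true;
  }

 private:
  std::map<std::string, std::vector<int>> groups_;
};
```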


http://gerrit.cloudera.org:8080/#/c/6636/46/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

PS46, Line 388: 1.5.0
> you say 1.4.0 somewhere else
Done. Changed the other to 1.5.0


http://gerrit.cloudera.org:8080/#/c/6636/46/src/kudu/util/pb_util.proto
File src/kudu/util/pb_util.proto:

PS46, Line 45: }
> nit: what changed here?
Untracked this file; it seems git was not adding a proper newline at the end
of the file.


Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 47
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-06-05 Thread Andrew Wong (Code Review)
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#47).

Change subject: KUDU-1952 Remove round-robin for block placement
..

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
36 files changed, 1,050 insertions(+), 199 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/47

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 47
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 

[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-06-05 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 46:

Last question: do you have a test that makes sure a tserver boots from an old
superblock (with no DataDirGroups) and starts using groups from then on?
(Please point me to it if I missed it; I only reviewed the changes in this
last pass.) I'm OK with adding that test after merging this patch.

Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 46
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-06-05 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 40:

(3 comments)

Looks good, just nits.

http://gerrit.cloudera.org:8080/#/c/6636/40/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

PS40, Line 239: bool GetDataDirGroupPB(const std::string& tablet_id, DataDirGroupPB* pb) const;
why not return Status::NotFound()?


http://gerrit.cloudera.org:8080/#/c/6636/46/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

PS46, Line 388: oup c
you say 1.4.0 somewhere else


http://gerrit.cloudera.org:8080/#/c/6636/46/src/kudu/util/pb_util.proto
File src/kudu/util/pb_util.proto:

PS46, Line 45: }
nit: what changed here?


Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 40
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-06-02 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 45: Code-Review+2 Verified+1

Known flake, let's override Jenkins.

Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 45
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-06-02 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 45:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6636/44/src/kudu/tablet/tablet_metadata.h
File src/kudu/tablet/tablet_metadata.h:

PS44, Line 181:   // 'last_logged_opid' should be set to the last opid in the log, if any is known.
  :   // If 'last_logged
> this isn't accurate anymore, is it?
Done


http://gerrit.cloudera.org:8080/#/c/6636/44/src/kudu/tserver/tablet_copy_client.cc
File src/kudu/tserver/tablet_copy_client.cc:

Line 247: CHECK_OK(fs_manager_->dd_manager()->CreateDataDirGroup(tablet_id_));
> should we be able to CHECK_OK this?
Right, DeleteTabletData should ensure that the tablet doesn't already exist.
This could also fail if there's a disk error or if all dirs are full. This
will probably change with disk failure-handling support.


Line 265:   CHECK(fs_manager_->dd_manager()->GetDataDirGroupPB(tablet_id_,
> same, we expect this to always succeed so could CHECK right?
Same. Done.


http://gerrit.cloudera.org:8080/#/c/6636/44/src/kudu/tserver/ts_tablet_manager.h
File src/kudu/tserver/ts_tablet_manager.h:

PS44, Line 186:  tablet::TabletDataState delete_type,
  :  const boost::optional& last_logged_opid);
  :
> not true anymore?
Done


Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 45
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-06-02 Thread Andrew Wong (Code Review)
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#45).

Change subject: KUDU-1952 Remove round-robin for block placement
..

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/util/pb_util.proto
37 files changed, 1,051 insertions(+), 200 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/45

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 45
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 

[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-06-01 Thread Andrew Wong (Code Review)
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#44).

Change subject: KUDU-1952 Remove round-robin for block placement
..

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
M src/kudu/util/pb_util.proto
39 files changed, 1,057 insertions(+), 201 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/44

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 44
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 

[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-06-01 Thread Andrew Wong (Code Review)
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#43).

Change subject: KUDU-1952 Remove round-robin for block placement
..

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
M src/kudu/util/pb_util.proto
39 files changed, 1,057 insertions(+), 201 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/43
-- 
To view, visit http://gerrit.cloudera.org:8080/6636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 43
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 

[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-30 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 41: Code-Review-1

Adding results to verify implementation of po2c.


Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 41
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-25 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 40:

build failure is from a TSAN flake


Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 40
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-25 Thread Andrew Wong (Code Review)
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#40).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's metadata
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.

As this patch only addresses block placement, it does not itself
mitigate the effects of single-disk failure. Given this, and given the
tradeoff between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
M src/kudu/util/pb_util.proto
39 files changed, 1,035 insertions(+), 201 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/40

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 40
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-24 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 35:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

PS35, Line 137: Status
> Templating and the closure made this a bit tricky. For now I'm leaving it b
Done



Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 35
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-24 Thread Andrew Wong (Code Review)
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#38).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's metadata
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.

As this patch only addresses block placement, it does not itself
mitigate the effects of single-disk failure. Given this, and given the
tradeoff between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
M src/kudu/util/pb_util.proto
39 files changed, 1,034 insertions(+), 200 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/38

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 38
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-24 Thread Andrew Wong (Code Review)
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#37).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's metadata
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.

As this patch only addresses block placement, it does not itself
mitigate the effects of single-disk failure. Given this, and given the
tradeoff between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
M src/kudu/util/pb_util.proto
39 files changed, 1,035 insertions(+), 200 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/37

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 37
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-24 Thread Andrew Wong (Code Review)
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#36).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's metadata
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.

As this patch only addresses block placement, it does not itself
mitigate the effects of single-disk failure. Given this, and given the
tradeoff between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
M src/kudu/util/pb_util.proto
39 files changed, 1,040 insertions(+), 201 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/36

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 36
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-24 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 35:

(28 comments)

http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/cfile/bloomfile-test-base.h
File src/kudu/cfile/bloomfile-test-base.h:

PS35, Line 74: fs::CreateBlockOptions()
nit: would just '{}' be enough?


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/cfile/cfile-test.cc
File src/kudu/cfile/cfile-test.cc:

PS35, Line 218: CreateBlockOptions()
nit here and below: would '{}' fit in here?


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/block_manager-stress-test.cc
File src/kudu/fs/block_manager-stress-test.cc:

PS35, Line 260: this->
is 'this' necessary here?


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

PS35, Line 137: Status
nit: consider adding 'static' specifier -- as I see this method does not depend 
on the state of the object.


PS35, Line 155: _count
nit: why not pass 'num_files' itself as an argument?


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/data_dirs-test.cc
File src/kudu/fs/data_dirs-test.cc:

Line 17: 
nit: please include at least  and  since they are used in the 
code.


Line 18: #include "kudu/fs/fs.pb.h"
nit: please include  and 


PS35, Line 55: CHECK_OK
nit: why not just ASSERT_OK() ?


PS35, Line 56: CHECK_OK
ditto


PS35, Line 69: string test_tablet_name_;
nit: I don't see it's changing during the tests, so consider adding 'const'.


PS35, Line 70: CreateBlockOptions test_block_opts_
const?


PS35, Line 72: DataDirGroupPB pb_;
nit: is it really needed as a member?  Would local variable be enough where 
needed?


PS35, Line 80: ASSERT_TRUE(s.IsNotFound());
Here and everywhere else in the tests, consider appending s.ToString() to the 
assertion. It helps a lot with troubleshooting if the assertion ever fails:

ASSERT_TRUE(s.IsNotFound()) << s.ToString();


PS35, Line 96: for (int i = 0; i < pb.uuids().size(); i++) {
Would it make sense to compare pb.uuids().size() and pb_.uuids().size() prior 
to comparing elements in this for() loop?
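
A minimal sketch of why the size check matters, using plain vectors as a
hypothetical stand-in for the repeated uuids field of the two
DataDirGroupPB messages (SameUuids is an illustrative name, not part of
the patch):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for comparing the repeated `uuids` field of two
// DataDirGroupPB messages; plain vectors keep the example self-contained.
bool SameUuids(const std::vector<std::string>& actual,
               const std::vector<std::string>& expected) {
  // Without this check, a loop bounded by actual.size() would accept an
  // `actual` that is only a prefix of `expected`.
  if (actual.size() != expected.size()) return false;
  for (std::size_t i = 0; i < actual.size(); ++i) {
    if (actual[i] != expected[i]) return false;
  }
  return true;
}
```

In the test itself this would presumably be an ASSERT_EQ on the two
sizes placed ahead of the existing loop.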


PS35, Line 102: dd
Does it make sense to check for invariants on 'dd' after this call?  What are 
the expected side-effects on 'dd' after a successful call to GetNextDataDir()?


PS35, Line 120:   DataDir* dd;
  :   ASSERT_OK(dd_manager_->GetNextDataDir(test_block_opts_, &dd));
  :   dd_manager_->DeleteDataDirGroup(test_tablet_name_);
  :   Status s = dd_manager_->GetNextDataDir(test_block_opts_, &dd);
Is there anything specific for 'dd' in the course of these calls?  If yes, 
consider adding corresponding assertions.


PS35, Line 129: FLAGS_fs_data_dirs_full_disk_cache_seconds = 0;
If changing this flag on-the-fly, does it make sense to add the 'runtime' tag 
for the flag?


PS35, Line 235: FindOrDie
nit: since this is a test, it's possible to use something like

... x =  FindOrNull(...);
ASSERT_NE(nullptr, x);


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

PS35, Line 449: FLAGS_fs_target_data_dirs_per_tablet
A paranoid nit: what if FLAGS_fs_target_data_dirs_per_tablet is set to 2^31 ?

Consider either adding a validator for the flag or using unsigned int for 
group_target_size or enforcing consistency by some other means.
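
A minimal sketch of what such a validator might look like. The function
has the shape of a gflags validator callback (flag name plus int32
value), but it is written without gflags so that it stands alone. The
name ValidateTargetDataDirs and the bound kMaxDataDirsPerTablet are
assumptions for illustration, as is the choice to accept every
non-negative value up to that bound.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical upper bound, chosen only for illustration.
constexpr int32_t kMaxDataDirsPerTablet = 1 << 16;

// Same shape as a gflags validator callback: returns true if `value` is
// acceptable for the flag named `flagname`, false otherwise.
bool ValidateTargetDataDirs(const char* flagname, int32_t value) {
  if (value < 0 || value > kMaxDataDirsPerTablet) {
    std::fprintf(stderr, "Invalid value for --%s: %d\n", flagname,
                 static_cast<int>(value));
    return false;
  }
  return true;
}
```

With gflags, a callback like this would typically be registered next to
the flag definition so that out-of-range values are rejected at startup.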


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

PS35, Line 87: uuid_indices
nit: consider adding 'const' specifier for this method.


PS35, Line 225: void
nit: would the caller ever be interested in knowing whether any group was 
actually deleted?


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/tablet/diskrowset.cc
File src/kudu/tablet/diskrowset.cc:

PS35, Line 93: string tablet_id
const string& tablet_id ?


Line 112:   string tablet_id = rowset_metadata_->tablet_metadata()->tablet_id();
ditto


PS35, Line 127: string tablet_id
ditto


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/tablet/multi_column_writer.cc
File src/kudu/tablet/multi_column_writer.cc:

Line 52:   CreateBlockOptions block_opts({ tablet_id_ });
nit: const?


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/tablet/multi_column_writer.h
File src/kudu/tablet/multi_column_writer.h:

PS35, Line 20: #include 
nit: move this after the std headers.


PS35, Line 90: std::string tablet_id_;
nit: const?


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

PS35, Line 629: GetDataDirGroupPBForTablet
Is it worth checking the return value of GetDataDirGroupPBForTablet()?



Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 35
Gerrit-Project: kudu
Gerrit-Branch: master

[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-23 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 35:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

PS35, Line 249: nuber
nit: number


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/data_dirs-test.cc
File src/kudu/fs/data_dirs-test.cc:

PS35, Line 161: // Looping this 1000 times with an assertion of 2.0 resulted in a couple failures.
  :   // Asserting 3.0 should give a safe non-flaky buffer.
  :   ASSERT_LE(stddev, 3.0);
> think we're setting ourselves up for flakyness here, unfortunately it's har
hm, yea, if the distribution were normal, then a z-score of 3 would be only 
99.7%. But I don't think it produces a normal distribution (not sure what kind 
of distribution it produces).

That said, I'm OK if we've looped it thousands of times and found this works.
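
As a quick check of the 99.7% figure, under the normality assumption
that the comment itself doubts, the two-sided coverage of a z-score can
be computed from the error function. NormalCoverage is a hypothetical
helper name.

```cpp
#include <cmath>

// Probability that a normally distributed value lies within `z` standard
// deviations of its mean: P(|Z| <= z) = erf(z / sqrt(2)).
double NormalCoverage(double z) {
  return std::erf(z / std::sqrt(2.0));
}
```

NormalCoverage(3.0) is roughly 0.9973, so under that assumption about 3
runs in 1000 would land outside three standard deviations.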


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

PS35, Line 435: exists
here and below, I sort of feel like "exists" is not the right word, but rather 
"registered" or somesuch?


PS35, Line 478:   // Lock shared to not block other reads to group_by_tablet_map_.
this is kinda redundant with the code


PS35, Line 504: RETURN_NOT_OK(candidate->RefreshIsFull(
  : DataDir::RefreshMode::EXPIRED_ONLY));
nit: no need to wrap



Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 35
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-23 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 35:

k, think I'm done for this round


Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 35
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-23 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 35:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

PS35, Line 477: Status DataDirManager::GetNextDataDir(const CreateBlockOptions& opts, DataDir** dir) {
  :   // Lock shared to not block other reads to group_by_tablet_map_.
  :   shared_lock lock(dir_group_lock_.get_lock());
  :   const vector<uint16_t>* group_uuid_indices;
  :   vector<uint16_t> all_uuid_indices;
  :   if (PREDICT_FALSE(opts.tablet_id.empty())) {
  :     // This should only be reached by some tests; in cases where there is no
  :     // natural tablet_id, select a data dir randomly.
  :     DCHECK(IsGTest());
  :     AppendKeysFromMap(data_dir_by_uuid_idx_, &all_uuid_indices);
  :     group_uuid_indices = &all_uuid_indices;
  :   } else {
  :     // Select the data dir group for the tablet.
  :     DataDirGroup* group = FindOrNull(group_by_tablet_map_, opts.tablet_id);
  :     if (group == nullptr) {
  :       return Status::NotFound("DataDirGroup not found for tablet", opts.tablet_id);
  :     }
  :     group_uuid_indices = &group->uuid_indices();
  :   }
  :   vector<int> random_indices(group_uuid_indices->size());
  :   iota(random_indices.begin(), random_indices.end(), 0);
  :   shuffle(random_indices.begin(), random_indices.end(), default_random_engine(rng_.Next()));
  : 
  :   // Randomly select a member of the group that is not full.
  :   for (int i : random_indices) {
  :     uint16_t uuid_idx = (*group_uuid_indices)[i];
  :     DataDir* candidate = FindOrDie(data_dir_by_uuid_idx_, uuid_idx);
  :     RETURN_NOT_OK(candidate->RefreshIsFull(
  :         DataDir::RefreshMode::EXPIRED_ONLY));
  :     if (!candidate->is_full()) {
  :       *dir = candidate;
  :       return Status::OK();
  :     }
  :   }
  :   return Status::IOError(
  :       "All data directories are full. Please free some disk space or "
  :       "consider changing the fs_data_dirs_reserved_bytes configuration "
  :       "parameter", "", ENOSPC);
  : }
I guess I'm missing context, but what is this for?


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/fs.proto
File src/kudu/fs/fs.proto:

PS35, Line 114: are
nit: I know that whether data is plural or singular is a hot topic among 
grammar nerds, but IIRC we mostly use the singular


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

PS35, Line 104: the
remove


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/tablet/metadata.proto
File src/kudu/tablet/metadata.proto:

PS35, Line 127: are
:)


PS35, Line 129: from a previous version of Kudu
mention a specific version



Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 35
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-23 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 35:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

PS35, Line 485: DCHECK(IsGTest());
hum, can you at least change this to a CHECK? if we're missing coverage 
somewhere a release build could go through this path


PS35, Line 541: Status DataDirManager::GetDirForGroupUnlocked(const 
vector& group_indices,
  :   
boost::optional* new_uuid_idx) {
  :   DCHECK(dir_group_lock_.is_locked());
  :   if (group_indices.size() == data_dirs_.size()) {
  : return Status::OK();
  :   }
  : 
  :   // Determine all potential candidates to be added to the 
group.
  :   vector candidate_indices;
  :   unordered_set data_dir_set;
  :   data_dir_set.insert(group_indices.begin(), 
group_indices.end());
  :   for (auto& e : data_dir_by_uuid_idx_) {
  : // If the directory is already in the group or it's full, 
ignore it.
  : 
RETURN_NOT_OK(e.second->RefreshIsFull(DataDir::RefreshMode::ALWAYS));
  : if (ContainsKey(data_dir_set, e.first) || 
e.second->is_full()) {
  :   continue;
  : }
  : candidate_indices.push_back(e.first);
  :   }
  : 
  :   // Select two randomly, compare their load, and select the 
one with less load.
  :   shuffle(candidate_indices.begin(), candidate_indices.end(), 
default_random_engine(rng_.Next()));
  :   if (candidate_indices.empty()) {
  : *new_uuid_idx = boost::none;
  :   } else if (candidate_indices.size() == 1 ||
  :  FindOrDie(tablets_by_uuid_idx_map_, 
candidate_indices[0]).size() <
  :  FindOrDie(tablets_by_uuid_idx_map_, 
candidate_indices[1]).size()) {
  : *new_uuid_idx = candidate_indices[0];
  :   } else {
  : *new_uuid_idx = candidate_indices[1];
  :   }
  :   return Status::OK();
  : }
wouldn't it be simpler and faster to get all the data dirs in one go?
something like:
go through the data dirs, not including the ones that are full to get dir_vec


while (collected_dirs < needed_dirs) {
  shuffle vec
  pick 1 among the first 2
  erase chosen element
}
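
The loop suggested above could look roughly like the following. This is only a sketch under assumptions: `PickGroup`, its parameters, and the plain `unordered_map` of tablet counts are hypothetical stand-ins, not the DataDirManager API, and `candidates` is assumed to already exclude full directories.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not Kudu's API): picks up to `needed` directory indices
// from `candidates` in one go. `tablets_per_dir` maps a directory index to the
// number of tablets already placed on it and must contain every candidate.
std::vector<uint16_t> PickGroup(
    std::vector<uint16_t> candidates,
    const std::unordered_map<uint16_t, size_t>& tablets_per_dir,
    size_t needed,
    std::mt19937* rng) {
  std::vector<uint16_t> group;
  while (group.size() < needed && !candidates.empty()) {
    // Shuffle so that the first two elements are a random pair.
    std::shuffle(candidates.begin(), candidates.end(), *rng);
    // Of the first two candidates, keep the one with fewer tablets
    // (ties go to the first).
    size_t chosen = 0;
    if (candidates.size() > 1 &&
        tablets_per_dir.at(candidates[1]) < tablets_per_dir.at(candidates[0])) {
      chosen = 1;
    }
    group.push_back(candidates[chosen]);
    // Erase the chosen element so it cannot be picked twice.
    candidates.erase(candidates.begin() + chosen);
  }
  return group;
}
```

If fewer candidates than `needed` are available, the sketch simply returns a smaller group; whether that should be an error is a separate decision.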


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

PS35, Line 59: reperesented
typo


PS35, Line 65: uuid_indices
nit: I don't think you need to refer to uuids all over this class, likely 
dd_index is enough


PS35, Line 69: uuid_idx_by_uuid
..except here of course where you use "dd_uuid_to_idx_map"


PS35, Line 69: std::unordered_map
nit: typedef?


PS35, Line 221: CreateDataDirGroup
see my comments on the test about these methods' names


PS35, Line 254: Selects a directory from the available directories that aren't 
in the
  :   // directory group. Selection is based on "The Power of Two 
Choices in
  :   // Randomized Load Balancing", selecting two directories 
randomly and
  :   // choosing the one with less load, quantified as the number 
of unique
  :   // tablets in the directory.
want to mention briefly what average-case properties you expect?


PS35, Line 291: TabletsByUuidIndexMap
oh you do have a typedef. move it/fwd declare somewhere you can use it also for 
DataDirGroup


PS35, Line 301: do not block each other, while threads
  :   // attempting to write, e.g. to create a new tablet, thereby 
creating a new
  :   // data directory group, block all threads.
this is hard to parse




[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-23 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 35:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/data_dirs-test.cc
File src/kudu/fs/data_dirs-test.cc:

PS35, Line 78: non_existent_opts
nit: non_existent_tablet_opts


PS35, Line 107: CreateDataDirGroup
maybe this should be named CreateDataDirGroupForTablet?


PS35, Line 109: dd_manager_->DeleteDataDirGroup(test_tablet_name_);
this doesn't return Status?


PS35, Line 110: LoadDataDirGroupFromPB
maybe this should be named LoadDataDirGroupForTabletFromPB?


PS35, Line 112: create a duplicate tablet.
this is weird. you mean that it won't create a duplicate data dir group for the 
tablet, right?


PS35, Line 115: Tried to load DataDirGroup but tablet already exists
similarly here you mean: "Tried to load DataDirGroup for tablet 
 but it already exists" right?


PS35, Line 125: DataDirGroup not found for tablet
similar comment as in the test above


PS35, Line 140: FLAGS_fs_target_data_dirs_per_tablet = 3;
  :   const double kNumTablets = 20;
maybe make this relative then? something like
FLAGS_fs_target_data_dirs_per_tablet = kNumDirs / 3;
const double kNumTablets = kNumDirs * 2;


PS35, Line 143: 20
refer to the constant name here and at the end of the sentence


PS35, Line 158: static_cast(
don't think you need this static cast since the numerator is already a double


PS35, Line 159: Standard deviation:
add more info


PS35, Line 161: // Looping this 1000 times with an assertion of 2.0 resulted in 
a couple failures.
  :   // Asserting 3.0 should give a safe non-flaky buffer.
  :   ASSERT_LE(stddev, 3.0);
think we're setting ourselves up for flakiness here. unfortunately it's hard to 
come up with a number that is foolproof unless you want to somehow run this 
with a huge number of data dirs. same comment for the test below.

I think these could be made into benchmarks or somesuch instead, and you could 
try with different counts and report the numbers in the commit message. that 
should be proof enough that it worked without having to make assertions that 
are hard to make, especially with such a low data dir count.

If you decide to go the huge number of data dirs route instead I'd like to see 
somewhat tight assertions based on the theory behind the "power of two choices"
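
As a rough illustration of the benchmark idea, the following standalone simulation reports the spread of tablets across directories for a given configuration. It is a sketch only: `SimulateStddev` and its parameters are hypothetical, it does not use any Kudu code, and it assumes `num_dirs > 0` and that no directory is ever full.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical simulation, independent of Kudu: places `num_tablets` tablets,
// each in a group of `group_size` distinct directories out of `num_dirs`,
// choosing each group member by comparing two random candidates. Returns the
// population standard deviation of the per-directory tablet counts.
double SimulateStddev(size_t num_dirs, size_t group_size, size_t num_tablets,
                      std::mt19937* rng) {
  std::vector<size_t> load(num_dirs, 0);
  for (size_t t = 0; t < num_tablets; ++t) {
    std::vector<size_t> candidates(num_dirs);
    std::iota(candidates.begin(), candidates.end(), size_t{0});
    for (size_t picked = 0; picked < group_size && !candidates.empty();
         ++picked) {
      std::shuffle(candidates.begin(), candidates.end(), *rng);
      // Keep whichever of the first two candidates currently has less load.
      size_t chosen = 0;
      if (candidates.size() > 1 && load[candidates[1]] < load[candidates[0]]) {
        chosen = 1;
      }
      ++load[candidates[chosen]];
      candidates.erase(candidates.begin() + chosen);
    }
  }
  const double mean = std::accumulate(load.begin(), load.end(), 0.0) /
                      static_cast<double>(num_dirs);
  double sq_sum = 0.0;
  for (size_t l : load) {
    const double diff = static_cast<double>(l) - mean;
    sq_sum += diff * diff;
  }
  return std::sqrt(sq_sum / static_cast<double>(num_dirs));
}
```

Running this for several directory counts and group sizes, and quoting the resulting numbers in the commit message, would be one way to provide the evidence without a hard-coded threshold in a unit test.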




[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-23 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 35:

(12 comments)

posting my previous comments. looking through the rest now

http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

PS35, Line 186: of 5000.
nit: "after 5000 runs"


PS35, Line 205: disks
data directories?


PS35, Line 226: void 
BlockManagerTest::RunBlockDistributionTest(const 
vector& paths) {
could you merge this method with the method above? a lot of the code seems 
pretty common.
What if it had a signature like:
void BlockManagerTest::DoRunBlockDistributionTest(const vector& paths, 
vector per_path_write_count, vector* num_files_per_path)


PS35, Line 227: const char* kTestData = "test data";
pull this somewhere common?


PS35, Line 228: files_in_each_path
these files are "block_containers" right?


PS35, Line 230: for (int d: { 1, 5 }) {
why not 3 in this case?


PS35, Line 237: ScopedWritableBlockCloser closer;
why not use this in the fbm too?


PS35, Line 344: ASSERT_EQ(paths.size() * 7, sum);
nit: this is not quite the same assertion as before. any special reason to 
change it?


PS35, Line 447:   // Store the DataDirGroupPB for tests that reopen the block 
manager.
  :   CHECK(this->bm_->dd_manager()->GetDataDirGroupPBForTablet(
  :   this->test_tablet_name_, &this->test_group_pb_));
this seems sketchy. are you reusing state across tests? what if I only run the 
other tests?


PS35, Line 678: true
nit: add /* bool_arg_name */


PS35, Line 692: true,
same


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/block_manager.h
File src/kudu/fs/block_manager.h:

PS35, Line 170: next 
remove




[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-23 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 35:

(11 comments)

only about 50% through, but figured I'd post a couple comments I have so far

http://gerrit.cloudera.org:8080/#/c/6636/17//COMMIT_MSG
Commit Message:

PS17, Line 27:  When loading
 : tablet data from a previous version of Kudu, the tablet's 
metadata
 : will not have a DataDirGroup. One will be generated containing 
all
 : data directories, as the tablet's data may already be spread 
across
 : any number of disks.
> Won't the superblock be rewritten with the old version of Kudu when new blo
yea, I think you're right here. One thing to watch out for in other cases is 
that, if you deserialize a PB which has some fields that are unknown, and then 
serialize it back out, it actually preserves the fields (in proto2). This 
changed with protobuf 3, but they're thinking of adding back that behavior 
again in 3.4.

In the case of TabletMetadata, though, it seems like we deserialize into our 
own structures, and then create a new PB on flush, rather than actually holding 
onto the original PB. So, in a downgrade-and-flush scenario, we will drop those 
fields as you described, and should be fine.


http://gerrit.cloudera.org:8080/#/c/6636/35//COMMIT_MSG
Commit Message:

PS35, Line 35: failure disk-failure
nit: typo


http://gerrit.cloudera.org:8080/#/c/6636/30/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

PS30, Line 192: bl
'num_blocks_per_dir'?


http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

PS17, Line 53: 
> At least while the server is alive and the disk is still there, we do need 
hrm, but even then, don't we need the "placeholder" in the array such that the 
later indexes don't get "shifted forward"?


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/tablet/tablet_bootstrap-test.cc
File src/kudu/tablet/tablet_bootstrap-test.cc:

PS35, Line 207:   // Bootstrapping the tablet while it is copying will attempt 
to create a data
  :   // dir group for the tablet when one already exists.
  :   ASSERT_TRUE(s.IsAlreadyPresent()) << "Expected already 
present: " << s.ToString();
Not sure about this change.

The test is supposed to be simulating the case where a server crashes mid-copy, 
and then restarts. Maybe this isn't realistic anymore (because we would 
roll-forward a COPYING into DELETED at actual startup)? But I don't think 
AlreadyPresent makes much sense to propagate up here.


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

PS35, Line 86: RETURN_NOT_OK
can you PREPEND here to give better context? same below
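
To illustrate what prepending context buys, here is a self-contained sketch. Everything in it is a simplified stand-in: `ToyStatus`, `TOY_RETURN_NOT_OK_PREPEND`, `LoadGroup`, and `LoadTabletMetadata` are hypothetical names invented for this example and do not reproduce Kudu's own Status class or macros.

```cpp
#include <string>
#include <utility>

// Toy stand-in for a status type; a real Status class is much richer.
class ToyStatus {
 public:
  static ToyStatus OK() { return ToyStatus(true, ""); }
  static ToyStatus NotFound(std::string msg) {
    return ToyStatus(false, std::move(msg));
  }
  bool ok() const { return ok_; }
  const std::string& message() const { return msg_; }
  // Returns a copy whose message is prefixed with `prefix` (no-op when OK).
  ToyStatus CloneAndPrepend(const std::string& prefix) const {
    return ok_ ? *this : ToyStatus(false, prefix + ": " + msg_);
  }

 private:
  ToyStatus(bool ok, std::string msg) : ok_(ok), msg_(std::move(msg)) {}
  bool ok_;
  std::string msg_;
};

// Hypothetical macro: on failure, return the status with extra context placed
// in front of the original message.
#define TOY_RETURN_NOT_OK_PREPEND(expr, prefix)      \
  do {                                               \
    const ToyStatus _s = (expr);                     \
    if (!_s.ok()) return _s.CloneAndPrepend(prefix); \
  } while (0)

// Hypothetical low-level call that may fail.
ToyStatus LoadGroup(bool present) {
  return present ? ToyStatus::OK()
                 : ToyStatus::NotFound("DataDirGroup not found");
}

// Hypothetical caller that adds its own context to any failure.
ToyStatus LoadTabletMetadata(bool group_present) {
  TOY_RETURN_NOT_OK_PREPEND(LoadGroup(group_present),
                            "failed to load tablet metadata");
  return ToyStatus::OK();
}
```

With the prefix, the caller's log line names both the failing operation and the underlying cause, instead of only the low-level message.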


PS35, Line 199:   // Keep a copy of the old data dir group in case of flush 
failure.
  :   DataDirGroupPB pb;
  :   bool old_group_exists = 
fs_manager_->dd_manager()->GetDataDirGroupPBForTablet(tablet_id_, &pb);
  : 
  :   // Remove the tablet's data dir group metadata tracked by the 
DataDirManager.
  :   fs_manager_->dd_manager()->DeleteDataDirGroup(tablet_id_);
  :   if (delete_type == TABLET_DATA_COPYING) {
  : // If the tablet deletion is a part of a tablet copy, a 
data dir group must
  : // created so the blocks can be placed for the new tablet.
  : 
RETURN_NOT_OK(fs_manager_->dd_manager()->CreateDataDirGroup(tablet_id_));
  :   }
  : 
  :   auto revert_group_cleanup = MakeScopedCleanup([&]() {
  : fs_manager_->dd_manager()->DeleteDataDirGroup(tablet_id_);
  : if (old_group_exists) {
  :   
fs_manager_->dd_manager()->LoadDataDirGroupFromPB(tablet_id_, pb);
  : }
  :   });
Instead of doing this here, would it work to instead add a new method like 
'ReallocateDataDirGroup' which is called separately by TabletCopy right before 
it starts writing out new blocks? It smells a little funny to me to piggy-back 
this on the deletion instead of on the setting-up of the new tablet metadata.

In other words, when we call TabletCopyClient::Finish(), we're already using 
ReplaceSuperBlock at that point which deletes the data dir group and loads the 
new one from the superblock. So, if the new super block has been allocated a 
new group by TabletCopyClient::Start (eg near where it does clear_rowsets(), 
clear_orphaned_blocks(), etc) then this would be handled somewhat naturally.

That said, if you considered that and went this way, happy to discuss


Line 388:   RETURN_NOT_OK(fs_manager_->dd_manager()->LoadDataDirGroupFromPB(
PREPEND


PS35, Line 393:   // data data 
typo


PS35, Line 398: true)
we try to avoid boolean parameters to 

[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-17 Thread Andrew Wong (Code Review)
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#35).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's metadata
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.

As this patch only addresses block placement, it does not itself
mitigate the effects of single-disk failure. Given this, and given the
tradeoff between I/O and failure disk-failure tolerance, the default
behavior will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
38 files changed, 1,022 insertions(+), 192 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/35
-- 
To view, visit http://gerrit.cloudera.org:8080/6636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 35
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-17 Thread Andrew Wong (Code Review)
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#34).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's metadata
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.

As this patch only addresses block placement, it does not itself
mitigate single-disk failure. Given this, and given the tradeoff
between I/O and failure disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
38 files changed, 1,023 insertions(+), 192 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/34
-- 
To view, visit http://gerrit.cloudera.org:8080/6636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 34
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-17 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 33: Code-Review-1

Getting to the bottom of all the test failures.

-- 
To view, visit http://gerrit.cloudera.org:8080/6636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 33
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-17 Thread Andrew Wong (Code Review)
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#33).

Change subject: KUDU-1952 Remove round-robin for block placement
..

---
38 files changed, 1,018 insertions(+), 192 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/33
-- 
To view, visit http://gerrit.cloudera.org:8080/6636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 33
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-17 Thread Andrew Wong (Code Review)
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#32).

Change subject: KUDU-1952 Remove round-robin for block placement
..

---
38 files changed, 1,002 insertions(+), 190 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/32
-- 
To view, visit http://gerrit.cloudera.org:8080/6636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 32
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-17 Thread Andrew Wong (Code Review)
Hello Adar Dembo,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#31).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.
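
A minimal sketch of the per-tablet placement described above, not the code
in this patch. The names GroupMap and PickDirForNewBlock are hypothetical,
and directories are assumed to be identified by plain integer indices.

```cpp
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <vector>

// Hypothetical registry: tablet id -> indices of the data directories
// that make up the tablet's group.
using GroupMap = std::map<std::string, std::vector<int>>;

// Returns the directory index for a new block of 'tablet_id', chosen
// uniformly at random from the tablet's group. Returns -1 if the tablet
// has no group or the group is empty.
int PickDirForNewBlock(const GroupMap& groups,
                       const std::string& tablet_id,
                       std::mt19937* rng) {
  auto it = groups.find(tablet_id);
  if (it == groups.end() || it->second.empty()) {
    return -1;
  }
  std::uniform_int_distribution<size_t> pick(0, it->second.size() - 1);
  return it->second[pick(*rng)];
}
```

Blocks for a given tablet only ever land in that tablet's group, so the
loss of a directory outside the group leaves the tablet's blocks untouched.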

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.
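
The selection step can be sketched as follows. This is an illustration
under stated assumptions rather than the implementation in data_dirs.cc:
DirLoad, PickDirPowerOfTwo and CreateGroup are hypothetical names, every
directory is assumed to be available, and ties go to the first candidate.

```cpp
#include <cstddef>
#include <random>
#include <set>
#include <vector>

// Hypothetical stand-in for a data directory; only the tablet count matters.
struct DirLoad {
  int tablet_count;
};

// Picks one directory index that is not yet in 'group': sample two distinct
// candidates at random and keep the one with fewer tablets.
// Returns -1 if every directory is already in the group.
int PickDirPowerOfTwo(const std::vector<DirLoad>& dirs,
                      const std::set<int>& group,
                      std::mt19937* rng) {
  std::vector<int> candidates;
  for (int i = 0; i < static_cast<int>(dirs.size()); ++i) {
    if (group.count(i) == 0) candidates.push_back(i);
  }
  if (candidates.empty()) return -1;
  if (candidates.size() == 1) return candidates[0];

  // Draw the second position from a range one smaller and shift it past the
  // first, so the two positions are always distinct.
  std::uniform_int_distribution<size_t> first(0, candidates.size() - 1);
  size_t a = first(*rng);
  std::uniform_int_distribution<size_t> second(0, candidates.size() - 2);
  size_t b = second(*rng);
  if (b >= a) ++b;

  int da = candidates[a];
  int db = candidates[b];
  return dirs[da].tablet_count <= dirs[db].tablet_count ? da : db;
}

// Builds a group of up to 'target_size' directories and bumps the tablet
// count of each directory that joins the group.
std::set<int> CreateGroup(std::vector<DirLoad>* dirs,
                          size_t target_size,
                          std::mt19937* rng) {
  std::set<int> group;
  while (group.size() < target_size) {
    int idx = PickDirPowerOfTwo(*dirs, group, rng);
    if (idx < 0) break;  // Fewer directories than the target size.
    group.insert(idx);
    ++(*dirs)[idx].tablet_count;
  }
  return group;
}
```

Comparing only two random candidates, instead of always taking the
least-loaded directory, is what keeps a freshly added empty disk from
receiving every new tablet at once.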

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's metadata
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.

As this patch only addresses block placement, it does not itself
mitigate single-disk failure. Given this, and given the tradeoff
between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
38 files changed, 1,002 insertions(+), 190 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/31

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 31
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-16 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 30: Code-Review-1

As per our discussion, I'll update the usages of UUID indices to instead be 
UUIDs.


Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 30
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-16 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 30:

I'm looking


Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 30
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-15 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 30: Code-Review+1 Verified+1

Test failure is a known flake.

Todd/David, I think this is good to go as-is (and fixes KUDU-1952). Could one 
of you take another look?


Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 30
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-15 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 29:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6636/24/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

PS24, Line 412: other
> nit: in other places in this file this kind of statement is written as 'if 
Done


PS24, Line 529:   data_dir_set.insert(group_indices.begin(), group_indices.end());
  :   for (auto& e : data_dir_by_uuid_idx_) {
  :
> nit: since it's C++11, consider
Neat! Done


PS24, Line 532: URN_
> nit: auto& ?
Done


http://gerrit.cloudera.org:8080/#/c/6636/24/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

PS24, Line 20: #include 
> nit: per code style this should be placed after std and before kudu headers
Done



Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 29
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-15 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#29).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's metadata
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.

As this patch only addresses block placement, it does not itself
mitigate single-disk failure. Given this, and given the tradeoff
between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
38 files changed, 991 insertions(+), 190 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/29

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 29
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-15 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 26:

(4 comments)

Just skimmed through.  I need a better understanding of the background to
give more thoughtful feedback -- I hope to do one more pass this week.

http://gerrit.cloudera.org:8080/#/c/6636/24/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

PS24, Line 412: other
nit: in other places in this file this kind of statement is written as 'if 
(other != nullptr)'.  Consider unifying this among those places.


PS24, Line 529:   for (uint16_t uuid : group_indices) {
  : data_dir_set.insert(uuid);
  :   }
nit: since it's C++11, consider

data_dir_set.insert(group_indices.begin(), group_indices.end());
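
A small self-contained sketch of the equivalence, assuming group_indices is
a std::vector<uint16_t> and data_dir_set is a std::set<uint16_t> (the actual
container types in the patch are not shown here; the function names are
hypothetical):

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Element-by-element insertion, as in the loop under review.
std::set<uint16_t> InsertWithLoop(const std::vector<uint16_t>& group_indices) {
  std::set<uint16_t> data_dir_set;
  for (uint16_t uuid : group_indices) {
    data_dir_set.insert(uuid);
  }
  return data_dir_set;
}

// Range insertion, as suggested; produces the same set in one call.
std::set<uint16_t> InsertWithRange(const std::vector<uint16_t>& group_indices) {
  std::set<uint16_t> data_dir_set;
  data_dir_set.insert(group_indices.begin(), group_indices.end());
  return data_dir_set;
}
```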


PS24, Line 532: auto
nit: auto& ?


http://gerrit.cloudera.org:8080/#/c/6636/24/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

PS24, Line 20: #include 
nit: per code style this should be placed after std and before kudu headers; 
like it is in data_dirs.cc

https://google.github.io/styleguide/cppguide.html#Names_and_Order_of_Includes



Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 26
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-15 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#28).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's metadata
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.

As this patch only addresses block placement, it does not itself
mitigate single-disk failure. Given this, and given the tradeoff
between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
38 files changed, 992 insertions(+), 190 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/28

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 28
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-15 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#27).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's metadata
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.

As this patch only addresses block placement, it does not itself
mitigate single-disk failure. Given this, and given the tradeoff
between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
38 files changed, 987 insertions(+), 190 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/27

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 27
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-12 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#26).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's superblock
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.

As this patch only addresses block placement, it does not itself
mitigate single-disk failure. Given this, and given the tradeoff
between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
38 files changed, 986 insertions(+), 190 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/26

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 26
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-12 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#25).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's superblock
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.

As this patch only addresses block placement, it does not itself
mitigate single-disk failure. Given this, and given the tradeoff
between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
38 files changed, 986 insertions(+), 190 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/25

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 25
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-11 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#24).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.
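
For illustration, the selection rule in the paragraph above can be sketched
in a few lines of C++. This is a minimal sketch under stated assumptions, not
the code in data_dirs.cc: DirLoad, PickDirPowerOfTwo and CreateGroup are
hypothetical names, and std::mt19937 merely stands in for whatever RNG the
patch uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical per-directory bookkeeping; not Kudu's DataDir class.
struct DirLoad {
  int num_tablets = 0;
};

// Samples two distinct entries of 'candidates' uniformly at random and
// returns the one whose directory currently holds fewer tablets.
// 'candidates' holds indices into 'dirs' and must be non-empty.
inline std::size_t PickDirPowerOfTwo(const std::vector<DirLoad>& dirs,
                                     const std::vector<std::size_t>& candidates,
                                     std::mt19937* rng) {
  assert(!candidates.empty());
  if (candidates.size() == 1) return candidates[0];
  std::uniform_int_distribution<std::size_t> first(0, candidates.size() - 1);
  std::uniform_int_distribution<std::size_t> second(0, candidates.size() - 2);
  std::size_t a = first(*rng);
  std::size_t b = second(*rng);
  if (b >= a) b++;  // Skip over 'a' so the two samples are distinct.
  const std::size_t dir_a = candidates[a];
  const std::size_t dir_b = candidates[b];
  return dirs[dir_a].num_tablets <= dirs[dir_b].num_tablets ? dir_a : dir_b;
}

// Builds a group of 'group_size' distinct directories for one new tablet
// and increments the tablet count of every directory that was chosen.
inline std::vector<std::size_t> CreateGroup(std::vector<DirLoad>* dirs,
                                            std::size_t group_size,
                                            std::mt19937* rng) {
  assert(group_size <= dirs->size());
  std::vector<std::size_t> candidates(dirs->size());
  for (std::size_t i = 0; i < candidates.size(); i++) candidates[i] = i;
  std::vector<std::size_t> group;
  while (group.size() < group_size) {
    const std::size_t chosen = PickDirPowerOfTwo(*dirs, candidates, rng);
    group.push_back(chosen);
    (*dirs)[chosen].num_tablets++;
    // A directory may appear in a group only once.
    candidates.erase(std::find(candidates.begin(), candidates.end(), chosen));
  }
  return group;
}
```

Comparing only two random candidates, rather than always taking the globally
least-loaded directory, is what keeps a freshly added empty disk from
receiving every new tablet at once.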

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's superblock
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.
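
The upgrade path described above could look roughly like the following. This
is only a sketch: SuperBlockSketch and GroupForTablet are hypothetical names,
and directories are identified by plain integer indices for simplicity.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical, simplified view of a tablet superblock. A superblock
// written by an older version has no group, so 'has_group' is false.
struct SuperBlockSketch {
  bool has_group = false;
  std::vector<int> uuid_indices;  // Directory indices in the group, if any.
};

// Returns the directory group to use for a tablet being loaded. If the
// superblock carries no group, the tablet's blocks may live anywhere, so
// the group is taken to be every data directory on the server.
inline std::vector<int> GroupForTablet(const SuperBlockSketch& sb,
                                       int num_data_dirs) {
  assert(num_data_dirs >= 0);
  if (sb.has_group) return sb.uuid_indices;
  std::vector<int> all_dirs(num_data_dirs);
  std::iota(all_dirs.begin(), all_dirs.end(), 0);  // 0, 1, ..., n - 1
  return all_dirs;
}
```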

As this patch only addresses block placement, it does not itself
mitigate the effects of single-disk failure. Given this, and given the
tradeoff between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
38 files changed, 986 insertions(+), 190 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/24

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 24
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-11 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 23:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6636/23/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

Line 477:   // Use a different RNG than that used to create groups to avoid races.
This comment is no longer true.


Line 544:   // Use a separte RNG from that used by directory selection to avoid races.
No longer true?


Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 23
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-11 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 21:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6636/18/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

Line 540: }
> The output of std::default_random_engine() would be constant; it's the inpu
As we discussed, give ThreadSafeRandom a shot. It should let you share the same 
RNG.


Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 21
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-11 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#23).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's superblock
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.

As this patch only addresses block placement, it does not itself
mitigate the effects of single-disk failure. Given this, and given the
tradeoff between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
38 files changed, 978 insertions(+), 179 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/23

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 23
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-11 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#22).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's superblock
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.

As this patch only addresses block placement, it does not itself
mitigate the effects of single-disk failure. Given this, and given the
tradeoff between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
38 files changed, 979 insertions(+), 179 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/22

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 22
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-11 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 21:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/6636/18/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

Line 134: if (basename == kInstanceMetadataFileName) {
> warning: parameter 'dirname' is unused [misc-unused-parameters]
Done


Line 144:   // hierarchy, ignoring '.', '..', and file 'kInstanceMetadataFileName'.
> Nit: got some extra whitespace here.
Done


Line 325:   // Verify the results. Each path has dot, dotdot, instance file.
> Should ASSERT_OK() here.
Done


http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/fs/data_dirs-test.cc
File src/kudu/fs/data_dirs-test.cc:

Line 96:   for (int i = 0; i < pb.uuid_indices().size(); i++) {
> Not done?
Rrg good catch, done.


http://gerrit.cloudera.org:8080/#/c/6636/18/src/kudu/fs/data_dirs-test.cc
File src/kudu/fs/data_dirs-test.cc:

PS18, Line 47: ),
> Use env_
Done


Line 72:   DataDirGroupPB pb_;
> This is a little unusual; use the Kudu Random from util/random.
More than unusual, unneeded. These tests don't require any RNG of their own, 
just that of the DataDirManager.


http://gerrit.cloudera.org:8080/#/c/6636/18/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

Line 21: #include 
> Nit: this belongs with the gflags/glog includes since boost is part of the 
Ah I see, it's a dependency. Done.


Line 540: }
> Let's make this a member of the DataDirManager and initialize it just once,
The output of std::default_random_engine() would be constant; it's the input 
r.Next() that's changing to instantiate the engine, providing different rng 
across runs.

I went with your first suggestion.

Actually that resulted in a race since GetNextDataDir only read-locks, and 
since the shuffle needs a new seed every run to select different directories 
every time, it modifies the RNG state.
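
To make the race concrete: shuffling advances the engine's internal state,
so a shared engine is written to even when the caller holds only a read lock
on the surrounding structure. One way around this, sketched below, is to
give the RNG its own lock. LockedRandom is a hypothetical class written for
this note; it is not Kudu's ThreadSafeRandom, and std::mutex and
std::mt19937 are assumptions made for the sake of a self-contained example.

```cpp
#include <algorithm>
#include <cstdint>
#include <mutex>
#include <random>
#include <vector>

// Hypothetical RNG wrapper that serializes every access to the engine,
// so callers that only hold a shared (read) lock elsewhere can still
// draw random values without racing on the engine state.
class LockedRandom {
 public:
  explicit LockedRandom(uint32_t seed) : engine_(seed) {}

  // Returns the next 32-bit value from the shared engine.
  uint32_t Next() {
    std::lock_guard<std::mutex> l(lock_);
    return static_cast<uint32_t>(engine_());
  }

  // Shuffles 'v' in place. The engine state changes here, which is why
  // the call must happen under 'lock_'.
  template <typename T>
  void Shuffle(std::vector<T>* v) {
    std::lock_guard<std::mutex> l(lock_);
    std::shuffle(v->begin(), v->end(), engine_);
  }

 private:
  std::mutex lock_;
  std::mt19937 engine_;
};
```

With a wrapper of this kind, directory selection and group creation could
share one RNG instead of each seeding its own.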


Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 21
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-11 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#21).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's superblock
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.

As this patch only addresses block placement, it does not itself
mitigate the effects of single-disk failure. Given this, and given the
tradeoff between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
38 files changed, 978 insertions(+), 178 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/21

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 21
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-11 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#20).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's superblock
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.

As this patch only addresses block placement, it does not itself
mitigate the effects of single-disk failure. Given this, and given the
tradeoff between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
37 files changed, 972 insertions(+), 176 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/20

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 20
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-11 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#19).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's superblock
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.

As this patch only addresses block placement, it does not itself
mitigate the effects of single-disk failure. Given this, and given the
tradeoff between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
37 files changed, 972 insertions(+), 176 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/19

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 19
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-10 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 18:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/6636/18/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

Line 144:   
Nit: got some extra whitespace here.


Line 325: CountFiles(path, _in_path);
Should ASSERT_OK() here.


http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/fs/data_dirs-test.cc
File src/kudu/fs/data_dirs-test.cc:

Line 96:   }
> Done
Not done?


http://gerrit.cloudera.org:8080/#/c/6636/18/src/kudu/fs/data_dirs-test.cc
File src/kudu/fs/data_dirs-test.cc:

PS18, Line 47: Env::Default()
Use env_


Line 72:   testing::internal::Random r_;
This is a little unusual; use the Kudu Random from util/random.


http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

Line 460: // natural tablet_id, select a data dir randomly.
> Done
Looks like you didn't add the DCHECK() though.


http://gerrit.cloudera.org:8080/#/c/6636/18/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

Line 21: #include 
Nit: this belongs with the gflags/glog includes since boost is part of the 
"project", not in the "system".


Line 540:   Random r(GetRandomSeed32());
Let's make this a member of the DataDirManager and initialize it just once, 
when constructing the DataDirManager.

Actually, let's make the output of std::default_random_engine() a member, and 
seed it with GetRandomSeed32().
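
A rough sketch of that suggestion, for reference. DirShuffler is a
hypothetical class, not the DataDirManager, and the seed is passed in as a
plain integer here rather than obtained from GetRandomSeed32(), so that the
example stays self-contained.

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Hypothetical holder of a directory list plus one long-lived engine.
// The engine is seeded exactly once, at construction; every later call
// continues from the engine's current state instead of re-seeding.
// Note: this class is not thread-safe on its own.
class DirShuffler {
 public:
  DirShuffler(std::vector<int> dirs, uint32_t seed)
      : dirs_(std::move(dirs)), engine_(seed) {}

  // Returns the directories in a fresh random order.
  std::vector<int> NextOrder() {
    std::vector<int> order = dirs_;
    std::shuffle(order.begin(), order.end(), engine_);
    return order;
  }

 private:
  const std::vector<int> dirs_;
  std::default_random_engine engine_;
};
```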


http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

Line 51: 
> No, forward declaration usually doesn't work for STL containers since I'm n
Right, I forgot about that. Sorry.

Could you put the DataDirGroup into a namespace called 'internal' to make it 
clear that it's not for use outside of data_dirs.{h,cc}?


-- 
To view, visit http://gerrit.cloudera.org:8080/6636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 18
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-10 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 18:

(33 comments)

Failure from LINT and a memory leak in ASAN data_dirs-test dist-test (looking 
into it).

http://gerrit.cloudera.org:8080/#/c/6636/17//COMMIT_MSG
Commit Message:

PS17, Line 9: mitigate the
: single-disk failure
> nit: we aren't mitigating the failure, but rather mitigating the effects of
Done


PS17, Line 27:  When loading
 : tablet data from a previous version of Kudu, the tablet's superblock
 : will not have a DataDirGroup. One will be generated containing all
 : data directories, as the tablet's data may already be spread across
 : any number of disks.
> what happens if we add a table on the new version, then downgrade, run for 
Won't the superblock be rewritten with the old version of Kudu when new blocks
are written? I thought the superblock was always updated as new things get
flushed.
If that's the case, shouldn't the metadata be replaced with a group-less
version, so that upon upgrading we create a group with all dirs?


http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

Line 133:   Status CountFilesCb(int* num_files, Env::FileType type,
> would using Env::Walk() make this easier?
Done


Line 428:   size_t size1 = 5;
> Before asserting on the status' string, assert on its type (i.e. ASSERT_TRU
Done


http://gerrit.cloudera.org:8080/#/c/6636/18/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

PS18, Line 154: t CountFiles_OLD(const string& path) {
  :     vector<string> child_paths;
  :     Status s = env_->GetChildren(path, &child_paths);
  :     if (!s.ok()) {
  :       return 0;
  :     }
  :     int count = 0;
  :     for (const string& child_path : child_paths) {
  :       if (child_path == "." || child_path == ".." ||
  :           child_path == kInstanceMetadataFileName) {
  :         continue;
  :       }
  :       string full_child_path = JoinPathSegments(path, child_path);
  :       bool is_dir;
  :       s = env_->IsDirectory(full_child_path, &is_dir);
  :       if (is_dir) {
  :         // Count the files in the child directories.
  :         count += CountFiles_OLD(full_child_path);
  :       } else {
  :         // Increment if the child is a file.
  :         count++;
  :       }
  :     }
  :     return count;
  :   }
Removed.


http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/fs/data_dirs-test.cc
File src/kudu/fs/data_dirs-test.cc:

PS17, Line 56:   }
 : 
 :  pr
> You can omit this since it doesn't do anything beyond what the base class d
Done


PS17, Line 79:   Status s = dd_manager_->GetNextDataDir(non_existent_opts, 
nullptr);
 :   ASSERT_STR_CONTAIN
> Would it be possible to instantiate a DataDirManager and forgo the block ma
The main reason I added a block manager is that it seemed to handle setting up
the env nicely.


Line 96:   }
> Test the Status programmatically before testing its string representation.
Done


PS17, Line 115: TEST_F(DataDirGroupTest, TestDeleteDataDirGroup) {
  :   ASSERT_OK(dd_manager_->CreateDataDirGroup(test_
> Why is it necessary to set these? Won't GetNextDataDir() return Status::OK(
Left in from when I was experimenting with the flags. Removed


Line 138:   // Add 20 tablets, each with size 3.
> Maybe do this once in the test constructor, and do it again in each test th
Done


http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

PS17, Line 62: riped ac
> maybe even experimental, considering this isn't quite usable yet?
Done


Line 188: 
> ??
Good question. Removed.


PS17, Line 254:   if (metric_entity) {
  : metrics_.reset(new Data
> I don't think it makes sense to modify process-level state deep in an objec
RNG is used for creating groups and selecting directories within groups. I 
added this here so my dist-test runs would differ between runs. Without it, the 
groups would always be the same.

I've changed it to use GetRandomSeed32 and random_shuffle, which afaik don't 
modify state.


Line 394: InsertOrDie(_idx_by_dd, dd.get(), idx);
> Don't need std:: prefixes here.
Done


PS17, Line 410: 
> i think this and the one below could be a bit more specific and refer to da
Good call, done. Also removed "tracking" vernacular, since it's no longer used 
in function names.


PS17, Line 422: 
  : 
  :   // Adjust the disk group size to fit within the total number 
of data dirs.
  :   

[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-10 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#18).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's superblock
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.

As this patch only addresses block placement, it does not itself
mitigate single-disk failure. Given this, and given the tradeoff
between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
37 files changed, 986 insertions(+), 175 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/18
-- 
To view, visit http://gerrit.cloudera.org:8080/6636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 18
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-08 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 17:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/6636/17//COMMIT_MSG
Commit Message:

PS17, Line 9: mitigate the
: single-disk failure
nit: we aren't mitigating the failure, but rather mitigating the effects of a 
single-disk failure


PS17, Line 27:  When loading
 : tablet data from a previous version of Kudu, the tablet's superblock
 : will not have a DataDirGroup. One will be generated containing all
 : data directories, as the tablet's data may already be spread across
 : any number of disks.
what happens if we add a table on the new version, then downgrade, run for a 
bit, and then upgrade? the new version will still think that the tablet has 
blocks only on a subset of disks, even though in fact it has blocks on more, 
right? any way to avoid this issue?


http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

Line 133:   int CountFiles(const string& path) {
would using Env::Walk() make this easier?


http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

PS17, Line 62: evolving
maybe even experimental, considering this isn't quite usable yet?


Line 188:   FLAGS_fs_data_dirs_full_disk_cache_seconds = 0; // Don't cache 
device fullness.
??


PS17, Line 410: Tablet already being tracked
i think this and the one below could be a bit more specific and refer to 
datadir groups. Remember that Statuses don't keep track of the line of code or 
file where they were produced, so being pretty specific is helpful.


PS17, Line 422:  group_target_size = FLAGS_fs_target_data_dirs_per_tablet;
  :   if (group_target_size > data_dirs_.size()) {
  : group_target_size = data_dirs_.size();
  :   }
use std::min?
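
A small sketch of the std::min form, for illustration. ClampGroupTargetSize()
is a hypothetical helper; 'target_flag' stands in for
FLAGS_fs_target_data_dirs_per_tablet (its integer type is assumed here) and
'num_data_dirs' for data_dirs_.size().

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Clamps the requested group size to the number of available data dirs.
// The explicit template argument sidesteps the type mismatch between the
// flag's integer type (assumed uint32_t here) and size_t.
inline std::size_t ClampGroupTargetSize(uint32_t target_flag,
                                        std::size_t num_data_dirs) {
  return std::min<std::size_t>(target_flag, num_data_dirs);
}
```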


Line 460: // This should only be reached by some tests; in cases where 
there is no
maybe DCHECK(IsGTest()) from test_util_prod.h?


Line 535: }
why not also just skip adding the full ones here? might make the lower part 
easier
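
As a rough sketch only (not the code under review): filtering out full
directories up front, followed by the two-random-choices selection described
in the commit message. DirSketch and PickTwoChoices() are hypothetical names,
std::optional stands in for whatever "nothing found" signal the real code
uses, and directories already in the tablet's group are not excluded here.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <random>
#include <vector>

// Hypothetical view of a data directory for selection purposes.
struct DirSketch {
  uint16_t uuid_idx;
  std::size_t num_tablets;
  bool is_full;
};

// Drops full directories first, then samples two distinct candidates and
// keeps the one with fewer tablets. Returns std::nullopt if every
// directory is full.
template <typename Rng>
std::optional<uint16_t> PickTwoChoices(const std::vector<DirSketch>& dirs,
                                       Rng* rng) {
  std::vector<const DirSketch*> candidates;
  for (const DirSketch& d : dirs) {
    if (!d.is_full) candidates.push_back(&d);
  }
  if (candidates.empty()) return std::nullopt;
  if (candidates.size() == 1) return candidates[0]->uuid_idx;

  std::uniform_int_distribution<std::size_t> first(0, candidates.size() - 1);
  const std::size_t i = first(*rng);
  // Draw the second index from the remaining slots and step over 'i' so
  // that the two choices always differ.
  std::uniform_int_distribution<std::size_t> second(0, candidates.size() - 2);
  std::size_t j = second(*rng);
  if (j >= i) ++j;

  const DirSketch* a = candidates[i];
  const DirSketch* b = candidates[j];
  return (a->num_tablets <= b->num_tablets) ? a->uuid_idx : b->uuid_idx;
}
```

With the full directories gone before sampling, the selection step no longer
needs a retry loop or a separate fullness check on each candidate.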


http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

PS17, Line 53: PathSetPB
maybe missed something somewhere in the design, but when we get to removing and 
replacing disks, what happens to the PathSetPB? We'll have to keep all of the 
dead UUIDs in there forever, right?


PS17, Line 66: uuid_indices
std::move


Line 70: DCHECK(pb != nullptr);
can just DCHECK(pb);


Line 73:   *group.mutable_uuid_indices()->Add() = uuid_idx;
I think you can write this as group->add_uuid_indices(uuid_idx)


Line 75: *pb = group;
nit: pb->Swap(group); to avoid an extra allocation


PS17, Line 251: no_dirs_found' is set to true if the group is already
  :   // larger than the limit or if all candidates are full
why not return a bad status instead of a second out-param?


http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/fs/file_block_manager.h
File src/kudu/fs/file_block_manager.h:

PS17, Line 67: CreateBlockOptions
typo?


http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/fs/fs.proto
File src/kudu/fs/fs.proto:

Line 116: message DataDirGroupPB {
can you note where this ends up stored?


http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/tablet/delta_compaction.cc
File src/kudu/tablet/delta_compaction.cc:

Line 65: const string& tablet_id)
nit: pass by value and then std::move below
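
For illustration, the pass-by-value-then-move pattern on a hypothetical class
(CompactionSketch is not the class in delta_compaction.cc):

```cpp
#include <string>
#include <utility>

// Hypothetical class that stores a tablet ID. Taking the argument by value
// lets callers passing a temporary avoid a copy; an lvalue argument is
// copied once into the parameter and then moved into the member.
class CompactionSketch {
 public:
  explicit CompactionSketch(std::string tablet_id)
      : tablet_id_(std::move(tablet_id)) {}

  const std::string& tablet_id() const { return tablet_id_; }

 private:
  const std::string tablet_id_;
};
```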


-- 
To view, visit http://gerrit.cloudera.org:8080/6636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 17
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-05 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 17:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

Line 428:   ASSERT_STR_CONTAINS(s.ToString(), "DataDirGroup not found for 
tablet");
Before asserting on the status' string, assert on its type (i.e. 
ASSERT_TRUE(s.IsNotFound()) or whatever).
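
A minimal sketch of that ordering, for illustration. StatusSketch is a toy
stand-in written for this example, not Kudu's Status class, LookUpGroup() is a
hypothetical function, and NotFound is only assumed to be the relevant status
type.

```cpp
#include <string>
#include <utility>

// Toy status type with a code and a message.
class StatusSketch {
 public:
  enum class Code { kOk, kNotFound };

  static StatusSketch OK() { return StatusSketch(Code::kOk, ""); }
  static StatusSketch NotFound(std::string msg) {
    return StatusSketch(Code::kNotFound, std::move(msg));
  }

  bool ok() const { return code_ == Code::kOk; }
  bool IsNotFound() const { return code_ == Code::kNotFound; }
  std::string ToString() const {
    if (ok()) return "OK";
    return "Not found: " + msg_;
  }

 private:
  StatusSketch(Code code, std::string msg)
      : code_(code), msg_(std::move(msg)) {}

  Code code_;
  std::string msg_;
};

// Hypothetical lookup that only knows about one tablet.
inline StatusSketch LookUpGroup(const std::string& tablet_id) {
  if (tablet_id == "test_tablet") return StatusSketch::OK();
  return StatusSketch::NotFound("DataDirGroup not found for tablet " +
                                tablet_id);
}
```

A test would then check s.IsNotFound() first and only afterwards look for the
substring in s.ToString(), so a change in message wording cannot mask a wrong
status type.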


http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/fs/data_dirs-test.cc
File src/kudu/fs/data_dirs-test.cc:

PS17, Line 56:   virtual void TearDown() override {
 : KuduTest::TearDown();
 :   }
You can omit this since it doesn't do anything beyond what the base class does.


PS17, Line 79:   // Parent block manager. The selection of FBM over LBM 
shouldn't be relevant
 :   // to these tests.
Would it be possible to instantiate a DataDirManager and forgo the block 
manager altogether? That would eliminate some unnecessary test setup.


Line 96:   Status s = dd_manager_->GetNextDataDir(non_existent_opts, nullptr);
Test the Status programmatically before testing its string representation.

Elsewhere too.


PS17, Line 115:   FLAGS_fs_data_dirs_reserved_bytes = 1;
  :   FLAGS_disk_reserved_bytes_free_for_testing = 2;
Why is it necessary to set these? Won't GetNextDataDir() return Status::OK() 
even if they're unset?


Line 138:   FLAGS_fs_target_data_dirs_per_tablet = group_size_;
Maybe do this once in the test constructor, and do it again in each test that 
needs to override group_size_?


http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

PS17, Line 254:   // Seed the RNG for data directory group operations.
  :   std::srand(std::time(0));
I don't think it makes sense to modify process-level state deep in an object's 
constructor like this. Why did you need this in the first place? When we need 
to seed a PRNG in unit tests, we use SeedRandom().


Line 394: InsertOrDie(_by_uuid_idx_map_, idx, 
std::set());
Don't need std:: prefixes here.


Line 479: DataDir* candidate = data_dir_by_uuid_idx_[uuid_idx];
FindOrDie() here?


PS17, Line 518: no_dirs_found
Instead of the separate bool, could you wrap new_uuid_idx in a boost::optional? 
It'll be initialized to boost::none by default I think, and then the caller can 
check if it's still boost::none after the call or whether it's been initialized 
to a real value.
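
A sketch of the optional-based shape, for illustration. It uses std::optional
(C++17) as a stand-in for boost::optional so that the example needs only the
standard library, and PickNewDirSketch() with its trivial "first unused index"
logic is hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// Returns the first directory index in [0, num_dirs) that is not already
// in 'group_indices', or std::nullopt when every directory is taken.
// The empty optional replaces a separate 'no_dirs_found' bool.
inline std::optional<uint16_t> PickNewDirSketch(
    const std::vector<uint16_t>& group_indices, uint16_t num_dirs) {
  for (uint16_t idx = 0; idx < num_dirs; ++idx) {
    const bool in_group =
        std::find(group_indices.begin(), group_indices.end(), idx) !=
        group_indices.end();
    if (!in_group) return idx;
  }
  return std::nullopt;
}
```

The caller checks has_value() after the call instead of inspecting a second
out-parameter.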


Line 548: 
RETURN_NOT_OK(candidate->RefreshIsFull(DataDir::RefreshMode::ALWAYS));
Why do we need to refresh?


http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

Line 51: // A DataDirGroup is a group of directories used by an entity for block
I don't understand why you had to move DataDirGroup back to the header. The 
only usage I see is as a value in TabletDataDirGroupMap; can't that be forward 
declared?


-- 
To view, visit http://gerrit.cloudera.org:8080/6636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 17
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-04 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#17).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's superblock
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.

As this patch only addresses block placement, it does not itself
mitigate single-disk failure. Given this, and given the tradeoff
between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
37 files changed, 1,002 insertions(+), 175 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/17
-- 
To view, visit http://gerrit.cloudera.org:8080/6636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 17
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-04 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#16).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's superblock
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.

As this patch only addresses block placement, it does not itself
mitigate single-disk failure. Given this, and given the tradeoff
between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
37 files changed, 1,023 insertions(+), 175 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/16
-- 
To view, visit http://gerrit.cloudera.org:8080/6636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-04 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6636/15/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

Line 132:   // hierarchy, ignoring '.', '..', and the file specified by 
'instance_name'.
will update this comment


-- 
To view, visit http://gerrit.cloudera.org:8080/6636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-04 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 14:

(27 comments)

I moved the definition of DataDirGroup back to the .h since it's used by one of 
the private members.

Also added data_dirs-test.cc

http://gerrit.cloudera.org:8080/#/c/6636/14/src/kudu/fs/block_manager-stress-test.cc
File src/kudu/fs/block_manager-stress-test.cc:

PS14, Line 433: "test_tablet"
> Define this once in the test fixture and use it in both places.
Done


http://gerrit.cloudera.org:8080/#/c/6636/14/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

Line 118: CHECK_OK(bm_->dd_manager()->LoadDataDirGroupFromPB("test_tablet", 
test_group_pb_));
> Why not RETURN_NOT_OK here too?
Done


Line 126:   void RunBlockDistributionTest(const vector& paths);
> Maybe add some coverage for DeleteDataDirGroup too?
Done


Line 141:   ASSERT_STR_CONTAINS(s.ToString(), "DataDirGroup not found for 
tablet");
> Create a separate test for this; don't overload setUp() with tests.
Done


Line 155: int count_files(Env* env, const string& path, const string& 
instance_name) {
> Should be CountFiles(). Also, perhaps make it a member of BlockManagerTest 
Done


PS14, Line 171: instance_name
> Isn't this always "block_manager_instance"? If so, can you hardcode it?
Done


PS14, Line 190: CHECK_OK
> ASSERT_OK
Done


PS14, Line 228: CHECK_OK
> ASSERT_OK
Done


Line 284:   CHECK_OK(bm_->dd_manager()->CreateDataDirGroup("multipath_test"));
> ASSERT_OK
Done


PS14, Line 313: CHECK_OK
> ASSERT_OK
Done


PS14, Line 385: CreateBlockOptions({ "test_tablet" })
> Maybe you can define this once as a BlockManagerTest member and refer to it
Done


http://gerrit.cloudera.org:8080/#/c/6636/14/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

PS14, Line 160: A group is represented by a list of indices within the list of
  : // all UUIDs found in the PathSetPB.
> Maybe "A group is represented by a list of 2-byte indices, which index into
Done


Line 478: tablets_by_uuid_idx_map_[uuid_idx].insert(tablet_id);
> Should this use InsertOrDie(), to emphasize that there's no reason the set 
Done


Line 528:   for (int16_t uuid_index : group->uuid_index_list()) {
> Should be uint16_t
Done


Line 529: tablets_by_uuid_idx_map_[uuid_index].erase(tablet_id);
> Should we use FindOrDie() to get the set of tablets out, since it'd be an e
Done


Line 573: candidate = data_dir_by_uuid_idx_[dir_uuids[iter1]];
> FindOrDie(). L580 too.
Done


Line 590:   tablets_by_uuid_idx_map_[uuid1].size() > 
tablets_by_uuid_idx_map_[uuid2].size()) {
> Use FindOrDie() here too.
Done


http://gerrit.cloudera.org:8080/#/c/6636/14/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

Line 27: #include "kudu/fs/fs.pb.h"
> Do you still need this? Maybe you can get away with forward-declaring DataD
I can and Done


PS14, Line 162: Adds data directories to a specific tablet's dir group, until 
the limit
  :   // specified by fs_target_data_dirs_per_tablet, or until 
there is no more space.
> Yes, but we should also emphasize that this also creates a dir group for th
Done


PS14, Line 170: tablet
> Nit: tablet's dir group.
Done


PS14, Line 171: tablet will be untracked.
> Nit: "tablet's dir group will be deleted" (to emphasize the symmetry with D
Done


PS14, Line 174: tracking
> Nit: since this method is no longer UntrackTablet, I think it'd be clearer 
Done


Line 221:   bool GetDirForGroupUnlocked(const std::vector& group, 
uint16_t* uuid_idx);
> Maybe the parameters would be more clear as 'uuid_idxes' and 'new_uuid_idx'
Hrm, I changed it to 'group_indices' and 'new_uuid_index', hopefully that's 
clearer.


http://gerrit.cloudera.org:8080/#/c/6636/14/src/kudu/fs/fs.proto
File src/kudu/fs/fs.proto:

PS14, Line 117: indexes
> Nit: indices or indexes? Pick one and apply it consistently here (and where
Done.


http://gerrit.cloudera.org:8080/#/c/6636/14/src/kudu/fs/log_block_manager-test.cc
File src/kudu/fs/log_block_manager-test.cc:

PS14, Line 67: "test_tablet"
> Define just once.
Done


PS14, Line 247: CreateBlockOptions({ "test_tablet" }
> Maybe define just once and use repeatedly?
Done


http://gerrit.cloudera.org:8080/#/c/6636/14/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

Line 96: fs_manager->dd_manager()->DeleteDataDirGroup(tablet_id);
> So a convention we like to use in these situations is to wrap the cleanup t
That is _neat_
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 

[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-04 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#15).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.
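
The selection step described above can be sketched as follows. This is
only an illustration under stated assumptions, not the code in
data_dirs.cc: DirLoad and CreateGroup are hypothetical names, and a
directory is reduced to its tablet count.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for a data directory; only its tablet count matters.
struct DirLoad {
  int num_tablets = 0;
};

// Builds a group of 'group_size' distinct directory indices. Each slot is
// filled by sampling two distinct candidates that are not yet in the group
// and keeping the one with fewer tablets.
std::vector<size_t> CreateGroup(std::vector<DirLoad>& dirs, size_t group_size,
                                std::mt19937& rng) {
  assert(group_size <= dirs.size());
  std::vector<size_t> candidates(dirs.size());
  for (size_t i = 0; i < dirs.size(); ++i) candidates[i] = i;

  std::vector<size_t> group;
  while (group.size() < group_size) {
    // First choice: any remaining candidate.
    std::uniform_int_distribution<size_t> first(0, candidates.size() - 1);
    size_t a = first(rng);
    // Second choice: a different candidate, if one exists.
    size_t b = a;
    if (candidates.size() > 1) {
      std::uniform_int_distribution<size_t> second(0, candidates.size() - 2);
      b = second(rng);
      if (b >= a) ++b;
    }
    const size_t pick =
        dirs[candidates[b]].num_tablets < dirs[candidates[a]].num_tablets ? b : a;
    const size_t dir_idx = candidates[pick];
    group.push_back(dir_idx);
    ++dirs[dir_idx].num_tablets;
    // Remove the chosen directory from the candidate list.
    candidates[pick] = candidates.back();
    candidates.pop_back();
  }
  return group;
}
```

Because only two candidates are compared per slot, the emptiest
directory is not chosen every time, yet less-loaded directories still
win more often than loaded ones.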

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's superblock
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.

As this patch only addresses block placement, it does not itself
mitigate single-disk failure. Given this, and given the tradeoff
between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
37 files changed, 1,000 insertions(+), 175 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/15

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-01 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 14:

(24 comments)

http://gerrit.cloudera.org:8080/#/c/6636/12/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

Line 406:JoinStrings(paths_, ",")));
> Nit: indentation.
Done


Line 416: idx = i;
> Between this, emplace(), and FindOrNull, I count three accesses to group_by
An initial ContainsKey is warranted so as to avoid the work below; reduced to two accesses.


PS12, Line 421:  (idx > max_data_
> Nit: reserve kCamelCase notation for variables whose values never change; i
Done


PS12, Line 428: 
> Shouldn't this be combined with L427 in FindOrDie?
Restructured around keeping DataDirGroups immutable; only a single
InsertOrDie is needed (after the initial ContainsKey).


PS12, Line 432: s::OK();
> How about just 'e' for entry? And can't this be a const auto& iteration?
Done


Line 433: }
> Not an actual uuid.
Done


Line 449:   if (ContainsKey(group_by_tablet_map_, tablet_id)) {
> Won't this deadlock since it'll try to acquire dir_group_lock_ recursively?
Restructured and added test coverage
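
One common way to avoid this kind of recursive acquisition is to split
each operation into a public method that takes the lock and a private
"unlocked" helper that assumes the lock is already held. A minimal
sketch, assuming a non-recursive mutex; GroupRegistry and its members
are hypothetical names, not the DataDirManager API:

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

class GroupRegistry {
 public:
  // Registers a group for 'tablet_id'. Returns false if one already exists.
  bool CreateGroup(const std::string& tablet_id, std::vector<int> dir_indices) {
    assert(!dir_indices.empty());
    std::lock_guard<std::mutex> l(lock_);
    // Calling HasGroup() here would try to take lock_ a second time on the
    // same thread; the unlocked helper avoids that.
    if (HasGroupUnlocked(tablet_id)) return false;
    groups_.emplace(tablet_id, std::move(dir_indices));
    return true;
  }

  bool HasGroup(const std::string& tablet_id) const {
    std::lock_guard<std::mutex> l(lock_);
    return HasGroupUnlocked(tablet_id);
  }

 private:
  // Caller must hold lock_.
  bool HasGroupUnlocked(const std::string& tablet_id) const {
    return groups_.count(tablet_id) > 0;
  }

  mutable std::mutex lock_;
  std::map<std::string, std::vector<int>> groups_;
};
```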


PS12, Line 460: ize == 0 || use_all_
> Use empty() on the string instead.
Done


Line 513: }
> If the expectation is that dir_group_lock_ is held while calling this metho
Done


Line 557:   for (auto uuid_and_dir_ptr : data_dir_by_uuid_idx_) {
> Nit: indentation.
Done


http://gerrit.cloudera.org:8080/#/c/6636/12/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

Line 54: 
> Since this isn't used outside of data_dirs.cc, perhaps we can declare it th
Done


Line 81:   // If 'mode' is EXPIRED_ONLY, performs the test only if the dir was last
> Would it be possible to enforce that DataDirGroups are immutable once creat
Done


Line 86:   // reflected via is_full().
> It occurred to me that these aren't actually UUIDs; they're indexes into th
Done


PS12, Line 209:   // 'uuid_idx' is an output denoting which uuid_idx should be added to
  :   // 'group'. Returns false if the group is already larger than the limit or if
  :   // all candidates are full.
  :   //
> Comment needs to be updated now that data dir groups are private to the Dat
Done


Line 215:   // 'uuid_idx' to 'group') warrants that these calls fall within the scope of
> This appears to always be followed up with a ToPB conversion. Perhaps it wo
Done


Line 249: 
> I think it would be cleaner if 'group' was an IN parameter rather than IN/O
Done


http://gerrit.cloudera.org:8080/#/c/6636/12/src/kudu/fs/fs.proto
File src/kudu/fs/fs.proto:

Line 113: // Tablet data are spread across a specified number of data directories. The
> These are UUID indexes, right? Not UUIDs? Could draw the connection to uuid
Done, good pointer to PathSetPB.


http://gerrit.cloudera.org:8080/#/c/6636/12/src/kudu/tablet/delta_compaction.h
File src/kudu/tablet/delta_compaction.h:

Line 119:   // The ID of the tablet being compacted.
> How about, "The ID of the tablet being compacted"?
Done


http://gerrit.cloudera.org:8080/#/c/6636/12/src/kudu/tablet/metadata.proto
File src/kudu/tablet/metadata.proto:

PS12, Line 128: a new
> data directories
Done


http://gerrit.cloudera.org:8080/#/c/6636/12/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

Line 60: 
> warning: using decl 'CreateBlockOptions' is unused [misc-unused-using-decls
Done


Line 96: fs_manager->dd_manager()->DeleteDataDirGroup(tablet_id);
> If the Flush() fails, should we delete the created DataDirGroup?
Yep, without a functioning TabletMetadata, tablet creation won't proceed.


Line 196:   // Keep a copy of the old data dir group in case of flush failure.
> What state are we left in if this is called but Flush() below fails?
Good catch, if we fail, we have to roll-back the DataDirManager state.
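
A rollback like this can be expressed with a scope guard that undoes
the in-memory change unless it is explicitly cancelled after a
successful flush. The sketch below is an assumption-laden illustration:
ScopedCleanup, CreateTabletMetadata, and the 'flush' callback are
hypothetical stand-ins, not Kudu's own utilities or signatures.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Minimal scope guard: runs 'fn' at scope exit unless cancel() was called.
class ScopedCleanup {
 public:
  explicit ScopedCleanup(std::function<void()> fn) : fn_(std::move(fn)) {}
  ~ScopedCleanup() {
    if (fn_) fn_();
  }
  ScopedCleanup(const ScopedCleanup&) = delete;
  ScopedCleanup& operator=(const ScopedCleanup&) = delete;
  void cancel() { fn_ = nullptr; }

 private:
  std::function<void()> fn_;
};

// Registers a group for 'tablet_id', then flushes. If the flush fails, the
// guard removes the group again so no stale state is left behind.
bool CreateTabletMetadata(std::map<std::string, int>& groups,
                          const std::string& tablet_id,
                          const std::function<bool()>& flush) {
  assert(groups.count(tablet_id) == 0);
  groups[tablet_id] = 1;
  ScopedCleanup rollback([&] { groups.erase(tablet_id); });
  if (!flush()) return false;  // guard fires and erases the group
  rollback.cancel();           // success: keep the group
  return true;
}
```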


http://gerrit.cloudera.org:8080/#/c/6636/12/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

Line 481: // TODO(unknown): Because we begin shutdown of the tablet after we check our
> warning: missing username/bug in TODO [google-readability-todo]
Done


http://gerrit.cloudera.org:8080/#/c/6636/9/src/kudu/tserver/ts_tablet_manager.h
File src/kudu/tserver/ts_tablet_manager.h:

Line 188:  tablet::TabletDataState delete_type,
> You changed the variable reference in the comment, but not its actual name.
Done



Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro 

[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-01 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#14).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's superblock
will not have a DataDirGroup. One will be generated containing all data
directories, as the tablet's data may already be spread across any
number of disks.

As this patch only addresses block placement, it does not itself
mitigate single-disk failure. Given this, and given the tradeoff between
I/O and disk-failure tolerance, the default behavior will be to
spread tablet data across all disks.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
35 files changed, 696 insertions(+), 168 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/14

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-04-25 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#12).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's superblock
will not have a DataDirGroup. One will be generated containing all data
directories, as the tablet's data may already be spread across any
number of disks.

As this patch only addresses block placement, it does not itself
mitigate single-disk failure. Given this, and given the tradeoff between
I/O and disk-failure tolerance, the default behavior will be to
spread tablet data across all disks.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
35 files changed, 660 insertions(+), 167 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/12

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-04-25 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#11).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's superblock
will not have a DataDirGroup. One will be generated containing all data
directories, as the tablet's data may already be spread across any
number of disks.

As this patch only addresses block placement, it does not itself
mitigate single-disk failure. Given this, and given the tradeoff between
I/O and disk-failure tolerance, the default behavior will be to
spread tablet data across all disks.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
35 files changed, 660 insertions(+), 167 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/11

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-04-25 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 10:

(37 comments)

http://gerrit.cloudera.org:8080/#/c/6636/9/src/kudu/fs/block_manager.h
File src/kudu/fs/block_manager.h:

Line 28: #include "kudu/gutil/ref_counted.h"
> Don't need; just forward declare DataDirManager.
Done


PS9, Line 160: This is 
> No need for "For now" type language; it doesn't really add content and it's
Done


PS9, Line 253:   virtual DataDirManager* dd_manager() = 0;
 : };
 : 
> I don't think a simple accessor like this needs a broad justification. Some
Done


Line 256: // Closes a group of blocks.
> Can be a const method, right?
The DataDirManager is owned by the block managers, so const wouldn't work here 
unless we returned a const pointer.


http://gerrit.cloudera.org:8080/#/c/6636/9/src/kudu/fs/data_dir_group.h
File src/kudu/fs/data_dir_group.h:

> I think the content here can be merged into data_dirs.{h,cc}, those two are
Done. Moved to data_dirs.h


Line 29
> Merits some documentation.
Done


Line 30
> Nit: indent by one char (private too).
Done


PS9, Line 32: 
> Can use {} instead?
Done


Line 44
> Seems more appropriate as a DCHECK, unless you think there's a lack of test
Done


http://gerrit.cloudera.org:8080/#/c/6636/9/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

PS9, Line 171: ricEntity> metric_
> I don't think this is appropriate here. We're not creating a block, so the 
This was intended to support disk classes. I suppose it's not needed,
depending on if/how disk classes get implemented.


PS9, Line 183: 
> Also inappropriate here (though admittedly the case is weaker; we generally
Here as well, could be useful in the future were we to want to get the next 
data dir for a bloomfile for a given tablet, but you're right that it's 
unnecessary at this point.
I think here more than CreateDataDirGroup, it'd be important to keep this here 
for the future, since it's only being used for block placement.


http://gerrit.cloudera.org:8080/#/c/6636/9/src/kudu/fs/file_block_manager.h
File src/kudu/fs/file_block_manager.h:

Line 65: // When creating blocks, the block manager will place blocks based on the
> Maybe reword this to be more vague, so that if the directory selection poli
Done


http://gerrit.cloudera.org:8080/#/c/6636/9/src/kudu/fs/fs.proto
File src/kudu/fs/fs.proto:

Line 113: // Tablet data are spread across a specified number of data directories. The
> Should document the new message itself too.
Done


Line 116:   // List of data directory's UUIDs. Must not be empty.
> Do you anticipate wanting more per-data dir state in the future? If so, it 
Per-data dir state will be useful in specifying that certain data dirs are of a 
certain disk class, for instance, but I don't think _that_ metadata needs to be 
stored in the data dir group, since there are more instances of this uuid than 
there are data dirs.

I think it'd be sufficient to separate a DataDirPB and DataDirGroupPB, with 
just the uuids acting as a sort of foreign key between the two, when we do add 
that extra state.


http://gerrit.cloudera.org:8080/#/c/6636/9/src/kudu/fs/fs_manager.cc
File src/kudu/fs/fs_manager.cc:

PS9, Line 587: 
> Could just be CreateBlockOptions().
As per the other comment, I'm taking out this codepath.


http://gerrit.cloudera.org:8080/#/c/6636/9/src/kudu/fs/fs_manager.h
File src/kudu/fs/fs_manager.h:

Line 135: std::unique_ptr* block);
> Nit: indentation.
Done


Line 139: 
> Do we actually need to preserve this code path? How painful would it be to 
Ah, I'd originally preserved this because default behavior was a bit different. 
Each test _needed_ something specified by the opts, so each test had to 
manually do a CreateDataDirGroup before calling CreateNewBlock.

Now it should be fine to just use the default constructor.


http://gerrit.cloudera.org:8080/#/c/6636/9/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

PS9, Line 105: blocks for diskrowsets
> Is there any other kind of block? Doesn't every block belong to a diskrowse
Right, should be i.e. instead of e.g., or removed altogether.


Line 215:   // Returns a container appropriate for the given CreateBlockOptions, creating
> Like in file block manager, perhaps it'd be better to be vague here so that
Done


http://gerrit.cloudera.org:8080/#/c/6636/9/src/kudu/tablet/delta_compaction.h
File src/kudu/tablet/delta_compaction.h:

Line 27: #include "kudu/tablet/compaction.h"
> No longer needed?
Done


Line 63:   Status Compact();
> See my earlier feedback: was hoping we could pass (and store_ the tablet_id
Done


http://gerrit.cloudera.org:8080/#/c/6636/9/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

Line 43: using fs::WritableBlock;
> Should come before ReadableBlock.
Done



[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-04-25 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#10).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's superblock
will not have a DataDirGroup. One will be generated containing all data
directories, as the tablet's data may already be spread across any
number of disks.

As this patch only addresses block placement, it does not itself
mitigate single-disk failure. Given this, and given the tradeoff between
I/O and disk-failure tolerance, the default behavior will be to
spread tablet data across all disks.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
35 files changed, 660 insertions(+), 167 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/10

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-04-24 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 9:

(37 comments)

I reviewed everything but the data_dir changes and the new test. Mostly style 
stuff.

http://gerrit.cloudera.org:8080/#/c/6636/9/src/kudu/fs/block_manager.h
File src/kudu/fs/block_manager.h:

Line 28: #include "kudu/fs/data_dirs.h"
Don't need; just forward declare DataDirManager.


PS9, Line 160: For now,
No need for "For now" type language; it doesn't really add content and it's 
just one more thing that will need to be updated (or go stale, if no one 
notices) when the struct grows a new field.

The "In the future" stuff is OK though, since that adds useful information.


PS9, Line 253:   // Exposes an interface for the DataDirManager, granting other entities, like
 :   // the FsManager, the ability to manage data dir groups, which is a function
 :   // of each tablet.
I don't think a simple accessor like this needs a broad justification. 
Something as simple as "Exposes the underlying DataDirManager" can suffice.


Line 256:   virtual DataDirManager* dd_manager() = 0;
Can be a const method, right?


http://gerrit.cloudera.org:8080/#/c/6636/9/src/kudu/fs/data_dir_group.h
File src/kudu/fs/data_dir_group.h:

I think the content here can be merged into data_dirs.{h,cc}, those two aren't 
massive yet.


Line 29: class DataDirGroup {
Merits some documentation.


Line 30: public:
Nit: indent by one char (private too).


PS9, Line 32: std::vector()
Can use {} instead?


Line 44: CHECK(pb != nullptr);
Seems more appropriate as a DCHECK, unless you think there's a lack of test 
coverage for this path?


http://gerrit.cloudera.org:8080/#/c/6636/9/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

PS9, Line 171: CreateBlockOptions
I don't think this is appropriate here. We're not creating a block, so the use 
of CreateBlockOptions confuses more than helps.

If you need a tablet_id, pass a tablet_id.


PS9, Line 183: CreateBlockOptions
Also inappropriate here (though admittedly the case is weaker; we generally are 
trying to create a block when we call this function).


http://gerrit.cloudera.org:8080/#/c/6636/9/src/kudu/fs/file_block_manager.h
File src/kudu/fs/file_block_manager.h:

Line 65: // When creating blocks, the block manager will select a random directory in
Maybe reword this to be more vague, so that if the directory selection policy 
changes further in the future, this comment won't need to be updated?


http://gerrit.cloudera.org:8080/#/c/6636/9/src/kudu/fs/fs.proto
File src/kudu/fs/fs.proto:

Line 113: message DataDirGroupPB {
Should document the new message itself too.


Line 116:   repeated uint32 uuids = 1;
Do you anticipate wanting more per-data dir state in the future? If so, it 
might be better future proofing to define a DataDirPB message right now (with 
just the uuid as a field in it).


http://gerrit.cloudera.org:8080/#/c/6636/9/src/kudu/fs/fs_manager.cc
File src/kudu/fs/fs_manager.cc:

PS9, Line 587: CreateBlockOptions({""})
Could just be CreateBlockOptions().


http://gerrit.cloudera.org:8080/#/c/6636/9/src/kudu/fs/fs_manager.h
File src/kudu/fs/fs_manager.h:

Line 135:std::unique_ptr* block);
Nit: indentation.


Line 139:   // This should only be used in tests.
Do we actually need to preserve this code path? How painful would it be to get 
rid of it?


http://gerrit.cloudera.org:8080/#/c/6636/9/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

PS9, Line 105: blocks for diskrowsets
Is there any other kind of block? Doesn't every block belong to a diskrowset in 
some form or another?


Line 215:   // Returns a randomly-selected container within the data directory group
Like in file block manager, perhaps it'd be better to be vague here so that the 
comment doesn't become stale were the selection policy to change again.


http://gerrit.cloudera.org:8080/#/c/6636/9/src/kudu/tablet/delta_compaction.h
File src/kudu/tablet/delta_compaction.h:

Line 27: #include "kudu/fs/block_manager.h"
No longer needed?


Line 63:   Status Compact(const std::string& tablet_id);
See my earlier feedback: I was hoping we could pass (and store) the tablet_id in
the constructor and avoid plumbing it through here (or in the private methods).
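
Roughly the shape being suggested, as a hedged sketch. CompactionSketch and OpenNewBlock are hypothetical names invented for illustration, not the real classes or methods in delta_compaction.h.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, heavily simplified stand-in for a compaction class.
class CompactionSketch {
 public:
  // The tablet ID is supplied once, at construction time.
  explicit CompactionSketch(std::string tablet_id)
      : tablet_id_(std::move(tablet_id)) {}

  // No tablet_id parameter: the method reads the stored member instead.
  std::string Compact() const { return OpenNewBlock(); }

 private:
  // Stand-in for the spot where the tablet ID is needed when creating a block.
  std::string OpenNewBlock() const { return "block for tablet " + tablet_id_; }

  const std::string tablet_id_;
};
```

With this shape, neither Compact() nor the private helpers need an extra argument.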


http://gerrit.cloudera.org:8080/#/c/6636/9/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

Line 43: using fs::CreateBlockOptions;
Should come before ReadableBlock.


http://gerrit.cloudera.org:8080/#/c/6636/9/src/kudu/tablet/diskrowset.cc
File src/kudu/tablet/diskrowset.cc:

Line 67: using fs::CreateBlockOptions;
Before Scoped...


http://gerrit.cloudera.org:8080/#/c/6636/9/src/kudu/tablet/metadata.proto
File src/kudu/tablet/metadata.proto:

PS9, Line 126:  The
 :   // group is represented by the UUIDs of the data directories it consists of.
This part belongs to 

[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-04-21 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#9).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.
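
A minimal sketch of random placement within a group, assuming a group is just a list of directory UUIDs. PickDirForNewBlock is a hypothetical helper, not a function from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Returns a uniformly random directory UUID from a non-empty group.
// (Hypothetical helper; a real block manager would presumably also have to
// skip full directories, which this sketch ignores.)
uint32_t PickDirForNewBlock(const std::vector<uint32_t>& group,
                            std::mt19937* rng) {
  assert(!group.empty());
  std::uniform_int_distribution<size_t> dist(0, group.size() - 1);
  return group[dist(*rng)];
}
```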

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.
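
The selection step described above could be sketched as follows. This is an illustration under stated assumptions, not the patch's code: TabletCounts and CreateGroup are hypothetical names, directories are identified by integer UUIDs, and every directory is assumed to be available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <random>
#include <vector>

// Hypothetical bookkeeping: directory UUID -> number of tablets placed on it.
using TabletCounts = std::map<uint32_t, int>;

// Builds a group of up to `group_size` distinct directories. For each slot,
// two not-yet-chosen candidates are sampled at random and the one hosting
// fewer tablets is kept; its count is then incremented.
std::vector<uint32_t> CreateGroup(TabletCounts* counts, size_t group_size,
                                  std::mt19937* rng) {
  std::vector<uint32_t> candidates;
  for (const auto& entry : *counts) candidates.push_back(entry.first);

  std::vector<uint32_t> group;
  while (group.size() < group_size && !candidates.empty()) {
    std::uniform_int_distribution<size_t> first(0, candidates.size() - 1);
    const size_t a = first(*rng);
    size_t pick = a;
    if (candidates.size() > 1) {
      // Draw a second index that is guaranteed to differ from the first.
      std::uniform_int_distribution<size_t> second(0, candidates.size() - 2);
      size_t b = second(*rng);
      if (b >= a) ++b;
      if ((*counts)[candidates[b]] < (*counts)[candidates[a]]) pick = b;
    }
    const uint32_t chosen = candidates[pick];
    group.push_back(chosen);
    ++(*counts)[chosen];
    candidates.erase(candidates.begin() + static_cast<std::ptrdiff_t>(pick));
  }
  return group;
}
```

Because only two candidates are compared per slot, the emptiest directory is favored but not always chosen, which matches the stated goal of trending towards underloaded disks without pigeonholing every new tablet onto them.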

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's superblock
will not have a DataDirGroup. One will be generated containing all data
directories, as the tablet's data may already be spread across any
number of disks.
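
The fallback for tablets written by an older version could look roughly like this. TabletMetaSketch and LoadOrDefaultGroup are hypothetical stand-ins for illustration; the real superblock is a protobuf and is not modeled here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the persisted tablet metadata.
struct TabletMetaSketch {
  bool has_group = false;             // false for data from an older version
  std::vector<uint32_t> group_uuids;  // meaningful only when has_group is true
};

// Returns the recorded group, or every known directory when none was
// recorded, since older data may live on any of them.
std::vector<uint32_t> LoadOrDefaultGroup(const TabletMetaSketch& meta,
                                         const std::vector<uint32_t>& all_dirs) {
  return meta.has_group ? meta.group_uuids : all_dirs;
}
```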

As this patch only addresses block placement, it does not itself
mitigate the effects of single-disk failure. Given this, and given the
tradeoff between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dir_group.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
28 files changed, 675 insertions(+), 164 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-04-21 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#8).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's superblock
will not have a DataDirGroup. One will be generated containing all data
directories, as the tablet's data may already be spread across any
number of disks.

As this patch only addresses block placement, it does not itself
mitigate the effects of single-disk failure. Given this, and given the
tradeoff between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dir_group.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
28 files changed, 675 insertions(+), 164 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-04-21 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#7).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's superblock
will not have a DataDirGroup. One will be generated containing all data
directories, as the tablet's data may already be spread across any
number of disks.

As this patch only addresses block placement, it does not itself
mitigate the effects of single-disk failure. Given this, and given the
tradeoff between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dir_group.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
28 files changed, 675 insertions(+), 164 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-04-20 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#6).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's superblock
will not have a DataDirGroup. One will be generated containing all data
directories, as the tablet's data may already be spread across any
number of disks.

As this patch only addresses block placement, it does not itself
mitigate the effects of single-disk failure. Given this, and given the
tradeoff between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dir_group.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
28 files changed, 688 insertions(+), 165 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon