Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14654 )

Change subject: [KUDU-2975]Spread WAL across multiple directories
......................................................................


Patch Set 3:

(7 comments)

I left some overall feedback. I'm curious whether you and Adar agree or disagree. 
Basically, I think the concerns of the WALs and of the data are separate enough 
to warrant a separate class for the WALs.

Also, we should think about how we want to test this:
- test the "happy" case with no disk failures. We should make sure that we get 
multiple WAL directories when we create new tablets.
- test common upgrade cases. What happens if we have an old --fs_wal_dir=/wal 
but now we supply --fs_wal_dirs=/wal ? Is that OK? How about if we supply 
--fs_wal_dirs=/wal,/wal2 or --fs_wal_dirs=/wal1,wal2 ?
- test what happens when we are missing WAL directories
- test what happens when we have more than one WAL directory for a given tablet 
(should be covered by having the UUID in the tablet metadata, but we should add 
a test for it)

http://gerrit.cloudera.org:8080/#/c/14654/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14654/3//COMMIT_MSG@7
PS3, Line 7: [KUDU-2975]Spread WAL across multiple directories
nit: for consistency with the rest of our commits, format as

KUDU-2975: Spread WAL across multiple directories


http://gerrit.cloudera.org:8080/#/c/14654/3//COMMIT_MSG@8
PS3, Line 8:
           : Change-Id: Ied496804421d91ff1fa63d49979fde971071506e
Could you add a commit message explaining the design decisions made in this 
patch? Things like:
- This is reusing a new DataDirManager instance for WALs
- How (if at all) this handles missing WALs
- How (if at all) this handles failed WAL disks
- etc.

It's ok for some of the answers here to be that we don't handle certain edge 
cases yet, but it would make it easier to review if we know what to expect in 
this implementation.


http://gerrit.cloudera.org:8080/#/c/14654/3/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

http://gerrit.cloudera.org:8080/#/c/14654/3/src/kudu/fs/data_dirs.cc@723
PS3, Line 723:     if (opts_.dir_type == "data") {
             :       // Create a per-dir thread pool.
This style of checking makes the code more difficult to understand and 
maintain, which is why I think the WalDirManager should be its own class. It 
seems like the WalDirManager would only need to implement a subset of the 
functionality of the DataDirManager.

It doesn't have to worry about:
* per-data-dir work done in threadpools
* anything related to groups of directories (since there's only one WAL 
directory per tablet)
* returning a subset of a directory group (like GetDirForBlock())
* I don't think we need to worry about checking the path instance UUIDs. This 
seemed important for data because tablet data could be spread across multiple 
data directories, so opening with a subset of directories could result in 
scenarios where some blocks exist and some don't. This has gotten even better 
with data directory disk failure handling. With WALs, if we open with a subset 
of the directories, I think it might not be unreasonable to simply fail any 
tablet that is missing its WAL directory. On the other hand, I could see the 
check being useful for ensuring there are no errors in the supplied gflags, so 
I could be convinced that it is important.

That means we _do_ have to worry about:
* a mapping from tablet ID to WAL directory index
* a mapping from WAL directory index to a set of tablet IDs
* registering a single WAL directory when creating a new tablet
* marking a WAL directory as failed

But I believe the implementation for these will be simpler because each tablet 
will only have a single WAL directory.

I'm curious whether you and Adar agree on this. I can be convinced that this 
approach is OK too, if there are good reasons for keeping DataDirManager and 
WALs tightly coupled like this.


http://gerrit.cloudera.org:8080/#/c/14654/3/src/kudu/tablet/metadata.proto
File src/kudu/tablet/metadata.proto:

http://gerrit.cloudera.org:8080/#/c/14654/3/src/kudu/tablet/metadata.proto@150
PS3, Line 150:   // Tablet WAL is spread across multi directories. If this is not set,
             :   // it is assumed that the WAL is from a version of Kudu before 1.10.0.
             :   // In this case, WAL will be created spanning only one directory.
             :   optional DataDirGroupPB wal_dir_group = 19;
Do we actually want to use a group? I thought we would only want to use a 
single WAL directory per tablet, given the Log class is single-threaded per 
tablet.


http://gerrit.cloudera.org:8080/#/c/14654/3/src/kudu/tserver/tablet_server-test.cc
File src/kudu/tserver/tablet_server-test.cc:

http://gerrit.cloudera.org:8080/#/c/14654/3/src/kudu/tserver/tablet_server-test.cc@2453
PS3, Line 2453:     const int kLimit = rand() % (kMaxLimit + 1) + 1;
Why did this change?


http://gerrit.cloudera.org:8080/#/c/14654/3/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14654/3/src/kudu/tserver/ts_tablet_manager.cc@1471
PS3, Line 1471:   // This place is not sure if there is a better way to modify it.
              :   meta->fs_manager()->wd_manager()->DeleteDataDirGroup(tablet_id);
              :   RETURN_NOT_OK(meta->Flush());
Rather than including this here (and doing another Flush()), could we move the 
log deletion up to before we delete the metadata?
Then we could put this into the metadata->DeleteTabletData() call.

An edge case to think about: what happens if the WAL deletion succeeds but the 
metadata deletion fails?

Before, that would have been OK: if the data deletion failed, we could assume 
the data state was OK, and we hadn't deleted our WALs. If we move this up, we 
might end up deleting our WALs but failing to delete the data. That seems OK, 
though, because we should be able to handle the case where we don't have a WAL 
but do have data/metadata -- we should just fail the tablet.


http://gerrit.cloudera.org:8080/#/c/14654/3/src/kudu/tserver/ts_tablet_manager.cc@1508
PS3, Line 1508: void TSTabletManager::FailTabletsInDataDir(const string& uuid) {
              :   DataDirManager* dir_manager = nullptr;
              :   int uuid_idx;
              :   if (fs_manager_->dd_manager()->FindUuidIndexByUuid(uuid, &uuid_idx)) {
              :     dir_manager = fs_manager_->dd_manager();
              :   } else if (fs_manager_->wd_manager()->FindUuidIndexByUuid(uuid, &uuid_idx)) {
              :     dir_manager = fs_manager_->wd_manager();
              :   } else {
              :     LOG(FATAL) << Substitute("No directory found with UUID $0", uuid);
              :   }
              :   if (dir_manager->IsDataDirFailed(uuid_idx)) {
              :     LOG(WARNING) << "Data directory is already marked failed.";
              :     return;
              :   }
              :   // Fail the directory to prevent other tablets from being placed in it.
              :   dir_manager->MarkDataDirFailed(uuid_idx);
              :   set<string> tablets = dir_manager->FindTabletsByDataDirUuidIdx(uuid_idx);
              :   LOG(INFO) << Substitute("Data dir $0 has $1 tablets", uuid, tablets.size());
              :   for (const string& tablet_id : dir_manager->FindTabletsByDataDirUuidIdx(uuid_idx)) {
              :     FailTabletAndScheduleShutdown(tablet_id);
              :   }
              : }
Rather than reusing this function exactly, I'd prefer we NOT tightly couple 
the data-dir and WAL-dir handling. Maybe separate this out into a

 TSTabletManager::FailTabletsInWalDir(const string& uuid)

If we did that, we could keep a single FsErrorManager, right?



--
To view, visit http://gerrit.cloudera.org:8080/14654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied496804421d91ff1fa63d49979fde971071506e
Gerrit-Change-Number: 14654
Gerrit-PatchSet: 3
Gerrit-Owner: YangSong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: YangSong <[email protected]>
Gerrit-Comment-Date: Fri, 08 Nov 2019 02:17:12 +0000
Gerrit-HasComments: Yes
