YangSong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14654 )
Change subject: [KUDU-2975]Spread WAL across multiple directories ...................................................................... Patch Set 3: (2 comments) This modification is general. So I haven't written test code yet. And there are still these problems. 1. during switch "fs_wal_dir" to "fs_wal_dirs", not compatible. 2. WAL disk failure not support. 3. Tool about wal not modify. 4. Lack tests. I agree that the implementation for WAL manager is much simpler than data manager. They do almost same things in initialization, make dir and create 'block_manager_instance'. This is their biggest commonality. And I find some thing wrong in DataDirManager::Create(). Having a separate class for WAL manager is my original idea. But now I'm not sure which method is better. http://gerrit.cloudera.org:8080/#/c/14654/3/src/kudu/tserver/tablet_server-test.cc File src/kudu/tserver/tablet_server-test.cc: http://gerrit.cloudera.org:8080/#/c/14654/3/src/kudu/tserver/tablet_server-test.cc@2453 PS3, Line 2453: const int kLimit = rand() % (kMaxLimit + 1) + 1; > Why did this change? Here 'kMaxLimit' may be 0, then an exception reported like "Floating point exception". http://gerrit.cloudera.org:8080/#/c/14654/3/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/14654/3/src/kudu/tserver/ts_tablet_manager.cc@1508 PS3, Line 1508: void TSTabletManager::FailTabletsInDataDir(const string& uuid) { : DataDirManager* dir_manager = nullptr; : int uuid_idx; : if (fs_manager_->dd_manager()->FindUuidIndexByUuid(uuid, &uuid_idx)) { : dir_manager = fs_manager_->dd_manager(); : } else if (fs_manager_->wd_manager()->FindUuidIndexByUuid(uuid, &uuid_idx)) { : dir_manager = fs_manager_->wd_manager(); : } else { : LOG(FATAL) << Substitute("No directory found with UUID $0", uuid); : } : if (dir_manager->IsDataDirFailed(uuid_idx)) { : LOG(WARNING) << "Data directory is already marked failed."; : return; : } : // Fail the directory to prevent other tablets from being placed in it. : dir_manager->MarkDataDirFailed(uuid_idx); : set<string> tablets = dir_manager->FindTabletsByDataDirUuidIdx(uuid_idx); : LOG(INFO) << Substitute("Data dir $0 has $1 tablets", uuid, tablets.size()); : for (const string& tablet_id : dir_manager->FindTabletsByDataDirUuidIdx(uuid_idx)) { : FailTabletAndScheduleShutdown(tablet_id); : } : } > Rather than reusing this exactly, I'd prefer we NOT tightly couple the hand Here I still don't understand how to keep a single FsErrorManager. In the initialization we have two FsErrorManager for disk error, the error handle is MarkDataDirFailedByUuid. After starting we have just one FsErrorManager, the error handle is FailTabletsInDataDir. Here uuid maybe in two managers, WAL manager and data manager. -- To view, visit http://gerrit.cloudera.org:8080/14654 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ied496804421d91ff1fa63d49979fde971071506e Gerrit-Change-Number: 14654 Gerrit-PatchSet: 3 Gerrit-Owner: YangSong <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: YangSong <[email protected]> Gerrit-Comment-Date: Fri, 08 Nov 2019 03:54:51 +0000 Gerrit-HasComments: Yes
