YangSong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14920 )
Change subject: KUDU-2975: Spread WAL across multiple directories ...................................................................... Patch Set 5: (6 comments) http://gerrit.cloudera.org:8080/#/c/14920/5/src/kudu/fs/wal_dirs.h File src/kudu/fs/wal_dirs.h: http://gerrit.cloudera.org:8080/#/c/14920/5/src/kudu/fs/wal_dirs.h@101 PS5, Line 101: // Deserializes a WalDirPB and associates the resulting WalDirPB : // with a tablet_id. : // : // Returns an error if the tablet already exists or if a WAL dir is missing. : Status LoadWalDirFromPB(const std::string& tablet_id, : const WalDirPB& pb); : : // Serializes the WalDirPB associated with the given tablet_id. : // : // Returns an error if the tablet was not already registered or if a WAL dir : // is missing. : Status GetWalDirPB(const std::string& tablet_id, WalDirPB* pb) const; > Could these be private? These functions are used in tablet_metadata.cc, like DataDirManager::LoadDataDirGroupFromPB and DataDirManager::GetDataDirGroupPB. They need be public. http://gerrit.cloudera.org:8080/#/c/14920/5/src/kudu/fs/wal_dirs.h@148 PS5, Line 148: FindWalDirByTabletId > nit: could you rename this "FindWalDirFromDiskByTabletId()" or something si This function is used to look for WAL dir by tablet id in memory, not looking on disk. http://gerrit.cloudera.org:8080/#/c/14920/5/src/kudu/fs/wal_dirs.cc File src/kudu/fs/wal_dirs.cc: http://gerrit.cloudera.org:8080/#/c/14920/5/src/kudu/fs/wal_dirs.cc@268 PS5, Line 268: for (const auto& wd : dirs_) { : string tablet_path = JoinPathSegments(wd->dir(), tablet_id); : tablet_path.append(suffix); : if (env_->FileExists(tablet_path)) { : *wal_dir = tablet_path; : return Status::OK(); : } : } > Shouldn't this return an error if there is more than one directory that has This function is used to find the tablet WAL directory on disk when we delete the tablet and we cannot find the WAL directory in memory. If there is more than one directory on disk, maybe we should return all of them, then delete them. http://gerrit.cloudera.org:8080/#/c/14920/5/src/kudu/fs/wal_dirs.cc@318 PS5, Line 318: target_uuid_idx > nit: maybe use our maps to determine the path? Do you mean refer to the implementation of data dir? determine the path like "DataDirManager::CreateDataDirGroup()"? The way I understand is to select the non-full directories first, then sort them randomly, compare the first two directories, pick one with fewer tablets. If them have the same number of tables, pick the one with the more free space. http://gerrit.cloudera.org:8080/#/c/14920/5/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/14920/5/src/kudu/tserver/ts_tablet_manager.cc@1465 PS5, Line 1465: string wal_dir; : string wal_recovery_dir; : WARN_NOT_OK(meta->fs_manager()->GetTabletWalDir(tablet_id, &wal_dir), : Substitute("Could not find WAL directory for tablet $0", tablet_id)); : WARN_NOT_OK(meta->fs_manager()->GetTabletWalRecoveryDir( : tablet_id, &wal_recovery_dir), Substitute("Could not find WAL recovery directory" : "for tablet $0", tablet_id)); > Why only WARN here? If a failure of the WAL directory is considered fatal b The WAL directory may not exist, there is a case, if metadata deletion succeed but log deletion failed, we cannot get the tablet's dir, but Log::DeleteOnDiskData() also can delete it by finding the tablet's dir from disk. So I think here just need a warning. http://gerrit.cloudera.org:8080/#/c/14920/5/src/kudu/util/env_util.cc File src/kudu/util/env_util.cc: http://gerrit.cloudera.org:8080/#/c/14920/5/src/kudu/util/env_util.cc@373 PS5, Line 373: Status s = env->IsDirectory(JoinPathSegments(path, *iter), &is_dir); > Why not RETURN_NOT_OK() here? Why should we be permissive of errors? This function is used to statistics tablets under "wal" directory, here we should use RETURN_NOT_OK(), we cannot return partial tablets. -- To view, visit http://gerrit.cloudera.org:8080/14920 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3490b2bbc0544e7c8a5f987fb83855ff155ac2c Gerrit-Change-Number: 14920 Gerrit-PatchSet: 5 Gerrit-Owner: YangSong <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: YangSong <[email protected]> Gerrit-Comment-Date: Wed, 08 Jan 2020 07:11:31 +0000 Gerrit-HasComments: Yes
