YangSong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14920 )

Change subject: KUDU-2975: Spread WAL across multiple directories
......................................................................


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/14920/5/src/kudu/fs/wal_dirs.h
File src/kudu/fs/wal_dirs.h:

http://gerrit.cloudera.org:8080/#/c/14920/5/src/kudu/fs/wal_dirs.h@101
PS5, Line 101:   // Deserializes a WalDirPB and associates the resulting 
WalDirPB
             :   // with a tablet_id.
             :   //
             :   // Returns an error if the tablet already exists or if a WAL 
dir is missing.
             :   Status LoadWalDirFromPB(const std::string& tablet_id,
             :                           const WalDirPB& pb);
             :
             :   // Serializes the WalDirPB associated with the given tablet_id.
             :   //
             :   // Returns an error if the tablet was not already registered 
or if a WAL dir
             :   // is missing.
             :   Status GetWalDirPB(const std::string& tablet_id, WalDirPB* pb) 
const;
> Could these be private?
These functions are used in tablet_metadata.cc, like 
DataDirManager::LoadDataDirGroupFromPB and
DataDirManager::GetDataDirGroupPB. They need be public.


http://gerrit.cloudera.org:8080/#/c/14920/5/src/kudu/fs/wal_dirs.h@148
PS5, Line 148: FindWalDirByTabletId
> nit: could you rename this "FindWalDirFromDiskByTabletId()" or something si
This function is used to look for WAL dir by tablet id in memory, not looking 
on disk.


http://gerrit.cloudera.org:8080/#/c/14920/5/src/kudu/fs/wal_dirs.cc
File src/kudu/fs/wal_dirs.cc:

http://gerrit.cloudera.org:8080/#/c/14920/5/src/kudu/fs/wal_dirs.cc@268
PS5, Line 268: for (const auto& wd : dirs_) {
             :     string tablet_path = JoinPathSegments(wd->dir(), tablet_id);
             :     tablet_path.append(suffix);
             :     if (env_->FileExists(tablet_path)) {
             :       *wal_dir = tablet_path;
             :       return Status::OK();
             :     }
             :   }
> Shouldn't this return an error if there is more than one directory that has
This function is used to find the tablet WAL directory on disk when we delete 
the tablet and we cannot find the WAL directory in memory. If there is more 
than one directory on disk, maybe we should return all of them, then delete 
them.


http://gerrit.cloudera.org:8080/#/c/14920/5/src/kudu/fs/wal_dirs.cc@318
PS5, Line 318: target_uuid_idx
> nit: maybe use our maps to determine the path?
Do you mean refer to the implementation of data dir? determine the path like 
"DataDirManager::CreateDataDirGroup()"?
The way I understand is to select the non-full directories first, then sort 
them randomly, compare the first two directories, pick one with fewer tablets. 
If them have the same number of tables, pick the one with the more free space.


http://gerrit.cloudera.org:8080/#/c/14920/5/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14920/5/src/kudu/tserver/ts_tablet_manager.cc@1465
PS5, Line 1465:   string wal_dir;
              :   string wal_recovery_dir;
              :   WARN_NOT_OK(meta->fs_manager()->GetTabletWalDir(tablet_id, 
&wal_dir),
              :               Substitute("Could not find WAL directory for 
tablet $0", tablet_id));
              :   WARN_NOT_OK(meta->fs_manager()->GetTabletWalRecoveryDir(
              :       tablet_id, &wal_recovery_dir), Substitute("Could not find 
WAL recovery directory"
              :       "for tablet $0", tablet_id));
> Why only WARN here? If a failure of the WAL directory is considered fatal b
The WAL directory may not exist, there is a case, if metadata deletion succeed 
but log deletion failed, we cannot get the tablet's dir, but 
Log::DeleteOnDiskData() also can delete it by finding the tablet's dir from 
disk. So I think here just need a warning.


http://gerrit.cloudera.org:8080/#/c/14920/5/src/kudu/util/env_util.cc
File src/kudu/util/env_util.cc:

http://gerrit.cloudera.org:8080/#/c/14920/5/src/kudu/util/env_util.cc@373
PS5, Line 373:     Status s = env->IsDirectory(JoinPathSegments(path, *iter), 
&is_dir);
> Why not RETURN_NOT_OK() here? Why should we be permissive of errors?
This function is used to statistics tablets under "wal" directory, here we 
should use RETURN_NOT_OK(), we cannot  return partial tablets.



--
To view, visit http://gerrit.cloudera.org:8080/14920
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3490b2bbc0544e7c8a5f987fb83855ff155ac2c
Gerrit-Change-Number: 14920
Gerrit-PatchSet: 5
Gerrit-Owner: YangSong <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: YangSong <[email protected]>
Gerrit-Comment-Date: Wed, 08 Jan 2020 07:11:31 +0000
Gerrit-HasComments: Yes

Reply via email to