YangSong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14654 )

Change subject: KUDU-2975: Spread WAL across multiple directories
......................................................................


Patch Set 22:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/fs_manager.cc
File src/kudu/fs/fs_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/fs_manager.cc@84
PS22, Line 84:
             : DEFINE_string(fs_wal_dir, "",
             :               "Directory with write-ahead logs. If this is not 
specified, the "
             :               "program will not start. May be the same as 
fs_data_dirs");
             : TAG_FLAG(fs_wal_dir, stable);
             : DEFINE_string(fs_wal_dirs, "",
             :               "Directory with write-ahead logs. If this is not 
specified, the "
             :               "program will not start. May be the same as 
fs_data_dirs");
             : TAG_FLAG(fs_wal_dirs, stable);
> Can you add a group flag validator like ValidateUpdateTabletStatsInterval i
I don't know how to define this place. I try to define it as:

bool ValidateFsWalDir() {
  bool has_a = !FLAGS_fs_wal_dir.empty();
  bool has_b = !FLAGS_fs_wal_dirs.empty();

  if (has_a == has_b) {
    LOG(ERROR) << "--fs_wal_dir and --fs_wal_dirs only one can be set.";
    return false;
  }
  return true;
}
GROUP_FLAG_VALIDATOR(fs_wal_dir, ValidateFsWalDir);

but I found It's not the concept "group", they are mutually exclusive. If I 
define like this, the many tests cannot run, such as fs_manager-test.


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/fs_manager.cc@735
PS22, Line 735: Status FsManager::GetOrphanedWalDir(const string& tablet_id, 
string* wal_dir) {
              :   if (wal_dir == nullptr) {
              :     return Status::InvalidArgument("wal_dir cannot be nullptr");
              :   }
              :   Status s = wd_manager_->FindTabletDirFromDisk(tablet_id, "", 
wal_dir);
              :   if (!s.ok()) {
              :     LOG(WARNING) << "cannot find tablet:" << tablet_id << " wal 
directory." <<
              :                  s.message().ToString();
              :   }
              :   return s;
              : }
              :
              : Status FsManager::GetOrphanedWalRecoveryDir(const string& 
tablet_id,
              :                                        string* 
wal_recovery_dir) {
              :   if (wal_recovery_dir == nullptr) {
              :     return Status::InvalidArgument("wal_recovery_dir cannot be 
nullptr");
              :   }
              :   Status s = wd_manager_->FindTabletDirFromDisk(tablet_id, 
kWalsRecoveryDirSuffix,
              :       wal_recovery_dir);
              :   if (!s.ok()) {
              :     LOG(WARNING) << "cannot find tablet:" << tablet_id << " wal 
directory." <<
              :                  s.message().ToString();
              :   }
              :   return s;
              : }
              :
> As written, I think these mean that:
Yes, because we delete metadata first then delete log when delete tablet, so 
maybe have some orphaned WAL dir. We should find this orphaned WAL dir then 
delete them at the next time it starts. When we determine if there is tablet 
WAL data, we look at the tablet's metadata first then look it up on the WAL 
dirs.


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc
File src/kudu/fs/wal_dirs.cc:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc@75
PS22, Line 75: DEFINE_bool(fs_wal_dirs_consider_available_space, true,
             :             "Whether to consider available space when selecting 
a wal "
             :             "directory during tablet or wal block creation.");
             : TAG_FLAG(fs_wal_dirs_consider_available_space, runtime);
             : TAG_FLAG(fs_wal_dirs_consider_available_space, evolving);
> This isn't used, is it?
Yes.


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc@605
PS22, Line 605: r *wal_
> nit: move the * so it's next to the type name, i.e.
OK


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc@614
PS22, Line 614:   if (other != nullptr) {
> nit: this can be written as:
OK


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc@616
PS22, Line 616: "tried to load directory for tablet $0 but one is already 
registered",
              :         tablet_id));
> nit: maybe also log the existing directory?
OK


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc@664
PS22, Line 664: postfix
> nit: we usually call this a "suffix"
OK


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc@675
PS22, Line 675:     if (!postfix.empty()) {
              :       tablet_path.append(postfix);
              :     }
> We don't need to condition on postfix.empty(). If it's empty, tablet_path.a
OK


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc@710
PS22, Line 710:     if (tablets_by_uuid.second.size() < min_tablets_num) {
              :       min_tablets_num = tablets_by_uuid.second.size();
              :       target_uuid = tablets_by_uuid.first;
              :     }
> The fact that we're always using the least-populated directory is different
Yes, The current algorithm is relatively simple, just make sure that each WAL 
directory has the same number of tablets.


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc@808
PS22, Line 808:
              : bool WalDirManager::IsWalDirFailed(const string& uuid) const {
              :   shared_lock<rw_spinlock> lock(dir_lock_.get_lock());
              :   return ContainsKey(failed_wal_dirs_, uuid);
              : }
> This isn't used, right? Same with MarkWalDirFailedByUuid
yes.MarkWalDirFailedByUuid is not used right now, may be used in the future in 
case of disk failure.


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tablet/tablet_metadata-test.cc
File src/kudu/tablet/tablet_metadata-test.cc:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tablet/tablet_metadata-test.cc@47
PS22, Line 47: class TestTabletMetadata : public KuduTabletTest {
             :  public:
             :   TestTabletMetadata()
             :       : KuduTabletTest(GetSimpleTestSchema()) {
             :   }
             :
             :   virtual void SetUp() OVERRIDE {
             :     KuduTabletTest::SetUp();
             :     writer_.reset(new LocalTabletWriter(harness_->tablet().get(),
             :                                         &client_schema_));
             :   }
             :
             :   void BuildPartialRow(int key, int intval, const char* strval,
             :                        gscoped_ptr<KuduPartialRow>* row);
             :
             :  protected:
             :   gscoped_ptr<LocalTabletWriter> writer_;
             : };
> TODO(awong): can we also add a test that, if we load a superblock that _doe
OK


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tablet/tablet_metadata.cc@463
PS22, Line 463:       // An error loading the WAL dir is non-fatal, it just 
means the
              :       // tablet will fail to bootstrap later.
> TODO(awong): make sure this is true
I will add some test.


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tools/kudu-tool-test.cc@5219
PS22, Line 5219:     // master use "fs_wal_dir", tserver use "fs_wal_dirs".
> Is there a good reason to do this here? I know that in a real deployment, w
In the ExternalMiniCluster, master keep using "fs_wal_dir", because master has 
only one tablet, using "fs_wal_dirs" is pointless. And tserver change to use 
"fs_wal_dirs". I will change the ExternalMiniCluster to always use fs_wal_dirs.


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tools/tool_action_fs.cc
File src/kudu/tools/tool_action_fs.cc:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tools/tool_action_fs.cc@342
PS22, Line 342:       // data dir group for the tablet, and the staged 
directory config won't
              :       // affect this tablet.
              :       DCHECK(s1.IsNotFound() || s2.IsNotFound()) << 
s1.ToString();
              :       continue;
              :     }
              :     RETURN_NOT_OK_PREPEND(s1, "at least one tablet is 
configured to use removed data directory. "
              :         "Retry with --force to override this");
              :     RETURN_NOT_OK_PREPEND(s2, "at least one tablet is 
configured to use removed WAL directory. "
              :         "Retry with --force to override this");
              :   }
> TODO(awong): is this tested
I tested it by hand. I will add some test code.


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tools/tool_action_local_replica.cc
File src/kudu/tools/tool_action_local_replica.cc:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tools/tool_action_local_replica.cc@367
PS22, Line 367:   // For new version, need load metadata, find directory from 
metadata. But we just find from
              :   // wal directory first.
              :   
RETURN_NOT_OK(fs_manager.wd_manager()->FindAndRegisterWalDir(tablet_id));
> Doesn't this already happen in TabletMetadata::Load()?
Yes.


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tools/tool_action_local_replica.cc@524
PS22, Line 524:   // For new version, need load metadata, find directory from 
metadata.
              :   // But we just find the wal dir for tablet from disk first.
              :   
RETURN_NOT_OK(fs_manager->wd_manager()->FindAndRegisterWalDir(tablet_id));
> Should we call TabletMetadata::Load() instead?
Yes.


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/mini_tablet_server-test.cc
File src/kudu/tserver/mini_tablet_server-test.cc:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/mini_tablet_server-test.cc@41
PS22, Line 41:
             : TEST_F(MiniTabletServerTest, TestMultiDataDirServer) {
             :   // Specifying the number of data directories will create 
subdirectories under the test root.
             :   unique_ptr<MiniTabletServer> mini_server;
             :   FsManager* fs_manager;
             :
             :   int kNumDataDirs = 3;
             :   mini_server.reset(new MiniTabletServer(GetTestPath("TServer"),
             :       HostPort("127.0.0.1", 0), kNumDataDirs));
             :   ASSERT_OK(mini_server->Start());
             :   fs_manager = mini_server->server()->fs_manager();
             :   ASSERT_EQ(1, fs_manager->GetWalRootDirs().size());
             :   ASSERT_STR_CONTAINS(DirName(fs_manager->GetWalRootDirs()[0]), 
"wal");
             :   ASSERT_EQ(kNumDataDirs, fs_manager->GetDataRootDirs().size());
             : }
             :
             : TEST_F(MiniTabletServerTest, TestMultiWalDirServer) {
             :   // Specifying the number of data directories will create 
subdirectories under the test root.
             :   unique_ptr<MiniTabletServer> mini_server;
             :   FsManager* fs_manager;
             :
             :   int kNumWalDirs = 3;
             :   mini_server.reset(new MiniTabletServer(GetTestPath("TServer"),
             :       HostPort("127.0.0.1", 0), 1, kNumWalDirs));
             :   ASSERT_OK(mini_server->Start());
             :   fs_manager = mini_server->server()->fs_manager();
             :   ASSERT_EQ(1, fs_manager->GetDataRootDirs().size());
             :   ASSERT_STR_CONTAINS(DirName(fs_manager->GetDataRootDirs()[0]), 
"data");
             :   ASSERT_EQ(kNumWalDirs, fs_manager->GetWalRootDirs().size());
             : }
> nit: maybe combine these tests so we have a single test that has multiple W
OK


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/mini_tablet_server.cc
File src/kudu/tserver/mini_tablet_server.cc:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/mini_tablet_server.cc@76
PS22, Line 76:   // This place needs to be consistent with the original logic.
             :   // Otherwise, some test cases will not pass.
             :   // If there just one wal dir and data dir, we just use fs_root.
> nit: Thanks for the note! This comment should be removed before we merge th
OK


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/tablet_copy_client-test.cc
File src/kudu/tserver/tablet_copy_client-test.cc:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/tablet_copy_client-test.cc@507
PS22, Line 507:   fs_manager_->wd_manager()->CreateWalDir(GetTabletId());
> Why do we need this? Shouldn't this have already happened when we setup the
We need prepare a WAL dir for tablet. when we setup the test, we just init the 
fs_manager. this is done in StartCopy(). This case has not used "StartCopy()", 
so we need do it here.


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/tablet_server-stress-test.cc
File src/kudu/tserver/tablet_server-stress-test.cc:

PS22:
> Instead of copying the existing test, how about making this a parameterized
OK


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/tablet_server-test.cc
File src/kudu/tserver/tablet_server-test.cc:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/tablet_server-test.cc@206
PS22, Line 206: /*num_data_dirs=*/1, /*num_wal_dirs=*/3
> Why not leave it using the default of 1 data directory and 1 WAL directory?
Here I just want to test multiple WAL dirs. I can change back if it's not 
necessary.


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/tablet_server-test.cc@3735
PS22, Line 3735:   string tablet_meta_path = JoinPathSegments(JoinPathSegments(
               :       GetTestPath("TabletServerTest-fsroot"), "wal-0"), 
"tablet-meta");
> I'm kind of surprised by this. Why did the metadata directory change?
Because there are 3 WAL directories, in function 
MiniTabletServer::MiniTabletServer(), we named the wal directory name as 
"wal-0" "wal-1" "wal-2", in fs_manager, the metadata directory use the first 
wal directory, so here the metadata directory name change to "wal-0". The 
original code had only one wal directory and one data directory, so it use 
"TabletServerTest-fsroot" as the wal and metadata directory.


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/ts_tablet_manager.cc@1460
PS22, Line 1460:   meta->fs_manager()->GetTabletWalDir(tablet_id, &wal_dir);
               :   meta->fs_manager()->GetTabletWalRecoveryDir(tablet_id, 
&wal_recovery_dir);
> Should we WARN_NOT_OK here? Or maybe we should check to look if the status
yes, we should WARN_NOT_OK. there is a case, if metadata deletion succeed but 
log deletion failed, we cannot get the tablet's dir, but 
Log::DeleteOnDiskData() also can delete it by finding the tablet's dir from 
disk. So I think here just need a warning.



--
To view, visit http://gerrit.cloudera.org:8080/14654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied496804421d91ff1fa63d49979fde971071506e
Gerrit-Change-Number: 14654
Gerrit-PatchSet: 22
Gerrit-Owner: YangSong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: YangSong <[email protected]>
Gerrit-Comment-Date: Thu, 12 Dec 2019 03:20:45 +0000
Gerrit-HasComments: Yes

Reply via email to