Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14654 )

Change subject: KUDU-2975: Spread WAL across multiple directories
......................................................................


Patch Set 22:

(30 comments)

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/fs.proto
File src/kudu/fs/fs.proto:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/fs.proto@121
PS22, Line 121: // Tablet log is spread across a specified WAL directorie. This 
PB is represented
              : // by the UUID of the WAL directorie it consists of.
nit: maybe to make it clearer that a single tablet will have a single WAL 
directory, reword this as:

"A tablet's WAL exists in a single WAL directory. This directory is represented 
by the UUID."


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/fs_manager.h
File src/kudu/fs/fs_manager.h:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/fs_manager.h@218
PS22, Line 218:
              :   Status GetOrphanedWalDir(const std::string& tablet_id, 
std::string* wal_dir);
              :
              :   Status GetOrphanedWalRecoveryDir(const std::string& tablet_id,
              :                                    std::string* 
wal_recovery_dir);
The idea of "orphaned" WAL directories is new in this patch. Could you comment 
what these mean? When would we expect there to be an orphaned WAL directory? Or 
an orphaned recovery directory?


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/fs_manager.cc
File src/kudu/fs/fs_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/fs_manager.cc@84
PS22, Line 84:
             : DEFINE_string(fs_wal_dir, "",
             :               "Directory with write-ahead logs. If this is not 
specified, the "
             :               "program will not start. May be the same as 
fs_data_dirs");
             : TAG_FLAG(fs_wal_dir, stable);
             : DEFINE_string(fs_wal_dirs, "",
             :               "Directory with write-ahead logs. If this is not 
specified, the "
             :               "program will not start. May be the same as 
fs_data_dirs");
             : TAG_FLAG(fs_wal_dirs, stable);
Can you add a group flag validator like ValidateUpdateTabletStatsInterval in 
ts_tablet_manager.cc  that verifies that exactly one of these is empty? That 
should simplify some of the behavior below.


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/fs_manager.cc@735
PS22, Line 735: Status FsManager::GetOrphanedWalDir(const string& tablet_id, 
string* wal_dir) {
              :   if (wal_dir == nullptr) {
              :     return Status::InvalidArgument("wal_dir cannot be nullptr");
              :   }
              :   Status s = wd_manager_->FindTabletDirFromDisk(tablet_id, "", 
wal_dir);
              :   if (!s.ok()) {
              :     LOG(WARNING) << "cannot find tablet:" << tablet_id << " wal 
directory." <<
              :                  s.message().ToString();
              :   }
              :   return s;
              : }
              :
              : Status FsManager::GetOrphanedWalRecoveryDir(const string& 
tablet_id,
              :                                        string* 
wal_recovery_dir) {
              :   if (wal_recovery_dir == nullptr) {
              :     return Status::InvalidArgument("wal_recovery_dir cannot be 
nullptr");
              :   }
              :   Status s = wd_manager_->FindTabletDirFromDisk(tablet_id, 
kWalsRecoveryDirSuffix,
              :       wal_recovery_dir);
              :   if (!s.ok()) {
              :     LOG(WARNING) << "cannot find tablet:" << tablet_id << " wal 
directory." <<
              :                  s.message().ToString();
              :   }
              :   return s;
              : }
              :
As written, I think these mean that:
- a WAL directory hasn't already been registered with the WAL directory manager
- we'll search for a WAL directory from disk.


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.h
File src/kudu/fs/wal_dirs.h:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.h@210
PS22, Line 210:   // Results in an error if the tablet has not a WAL dir 
associated with
              :   // it. If returning with an error, the
              :   // WalDirManager will be unchanged.
nit: reword and add a bit extra:

"Looks into each WAL directory for an existing WAL for 'tablet_id'. Results in 
an error if no such WAL exists, or if there are multiple associated with the 
given tablet ID."


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.h@353
PS22, Line 353: dir_by_tablet_map_
nit: can you rename this so it's clear whether this is referring to the UUID or 
the directory name? Like 'dir_name_by_tablet_' or 'uuid_by_tablet_'


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc
File src/kudu/fs/wal_dirs.cc:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc@75
PS22, Line 75: DEFINE_bool(fs_wal_dirs_consider_available_space, true,
             :             "Whether to consider available space when selecting 
a wal "
             :             "directory during tablet or wal block creation.");
             : TAG_FLAG(fs_wal_dirs_consider_available_space, runtime);
             : TAG_FLAG(fs_wal_dirs_consider_available_space, evolving);
This isn't used, is it?


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc@605
PS22, Line 605: r *wal_
nit: move the * so it's next to the type name, i.e.

 const WalDir* wal_dir = ...


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc@614
PS22, Line 614:   if (other != nullptr) {
nit: this can be written as:

 if (other) {
 ...


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc@616
PS22, Line 616: "tried to load directory for tablet $0 but one is already 
registered",
              :         tablet_id));
nit: maybe also log the existing directory?


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc@664
PS22, Line 664: postfix
nit: we usually call this a "suffix"


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc@675
PS22, Line 675:     if (!postfix.empty()) {
              :       tablet_path.append(postfix);
              :     }
We don't need to condition on postfix.empty(). If it's empty, 
tablet_path.append(postfix) will be a no-op.


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc@710
PS22, Line 710:     if (tablets_by_uuid.second.size() < min_tablets_num) {
              :       min_tablets_num = tablets_by_uuid.second.size();
              :       target_uuid = tablets_by_uuid.first;
              :     }
The fact that we're always using the least-populated directory is different 
than what we do for data directories. Is that intentional? Why did you choose 
this heuristic instead of the PO2C algorithm?


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc@808
PS22, Line 808:
              : bool WalDirManager::IsWalDirFailed(const string& uuid) const {
              :   shared_lock<rw_spinlock> lock(dir_lock_.get_lock());
              :   return ContainsKey(failed_wal_dirs_, uuid);
              : }
This isn't used, right? Same with MarkWalDirFailedByUuid


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tablet/tablet_metadata-test.cc
File src/kudu/tablet/tablet_metadata-test.cc:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tablet/tablet_metadata-test.cc@47
PS22, Line 47: class TestTabletMetadata : public KuduTabletTest {
             :  public:
             :   TestTabletMetadata()
             :       : KuduTabletTest(GetSimpleTestSchema()) {
             :   }
             :
             :   virtual void SetUp() OVERRIDE {
             :     KuduTabletTest::SetUp();
             :     writer_.reset(new LocalTabletWriter(harness_->tablet().get(),
             :                                         &client_schema_));
             :   }
             :
             :   void BuildPartialRow(int key, int intval, const char* strval,
             :                        gscoped_ptr<KuduPartialRow>* row);
             :
             :  protected:
             :   gscoped_ptr<LocalTabletWriter> writer_;
             : };
TODO(awong): can we also add a test that, if we load a superblock that 
_doesn't_ have a WAL UUID, then we'll assign one based on what we find on disk? 
And if that fails, then we'll get an error.


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tablet/tablet_metadata.cc@463
PS22, Line 463:       // An error loading the WAL dir is non-fatal, it just 
means the
              :       // tablet will fail to bootstrap later.
TODO(awong): make sure this is true


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tablet/tablet_metadata.cc@470
PS22, Line 470: 1.10.0
before 1.11, since this new code would land in 1.12


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tablet/tablet_metadata.cc@477
PS22, Line 477:       // Here cannot flush directly. May causes a deadlock in 
the tool code.
              :       // If we switch 'fs_wal_dir' to 'fs_wal_dirs', we should 
try to find the directory
              :       // where tablet's WAL is located.
              :       // If we found, we should record WAL directory UUID into 
metadata, and flush metadata.
              :       // otherwise we do nothing here. At bootstrap we will 
check the tablet's WAL directory.
              :       // And it will failed, this tablet will be marked failed.
              :       //RETURN_NOT_OK(Flush());
Do we need this anymore?


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tablet/tablet_metadata.cc@741
PS22, Line 741: DataDirGroupPB
nit: extend this, like:

"Serialize the tablet's data directory group and WAL directory, if they exist. 
They may.."


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tools/kudu-tool-test.cc@5219
PS22, Line 5219:     // master use "fs_wal_dir", tserver use "fs_wal_dirs".
Is there a good reason to do this here? I know that in a real deployment, we'd 
probably use a single WAL directory. But this test doesn't really have much to 
do with the fs_wal_dir flags specifically; --fs_wal_dir and --fs_wal_dirs 
aren't special when it comes to gflags. Could we just leave it as is? Or could 
we change the ExternalMiniCluster to always use fs_wal_dirs, even if the number 
of WAL directories is 1?


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tools/tool_action_fs.cc
File src/kudu/tools/tool_action_fs.cc:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tools/tool_action_fs.cc@342
PS22, Line 342:       // data dir group for the tablet, and the staged 
directory config won't
              :       // affect this tablet.
              :       DCHECK(s1.IsNotFound() || s2.IsNotFound()) << 
s1.ToString();
              :       continue;
              :     }
              :     RETURN_NOT_OK_PREPEND(s1, "at least one tablet is 
configured to use removed data directory. "
              :         "Retry with --force to override this");
              :     RETURN_NOT_OK_PREPEND(s2, "at least one tablet is 
configured to use removed WAL directory. "
              :         "Retry with --force to override this");
              :   }
TODO(awong): is this tested


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tools/tool_action_local_replica.cc
File src/kudu/tools/tool_action_local_replica.cc:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tools/tool_action_local_replica.cc@367
PS22, Line 367:   // For new version, need load metadata, find directory from 
metadata. But we just find from
              :   // wal directory first.
              :   
RETURN_NOT_OK(fs_manager.wd_manager()->FindAndRegisterWalDir(tablet_id));
Doesn't this already happen in TabletMetadata::Load()?


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tools/tool_action_local_replica.cc@524
PS22, Line 524:   // For new version, need load metadata, find directory from 
metadata.
              :   // But we just find the wal dir for tablet from disk first.
              :   
RETURN_NOT_OK(fs_manager->wd_manager()->FindAndRegisterWalDir(tablet_id));
Should we call TabletMetadata::Load() instead?


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/mini_tablet_server-test.cc
File src/kudu/tserver/mini_tablet_server-test.cc:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/mini_tablet_server-test.cc@41
PS22, Line 41:
             : TEST_F(MiniTabletServerTest, TestMultiDataDirServer) {
             :   // Specifying the number of data directories will create 
subdirectories under the test root.
             :   unique_ptr<MiniTabletServer> mini_server;
             :   FsManager* fs_manager;
             :
             :   int kNumDataDirs = 3;
             :   mini_server.reset(new MiniTabletServer(GetTestPath("TServer"),
             :       HostPort("127.0.0.1", 0), kNumDataDirs));
             :   ASSERT_OK(mini_server->Start());
             :   fs_manager = mini_server->server()->fs_manager();
             :   ASSERT_EQ(1, fs_manager->GetWalRootDirs().size());
             :   ASSERT_STR_CONTAINS(DirName(fs_manager->GetWalRootDirs()[0]), 
"wal");
             :   ASSERT_EQ(kNumDataDirs, fs_manager->GetDataRootDirs().size());
             : }
             :
             : TEST_F(MiniTabletServerTest, TestMultiWalDirServer) {
             :   // Specifying the number of data directories will create 
subdirectories under the test root.
             :   unique_ptr<MiniTabletServer> mini_server;
             :   FsManager* fs_manager;
             :
             :   int kNumWalDirs = 3;
             :   mini_server.reset(new MiniTabletServer(GetTestPath("TServer"),
             :       HostPort("127.0.0.1", 0), 1, kNumWalDirs));
             :   ASSERT_OK(mini_server->Start());
             :   fs_manager = mini_server->server()->fs_manager();
             :   ASSERT_EQ(1, fs_manager->GetDataRootDirs().size());
             :   ASSERT_STR_CONTAINS(DirName(fs_manager->GetDataRootDirs()[0]), 
"data");
             :   ASSERT_EQ(kNumWalDirs, fs_manager->GetWalRootDirs().size());
             : }
nit: maybe combine these tests so we have a single test that has multiple WAL 
directories and multiple data directories?


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/mini_tablet_server.cc
File src/kudu/tserver/mini_tablet_server.cc:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/mini_tablet_server.cc@76
PS22, Line 76:   // This place needs to be consistent with the original logic.
             :   // Otherwise, some test cases will not pass.
             :   // If there just one wal dir and data dir, we just use fs_root.
nit: Thanks for the note! This comment should be removed before we merge this.


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/tablet_copy_client-test.cc
File src/kudu/tserver/tablet_copy_client-test.cc:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/tablet_copy_client-test.cc@507
PS22, Line 507:   fs_manager_->wd_manager()->CreateWalDir(GetTabletId());
Why do we need this? Shouldn't this have already happened when we setup the 
test?


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/tablet_server-stress-test.cc
File src/kudu/tserver/tablet_server-stress-test.cc:

PS22:
Instead of copying the existing test, how about making this a parameterized 
test, parameterized by the number of directories to use.

Search around for "TEST_P" and "INSTANTIATE_TEST_CASE_P" for examples of 
parameterized tests.


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/tablet_server-test.cc
File src/kudu/tserver/tablet_server-test.cc:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/tablet_server-test.cc@206
PS22, Line 206: /*num_data_dirs=*/1, /*num_wal_dirs=*/3
Why not leave it using the default of 1 data directory and 1 WAL directory?


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/tablet_server-test.cc@3735
PS22, Line 3735:   string tablet_meta_path = JoinPathSegments(JoinPathSegments(
               :       GetTestPath("TabletServerTest-fsroot"), "wal-0"), 
"tablet-meta");
I'm kind of surprised by this. Why did the metadata directory change?


http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/ts_tablet_manager.cc@1460
PS22, Line 1460:   meta->fs_manager()->GetTabletWalDir(tablet_id, &wal_dir);
               :   meta->fs_manager()->GetTabletWalRecoveryDir(tablet_id, 
&wal_recovery_dir);
Should we WARN_NOT_OK here? Or maybe we should check to look if the status is 
an error and LOG(FATAL) if so?



--
To view, visit http://gerrit.cloudera.org:8080/14654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied496804421d91ff1fa63d49979fde971071506e
Gerrit-Change-Number: 14654
Gerrit-PatchSet: 22
Gerrit-Owner: YangSong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: YangSong <[email protected]>
Gerrit-Comment-Date: Wed, 11 Dec 2019 01:10:59 +0000
Gerrit-HasComments: Yes

Reply via email to