David Ribeiro Alves has posted comments on this change.

Change subject: disk failure: make DataDirManager failure-aware
......................................................................


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/7028/4/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

PS4, Line 480: LOG(WARNING) << Substitute("DataDirGroup for tablet $0 is 
smaller than targeted. "
             :                                  "Target size: $1, Actual size: 
$2", tablet_id,
             :                                  
FLAGS_fs_target_data_dirs_per_tablet, group_indices.size());
can you add more info? like total num data dirs and total num failed data dirs?


PS4, Line 509: if (PREDICT_FALSE(!failed_data_dirs_.empty())) {
             :       for (uint16_t uuid_index : group->uuid_indices()) {
             :         if (!ContainsKey(failed_data_dirs_, uuid_index)) {
             :           valid_uuid_indices.emplace_back(uuid_index);
             :         }
             :       }
             :       if (valid_uuid_indices.empty()) {
             :         return Status::IOError("No healthy directories exist in 
tablet's "
             :                                "DataDirGroup", opts.tablet_id, 
ENODEV);
             :       }
             :       group_uuid_indices = &valid_uuid_indices;
how about creating a helper method to filter failed dirs from a group of 
uuid_indices , can potentially use that in other places


PS4, Line 503: // Get the data dir group for the tablet.
             :     DataDirGroup* group = FindOrNull(group_by_tablet_map_, 
opts.tablet_id);
             :     if (group == nullptr) {
             :       return Status::NotFound("Tried to get directory but no 
DataDirGroup "
             :                               "registered for tablet", 
opts.tablet_id);
             :     }
             :     if (PREDICT_FALSE(!failed_data_dirs_.empty())) {
             :       for (uint16_t uuid_index : group->uuid_indices()) {
             :         if (!ContainsKey(failed_data_dirs_, uuid_index)) {
             :           valid_uuid_indices.emplace_back(uuid_index);
             :         }
             :       }
             :       if (valid_uuid_indices.empty()) {
             :         return Status::IOError("No healthy directories exist in 
tablet's "
             :                                "DataDirGroup", opts.tablet_id, 
ENODEV);
             :       }
             :       group_uuid_indices = &valid_uuid_indices;
             :     } else {
             :       group_uuid_indices = &group->uuid_indices();
             :     }
not from this patch but can you revert the if to put the non-test logic first?


PS4, Line 573: // TODO(awong): disk fullness and, to an extent, disk failure, 
are
             :     // transient states. If a disk is in either state at the 
time of group
             :     // creation, the resulting group may be below targeted size. 
Add
             :     // functionality to resize groups.
the whole management of data dir lifecycles (retiring failed data dirs or 
fixing them, adding new ones etc) likely merits a jira and not just a comment


http://gerrit.cloudera.org:8080/#/c/7028/4/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

PS4, Line 258: const std::vector<std::unique_ptr<DataDir>>& data_dirs() const {
             :     return data_dirs_;
             :   }
can you add docs stating whether this returns healthy/failed/all data dirs?


PS4, Line 264: std::set<std::string> FindTabletsByDataDirUuidIdx(uint16_t 
uuid_idx);
maybe refactor this to return a status and pass the set by pointer as an out 
param? if you plan to ever allow an admin to (hot) remove a data dir, might be 
useful to be able to return something like NotFound here is the uuid_idx 
doesn't exist. If this is not a concern, or at least not something that we'll 
want to do online, never mind


-- 
To view, visit http://gerrit.cloudera.org:8080/7028
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iee212793152de5de5198751d649ab34fb97f6aa2
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

Reply via email to