Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10340 )

Change subject: WIP KUDU-2359: allow startup with missing data dirs
......................................................................


Patch Set 2:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/10340/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10340/2//COMMIT_MSG@12
PS2, Line 12: reading instance and block
            : manager instance files
Nit: "reading fs and block manager instance files".


http://gerrit.cloudera.org:8080/#/c/10340/2//COMMIT_MSG@21
PS2, Line 21: requests
Nit: requested


http://gerrit.cloudera.org:8080/#/c/10340/2//COMMIT_MSG@35
PS2, Line 35: A missing data
            : directory in this case is one that has no instance or block 
manager
            : instance file
Would it be useful to broaden the scope to include missing data_root 
directories or data_dir subdirectories as well?


http://gerrit.cloudera.org:8080/#/c/10340/2//COMMIT_MSG@54
PS2, Line 54:   "unhealthy", the same way they treat files that fail due disk 
errors
Nit: due to


http://gerrit.cloudera.org:8080/#/c/10340/2//COMMIT_MSG@58
PS2, Line 58: missing directories as
            :   an unhealthy instance used to spawn a failed DataDir in memory
Nit: singular/plural confusion here


http://gerrit.cloudera.org:8080/#/c/10340/2//COMMIT_MSG@62
PS2, Line 62:   NotFound, indicating Kudu should attempt to create a new FS 
layout
I guess what this also means is that semantically, there's now no difference 
between "brand new node" and "node whose disks all failed because instance 
files could not be found". Was that important before?


http://gerrit.cloudera.org:8080/#/c/10340/2/src/kudu/fs/block_manager_util.cc
File src/kudu/fs/block_manager_util.cc:

http://gerrit.cloudera.org:8080/#/c/10340/2/src/kudu/fs/block_manager_util.cc@52
PS2, Line 52: // Evaluates 'status_expr' and if it results in a disk-related 
error (i.e. disk
One of the challenging aspects of this change is explaining why this macro 
should consider NotFound errors as disk failures, but why we don't want to 
modify IsDiskFailure outright to include NotFound.


http://gerrit.cloudera.org:8080/#/c/10340/2/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

http://gerrit.cloudera.org:8080/#/c/10340/2/src/kudu/fs/data_dirs.h@423
PS2, Line 423: Instance
             :   // files that fail to load because they are missing or because 
of a disk
             :   // failure are considered loaded and are internally labeled 
unhealthy.
Maybe reword: "It also includes instance files that failed to load because they 
were missing or because of a disk failure; they are still considered "loaded" 
and are labeled unhealthy internally".


http://gerrit.cloudera.org:8080/#/c/10340/2/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

http://gerrit.cloudera.org:8080/#/c/10340/2/src/kudu/fs/data_dirs.cc@603
PS2, Line 603:       continue;
Would it be possible to omit this to simplify the control flow? AFAICT, this 
would mean loaded_instances always has missing (and unhealthy) instances in it, 
even if missing_roots was set. Is that a big deal?


http://gerrit.cloudera.org:8080/#/c/10340/2/src/kudu/fs/data_dirs.cc@692
PS2, Line 692:     DCHECK(missing_roots.empty());
If the extent of our recheck is a DCHECK, we could probably omit it and make 
L696 apply to both code paths, no?



--
To view, visit http://gerrit.cloudera.org:8080/10340
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I61a71265c3cc34a7b72320149770a814ec7f8351
Gerrit-Change-Number: 10340
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Fri, 11 May 2018 21:23:43 +0000
Gerrit-HasComments: Yes

Reply via email to