Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13975 )

Change subject: KUDU-2911: Consider the available space when selecting data 
dirs for tablets
......................................................................


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/13975/3/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

http://gerrit.cloudera.org:8080/#/c/13975/3/src/kudu/fs/data_dirs.h@470
PS3, Line 470: less load
We want a higher score though. Maybe reword that to clarify this.


http://gerrit.cloudera.org:8080/#/c/13975/3/src/kudu/fs/data_dirs.h@470
PS3, Line 470: (available disk space) / (tablets
             :   // number in the directory)
It's great that this metric tries to take into account disk space _and_ tablet 
count, but it's not clear to me that we need both. The PO2C algorithm balances 
whatever metric it is using to quantify load. Here, we're saying that we want 
to balance the amount of free disk space available per existing tablet, and 
it's difficult to understand what that metric actually is.

How about just using available disk space? If we assume that each tablet is 
roughly the same size, and that each disk has roughly the same space (which are 
assumptions that existed before), just using available disk space is roughly 
equivalent to using the tablet count only. But free disk space alone seems 
better because it accounts for the case where those assumptions aren't true.


http://gerrit.cloudera.org:8080/#/c/13975/3/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

http://gerrit.cloudera.org:8080/#/c/13975/3/src/kudu/fs/data_dirs.cc@988
PS3, Line 988:     Status s = 
candidate->RefreshIsFull(DataDir::RefreshMode::EXPIRED_ONLY);
Using EXPIRED_ONLY means that we will only refresh the available bytes if the 
directory was previously full or if it has been a long time (i.e. "expired).

This means that if we call this function very quickly in succession, all the 
candidates will have the same size, and we'd end up always picking the first 
data directory. If we wanted this to take into account space, we'd have to use 
RefreshMode::ALWAYS here.

I think this is what has been making disk_failure-itest unstable, since all the 
tablet blocks are being put on a single disk.


Also, this might negatively affect parallelism for block IO in exchange for 
more balance of space, right? For example, certain tablets may entirely be 
stored on a single disk. Not sure the greedy approach is the best. Maybe we can 
reuse PO2C here too, or continue having this be randomly selected among the 
non-full candidates. I'm curious what others think about this.


http://gerrit.cloudera.org:8080/#/c/13975/3/src/kudu/fs/data_dirs.cc@1047
PS3, Line 1047: candidate_indices
nit: would be clearer to name this something like candidates_index_and_score or 
something.



--
To view, visit http://gerrit.cloudera.org:8080/13975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I194c4965ee64aed728e3b84e684c04d445cbe529
Gerrit-Change-Number: 13975
Gerrit-PatchSet: 3
Gerrit-Owner: ZhangYao <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <[email protected]>
Gerrit-Reviewer: ZhangYao <[email protected]>
Gerrit-Comment-Date: Thu, 01 Aug 2019 23:17:47 +0000
Gerrit-HasComments: Yes

Reply via email to