Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13975 )
Change subject: KUDU-2911: Consider the available space when selecting data dirs for tablets ...................................................................... Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/13975/3/src/kudu/fs/data_dirs.h File src/kudu/fs/data_dirs.h: http://gerrit.cloudera.org:8080/#/c/13975/3/src/kudu/fs/data_dirs.h@470 PS3, Line 470: less load We want a higher score though. Maybe reword that to clarify this. http://gerrit.cloudera.org:8080/#/c/13975/3/src/kudu/fs/data_dirs.h@470 PS3, Line 470: (available disk space) / (tablets : // number in the directory) It's great that this metric tries to take into account disk space _and_ tablet count, but it's not clear to me that we need both. The PO2C algorithm balances whatever metric it is using to quantify load. Here, we're saying that we want to balance the amount of free disk space available per existing tablet, and it's difficult to understand what that metric actually is. How about just using available disk space? If we assume that each tablet is roughly the same size, and that each disk has roughly the same space (which are assumptions that existed before), just using available disk space is roughly equivalent to using the tablet count only. But free disk space alone seems better because it accounts for the case where those assumptions aren't true. http://gerrit.cloudera.org:8080/#/c/13975/3/src/kudu/fs/data_dirs.cc File src/kudu/fs/data_dirs.cc: http://gerrit.cloudera.org:8080/#/c/13975/3/src/kudu/fs/data_dirs.cc@988 PS3, Line 988: Status s = candidate->RefreshIsFull(DataDir::RefreshMode::EXPIRED_ONLY); Using EXPIRED_ONLY means that we will only refresh the available bytes if the directory was previously full or if it has been a long time (i.e. "expired). This means that if we call this function very quickly in succession, all the candidates will have the same size, and we'd end up always picking the first data directory. If we wanted this to take into account space, we'd have to use RefreshMode::ALWAYS here. I think this is what has been making disk_failure-itest unstable, since all the tablet blocks are being put on a single disk. Also, this might negatively affect parallelism for block IO in exchange for more balance of space, right? For example, certain tablets may entirely be stored on a single disk. Not sure the greedy approach is the best. Maybe we can reuse PO2C here too, or continue having this be randomly selected among the non-full candidates. I'm curious what others think about this. http://gerrit.cloudera.org:8080/#/c/13975/3/src/kudu/fs/data_dirs.cc@1047 PS3, Line 1047: candidate_indices nit: would be clearer to name this something like candidates_index_and_score or something. -- To view, visit http://gerrit.cloudera.org:8080/13975 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I194c4965ee64aed728e3b84e684c04d445cbe529 Gerrit-Change-Number: 13975 Gerrit-PatchSet: 3 Gerrit-Owner: ZhangYao <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yingchun Lai <[email protected]> Gerrit-Reviewer: ZhangYao <[email protected]> Gerrit-Comment-Date: Thu, 01 Aug 2019 23:17:47 +0000 Gerrit-HasComments: Yes
