Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13975 )
Change subject: Consider the available space when selecting data dirs for blocks.
......................................................................


Patch Set 9:

> Patch Set 9:
>
> > Patch Set 8:
> >
> > > (9 comments)
> > >
> > > I'll look into the test failure some more. If I check out this
> > > patch, I see it failing a small percentage (2-6%) of the time.
> >
> > About the failure in DiskErrorITest.TestFailDuringScanWorkload: IIUC, it
> > only injects the disk failure in data_dir[1], and after the refresh check
> > change, that dir may be removed from the candidate dirs. So the failure may
> > not be triggered, and the test fails.
>
> I added some logging and think I understand what's going on. When we first
> create the tablets, we always refresh the space, and this isn't necessarily
> atomic between the data dirs, so we can sometimes end up with the data
> directories registering that they have different amounts of available space,
> even though they share the same disk.
>
> When this happens, because there are only three directories, this
> implementation of PO2C might end up completely ignoring the data dir with the
> least amount of space in it.
>
> So I see two paths forward for this. Either:
> 1) update the implementation of PO2C to sometimes select the data dir with
> the least space. For example, select two random indices (which may be the
> same) and compare the available space (unlike what we have now, which always
> compares two different data directories). OR...
> 2) update disk_failure-itest to inject failures into two data directories
> instead of one. With the current PO2C implementation, it's a safe bet that
> killing two data dirs will touch blocks.

I tried both, and both seem to reduce the flakiness of the test (either fix
passes 500/500 times instead of ~480/500 times).
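
To illustrate the difference between the two sampling strategies in option 1, here is a minimal simulation sketch. It is not Kudu code: the function names, the free-space readings, and the tie-breaking rule are all assumptions made up for illustration. It only shows why, with three dirs, always comparing two different dirs never selects the one with the least space, while sampling with replacement sometimes does.

```python
import random


def pick_distinct(space, rng):
    """Pick two different dirs at random; keep the one with more free space."""
    a, b = rng.sample(range(len(space)), 2)
    return a if space[a] >= space[b] else b


def pick_with_replacement(space, rng):
    """Pick two indices independently (they may coincide); keep the roomier one."""
    a = rng.randrange(len(space))
    b = rng.randrange(len(space))
    return a if space[a] >= space[b] else b


def selection_counts(picker, space, trials, seed=0):
    """Count how often each dir index is selected over a number of trials."""
    rng = random.Random(seed)
    counts = [0] * len(space)
    for _ in range(trials):
        counts[picker(space, rng)] += 1
    return counts


# Hypothetical free-space readings for three dirs on the same disk; they
# differ slightly because the refreshes did not happen at the same instant.
space = [1_000_000, 999_000, 998_000]

distinct = selection_counts(pick_distinct, space, 9000)
with_repl = selection_counts(pick_with_replacement, space, 9000)

print("distinct picks:        ", distinct)
print("picks with replacement:", with_repl)
```

In this sketch the dir at index 2 always loses any comparison against a different dir, so it can only be selected when both random picks land on it, which happens with probability 1/9 when sampling with replacement and never when the two picks must differ.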
--
To view, visit http://gerrit.cloudera.org:8080/13975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I194c4965ee64aed728e3b84e684c04d445cbe529
Gerrit-Change-Number: 13975
Gerrit-PatchSet: 9
Gerrit-Owner: ZhangYao <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <[email protected]>
Gerrit-Reviewer: ZhangYao <[email protected]>
Gerrit-Comment-Date: Fri, 09 Aug 2019 20:59:20 +0000
Gerrit-HasComments: No
