ZhangYao has posted comments on this change. ( http://gerrit.cloudera.org:8080/13975 )
Change subject: Consider the available space when selecting data dirs for blocks.
......................................................................


Patch Set 10:

> > Patch Set 9:
> >
> > > Patch Set 8:
> > >
> > > > (9 comments)
> > > >
> > > > I'll look into the test failure some more. If I check out this
> > > > patch, I see it failing a small percentage (2-6%) of the time.
> > >
> > > About the failure in DiskErrorITest.TestFailDuringScanWorkload:
> > > IIUC, it only injects the disk failure in data_dir[1], and after the
> > > reflush check change, that dir may be removed from the candidate
> > > dirs. So the failure may never be triggered, and the test will fail.
> >
> > I added some logging and think I understand what's going on. When
> > we first create the tablets, we always refresh the space, and this
> > isn't necessarily atomic between the data dirs, so we can sometimes
> > end up with the data directories registering that they have
> > different amounts of available space, even though they share the
> > same disk.
> >
> > When this happens, because there are only three directories, this
> > implementation of PO2C might end up completely ignoring the data
> > dir with the least amount of space in it.
> >
> > So I see two paths forward for this. Either:
> > 1) update the implementation of PO2C to sometimes select the data
> > dir with the least space. For example, select two random indices
> > (may be the same) and compare the available space (compared to what
> > we have now, which always compares two different data directories).
> > OR...
> > 2) update disk_failure-itest to inject failures into two data
> > directories instead of one. With the current PO2C implementation,
> > it's a safe bet that killing two data dirs will touch blocks.
> > I tried both, and both seem to reduce the flakiness of the test
> > (either fix would pass 500/500 times instead of ~480/500 times).

Thanks a lot :) Done.

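For reference, the difference between the two PO2C variants discussed in the quoted text can be sketched as below. This is a minimal illustration in Python, not the code in this patch: the function names, the `free_bytes` values, and the tie-breaking rule are all assumptions made for the example. It only shows why, with three directories, always comparing two different dirs never picks the one with the least space, while allowing the two indices to coincide sometimes does.

```python
import random
from collections import Counter


def pick_distinct(free_bytes, rng):
    """Hypothetical PO2C variant: compare two *different* dirs, keep the one with more space."""
    a, b = rng.sample(range(len(free_bytes)), 2)
    return a if free_bytes[a] >= free_bytes[b] else b


def pick_with_replacement(free_bytes, rng):
    """Hypothetical variant of option 1: the two indices may be the same."""
    a = rng.randrange(len(free_bytes))
    b = rng.randrange(len(free_bytes))
    return a if free_bytes[a] >= free_bytes[b] else b


def tally(picker, free_bytes, trials=10_000, seed=0):
    """Count how often each dir index is selected over many trials."""
    rng = random.Random(seed)
    return Counter(picker(free_bytes, rng) for _ in range(trials))


# Assumed numbers: three dirs on the same disk whose space was refreshed
# at slightly different moments, so they report different free space.
free_bytes = [1000, 990, 980]

distinct = tally(pick_distinct, free_bytes)
with_replacement = tally(pick_with_replacement, free_bytes)

print("distinct indices:     ", dict(distinct))
print("indices may coincide: ", dict(with_replacement))
```

In the first variant the dir at index 2 always loses its comparison, so a failure injected only there would never be hit. In the second it is selected whenever both draws land on it.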
--
To view, visit http://gerrit.cloudera.org:8080/13975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I194c4965ee64aed728e3b84e684c04d445cbe529
Gerrit-Change-Number: 13975
Gerrit-PatchSet: 10
Gerrit-Owner: ZhangYao <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <[email protected]>
Gerrit-Reviewer: ZhangYao <[email protected]>
Gerrit-Comment-Date: Sun, 11 Aug 2019 15:31:32 +0000
Gerrit-HasComments: No
