ZhangYao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13975 )

Change subject: Consider the available space when selecting data dirs for 
blocks.
......................................................................


Patch Set 10:

> > Patch Set 9:
 > >
 > > > Patch Set 8:
 > > >
 > > > > (9 comments)
 > > >  >
 > > >  > I'll look into the test failure some more. If I check out this
 > > >  > patch, I see it failing a small percentage (2-6%) of the time.
 > > >
 > > > About the failure in DiskErrorITest.TestFailDuringScanWorkload:
 > > > IIUC, it only injects the disk failure into data_dir[1], and after
 > > > the refresh check change, that dir may be removed from the candidate
 > > > dirs. So the failure may never be triggered, and the test will fail.
 > >
 > > I added some logging and think I understand what's going on. When
 > > we first create the tablets, we always refresh the space, and this
 > > isn't necessarily atomic across the data dirs, so we can sometimes
 > > end up with the data directories registering that they have
 > > different amounts of available space, even though they share the
 > > same disk.
 > >
 > > When this happens, because there are only three directories, this
 > > implementation of PO2C might end up completely ignoring the data
 > > dir with the least amount of space in it.
 > >
 > > So I see two paths forward for this. Either:
 > > 1) update the implementation of PO2C to sometimes select the data
 > > dir with the least space. For example, select two random indices
 > > (may be the same) and compare the available space (compared to what
 > > we have now, which always compares two different data directories).
 > > OR...
 > > 2) update disk_failure-itest to inject failures into two data
 > > directories instead of one. With the current PO2C implementation,
 > > it's a safe bet that killing two data dirs will touch blocks.
 >
 > I tried both, and both seem to reduce the flakiness of the test
 > (either fix would pass 500/500 times instead of ~480/500 times).

Thanks a lot :)  Done.
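
For anyone following along, here is a rough sketch of why option 1 above
changes the behavior. It is not the Kudu code: the function names, the
"pick the dir with more available space" rule, the tie-breaking, and the
sample space values are all assumptions made for illustration. With three
dirs and two distinct picks, the dir with the least space always loses the
comparison; allowing both picks to land on the same index lets it win when
it is compared against itself.

```python
import random
from collections import Counter


def pick_distinct(avail, rng):
    """Hypothetical PO2C variant: always compares two different dirs."""
    a, b = rng.sample(range(len(avail)), 2)
    return a if avail[a] >= avail[b] else b


def pick_with_replacement(avail, rng):
    """Hypothetical option 1: two independent picks, possibly the same index."""
    a = rng.randrange(len(avail))
    b = rng.randrange(len(avail))
    return a if avail[a] >= avail[b] else b


def selection_counts(picker, avail, trials=9000, seed=0):
    """Count how often each dir index is chosen over a number of trials."""
    rng = random.Random(seed)
    return Counter(picker(avail, rng) for _ in range(trials))


# Three dirs on one disk whose refreshed values happen to differ slightly
# (made-up numbers standing in for the non-atomic refresh described above).
avail = [1000, 999, 998]
distinct = selection_counts(pick_distinct, avail)
replaced = selection_counts(pick_with_replacement, avail)

print("distinct picks:   ", dict(distinct))
print("with replacement: ", dict(replaced))
```

In the distinct-picks variant, index 2 (the least space) is never selected.
With replacement, it should be selected only when both picks land on it,
i.e. about 1 in 9 trials for three dirs.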


--
To view, visit http://gerrit.cloudera.org:8080/13975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I194c4965ee64aed728e3b84e684c04d445cbe529
Gerrit-Change-Number: 13975
Gerrit-PatchSet: 10
Gerrit-Owner: ZhangYao <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <[email protected]>
Gerrit-Reviewer: ZhangYao <[email protected]>
Gerrit-Comment-Date: Sun, 11 Aug 2019 15:31:32 +0000
Gerrit-HasComments: No
