Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13975 )

Change subject: Consider the available space when selecting data dirs for 
blocks.
......................................................................


Patch Set 9:

> Patch Set 9:
>
> > Patch Set 8:
> >
> > > (9 comments)
> >  >
> >  > I'll look into the test failure some more. If I check out this
> >  > patch, I see it failing a small percentage (2-6%) of the time.
> >
> > About the failure in DiskErrorITest.TestFailDuringScanWorkload. IIUC, it
> > only injects the disk failure in data_dir[1], and after the reflush check
> > change, that dir may be removed from the candidate dirs. So the failure may
> > never be triggered, and the test will fail.
> 
> I added some logging and think I understand what's going on. When we first 
> create the tablets, we always refresh the space, and this necessarily isn't 
> atomic between the data dirs, so we can sometimes end up with the data 
> directories registering that they have different amounts of available space, 
> even though they share the same disk.
>
> When this happens, because there are only three directories, this 
> implementation of PO2C might end up completely ignoring the data dir with the 
> least amount of space in it.
>
> So I see two paths forward for this. Either:
> 1) update the implementation of PO2C to sometimes select the data dir with
> the least space. For example, select two random indices (which may be the
> same) and compare their available space, unlike what we have now, which
> always compares two different data directories. OR...
> 2) update disk_failure-itest to inject failures into two data directories 
> instead of one. With the current PO2C implementation, it's a safe bet that 
> killing two data dirs will touch blocks.

I tried both, and both seem to reduce the flakiness of the test (with either
fix, the test passed 500/500 times instead of ~480/500 times).
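
For reference, here is a minimal standalone sketch of the difference behind
option 1. It is not the code in this patch: PickDistinct, PickWithReplacement,
and CountSelections are hypothetical names, and the available space is
modelled as a plain vector of byte counts. With three dirs, always comparing
two different dirs can never return the dir with strictly the least space,
while sampling with replacement returns it whenever both draws land on it
(roughly 1 in 9 selections).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical model of the current behavior: pick two *different* indices
// and return the one with more available space.
size_t PickDistinct(const std::vector<int64_t>& avail, std::mt19937& rng) {
  assert(avail.size() >= 2);
  std::uniform_int_distribution<size_t> first(0, avail.size() - 1);
  std::uniform_int_distribution<size_t> second(0, avail.size() - 2);
  size_t a = first(rng);
  size_t b = second(rng);
  if (b >= a) ++b;  // Shift past 'a' so that b != a.
  return avail[a] >= avail[b] ? a : b;
}

// Hypothetical model of option 1: the two indices are drawn independently,
// so they may be the same, in which case that index wins by default.
size_t PickWithReplacement(const std::vector<int64_t>& avail,
                           std::mt19937& rng) {
  assert(!avail.empty());
  std::uniform_int_distribution<size_t> dist(0, avail.size() - 1);
  size_t a = dist(rng);
  size_t b = dist(rng);
  return avail[a] >= avail[b] ? a : b;
}

// Counts how often 'target' is selected by 'pick' over 'trials' draws.
template <typename Picker>
int CountSelections(Picker pick, const std::vector<int64_t>& avail,
                    size_t target, int trials, uint32_t seed) {
  std::mt19937 rng(seed);
  int hits = 0;
  for (int i = 0; i < trials; ++i) {
    if (pick(avail, rng) == target) ++hits;
  }
  return hits;
}
```

Calling CountSelections(PickDistinct, {1000, 990, 1000}, 1, ...) models the
case above, where one of three dirs registers slightly less space than the
others and is never picked.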


--
To view, visit http://gerrit.cloudera.org:8080/13975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I194c4965ee64aed728e3b84e684c04d445cbe529
Gerrit-Change-Number: 13975
Gerrit-PatchSet: 9
Gerrit-Owner: ZhangYao <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <[email protected]>
Gerrit-Reviewer: ZhangYao <[email protected]>
Gerrit-Comment-Date: Fri, 09 Aug 2019 20:59:20 +0000
Gerrit-HasComments: No
