On Wed, 6 Nov 2013, Ingo Schwarze wrote:
> Ingo Schwarze wrote on Mon, Nov 04, 2013 at 09:51:41AM +0100:
> > Finally, we can work out how to do the optimization.
> > Probably, that will naturally factorize into two steps:
> > 
> >  (1) Use the information available in the userland buffer
> >      to avoid the getdents(2) syscall, *without* assuming
> >      monotonicity.
> 
> That is done by the patch below; i'm asking for OKs.
> 
> Note that, even though it looks superficially similar to the
> patch i sent originally, this one does not require monotonicity.
>
> To confirm that it is really an optimization, i did some
> measurements on my notebook:
...
>  * Worst case:
>    opendir; telldir
>    then 100.000 times (seekdir; readdir) at that position
>     = The entry asked for is the very first entry in the buffer,
>       which cannot be found, because each dirent only contains
>       the address of the *following* dirent, so each readdir
>       triggers getdents anyway, but we still search the whole
>       buffer each time.

While caching the offset of the start of the buffer is possible, it's not 
obvious that that case will happen often enough to be worth it.

Hmm, rewinddir() will _always_ be this worst case.  Perhaps rewinddir() 
should not call seekdir() and just be the two assignments plus the lseek(), 
skipping the scan of the current buffer.  It would be assisted by the 
start-of-buffer cache though.


> >  (2) Further optimize searching when monotonicity is available.
> 
> Given the data above, i don't consider further optimization worthwhile 
> any longer, so i consider what i'm sending here to be the final 
> optimization patch.
> 
> There are two cases where further optimization might in principle
> be possible based on monotonicity:
> 
>  * Avoid searching through the whole buffer when monotonicity
>    allows one to conclude that the entry cannot be anywhere in
>    the buffer.
>    But in that case, getdents() is needed anyway, which is so
>    much more expensive than searching the buffer that the
>    latter is practically irrelevant.

Yeah, doesn't seem like much of a win.


>  * Exploit monotonicity to replace the linear search by a binary
>    search.

Doing a binary search would be quite tricky given the variable size of the 
entries.  Not worth it.


> That said, here is the optimization patch:

Looks good.  Nice analysis.  ok guenther@
